From Tracking Submarines in the East China Sea to Tracking Containers in the Cloud

The title of this article might seem a bit confusing since, at first glance, Docker containers and submarines do not appear to have much in common. However, take a closer look and you’ll realize otherwise. My past experience has convinced me that there is actually quite a bit of overlap between the two. Keep reading to find out what I mean…

I joined the Tutum team 18 months ago, when Docker and the surrounding container ecosystem were still in their early stages. Before joining Tutum, I worked for an international defense contractor, where I built and adapted real-time radar monitoring systems for submarines and battleships (exciting job, isn’t it?). As you can imagine, that software was mission-critical and often played a key part in protecting the crew on board. In such systems, shaving a millisecond off the response time to new events can be the difference between success and failure, and failure could mean hundreds of casualties. Tracking the position of boats, ships and natural phenomena, and assessing how critical each event is in real time, turns out to be very closely related to tracking the events and states of containers, services and nodes, and their impact on an application.



Tutum container event system

My background helped us design a container event manager that keeps Tutum’s database up to date with each container’s latest status. For example, if a container stops because its main process exited, we mark that container as STOPPED and record its exit code. Depending on the configuration, this may or may not be intended. If it isn’t, it’s Tutum’s responsibility to provide all the information needed to troubleshoot the issue. To that end, users have plenty of data available, including the exact exit code of every STOPPED container, container logs, monitoring data, and API logs.
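As a rough illustration of that bookkeeping, here is a minimal Python sketch, assuming a hypothetical ContainerRecord row and apply_event helper (neither is Tutum’s actual code). Only the STOPPED state and the exit code come from the description above; the other state names are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ContainerRecord:
    """Hypothetical database row tracking one container's last known state."""
    container_id: str
    state: str = "CREATED"          # hypothetical initial state name
    exit_code: Optional[int] = None


def apply_event(record: ContainerRecord, action: str,
                exit_code: Optional[int] = None) -> ContainerRecord:
    """Update the stored record from a single Docker event action."""
    if action == "start":
        record.state = "RUNNING"    # hypothetical state name
        record.exit_code = None     # a fresh run has no exit code yet
    elif action == "die":
        record.state = "STOPPED"    # state name used in the article
        record.exit_code = exit_code
    # Other actions are ignored in this sketch.
    return record


record = ContainerRecord("6af7a3fa0376")
apply_event(record, "start")
apply_event(record, "die", exit_code=1)
print(record.state, record.exit_code)  # STOPPED 1
```

Keeping the exit code on the record itself means it is available for troubleshooting even after the container is gone from the node.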

To guarantee that Tutum always has up-to-date information, we developed a custom two-way communication system: from the node to Tutum, and from Tutum to the node. This double-check system lets us verify that the container state stored in our database matches the real state on the node, eliminating discrepancies caused by outdated states.

For communication from the node to Tutum, we rely on a system container that Tutum launches on each and every node. This container listens to all container events using docker events and sends the data back to Tutum. When possible, it also runs docker inspect on the container to share additional details with Tutum.
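A minimal sketch of what such a system container’s main loop could look like, under stated assumptions: events arrive as dictionaries, inspect stands in for docker inspect and returns None when the container can no longer be inspected, and send is a hypothetical stand-in for the network call back to Tutum. Only start and die are forwarded, since those are the events Tutum listens for today.

```python
from typing import Callable, Dict, Iterable, Optional

# Actions forwarded to Tutum; the article says start and die are tracked today.
TRACKED_ACTIONS = {"start", "die"}


def forward_events(
    events: Iterable[Dict],
    inspect: Callable[[str], Optional[Dict]],
    send: Callable[[Dict], None],
) -> None:
    """Filter node events, enrich them with inspect data and send them on.

    `inspect` stands in for `docker inspect` and returns None when the
    container can no longer be inspected; `send` stands in for the
    hypothetical network call back to Tutum.
    """
    for event in events:
        if event["action"] not in TRACKED_ACTIONS:
            continue                          # e.g. create, destroy, pause
        message = dict(event)
        details = inspect(event["id"])
        if details is not None:               # "when possible"
            # docker inspect exposes the exit code under State.ExitCode.
            message["exit_code"] = details["State"]["ExitCode"]
        send(message)


def fake_inspect(container_id: str) -> Optional[Dict]:
    """Stand-in for docker inspect that always reports exit code 1."""
    return {"State": {"ExitCode": 1}}


sent = []
forward_events(
    [{"id": "6af7", "action": "create"},
     {"id": "6af7", "action": "start"},
     {"id": "6af7", "action": "die"}],
    fake_inspect,
    sent.append,
)
print([m["action"] for m in sent])  # ['start', 'die']
```

Passing inspect and send in as functions keeps the filtering logic testable without a Docker daemon or a network connection.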

The docker events command reports the create, destroy, die, export, kill, oom, pause, restart, start, stop and unpause events for containers, and the untag and delete events for images. Today, Tutum listens for the die and start container events and complements them with the container inspect mentioned above. The inspect output gives Tutum information that the events alone do not carry, such as the container’s exit code. All nice and simple, right? But what if I told you that the event timestamps Docker generates are not as accurate as one would expect?

Let me further explain with an example:

$ docker run -d tutum/hello-world foo
FATA[0000] Error response from daemon: Cannot start container 6af7a3fa037689d9c3bd9d9caca3789dad247668695df799aba48618d9db246f: exec: "foo": executable file not found in $PATH

The command above launches a container with a command (foo) that cannot be found in the $PATH, so the container is created and started, and then immediately dies with an error.

$ docker events
2015-03-14T20:03:27.000000000+01:00 6af7a3fa037689d9c3bd9d9caca3789dad247668695df799aba48618d9db246f: (from tutum/hello-world:latest) create
2015-03-14T20:03:27.000000000+01:00 6af7a3fa037689d9c3bd9d9caca3789dad247668695df799aba48618d9db246f: (from tutum/hello-world:latest) start
2015-03-14T20:03:27.000000000+01:00 6af7a3fa037689d9c3bd9d9caca3789dad247668695df799aba48618d9db246f: (from tutum/hello-world:latest) die

Look at the timestamps. All three container events carry exactly the same timestamp! :-O Its fractional part is all zeros, so events that happen within the same second look simultaneous. The events clearly happened in a specific order, but judging from the timestamps alone, that order cannot be determined. On a submarine, this sort of misinformation would have had very undesirable consequences. Since many of our customers use Tutum to run their mission-critical applications, we cannot settle for this sub-optimal behavior either. So we fixed it.
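The tie is easy to confirm mechanically. The sketch below parses the text format printed by docker events and counts the distinct timestamps; the regular expression is an assumption fitted to the transcript above, not a documented grammar.

```python
import re
from typing import List, NamedTuple

# Matches the text lines printed by `docker events` in the transcript above.
EVENT_LINE = re.compile(
    r"^(?P<time>\S+) (?P<id>[0-9a-f]+): "
    r"\(from (?P<image>[^)]+)\) (?P<action>\w+)$"
)


class DockerEvent(NamedTuple):
    time: str
    container_id: str
    image: str
    action: str


def parse_events(text: str) -> List[DockerEvent]:
    """Parse `docker events` output, skipping blank or unrecognized lines."""
    events = []
    for line in text.splitlines():
        match = EVENT_LINE.match(line.strip())
        if match:
            events.append(DockerEvent(match.group("time"), match.group("id"),
                                      match.group("image"), match.group("action")))
    return events


CID = "6af7a3fa037689d9c3bd9d9caca3789dad247668695df799aba48618d9db246f"
TRANSCRIPT = "\n\n".join(
    f"2015-03-14T20:03:27.000000000+01:00 {CID}: "
    f"(from tutum/hello-world:latest) {action}"
    for action in ("create", "start", "die")
)

events = parse_events(TRANSCRIPT)
print([e.action for e in events])      # ['create', 'start', 'die']
print(len({e.time for e in events}))   # 1 distinct timestamp for all three
```

Sorting these events by their time field cannot recover the create, start, die sequence, because every key is identical.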

Before we pipe container events (and their timestamps) back to the Tutum app over the network, our system container adds an extra digit of precision to each event’s timestamp, so that events sharing the same second still get distinct, ordered timestamps. This allows us to reconstruct the correct sequence of events even when they arrive out of order, which is often the case.
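The exact encoding of the extra digit is not spelled out here, so the following is only one possible scheme, sketched in Python: the node, which reads events in the order Docker emits them, numbers the events within each second from 0 to 9 and appends that number as an extra digit, and the receiver simply sorts by the resulting value. The function name and the integer encoding are assumptions.

```python
from typing import Iterable, List, Optional, Tuple


def add_precision_digit(events: Iterable[Tuple[int, str]]) -> List[Tuple[int, str]]:
    """Turn (seconds, action) pairs into (seconds * 10 + n, action) pairs.

    Assumes events are read in the order Docker emits them on the node, so n
    (0-9) counts how many earlier events shared the same second.
    """
    stamped = []
    last_second: Optional[int] = None
    n = 0
    for seconds, action in events:
        if seconds == last_second:
            n += 1
        else:
            last_second, n = seconds, 0
        if n > 9:
            raise ValueError("more than 10 events in one second; one digit is not enough")
        stamped.append((seconds * 10 + n, action))
    return stamped


# 1426359807 is 2015-03-14T20:03:27+01:00, the second shared by all three events.
local = add_precision_digit([(1426359807, "create"),
                             (1426359807, "start"),
                             (1426359807, "die")])
arrived = [local[2], local[0], local[1]]          # delivered out of order
print([action for _, action in sorted(arrived)])  # ['create', 'start', 'die']
```

One digit only distinguishes ten events per second per node; the sketch raises an error beyond that rather than silently producing ties again.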

The icing on the cake, eliminating any remaining mismatch between what is happening on remote nodes and what our dashboard displays, is Tutum’s cross-reference system. Periodically, Tutum executes a task on each of a user’s nodes to gather information on all containers and their states. Tutum then cross-references this information with the data stored in our system and validates that they match. This is necessary because the network we rely on can fail and messages can get lost. Additionally, because Tutum gives users full access to their nodes, users can (accidentally) remove our system containers, including the events container that Tutum launches. Even if that happens, Tutum can still cross-reference its data and maintain an up-to-date list of containers and states.
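A minimal sketch of the comparison step, assuming both sides can be reduced to a mapping from container ID to state and that the node’s snapshot is treated as the source of truth. The cross_reference function and the MISSING marker are hypothetical names, not part of Tutum’s API.

```python
from typing import Dict, List, Tuple

MISSING = "MISSING"  # hypothetical marker for a container absent on one side


def cross_reference(stored: Dict[str, str],
                    observed: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Compare stored container states with a fresh snapshot from the node.

    Returns (container_id, stored_state, observed_state) for every container
    whose state differs, including containers known to only one side.
    """
    mismatches = []
    for cid in sorted(stored.keys() | observed.keys()):
        stored_state = stored.get(cid, MISSING)
        observed_state = observed.get(cid, MISSING)
        if stored_state != observed_state:
            mismatches.append((cid, stored_state, observed_state))
    return mismatches


# The "die" event for b was lost on the network; c was never reported.
stored = {"a": "RUNNING", "b": "RUNNING"}
observed = {"a": "RUNNING", "b": "STOPPED", "c": "RUNNING"}
print(cross_reference(stored, observed))
# [('b', 'RUNNING', 'STOPPED'), ('c', 'MISSING', 'RUNNING')]
```

Because the check compares full snapshots rather than replaying events, it still works when the events container itself has been removed from the node.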

Empowering our crew (aka our users)

This is just one of the many cases where Tutum does the worrying so our users don’t have to, and there are and will be others. At Tutum, our objective is to make complex tasks as simple and seamless as possible. At the end of the day, it’s the little things that keep your mission-critical applications running without a hiccup.

I really enjoy the creative problem solving involved with my daily work, and love spending extra time to simplify and improve Tutum for you. Thanks for your ongoing support while we build the best container platform!

Posted in General
3 comments on “From Tracking Submarines in the East China Sea to Tracking Containers in the Cloud”
  1. […] Now build & testing becomes an orchestration problem for what Tutum has already built a good solution. Tutum will deploy the tutum/builder image using the “emptiest node” deployment strategy for Github webhook calls that match your build settings. If there are nodes tagged with builder, only those nodes will be used for build executions. When the build container stops, Tutum is notified via our events container, which notifies the Tutum API for every docker event (more about this topic here). […]
