Anton Lindstrom (about, @twitter, @github)

Introduction to Apache Mesos

Published:

Apache Mesos has been an interest of mine for quite some time and the potential of the Mesos project and the ecosystem is huge. The project has gotten a lot of attention from bigger companies such as Amazon, eBay and Netflix. Mesos has also been sponsored and used by Twitter for quite some time.

Introduction and concepts

Mesos is a cluster manager, handling workloads in a distributed environment. For instance, to increase resource allocation in clusters, Mesos provides dynamic allocation of resources which means that you won't have to allocate one machine for Hadoop, one for the web server and another one for a database. All the resources that exists on the machines in the cluster will be put in a single pool which everything you would like to run on it can draw from. The database, Hadoop jobs and web server may run on any of the machines that have resources available.

A machine in the cluster may have multiple processes running and to keep them from interfering with each other, Mesos provides isolators that keep processes from disturbing each other. Examples of isolators are CGroupsMemIsolator and NetworkIsolator. A disk qouta isolator was introduced in Mesos 0.22.0.

There are also containerizers that implements isolators to add a container around a process. The most prominent examples are Cgroups and Docker. Docker may be used to isolate and run tasks and this is a model that simplifies the operability of a large scale infrastructure that runs in Docker.

Mesos has several components and the three most important bits are the masters, slaves and frameworks. The masters work to coordinate and manage the slave daemons. It's the master that decides how many resources it has to offer to the framework. The master knows how much resources it can offer by the amount that the slaves report to have freely available.

A master is a singular entity that coordinates the work, there are however multiple masters in a high availability set up. Only one master should be leader at any time and Zookeeper is being used for leader elections.

Mesos architecture

In the diagram above, Framework A wants to run a process (in Mesos called task) on the cluster. It first sends a request to the Master which in turn will get the resource information from the slaves. The master will then send an offer to Framework A and allow it to run on the slave that has resources available, in this case Slave 1.

In the architecture diagram, the following occurs:

  1. The slave sends the amount of resources it has available to the master.
  2. The master may then send out an offer with resources to the framework which it may accept or reject.
  3. If the framework accepts the offer, it sends a list of tasks it wants to run on the Mesos cluster.
  4. Tasks are being forwarded to the slave which runs the framework executor that will execute the tasks in the list received from the master.

The roles of masters and slaves can be summarized by the following:

  • Masters coordinate the work and gives it to the slave that has the resources to do it.
  • Slaves do the work and report how much more work they can do.

Frameworks

The frameworks is probably the component that you as a user will interface with the most. The framework is responsible for a few different things and there are a few different parts to the framework as well. Frameworks must consist of at least two things, a scheduler and an executor.

The scheduler is responsible for registering to the Mesos master and handles the offers. The executor is a program or command on the slaves which runs the tasks. If you have some constraints on your tasks that should be run, for instance you only want to run task X in datacenter Y, this is done in the framework. The scheduler in the framework knows the most about your task and may deny any offer it receives. To be able to know which slaves are in datacenter Y, Mesos slaves can be started with attributes. This is a key/value property list that are added to the slave. In the framework it's then possible to deny offers that doesn't have the datacenter Y attribute.

When the framework accepts an offer it sends the details of a task back to Mesos which will dispatch it to the slave.

There are several big frameworks at the time of this writing. The major ones being Apache Aurora, Apache Spark, Chronos and Mesosphere Marathon. Apache Aurora and Mesosphere Marathon aims both to be used with long-running services, for instance running your web server. For batch jobs, Apache Spark provides both standalone and Mesos powered cluster computing. Chronos does distributed cron.

There are many more frameworks out there, for instance to run Cassandra and Elasticsearch on Mesos. The high availability and self healing capabilities of Mesos makes it perfect to manage resources in the datacenter. See the framework as the application that runs on the Mesos cluster.

  • Framework schedulers take responsibility of offers to the Mesos master, the offers can be accepted or denied.
  • Framework executors run commands on the slaves.

Operating Apache Mesos

Most people reading this will probably come in contact with Mesos by operating it. Some of the people I've talked to have been a bit afraid of running and operating Mesos in their infrastructure. Most of the time this seems to be the lack of understanding about how the system works and not trusting the "magic" of how tasks are being run.

Key points that's the strength of the Mesos system:

  • Fault tolerance, when set up in high availability any individual part can break without bringing down the system.
  • Almost everything can be done from a web browser, no need to give everyone in the organization access to servers.
  • Using frameworks it's very easy to deploy applications.

I've got a very small cluster that I've been using for some time now. The entire time the cluster only went down partially one or two times. The downtime was caused by bugs in the Mesosphere Marathon framework which was due to a release that wasn't 100% mature and caused deploys to be locked. Currently I've upgraded the components successfully a few times and it's still stable and no downtime. As my cluster is very small, the failures will cause more impact (too many tasks and some won't get offers if a slave goes down).

The operability of Mesos itself is according to my experience good. It has been easy to set up and run, I spend less time on the infrastructure now than I did before I implemented it. Creating new tasks to try something out is very fast, thanks to Mesos, Marathon and Docker. The hardest part to operate is in my experience, Zookeeper. Since operating Mesos doesn't involve doing any particular operation directly to Zookeper yourself, it works very well.

As Twitter has been using Mesos for quite some time and are running a huge amount of their infrastructure on it, it's both battle-tested and proven stable in a demanding infrastructure.

Conclusions

Apache Mesos is a cluster manager that's robust, battle tested and designed for failures. The cluster will simplify and save money with dynamic allocations and you won't have to dedicate hardware to different systems. To provide security, Mesos will use isolators to keep processes from interfering with each other.

When a system is set up, it will need Zookeeper, Mesos master processes, slave processes and a framework. Masters coordinate the work and decides what resources to give the frameworks. The framework then decides if it wants to accept or reject an offer with resources. If the framework decides to accept the offer, it sends a list of tasks it wants to run to the master which then sends it through to the slave which will run the executor defined by the framework.

Related papers

  1. Autopilot: Automatic Data Center Management
  2. Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center
  3. Omega: flexible, scalable schedulers for large compute clusters