Anton Lindstrom (about, @twitter, @github)

Moving onto Mesos


I used Heroku for this site for a while but decided to build something to host it myself. Some apps I use weren't fast enough on Heroku, so I wanted to try the stability and features of Mesos and Marathon.

What I wanted from the platform:

  1. Fault tolerant
  2. Easy to deploy
  3. Fast
  4. Minimum maintenance
  5. Isolated
  6. Work with a lot of different apps

Mesos is something I've been eyeing for quite some time, and I decided it was time to use it to host something I really care about. Mesos also satisfies goals 1, 3, 4, 5 (with Docker isolators) and 6 (also with Docker).

Marathon basically meets all the goals with the help of Mesos.

There's a lot of information about how to run Mesos and Marathon, so I'm not going to go into much detail about the installation beyond the pitfalls I have discovered so far with my three-node cluster.

The pipeline to deploy this site is the following:

  1. Add a new file into git
  2. Make sure I have a Dockerfile and a file called .build.json
  3. Push it to my private git repository
  4. A git hook takes care of building the Docker image
  5. The hook also pushes the image to a private Docker repository
  6. The hook then pushes the app definition to Marathon
  7. Marathon pulls the image and runs the instance
  8. Bamboo listens to Marathon events
  9. Everything is staged and ready to serve requests

The only thing I built myself for this is a git hook that reads a file, .build.json, and interacts with Docker and Marathon. It builds the image from the provided Dockerfile, pushes the image to the private Docker repository and then sends a POST to Marathon to create or update the application.
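To make the hook's job concrete, here is a minimal sketch in Python of the three steps it performs. This is not the actual hook: the fields read from .build.json (name, tag, instances, cpus, mem), the function names, and the registry and Marathon addresses are all assumptions made for illustration. The app definition follows Marathon's Docker container format, and the sketch only shows the create call (POST /v2/apps); updating an existing app would typically go through PUT /v2/apps/{id} instead.

```python
import json
import subprocess
import urllib.request


def image_name(registry, config):
    """Full image reference, e.g. registry.example:5000/site:latest."""
    # "name" and "tag" are hypothetical .build.json fields.
    return f"{registry}/{config['name']}:{config.get('tag', 'latest')}"


def docker_commands(image, context="."):
    """The two Docker CLI invocations the hook runs, in order."""
    return [
        ["docker", "build", "-t", image, context],
        ["docker", "push", image],
    ]


def marathon_app(config, image):
    """Marathon app definition that runs the image as a Docker container."""
    # The resource defaults below are illustrative, not recommendations.
    return {
        "id": "/" + config["name"],
        "instances": config.get("instances", 1),
        "cpus": config.get("cpus", 0.1),
        "mem": config.get("mem", 64),
        "container": {"type": "DOCKER", "docker": {"image": image}},
    }


def deploy(config_path, registry, marathon_url):
    """Build and push the image, then submit the app definition to Marathon."""
    with open(config_path) as f:
        config = json.load(f)
    image = image_name(registry, config)
    for cmd in docker_commands(image):
        subprocess.run(cmd, check=True)  # stop on the first failing step
    request = urllib.request.Request(
        marathon_url + "/v2/apps",
        data=json.dumps(marathon_app(config, image)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

A post-receive hook could then call something like deploy(".build.json", "registry.example:5000", "http://marathon.example:8080") from the checked-out working tree. Keeping the command and payload construction in small pure functions makes the hook easy to test without a running Docker daemon or Marathon.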

Only two things have been problematic for me so far. First, I didn't read enough about Zookeeper, so I forgot to purge old logs, which caused some interesting failures. Second, I've been a bit too quick to upgrade Mesos: when I moved to Mesos 0.20.0, Marathon and Bamboo did not support it yet, so I had to run experimental branches for a while, which has worked so-so.
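For the log purging, one option is Zookeeper's built-in autopurge (available since 3.4.0), which is disabled by default. The fragment below is a sketch of what could go into zoo.cfg; the values are assumptions for illustration, not the settings used on this cluster.

```ini
# Keep the 3 most recent snapshots and their transaction logs (3 is the minimum).
autopurge.snapRetainCount=3
# Run the purge task every 24 hours; 0 (the default) disables it.
autopurge.purgeInterval=24
```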

Another piece of advice is to run the master cluster (Mesos masters and Zookeeper) on machines separate from the slaves, so that it won't interfere with the running services.

So far I am really happy with the setup and will continue to run it in an experimental phase until I feel comfortable running it in something closer to a production environment and trust its stability.

The pipeline works really well, and deploying new applications is almost as fast as on Heroku. If I were to add some more caching in the build steps, I'm sure I could speed this up a bit more.