Wednesday, 24 June 2015

Force12 - past, present, future

Force12 is still a very new project, but this post explains why we're building it and where we see it going. At the moment it's just a demo, so we'll also describe how we plan to get to our final goal.

We want Force12 to be an open source container scheduler that provides microscaling. Microscaling to us means providing QoS (Quality of Service) for your containers, so that you run the right mix of containers at the right time. We see Force12 as specialising in QoS, which is why it will integrate with existing schedulers like Marathon that provide fault tolerance.
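To make "the right mix of containers" a bit more concrete, here's a toy sketch in Python. It isn't Force12 code and the names and pool size are invented; it just shows a fixed pool being split between a high-priority service and a low-priority service that soaks up spare capacity.

```python
# Illustrative only: a toy version of the microscaling idea, not Force12's code.
# A fixed pool of containers is shared between a high-priority and a low-priority
# service: the high-priority service gets what current demand requires and the
# low-priority service soaks up whatever is left over.

TOTAL_CONTAINERS = 10  # size of the shared pool (made-up number)


def desired_counts(high_priority_demand):
    """Split the pool so QoS is maintained for the high-priority service."""
    high = min(high_priority_demand, TOTAL_CONTAINERS)
    low = TOTAL_CONTAINERS - high
    return {"priority1": high, "priority2": low}


print(desired_counts(3))   # {'priority1': 3, 'priority2': 7}
print(desired_counts(12))  # {'priority1': 10, 'priority2': 0}
```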

Background - Figleaves.com

Anne and I first started working together at Figleaves. During our time there one of the big changes was the move from physical to virtual machines. We followed a typical approach of virtualising dev and test first before finally migrating production.

We worked closely with the Ops team to help plan these migrations to minimise system and developer downtime. Finally we'd see the results: stacks of obsolete servers arriving at the office before disposal, replaced by far fewer, more modern servers running VMware.

Another big influence on Force12 is that Figleaves is a very seasonal business. Autumn would mean deploying final changes, capacity testing and a freeze on major changes. Peak would begin around Black Friday and continue until Valentine's Day, with Christmas and the January sale in between. After Valentine's Day the change freeze would be lifted.

Since we were running our own physical servers, capacity planning had to be for the busiest hour during peak. This would usually be on the last shipping day before Christmas. Using public cloud and auto scaling could have helped a lot.

What’sMySize

After Figleaves I worked with Anne again on What'sMySize. What'sMySize provides personalised size guides for customers of fashion retailers. It's developed in Ruby on Rails and is hosted on AWS using Elastic Beanstalk and RDS Postgres. One of the reasons we chose this architecture is that it supports auto scaling.

However, auto scaling with VMs is difficult because it can take 4 – 5 minutes for an instance to boot and join an Elastic Beanstalk cluster. This means you need workarounds like scaling up early and scaling down slowly, or scaling on a time schedule and simply running fewer servers during quiet periods.
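As an illustration of the scheduled approach, here's what a couple of scheduled scaling actions might look like with boto3, assuming the environment is backed by an Auto Scaling group. The group name, times and sizes are invented; this is a sketch of the workaround, not our actual configuration.

```python
# Sketch of the "scale on a schedule" workaround for slow VM boot times.
# The Auto Scaling group name, times and capacities below are hypothetical.
import boto3

autoscaling = boto3.client("autoscaling")

# Add capacity well before the morning peak, because instances take minutes to join.
autoscaling.put_scheduled_update_group_action(
    AutoScalingGroupName="whatsmysize-prod-asg",   # hypothetical group name
    ScheduledActionName="scale-up-before-peak",
    Recurrence="0 7 * * *",                        # 07:00 UTC daily (cron syntax)
    MinSize=4,
    MaxSize=8,
    DesiredCapacity=4,
)

# Drop back to a smaller footprint overnight when traffic is quiet.
autoscaling.put_scheduled_update_group_action(
    AutoScalingGroupName="whatsmysize-prod-asg",
    ScheduledActionName="scale-down-overnight",
    Recurrence="0 22 * * *",                       # 22:00 UTC daily
    MinSize=1,
    MaxSize=8,
    DesiredCapacity=1,
)
```

It works, but it's coarse: you're paying for headroom hours before you need it, which is exactly the gap microscaling with containers is meant to close.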

Docker

Last year I’d started hearing about Docker and was following it from a distance. I’d also looked at the Docker support in Elastic Beanstalk. At the time it only supported 1 container per VM. This helped with deployment but that wasn’t a problem I had. The launch of ECS (EC2 Container Service) at re:Invent in November last year got me more interested.

In February at the Barcelona on Rails meet-up I saw a great presentation by Jorge Dias on using Docker with Rails. At this stage I wanted to try something out. So I dipped my toe in the water by moving my blog (MiddleMan / Ruby) from a Vagrant VM to using Docker and Compose.

This was followed by another post on running a Rails app on Docker in development. Lastly I did a Hello World example using ECS.

Force12.io

I'd been talking with Anne about Docker and ECS as I'd been writing these posts. We both thought that auto scaling was a great use case for containers because it's possible to scale up and scale down in close to real time. We also thought that auto scaling worked well with microservices architectures, as the time of day and current conditions affect the load on each service and their relative priority.

However nobody was really talking about auto scaling. There are good reasons for this: there's still lots to do on containers around security, networking, storage, fault tolerance, etc.

So we decided to build a demo on ECS, which we launched in May. It shows how quickly containers can be scaled against a random demand metric. There are posts describing what Force12 is, the Force12 demo and the ECS cluster design.
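In outline, the demo loop looks something like the sketch below (Python with boto3; the real demo isn't this code, and the cluster and service names are invented). It reads a randomised demand figure and asks ECS to converge each service on the matching container count.

```python
# A rough sketch of the shape of the demo loop, not the actual Force12 code:
# read a demand metric, work out how many containers each service should have,
# then ask ECS to converge on those counts. Names below are made up.
import random
import time

import boto3

ecs = boto3.client("ecs")
CLUSTER = "force12-demo"          # hypothetical cluster name
TOTAL = 10                        # fixed pool of containers

while True:
    demand = random.randint(0, TOTAL)   # the demo's randomised demand metric
    ecs.update_service(cluster=CLUSTER, service="priority1", desiredCount=demand)
    ecs.update_service(cluster=CLUSTER, service="priority2", desiredCount=TOTAL - demand)
    time.sleep(5)                       # re-evaluate every few seconds
```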

Scaling Up

We launched the demo very early and at the moment it's taking 3-4 seconds to start containers and under a second to stop them. We think we can reduce that to 1-2 seconds. However ECS is a new service and we've hit some problems, including a bug in the ECS Agent triggered by our constantly starting and stopping containers.

Once that bug is fixed we're going to scale up the demo to support more containers. We're also blogging about what we learn: here are posts on networking problems with CoreOS and how to set up New Relic monitoring on CoreOS.

More platforms

We started with ECS because we’re very familiar with AWS. This meant we could build the demo quickly. However we don’t see Force12 being tied to any specific platform. So we’re looking into running another demo on bare metal servers from Packet.net and getting Force12 running on Mesos / Marathon.

Other platforms such as Kubernetes will follow, but with a small team we need to prioritise and we think Marathon is a good match for Force12 to integrate with.

Real metrics

To switch from being a demo to a usable product, a key step is supporting real metrics. We intend to support a wide range, but the first two will be requests per second for load balancers and message queue length.
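On AWS, reading those two metrics could look roughly like this (a sketch using boto3 against CloudWatch for a classic ELB and SQS for queue length; Force12 won't be tied to these particular services and the function names are just ours):

```python
# Sketch of reading the first two real metrics on AWS. Illustrative only;
# this isn't necessarily how Force12 will gather metrics.
from datetime import datetime, timedelta

import boto3

cloudwatch = boto3.client("cloudwatch")
sqs = boto3.client("sqs")


def elb_requests_per_second(load_balancer_name):
    """Approximate req/s for a classic ELB from the last minute of CloudWatch data."""
    now = datetime.utcnow()
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/ELB",
        MetricName="RequestCount",
        Dimensions=[{"Name": "LoadBalancerName", "Value": load_balancer_name}],
        StartTime=now - timedelta(minutes=1),
        EndTime=now,
        Period=60,
        Statistics=["Sum"],
    )
    datapoints = stats["Datapoints"]
    return datapoints[0]["Sum"] / 60.0 if datapoints else 0.0


def queue_length(queue_url):
    """Current (approximate) number of messages waiting on an SQS queue."""
    attrs = sqs.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=["ApproximateNumberOfMessages"],
    )
    return int(attrs["Attributes"]["ApproximateNumberOfMessages"])
```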

Currently the Force12 demo uses a REST API hosted on Heroku and auto scaled using AdeptScale. Another key step will be moving this API into the demo, the “eat your own dog food” approach.

Open source version

Once we have Force12 using real metrics and supporting multiple platforms we’ll make the code open source and publish the container image so you can run it.

Until then we're going to continue to shepherd it as a closed source project. But we're looking to expand our core team. So if you're also excited about microscaling and want to help develop Force12, please get in touch!
