Monday, 14 December 2015

Microscaling on Azure - a walkthrough

Our instructions for Microscaling-in-a-Box assume you're running the containers to be scaled on your local machine, but of course you can spin them up and down on any machine you like, including a cloud-based virtual machine.

If you've been living under a rock you might not be aware that you can run Linux machines in Microsoft's Azure cloud.  Microsoft have generously given us some Azure credits, so we decided to give it a whirl and try out Microscaling-in-a-Box on the system.  Here's a walk-through using the new 'Portal' UI for Azure.

Installing Ubuntu machines on Azure

Pick Virtual Machines from the list of resources on the left, and click Add.  You can then pick your Linux - we're going for Ubuntu 14.04.



Azure walks you through setting up the basic settings like a machine name, and a username & password or key pair for an admin user.  In step 2 you'll pick the machine size - we've gone for the recommended DS1.



The wizard walks you through picking some additional features for your machine, which are all pretty straightforward.





Once you're happy with the settings you'll be taken back to the Azure portal dashboard while the machine is provisioned.  When the machine is ready you'll be able to see its public IP address, which you can SSH into using the credentials you set up earlier.

ssh <username>@<ip address>

Install Docker

Now you can follow the instructions to install Docker on a Linux machine.  Here's a summary of the steps you need for a new Ubuntu 14.04 installation like the one we just created.


sudo apt-key adv --keyserver hkp://p80.pool.sks-keyservers.net:80 --recv-keys 58118E89F3A912897C070ADBF76221572C52609D
Edit the list of apt repositories ...
sudo vi /etc/apt/sources.list.d/docker.list
...and add the following line: 
deb https://apt.dockerproject.org/repo ubuntu-trusty main

...so that apt knows where to get Docker from.  Just a few more things to install. 

sudo apt-get update

sudo apt-cache policy docker-engine

sudo apt-get install linux-image-extra-$(uname -r)

sudo apt-get install docker-engine
Now we're ready to start Docker:
sudo service docker start
Test that the installation is up and running:
sudo docker run hello-world

Run Microscaling-in-a-box

You'll need to log in to Microscaling-in-a-box so that you have a user ID. Skip straight to step 3.

You'll need to prefix this command with 'sudo' (or you can configure your user as a member of the 'docker' group, as described in Docker's instructions for Ubuntu).

sudo docker run -e "F12_USER_ID=<your ID>" -v "/var/run/docker.sock:/var/run/docker.sock:rw" -it force12io/force12:latest
Voilà - microscaling on Azure!

Not ready to send metrics

But wait... why do we keep seeing "Not ready to send metrics" in the Force12 logs? Our microscaling agent uses a web socket to send messages to the Force12 server, and it only allows one message to be outstanding at a time. When its timer pops every 500ms, it checks whether a previous message is still being sent, and if so it logs "Not ready to send metrics" and skips that tick.  (You can see the code for this on GitHub.)
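
For the curious, here's a minimal Go sketch of that pattern (the names are ours for illustration, not the actual Force12 code): a 500ms ticker fires, and if the previous send is still in flight the agent just logs and waits for the next tick.

package main

import (
    "log"
    "time"
)

// sendMetrics is a stand-in for the real websocket send; it blocks
// until the server has accepted the message.
func sendMetrics() {
    time.Sleep(2 * time.Second) // simulate a slow send
}

func main() {
    inFlight := make(chan struct{}, 1) // holds a token while a send is outstanding
    ticker := time.NewTicker(500 * time.Millisecond)
    defer ticker.Stop()

    for range ticker.C {
        select {
        case inFlight <- struct{}{}: // no send outstanding, start one
            go func() {
                defer func() { <-inFlight }() // release the token when done
                sendMetrics()
            }()
        default: // previous send still in progress
            log.Println("Not ready to send metrics")
        }
    }
}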

Run Force12 natively

Through experimentation we've discovered that performance improves significantly if we run the Force12 executable natively rather than inside a container.  

Clone the Force12 agent and build it as an executable for running on Linux.

GOOS=linux go build .

Copy the resulting force12 executable to the Azure machine if you didn't build it there, and then run it as follows.

sudo F12_USER_ID=<your user ID> ./force12

Our hunch is that there are issues with calling the Docker Remote API from within a container - if you've seen similar issues we'd love to hear about them!

Next steps

There are two things we'll be doing next:
  • We've got some suspicions about performance - we'll be investigating and reporting back soon! Edit: updated with information about running the Force12 executable natively rather than in a container
  • We're also experimenting with running microscaling on the new Windows Containers on Windows Server 2016 

Thursday, 12 November 2015

Microservices, queues and the Internet of Things


One of the characteristics of Things is that, well, there are a lot of them. If you’re building a system that communicates with a large number of Things, you’re going to be dealing with potentially very large numbers of messages. 

The advantages of persistent queues

The traditional way to deal with a message is to put it onto a queue as soon as it arrives. Today we have scalable systems like Amazon’s Kinesis and Apache Kafka that are designed to cope with streams of data, and handle large numbers of messages in convenient ways. 
  • They provide persistence - put the message on the queue and you don't have to worry about queue failures, as the system automatically writes the item to storage. The items you queue will be there for up to, say, 24 hours, so you can deal with them whenever your system is ready within that period. 
  • This persistence also makes it easy to handle failures. Your code maintains a pointer to the current item to be processed, and moves on to the next item only when the current one has been completely processed, so you know where to pick up again after a failure (see the sketch below). The effective length of the queue is the number of items that have yet to be processed - items actually remain in the system even after processing, and are only deleted when they time out after 24 hours (or whatever you've set the limit to).  
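
Here's a rough sketch of that consume loop in Go - the Queue interface is hypothetical, standing in for whichever Kinesis or Kafka client you use - showing how the checkpoint only advances once an item has been fully processed:

package consumer

import "log"

// Queue is a hypothetical stand-in for a Kinesis or Kafka consumer client.
type Queue interface {
    Read(offset int64) (msg []byte, next int64, ok bool) // read the item at offset
    Commit(offset int64)                                 // persist the checkpoint
}

// process handles one message from a Thing.
func process(msg []byte) error {
    // ... real work would go here ...
    return nil
}

// Consume drains the queue, advancing the checkpoint only after an item
// has been fully processed.
func Consume(q Queue, checkpoint int64) {
    for {
        msg, next, ok := q.Read(checkpoint)
        if !ok {
            return // caught up for now
        }
        if err := process(msg); err != nil {
            log.Printf("processing failed, retrying from %d: %v", checkpoint, err)
            continue // checkpoint stays put
        }
        checkpoint = next
        q.Commit(checkpoint) // persist progress only after successful processing
    }
}

After a crash you restart from the last committed checkpoint, so at worst the in-flight item gets processed twice - which is a good reason to keep your processing idempotent.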
Queuing builds elasticity into your system so that the rate of processing doesn’t have to match the highest rate at which Things want to send messages to your system. If your Things are, say, lightbulbs, there’s a good chance that a lot of them will be telling you about a change in state around the same time of day, simply because they’re all turned on when it gets dark. So at that time the peak message rate will be substantially higher than during the middle of the day. 

The queue needs to cope with messages arriving at peak rate, but your code doesn't necessarily have to process messages at that same rate. The delay that's acceptable between a message arriving and being processed defines the rate at which processing needs to happen - and the queue length can be a proxy measure for that delay.  
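
As a back-of-the-envelope calculation (all the numbers here are invented): if the backlog is N messages and the oldest must be processed within delay D, you need an aggregate processing rate of at least N/D.

package main

import (
    "fmt"
    "math"
    "time"
)

func main() {
    queueLength := 12000               // unprocessed messages right now
    acceptableDelay := 2 * time.Minute // oldest message must be handled within this
    perWorkerRate := 50.0              // messages/sec a single worker can handle

    requiredRate := float64(queueLength) / acceptableDelay.Seconds() // = 100 msg/s
    workers := int(math.Ceil(requiredRate / perWorkerRate))          // = 2 workers

    fmt.Printf("need %.0f msg/s, so run %d workers\n", requiredRate, workers)
}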

A typical microservices architecture for IoT

In a microservices architecture, you might have one or more services involved in the processing of the messages. If there’s more than one, the chances are that they are also connected with queues.  

We’ve seen quite a few companies with architectures that look something like this, with a "pod" of microservices that provide a pipeline of processing that needs to happen for each message. 



When the queue of unprocessed messages reaches a given size, it’s time to scale up the “pod” of containers that deal with messages on the queue so you can handle more in parallel. Likewise, you can scale down that pod when the queue length drops. Each pod can process messages independently and asynchronously. And while the purist in you might rebel at the idea, when you scale down you can simply terminate the pod without worrying about any messages it was part-way through processing - this gets treated like a failure, so those messages will still be available on the queue to be picked up by one of the remaining pods. You’re already architecting the microservices such that they cope in the event of each other's failures, right?
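
A minimal sketch of that scaling rule in Go (the thresholds and the PodScaler interface are invented for illustration, not tied to any particular scheduler):

package scaling

// PodScaler is a hypothetical hook into whatever scheduler runs the pods.
type PodScaler interface {
    Running() int     // pods currently running
    SetDesired(n int) // ask the scheduler for this many pods
}

const (
    highWater = 10000 // backlog at which we add a pod
    lowWater  = 1000  // backlog at which we remove one
    maxPods   = 20
    minPods   = 1
)

// Reconcile nudges the pod count one step towards matching the backlog.
// Killing a pod mid-message is fine: anything it hadn't finished stays on
// the queue and will be picked up by one of the remaining pods.
func Reconcile(queueLength int, s PodScaler) {
    n := s.Running()
    switch {
    case queueLength > highWater && n < maxPods:
        s.SetDesired(n + 1)
    case queueLength < lowWater && n > minPods:
        s.SetDesired(n - 1)
    }
}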

Scale for demand

The queue flattens out the demand for those batch processes, but can it flatten it entirely? In the case of the lightbulbs, it might not matter if you don’t process all the status update messages within, say, a second or two, but if your system doesn’t know that all the lights are on for an hour, it really doesn’t have a particularly useful view of the state of all your Things.  So even with a queue in place, you probably have some peaks and troughs in the amount of batch processing you need to do, and an “SLA” defining how quickly you need to process those messages. 

With Microscaling you can tie pod scaling to the SLA to make sure you’re meeting the needs of your business. And you can share more resources across different task types, getting better overall utilization from the system as a whole. 

  

Friday, 16 October 2015

Microscaling-in-a-Box

We’ve just launched our Microscaling-in-a-Box tool and open sourced the code.

You can try it out at https://app.force12.io, just log in and run a few quick Docker commands. It should take less than 5 minutes if you already have Docker installed.

What is Microscaling?

Microscaling is what we call scaling containers in real time in response to current demand. We use the term to differentiate it from traditional auto scaling where you are adding or removing capacity using Virtual Machines.

Real time scaling is possible because containers can be started in seconds or sub-seconds, whereas starting a Virtual Machine and joining it to a cluster takes minutes. That slow startup makes traditional auto scaling difficult and forces workarounds like scaling up early and scaling down slowly.

Microscaling-in-a-Box architecture


Our demo lets you experiment with microscaling using Docker on your local machine. We use the Docker Remote API as a simplistic single node scheduler.

The demo has 3 types of containers.

  • Force12 Agent - drives the scheduler, creates simulated randomized demand and reports the status of the demo to our API.
  • Priority 1 - a demonstration high priority app (e.g. a customer facing API).
  • Priority 2 - a demonstration low priority app (e.g. a worker process that can be interrupted).

In practice, for demo purposes the Priority 1 & 2 tasks both simply sleep.

Our tool lets you control the demo by configuring both the number of containers and parameters for the random demand.

Microscaling micro-services

In a real-world implementation there will be more than 2 types of container, each with its own relative priority. Each container type will be linked to real metrics, such as requests per second for a load balancer, or the length of a message queue.

Microscaling works well with the microservices approach. With multiple services, some perform higher-priority and more time-critical tasks than others. Different services also get busy at different times, depending on the business task they are performing.

At Force12 we’re focused on container scaling rather than scheduling containers. However containers need to be run on a cluster. This means a container scheduler is needed to provide functionality like fault tolerance by distributing containers across hosts.

There are many container schedulers out there and more being developed. So we’ve built demos of microscaling that integrate with the EC2 Container Service scheduler and the Marathon scheduler for Apache Mesos. We plan to support more, such as Kubernetes and Nomad, which was recently released by Hashicorp.

You can read more about microscaling and our Marathon integration in this interview we’ve done with Daniel Bryant from InfoQ.

Now here are some of the technical details for the demo.

Force12 Agent

Our Force12 Agent is written in Go and is packaged as a Docker image. We’re using Alpine Linux as its base image. This is a minimal distribution of Linux that is only 5 MB in size. Our Go client is compiled as a static binary and added to the image. This approach means our image is only 28 MB in size.

We really like the Alpine approach of a minimal Linux distribution with a good package manager to install extra packages as required. This is far better than using images based on traditional Linux distributions like Debian or CentOS, which produce huge images of 600 MB or more, most of which is never used.

The Docker Remote API we’re using to provide basic scheduling is the same API used by the Docker client. We access it by mounting the Docker socket running on the host within the Force12 container. This means that the demo app containers created are siblings of the Force12 container.
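
If you're wondering what driving the Docker Remote API over that mounted socket looks like, here's a rough Go sketch using just the standard library - the container name is a placeholder, and the real agent code is on GitHub:

package main

import (
    "context"
    "fmt"
    "net"
    "net/http"
)

func main() {
    // All requests go over the mounted Docker socket rather than TCP.
    client := &http.Client{
        Transport: &http.Transport{
            DialContext: func(ctx context.Context, network, addr string) (net.Conn, error) {
                return net.Dial("unix", "/var/run/docker.sock")
            },
        },
    }

    // POST /containers/{id}/start - the "docker" hostname is a dummy,
    // since the transport above always dials the socket.
    id := "priority1-demo" // placeholder container name
    resp, err := client.Post("http://docker/containers/"+id+"/start", "application/json", nil)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()
    fmt.Println("start:", resp.Status)
}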

Originally we implemented the demo using DIND (Docker in Docker), with Docker Compose linking the DIND container to our Force12 container.  In this setup our demo app containers were children of the DIND container. This worked, but DIND is mainly designed for testing Docker itself, so using it in this way isn’t recommended and can lead to data corruption.

See this blog post from Jérôme Petazzoni on why not to use DIND.

Priority 1 & Priority 2 containers

The demo containers are also based on the Alpine Linux image. They simply run a bash script with an infinite loop. This means the containers continue to run until they are stopped by the Force12 agent.

app.force12.io

 Server-side we built a Ruby on Rails application that handles receiving data from the Force12 agent, displaying the demo visualization and user login / signup.

Again we’re using Alpine Linux to keep our images small and our Rails app image is under 300 MB. This is pretty good as the image needs to include the Ruby interpreter and all the Rails gems. This blog post on minimal Ruby images with Alpine was very helpful in setting this up.

What’s Next

Now that we’ve open sourced our Force12 Agent we’ve got follow up releases planned with the integrations we’ve already done for demos with the EC2 Container Service and Marathon schedulers. There are also a lot more schedulers we’re planning to integrate with: Kubernetes, Nomad ….

If you follow us on Twitter (we’re @force12io) you’ll hear as soon as they are available. Or you can influence our roadmap by telling us which integrations you’d like us to prioritize next!







Friday, 25 September 2015

Force12 on Mesos

Today we’re launching a new version of our microscaling demo running on the Mesos platform. This adds to our existing demo for ECS (EC2 Container Service).

Microscaling is starting and stopping containers in real time based on current demand.  This is possible due to the much faster startup and shutdown times of containers compared with virtual or physical servers.

Mesos Demo

This post explains the architecture of the demo and how we’ve tuned it to launch containers 33% faster than on ECS.



CoreOS and Fleet

Our EC2 instances are running CoreOS, which is a Linux distribution that is optimized for running containers. We start a fleet cluster on these instances and use it to bootstrap our Mesos cluster.

We’ve released the setup code for this and you can run the cluster locally with Vagrant. I’ve written a guest post for Packet’s blog explaining it in more detail. You can also use the code to spin up a cluster on their high-end physical servers.

Mesos

We run all our Mesos components as Docker containers within CoreOS.  The Mesos Master instance runs the Mesos Master container and ZooKeeper. Each of the 3 “Agent” instances runs a Mesos Agent container.

Service discovery was the trickiest part of the setup.  Mesos has to use ZooKeeper for service discovery, but ZooKeeper doesn’t have a DNS interface. This is a problem since EC2 instances are assigned random IPs. So we use the Consul service discovery tool from Hashicorp, which does have a DNS interface.

Marathon

The Mesos Master instance also runs a Marathon container. Marathon is a powerful scheduler from Mesosphere that runs on top of Mesos.  Our Force12 scheduler integrates with Marathon via its REST API.

We see Force12 as a container scheduler that cooperates with other schedulers. So we focus on microscaling and using your servers more efficiently. We integrate with schedulers like Marathon that provide fault tolerance for your services.
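
Scaling an app through Marathon's REST API boils down to a PUT to /v2/apps/{appId} with a new instance count. A hedged Go sketch (the Marathon address and app ID are placeholders, not our real setup):

package main

import (
    "bytes"
    "fmt"
    "net/http"
)

// scaleApp asks Marathon to run `instances` copies of the app.
func scaleApp(marathonURL, appID string, instances int) error {
    body := []byte(fmt.Sprintf(`{"instances": %d}`, instances))
    req, err := http.NewRequest(http.MethodPut,
        fmt.Sprintf("%s/v2/apps/%s", marathonURL, appID), bytes.NewReader(body))
    if err != nil {
        return err
    }
    req.Header.Set("Content-Type", "application/json")
    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        return err
    }
    defer resp.Body.Close()
    fmt.Println("marathon replied:", resp.Status)
    return nil
}

func main() {
    // Placeholder endpoint and app ID - substitute your own.
    _ = scaleApp("http://marathon.example:8080", "priority1", 5)
}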

Force12 containers

Our Force12 scheduler runs within Marathon, and starts and stops Priority 1 (High priority) and Priority 2 (Low priority) containers based on the random (simulated) demand metric. This metric is set by the demand-rng container and stored in Consul using its Key / Value store.
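
Reading a value back out of Consul's KV store is just an HTTP GET; adding ?raw asks Consul to return the value without its usual base64 wrapping. A small Go sketch (the key name is illustrative, not the one the demo actually uses):

package main

import (
    "fmt"
    "io"
    "net/http"
)

func main() {
    // The key name below is made up for this example.
    resp, err := http.Get("http://localhost:8500/v1/kv/demand/priority1?raw")
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    value, err := io.ReadAll(resp.Body)
    if err != nil {
        panic(err)
    }
    fmt.Println("current demand:", string(value))
}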

Mesos / Marathon tuning

For the demo we want to push the limits of what is possible with microscaling - while keeping our cluster stable and without taking “heroic measures”. Here is what we’ve tuned so far.

Mesos Master – allocation interval

We reduced this command line parameter from 1 second to 100 ms to reduce the wait between allocations. Thanks to Daniel Bryant of OpenCredo for the tip on this. It’s worth noting this setting is only recommended on small Mesos clusters like ours.

https://open.mesosphere.com/reference/mesos-master/

Marathon – max tasks per offer

This is another command line parameter, which we increased from 1 to 3 tasks.  This gave a big speed increase because our containers are launched in parallel rather than sequentially. We set the number of tasks to 3 because our random demand metric changes by ±3 containers. “Task” is the Mesos terminology, but in our case each task is a single Docker container.

https://mesosphere.github.io/marathon/docs/command-line-flags.html

Private Registry

For the demo our Priority 1 & 2 containers use a small image based on the BusyBox Linux distribution. This image is only 1.1 MB but each time Mesos launches a task it does a docker pull to check the image is up to date.

This means we do around 45,000 docker pulls a day. So we’ve set up a private Docker repository for this image rather than pulling it from Quay.io, which we use for other images.

Restarting Mesos Agent containers

Our demo containers simply run a bash script in a loop as a work task. This means the load in the demo is from the microscaling rather than the containers themselves.
For the Mesos demo we found that the bottleneck on the cluster was CPU on the Mesos Agent servers. The CPU load would build up over time and after about 3 hours the cluster performance would degrade massively and stop tracking the demand metric.

As a workaround we restart a Mesos Agent container each hour across each of the 3 servers in turn. Mesos handles these restarts gracefully and all the demo containers can be run on just 2 cluster nodes.

Comparison with ECS

The Mesos demo can handle changes in demand of ±3 containers rather than the ±2 of our ECS demo. The demand metric also changes every 3 seconds instead of every 4. So the Mesos demo is 33% faster and launches many more containers each day than the ECS demo.

This was possible because using Mesos gives us more control over the cluster, allowing us to do more tuning. With ECS we can only call the API and wait for the ECS Agent to start or stop our containers.

The tradeoff is that setting up Mesos is much more complex, as we have to bootstrap the cluster ourselves. In contrast, with ECS you form a cluster simply by launching some EC2 instances that are running the ECS Agent.

However we think microscaling can provide major efficiency savings on both platforms. This is why we’re developing Force12 as a platform agnostic solution that will run on all the major container platforms.

Tuesday, 25 August 2015

Running Marathon and Mini Mesos as containers

Mini Mesos is a great project recently released by Container Solutions for testing Mesos frameworks. It sets up a full Mesos cluster running inside a single Docker container. The cluster runs Zookeeper, Mesos Master and a configurable number of Mesos Agents. It also uses DIND (Docker in Docker) so that Docker containers can be run within the cluster.

The initial use case for Mini Mesos is testing Mesos frameworks, so it’s designed to be run from Java as part of a JUnit test suite. However, the Docker image has also been extracted to the mesos-local project.

At Force12.io we’re currently porting our microscaling demo from EC2 Container Service to Mesos and it will run on top of the Marathon framework. We’ve recently released coreos-marathon, which builds a 3-node Marathon / Mesos cluster locally with Vagrant or deployed to physical servers at packet.net.

This is great for testing and production but it's quite heavy for development. So when we heard about Mini Mesos we were interested in integrating it into our development process. Here are the steps for setting this up.

Create Docker Machine

If you’re running OS X or Windows you first need to use Docker Machine to create a boot2docker VM for running the containers.

$ docker-machine create -d virtualbox --virtualbox-memory 4096 mini-mesos
$ eval $(docker-machine env mini-mesos)

Docker Compose

Next use Docker Compose to link the Marathon and Mini Mesos containers. The privileged flag needs to be true because we’re using DIND (Docker in Docker).

# docker-compose.yml

marathon:
  image: mesosphere/marathon:v0.9.0
  command: "--master zk://mesos:2181/mesos --zk zk://mesos:2181/marathon"
  links:
    - mesos
  ports:
    - "8080:8080"
mesos:
  env_file: .env
  privileged: true
  image: containersol/mesos-local
  ports:
    - "5050:5050"
  expose:
    - "2181"
  volumes:
    - "/sys/fs/cgroup:/sys/fs/cgroup:rw"

The .env file sets the necessary environment variables. In this case we’re running 2 Mesos Agents but this is configurable.

# .env

NUMBER_OF_SLAVES=2
MESOS_QUORUM=1
MESOS_ZK=zk://localhost:2181/mesos
MESOS_EXECUTOR_REGISTRATION_TIMEOUT=5mins
MESOS_CONTAINERIZERS=docker,mesos
MESOS_ISOLATOR=cgroups/cpu,cgroups/mem
MESOS_LOG_DIR=/var/log
SLAVE1_RESOURCES=ports(*):[9200-9200,9300-9300]
SLAVE2_RESOURCES=ports(*):[9201-9201,9301-9301]

Accessing Marathon & Mesos


You start the containers using Compose and use Machine to get the IP address of the boot2docker VM. The web UIs for Mesos Master and Marathon are available on this IP address.

$ docker-compose up
$ docker-machine ip mini-mesos
192.168.99.100


  • Mesos UI: http://192.168.99.100:5050
  • Marathon UI: http://192.168.99.100:8080

Conclusion

Thanks again to Container Solutions for developing and releasing Mini Mesos. I can see it being useful for lots of projects. In our case it means we can develop our custom containers using Docker and have local access to a full Marathon / Mesos stack.

Wednesday, 24 June 2015

Force12 - past, present, future

Force12 is still a very new project, but this post explains why we’re building it and where we see it going. At the moment it’s just a demo, so we'll also describe how we’re going to get to our final goal.

We want Force12 to be an open source container scheduler that provides microscaling. Microscaling to us means providing QoS (Quality of Service) for your containers. So you run the right mix of containers at the right time. We see Force12 as specialising in QoS. So Force12 will integrate with existing schedulers like Marathon that provide fault tolerance.

Background - Figleaves.com

Anne and I first started working together at Figleaves. During our time there one of the big changes was the move from physical to virtual machines. We followed a typical approach of virtualising dev and test first before finally migrating production.

We worked closely with the Ops team to help plan these migrations to minimise system and developer downtime. Finally we’d see the results: stacks of obsolete servers arriving at the office before disposal, replaced by far fewer, more modern servers running VMware.

Another big influence on Force12 is that Figleaves is a very seasonal business. The autumn would mean deploying final changes, capacity testing and a freeze on major changes. Peak would begin around Black Friday and continue until Valentine's Day, with Christmas and the January sale in between. After Valentine's Day the change freeze would be lifted.

Since we were running our own physical servers the capacity planning had to be for the busiest hour during peak. This would usually be on the last shipping day before Christmas. Using public cloud and auto scaling could have helped a lot.

What’sMySize

After Figleaves I worked with Anne again on What’sMySize. What’sMySize provides personalised size guides for customers of fashion retailers. It’s developed in Ruby on Rails and is hosted on AWS using Elastic Beanstalk and RDS Postgres. One of the reasons we chose this architecture is because it supports auto scaling.

However auto scaling with VMs is difficult because it can take 4 – 5 minutes for an instance to boot and be joined into an Elastic Beanstalk cluster. This means you need to implement workarounds like scaling up early and scaling down slowly. Or auto scaling based on time periods and just running fewer servers during quiet periods.

Docker

Last year I’d started hearing about Docker and was following it from a distance. I’d also looked at the Docker support in Elastic Beanstalk. At the time it only supported 1 container per VM. This helped with deployment but that wasn’t a problem I had. The launch of ECS (EC2 Container Service) at re:Invent in November last year got me more interested.

In February at the Barcelona on Rails meet-up I saw a great presentation by Jorge Dias on using Docker with Rails. At this stage I wanted to try something out. So I dipped my toe in the water by moving my blog (MiddleMan / Ruby) from a Vagrant VM to using Docker and Compose.

This was followed by another post on running a Rails app on Docker in development. Lastly I did a Hello World example using ECS.

Force12.io

I’d been talking with Anne about Docker and ECS as I’d been writing these posts. We both thought that auto scaling was a great use case for containers because it's possible to scale up and down in close to real time. We also thought that auto scaling worked well with micro-services architectures, as the time of day and current conditions affect the load on each service and its relative priority.

However nobody was really talking about auto scaling. There are good reasons for this because there is still lots to do on containers for security, networking, storage, fault tolerance, etc.

So we decided to build a demo on ECS, which we launched in May. It shows how quickly containers can be scaled against a random demand metric. There are posts describing what Force12 is, the Force12 demo and the ECS cluster design.

Scaling Up

We launched the demo very early and at the moment it's taking 3-4 seconds to start containers and under a second to stop them. We think we can reduce that to 1-2 seconds. However ECS is a new service and we’ve hit some problems, including a bug with the ECS Agent because we’re constantly starting and stopping containers.

Once that bug is fixed we’re going to scale up the demo to support more containers. We’re also blogging about what we learn, here are posts on networking problems with CoreOS and how to setup New Relic monitoring on CoreOS.

More platforms

We started with ECS because we’re very familiar with AWS. This meant we could build the demo quickly. However we don’t see Force12 being tied to any specific platform. So we’re looking into running another demo on bare metal servers from Packet.net and getting Force12 running on Mesos / Marathon.

Other platforms such as Kubernetes will follow. But with a small team we need to prioritise and we think Marathon is a good match to integrate with Force12.

Real metrics

To switch from being a demo to a usable product a key step is supporting real metrics. We intend to support a wide range but the first 2 metrics will be requests per second for load balancers and message queue length.

Currently the Force12 demo uses a REST API hosted on Heroku and auto scaled using AdeptScale. Another key step will be moving this API into the demo, the “eat your own dog food” approach.

Open source version

Once we have Force12 using real metrics and supporting multiple platforms we’ll make the code open source and publish the container image so you can run it.

Until then we’re going to continue to shepherd it as a closed source project. But we’re looking to expand our core team. So if you’re also excited about microscaling and want to help develop Force12 please get in touch!

Monday, 8 June 2015

CoreOS networking problems on ECS

This post describes some problems we’re having with our microscaling demo on ECS (EC2 Container Service). It describes why we switched from Amazon Linux to CoreOS and why today we’ve switched back to Amazon Linux.

The problems we’ve been having with CoreOS are around networking. We still really like CoreOS and we’d like to resolve the problems. So any feedback or fix suggestions would be greatly appreciated. Please add them in the comments.

Move from Amazon Linux to CoreOS

At the moment Amazon Linux and CoreOS are the 2 choices of operating system for running container instances in an ECS cluster. We started out using Amazon Linux and were happy with it until a new AMI was released with v1.1 of the ECS Agent and Docker 1.6.

This was just before we launched force12.io and we saw a massive slowdown in responsiveness of the demo. At this point we switched to CoreOS stable as it gave us more control and allowed us to continue using v1.0 of the ECS Agent and Docker 1.5.

Fixing issue with v1.1 Agent

After we’d launched we worked with AWS to resolve the performance problems with the v1.1 agent. It turned out the problem was at our end because we weren’t trapping the SIGTERM signal.

The 1.1 agent stops containers in a more correct manner but because we weren’t trapping the correct signals our demo containers were hitting a 30 second timeout of the docker stop command. So our containers were being force killed (status 137). We fixed our demo containers and upgraded to the v1.1 agent on CoreOS stable.

Trying CoreOS beta for Docker 1.6

Another benefit of the CoreOS switch was we got to try an operating system optimised for containers. Overall I like the CoreOS approach, but there is a steep learning curve due to systemd and we've hit several problems with it being a new distro.

One thing we’ve been keen to try is the upgrade from Docker 1.5 to 1.6. This moved from the alpha to the beta channel last week. At that point we switched our staging rig to CoreOS beta to see the effect on performance and stability.

DHCP with multiple interfaces

On CoreOS beta we continued to see networking problems. We would see the message below and see network connectivity problems from our containers. These could be the ECS Agent becoming disconnected, the Force12 scheduler being unable to call the ECS API or our demand randomizer being unable to call our REST API on Heroku.

kernel: IPv6: ADDRCONF(NETDEV_UP): veth01426c4: link is not ready

I looked for networking issues on the CoreOS bugs repository on GitHub. There wasn’t a direct match but this issue about DHCP with multiple interfaces seemed related.

Since our problems were with networking I decided to simplify our configuration by not assigning public IPs to our container interfaces. This turned out to be quite involved and meant using a lot of VPC features I hadn’t used before. Our new VPC setup has public and private subnets.  All the ECS container instances run in the private subnet. The public subnet has a NAT instance, which provides outbound connectivity to the ECS instances for applying updates. This guide was very useful in setting it up.

Back to Amazon Linux and turning off checkpointing

Removing the public interfaces seemed to help but we continued to have connectivity problems. So we decided to move back to Amazon Linux since our problems with the v1.1 agent are fixed and we’re seeing good performance with Docker 1.6. This worked well on staging but we spotted an issue when a container instance had 0 containers running. There was a pause of around 30 seconds and the message “Saving state!” appears in the logs.

This is a feature called checkpointing, which saves the state of an ECS agent so it can be restored if an instance crashes. However for our use case we don’t need it so we turned it off by setting the ECS_CHECKPOINT environment variable to false.

Current Status

Now we’re back on Amazon Linux we’re seeing good performance. We’ve also been able to turn up the demand randomizer to make it a bit spikier. This is part of our continued plans to scale up the demo as we learn more about ECS and improve the Force12 scheduler. However we’re still seeing periods of 30 to 60 seconds where the ECS Agent becomes disconnected. We’re continuing to investigate this, as it’s the biggest stability problem with the demo at the moment.

Thursday, 4 June 2015

Force12 demo architecture

Force12.io is a demo of microscaling containers using ECS (EC2 Container Service) from AWS. It shows containers being rapidly stopped and started based on a randomized demand metric. To use a networking analogy Force12 is providing QoS (Quality of Service) for containers.

In a router QoS will prioritise voice data over downloads because the VOIP traffic is more demand sensitive. With containers a public API used by a mobile app would be more demand sensitive and higher priority than a worker process performing a background task.

Our previous post described the demo in more detail. This post is on the design of the ECS cluster. A later post will cover the wider architecture, which includes a REST API hosted on Heroku and a front end built as an Angular app.

Generally building the demo has gone well considering how much new technology we’re using. However there have been some problems and changes of direction and they are described in this post.

Force12 ECS cluster

EC2 Auto Scaling Group

The cluster consists of 3 m3.medium instances running in an Auto Scaling Group. We use m3.medium because it’s the smallest instance type that isn’t throttled like the t2 series. We use spot instances to keep the costs down.

CoreOS

In ECS terminology a Container Instance is a VM that is a node in the ECS cluster. For these VMs there are currently 2 choices of operating system: Amazon Linux or CoreOS.

We originally started using Amazon Linux but then switched to CoreOS. Mainly this was because I was interested to try an operating system optimized for containers. I’ve been impressed with CoreOS and especially their documentation. These pages on ECS, EC2 and Vagrant have all been essential. At the moment we’re running CoreOS stable. Now that Docker 1.6 is in the beta channel we’ll be switching to that.

Quay.io

For the demo we decided early on that we weren’t going to open source the project when we launched. Instead we wanted to launch a demo as early as possible to see what interest there was in the idea. For that reason we’re using a private repository from Quay.io to host our containers.

This did cause some problems as initially we set up our ECS cluster in AWS’s EU (Ireland) region. Mainly we do projects for European clients and so we prefer to keep our data in the EU and close to their customers.

Since we don’t have any customer data on this project we moved everything to the US East region. Quay also seems to be hosted in US East and we got a noticeable increase in performance after the move. So we think the choice and location of your Docker repository is an important one. Since container launch speed is important for us we’re thinking about running our own repository bringing it even closer to our ECS cluster.

System Containers - ECS Agent & New Relic

The demo shows the containers running on our ECS cluster but it doesn’t show the 2 extra system containers installed on each node.

The ECS Agent is written in Go and it calls Docker to start and stop containers on behalf of the ECS Scheduler.

The New Relic container provides their server monitoring plus some extra metrics they’ve developed for Docker. The Docker socket running on the CoreOS VM is mounted in the New Relic container so it can be monitored.

I’ve written a post on how to install these containers as services when booting a container instance running CoreOS into an ECS cluster.

Force12 scheduler

The Force12 scheduler is written in Go. It polls a DynamoDB table to get the random demand metric. When the demand changes it stops containers to create capacity and then starts containers to match the desired quantity.

To stop and start these containers the scheduler calls the ECS API. In ECS terminology these are actually called tasks. Each task can have multiple containers but in our case each task has a single container.

The scheduler is written in Go rather than Ruby because we felt when we release it we’ll need the additional speed. The other reason is it’s my business partner Anne who does the scheduler development. She has a strong C background and so is much happier working with Go than Ruby.

Demand randomizer

The demand-RNG container is developed in Ruby. Its only responsibility is setting the random demand metric and updating the DynamoDB table. Both the force12 and demand-RNG containers run as ECS services. This means if a container dies it is replaced automatically.

Demo containers – priority1 & priority2

Our original idea for the demo was much more complex. We wanted the demo containers to be constantly generating a series of random visualizations. As we got into building the demo we realised this was overly complex.

What we really wanted to show is that one of the properties of containers is ideal for autoscaling. Containers can be stopped and started in close to real time whereas with Virtual Machines this takes minutes. So the demo containers aren’t actually doing anything. They are based on the minimal busybox image and run sleeps of 1 second in an infinite loop.

However we did have a problem with the demo containers. We saw a big performance drop when we upgraded the ECS Agent from v1.0 to v1.1. The newer agent stops containers in a more correct way but this was causing timeouts when the agent calls the docker stop command.

The problem was we weren’t trapping the SIGTERM signal. This meant our containers were being force killed (Docker status 137) instead of stopping cleanly (Docker status 0). We got some great support from the ECS development team on GitHub who helped us find this.
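
The demo containers are shell scripts, but the same idea in Go looks like this: catch SIGTERM, tidy up, and exit with status 0 so docker stop never has to escalate to a force kill (status 137).

package main

import (
    "fmt"
    "os"
    "os/signal"
    "syscall"
    "time"
)

func main() {
    // Ask to be notified when docker stop sends SIGTERM.
    stop := make(chan os.Signal, 1)
    signal.Notify(stop, syscall.SIGTERM, syscall.SIGINT)

    for {
        select {
        case <-stop:
            fmt.Println("caught SIGTERM, exiting cleanly")
            os.Exit(0) // clean exit: Docker reports status 0, not 137
        case <-time.After(1 * time.Second):
            // simulated unit of work, like the demo's 1-second sleep loop
        }
    }
}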

Current Status

We’re still working on some issues with the current demo. The cluster mainly tracks the demand metric but we’re still seeing 30 second periods where the cluster stops responding.

A possible cause of this is that the ECS Agent stops containers but waits 3 hours before removing them. This is useful for debugging purposes, but in our case it means up to 700 containers build up on each instance after 3 hours. So it could be a “garbage collection” problem when the stopped containers are being removed. The ECS team are working on an enhancement for the stopped containers issue and we’ve given our feedback on what works for us.

There are also several fixes and enhancements that we want to make to the front end. The front-end developers we usually work with are busy on other projects, and we didn’t want to delay the launch, so I did most of the front-end development, although I’m much happier working on the back-end and infrastructure parts of the stack.

What’s Next?

We’ve had some great reaction to the demo so we want to keep on showing that autoscaling is a great use case for containers. We’re also trying to do blog driven development as we make changes. For us getting people talking about micro scaling with containers is just as important as developing Force12.

For turning Force12 from a demo into an open source product the next step is to start using real demand metrics. At the moment our REST API is a Sinatra Ruby app hosted on Heroku and autoscaled using AdeptScale. We’re going to move that in house and host the API on the ECS cluster and auto scale it using CloudWatch metrics.

Other areas we’re looking at are scaling up the demo and moving it to other platforms such as Kubernetes or Mesos. We chose to develop the demo on ECS and the AWS platform for 2 reasons. We’re very familiar with AWS and we thought using AWS was the quickest way we could launch the demo. However the “batteries included, but removable” design approach is something we support and long term we don’t see Force12 being tied to a specific platform.

Tuesday, 19 May 2015

Monitoring Docker with New Relic on CoreOS and ECS

This post is on how to use New Relic server monitoring on ECS (EC2 Container Service) using CoreOS.

Update - 9 November 2015

This post is out of date as it was based on an early version of New Relic's monitoring for Docker. You can find an updated systemd unit file on GitHub (lorieri/coreos-newrelic). This worked for me with CoreOS stable 766.4.


Last Friday we launched our Force12 demo of container autoscaling. This is the first of a series of posts on how we're using ECS and what we've learnt from building the demo.

Docker Metrics in New Relic server monitoring


We're running our 3 EC2 instances in an Auto Scaling Group. The Launch Configuration for the EC2 instance installs 2 Docker containers as services. This reflects the architecture of CoreOS which is to keep the OS as minimal as possible and install extra components as containers.
  • ECS Agent - controls Docker on the EC2 instance and communicates with the ECS API
  • New Relic System Monitor - the New Relic sysmond service deployed as a Docker container
At the end of the post is our full cloud-config configuration. It's worth noting that the syntax has to be exact - issues like trailing whitespace will prevent the services being installed.

CoreOS with Quay.io Private Repository

The starting point for our CoreOS setup was their ECS example configuration. We're using private repositories from Quay.io so we configure this by adding the environment variables ECS_ENGINE_AUTH_TYPE and ECS_ENGINE_AUTH_DATA. To get the auth data run the docker login command on your local machine and use the data created in the .dockercfg file.

$ docker login quay.io

# .dockercfg
{"quay.io":{"auth":"***YOUR_AUTH_DATA***","email":"email@example.com"}}

New Relic

The extra Docker metrics for New Relic server monitoring are currently in beta. Initially when we installed the server monitoring it was working but there were no Docker metrics. This was fixed by adding this parameter: -v /var/run/docker.sock:/var/run/docker.sock

This mounts the Docker socket running on the host in the New Relic container so it can monitor it. This forum post was very useful in getting this working and this issue is being worked on at New Relic.

cloud-config

Here is the full cloud-config file that runs when each EC2 instance is launched. To recap make sure you're setting the following:
  • YOUR_ECS_CLUSTER - make sure this matches your ECS cluster name.
  • YOUR_AUTH_DATA - set this if you're using private repositories.
  • YOUR_NEWRELIC_LICENSE_KEY - this should be your New Relic license key without quotes.
#cloud-config

coreos:
 units:
   -
     name: amazon-ecs-agent.service
     command: start
     runtime: true
     content: |
       [Unit]
       Description=Amazon ECS Agent
       After=docker.service
       Requires=docker.service

       [Service]
       Environment=ECS_CLUSTER=YOUR_ECS_CLUSTER
       Environment=ECS_LOGLEVEL=info
       Environment=ECS_ENGINE_AUTH_TYPE=dockercfg
       Environment=ECS_ENGINE_AUTH_DATA=YOUR_AUTH_DATA
       ExecStartPre=-/usr/bin/docker kill ecs-agent
       ExecStartPre=-/usr/bin/docker rm ecs-agent
       ExecStartPre=/usr/bin/docker pull amazon/amazon-ecs-agent
       ExecStart=/usr/bin/docker run --name ecs-agent --env=ECS_CLUSTER=${ECS_CLUSTER} --env=ECS_LOGLEVEL=${ECS_LOGLEVEL} --env=ECS_ENGINE_AUTH_TYPE --env=ECS_ENGINE_AUTH_DATA --publish=127.0.0.1:51678:51678 --volume=/var/run/docker.sock:/var/run/docker.sock amazon/amazon-ecs-agent

       ExecStop=/usr/bin/docker stop ecs-agent
   -
      name: newrelic-system-monitor.service
      command: start
      runtime: true
      content: |
        [Unit]
        Description=New Relic System Monitor (nrsysmond)
        After=amazon-ecs-agent.service
        Requires=docker.service

        [Service]
        TimeoutStartSec=10m
        ExecStartPre=-/usr/bin/docker kill nrsysmond
        ExecStartPre=-/usr/bin/docker rm nrsysmond
        ExecStartPre=/usr/bin/docker pull newrelic/nrsysmond:latest
        ExecStart=/usr/bin/docker run --name nrsysmond --rm \
          -v /proc:/proc -v /sys:/sys -v /dev:/dev -v /var/run/docker.sock:/var/run/docker.sock --privileged=true --net=host \
          -e NRSYSMOND_license_key=YOUR_NEWRELIC_LICENSE_KEY \
          -e NRSYSMOND_loglevel=info \
          -e NRSYSMOND_hostname=%H \
          newrelic/nrsysmond:latest
        ExecStop=/usr/bin/docker stop -t 30 nrsysmond

Saturday, 9 May 2015

About the Force12.io demo

The Force12 demo shows Linux containers being automatically created and destroyed to handle unpredictable demand in real time.

 It's a live, real time view of containers being created and destroyed to meet demand on our AWS cluster.

Container Demand = red, Containers = dark blue

It’s deliberately simple.

AWS Setup

  • There's a fixed pool of resources (3 EC2 VMs). 
  • 2 generic container types: Priority 1 (dark blue) and Priority 2 (lilac). 
  • 1 Demand_RNG container, which randomly generates demand for Priority1 containers (demand in red). 
  • 1 Force12 scheduler, which monitors demand and starts and stops Priority1 and Priority2 containers. 
The first bar chart shows what’s happening right now (red demand vs blue Priority1 containers).

The second chart shows historical snapshots of the last few seconds (red demand, dark blue P1 containers and lilac P2 containers).

Below the charts you can see what’s happening now on each container instance.

Goal

Force12’s job is to

  • Meet demand by starting or stopping Priority1 containers. 
  • Use any leftover resources for Priority2 containers. 

Success is when the running Priority1 containers meet the demand AND the total number of running Priority1 + Priority2 containers = 9 (maximum utilisation of fixed resources).

So if there are 4 Priority1 containers running there should be 5 Priority2 containers running. If the demand for Priority1 services increases by 1, Force12 will stop a Priority2 container and start a new Priority1 container ASAP.
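
That balancing rule is just arithmetic - Priority2 gets whatever slots Priority1 doesn't need. A one-function sketch in Go (the pool size of 9 comes from the demo's fixed resources):

package demo

// desiredP2 returns how many Priority2 containers should run, given the
// current Priority1 demand and a fixed pool of 9 container slots.
func desiredP2(p1Demand int) int {
    const totalSlots = 9
    if p1Demand > totalSlots {
        return 0 // demand exceeds the pool; every slot goes to Priority1
    }
    return totalSlots - p1Demand
}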

The Results

As you can see from the demo, on a fairly untuned environment you can get container instantiation speeds of around 3 seconds (with some roughly normal spread around that). Shutdown is much faster. This untuned instantiation time is far higher than the sub-second startup times we know are possible in a more tuned environment.

To improve stability and speed we’ll evolve this basic set-up and we’ll blog about what we do and what effect it has.

Known Issues with the Demo

  • Demo container start time is 3-4 seconds. We want to achieve sub second speeds without heroic measures, i.e. with a standard cloud service setup.
  • The container instances can become unresponsive to starts and stops and they can take 30 seconds to recover (but they do recover). This seems to be related to the container networking. We're looking into this.

What are Containers?

Where Did Containers Come From?

In the beginning there were monolithic physical servers. They each ran a single operating system like Linux or Windows.

Then we devised virtual machines and we could run multiple guest operating systems on a single host server. This gave us huge flexibility - the ability to use physical servers more effectively (server density and multi tenancy) and change their use comparatively rapidly (in hours or even minutes).
Physical Server with 2 VMs

Finally, products like Vagrant, Chef and Puppet gave us the ability to script the creation of VMs. That made it much easier to get consistency across development, test and production environments.

When combined with IaaS, VMs became an amazingly effective way to get more from physical infrastructure and cut hosting costs.

Containers Are a Powerful New Take on VM Concepts

How are containers different to VMs?

VMs are great, but when you’re running several guest OSs on a host OS you’re duplicating a lot of functionality - multiple full network stacks for instance. That’s a waste.

Physical Server with 2 Containers

Containers are not VMs - but they kind of act like them. Containers are processes that run on your host OS, but behave conceptually much like a very lightweight VM. They focus on providing the separation and configurability of a VM with minimal duplication between container and host OS. This means you can fit more containers than VMs on a physical server (lower costs) and you get much faster launch speeds (a container could potentially be instantiated fast enough to handle a single network packet).

Each container could run several applications (a "fat" container) but they often just run a single application ("thin").

Containers are managed on your host OS using a container engine application (Docker for example). Like a hypervisor, a container engine routes network traffic to individual containers and divides up and controls access to shared Host resources like memory and disk. Docker also cleverly provisions a container’s contents via preconfigured images and scripts (in much the same way Vagrant allows you to script VM creation).

Each container then emulates a cut down “guest” that supports a restricted set of applications. For example, a web server or a database.

In order to make containers even faster, they can be hosted on a stripped down open source Linux variant like CoreOS or Snappy Ubuntu, but this isn’t necessary; many ordinary Linux variants support containers out of the box.

Are there Windows Containers?

Containers were originally developed for Linux, but Microsoft are in the process of developing similar functionality for Windows.

Why are Containers Better than VMs or Bare Metal?

Containers have most of the advantages of VMs: flexibility and scriptability. However, they can achieve higher server densities and faster instantiation than VMs because of the reduction in duplication between the host OS and guest OSes.

It is the extreme speed of instantiation and destruction of containers that we’re exploiting in Force12.

How are Containers Worse than VMs?

You can’t mix different OSes on the same host with containers (for example, you couldn’t have a Windows container on a Linux host). That potentially reduces the flexibility, although this often isn’t much of an issue in real world scenarios outside of dev and test.

You can’t give a container its own IP address. The container inherits the IP address of the host and you can only distinguish individual containers using port numbers (although there are some open source projects like Calico that can indeed allow full IPv4 or IPv6 addressing of containers).

Containers running on the host are just processes. They are not as sandboxed in terms of disk, memory, CPU and so on as a VM would be. This currently makes them less secure in a multi-tenant environment. Again, that’s being worked on.

Friday, 8 May 2015

What is Force12?

Force12 Dynamic Container Autoscaling

Force12 monitors demand on a cluster and then starts and stops containers in real time to repurpose your cluster to handle that demand.

Force12 is designed to optimize the use of an existing cluster in realtime without manual intervention.

VMs cannot be scaled in real time and neither can physical machines but containers can be started or stopped at sub second speeds. This potentially allows a cluster to adapt itself in real time, producing the optimal configuration to meet current demand.

For example, in response to a traffic peak, worker services performing low urgency tasks can be stopped and web services started. When the traffic peak ends the cluster can reconfigure itself to kill off web services and create more worker instances again.




Force12 optimises the use of the cluster resources available right now - existing VMs or physical servers.

Router Analogy

The Force12 approach is analogous to the way that a network router dynamically optimises the use of a physical network. A router is limited by the capacity of the lines physically connected to it - adding additional capacity is a physical process and takes a long time.

Network routers therefore make decisions in real time about how to best use their current local capacity. They do this by deciding which packets will be prioritized on a particular line based on the packet's priority (SLA). For example, at times of high bandwidth usage a router might automatically prioritize VOIP traffic over web browsing or file transfer.

Force12 can make similar instant judgements on service prioritisation within your cluster because, using containers, it can start and stop services in near real-time.

Network routers can only make very simplistic prioritization judgments because they have limited time and CPU and they act at a per-packet level. Force12 has the capacity to make far more sophisticated judgements, but to start with it won't - simple judgements are proven to work in a network, so let's start there and worry about greater sophistication later.

The Force12 demo is a bare-bones implementation that recognises only 1 demand type: randomised demand for a priority 1 service. When this fluctuating priority 1 (P1) demand has been met, then a priority 2 (P2) service will utilize whatever cluster resource remains.

The demo demand type example has been chosen purely for simplicity.

Force12

Force12 can be configured to actively monitor real time system analytics (queue lengths, load balancer requests etc) and then instantly reconfigure your systems to respond to current conditions.

Force12 will allow your cluster to adapt in an organic, real time fashion to handle whatever unpredictable events the outside world throws at it, without you having to anticipate those events.

Only the incredible speed of container startup and shutdown makes this possible.