One of the characteristics of Things is that, well, there are a lot of them. If you’re building a system that communicates with a large number of Things, you’re going to be dealing with potentially very large numbers of messages.
The advantages of persistent queues
The traditional way to deal with a message is to put it onto a queue as soon as it arrives. Today we have scalable systems like Amazon’s Kinesis and Apache Kafka that are designed to cope with streams of data, and handle large numbers of messages in convenient ways.
- They provide persistence - put the message on the queue and you don’t have to worry about queue failures as the system automatically writes the item to storage. The items you queue will be there for up to, say, 24 hours so you can deal with them whenever your system is ready in that period.
- This persistence also makes it easy to handle failures. Your code maintains a pointer to the current item to be processed. You move on to the next item only when you've completely processed that item, so that you know where to pick up again after a failure. The effective length of the queue is the number of items that have yet to be processed - they actually remain in the system even after processing, only to be deleted when they time out after 24 hours (or whatever you've set the limit to).
Queuing builds elasticity into your system so that the rate of processing doesn’t have to match the highest rate at which Things want to send messages to your system. If your Things are, say, lightbulbs, there’s a good chance that a lot of them will be telling you about a change in state around the same time of day, simply because they’re all turned on when it gets dark. So at that time the peak message rate will be substantially higher than during the middle of the day.
The queue needs to cope with messages arriving at peak rate, but your processing doesn't necessarily have to process messages at the same rate. The delay that's acceptable between a message arriving and being processed defines the rate you'll need processing to happen - and the queue length can be a proxy measure for that delay.
A typical microservices architecture for IoT
In a microservices architecture, you might have one or more services involved in the processing of the messages. If there’s more than one, the chances are that they are also connected with queues.
We’ve seen quite a few companies with architectures that look something like this, with a "pod" of microservices that provide a pipeline of processing that needs to happen for each message.
When the queue of unprocessed messages reaches a given size, it’s time to scale up the “pod” of containers that deal with messages on the queue so you can handle more in parallel. Likewise, you can scale down that pod when the queue length drops. Each pod can process messages independently and asynchronously. And while the purist in you might rebel at the idea, when you scale down you can simply terminate the pod without worrying about what has happened to any messages it is processing so far - this gets treated like a failure, so the messages will still be available on the queue to be picked up by one of the remaining pods. You’re already architecting the microservices such that they cope in the event of each others’ failures, right?
Scale for demand
The queue flattens out the demand for those batch processes, but can it flatten it entirely? In the case of the lightbulbs, it might not matter if you don’t process all the status update messages within, say, a second or two, but if your system doesn’t know that all the lights are on for an hour, it really doesn’t have a particularly useful view of the state of all your Things. So even with a queue in place, you probably have some peaks and troughs in the amount of batch processing you need to do, and an “SLA” defining how quickly you need to process those messages.
With Microscaling you can tie pod scaling to the SLA to make sure you’re meeting the needs of your business. And you can share more resources across different task types, getting better overall utilization from the system as a whole.