The ninth factor in a 12-factor app is that the app should use “Disposable Processes.” Disposable processes allow us to recover from failures quickly, and if our application’s Docker containers have fast start-up times and they shut down gracefully, then we can scale our application quickly too.
This is the ninth video in our 12-factor Application Modernisation series. In this video, Marc Firth (Managing Director at Firney) explains software engineering best practices for building scalable, reliable web services that are efficient to work with.
A transcript of the video above is included below.
What are disposable processes?
Marc: So today we’re talking about how 12-factor apps should aim to have disposable processes, which means they should have a fast start-up and graceful shutdown. This helps our apps be more reliable and faster to deploy and spin up.
This is our series on application modernisation, where we aim to make applications more scalable, reliable and efficient to work with. The ninth factor in a 12-factor app is that it should have disposable processes. Put simply, it’s the ability to create and destroy processes at a moment’s notice.
Now, in order to do that, we need the processes to be stateless. If you haven’t seen my video on stateless processes, I’ll link to that here.
Fast start-up and graceful shutdown
Disposable processes need to have a fast start-up and a graceful shutdown and not rely on any other processes in order to do their workloads. Starting fast and shutting down gracefully enables us to scale rapidly. This means our cloud environment can quickly spin up new instances of our processes using the stateless image that we created as a part of our build pipeline.
Each instance is automatically deployed into the environment with the correct config applied.
Scaling Application Processes
When it comes to scaling application processes, we nearly always use Docker and Kubernetes for this task. We build an image of our app using Docker and put that image into a registry that our Kubernetes cluster can access.
To scale up our processing, our Kubernetes cluster needs some config applied to it so that it knows when and how to scale up our processes. For that, we use an object in Kubernetes known as a “horizontal pod autoscaler”.
In that autoscaler, we may use something such as a target of 60% CPU utilisation averaged across all of the pods it manages. Kubernetes will continuously assess CPU utilisation in order to make sure that it maintains the correct number of containers and pods.
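The calculation the autoscaler performs can be sketched in a few lines of Python. This is an illustrative sketch of the formula from the Kubernetes documentation, not code from the video; the function name and figures are made up for the example:

```python
import math

def desired_replicas(current_replicas: int,
                     current_cpu: float,
                     target_cpu: float) -> int:
    # Core autoscaler formula from the Kubernetes docs:
    # desired = ceil(currentReplicas * currentMetric / targetMetric)
    return math.ceil(current_replicas * current_cpu / target_cpu)

# 4 pods averaging 90% CPU against a 60% target -> scale up to 6 pods
print(desired_replicas(4, 90, 60))
```

So if the pods are running hotter than the target, the ratio is above 1 and Kubernetes adds replicas; if they are running cooler, it removes them.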
If you’re not familiar with containers and pods, Kubernetes spins up an image of our application in a container which runs inside a pod. You may have many containers running inside a pod and many pods running inside your Kubernetes cluster.
Metrics for scaling
CPU utilisation isn’t the only metric you can use. There are loads of metrics. For instance, if you’re working with a queue, you might choose to use a target “maximum number of items” in that queue at any time. Kubernetes will continuously assess the number of items remaining in the queue in order to spin up new pods and containers that it needs in order to satisfy that requirement.
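As a rough sketch of that queue-based target in Python (the function name and the per-worker limit are hypothetical, just to show the shape of the calculation):

```python
import math

def workers_for_backlog(queued_items: int, max_items_per_worker: int) -> int:
    # Hypothetical target: keep each worker's share of the queue at or
    # below max_items_per_worker, while always keeping at least one worker.
    return max(1, math.ceil(queued_items / max_items_per_worker))

# 250 queued items with a target of 100 items per worker -> 3 workers
print(workers_for_backlog(250, 100))
```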
How Kubernetes scales our application containers
When Kubernetes spins up a new instance of our process, it runs it inside a container, and if there isn’t enough room in the existing pods for those containers, it will spin up additional pods to make sure it has enough capacity to hold all the containers.
The second part of disposable processes is that we want to make sure that they shut down gracefully.
So say, for instance, we have a queue and a background worker that’s processing jobs from that queue. If Kubernetes decides to reduce the number of processes running in our containers and pods, we want to make sure that any workers in the middle of a job either finish that job or return it to the queue so that another worker can pick it up later.
This makes sure that we don’t end up with any unprocessed or partially processed data.
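A minimal sketch of that shutdown behaviour in Python, assuming an in-process queue for illustration (Kubernetes sends SIGTERM before killing a pod; the handler and worker loop here are hypothetical, not from the video):

```python
import queue
import signal

job_queue: "queue.Queue[str]" = queue.Queue()
shutting_down = False

def handle_sigterm(signum, frame):
    # Kubernetes sends SIGTERM before terminating the pod: flag the
    # worker loop so it stops taking new jobs off the queue.
    global shutting_down
    shutting_down = True

signal.signal(signal.SIGTERM, handle_sigterm)

def run_worker(process):
    """Drain jobs until shutdown; requeue the in-flight job on failure."""
    while not shutting_down:
        try:
            job = job_queue.get_nowait()
        except queue.Empty:
            break
        try:
            process(job)
        except Exception:
            job_queue.put(job)  # return the job so another worker can retry it
            raise
```

The key point is the `except` branch: a job that can’t be finished goes back onto the queue rather than being lost.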
For a web server such as Nginx or Apache, where we’re serving web requests, we want to make sure that the user gets a response to their request before that container shuts down, so they don’t end up getting an error.
So when it comes to scaling up, we want to make sure that our containers start in seconds so that they can quickly respond to the increase in the number of requests. If they don’t spin up fast enough, we may end up with some users getting an error response to their request.
Put installation of dependencies into the build step
Make sure that any long installation times are pushed into the build step. We can use a build cache to speed up that step. That makes sure that we’re not waiting for software to install whilst our image is spinning up on the cluster, and our image remains immutable.
Lean container images
Make sure your application images are as lean as possible, so only include what your application needs and nothing more.
Lightweight base images
Use a lightweight base for your Docker image, such as a Slim or Alpine variant, so that the image we deploy is as small as possible and can spin up as fast as possible.
Set minimum/maximum pod instances
When it comes to scaling pods and containers, Kubernetes does a few clever calculations in order to assess how many containers or pods it should scale up or down at that time. So my fourth tip is to set a minimum and a maximum number of pods that Kubernetes can scale between. This ensures that we have enough resources to deal with sudden spikes in traffic, for example, a successful marketing campaign on a website. It also ensures that Kubernetes doesn’t go nuts and decide to spin up a billion pods due to some error in our processing.
My fifth tip for making more scalable processes is to actually use microservices. If you’re using microservices rather than a monolithic architecture, you’ll only need to spin up that portion of the app. You won’t need to spin up entire instances of the whole monolithic architecture in order to satisfy those requests. You’re just dealing with a smaller part of the application that needs to scale.
When it comes to scaling down, Kubernetes is quite clever, because it will stop sending traffic through to the pods or containers it is removing. So our current requests will be allowed to finish, such as responding to an HTTP request.
Don’t scale down containers too quickly
My first tip when it comes to scaling down is that we don’t scale down too quickly. So we might set a rule to wait 15 minutes before scaling down so that we don’t scale down those pods and containers and suddenly need them a moment later.
This helps smooth out sporadic spikes and dips in traffic over a short timeframe.
Working with process workers
My second tip is that background worker processes should return their job to the queue and release any locks they might have obtained. This enables Kubernetes to spin down those containers and pods rapidly, and another worker can come along and pick that job up off the queue. So we want to make sure that we wrap our worker processes inside a transaction, but be aware that Pub/Sub, for example, has a ten-second default acknowledgement deadline.
We might also make our operations idempotent so that the result is the same no matter how many times those processes run.
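To illustrate idempotency, here’s a small hypothetical example (the job IDs, account names and handler are made up): a handler that records which job IDs it has already applied, so replaying a job after a crash can’t double-apply its effect.

```python
processed_ids = set()
balances = {"alice": 100}

def apply_credit(job_id, account, amount):
    # Idempotent handler: a replayed job (same job_id) is a no-op,
    # so re-running after a crash can't credit the account twice.
    if job_id in processed_ids:
        return
    balances[account] = balances.get(account, 0) + amount
    processed_ids.add(job_id)

apply_credit("job-1", "alice", 50)
apply_credit("job-1", "alice", 50)  # replay of the same job: ignored
```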
Setting this up is a fair amount of work, so we might just let those processes finish if they’re short lived.
Processes should be robust against unexpected termination
My third tip, and part of the 12-factor methodology, is that processes should be robust against sudden death. So along with transactions, we would also set up replaying of messages and failed-message queues in order to handle those failed transactions.
Basically, you need to think of graceful shutdown as wishful thinking, as in “a graceful shutdown may or may not happen”. So you should try to save data cleanly where you can so that another worker can pick up that workload. You can also adopt a MapReduce style of processing by doing your processing in stages. So make sure you use failed-item queues with their own workers and have a way of replaying those messages should there be an issue.
Locks and Exponential Back-off
If you’re working with obtaining locks, make sure you release those locks after a timeout.
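As a sketch of a lock with a timeout, here’s a minimal in-memory version in Python (the class is illustrative; a real system would typically use something like a Redis key with an expiry):

```python
import time

class TTLLock:
    """Lock that expires after a timeout, so a worker that dies
    without releasing it can't block other workers forever."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.holder = None
        self.acquired_at = 0.0

    def acquire(self, worker_id, now=None):
        now = time.monotonic() if now is None else now
        expired = now - self.acquired_at >= self.ttl
        if self.holder is None or expired:
            self.holder, self.acquired_at = worker_id, now
            return True
        return False

    def release(self, worker_id):
        # Only the current holder may release the lock.
        if self.holder == worker_id:
            self.holder = None
```

The `now` parameter just makes the expiry testable; in production you would rely on the clock.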
You can also try coding patterns such as exponential back-off to retry interactions with external services. If you’re unfamiliar with exponential back-off, it’s basically a way of making a request and, if that request fails, trying again after increasing delays, say one minute, then three, five and ten minutes, until we eventually give up and return an error message.
We give that external service the opportunity to come back online and resolve our request.
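A minimal sketch of that pattern in Python, assuming a doubling delay schedule for the example (the function name, attempt count and delays are illustrative, not from the video):

```python
import time

def call_with_backoff(operation, max_attempts=5, base_delay=1.0,
                      sleep=time.sleep):
    """Retry a flaky operation with exponentially growing delays
    (1s, 2s, 4s, ... here; the exact schedule is up to you)."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: give up and surface the error
            sleep(base_delay * (2 ** attempt))
```

The injectable `sleep` parameter is just a convenience so the schedule can be tested without actually waiting.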
Put simply, as you’re coding your app, make sure you account for the fact that external services can fail.
I hope that helps you build more disposable processes that will enable you to scale your services faster and more reliably.
I hope you have an amazing day! Don’t forget to like, subscribe and share, and I’ll see you in the next video.