Developers approaching the world of microservices and Kubernetes typically ask: is it better to separate microservices by layer or by purpose? How many gateways should I have? Which languages are more appropriate to use? Do I need a pub/sub system to make microservices communicate with each other?
In this article, we will look at some architectural styles and best practices for testing, deployment, monitoring, and business continuity, to create robust and scalable platforms.
This article is based on the talk given at GDG DevFest Italia 2020 by Giulio Roggero, CTO of Mia-Platform, available in Italian at this link.
First of all, let's try to explain why we call them architectural “styles”. We chose to use a word related to the world of art because we think that, as artists interpret reality through their own styles, computer scientists interpret the reality of data according to a style.
Introduction to Kubernetes
So let's start from the beginning, briefly reviewing the basics of Kubernetes and architectural styles. Architectural styles, as we describe them in this article, are ways of describing complex, layered, and distributed computer systems.
It is often said that simple things are hard to do. The architectural style is what allows you to simplify the very complex machinery that lies at the base of our systems.
So, going back to our initial simile: as artists have colors and brushes, computer scientists have Kubernetes.
Kubernetes could be defined as a “distributed operating system” for modern computer systems.
To understand how Kubernetes works, let's take a step back and think about how an operating system works.
A process
Let's take a process running on our computer (if you use Linux, run ps -ef and you will see the processes running).
For example, consider a process we will call the Yellow process: it needs CPU, RAM, disk, and networking to run.
The machine that supplies this hardware, coupled with its operating system, makes our Yellow process run. In addition, the machine needs to keep other processes running at the same time. In a nutshell, it allocates all the available resources appropriately so that the processes can run side by side.
What if we have more machines?
What if these machines had to run multiple replicas of each process? For example, two replicas of the Green process, four replicas of the Yellow process, and so on.
At this point, a single operating system on a single machine would no longer be enough: we would need a way to orchestrate everything, something that decides what should run and where to schedule it.
For example, the orchestrator takes charge of the two instances of the Green process and schedules them on machine A and machine B, according to policies similar to those applied by the operating systems of the two machines.
However, we know that things don't always go smoothly. For this reason, let's assume that machine A breaks down.
Our orchestrator picks up those processes and reallocates them on the other machines: in a nutshell, this is a distributed operating system. The orchestrator reallocates the processes among the available resources dynamically and according to precise rules.
What we have just described is, at a very high level, what Kubernetes does: it performs very complex operations that can be summed up by the expression “distributed operating system”.
In this article we're going to see how to use Kubernetes and with which styles.
What Kubernetes consists of:
- Namespaces isolate a group of services at the RBAC level for developers and operations teams: they can only access and manage the group of services isolated within the namespace. Within a namespace we have one or more pods;
- A pod is the smallest unit managed by K8s. A pod is much more complex than a single process, and inside it we can have one or more containers;
- Containers: each container runs a single service;
- A single service, which runs inside its container.
Inside a pod we avoid mixing containers with different duties. A pod has a very specific, well-defined duty within the business logic, but you can attach a sidecar to it: a smaller container that performs management operations.
So far, we have introduced the basic tools.
Kubernetes architecture best practices: different architectural styles
The CNCF, the worldwide foundation that fosters the development of cloud native technologies, has collected in its landscape many technologies that help Kubernetes work better.
Approaching all these technologies at once can be chaotic. To find your way, you should follow some cardinal principles, which we identify precisely as architectural styles.
To fully understand architectural styles, let's take a concrete case.
In this example we have some channels: web app, home IoT, mobile app, smartwatch, connected car, and so on. By channel we mean anything that collects information and interacts with an information system.
When using Kubernetes to manage the information system, we need to:
- Create the namespace;
- Create the pod;
- Create the container and our service.
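To make this concrete, here is a minimal sketch in Go, using the official client-go library, of the three steps listed above: it creates a namespace and then a pod running a single container. The namespace name, the container image, and the kubeconfig path are hypothetical placeholders; in a real cluster you would typically create a Deployment rather than a bare pod, so that Kubernetes can reschedule it.

```go
package main

import (
	"context"
	"log"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load the cluster credentials from a local kubeconfig file (hypothetical path).
	config, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig")
	if err != nil {
		log.Fatal(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		log.Fatal(err)
	}
	ctx := context.Background()

	// 1. Create the namespace that isolates our application.
	ns := &corev1.Namespace{ObjectMeta: metav1.ObjectMeta{Name: "shop"}}
	if _, err := clientset.CoreV1().Namespaces().Create(ctx, ns, metav1.CreateOptions{}); err != nil {
		log.Fatal(err)
	}

	// 2. Create a pod containing a single container with our service.
	pod := &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "catalogue", Namespace: "shop"},
		Spec: corev1.PodSpec{
			Containers: []corev1.Container{
				{Name: "catalogue", Image: "example.com/catalogue:1.0.0"},
			},
		},
	}
	if _, err := clientset.CoreV1().Pods("shop").Create(ctx, pod, metav1.CreateOptions{}); err != nil {
		log.Fatal(err)
	}
	log.Println("namespace and pod created")
}
```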
As a first example application, we could create a monolith within Kubernetes: this would be the simplest thing to do.
In this case, Kubernetes might not even be the ideal solution, but it would help us sketch a first interpretation.
It's like putting the first brushstroke on the canvas and declaring the painting finished.
So let's try to add complexity to our case.
A Kubernetes application architecture example: Single-page applications
The first architectural style that we will see in this article, and the most basic of our overview, is that of single-page applications.
Today, we talk a lot about single-page applications: applications that run in the browser and interact with the server through APIs.
A gateway decouples this interaction: on one side, the server part with the business logic of the app; on the other side, the static assets (HTML, CSS, JavaScript), which run within the web browser with their presentation and user-interaction logic and call the APIs.
Each of the boxes outlined in orange is a different pod managed by Kubernetes. The dashes represent the namespace.
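As a rough illustration of this decoupling, here is a minimal gateway sketch in Go: requests under /api/ are proxied to the server holding the business logic, while everything else is served as static assets. The backend address and the assets folder are hypothetical.

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

func main() {
	// Requests under /api/ are forwarded to the server with the business logic.
	backend, err := url.Parse("http://server:3000")
	if err != nil {
		log.Fatal(err)
	}
	http.Handle("/api/", httputil.NewSingleHostReverseProxy(backend))

	// Everything else is served as static assets (HTML / CSS / JavaScript).
	http.Handle("/", http.FileServer(http.Dir("./public")))

	log.Fatal(http.ListenAndServe(":8080", nil))
}
```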
What we have just described could be a first monolithic application. However, if we wanted to scale this application, we would probably find within the server some very slow logic that should scale first to avoid overload. At this point, it might be useful to start splitting the application into microservices.
So let's take our server application and try to divide it into microservices, each one with its own responsibility.
How do we make microservices communicate with each other?
Back-end for Front-end
We suggest adopting the style that Sam Newman has called Back-end for Front-end (BFF): the BFF takes care of exposing APIs that simplify the interaction with the user, while microservices take care of managing the business logic, well aggregated into bounded contexts.
The BFF can coordinate calls to many microservices and expose the aggregated information to the frontend channel.
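A minimal sketch of what a BFF endpoint could look like, in Go: it calls two hypothetical internal microservices (catalogue and pricing) and returns a single aggregated payload to the frontend. Service names, routes, and ports are placeholders, not part of any specific platform.

```go
package main

import (
	"encoding/json"
	"io"
	"log"
	"net/http"
)

// fetchJSON calls an internal microservice and decodes its JSON body.
func fetchJSON(url string) (map[string]any, error) {
	resp, err := http.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	var out map[string]any
	return out, json.Unmarshal(body, &out)
}

func main() {
	http.HandleFunc("/product-page", func(w http.ResponseWriter, r *http.Request) {
		// The BFF coordinates the calls to the underlying microservices...
		product, err := fetchJSON("http://catalogue/products/42")
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadGateway)
			return
		}
		price, err := fetchJSON("http://pricing/prices/42")
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadGateway)
			return
		}
		// ...and exposes a single aggregated response to the frontend channel.
		json.NewEncoder(w).Encode(map[string]any{"product": product, "price": price})
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```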
Our static asset on the right of the picture remains unchanged:

What if we had multiple channels and different user interactions?
In this case, a single BFF with a single API may not be enough. It would be safer to have different BFFs (mobile, web, IoT) that expose different interactions based on the reference channel: the mobile BFF will expose smaller or paginated data, shaped to fit the screen; the IoT BFF could be minimal; and so on. The underlying microservices, on the other hand, keep working the same way.
This diagram could represent an application, at a business level, that is running on Kubernetes:
The complexity does not end there: within the same Kubernetes cluster there can be multiple namespaces with different applications, and multiple teams working with Kubernetes, each seeing only its own application partition.
In these cases, to ensure proper governance, our advice is to put a layer on top of the applications: an API Gateway that mediates requests and manages security, privacy, and performance of the exposed APIs for everyone. This is a shared, multi-tenant service used by all applications.
To better understand, let's look at the application in detail.
It is often thought that in this type of microservice application, everything must be asynchronous.
However, asynchronicity is not always the right choice, because it can add complexity to management, debugging, troubleshooting, and so on.
Therefore, what we recommend is to have an architecture like the one described above, which scales and works.
Saga Pattern
A system like the one just described may not be appropriate in all contexts: in some cases it may be necessary to build a pub/sub system with a message broker.
Instead of having a point-to-point communication, where services need to know each other, we have services that publish messages, and other services that subscribe to these messages to perform operations.
For example, we could have a microservice that manages the products catalogue, another one in charge of the cart and a simple front-end app that sends the command “allocate this product”. When the product is allocated, the cart - which is subscribed - performs the action on its database.
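To show the decoupling, here is a minimal in-process sketch in Go, where a buffered channel stands in for the message broker: the publisher sends the “allocate this product” command without knowing which services are listening, and the cart reacts to it. The message type and product id are invented for illustration.

```go
package main

import "fmt"

// AllocateProduct is the message published when the frontend asks to allocate a product.
type AllocateProduct struct {
	ProductID string
}

func main() {
	topic := make(chan AllocateProduct, 8) // stand-in for a message broker topic
	done := make(chan struct{})

	// The cart service subscribes to the topic and reacts to each message
	// by updating its own database (simulated by a print here).
	go func() {
		for msg := range topic {
			fmt.Println("cart: product allocated on local database:", msg.ProductID)
		}
		close(done)
	}()

	// The publisher sends the command without knowing which services are listening.
	topic <- AllocateProduct{ProductID: "42"}
	close(topic)
	<-done
}
```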
The problem in this case is that the user holding the mobile phone and interacting with the frontend waits for the product to be allocated: they expect a synchronous interaction, while HTTP is a protocol that typically returns a response to each request.
When we publish a message from the frontend saying “allocate the product”, we expect the HTTP request to respond with an ok/ko.
In this case we have an “ok, I sent the message” instead. But what really happened?
One of the options applicable in these cases, perhaps the most complex, is to adopt the Saga pattern. In this way, every time a user interaction triggers a distributed transaction, a new saga begins and is saved in its own dedicated database. The saga orchestrates all the messages exchanged among the services.
At this point, we can make the asynchronous look synchronous: for example, in cases where we need to collect credit card data to make a payment.
And if the payment is not successful, the saga rolls back the product allocation and the cart as well.
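A highly simplified sketch of the saga idea, in Go: each user interaction starts a saga whose state is persisted (an in-memory map stands in for the saga database), the steps are executed in order, and if one of them fails the previous ones are compensated in reverse order. Step names and the failing payment are invented for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

type step struct {
	do         func() error
	compensate func()
}

// runSaga executes the steps in order; on failure it rolls back the completed ones.
func runSaga(sagaID string, store map[string]string, steps []step) error {
	store[sagaID] = "started"
	for i, s := range steps {
		if err := s.do(); err != nil {
			for j := i - 1; j >= 0; j-- {
				steps[j].compensate()
			}
			store[sagaID] = "compensated"
			return err
		}
	}
	store[sagaID] = "completed"
	return nil
}

func main() {
	store := map[string]string{} // stand-in for the saga database
	err := runSaga("order-42", store, []step{
		{
			do:         func() error { fmt.Println("allocate product"); return nil },
			compensate: func() { fmt.Println("release product") },
		},
		{
			do:         func() error { fmt.Println("add to cart"); return nil },
			compensate: func() { fmt.Println("empty cart") },
		},
		{
			// The payment fails, so product and cart are rolled back.
			do:         func() error { return errors.New("payment refused") },
			compensate: func() {},
		},
	})
	fmt.Println("saga result:", err, "state:", store["order-42"])
}
```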
This scenario shows a fairly complex application.
Integration with legacy systems
We are not always lucky enough to start from scratch, from the so-called green field. Sometimes we have to start from legacy systems that are already in operation.
For example, we may need to show paid and unpaid invoices, coming from a pre-existing billing system, on a new touchpoint.
One solution could be to expose APIs directly on the legacy systems and call them from the new microservice.
Hypothetically, we could have 1 million users interacting with the new channel and requesting invoices, while the previous application used to serve only about a hundred of them.
Bringing the billing system to its knees must be avoided, and exposing direct APIs is not even an easy solution to apply: the path of a direct API call may pass through many layers, may not be tracked, or may not even be well known.
In a nutshell, what we need is to open a B2C area on a system that was designed, and used until now, for B2B only.
How can we avoid this?
One solution that can be adopted within a distributed system such as Kubernetes is to start collecting events in real time from all the legacy systems and capture them with a message broker. The message broker can also be used as a data stream: we take all the events and aggregate them into a database, which then holds the aggregations of all the underlying systems and is updated in real time as the events change.
In this way, if we want to see the invoices we no longer need to ask for them from the underlying systems, but we have them on the database, always updated.
A database of this type has the great advantage of being able to scale almost indefinitely, without creating problems, while lightening the load on the underlying systems.
A system like this one has far more read than write operations (for example, 80/10): this is a great way to guarantee good performance.
Each event is aggregated into a document-oriented NoSQL database with nested JSON documents. A reader can query these documents through a REST or GraphQL API, in a simpler way than before.
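As an illustration, here is a minimal projection sketch in Go using the official MongoDB driver: an invoice event captured from the billing system is upserted into the read model, so the new channel never has to query the legacy system directly. The connection string, database, collection, and field names are hypothetical.

```go
package main

import (
	"context"
	"log"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

// InvoiceEvent is the shape of an event captured from the legacy billing system.
type InvoiceEvent struct {
	InvoiceID string
	UserID    string
	Status    string // e.g. "paid" or "unpaid"
	Amount    float64
}

func main() {
	ctx := context.Background()
	client, err := mongo.Connect(ctx, options.Client().ApplyURI("mongodb://readmodel:27017"))
	if err != nil {
		log.Fatal(err)
	}
	invoices := client.Database("readmodel").Collection("invoices")

	// In a real system this would loop over messages coming from the broker.
	ev := InvoiceEvent{InvoiceID: "inv-001", UserID: "user-42", Status: "paid", Amount: 99.90}

	// Upsert: the read model always holds the latest aggregated view of the invoice,
	// so the new channel never asks the legacy billing system directly.
	_, err = invoices.UpdateOne(ctx,
		bson.M{"_id": ev.InvoiceID},
		bson.M{"$set": bson.M{"userId": ev.UserID, "status": ev.Status, "amount": ev.Amount}},
		options.Update().SetUpsert(true),
	)
	if err != nil {
		log.Fatal(err)
	}
	log.Println("read model updated for invoice", ev.InvoiceID)
}
```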
This mechanism protects us from bringing down our billing system.
The Canary Deploy
Now that we have our system up and running, without bringing down the legacy systems, are we happy?
Not so much!
The real project doesn't end when it's released. Indeed, it starts right when it is in production. The true costs of an IT project are not related to its implementation, but to its maintenance and evolution over time.
Let's see it with an actual example.
We have a service with 10 million active users per day and 500,000 active users per hour, and we want to change the pricing system. The new pricing system is still in the testing phase, and we want to prevent products with a price of zero from appearing in the catalog by mistake.
How can we manage the evolution of the service?
We can use sidecar containers (we mentioned them a few paragraphs above). Each sidecar acts as a proxy: the product catalogue no longer communicates directly with the pricing service but with its own proxy, which in turn calls the pricing proxy, which calls the cart proxy, which calls the payment gateway proxy, and so on.
In this way, we can convey information from one proxy to another, and we can configure a release for which 90% of requests go to the previous version and 10% go to the new one. This mechanism is called canary deploy, and it makes it much easier to shift traffic gradually between different versions of the same system.
The canary deploy is a lifesaver in many situations, because it minimizes the unforeseen events that can arise with a new release. It can be applied to traffic percentages or to more complex logic as well, such as the user agent: we can route requests from iOS devices to the new system while Android ones continue to call the old one.
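A minimal sketch, in Go, of the traffic-splitting logic such a proxy applies: 90% of requests are forwarded to the stable pricing service and 10% to the new version. The hostnames and the percentage are hypothetical; in practice this logic usually lives in the sidecar or service mesh rather than in hand-written code.

```go
package main

import (
	"log"
	"math/rand"
	"net/http"
	"net/http/httputil"
	"net/url"
)

func main() {
	stable, _ := url.Parse("http://pricing-v1:3000")
	canary, _ := url.Parse("http://pricing-v2:3000")

	toStable := httputil.NewSingleHostReverseProxy(stable)
	toCanary := httputil.NewSingleHostReverseProxy(canary)

	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		// Route by percentage; a richer rule could inspect r.UserAgent()
		// to send only iOS clients to the new version, for example.
		if rand.Float64() < 0.10 {
			toCanary.ServeHTTP(w, r)
			return
		}
		toStable.ServeHTTP(w, r)
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```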
Now we have our app, and it's distributed. How do we make sure everything is working well?
What we recommend is to always expose some control routes, such as health and readiness routes. The readiness route tells Kubernetes when the pod is ready to receive traffic: until it is ready, Kubernetes doesn't route traffic to that pod. The health route, on the other hand, communicates whether the pod is healthy and functioning properly.
In fact, the pod may be up and running but not working properly (unable to queue messages, and so on). In this case, Kubernetes restarts the pod on our behalf and everything starts up again, including the readiness check.
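A minimal sketch of the two control routes in Go. The paths (/-/ready and /-/healthz) and the readiness condition are hypothetical conventions; in the pod definition, the readinessProbe and livenessProbe would point to these routes.

```go
package main

import (
	"log"
	"net/http"
	"sync/atomic"
)

func main() {
	var ready atomic.Bool

	// Readiness: until this returns 200, Kubernetes does not route traffic to the pod.
	http.HandleFunc("/-/ready", func(w http.ResponseWriter, r *http.Request) {
		if !ready.Load() {
			http.Error(w, "warming up", http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	})

	// Health (liveness): if this starts failing, Kubernetes restarts the pod.
	http.HandleFunc("/-/healthz", func(w http.ResponseWriter, r *http.Request) {
		// Here we could check that the message queue connection is still alive.
		w.WriteHeader(http.StatusOK)
	})

	// After the startup work is done, declare the pod ready.
	ready.Store(true)

	log.Fatal(http.ListenAndServe(":3000", nil))
}
```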
In addition to asking how it is going, we may also want to monitor what exactly our service is doing in production.
We can measure some relevant information at the business level: how many messages am I queuing? How many payments have I made? How many active users do I have right now?
These metrics can be hosted on a database on which we can build monitoring dashboards, and set alarms. For example, we could set an alarm that warns us when there are too many messages in the queue and there may be a slowdown.
When the system crashes, it is (relatively) easy to locate and solve the problem, while it is much more difficult to intervene when the system slows down and does not give signals. This is why it's important to act in advance.
One tool to do this could be Prometheus.
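For example, with the official Prometheus Go client a business metric can be exposed in a few lines. The metric name and the queue endpoint below are hypothetical stand-ins, while /metrics is the usual scrape endpoint on which dashboards and alerts are built.

```go
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var queuedMessages = promauto.NewCounter(prometheus.CounterOpts{
	Name: "shop_queued_messages_total",
	Help: "Number of messages queued by the service.",
})

func main() {
	// Every time the service queues a message, the counter is incremented.
	http.HandleFunc("/enqueue", func(w http.ResponseWriter, r *http.Request) {
		queuedMessages.Inc()
		w.WriteHeader(http.StatusAccepted)
	})

	// Prometheus scrapes this endpoint; dashboards and alarms are built on top of it.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":3000", nil))
}
```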
Logs are another important aspect to monitor constantly. For each worker node, we can collect the logs of all the pods, store them in a database, view them on a dashboard, and set alarms on them.
Within the logs we can insert a request id: a conversation identifier generated by our gateway and propagated with the HTTP request to all services, which allows us to trace all the calls among them.
There are several ways to observe the communication among services: the simplest is to record this id in the logs of both caller and callee and propagate it to every service as an extra HTTP header.
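A minimal sketch of this propagation in Go: the middleware reuses the X-Request-Id header when present (for example, set by the gateway), generates one otherwise, logs it, and forwards it on a downstream call. The header name is a common convention and the downstream URL is hypothetical.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"time"
)

// withRequestID wraps a handler so that every log line and outgoing call
// carries the same conversation identifier.
func withRequestID(next http.HandlerFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		id := r.Header.Get("X-Request-Id")
		if id == "" {
			id = fmt.Sprintf("req-%d", time.Now().UnixNano())
		}
		log.Printf("requestId=%s method=%s path=%s", id, r.Method, r.URL.Path)

		// Propagate the id to a downstream microservice (hypothetical URL).
		req, _ := http.NewRequest(http.MethodGet, "http://pricing/prices/42", nil)
		req.Header.Set("X-Request-Id", id)
		if resp, err := http.DefaultClient.Do(req); err == nil {
			resp.Body.Close()
		}

		next(w, r)
	}
}

func main() {
	http.HandleFunc("/product-page", withRequestID(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok"))
	}))
	log.Fatal(http.ListenAndServe(":3000", nil))
}
```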
Conclusion
Finally, let's see a slightly more complex style that somehow includes them all.
Starting from our legacy management systems, which most organizations already have, we can create an ecosystem of microservices, each with its own clear and well-defined responsibility: the product catalog, the stock status, the product tracking, and all the others we need.
Each one performs an action that would be useless on its own but, when connected to the others, implements a piece of business logic. Once these logics are exposed, they become the services of our application.
A further step could be adopting Fast Data and machine learning within our system.
In fact, by adopting Fast Data and observing all the interactions among the microservices with machine learning, we could identify recurring problems, both in operations and in business areas, even when they originate in the legacy systems.
With this data, we could devise strategies to encourage our end users to make the most of the product features they use.
What we have just described is an architecture within Kubernetes that can be built over time in an incremental and evolutionary way: an application architecture that evolves with the business.