
Are Docker, microservices and reactive programming always necessary?


Author: Denis Cyplakov, Solution Architect, DataArt
At DataArt I work in two areas. In the first, I help people fix systems that are broken in one way or another, for a variety of reasons. In the second, I help design new systems so that they won’t be broken in the future or, to put it more realistically, so that they will be harder to break.
Unless you are doing something brand new, such as the world’s first Internet search engine or an artificial intelligence to control the launch of nuclear missiles, it is quite easy to design a good system. It is enough to consider all the requirements, look at the design of similar systems, and do roughly the same without making gross mistakes. That sounds like an oversimplification, but let’s remember that it’s 2019, and there are "typical recipes" for system design for almost everything. Businesses can throw up complex technical challenges – say, processing a million heterogeneous PDF files and pulling tables of cost data out of them – but the architecture of such systems rarely has much originality. The main thing here is not to make a mistake in determining what kind of system we are building, and not to get the choice of technology wrong.
It is on that last point that people regularly make typical mistakes, some of which I will talk about in this article.
What is the difficulty of choosing a tech stack? Adding any technology to a project makes it more complex and brings some limitations. Accordingly, you should only add a new tool (framework, library) when that tool does more good than harm. When talking to team members about adding libraries and frameworks, I often use the following trick as a joke: "If you want to add a new dependency to the project, give the team a case of beer. If you don’t think this dependency is worth a case of beer, don’t add it."
Let’s say we are creating an application in Java, and we add the (fictional) TimeMagus library to the project to manipulate dates. The library is great: it gives us a lot of features that are absent from the standard class library. How can this decision be harmful? Let’s go through the possible scenarios point by point:

  1. Not all developers know this non-standard library, so the entry threshold for new developers will be higher, and the chance grows that a new developer will make a mistake when manipulating dates with a library they don’t know.
  2. It increases the size of the distribution. When the average Spring Boot application can easily grow to 100 MB, that is not nothing. I’ve seen cases where a 30 MB library was pulled into the distribution for the sake of one method. The reasoning behind it was: "I used this library in the last project, and it has a convenient method".
  3. Depending on the library, startup time can increase noticeably.
  4. The developer of the library may abandon it; then the library starts to conflict with a new version of Java, or a bug is found in it (caused, for example, by a change in time zones), and no patch is ever released.
  5. The license of the library will conflict with the license of your product at some point (you check the licenses for all the products you use, right?).
  6. Jar hell – the TimeMagus library needs the latest version of the SuperCollections library; then, a few months later, you need to plug in a library to integrate with a third-party API, and it doesn’t work with the latest version of SuperCollections, only with version 2.x. You can’t plug in that API integration at all, and there is no other library for working with that API.

On the other hand, the standard library gives us enough tools to manipulate dates, and if you don’t need, for example, to support some exotic calendar or to calculate the number of days from today to "the second day of the third new moon in the previous year of the soaring eagle", you should probably refrain from using a third-party library. Even if it is perfectly fine and, at the scale of the project, would save you as many as 50 lines of code.
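For the common cases, java.time in the standard library already covers most date manipulation without any extra dependency. A minimal sketch (the dates and requirements here are made up purely for illustration):

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.temporal.ChronoUnit;
import java.time.temporal.TemporalAdjusters;

public class StandardDates {
    public static void main(String[] args) {
        // Days between today and a fixed date
        LocalDate release = LocalDate.of(2019, 12, 1);
        long daysLeft = ChronoUnit.DAYS.between(LocalDate.now(), release);

        // "Last Friday of the current month" without any third-party library
        LocalDate lastFriday = LocalDate.now()
                .with(TemporalAdjusters.lastInMonth(DayOfWeek.FRIDAY));

        // Time-zone-aware arithmetic
        ZonedDateTime noonInKiev = ZonedDateTime.now(ZoneId.of("Europe/Kiev"))
                .withHour(12).withMinute(0).withSecond(0).withNano(0);

        System.out.println(daysLeft + " " + lastFriday + " " + noonInKiev);
    }
}
```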
The example we looked at is quite simple, and I think it is not hard to make a decision here. But there are a number of technologies that are widespread, that everyone has heard of, and whose benefits are obvious, which makes the choice more difficult: they really do provide serious advantages to the developer. But that should not always be a reason to pull them into your project. Let’s take a look at some of them.

Docker

Before the advent of this really cool technology, there were a lot of unpleasant and complicated issues with version conflicts and obscure dependencies when deploying systems. Docker allows you to package a snapshot of the system state, roll it out into production and run it there. This avoids these conflicts, which of course is great.
Previously this was done in monstrous ways, and some tasks could not be solved at all. For example, you have a PHP application that uses the ImageMagick library to handle images; your application also needs specific php.ini settings, and the application itself is hosted by Apache httpd. But there is a problem: some scheduled routines are implemented as Python scripts run from cron, and the libraries used by these scripts conflict with the library versions used by your application. Docker allows you to pack your entire application along with its settings, libraries, and HTTP server into one container that serves requests on port 80, and the routines into another container. Everything will work fine together, and you can forget about library conflicts.
Should I use Docker to package each application? My opinion: no, you shouldn’t. Here is a typical composition of a packaged application deployed in AWS. The rectangles here indicate the isolation layers we have.
[Diagram: the isolation layers of a typical application deployed in AWS]
The largest rectangle is the physical machine. Next is the operating system of the physical machine. Then the Amazon virtualizer, then the virtual machine OS, then the docker container, followed by the container OS, the JVM, then the Servlet container (if it is a web application), and already inside that is your application code. So we already see quite a few layers of isolation.
The situation looks even more curious if we recall what the acronym JVM stands for: Java Virtual Machine. That means we always have at least one virtual machine in a Java deployment anyway. Adding an extra Docker container here, firstly, often does not give much of an advantage, because the JVM alone already isolates us from the external environment quite well, and secondly, it is not free.
I took the numbers from an IBM study, if I’m not mistaken, from two years ago. Briefly, if we’re talking about disk operations, CPU usage, or memory access, Docker adds almost no overhead (literally fractions of a percent), but if we’re talking about network latency, the delays are quite noticeable. It’s not huge, but depending on what kind of application you have, it can surprise you.
[Chart: Docker overhead measurements from the IBM study]
On top of that, Docker eats up extra disk space, takes up some memory, and adds startup time. None of the three is critical for most systems: there is usually plenty of disk space and memory. Startup time is not usually a critical problem either; the most important thing is that the application starts at all. Nevertheless, there are situations where memory may be scarce, and the total startup time of a system consisting of twenty dependent services is already quite long. In addition, it affects the cost of hosting. And if you are doing any high-frequency trading, Docker is definitely not suitable for you. In general, any application that is sensitive to network latencies within 250-500 ms is better off not being dockerized.
Also, with Docker it becomes noticeably more difficult to troubleshoot problems in network protocols: not only do delays grow, but all the timings become different.

When is Docker really needed?

When we need a specific version of the JRE and it is a good idea to carry that JRE along with the application. There are times when you need a particular version of Java to run (not "the latest Java 8", but something more specific). In that case it is a good idea to pack the JRE together with the application and run it as a container. In principle, different versions of Java can be installed on the target system and selected via JAVA_HOME and so on. But Docker is noticeably more convenient here, because you know the exact version of the JRE, everything is packaged together, and the application won’t accidentally be run with a different JRE.
Docker is also necessary if you have dependencies on some binary libraries, e.g. for image processing. In this case it might be a good idea to pack all the required libraries together with the Java application itself.
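As a sketch, packaging an exact JRE (and any required native libraries) together with the application might look roughly like this; the base image tag and file names are illustrative, not prescribed:

```dockerfile
# Pin an exact JRE build instead of relying on whatever is installed on the host.
# The tag below is only an example; use whichever specific JRE build you actually need.
FROM eclipse-temurin:17.0.10_7-jre

# Native dependencies (e.g. an image-processing library) could be installed here:
# RUN apt-get update && apt-get install -y imagemagick && rm -rf /var/lib/apt/lists/*

COPY target/app.jar /opt/app/app.jar
EXPOSE 8080
ENTRYPOINT ["java", "-jar", "/opt/app/app.jar"]
```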
The following case is about a system that is a complex composite of different services written in different languages: you have a piece in Node.js, a piece in Java, a library in Go, and some machine learning in Python. This whole zoo takes a long and careful setup before its elements can see each other. Dependencies, paths, IP addresses – all of this has to be written out and neatly brought up in production. Of course, in this case Docker will help you a lot. What’s more, doing it without Docker is simply excruciating.
Docker can provide some comfort when you have to specify many different parameters to start an application on the command line. On the other hand, bash scripts, often on a single line, do a great job. You decide what you want to use.
The last thing that comes to mind off the top of my head is a situation where you’re using, say, Kubernetes, and you need to do system orchestration, that is, bring up some number of different microservices that automatically scale according to certain rules.
In all other cases, Spring Boot turns out to be enough to pack everything into a single jar file. And, in principle, a Spring Boot jar is not a bad metaphor for a Docker container. They are obviously not the same thing, but they really are similar in terms of ease of deployment.

Kubernetes

What should we do if we use Kubernetes? To begin with, this technology allows you to deploy a large number of microservices on different machines, manage them, do autoscaling, and so on. However, there are plenty of other tools for managing deployment and configuration, such as Puppet, CFEngine, SaltStack and others. Kubernetes itself is certainly good, but it can add significant overhead that not every project is ready to live with.
My favorite tool is Ansible, combined with Terraform where needed. Ansible is a fairly simple, declarative, lightweight tool. It requires no agents to be installed and has a fairly self-explanatory configuration file syntax. If you are familiar with Docker Compose, you will immediately see the overlapping sections. And if you use Ansible, there is no need to dockerize: you can deploy systems with more classic means.
Clearly, these are different technologies, but there is a set of tasks for which they are interchangeable. And a conscientious approach to design requires analyzing which technology is better suited to the system being developed, and which will suit it better in a few years.
If you have a small number of different services in your system and their configuration is relatively simple, for example you have just one jar file, and you don’t foresee any sudden, explosive growth in complexity, it might be worth sticking with classical deployment mechanisms.
This begs the question: "Wait, what do you mean, one jar file? A system should consist of as many atomic microservices as possible!" Let’s figure out who exactly a system owes microservices to, and why.

Microservices

First of all, microservices bring flexibility and scalability, and allow individual parts of the system to be versioned independently. Let’s say we have an application that has been in production for many years. Its functionality is growing, but we cannot keep developing it extensively forever. For example:
We have an application on Spring Boot 1 and Java 8. It’s a nice, stable combination. But it’s 2019 and, whether we like it or not, we need to move toward Spring Boot 2 and Java 12. Even a relatively easy transition of a large system to a new version of Spring Boot can be quite time-consuming, and I don’t even want to talk about jumping over the precipice from Java 8 to Java 12. In theory it’s easy: migrate, fix the problems, test everything and roll it out to production. In practice it may mean a few months of work that bring no new functionality to the business. And, as you understand, migrating to Java 12 a little bit at a time won’t work either. This is where a microservice architecture can help us.
We can isolate some compact group of functions of our application into a separate service, migrate this group of functions to a new technical stack, and roll it out into production in a relatively short time. Repeat the process piece by piece until the old technology is exhausted.
Microservices also allow for fault isolation, so that one failed component does not bring down the whole system.
Microservices allow us to have a flexible technical stack, i.e. not to write everything monolithically in one language and one version, but to use a different technical stack for separate components when needed. Of course it is better when you use a homogeneous technical stack, but this is not always possible, in which case microservices can help.
Microservices also allow you to solve a number of managerial problems by technical means, for example, when your large team consists of separate groups working in different companies (sitting in different time zones and speaking different languages). Microservices help isolate this organizational diversity into components that evolve separately. The problems of one part of the team will stay within one service, rather than sprawling throughout the application.
But microservices are not the only way to solve the listed problems. Oddly enough, a few decades ago people invented classes for half of them, and a little later, components and the Inversion of Control pattern.
If we look at Spring, we see that it is actually a microservice architecture inside a Java process. We can declare a component, which is essentially a service. We have the ability to do lookups through @Autowired, we have tools to manage the component lifecycle, and the ability to configure components separately from a dozen different sources. Basically, we get almost everything that we have with microservices, only inside a single process, which significantly reduces costs. A normal Java class is the same kind of API contract, which lets you isolate implementation details in just the same way.
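A minimal sketch of what that looks like in practice (the names and the property here are made up): the interface is the contract, the implementation is an isolated "service", and the wiring and configuration are handled by the container, all inside one JVM process.

```java
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Service;

// The contract: callers depend only on this interface, never on the implementation.
public interface ExchangeRateProvider {
    double rateFor(String currencyCode);
}

// An isolated "service" living in the same JVM; its details stay behind the contract.
@Service
class CachedExchangeRateProvider implements ExchangeRateProvider {

    // Configuration can come from many different sources (properties files, env vars, etc.)
    @Value("${rates.refresh-interval-seconds:300}")
    private long refreshIntervalSeconds;

    @Override
    public double rateFor(String currencyCode) {
        // ...lookup and caching logic would go here...
        return 1.0;
    }
}

// A consumer gets its dependency by lookup through the container, not by constructing it.
@Service
class InvoiceService {

    @Autowired
    private ExchangeRateProvider rates;

    public double toBaseCurrency(double amount, String currencyCode) {
        return amount * rates.rateFor(currencyCode);
    }
}
```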
Strictly speaking, in the Java world the closest thing to microservices is OSGi: there we have an almost exact copy of everything in microservices, except for the ability to use different programming languages and to run code on different servers. But even staying within the capabilities of Java classes, we have quite a powerful tool for solving a great many isolation problems.
Even in a "manager" scenario with team isolation, we can create a separate repository that contains a separate Java module with a clear external contract and test suite. This will greatly reduce the opportunity for one team to inadvertently complicate the life of another team.
I’ve heard repeatedly that there’s no way to isolate implementation details without microservices. But I can answer that the whole software industry is about isolating implementation. Subroutines were invented for this first (back in the 1950s), then functions and procedures, then classes, and later still microservices. But the fact that microservices are the latest in this series does not make them the highest point of development and does not oblige you and me to always resort to them.
When using microservices, you should also take into account that calls between them take some time. Often this does not matter, but I have seen a case where the customer needed to fit the system response time into 3 seconds. This was a contractual obligation for connecting to a third-party system. The call chain went through several dozen atomic microservices, and there was no way the overhead of all those HTTP calls could be squeezed into 3 seconds. In general, you have to understand that any division of monolithic code into a number of services inevitably degrades the overall performance of the system, simply because data cannot be teleported between processes and servers "for free".

When are microservices still needed?

When do you really need to break a monolithic application into multiple microservices? First, when there is unbalanced resource usage in functional areas.
For example, we have a group of API calls that perform calculations requiring a lot of CPU time, and a group of API calls that run very fast but need to hold a cumbersome 64 GB data structure in memory. For the first group we need a set of machines with a total of 32 processors; for the second, one machine (OK, let it be two machines for fault tolerance) with 64 GB of memory is enough. If we have a monolithic application, we will need 64 GB of memory on every machine, which increases the cost of each one. If these functions are split into two separate services, however, we can save resources by optimizing each server for a particular function. A server configuration might look like this:
[Table: example server configurations for the two separated services]
We also need microservices if we need to seriously scale some narrow functional area. For example, a hundred API methods are called 10 times per second, while, say, four API methods are called 10 thousand times per second. It is often not necessary to scale the whole system: we can, of course, replicate all 100 methods across many servers, but that is usually much more expensive and complicated than scaling a narrow group of methods. We can separate those four calls into a distinct service and scale only it out to a large number of servers.
It is also clear that we may need a microservice if some separate functional area is written, for example, in Python, because some library (say, for machine learning) happens to be available only in Python and we want to isolate it into a separate service. It also makes sense to make a microservice out of a part of the system that is prone to failures. It is good to write code in such a way that there are no failures at all, but the causes may be external, and no one is safe from their own mistakes either. In that case the failure can be isolated inside a separate process.
If your application does none of the above and is not expected to in the foreseeable future, a monolithic application is probably the best fit for you. The only thing I recommend is to write it in such a way that unrelated functional areas do not depend on each other in the code, so that they can be separated from each other when necessary. This is always good advice anyway: following it increases internal consistency and teaches you to formulate module contracts carefully.

Reactive architecture and reactive programming

The reactive approach is a relatively new thing; the moment of its emergence can be considered to be 2014, when The Reactive Manifesto was published. Within two years of the manifesto’s publication it was on everyone’s radar. It really is a revolutionary approach to system design. Its individual elements were used decades ago, but all the principles of the reactive approach together, as laid out in the manifesto, have allowed the industry to take a major step toward designing more reliable and higher-performance systems.
Unfortunately, the reactive approach to design is often confused with reactive programming. When asked why a project should use a reactive library, I’ve heard the answer: "That’s the reactive approach, haven’t you read the reactive manifesto?!" I have read and signed the manifesto, but the trouble is that reactive programming has nothing to do with the reactive approach to system design, except that both have the word "reactive" in their names. You can easily make a reactive system using a 100% traditional set of tools, and create a completely non-reactive system using the latest in functional programming.
The reactive approach to system design is a fairly general principle applicable to a great many systems – it definitely deserves its own article. Here I would like to talk about the applicability of reactive programming.
What is the essence of reactive programming? First, let’s look at how an ordinary non-reactive program works.
The thread executes some code doing some calculations. Then comes the need to do some input/output operation, such as an HTTP request. The code sends a packet over the network, and the thread is blocked waiting for a response. The context is switched and another thread starts executing on the processor. When a response arrives on the network, the context switches again, and the first thread continues execution, processing the response.
How would the same code snippet work in a reactive style? The thread executes calculations, sends an HTTP request and instead of blocking and processing the result synchronously, it describes code (leaves a callback) which should be executed as a reaction (hence the word reactive) to the result. After that the thread continues working, doing some other calculations (maybe just processing the results of other HTTP requests) without switching context.
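To make the difference concrete, here is a sketch using the standard java.net.http client available since Java 11 (the URL and the processing step are made up):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.CompletableFuture;

public class BlockingVsReactive {

    static final HttpClient client = HttpClient.newHttpClient();
    static final HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("https://example.com/api/price"))
            .build();

    // Classic style: the calling thread is parked until the response arrives.
    static String blockingCall() throws Exception {
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return process(response.body());
    }

    // Reactive style: we register a callback and the thread moves on immediately.
    static CompletableFuture<String> reactiveCall() {
        return client.sendAsync(request, HttpResponse.BodyHandlers.ofString())
                .thenApply(HttpResponse::body)              // the "reaction" to the result
                .thenApply(BlockingVsReactive::process);
    }

    static String process(String body) {
        return body.trim();
    }
}
```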
The main advantage here is that there is no context switch. Depending on the system architecture, that operation can take several thousand clock cycles, i.e. for a 3 GHz processor a context switch takes at least a microsecond; in practice, due to cache invalidation and so on, it is more likely a few tens of microseconds. Practically speaking, for an average Java application processing a lot of short HTTP requests, the performance gain might be 5-10%. Not that it is dramatically much, but say you rent 100 servers at $50/month each: you save $500 a month on hosting. Not a fortune, but enough to buy your team beer a few times.
So, onward to beer? Let’s look at the situation in detail.
A program in the classical imperative style is much easier to read, understand and, as a consequence, debug and modify. In principle, a well-written reactive program also looks clear enough; the problem is that writing a reactive program that is understandable not only to its author here and now, but also to another person a year and a half later, is much harder. But this is a rather weak argument; I have no doubt that for the readers of this article writing simple and understandable reactive code is not a problem. Let’s look at other aspects of reactive programming.
Not all I/O operations support non-blocking calls. For example, JDBC currently does not (see ADBA and R2DBC for work in this direction, but it has not reached release level yet). Since 90% of all applications talk to databases, using a reactive framework automatically turns from an advantage into a disadvantage. The solution is to handle HTTP calls in one thread pool and database calls in another. But this makes the process much more complicated, and I would not do it unless absolutely necessary.
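For illustration, the usual workaround looks roughly like this with Project Reactor (the repository and entity here are fictional): the blocking JDBC call is wrapped and shifted onto a dedicated pool for blocking work, so it does not stall the non-blocking threads that handle HTTP.

```java
import reactor.core.publisher.Mono;
import reactor.core.scheduler.Schedulers;

public class OrderHandler {

    // A fictional repository with a plain, blocking JDBC implementation underneath.
    interface OrderRepository {
        Order findById(long id);
    }

    static class Order { /* fields omitted */ }

    private final OrderRepository repository;

    public OrderHandler(OrderRepository repository) {
        this.repository = repository;
    }

    public Mono<Order> findOrder(long id) {
        // The HTTP layer stays non-blocking; the blocking JDBC call is pushed
        // onto a separate, elastic thread pool intended for blocking work.
        return Mono.fromCallable(() -> repository.findById(id))
                   .subscribeOn(Schedulers.boundedElastic());
    }
}
```

It works, but now there are two thread pools to size and monitor, which is exactly the extra complexity mentioned above.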

When should you use a reactive framework?

Use a framework that allows reactive handling of requests when there are a lot of requests (several hundred per second or more) and very few processor cycles are required to process each one. The simplest example is proxying requests or balancing requests between services, or some rather lightweight handling of responses coming from another service. Where by service we mean something that can be requested asynchronously, e.g. via HTTP.
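A minimal sketch of such a lightweight proxy with Spring WebFlux (the service names and URLs are made up): per request the handler does almost no CPU work; it forwards the call asynchronously and reshapes the response in a callback instead of holding a thread.

```java
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Mono;

@RestController
public class RateProxyController {

    // Downstream service that can be called asynchronously over HTTP.
    private final WebClient backend = WebClient.create("http://rates-service:8080");

    // Each request occupies a thread only briefly: the downstream call is
    // asynchronous and the response is transformed in a callback.
    @GetMapping("/rates/{currency}")
    public Mono<String> rate(@PathVariable String currency) {
        return backend.get()
                .uri("/internal/rates/{c}", currency)
                .retrieve()
                .bodyToMono(String.class)
                .map(String::trim); // lightweight post-processing
    }
}
```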
But if you need to lock the thread while waiting for a response, or if processing requests takes a relatively long time, e.g. you need to convert an image from one format to another, it might not be a good idea to write a reactive style program.
Also, don’t unnecessarily write complex, multi-step data processing algorithms in reactive style. For example, the task "find files with specific properties in a directory and all its subdirectories, convert their contents and send them to another service" can be implemented as a set of asynchronous calls, but depending on the details of the task, such an implementation can look completely opaque and still not give any noticeable advantages over the classical sequential algorithm. For example, if this operation has to be run once a day and it doesn’t make much difference if it takes 10 or 11 minutes, it might be a good idea to choose a simpler implementation rather than the best one.

Conclusion

In conclusion, I would like to say that any technology is always designed to solve specific problems. And if you don’t have these tasks in the foreseeable future when designing a system, you probably don’t need this technology here and now, no matter how great it is.
