What is a network service? It’s a program that takes incoming requests over the network and processes them, possibly returning responses.
There are many aspects in which network services differ. In this article, I focus on the way incoming requests are handled.
Choosing how to handle requests has far-reaching consequences. How do you build a chat service that can handle 100,000 simultaneous connections? Which approach fits a service that extracts data from a stream of loosely structured files? The wrong choice leads to wasted time and effort.
This article discusses approaches such as process/thread pooling, event-driven processing, the half sync/half async pattern, and several others, with numerous examples and a discussion of the pros, cons, features, and applications of each approach.
The topic of how requests are handled is not new; see, for example: one, two. However, most articles address it only partially. This article seeks to fill in the gaps and provide a coherent account of the issue.
The following approaches will be considered:
- serial processing
- process per request
- thread per request
- process/thread pool
- event-driven processing (reactor pattern)
- half sync/half async pattern
- pipeline processing
Note that a service that handles requests is not necessarily a network service; it can be a service that takes new tasks from a database or a task queue. This article refers specifically to network services, but the approaches under consideration have a broader scope.
A list with brief descriptions of each approach is included at the end of the article.
Serial processing
The application consists of a single thread in a single process, and all requests are processed strictly sequentially; there is no parallelism. If several requests arrive at the same time, one of them is processed while the rest wait in a queue.
The advantage of this approach is simplicity of implementation: there is no locking and no contention for resources. The obvious disadvantage is the inability to scale to a large number of clients.
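The scheme can be sketched in a few lines of Python (an illustrative sketch; the `serve` and `handle` names are mine, and `max_requests` bounds the loop only so the example terminates):

```python
import socket

def handle(conn):
    # One request is handled to completion before the next is accepted.
    data = conn.recv(1024)
    conn.sendall(data.upper())
    conn.close()

def serve(sock, max_requests):
    # A single thread in a single process: no parallelism at all.
    # Requests arriving while one is being handled wait in the
    # OS accept queue (the listen backlog).
    for _ in range(max_requests):
        conn, _ = sock.accept()
        handle(conn)
```

A slow `handle` call stalls every waiting client, which is exactly the scalability limit described above.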
Process per request
The application consists of a main process, which accepts incoming requests, and worker processes. For each new request, the main process creates a worker process that handles it. Scaling by the number of requests is simple: each request gets its own process.
There is nothing complicated about this architecture either, but it has its limitations:
- A process consumes a lot of resources. Try creating 10,000 simultaneous connections to PostgreSQL and see what happens.
- Processes have no shared memory (by default). If you need access to shared data or a shared cache, you either have to map shared memory (the mmap/munmap system calls on Linux) or use external storage (memcached, Redis)
These problems are by no means showstoppers; below we will see how PostgreSQL deals with them.
Pros of this architecture:
- The failure of one process does not affect the others. For example, a bug triggered by a rare input won't kill the whole application; only the request being handled is affected
- Access control at the operating system level. Since a process is an operating system entity, you can use the OS's standard mechanisms to restrict its access to system resources
- The worker can be replaced on the fly. For example, if a separate script processes each request, you only need to change the script to change the processing algorithm. An example is given below
- Multi-core machines are used efficiently
Examples:
- PostgreSQL creates a new process for each new connection and uses shared memory for shared data. The high resource consumption of PostgreSQL processes can be mitigated in several ways. If there are few clients (say, a dedicated analytics instance), there is no problem at all. If only one application accesses the database, a connection pool can be kept at the application level. If there are many applications, pgbouncer can be used
- sshd listens for incoming connections on port 22 and forks on every connection. Each ssh session is handled by a fork of the sshd daemon, which receives and executes user commands sequentially. Thanks to this architecture, the OS's own mechanisms enforce access rights
- An example from my own practice. There is a stream of unstructured files from which we need to extract metadata. The main process of the service distributes files to worker processes; each worker is a script that takes the path to a file as a parameter. Each file is processed in a separate process, so a processing error cannot crash the whole service, and the processing algorithm can be updated by changing the scripts without stopping the service.
In general, this approach has advantages that define its niche, but its scalability is very limited.
Thread per request
This approach is much like the previous one, except that threads are used instead of processes. This gives shared memory "out of the box", but the other advantages of the previous approach are lost, while resource consumption remains high.
Pros:
- Shared memory "out of the box"
- Ease of implementation
- Efficient use of multicore CPUs
Cons:
- A thread consumes a lot of resources. On Unix-like operating systems, a thread consumes almost as many resources as a process
MySQL could serve as an example here, but MySQL actually uses a mixed approach, so it is discussed in the next section instead.
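A thread-per-request sketch in Python, showing the one advantage this approach adds over process-per-request (the names `serve`, `handle`, and the `stats` counter are illustrative):

```python
import socket
import threading

stats = {"handled": 0}           # shared memory "out of the box"...
stats_lock = threading.Lock()    # ...but access to it must be synchronized

def handle(conn):
    data = conn.recv(1024)
    conn.sendall(data.upper())
    conn.close()
    with stats_lock:
        stats["handled"] += 1

def serve(sock, max_requests):
    # A new thread for every accepted connection.
    threads = []
    for _ in range(max_requests):
        conn, _ = sock.accept()
        t = threading.Thread(target=handle, args=(conn,))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
```

Unlike the process variant, all handlers see the same `stats` dictionary directly, with no mmap or external storage.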
Pool of processes/threads
Threads (processes) are expensive and slow to create. To avoid wasting resources, the same thread can be reused many times. By additionally limiting the maximum number of threads, we get a pool of threads (processes). The main thread accepts incoming requests and puts them in a queue; worker threads take requests from the queue and process them. This approach can be seen as a natural scaling of sequential processing: each worker thread processes requests strictly sequentially, and combining such threads into a pool allows requests to be processed in parallel. If each thread can handle 1,000 rps, then 5 threads will handle a load close to 5,000 rps (assuming minimal contention for shared resources).
The pool can be created in advance at the start of the service or formed gradually. Using a thread pool is more common because it allows for shared memory.
The size of the thread pool does not have to be fixed. The service can take a free thread from the pool, or create a new one if none is free; after a request has been processed, the thread returns to the pool and waits for the next request. This variant is a combination of the thread-per-request and thread-pool approaches. An example is given below.
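The fixed-size variant, a set of workers consuming from a shared queue, can be sketched like this (illustrative names; squaring a number stands in for real request handling):

```python
import queue
import threading

def worker(tasks, results):
    # Each worker handles requests strictly sequentially; parallelism
    # comes from several workers running side by side.
    while True:
        req = tasks.get()
        if req is None:            # poison pill: shut this worker down
            break
        results.put(req * req)     # stand-in for real request handling

def run_pool(requests, pool_size=4):
    tasks, results = queue.Queue(), queue.Queue()
    workers = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(pool_size)]
    for t in workers:
        t.start()
    for req in requests:
        tasks.put(req)             # the main thread only enqueues requests
    for _ in workers:
        tasks.put(None)            # one pill per worker
    for t in workers:
        t.join()
    return sorted(results.get() for _ in requests)
```

In practice Python's `concurrent.futures.ThreadPoolExecutor` packages exactly this queue-plus-workers mechanism.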
Pros:
- Efficient use of multiple CPU cores
- Reduced cost of thread/process creation
Cons:
- Limited scalability by the number of concurrent clients. The pool allows the same thread to be reused many times without extra resource consumption, but it does not solve the fundamental problem of how many resources each thread/process consumes. Building a chat service that handles 100,000 simultaneous connections with this approach is impossible
- Scalability is limited by shared resources. For example, if threads use shared memory guarded by semaphores/mutexes, contention for those locks limits scaling. This limitation applies to all approaches that use shared resources
Examples:
- A Python application running under uWSGI and nginx. The main uWSGI process receives incoming requests from nginx and distributes them among Python interpreter processes, which handle the requests. The application can be written in any WSGI-compatible framework: Django, Flask, etc.
- MySQL uses a thread pool: every new connection is handled by one of the free threads from the pool, and if there are no free threads, MySQL creates a new one. The size of the free-thread pool and the maximum number of threads (connections) are limited by configuration settings.
This is probably the most common approach to building network services, or at least one of the most common. It scales well and achieves a high rps. The main limitation is the number of simultaneously handled network connections: in practice, this approach works well only if requests are short or clients are few.
Event-driven processing (reactor pattern)
Synchronous and asynchronous are eternal competitors. So far we have only discussed synchronous approaches, but it would be wrong to ignore the asynchronous one. Event-driven (reactive) request processing is an approach in which every I/O operation is performed asynchronously, with a handler invoked when the operation completes. Processing a request typically consists of a chain of asynchronous calls and the handlers that follow them. At any given moment a single-threaded application executes only one handler, but handlers belonging to different requests interleave, allowing many requests to be processed simultaneously (pseudo-parallel).
A full discussion of this approach is beyond the scope of this article; for a more in-depth look, see Reactor, What is the secret of NodeJS speed?, and Inside NGINX. Here we limit ourselves to the pros and cons of the approach.
Pros:
- Efficient scaling by rps and by the number of concurrent connections. A reactive service can handle a large number of connections (tens of thousands) at a time, provided most of them are waiting for I/O to complete
Cons:
- Development complexity. Programming in an asynchronous style is harder than in a synchronous one: request-processing logic becomes more involved, and debugging is also more difficult than with synchronous code
- Errors that block the whole service. If the language or runtime is not designed for asynchronous processing from the start, a single synchronous operation can block the entire service, nullifying its scalability
- Difficult to scale across CPU cores. The approach assumes a single thread in a single process, so multiple CPU cores cannot be used simultaneously, although there are ways around this limitation
- A consequence of the previous point: the approach scales poorly for CPU-intensive requests. The achievable rps is inversely proportional to the number of CPU operations needed per request, so CPU-intensive requests negate the benefits of the approach
Examples:
- Node.js uses the reactor pattern "out of the box". For details, see What is the secret of NodeJS speed?
- nginx: nginx worker processes use event-driven request processing. See Inside NGINX for details.
- A C/C++ program using OS facilities directly (epoll on Linux, IOCP on Windows, kqueue on FreeBSD) or via a framework (libev, libevent, libuv, etc.).
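The same mechanism is available in Python through the standard `selectors` module, which wraps epoll/kqueue. A minimal single-threaded reactor sketch (illustrative names; `max_requests` bounds the loop only so the example terminates):

```python
import selectors
import socket

def serve(sock, max_requests):
    # One thread multiplexes all connections through a selector
    # (epoll/kqueue under the hood, chosen by the OS).
    sel = selectors.DefaultSelector()
    sock.setblocking(False)
    handled = 0

    def on_accept(listener):
        conn, _ = listener.accept()
        conn.setblocking(False)
        # Register the completion handler for this connection.
        sel.register(conn, selectors.EVENT_READ, on_read)

    def on_read(conn):
        nonlocal handled
        data = conn.recv(1024)
        if data:
            # The reply is tiny, so we assume it fits the send buffer.
            conn.sendall(data.upper())
        sel.unregister(conn)
        conn.close()
        handled += 1

    sel.register(sock, selectors.EVENT_READ, on_accept)
    while handled < max_requests:
        for key, _ in sel.select():
            key.data(key.fileobj)   # invoke the registered handler
    sel.close()
```

Note how the processing logic is split into callbacks (`on_accept`, `on_read`): this fragmentation is the "development complexity" listed among the cons.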
Half sync/half async
The name is taken from the book POSA: Patterns for Concurrent and Networked Objects. The original definition of this pattern is very broad, but for the purposes of this article I will interpret it somewhat more narrowly. Half sync/half async is an approach to request processing in which a lightweight thread of control (a green thread) is used for each request. The program consists of one or more operating-system-level threads, but the runtime system provides green threads, which the OS cannot see or control.
A few examples to make this concrete:
- A service in Go. Go supports a large number of lightweight execution threads, goroutines. A program uses one or more OS threads, but the programmer works with goroutines, which are transparently distributed among OS threads to make use of multi-core CPUs
- A Python service using the gevent library. gevent gives the programmer green threads at the library level; the whole program runs in a single OS thread.
In essence, this approach aims to combine the high performance of the asynchronous approach with the simplicity of programming synchronous code.
With this approach, despite the illusion of synchrony, the program actually runs asynchronously: the runtime system drives an event loop, and each "synchronous" operation is in fact asynchronous. When such an operation is called, the runtime starts the asynchronous operation using OS facilities and registers a completion handler. When the operation finishes, the runtime calls the registered handler, which resumes the program at the point where the "synchronous" operation was invoked.
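In standard-library Python this style is closest to asyncio coroutines, which, like green threads, are lightweight units of control suspended and resumed by the runtime's event loop while the code reads synchronously. A sketch under that analogy (illustrative names; `asyncio.sleep` stands in for a real I/O call):

```python
import asyncio

async def handle(request):
    # Reads like synchronous code: "await" suspends this lightweight
    # unit of control while the event loop serves other requests.
    await asyncio.sleep(0.01)          # stand-in for an I/O operation
    return request.upper()

async def serve(requests):
    # One lightweight unit per request: cheap enough that tens of
    # thousands of them are feasible.
    return await asyncio.gather(*(handle(r) for r in requests))

responses = asyncio.run(serve(["ping", "stat"]))
```

The body of `handle` contains no callbacks, yet the two requests are served concurrently by a single OS thread.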
As a result, half sync/half async inherits some of the pros and cons of the asynchronous approach. The length of this article does not allow considering the approach in full detail; if you are interested, read the chapter of the same name in POSA: Patterns for Concurrent and Networked Objects.
The half sync/half async approach introduces a new entity, the green thread: a lightweight thread of control managed by the runtime system or a library. What to do with green threads is up to the programmer: you can use a pool of green threads, or create a new green thread for each new request. Compared to OS threads/processes, green threads are much cheaper: they consume far less memory and are created much faster. This makes it possible to create a huge number of them, e.g. hundreds of thousands in Go, and makes the "green thread per request" approach justified.
Pros:
- Scales well by rps and by the number of concurrent connections
- Code is easier to write and debug than with the asynchronous approach
Cons:
- Since operations actually execute asynchronously, programming errors are possible in which a single synchronous operation blocks the whole process. This is especially noticeable in languages where the approach is implemented at the library level, such as Python
- Opacity of the program. With OS threads or processes, the execution order is clear: each thread/process performs operations in the sequence they are written in the code. With half sync/half async, operations written sequentially in the code can interleave unpredictably with operations handling parallel requests
- Unsuitable for real-time systems. Asynchronous request processing makes it much harder to guarantee the processing time of each individual request. This is a consequence of the previous point
Depending on the implementation, this approach scales well across CPU cores (Golang) or not at all (Python).
Like the asynchronous approach, this one can handle a large number of simultaneous connections, but such a service is easier to program because the code is written in a synchronous style.
Pipeline processing
As the name suggests, requests here are processed in a pipeline. Processing consists of several OS threads arranged in a chain; each thread is a link in the chain and performs a particular subset of the operations needed to process a request. Every request passes through all the links in turn, and at any given time different links are processing different requests.
Pros:
- Scales well by rps: the more links in the chain, the more requests are processed per second
- Using multiple threads allows scaling across CPU cores
Cons:
- Not suitable for all categories of requests. For example, long polling would be difficult and awkward to organize this way
- Complexity of implementation and debugging. Splitting processing into stages so that throughput stays high can be difficult, and debugging a program in which each request passes through several parallel threads is harder than debugging sequential processing
- An interesting example of pipeline processing was described in the Highload 2018 talk Evolution of the architecture of the Moscow Exchange trading and clearing system
Pipeline processing is widely used; most often, though, the links are separate components running in independent processes that exchange messages, e.g. via a message queue or a database.
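Within a single process, the chain-of-threads variant can be sketched as stages connected by queues (illustrative code; the stage functions are placeholders for real processing steps):

```python
import queue
import threading

def stage(fn, inbox, outbox):
    # One link of the chain: reads from the previous stage,
    # writes to the next.
    while True:
        item = inbox.get()
        if item is None:         # shutdown marker, passed down the chain
            outbox.put(None)
            break
        outbox.put(fn(item))

def run_pipeline(requests, stages):
    # One queue between every pair of adjacent links.
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    threads = [threading.Thread(target=stage, args=(fn, queues[i], queues[i + 1]))
               for i, fn in enumerate(stages)]
    for t in threads:
        t.start()
    for r in requests:
        queues[0].put(r)         # different links process different requests at once
    queues[0].put(None)
    results = []
    while (item := queues[-1].get()) is not None:
        results.append(item)
    for t in threads:
        t.join()
    return results
```

Each request flows through every stage in order, while the stages themselves run in parallel on different requests.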
A brief summary of the approaches considered:
- Serial processing.
A simple approach, but severely limited in scalability, both in rps and in the number of concurrent connections. It cannot use several CPU cores simultaneously.
- A new process for each request.
Process creation is expensive. The approach scales poorly with the number of concurrent connections and makes shared memory awkward to use. It is quite suitable for long requests with a small number of concurrent connections, and it has properties useful for some applications (higher reliability, access control at the OS level).
- A new thread for each request.
The problems are the same as with the previous approach, but shared memory is easy to use. Its niche is similar to the previous approach's, but it lacks some of its useful properties.
- A pool of processes/threads.
Avoids the process/thread creation costs of the previous two approaches. This is the most commonly used approach for building network services. It scales well in rps and in the number of cores used, and is good for handling a large number of short requests, but scales poorly in the number of concurrent connections.
- Event-driven processing (reactor pattern).
Scales well in rps and in the number of concurrent connections, but is harder to use because of the asynchronous programming style and the possibility of hard-to-catch floating bugs. Scaling across CPU cores is difficult.
- Half sync/half async.
Scales well by rps and by the number of concurrent connections. Depending on the implementation, it scales well across CPU cores (Go) or not at all (Python). It consumes far fewer resources per request than the process-per-request or thread-per-request approaches. Unlike the reactor pattern, it is programmed in a synchronous style, but the same floating errors as with the reactor pattern are possible.
- Pipeline processing.
Allows high performance, but is difficult to implement and is not suitable for all types of requests (e.g., long polling is hard to do).
The list above is not exhaustive, but it covers the basic approaches to request processing.
A question to the reader: which approaches do you use? What pros, cons, and peculiarities have you encountered in your own experience?
- Articles on the subject:
- Again on the architecture of network daemons
- Parallelism vs. multithreading vs. asynchronous programming: clarification
- Event-driven approach:
- What is the secret of NodeJS speed?
- Inside NGINX
- Comparison of thread-based and event-based approaches:
- Apache vs Nginx: a practical look
- Differences between asynchronous and multithreaded architectures using Node.js and PHP as examples
- Half sync/half async:
- Half-Sync/Half-Async (Java Design Patterns)
- POSA: Patterns for Concurrent and Networked Objects
- Green threads:
- Green threads (Wikipedia)
- Green Vs Native Threads
- Pipeline processing:
- Evolution of Moscow Exchange trading and clearing system architecture