
The architecture of Diadoc, a high-load system

by admin

Anyone interested in high-load systems has read about the architectures of Twitter, Facebook and similar services. But until now nothing has been published about systems of Diadoc's class. Unlike Twitter, Diadoc is a paid service that is not open to everyone, and it contains a fairly large layer of business logic that solves problems from a specific subject area.
A few words about what the system is for. Picture a webmail interface, except that it is not mail at all: the system is designed for exchanging documents, mainly invoices and delivery notes. These electronic documents are legally significant and have the same validity as paper documents with stamps and signatures.
The exchange of electronic documents in Russia is just beginning to develop, and in the not-so-distant future all invoices will likely be transmitted electronically. Every year, 12 billion invoices are created in Russia. That is an average of about 380 documents per second, and thousands of documents per second at peak load. Any project that aims to provide electronic document exchange services should expect such volumes and design its architecture accordingly.
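The average rate quoted above can be checked with a few lines of arithmetic. This is only a rough sketch: it assumes a 365-day year and an even spread of invoices over time, whereas real traffic is bursty, and the variable names are illustrative.

```python
# Rough check of the average load implied by 12 billion invoices per year.
# Assumption: a 365-day year and an even spread over time.
INVOICES_PER_YEAR = 12_000_000_000
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds

average_per_second = INVOICES_PER_YEAR / SECONDS_PER_YEAR
print(f"about {average_per_second:.1f} documents per second on average")
```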
More about Diadoc from a business and accounting perspective can be found on the Diadoc website; the technical details follow.

Platform

OS : Windows, Linux
Language : C#, .NET 4.0
Message Queue : RabbitMQ
Data Stores : Cassandra, MySQL, Berkeley DB, Kanso (proprietary)
Protocols : Thrift, Protocol Buffers
In-memory caching : Redis
MVC : ASP.NET MVC, Razor (admin area only)
Load Balancing : Nginx

Architecture

The system is service-oriented (SOA). Services exchange data mainly in Google's Protocol Buffers format, which keeps communication between them efficient. The transport protocol is HTTP. However, the services are not published through IIS; we use our own HTTP handler implementation instead. IIS is used only for the system's web interface.
The deployment schema lists all the executables in the system and determines which services run on which server once they are deployed to the production environment. When a component needs to connect to a service, it picks one of that service's working replicas at random and connects to it.
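The random choice among working replicas could look roughly like the sketch below. It is not Diadoc's code: the schema layout, the service name, the addresses and the `alive` flag are all hypothetical, and how the real system tracks replica health is not described here.

```python
import random

# Hypothetical deployment schema: service name -> replicas with a health flag.
# The service name and addresses are made up for illustration.
DEPLOYMENT = {
    "document-service": [
        {"address": "10.0.0.1:8080", "alive": True},
        {"address": "10.0.0.2:8080", "alive": False},
        {"address": "10.0.0.3:8080", "alive": True},
    ],
}


def pick_replica(service, schema=DEPLOYMENT, rng=random):
    """Return the address of a randomly chosen working replica of `service`."""
    working = [r["address"] for r in schema.get(service, []) if r["alive"]]
    if not working:
        raise LookupError(f"no working replicas for {service!r}")
    return rng.choice(working)
```

Random selection needs no shared state between callers, and over many requests it spreads load roughly evenly across the working replicas.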

Cassandra

Cassandra is used mostly for logging because of its high write speed, but lately it has been used for other purposes as well, for example as a persistent key-value store. It is not a perfect key-value store, but we have learned to work with it. Communication with Cassandra goes over the Thrift protocol listed above. Thrift is an analog of Protocol Buffers that was developed at Facebook and is now maintained under the Apache Software Foundation.

Kanso

A proprietary, fault-tolerant, distributed data store. Its functionality is somewhat similar to that of a file system, but with a severe limitation: data can only be appended to the end of a file, and what has already been written cannot be changed. This limitation increases the amount of stored data, since nothing is ever overwritten, but it ensures that no data is lost.
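Kanso is proprietary and its interface is not public, so the following is only a toy, single-machine illustration of the append-only idea. The class name, the method names and the length-prefixed record format are assumptions made for this sketch.

```python
import os
import struct


class AppendOnlyFile:
    """Toy append-only file: records can be added and read back, never changed."""

    _LENGTH = struct.Struct(">I")  # 4-byte big-endian length prefix per record

    def __init__(self, path):
        self.path = path

    def append(self, payload: bytes) -> int:
        """Write one record at the end of the file and return its offset."""
        with open(self.path, "ab") as f:
            offset = f.seek(0, os.SEEK_END)
            f.write(self._LENGTH.pack(len(payload)) + payload)
        return offset

    def read_all(self):
        """Return every record ever written, oldest first."""
        records = []
        if not os.path.exists(self.path):
            return records
        with open(self.path, "rb") as f:
            while True:
                header = f.read(self._LENGTH.size)
                if len(header) < self._LENGTH.size:
                    break  # end of file
                (length,) = self._LENGTH.unpack(header)
                records.append(f.read(length))
        return records
```

In this model, "changing" a document means appending a new version. The old version stays in the file, which is why the data volume grows and why nothing that was written is lost.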

MySQL

Used only for data that does not change frequently. MySQL is not sharded: all changes go through a single server, and multiple replicas serve reads.

RabbitMQ

This message broker has proven to be quite good and is used for asynchronous event handling. Messages have a limited retention time and are removed from the queue after a few days. As with the HTTP services, the messages are Protocol Buffers structures.
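The retention behaviour can be pictured with a small in-memory model. This is not RabbitMQ's API, and how Diadoc configures expiry on the broker is not stated; `ExpiringQueue` and its parameters are hypothetical names used only to show the idea of dropping messages older than a retention window.

```python
import time
from collections import deque


class ExpiringQueue:
    """Toy FIFO queue that drops messages older than a retention time."""

    def __init__(self, retention_seconds, clock=time.monotonic):
        self.retention = retention_seconds
        self.clock = clock  # injectable so the behaviour can be tested
        self._items = deque()  # (enqueued_at, message), oldest first

    def publish(self, message):
        self._items.append((self.clock(), message))

    def _drop_expired(self):
        now = self.clock()
        while self._items and now - self._items[0][0] > self.retention:
            self._items.popleft()

    def consume(self):
        """Return the oldest unexpired message, or None if there is none."""
        self._drop_expired()
        if not self._items:
            return None
        return self._items.popleft()[1]
```

A consumer that stays offline for longer than the retention window will miss events, so in a design like this the queue cannot be the only record of what happened.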

Data Caching

Redis is used to cache data and look up information quickly. In addition, a whole group of .NET services read data from Kanso on startup and write it to their local Berkeley DB.
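The startup step of those services can be sketched as a replay of stored records into a local lookup table. Everything here is illustrative: a plain dict stands in for the local Berkeley DB, the `(key, value)` record shape is invented, and the rule that a later record supersedes an earlier one is an assumption rather than something the article confirms.

```python
class LookupService:
    """Toy service that rebuilds a local lookup table from a log on startup."""

    def __init__(self, log_records):
        # A dict stands in for the local Berkeley DB file.
        self._local = {}
        for key, value in log_records:
            # Assumption: a later record for the same key supersedes earlier ones.
            self._local[key] = value

    def get(self, key, default=None):
        """Answer from local storage only; the log is not read after startup."""
        return self._local.get(key, default)


# Usage: records are replayed in the order they were written.
service = LookupService([
    ("org-1", "Acme, v1"),
    ("org-2", "Globex"),
    ("org-1", "Acme, v2"),
])
print(service.get("org-1"))
```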

Integration

Protocol Buffers are also used for the public API, and there is an option to interact via OLE Automation. Many large companies struggle to automate integration, and Diadoc developers help them integrate Diadoc with their other systems. Very often an external system cannot export data in XML or another machine-readable format, and we have to convert data from printed forms (PDF) to our format.
For more information about integration, see:
https://diadoc.kontur.ru/sdk/IntegrationOptions.html
https://diadoc.kontur.ru/sdk/

Design Principles

  • Pair programming is very common.
  • Mandatory code review.
  • Planning in two-week iterations.
  • A daily stand-up meeting of the entire team about the current state of affairs.
  • Transparency of project status information from both a marketing and development perspective.

Development tools

  • Visual Studio 2012
  • ReSharper
  • TeamCity for Continuous Integration
  • YouTrack as issue tracker

Statistics

Number of programmers : 24
Number of servers : ~40
Average Document Delivery Time : 7 sec
Registered Organizations : ~160,000
