We continue to publish transcripts from the HighLoad++ conference, which was held in Skolkovo near Moscow on November 7-8, 2016. Today, Evgeny Piven introduces us to cloud balancing solutions.
My name is Zhenya, and I work at IPONWEB. Today we will talk about how our approach to balancing high-load systems has evolved.
First, I’ll run through the concepts I’ll be using. Let’s start with what we do: RTB, Real Time Bidding, is a real-time advertising auction. Here is a very simplified scheme of what happens when you go to a site:
To show an ad, a request goes to the RTB server, which asks the ad servers for their bids and then decides which ad to show you.
Features of IPONWEB
We have all of our infrastructure in the cloud. We use Amazon and GCE very heavily, and we have several thousand servers. The main reason we live completely in the cloud is scalability, meaning we really have to add/remove instances often, sometimes a lot.
We handle millions of requests per second, all of them HTTP, so some of what follows may not fully apply to other protocols. Our responses are very short. Well, maybe not very: on average from one to a few kilobytes. We don’t work with large pieces of data, just with a very large number of small ones.
Caching is not relevant to us. We don’t support CDNs. If our customers need CDNs, they handle those solutions themselves. We have very strong daily and event-driven fluctuations. They show up during holidays, sporting events, seasonal sales, etc. The daily ones are easy to see on this chart:
The red graph is a typical graph for a European country, and the blue graph is Japan. In this graph we can see that every day at about twelve o’clock we have a spike, and at about one o’clock the traffic drops off dramatically. It has to do with people leaving for lunch: as very decent Japanese citizens, they use the Internet most heavily at lunchtime. You can see this very well in this graph.
Our users are spread widely around the world. This is the second big reason why we use the cloud. It’s very important for us to respond quickly, and if the servers are in a different region from the users, the latency is often unacceptable for RTB.
And we have a clear distinction between server traffic and user traffic. Let’s go back to the diagram I showed in the first slide. For simplicity, everything that comes from the user to the primary RTB server is user traffic, everything that goes behind it is server traffic.
There are two main reasons to use balancing: scalability and service availability.
Scalability is when we add new servers: we grow from one server that can no longer handle the load to at least two, and we have to spread requests between them somehow. Availability is when we are afraid that something will happen to this one server. Again, we need to add a second one, balance between them, and send requests only to the servers that can respond.
What else is often required of balancers? These features, of course, may be different for the purposes of each application. For us, SSL Offload is most relevant. How it works is shown here.
Encrypted traffic comes from the user to the balancer. The balancer decrypts it and distributes the decrypted HTTP traffic to the backends. On the way back, the balancer encrypts the response and returns it to the user in encrypted form.
Another thing that’s very important to us is Sticky-balancing, often called session affinity.
Why is this relevant to us? When there are multiple ad slots on a page, we want all the requests made when that page opens to reach the same backend. Why is this important? There is a feature called roadblock. It means that if we show a Pepsi banner in one slot, we can’t show the same Pepsi banner or a Coca-Cola banner in another, because these banners conflict with each other.
We need to understand which banners we are currently showing the user. When balancing, we need to make sure that the user comes to one backend. On that backend, we have some sort of session created, and we understand which ads can be shown to that user.
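The idea of always sending one user to one backend can be illustrated with a minimal sketch. It assumes the balancer has some stable user key (a cookie value or a client IP) and a fixed list of backends; the function name and the addresses are hypothetical, and this is not how any particular balancer implements it.

```python
import hashlib


def pick_backend(user_key, backends):
    """Deterministically map a user key (cookie value or client IP) to one backend."""
    digest = hashlib.sha256(user_key.encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as an integer and wrap it around the pool size.
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


# Hypothetical backend addresses, for illustration only.
backends = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
first = pick_backend("user-cookie-abc", backends)
second = pick_backend("user-cookie-abc", backends)
```

Both calls return the same backend, so the session created there can track which banners the user has already seen. Note that with plain modulo hashing most users move to another backend whenever the pool size changes.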
We also have Fallback. In the example above you can see the banner where it doesn’t work, and on the right you can see the banner where it does work.
Fallback is a situation where the backend fails for some reason, and in order not to break the page for the user, we usually return a completely empty pixel. Here I’ve drawn it as a big green rectangle for clarity, but in reality it’s usually just a tiny GIF, an HTTP 200 response, and the right set of headers so that nothing breaks in the layout.
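A minimal sketch of such a fallback might look like the following. The function names are hypothetical, and the exact header set is an assumption (the talk does not list it); the body is the commonly used 1x1 transparent GIF.

```python
import base64

# A widely used 1x1 transparent GIF, base64-encoded.
EMPTY_GIF = base64.b64decode(
    "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
)


def fallback_response():
    """Return (status, headers, body) for an empty pixel with HTTP 200."""
    headers = {
        "Content-Type": "image/gif",
        "Content-Length": str(len(EMPTY_GIF)),
        "Cache-Control": "no-store",  # assumption: do not let the empty pixel be cached
    }
    return 200, headers, EMPTY_GIF


def serve_ad(render_banner):
    """Call the (hypothetical) banner renderer; on any failure return the pixel."""
    try:
        return render_banner()
    except Exception:
        return fallback_response()
```

Whatever goes wrong inside `render_banner`, the page still receives a valid image and a successful status code.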
This is what normal balancing looks like.
The blue bold line is the sum of all the requests. These little lines, lots and lots of them, are the requests for each instance. As we can see, the balancing is quite good here: almost all of them go almost the same way, merging into one line.
And this is unhealthy balancing. Something went wrong here.
The problem is on Amazon’s side. By the way, it happened quite recently, just two weeks ago. From Amazon balancers traffic started to come in like this.
It’s worth saying that metrics are a good thing. Amazon still doesn’t believe us that something is going wrong. They only see the overall graph, which shows the total number of requests, but they don’t see how many requests are coming in per instance. We are still fighting with them, trying to prove that something is wrong with their balancer.
So, I will tell you about balancing using one of our projects as an example. Let’s start from the very beginning. The project was young, and we joined it close to its start.
Most of the traffic in this project is server traffic. The peculiarity of working with server traffic is that some problems are easy to solve. If one of our clients has some specifics, we can agree with them to change something on their end, upgrade their systems, or do something else so that working together goes more smoothly. We can also move them into a separate pool: we take the one client who has a problem, bind it to another pool, and solve the problem locally. Or, in really bad cases, we can even ban them.
The first thing we started using was ordinary DNS balancing, with round-robin DNS pools.
Every time a DNS request is made, the pool is rotated and a new IP address appears on top. This is how the balancing works.
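The rotation can be sketched in a few lines. This is only an illustration of the idea, not the code of any real DNS server; the class name and addresses are made up.

```python
from collections import deque


class RoundRobinPool:
    """Toy model of a round-robin DNS pool."""

    def __init__(self, addresses):
        self._pool = deque(addresses)

    def answer(self):
        """Return the current ordering, then rotate so the next address is on top."""
        response = list(self._pool)
        self._pool.rotate(-1)  # move the first address to the end
        return response


pool = RoundRobinPool(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
first = pool.answer()   # ['10.0.0.1', '10.0.0.2', '10.0.0.3']
second = pool.answer()  # ['10.0.0.2', '10.0.0.3', '10.0.0.1']
```

Clients usually connect to the first address in the answer, so rotating the list spreads them across the pool.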
The problems of the regular Round-Robin DNS:
- It doesn’t have any status checks. We can’t figure out if something is wrong with the backend and stop sending requests to it.
- We have no understanding of client geolocation.
- When requests come from a fairly small number of IPs, which is true for server traffic, the balancing may be far from even.
Help comes in the form of gdnsd, a DNS server which many people probably know, and which we still use a lot.
- The main gdnsd feature we use is DYNA records. These are records that return either a single A-record or a set of A-records dynamically for each request, using some plugin. Inside, the plugins can use round-robin.
- gdnsd supports GeoIP databases.
- It has status checks. It can send requests to a host over TCP or HTTP, look at the response, and drop from the pool the servers that are not responding or currently have a problem.
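The status-check idea from the list above can be sketched as follows. This is not gdnsd’s implementation or configuration, only an illustration under simple assumptions: a plain TCP connect counts as "healthy", and the function names are hypothetical.

```python
import socket


def tcp_check(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def healthy_pool(addresses, port, check=tcp_check):
    """Keep only the addresses that currently pass the status check."""
    return [addr for addr in addresses if check(addr, port)]
```

A DNS server built this way would answer with `healthy_pool(...)` instead of the full list, so a dead backend stops receiving new clients once cached answers expire.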
To maintain the dynamism of these entries, we need to keep the TTL quite low. This greatly increases the traffic to our DNS servers: quite often clients have to re-request these pools, so we have to have more DNS servers accordingly.
After a period of time, we encounter the 512 bytes problem.
The 512-byte problem affects almost all DNS servers. When DNS was first designed, the maximum MTU for modems was 576 bytes: 512 bytes of payload plus 64 bytes of headers. Historically, DNS does not send packets larger than that over UDP. If our pool is longer than 512 bytes, we send only part of it and set the truncated flag in the response. The client then repeats the request over TCP, and we send it the whole pool.
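To get a feel for how small that limit is, here is a rough back-of-the-envelope calculation. It assumes a classic DNS response without EDNS0, with name compression used for every answer and no authority or additional sections; the function names are made up for the example.

```python
UDP_LIMIT = 512   # classic DNS-over-UDP message size limit, in bytes
DNS_HEADER = 12   # fixed DNS header size
# One A record with a compressed name:
# 2 (name pointer) + 2 (type) + 2 (class) + 4 (TTL) + 2 (rdlength) + 4 (IPv4 address)
A_RECORD = 2 + 2 + 2 + 4 + 2 + 4


def question_size(name):
    """Size of the question section for the given host name."""
    labels = name.rstrip(".").split(".")
    qname = sum(1 + len(label) for label in labels) + 1  # length bytes + root label
    return qname + 4  # plus QTYPE and QCLASS


def max_a_records(name):
    """How many A records fit into one untruncated UDP response."""
    return (UDP_LIMIT - DNS_HEADER - question_size(name)) // A_RECORD


records = max_a_records("www.site.com")  # 30
```

Under these assumptions only about 30 addresses fit into a single UDP answer, so a pool of a few hundred hosts always triggers truncation and a retry over TCP.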
Only a fraction of our customers had this problem, about 15%. We were able to pool them separately and use weighted pools in gdnsd for them.
The bonus of weighted pools in this case is that they can be partitioned. If we have, say, 100 servers, we break them up into 5 parts. We give each request one of these small sub-pools with only 20 servers. And Round-robin goes through these little pools, each time it gives us a new pool. Within the pool itself, Round-Robin is also used: it shuffles those IPs and gives out a new one each time.
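The partitioning described above can be sketched like this. It is an illustration of the idea rather than gdnsd’s actual algorithm; the class and function names are hypothetical and a non-empty server list is assumed.

```python
import itertools
import random


def split_pool(servers, parts):
    """Split the server list into `parts` consecutive sub-pools of (almost) equal size."""
    size = -(-len(servers) // parts)  # ceiling division
    return [servers[i:i + size] for i in range(0, len(servers), size)]


class PartitionedPool:
    """Round-robin over sub-pools, with shuffling inside each sub-pool."""

    def __init__(self, servers, parts, rng=None):
        self._subpools = itertools.cycle(split_pool(servers, parts))
        self._rng = rng or random.Random()

    def answer(self):
        subpool = list(next(self._subpools))  # copy so shuffling keeps the original intact
        self._rng.shuffle(subpool)
        return subpool


servers = ["10.0.0.%d" % i for i in range(1, 101)]  # 100 hypothetical servers
pool = PartitionedPool(servers, parts=5)
answer = pool.answer()  # 20 addresses from the first sub-pool, in random order
```

Each answer now contains 20 addresses instead of 100, which fits comfortably into a single UDP response.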
You can use gdnsd weights for staging servers, for example. If you have a weaker instance, you can initially send much less traffic to it and check whether anything is broken by exposing it to only a small share of requests. The same applies if you have different types of instances or use different servers. (I often say "instances" because we have everything in the cloud, but that may not be true for your particular case.) So you can use different types of servers and use gdnsd to send more or less traffic to them.
Here we also have a problem: DNS caching. Often, when this pool is requested, we return only a small sub-pool, and that sub-pool gets cached. A client can then keep living on it without re-querying our DNS. This happens when the DNS client misbehaves, ignores the TTL, and works with a small, fixed set of IP addresses without updating it. If it originally got the full list via TCP, that’s fine. But if it only got a small sub-pool from a weighted pool, that can be a problem.
After a while we encounter a new problem.
The remaining 85% of the server traffic still uses regular multifo pools, as gdnsd calls them. Problems start to appear with some of these clients.
We realized that the problem only occurs with Amazon DNS. That is, customers who are hosted on Amazon themselves simply get an NXDOMAIN error when resolving a pool with more than 253 hosts, and the pool does not resolve at all.
This happened when we added about 20 hosts and reached 270. We narrowed the limit down to 253 hosts: that is the number at which the problems began. That problem has now been fixed. But at that point we realized that we were treading water and needed a more fundamental solution.
Since we are in the clouds, the first thing we thought of was to try vertical scaling. It worked, so we reduced the number of instances accordingly. But again, this is a temporary solution to the problem.
We decided to try something else, so we chose ELB.
ELB is Elastic Load Balancing, a solution from Amazon that balances traffic. How does it work?
They give you a CNAME; in this case it’s the scary line under www.site.com: elb, numbers, region, and so on. This CNAME resolves to the IPs of several internal instances that balance traffic to our backends. We only need to point our DNS at their CNAME once. After that, we just add servers to the group that the balancers distribute traffic to.
ELB can do SSL Offload, and you can attach a certificate to it. It can also do HTTP status checks to see whether instances are alive.
We started running into problems with ELBs almost immediately. There is something called ELB warm-up. You need it when you want to handle more than 20-30 thousand requests per second. Before you move all your traffic to ELB, you have to write to Amazon and say you expect a lot of traffic. They send you an email with a bunch of scary questions about the characteristics of your traffic: how much, when, and for how long. Then they add new instances to their pool and you’re ready for the influx of traffic.
And even with the pre-warmup, we ran into a problem. When we asked them for 40,000 requests per second, the balancers broke down at about 30,000. We had to roll the whole thing back quickly.
They also have balancing by response rate. This is an algorithm of Amazon’s balancer. It looks at how fast your backends are responding. If it sees it is fast, it sends more traffic there.
What’s the problem here? If your backend is desperately returning 500s [HTTP 5XX status codes, indicating a server error] and failing, the balancer thinks that the backend is responding very quickly and starts sending it even more traffic, overloading it further. In our reality this is even more problematic because, as I said before, we usually send a 200 response even when things are bad. The user shouldn’t see an error; we just send an empty pixel. So this problem is even harder for us to solve.
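This failure mode can be shown with a toy model. Amazon’s actual algorithm is not public, so the sketch below simply assumes that traffic shares are inversely proportional to response time; the backend names and latencies are invented.

```python
def traffic_shares(latencies_ms):
    """Toy model: each backend's share is proportional to 1 / response time."""
    weights = {name: 1.0 / ms for name, ms in latencies_ms.items()}
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


# Two healthy backends answer in 50 ms; the failing one returns errors in 5 ms.
shares = traffic_shares({"healthy-1": 50, "healthy-2": 50, "failing": 5})
```

In this model the failing backend receives more than 80% of the traffic precisely because it fails quickly.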
At the last Amazon conference I was told that if something bad is going on, you should wrap a 100-200 ms delay into the exception handler and artificially slow down the 500 responses, so that the Amazon balancer realizes your backend isn’t coping. But in general, it’s better to do proper status checks. Then your backend knows there is a problem, fails the status check, and simply gets kicked out of the pool.
Amazon now has a new solution: Application Load Balancer (ALB). It’s a pretty interesting solution, but it isn’t really relevant for us because it doesn’t solve any of our problems and will probably cost a lot more. Their system with hosts has gotten more complicated.
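One way a backend could "know there is a problem" is to track its own recent results and answer the status check accordingly. The sketch below is an assumption-laden illustration: the class name, the window size, and the 50% threshold are all hypothetical, not something from the talk or from Amazon’s documentation.

```python
from collections import deque


class HealthTracker:
    """Remember recent request outcomes and derive a status-check answer from them."""

    def __init__(self, window=100, max_error_ratio=0.5):
        self._results = deque(maxlen=window)  # True = request handled, False = failed
        self._max_error_ratio = max_error_ratio

    def record(self, ok):
        self._results.append(ok)

    def status_code(self):
        """HTTP status to return from the status-check endpoint."""
        if not self._results:
            return 200
        errors = self._results.count(False)
        if errors / len(self._results) > self._max_error_ratio:
            return 503  # tell the balancer to take this backend out of the pool
        return 200
```

The backend can keep returning an empty pixel with a 200 to users while reporting 503 on the status-check URL, so the balancer stops sending it traffic.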
But ALB supports path-based routing: if, for example, a user comes to /video, you can route the request to one set of instances, if to /static, to another, and so on.
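Path-based routing itself is simple to illustrate. The sketch below is a generic prefix router, not ALB’s rule engine; the pool names are hypothetical.

```python
# Hypothetical mapping of path prefixes to backend pools.
ROUTES = [("/video", "video-pool"), ("/static", "static-pool")]


def route(path, routes=ROUTES, default="default-pool"):
    """Pick a backend pool by the longest matching path prefix."""
    for prefix, pool in sorted(routes, key=lambda r: len(r[0]), reverse=True):
        # Match the prefix itself or anything below it, but not "/videos".
        if path == prefix or path.startswith(prefix + "/"):
            return pool
    return default


target = route("/video/clip.mp4")  # 'video-pool'
```

Requests that match no rule fall through to the default pool.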
There is support for WebSocket, HTTP/2, and containers. If you run several Docker containers inside one instance, it can distribute traffic between them.
At Google we use GLB. It’s quite an interesting solution. Compared to Amazon, it has many advantages.
The first is that we only have one IP. When you create a Google balancer, you are given a single IP that you can bind to your site. That means you can even bind it to a bare domain. A CNAME, on the other hand, can only be bound to a subdomain.
You only need to create one balancer across all regions. So in Amazon we need to create a balancer in each region to balance between instances within that region, but in Google it’s just one balancer, just one IP, and it balances between all your instances across different regions.
Google’s balancer can do sticky balancing by both IP and cookie. I told you why you need sticky balancing: we need to send one user to one backend. Amazon’s balancers can only do it by cookie, which means they issue a cookie themselves at the balancer level. Then they check it, and if the user has a cookie that matches one of the instances, they send the user to that same instance. Google’s can do it by IP, which is much better for us, although it doesn’t solve every problem.
Google’s balancer has instant warm-up: you don’t have to warm it up in any way, and you can send up to a million requests to it at once. A million requests is what they officially promise, and that is what I have checked myself. I think beyond that they grow somehow on their own internally.
That said, they have a problem when the number of backends changes dramatically. At one point we added about 100 new hosts, and the Google balancer went down. When we talked to an engineer from Google, the answer was: add one host per minute, and you will be happy.
They also recently restricted the ports you can use when creating an HTTP balancer. There used to be a regular input field; now you can only choose between ports 80 and 8080. This may be a problem for some people.
None of these cloud balancers have SNI support. If you need to support multiple domains and multiple certificates, you need to create a separate balancer for each certificate and bind to it. This is a solvable problem, but it can be inconvenient.