
An overview of PostgreSQL operators for Kubernetes. Part 1: our choice and experience

We get more and more requests from customers along the lines of "We want it like Amazon RDS, but cheaper" or "We want it like RDS, but everywhere, in any infrastructure". To implement such a managed solution on Kubernetes, we looked at the current state of the most popular operators for PostgreSQL (Stolon and the operators from Crunchy Data and Zalando) and made our choice.
This article presents our experience both from a theoretical point of view (an overview of the solutions) and from a practical one (what we chose and what came out of it). But first, let's define the general requirements for a potential RDS replacement…

What is RDS

When people talk about RDS, in our experience, they mean a managed DBMS service that:

  1. is easy to set up;
  2. can work with and recover from snapshots (preferably with PITR support);
  3. allows you to create master-slave topologies;
  4. offers a rich list of extensions;
  5. provides auditing and user/access management.

Generally speaking, the approaches to this task can vary widely, but the route of something like Ansible does not appeal to us. (Our colleagues at 2GIS came to a similar conclusion after their attempt to create "a tool for rapid deployment of a fault-tolerant Postgres-based cluster.")
Operators are the accepted approach for such tasks in the Kubernetes ecosystem. Flant's CTO, distol, has already talked about them in more detail, as applied to databases running inside Kubernetes, in one of his reports.
NB: For quickly creating simple operators, we recommend taking a look at our open-source utility shell-operator. With it, you can build operators without knowing Go, in ways more familiar to sysadmins: Bash, Python, etc.
There are several popular K8s operators for PostgreSQL:

  • Stolon;
  • Crunchy Data PostgreSQL Operator;
  • Zalando Postgres Operator.

Let’s look at them more closely.

Operator selection

In addition to the important features already mentioned above, we – as engineers operating infrastructure in Kubernetes – also expected the following:

  • deployment from Git and with Custom Resources;
  • pod anti-affinity support;
  • the ability to set node affinity or a node selector;
  • the ability to set tolerations;
  • availability of tuning options;
  • understandable technologies and even commands.

Without going into details on each point (ask in the comments if you still have questions after reading the whole article), I will note in general that these parameters are needed to describe the specialization of cluster nodes more precisely, so that we can order them for specific applications. This way we can achieve an optimal balance of performance and cost.
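To make this more concrete, here is a minimal sketch of how such scheduling constraints usually look in a pod template. The label, taint, and app names here are hypothetical, not something any of the operators below requires:

# hypothetical pod template fragment: pin DB pods to dedicated nodes
nodeSelector:
  node-role/db: "true"
tolerations:                # allow scheduling onto nodes tainted for databases
- key: dedicated
  operator: Equal
  value: db
  effect: NoSchedule
affinity:
  podAntiAffinity:          # keep two DB pods off the same node
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: postgres
      topologyKey: kubernetes.io/hostname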
Now to the PostgreSQL operators themselves.

1. Stolon

Stolon from the Italian company Sorint.lab was regarded in the already mentioned report as a kind of benchmark among DBMS operators. It is a fairly old project: its first public release was back in November 2015(!), and its GitHub repository boasts nearly 3,000 stars and 40+ contributors.
Indeed, Stolon is an excellent example of thoughtful architecture:
Stolon architecture diagram
A detailed account of this operator's design can be found in the report and in the project documentation. In short, it has all the required features: failover, proxies for transparent client access, backups… The proxies, moreover, provide access through a single endpoint service – unlike the other two solutions described below (they each have two services for database access).
However, Stolon has no Custom Resources, which makes it impossible to deploy DBMS instances in Kubernetes easily and quickly – "like hotcakes". Management is done through the stolonctl utility, deployment is done via a Helm chart, and user settings are defined in a ConfigMap.
On the one hand, it turns out that the operator is not really an operator (because it doesn’t use a CRD). But on the other hand, it is a flexible system that allows you to customize the resources in K8s to your liking.
To summarize, for us personally, it did not seem optimal to have a separate chart for each database. So we started looking for alternatives.

2. Crunchy Data PostgreSQL Operator

The operator by Crunchy Data, a young American startup, seemed like a logical alternative. Its public history begins with the first release in March 2017; since then, the GitHub repository has received just under 1,300 stars and 50+ contributors. The latest release, from September, was tested to work with Kubernetes 1.15-1.18, OpenShift 3.11+ and 4.4+, GKE, and VMware Enterprise PKS 1.3+.
The Crunchy Data PostgreSQL Operator architecture also meets the stated requirements:
Crunchy Data PostgreSQL Operator architecture diagram
Control is done through the pgo utility, which in turn generates Custom Resources for Kubernetes. As potential users, we were pleased to find that:

  • there is control via CRDs;
  • user management is convenient (also via CRDs);
  • it integrates with other components of the Crunchy Data Container Suite – a specialized collection of container images for PostgreSQL and related utilities (including pgBackRest, pgAudit, extensions from contrib, etc.).

However, attempts to start using the operator from Crunchy Data revealed several problems:

  • There was no way to use tolerations – only nodeSelector is provided.
  • The pods we created were part of a Deployment, even though we were deploying a stateful application. Unlike StatefulSets, Deployments cannot create disks.

The latter flaw leads to comical situations: in a test environment, we managed to run 3 replicas on a single local storage disk, which caused the operator to report that 3 replicas were working (even though they were not).
Another feature of this operator is its ready-made integration with various auxiliary systems. For example, it is easy to install pgAdmin and pgBouncer, and the documentation covers pre-configured Grafana and Prometheus. The recent 4.5.0-beta1 release highlights improved integration with the pgMonitor project, thanks to which the operator offers clear visualization of PostgreSQL metrics out of the box.
Nevertheless, the strange choice of Kubernetes resources generated by the operator led us to look for a different solution.

3. Zalando Postgres Operator

We have known about Zalando products for a long time: we have experience using Zalenium, and of course we have tried Patroni – their popular HA solution for PostgreSQL. One of the authors of the Postgres Operator, Alexey Klyukin, spoke about the company's approach to creating it at Postgres-Tuesday #5, and we took a liking to it.
This is the youngest solution discussed in this article: the first release was in August 2018. However, despite the small number of formal releases, the project has come a long way, already surpassing Crunchy Data's solution in popularity with 1,300+ stars on GitHub and the maximum number of contributors (70+).
"Under the hood of this operator uses solutions proven over time :

Here is how the operator architecture from Zalando is presented:
Zalando Postgres Operator architecture diagram
The operator is fully managed through Custom Resources and automatically creates a StatefulSet from containers, which can then be customized by adding various sidecars to the pod. All of this is a significant plus compared to the operator from Crunchy Data.
Since the solution from Zalando is the one we chose among the three options under consideration, its features are described below, together with our practical experience.

Practice with the Postgres Operator from Zalando

Deploying the operator is very easy: just download the current release from GitHub and apply the YAML files from the manifests directory. Alternatively, you can use OperatorHub.
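For reference, a minimal sketch of such an installation (the manifest file names are taken from the repository and may differ between releases):

git clone https://github.com/zalando/postgres-operator.git
cd postgres-operator
# operator settings, RBAC, the operator itself, and its API service
kubectl apply -f manifests/configmap.yaml
kubectl apply -f manifests/operator-service-account-rbac.yaml
kubectl apply -f manifests/postgres-operator.yaml
kubectl apply -f manifests/api-service.yaml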
After installation, it is worth setting up the storage for logs and backups. This is done through the postgres-operator ConfigMap in the namespace where you installed the operator. Once the storage is configured, you can deploy your first PostgreSQL cluster.
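As an illustration, the backup-related part of that ConfigMap might look like this (the bucket names are examples; consult the operator documentation for the full list of options):

apiVersion: v1
kind: ConfigMap
metadata:
  name: postgres-operator
data:
  # S3 bucket used by Spilo for WAL archiving and base backups
  wal_s3_bucket: example-pg-backups
  # S3 bucket where pod logs are shipped
  log_s3_bucket: example-pg-logs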
For example, our standard deployment looks like this:

apiVersion: acid.zalan.do/v1
kind: postgresql
metadata:
  name: staging-db
spec:
  numberOfInstances: 3
  patroni:
    synchronous_mode: true
  postgresql:
    version: "12"
  resources:
    limits:
      cpu: 100m
      memory: 1Gi
    requests:
      cpu: 100m
      memory: 1Gi
  sidecars:
  - env:
    - name: DATA_SOURCE_URI
      value: 127.0.0.1:5432
    - name: DATA_SOURCE_PASS
      valueFrom:
        secretKeyRef:
          key: password
          name: postgres.staging-db.credentials
    - name: DATA_SOURCE_USER
      value: postgres
    image: wrouesnel/postgres_exporter
    name: prometheus-exporter
    resources:
      limits:
        cpu: 500m
        memory: 100Mi
      requests:
        cpu: 100m
        memory: 100Mi
  teamId: staging
  volume:
    size: 2Gi

This manifest deploys a cluster of 3 instances with a postgres_exporter sidecar, from which we collect application metrics. As you can see, it is all very simple, and you can create literally an unlimited number of clusters.
It is also worth paying attention to the postgres-operator-ui web panel for administration. It ships with the operator and allows you to create and delete clusters, as well as work with the backups made by the operator.
List of PostgreSQL clusters
Backup management
Another interesting feature is support for the Teams API. This mechanism automatically creates roles in PostgreSQL based on a list of user names it receives. The API then lets you return the list of users for whom roles are created automatically.

Problems and their solutions

However, using the operator soon revealed several significant drawbacks:

  1. lack of support for nodeSelector;
  2. inability to disable backups;
  3. default privileges are not applied when using the database creation feature;
  4. the documentation is occasionally missing or out of date.

Fortunately, many of these can be solved. Let's start from the end – the problems with the documentation.
You will probably run into the fact that it is not always clear how to configure a backup and how to connect the backup bucket to the Operator UI. The documentation mentions this in passing, but the actual description is in a PR:

  1. create a secret;
  2. pass it to the operator via the pod_environment_secret_name parameter, either in the CRD with the operator settings or in the ConfigMap, depending on how you chose to install the operator (a sketch of both steps follows).
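A sketch of what these two steps look like – the secret name is arbitrary, and the keys shown are just the S3 credentials the pods need (note the caveat right below):

apiVersion: v1
kind: Secret
metadata:
  name: postgres-pod-secrets   # arbitrary name
stringData:
  AWS_ACCESS_KEY_ID: "<key id>"
  AWS_SECRET_ACCESS_KEY: "<secret key>"

And in the postgres-operator ConfigMap:

data:
  pod_environment_secret_name: postgres-pod-secrets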

However, as it turned out, this was not possible at the time. That is why we put together our own version of the operator with some additional third-party developments. Read more about it below.
If you pass the backup parameters to the operator, namely wal_s3_bucket and the access keys for AWS S3, it will back up everything: not only the production databases, but the staging ones as well. That did not suit us.
In the description of the parameters for Spilo – the basic Docker wrapper for PostgreSQL used by the operator – we found that the WAL_S3_BUCKET parameter can be passed empty, thereby disabling backups. Moreover, to our great joy, we found a ready-made PR, which we immediately accepted into our fork. Now you just need to add enableWALArchiving: false to the PostgreSQL cluster resource.
Yes, we could have done it differently by running two operators: one for staging (without backups) and one for production. But this way we got by with just one.
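So a staging cluster manifest now differs from the production one by a single line (other fields as in the manifest above):

apiVersion: acid.zalan.do/v1
kind: postgresql
metadata:
  name: staging-db
spec:
  enableWALArchiving: false   # passes an empty WAL_S3_BUCKET to Spilo, disabling backups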
OK, we learned how to give the databases access to S3, and backups started landing in the storage. How do we make the backup pages work in the Operator UI?
In the Operator UI you will need to add 3 variables (see the sketch after the list):

  • SPILO_S3_BACKUP_BUCKET
  • AWS_ACCESS_KEY_ID
  • AWS_SECRET_ACCESS_KEY
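A sketch of the corresponding fragment of the postgres-operator-ui Deployment (the bucket and secret names are the same illustrative ones as above):

env:
- name: SPILO_S3_BACKUP_BUCKET
  value: example-pg-backups
- name: AWS_ACCESS_KEY_ID
  valueFrom:
    secretKeyRef:
      name: postgres-pod-secrets
      key: AWS_ACCESS_KEY_ID
- name: AWS_SECRET_ACCESS_KEY
  valueFrom:
    secretKeyRef:
      name: postgres-pod-secrets
      key: AWS_SECRET_ACCESS_KEY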

After that, backup management becomes available, which in our case simplifies work with staging by letting us deliver slices from production there without additional scripts.
Another plus was the work with the Teams API and the ample opportunities for creating databases and roles using the operator's means. However, the created roles did not have default rights. Consequently, a user with read permissions could not read newly created tables.
Why is that? Despite the fact that the code contains the necessary GRANT statements, they are not always applied. There are 2 methods: syncPreparedDatabases and syncDatabases. In syncPreparedDatabases – even though the preparedDatabases section has the defaultRoles and defaultUsers conditions for creating roles – the default rights are not applied. We are preparing a patch so that these rights are applied automatically.
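Until such a patch lands, the missing rights can be granted manually; a sketch for a hypothetical read-only role:

-- run in the affected database; the role and schema names are illustrative
ALTER DEFAULT PRIVILEGES IN SCHEMA public
  GRANT SELECT ON TABLES TO staging_db_reader;
-- and for the tables that already exist:
GRANT SELECT ON ALL TABLES IN SCHEMA public TO staging_db_reader;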
And the last of the refinements relevant to us is a patch that adds Node Affinity to the StatefulSet being created. Our customers often prefer to cut costs by using spot instances, and it is clearly not a good idea to host database services on such nodes. This issue could also be addressed through tolerations, but having Node Affinity gives more confidence.
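With the patch applied, the cluster manifest can carry a constraint along these lines (the node label marking spot instances is hypothetical):

spec:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: node.kubernetes.io/lifecycle   # hypothetical label for spot nodes
          operator: NotIn
          values:
          - spot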

What we ended up with

As a result of solving the listed problems, we forked the Postgres Operator from Zalando into our own repository, where it is built with these useful patches. And for the sake of convenience, we have also built a Docker image.
List of PRs accepted into the fork:

It would be great if the community supported these PRs so that they make it upstream in the next version of the operator (1.6).

Bonus! A success story with production migration

If you use Patroni, live production can be migrated to the operator with minimal downtime.
Spilo allows you to create standby clusters via S3 storage with WAL-E, where the PostgreSQL binary log is first stored in S3 and then pulled by the replica. But what if you did not use WAL-E in the old infrastructure? A solution to this problem has already been proposed on Habr.
PostgreSQL logical replication comes to the rescue. However, we will not go into the details of creating publications and subscriptions, because… our plan failed.
The point is that the database had several heavily loaded tables with millions of rows which, moreover, were constantly being replenished and pruned. A simple subscription with copy_data, where the new replica copies all the content from the master, simply could not keep up: copying the content ran for a week but never caught up with the master. In the end, an article by our colleagues from Avito helped us sort out the problem: the data can be transferred using pg_dump. Let me describe our (slightly modified) version of this algorithm.
The idea is that you can create a disabled subscription tied to a specific replication slot, and then fix the transaction position. Replicas of the production database were available for this work. This is important because a replica helps create a consistent dump while continuing to receive changes from the master.
The following host designations are used in the commands describing the migration process:

  1. master – source server;
  2. replica1 – streaming replica on old production;
  3. replica2 – new logical replica.

Migration plan

1. Create a publication on the master for all the tables of the dbname database:

psql -h master -d dbname -c "CREATE PUBLICATION dbname FOR ALL TABLES;"

2. Create a replication slot on the master:

psql -h master -d dbname -c "select pg_create_logical_replication_slot('repl', 'pgoutput');"

3. Pause replication on the old replica:

psql -h replica1 -c "select pg_wal_replay_pause();"

4. Get the transaction position (replay LSN) from the master:

psql -h master -c "select replay_lsn from pg_stat_replication where client_addr = 'replica1';"

5. Dump the old replica. We will do it in several threads to speed up the process:

pg_dump -h replica1 --no-publications --no-subscriptions -O -C -F d -j 8 -f dump/ dbname

6. Upload the dump to the new server:

pg_restore -h replica2 -F d -j 8 -d dbname dump/

7. After the dump has been restored, resume replication on the streaming replica:

psql -h replica1 -c "select pg_wal_replay_resume();"

8. Create a subscription on the new logical replica:

psql -h replica2 -d dbname -c "create subscription oldprod connection 'host=master port=5432 user=postgres password=secret dbname=dbname' publication dbname with (enabled = false, create_slot = false, copy_data = false, slot_name='repl');"

9. Get the oid of the subscription:

psql -h replica2 -d dbname -c "select oid, * from pg_subscription;"

10. Suppose that oid=1000. Apply the transaction position obtained in step 4 to the subscription:

psql -h replica2 -d dbname -c "select pg_replication_origin_advance('pg_1000', 'AA/AAAAAAAA');"

11. Enable replication:

psql -h replica2 -d dbname -c "alter subscription oldprod enable;"

12. Check the subscription status; replication should be working:

psql -h replica2 -d dbname -c "select * from pg_replication_origin_status;"
psql -h master -d dbname -c "select slot_name, restart_lsn, confirmed_flush_lsn from pg_replication_slots;"

13. Once replication is running and the databases are synchronized, you can perform the switchover.
14. After disabling replication, you need to fix the sequences. This is well described in an article on wiki.postgresql.org.
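For a single sequence, the fix boils down to something like this (the names are illustrative; the wiki article shows how to generate such statements for all sequences in a database):

-- set the sequence to the current maximum of the column it feeds
select setval('my_table_id_seq', (select max(id) from my_table));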
Thanks to this plan, the switchover went through with minimal delay.

Conclusion

Kubernetes operators let you simplify various activities by reducing them to the creation of K8s resources. However, while you can achieve great automation with them, remember that they can also bring unexpected nuances, so choose your operators wisely.
After considering the three most popular Kubernetes operators for PostgreSQL, we opted for the project from Zalando. We had to overcome some difficulties with it, but the result truly pleased us, so we plan to extend this experience to some of our other PostgreSQL installations. If you have experience with similar solutions, we would be glad to see the details in the comments!
UPDATE (November 13, 2020): the second part of the article is out, in which the KubeDB and StackGres operators are also discussed and a general comparison table is presented.
