
Professional Postgres

by admin

We continue to publish videos and transcripts of the best talks from PGConf.Russia 2019. Oleg Bartunov's talk "Professional Postgres" opened the plenary part of the conference. It covers the history of the Postgres DBMS, the Russian contribution to its development, and its architectural peculiarities.
Previous materials in this series: "Typical mistakes when working with PostgreSQL" by Ivan Frolkov, parts 1 and 2.
I'm going to talk about professional Postgres. Please don't confuse it with the company I represent now, Postgres Professional.
I'm really going to talk about how Postgres, which started out as an amateur academic development, became professional, as we see it now. The opinions here are purely my own; they do not reflect the opinion of our company or any other group.
It just so happens that I have been using and dealing with Postgres not in bits and pieces, but continuously from 1995 until today. I have watched the whole history of Postgres and taken part in its main events.

History

On this slide I have summarized the projects in which I have been involved. Many of them are familiar to you. And I'll start the history of Postgres with a picture that I drew many, many years ago and have kept extending ever since – the number of versions keeps growing. It reflects the evolution of relational databases. On the left, in case anyone doesn't know, is Michael Stonebraker, who is called the father of Postgres. Below are the first core developers. The man on the right is Vadim Mikheev from Krasnoyarsk, one of the first core developers.
I will start the story of the relational model with IBM, which made a giant contribution to the industry. It was at IBM that Edgar Codd worked, and from its depths came the first paper on IBM System R – the first relational database. Mike Stonebraker was working at Berkeley at the time. He read this paper, and he and his colleagues got excited: we should build a database.
In those years – the early '70s – as you might suspect, there weren't many computers. There was one PDP-11 for the entire Computer Science department at Berkeley, and all the students and professors fought for machine time. That machine was mostly used for batch calculations. I used to work like that when I was young: you handed the task to the operator and he ran it. But students and developers wanted interactive work. It was our dream to sit at a console, type in programs and debug them. So when Mike Stonebraker and his colleagues made their first database, they called it Ingres – INteractive Graphics REtrieval System. People didn't understand: why interactive? For its developers it was simply a dream come true: they had a console client you could use to work with Ingres. It gave a lot to our industry. Do you see how many arrows lead out of Ingres? Those are the databases it influenced, the ones that borrowed from its code. Michael Stonebraker had a lot of student developers who went off and built Sybase and MS SQL Server, NonStop SQL, Illustra, Informix.
When Ingres had evolved enough to become commercially interesting, the company Illustra was formed (this was 1992), and the Illustra DBMS code was purchased by Informix, which was later swallowed by IBM, and thus this code made its way into DB2. But what was it about Ingres that interested IBM? First and foremost, extensibility – the revolutionary idea that Michael Stonebraker laid down from the very beginning, believing that a database should be ready to solve any business problem. To do that, you need to be able to add your own data types, access methods and functions to the database. This seems natural to us Postgres people now; back then it was a revolution. It was with Ingres and Postgres that this functionality became the de facto standard for all relational databases. Now all databases have user-defined functions, but when Stonebraker wrote that user-defined functions were needed, Oracle, for example, screamed about how dangerous it was and that you couldn't do it that way, because users could corrupt the data. Now we see that user-defined functions exist in all databases, and that you can define your own aggregates and data types.
Postgres evolved as an academic development, which means: there is a professor, he has a development grant, and students and graduate students work with him. A serious, production-ready database cannot be built this way. Nevertheless, the last version from Berkeley was Postgres95, by which time the SQL language had already been added. The student developers had by then gone off to work for Illustra and Informix and lost interest in the project. They said: we have Postgres95, take it, whoever wants it! I remember it very well, because I was one of those who got the letter: there was a mailing list with fewer than 400 subscribers. The Postgres95 community started with those 400 people. We all voted unanimously to take over the project. We found an enthusiast who set up a CVS server, and we moved everything to Panama, since the servers were there.
The history of PostgreSQL [hereafter simply Postgres] starts with version 6.0, as the earlier version numbers (1 through 5) still belonged to Postgres and Postgres95. On April 3rd, 1997 we got our logo, the elephant. Before that we had various animals; my page, for example, featured a cheetah for a long time, hinting that Postgres was very fast. Then a question was raised on the mailing list: our big database needs a serious animal. Someone wrote: let it be an elephant. Everyone voted in favour, and then our guys from St. Petersburg drew the logo. Initially it was an elephant in a diamond – if you dig through the web archives, you'll find it. The elephant was chosen because elephants have a very good memory. Even Agatha Christie had a story called "Elephants Can Remember", where the elephant was very vengeful: for fifty years it remembered an insult and then ran over its abuser. Later the diamond was cut off and the elephant was vectorized, and the result is the elephant we have now. So this is one of the first Russian contributions to Postgres.

Stages of Postgres development

The first task was to stabilize it. The community took over the source code from the academic developers. What wasn't in there! They began to tidy it all up so that it would at least compile decently. On this slide I have marked 1997, version 6.1, which introduced internationalization. I've highlighted it not because I did it myself (it really was my first patch), but because it was an important step. You are used to Postgres working with any language, in any locale, all over the world. But back then it only understood ASCII: no 8-bit characters, no European languages, no Russian. When I discovered this, following open-source principles, I simply added locale support. Thanks to this work, Postgres went international. After me, the Japanese developer Tatsuo Ishii added support for multibyte encodings, and Postgres became truly worldwide.
In 2005, Windows support was introduced. I remember the heated debate on the mailing list when it was being discussed. The developers were all normal people who worked under Unix. You're laughing now, and that's exactly how people reacted back then – and voted against it. The arguments dragged on for years. What's more, SRA had released its Powergres – a native Windows port – a few years earlier, but it remained a purely Japanese product. When Windows support arrived in version 8.0 in 2005, it turned out to be a strong move: the community swelled. A lot of people showed up, and a lot of silly questions with them, but the community grew big – we won over the Windows users.
In 2010, we got built-in replication. That was a pain. I remember how many years people fought to get replication into Postgres. At first everyone said: we don't need replication, that's not the database's job, it's a job for an external utility. If anyone remembers, Slony was made by Jan Wieck. By the way, the name also came from Russian: Jan asked me how to say "many elephants" in Russian, and I answered "slony". So he made Slony. It worked as trigger-based logical replication, and setting it up was a nightmare – veterans remember. What's more, for a long time everyone listened to Tom Lane, who, I remember, desperately objected: why should we complicate our code with replication when we can do it outside the database? But in the end built-in replication did appear. It immediately brought a huge number of enterprise users, because before that such users said: how can we live without replication at all? It's impossible!
In 2014, jsonb appeared. This is the work of myself, Fedor Sigayev and Alexander Korotkov. And again people shouted: why do we need it? But we already had hstore, which we made in 2003 and which became part of Postgres in 2006. People all over the world used it and loved it; a gigantic number of documents ended up stored in hstore, and it was a very popular extension. We promoted the idea of unstructured data in Postgres in every possible way. I had been interested in exactly that from the start of my work, and when we did jsonb, I got a ton of emails thanking me and asking questions. And the community gained NoSQL users! Before jsonb, people, zombified by the hype, went off to key-value databases, sacrificing integrity and ACID properties in the process. We gave them the opportunity to work with their beloved JSON without sacrificing anything. The community grew dramatically again.
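To give a feel for what jsonb buys you, here is a minimal sketch (the table and data are illustrative, not from the talk): a document column that stays queryable and indexable while keeping full ACID guarantees.

    -- Store semi-structured documents in a jsonb column.
    CREATE TABLE docs (id serial PRIMARY KEY, body jsonb);

    INSERT INTO docs (body)
    VALUES ('{"user": "alice", "tags": ["postgres", "nosql"], "visits": 42}');

    -- A GIN index accelerates containment queries over the documents.
    CREATE INDEX docs_body_idx ON docs USING gin (body);

    -- Find documents whose "tags" array contains "postgres".
    SELECT id FROM docs WHERE body @> '{"tags": ["postgres"]}';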
In 2016, we got parallel query execution. If anyone doesn't know, it is certainly not for OLTP: on a busy machine all the cores are busy anyway. Parallel query execution is valuable for OLAP users, and they – a fair number of OLAP users – appreciated it.
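A rough sketch of what this looks like in practice (big_table is a hypothetical table; the setting is real, the value is arbitrary):

    -- Allow up to 4 parallel workers per Gather node for this session.
    SET max_parallel_workers_per_gather = 4;

    -- For a sufficiently large table the plan will contain a Gather node
    -- with Parallel Seq Scan workers underneath.
    EXPLAIN (ANALYZE)
    SELECT count(*) FROM big_table WHERE amount > 100;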
Then things began to snowball. In 2017, we got logical replication and declarative partitioning – also a big and serious step, because logical replication made it possible to build very interesting systems: people got unlimited freedom for their imagination and started building clusters. With declarative partitioning, it became possible to create partitions not manually, but with plain SQL.
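Here is a minimal sketch of both features; the tables, publication and connection string are made up for illustration.

    -- Declarative partitioning (PostgreSQL 10+): partitions are declared in SQL,
    -- not assembled by hand from inheritance and triggers.
    CREATE TABLE measurements (
        ts    timestamptz NOT NULL,
        value numeric
    ) PARTITION BY RANGE (ts);

    CREATE TABLE measurements_2019 PARTITION OF measurements
        FOR VALUES FROM ('2019-01-01') TO ('2020-01-01');

    -- Logical replication: publish a regular table on one server...
    CREATE TABLE orders (id bigint PRIMARY KEY, amount numeric);
    CREATE PUBLICATION my_pub FOR TABLE orders;
    -- ...and subscribe to it on another:
    -- CREATE SUBSCRIPTION my_sub
    --     CONNECTION 'host=primary dbname=shop'
    --     PUBLICATION my_pub;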
In 2018, in version 11, we got JIT. For those who don't know, this is the Just-In-Time compiler: queries are compiled, and that can really speed up execution. It matters for long-running queries, because fast queries are already fast and the compilation overhead is still substantial.
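A small sketch of turning JIT on for a session (PostgreSQL 11+); the cost threshold below is illustrative, not a recommendation.

    SET jit = on;
    SET jit_above_cost = 100000;   -- only JIT-compile when the planner's cost estimate is high

    -- For an expensive, expression-heavy query the EXPLAIN output gains a "JIT" section
    -- with compilation and optimization timings.
    EXPLAIN (ANALYZE)
    SELECT sum(x * 2 + 1) FROM generate_series(1, 5000000) AS g(x);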
In 2019, the main thing we expect is pluggable storage – an API that lets developers create their own storage engines. One example is zheap, a storage engine being developed by EnterpriseDB.
And here is our own development: SQL/JSON. I was really hoping that Sasha Korotkov would commit it before the conference, but there were some problems, and we hope that this year we will have SQL/JSON. People have been waiting for it for two years. [A big part of the SQL/JSON patch has since been committed: jsonpath; there is more about that here.]
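The jsonpath part that was committed can be tried like this (PostgreSQL 12+); the document is an illustrative example.

    -- Select array elements matching a jsonpath filter expression.
    SELECT jsonb_path_query(
        '{"items": [{"price": 10}, {"price": 25}, {"price": 40}]}',
        '$.items[*] ? (@.price > 20)'
    );
    -- Returns the two items whose price exceeds 20.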
Next I move on to a slide that shows that Postgres is a universal database. You could study this picture for hours and tell a bunch of stories about the emergence, takeover and death of companies. I'll start with the year 2000. One of the first forks of Postgres was IBM's Netezza. Just imagine: the "blue giant" took the Postgres code and built an OLAP database to support its BI!
Here is the TelegraphCQ fork: as early as 2000, people at Berkeley were building a streaming database on top of Postgres. If anyone doesn't know, that is a database interested not in the individual data points themselves but in aggregates over the data. There are plenty of tasks where you don't need to know every value, say the temperature at a particular point, but you do need the average over a given region. TelegraphCQ took this idea (which also originated at Berkeley), one of the most advanced of its time, and developed a Postgres-based database around it. It evolved further, and by 2008 it was released as a commercial product – TruCQ, now owned by Cisco.
I forgot to mention that this slide doesn't show all the forks; there are twice as many. I chose the most important and interesting ones so as not to overload the picture. The PostgreSQL wiki page lists all the forks. Does anyone know of another database with that many forks? There are no such databases.
Postgres differs from other databases not only in its functionality but also in having a very interesting community, one that treats forks as normal. In the open-source world the usual story is: I made a fork because I was offended – you didn't accept my work, so I decided to do my own development. In the Postgres world, the appearance of a fork means that some people or some company decided to build a prototype, test their latest functionality, experiment. If they are lucky, they end up with a commercial database that can be sold to customers, with services provided around it, and so on. In this case, as a rule, the developers of these forks return their work and patches to the community. Our company's product is also a fork, and naturally we have contributed a bunch of patches back to the community. In the latest version, version 11, we gave more than 100 patches back; if you look at its release notes, you will find the names of 25 of our employees. This is normal behavior in the community. We use the community version and maintain our fork to test our ideas or to give customers functionality before the community is ready to adopt it. Forks in the Postgres community are very welcome.
The well-known Vertica arose out of C-Store and also grew out of Postgres. Some people argue that Vertica took no Postgres code at all, only support for the Postgres protocol; nevertheless, it is commonly classified as a Postgres fork.
Greenplum you can now download and use as a cluster. It originated from Bizgres, a massively parallel database, then became Greenplum and stayed commercial for a long time. But around 2015 they realized the world had changed: the world was moving to open protocols, open communities, open databases. And they opened up the Greenplum code. Now they are actively catching up with Postgres, because of course they fell far behind in that time: they forked off from 8.2, and now they say they have caught up with 9.6.
Our all-time favorite and disliked Amazon. You know how it came about; it happened before my eyes. There was a company, ParAccel, with vector processing – also built on Postgres, a community product, open source. In 2012, the sly Amazon bought the source code, and just six months later announced: here, we have RDS. We asked them about it and they hemmed and hawed, but then it turned out to be Postgres. RDS is still alive and is one of the most popular services at Amazon; they have about 7,000 databases there. They weren't satisfied with that, and later Amazon Aurora came along – Postgres 10 with a rewritten storage layer – built right into Amazon's infrastructure, into their distributed storage.
Now look at Teradata, the big, good old analytics company focused on OLAP. Based on "eight" [PostgreSQL 8.0], Aster Data appeared.
Hadoop: there is Postgres on Hadoop – HadoopDB. After a while it became a closed database, Hadapt, owned by Teradata. If you see Hadapt, know that Postgres is inside it.
The fate of Citus is very interesting. Everyone knows it is a distributed Postgres for online analytics; it does not support transactions. Citus Data was a startup, and Citus was a closed-source, separate database. After a while people realized it was better to live with the community, to open up, and they went to great lengths to become just an extension of Postgres. In addition, they started building their business around cloud services. You all know already: the slide says MS Citus, because Microsoft bought them, literally two weeks ago – probably to support Postgres on their Azure, which means Microsoft is playing these games as well. They have Postgres running on Azure, and the Citus development team has joined the MS developers.
In general, acquisitions of Postgres companies have been intense lately. Just after Microsoft bought Citus, another Postgres company, credativ, bought OmniTI to strengthen its presence in the market. These are two fairly well-known, good-quality companies. And Amazon bought OpenSCG. The Postgres world is changing now, and I will show further why there is so much interest in Postgres.
The much-talked-about TimescaleDB was also a separate database, but now it is an extension: you take Postgres, install timescaledb as an extension, and you get a database that beats all kinds of specialized time-series databases.
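If you want to see what "just an extension" means here, a minimal sketch (assuming the timescaledb extension is installed; the table is illustrative):

    CREATE EXTENSION IF NOT EXISTS timescaledb;

    CREATE TABLE conditions (
        time        timestamptz NOT NULL,
        device_id   int,
        temperature double precision
    );

    -- Turn the ordinary table into a time-partitioned hypertable.
    SELECT create_hypertable('conditions', 'time');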
There is also Postgres-XL, and other cluster solutions under development.
Here, in 2015, I placed our fork: Postgres Pro. We have Postgres Pro Enterprise, there is a certified version, we support "1C" out of the box and are officially recognized by the "1C" company. If anyone wants to try Postgres Pro Enterprise, you can get the distribution for free for testing, and if you need it for production, you can buy it.
We made Credereum – a prototype database with blockchain support. Now we are waiting for people to be ready to start using it.
See how big and interesting the picture is. I'm not even talking about Yahoo! Everest with its columnar storage and petabytes of data at Yahoo! – that was 2008. They even sponsored our conference in Canada and came there; I still have a T-shirt from it somewhere 🙂
There is also PipelineDB. It too started as a closed-source database, but now it is just an extension. We can see that Citus, TimescaleDB and PipelineDB look like separate databases, but they exist as extensions, which means you take standard Postgres and compile the extension. PipelineDB carries on the idea of streaming databases. You want to work with streaming data? You take Postgres, you take PipelineDB, and you're good to go.
In addition, there are extensions that let you work with GPUs. See the slide header? I have shown that there is an ecosystem covering a great many types of data and workloads. That is why we say that Postgres is a universal database.

People's favorite database

On the next slide are the big names. All the most famous clouds in the world support Postgres. In Russia, Postgres is supported by big state companies. They use it, and we serve them as our customers.
There are already a lot of extensions and a lot of applications, so Postgres is a good database to start a project with. I always tell startups: guys, don't pick a NoSQL database. I understand that you really want to, but start with Postgres. If something is missing, you can always split off a service and hand it to a specialized database. Besides flexibility, Postgres has one more advantage: a very liberal BSD-style license that allows you to do almost anything with the database.
Everything you see on this slide is available because Postgres is an extensible database, and this extensibility is built right into its architecture. When Michael Stonebraker wrote his first paper about Postgres (written in 1984, I am quoting the 1987 one), he already described extensibility as a crucial part of the database's functionality. And this, as they say, has stood the test of time. You can add your own functions, data types, operators and index access methods (i.e. optimized access paths), and you can write stored procedures in a very large number of languages. There are Foreign Data Wrappers (FDW) – interfaces for working with different storages and files – so you can connect to Oracle, MySQL and other databases.
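As a sketch of what this extensibility looks like from SQL (server, table and credential names are hypothetical; postgres_fdw ships with PostgreSQL):

    -- A user-defined function:
    CREATE FUNCTION add_vat(price numeric) RETURNS numeric
        LANGUAGE sql IMMUTABLE
        AS $$ SELECT price * 1.20 $$;

    -- A foreign table accessed through a Foreign Data Wrapper:
    CREATE EXTENSION IF NOT EXISTS postgres_fdw;
    CREATE SERVER remote_pg FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'remote.example.com', dbname 'sales');
    CREATE USER MAPPING FOR CURRENT_USER SERVER remote_pg
        OPTIONS (user 'reader', password 'secret');
    CREATE FOREIGN TABLE remote_orders (id int, amount numeric)
        SERVER remote_pg;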
Let me give an example from personal experience. I worked with Postgres, and when something was missing, my colleagues and I simply added that functionality. We needed, for example, to work with the Russian language, so we made 8-bit locale support. That was for the Rambler project – by the way, it was in the web top 5 at the time, and Rambler was the first major world-scale project to run on Postgres. Arrays were in Postgres from the beginning, but you couldn't do anything with them; they were essentially just text strings that stored arrays. We added operators and indexes, and now arrays are an essential part of Postgres functionality; many of you use them without thinking at all about how fast they are, and that's fine. It used to be said that arrays break the traditional relational model and don't satisfy the classic normal forms. Now people are simply used to using arrays.
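A minimal sketch of what indexed arrays give you today (the table is illustrative):

    CREATE TABLE articles (id serial PRIMARY KEY, tag_ids int[]);

    -- GIN index over the array column:
    CREATE INDEX articles_tags_idx ON articles USING gin (tag_ids);

    -- "Contains" and "overlaps" queries can use the index:
    SELECT id FROM articles WHERE tag_ids @> ARRAY[1, 5];
    SELECT id FROM articles WHERE tag_ids && ARRAY[2, 3];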
When we needed full-text search, we made one. When we needed to store data of a different nature, we made the hstore extension, and a lot of people started using it: it made it possible to build flexible database schemas and deploy earlier and faster. We made the GIN index so that full-text search would be fast. We made trigrams (pg_trgm). We made NoSQL. And all of that happened within my memory, all driven by our own needs.
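A small sketch of the full-text search and trigram pieces mentioned above (table and data are illustrative):

    CREATE TABLE posts (id serial PRIMARY KEY, body text);

    -- A GIN index over the tsvector makes full-text search fast.
    CREATE INDEX posts_fts_idx ON posts USING gin (to_tsvector('english', body));

    SELECT id FROM posts
    WHERE to_tsvector('english', body) @@ to_tsquery('english', 'postgres & replication');

    -- Trigram similarity for fuzzy matching:
    CREATE EXTENSION IF NOT EXISTS pg_trgm;
    SELECT 'postgresql' % 'postgersql';   -- true: the strings are similar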
Extensibility is exactly what makes Postgres a unique, universal database that you can start working with without fear of being left unsupported. Look how many people are here – it is already a market! Despite all the current hype around graph databases, document databases, time series and so on, look: the majority are still using relational databases. They dominate with about 75% of the database market, and the rest are exotic databases, small change compared to relational ones.
If you look at the ratio of open-source databases to commercial ones according to DB-Engines data, you will see that the number of open-source databases is almost equal to the number of commercial ones. And we see that open-source databases (the blue line) are growing while commercial ones (red) are declining. This is the direction of the entire IT community – towards openness. Now, of course, it is considered indecent to cite Gartner, but I'll do it anyway: they predict that by 2022, 70% will be using open databases and up to 50% of existing systems will have migrated to open source.
Take a look at this picture: Postgres was named the 2018 database of the year by the independent DB-Engines ranking, and last year it was No. 1 too. The ranking shows that Postgres really is ahead of the rest of the planet. It sits at No. 4 in absolute terms, but look at how it is growing – steadily and nicely. That's the blue line on the slide. The others – MySQL, Oracle, MS SQL – are either hovering at their level or starting to decline.
Hacker News – you probably all read it, or Y Combinator – periodically runs surveys; companies post their job openings there, and the statistics have been kept for a while now. You can see that since about 2014 Postgres has been ahead of everyone else. MySQL used to be first, but Postgres grew slowly, and now among the whole hacker community (in the good sense) it also prevails and keeps growing.
Stack Overflow also runs surveys every year. By "most used", our Postgres is in a respectable third place. By "most loved" it is in second place – a favorite database. Redis, which sits above it, is non-relational, so among relational databases Postgres is the favorite. I didn't include the "most dreaded" picture here – the most disliked database – but you can probably guess who comes first: "database X", as they like to call it in Russia.
There is a Russian survey too – a poll at our esteemed HighLoad++ conference, conducted not by us but by Oleg Bunin. It turned out that in Russia Postgres is database number one.
This is the second time we have asked HH.ru to share Postgres job statistics with us. Nine years ago Postgres was ten times behind Oracle; everyone was screaming: give us Oracle specialists. We can see that last year the numbers leveled off, and then in 2018 there was growth. And if you are worried about where to find a job, look: two thousand vacancies on HH.ru are Postgres. Don't worry, there are plenty of jobs.
To make it easier to see, I made a chart showing Postgres vacancies relative to Oracle vacancies. The ratio used to be below one; starting in 2018 they are level, and now Postgres is already slightly ahead. It is a little depressing that the absolute number of Oracle vacancies is also still growing, which in principle shouldn't be happening. But, as they say, we sit by the river and wait for the corpse of the enemy to float by. We just do our job.

Russian Postgres community

This is the most organized community in Russia; I haven't seen another like it. There are a lot of resources and chats where we all talk business. We have conferences – two big ones, in St. Petersburg and in Moscow – we hold informal meetups, we take part in all the big international conferences, and we have courses.
These are actually community courses. Our company prepared them, but they are freely available to any of you: watch our YouTube channel or go to the "Education" section of our website, where the DBA1, DBA2, DBA3 and developer courses can be downloaded for free.
And now we’re launching certification – that’s what companies are asking for, they want certified specialists. And the employer will know: you are a certified specialist.
Very often people ask: how Russian is Postgres? The question is a bit off: Postgres is international. But I will say a little about the Russian side of it. You can see on the slide what Vadim Mikheev made. For those who know Postgres, it is clear what MVCC, WAL, VACUUM and so on mean – these are all Russian contributions. Among the major Postgres developers there are now three Russians, two of whom are committers. You can see on the slide that quite a lot has been done. If you look at the major features in the release notes, you will see our contribution. The Russian contribution is there, and it is quite substantial. We have been working with the community from the very beginning and continue to do so, now at the company level.
Another contribution from the company is books. We have two university courses on Postgres. You can go to a store and buy these books, teach these courses, take exams, and so on. We also have a book for beginners, which is handed out, including here. It's a very useful, good book. We have even translated it into English.


Let's get to the point. Academic Postgres, when it started, was designed for a few dozen users. The Postgres95 community was fewer than 400 people, consisting mostly of developers plus a few users. And – an interesting detail – the developers were essentially both the customers and the implementers. When I needed something, I developed it for myself and at the same time shared it with everyone else. So the community developed for the community.
Around 2000, or a little earlier, the first Postgres companies began to appear: GreatBridge, 2ndQuadrant, EDB. They were already hiring full-time developers who worked for the community. The first enterprise forks and the first enterprise customers appeared. As a result, by 2015 most – yes, almost all – of the leading developers had been organized into companies of some kind. In 2015, our company was formed: we were the last of the free freelance developers, and now there are almost none left. The Postgres community has changed, it has become enterprise-oriented, and now these companies drive development. This is a good thing, because these companies do what enterprises need. The community is a brake in the good sense: it tests features, rejects or accepts them, and brings us all together. And Postgres has become enterprise-ready; it is happily used by big companies; it has become professional.
This slide is about the future as I see it. With the advent of pluggable storage there will be new storage engines: append-only, read-only, column stores – whatever you want (I, for example, dream of Parquet). There will be support for vector operations – there will be a talk about them today, by the way. There will be support for blockchain; there is no getting away from it, since we are moving to a digital economy, to paperless technology. You have to use electronic signatures and be able to certify your database, to make sure nobody has tampered with anything, and blockchain is very good for that.
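A sketch of how pluggable storage surfaces in SQL once the table access method API lands (PostgreSQL 12+); zheap is shown only hypothetically, as if it were installed.

    -- The default access method, spelled out explicitly:
    CREATE TABLE events (id bigint, payload text) USING heap;

    -- With an alternative access method installed it would look like this:
    -- CREATE TABLE events_z (id bigint, payload text) USING zheap;

    -- List the table access methods available in this server:
    SELECT amname FROM pg_am WHERE amtype = 't';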
Next: adaptive Postgres. This is a somewhat sad topic for you, but it is still pretty far off. The fact is that DBAs are, generally speaking, a rather expensive resource, and eventually databases won't need them: they will be smart enough to configure and tune themselves. But that is probably another ten years away. We still have plenty of time.
And it's clear that Postgres will have native support for clouds and cloud storage – we simply can't survive without it. And, of course, here it is, the last slide:
ALL YOU NEED IS POSTGRES!
Thank you for your attention.
