Home PostgreSQL PostgreSQL Antipatterns: Analyzing Blockages – SELF JOIN vs WINDOW

PostgreSQL Antipatterns: Analyzing Blockages – SELF JOIN vs WINDOW

by admin

Previously we have already learned how to intercept locks from the PostgreSQL server log.Let’s now put them in the database and break down what actual errors and performance problems can be caused by a simple analysis of them.
In the logs, we have only 3 types of events that can happen with the blocking :

  • lockout wait
    LOG: process 38162 still waiting for ExclusiveLock on advisory lock[225382138, 225386226, 141586103, 2] after 100.047 ms
  • lockout receipt
    LOG: process 38162 acquired ExclusiveLock on advisory lock [225382138, 225386226, 141586103, 2] after 150.741 ms
  • interlocking
    ERROR: deadlock detected

Let’s exclude deadlocks from the analysis – they are just errors, and try to find out, how much time we lost because of deadlocks in a particular day on a particular host.
First, we need a table where we will collect all such entries :

CREATE TABLE lock(dt -- chronological sectioning keydate, host -- the server where the lock occurreduuid, pid -- process PID from the log stringinteger, ts -- time of eventtimestamp, event -- { lock-wait| lock-acquire| deadlock-detect }lockevent, type-- { relation | extend | .}locktype, mode -- { AccessShare | RowShare | .}lockmode, lock -- lock objectuuid, exectime-- durationnumeric(32, 2));

More in detail About sectional organization in of our monitoring system can be read in the article "Writing in PostgreSQL on Subluminal : 1 host, 1 day, 1TB" , and about different types and modes of locks- in "DBA: Who’s hiding behind the blockchain."

As you hear it, so it is spelled.

Let’s try to answer the question posed at the beginning of the article in the simplest way.
PostgreSQL Antipatterns: Analyzing Blockages - SELF JOIN vs WINDOW
What is the waiting time for a lock? Well, obviously it’s the time it takes to get it for each time it’s waiting :

  • take each waiting case ( lock-wait )
  • for it we find the closest "from below" in time to the receipt record ( lock-acquire ) "of the same" (lock, pid, mode) lock – that is, on the same object, in the same process, with the same mode

Interlock type ( type ) can be omitted in our case, since it is unambiguously determined by the object itself ( lock ).
Next, all that remains is to summarize the results.

SELECTts, pid, event, type, mode, lock, exectime, T.*FROMlock lc, LATERAL (SELECTexectimewaittimeFROMlockWHERE(dt, host, lock, pid, mode, event) = ('2020-06-19'::date, lc.host, lc.lock, lc.pid, lc.mode, 'lock-acquire') ANDts > = lc.tsORDER BYtsLIMIT 1) TWHERE(lc.dt, lc.host, lc.event) = ('2020-06-19'::date, 'e828a54d-7e8a-43dd-b213-30c3201a6d8e'::uuid, 'lock-wait'::lockevent);

It’s all plain and simple! And what does it show us EXPLAIN ?..
PostgreSQL Antipatterns: Analyzing Blockages - SELF JOIN vs WINDOW
Oh… had to read almost 900MB of data and almost all of it – because of the search for the linked record for each lock, not very nice.
But is this query even correct for our problem? No! Let’s take a close look at the collected data :
PostgreSQL Antipatterns: Analyzing Blockages - SELF JOIN vs WINDOW
Oops… It turns out that the server "complains" to the logs about the same unreceived lock sometimes many, many times. Which means that we are have taken into account its waiting time multiples of the number of such entries in the log, which is not at all what we want.

Remember the goal!

Actually, why are we even looking for a related one for each wait record? I mean, we want to know how long the wait took, and it’s directly written in lock-acquire So let’s select only them right away, then there will only be one Index Scan – right?
Almost, but not quite. The thing is. under load, the logging process can skip any of the entries – either about the start of waiting for a lock, or about the fact that it was received :
PostgreSQL Antipatterns: Analyzing Blockages - SELF JOIN vs WINDOW
So is there no way to get all the data we need in one read?

Window Functions: seven in one stroke

To help us, we have window functions
PostgreSQL Antipatterns: Analyzing Blockages - SELF JOIN vs WINDOW
And more specifically chain allocation model in a ready sample from the article "SQL HowTo: assembling "chains" using window functions"
First, let’s understand that the condition for ending a "chain"-that is, a segment of consecutive key (host, lock, pid, mode) lock entries – for us is either the explicit occurrence of event = 'lock-acquire' Or (which is very rare, but happens) new segment locks the same object whose duration ( exectime ) started counting again.
PostgreSQL Antipatterns: Analyzing Blockages - SELF JOIN vs WINDOW
We must also take into account the fact that time can coincide for several log entries, even from one PID. In this case, you have to additionally sort by exectime to get the correct sequence :
PostgreSQL Antipatterns: Analyzing Blockages - SELF JOIN vs WINDOW

-- create a lock end conditionWITH lc AS (SELECT*, CASEWHEN event = 'lock-wait' THENexectime > coalesce(lead(exectime) OVER(PARTITION BY lock, pid, mode ORDER BY ts, exectime), 0) -- "break" wait timeELSE TRUE -- 'lock-acquire' -- lock doneEND cond -- end of "chain"FROMlock lcWHEREevent <> 'deadlock-detect' AND -- exclude all deadlocks(lc.dt, lc.host) = ('2020-06-19'::date, 'e828a54d-7e8a-43dd-b213-30c3201a6d8e'::uuid))-- leaving only the "latest" records - their exectime is the waiting time for the "entire" lockSELECTts, pid, event, type, mode, lock, exectimeFROMlcWHEREcond;

PostgreSQL Antipatterns: Analyzing Blockages - SELF JOIN vs WINDOW
Now we read 8MB total data (100 times less!), slightly reducing the final execution time.
It can be further reduced by creating an index ideally suited to OVER (i.e. including lock, pid, mode, ts, exectime ), getting rid of the Sort -node. But usually field in the index "behind timestamp" should not be done

You may also like