
Example of PostgreSQL table recovery using the new mega feature pg_filedump

by admin

Let me tell you about a cool feature that my colleagues and I at Postgres Pro recently added to the pg_filedump utility. It allows you to partially restore data from a database, even one so severely damaged that a PostgreSQL instance cannot be started with it. Of course, I would like to believe that this functionality is needed very rarely. But just in case, it is nice to have something like it at hand. Read on to find out how this feature works.
Partial data recovery was introduced in commit 52fa0201:
commit 52fa0201f97808d518c64bcb9696f2a350678aa5
Author: Teodor Sigaev <teodor@sigaev.ru>
Date: Tue Jan 17 16:01:12 2017 +0300
Partial data recovery (-D flag).
This feature allows to partially recover data from a given segment file
in format suitable for using in COPY FROM statement. List of supported
data types is currently not full and TOAST is not yet supported, but
it's better than nothing. Hopefully data recovery will be improved in
the future.
Implemented by Aleksander Alekseev, reviewed by Dmitry Ivanov, tested
by Dmitry Ivanov and Grigoriy Smolkin.

Suppose there is a table:

create table tt (x int, y bool, z text, w timestamp);

filled with some data:

insert into tt values(123, true, 'Text test test', now());
insert into tt values(456, null, 'Ololo troolo', null);
checkpoint;

Here I run checkpoint to make sure the data reaches disk. Otherwise it would get into the WAL, but the buffer manager would keep the pages in memory until the tuples (a tuple is a row in a table) are evicted by newer and/or more frequently used ones, or until a checkpoint fires by timeout or by accumulating the maximum amount of WAL. I think the latter is the most common way a page gets synced to disk. — note by Stas Kelvich.
Let's also find out the name of the segment corresponding to the table:

select relfilenode from pg_class where relname = 'tt';

In my case, the relfilenode of the table was 16393. Find this segment (or segments, if the table is larger than 1 GB) on disk:

find /path/to/db/ -type f | grep 16393
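For reference, the path reported by find follows PostgreSQL's on-disk layout: <data directory>/base/<database OID>/<relfilenode>, with each additional 1 GB segment getting a numeric suffix. A minimal sketch of the naming convention, using the OIDs from this example (16384 and 16393 — your values will differ):

```shell
# Construct the expected segment paths for a table, given the
# database OID and the table's relfilenode (values from this example).
PGDATA=/path/to/db
DB_OID=16384        # OID of the database containing the table
RELFILENODE=16393   # from: select relfilenode from pg_class where relname = 'tt';

FIRST_SEGMENT="$PGDATA/base/$DB_OID/$RELFILENODE"
SECOND_SEGMENT="$PGDATA/base/$DB_OID/$RELFILENODE.1"  # only exists once the table exceeds 1 GB

echo "$FIRST_SEGMENT"
echo "$SECOND_SEGMENT"
```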

Let’s copy it somewhere and pretend that this segment file is all we have to recover the data from.
To do this, let’s build the latest version of pg_filedump:

git clone git://git.postgresql.org/git/pg_filedump.git
cd pg_filedump
make

Usually we know the schema of the database, because we have an application that works with it. So we know the column types of the table and can decode the data like this:

./pg_filedump -D int,bool,text,timestamp /path/to/db/base/16384/16393

Example output:

*******************************************************************
* PostgreSQL File/Block Formatted Dump Utility - Version 9.6.0
*
* File: /home/eax/work/postgrespro/postgresql-install/data-master/base/16384/16393
* Options used: -D int,bool,text,timestamp
*
* Dump created on: Tue Jan 17 16:28:07 2017
*******************************************************************

Block    0 ********************************************************
<Header> -----
 Block Offset: 0x00000000         Offsets: Lower      32 (0x0020)
 Block: Size 8192  Version    4            Upper    8080 (0x1f90)
 LSN:  logid      0 recoff 0x0301e4c0      Special  8192 (0x2000)
 Items:    2                      Free Space: 8048
 Checksum: 0x0000  Prune XID: 0x00000000  Flags: 0x0000 ()
 Length (including item array): 32

<Data> ------
 Item   1 -- Length:   56  Offset: 8136 (0x1fc8)  Flags: NORMAL
COPY: 123	t	Text test test	2017-01-17 16:25:03.448488
 Item   2 -- Length:   52  Offset: 8080 (0x1f90)  Flags: NORMAL
COPY: 456	\N	Ololo troolo	\N

*** End of File Encountered. Last Block Read: 0 ***

That is quite a lot of output, because pg_filedump prints information about every page in the segment and decodes the header of every tuple. Fortunately, it is easy to separate the wheat from the chaff, e.g.:

pg_filedump -D ... as before ... | grep COPY | perl -lne 's/^COPY: //g; print;' > /tmp/copy.txt
cat /tmp/copy.txt

Contents of /tmp/copy.txt:

123	t	Text test test	2017-01-17 16:25:03.448488
456	\N	Ololo troolo	\N

This is our table's data in a format suitable for COPY FROM. Let's check:

create table tt2 (x int, y bool, z text, w timestamp);
copy tt2 from '/tmp/copy.txt';
select * from tt2;

Result:

  x  | y |       z        |             w
-----+---+----------------+----------------------------
 123 | t | Text test test | 2017-01-17 16:25:03.448488
 456 |   | Ololo troolo   |
(2 rows)

As you can see, all the data has been successfully recovered.
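The text-processing step above can be tried out even without pg_filedump at hand, since it is plain line filtering. A self-contained sketch using a made-up two-item fragment that mirrors the dump shown earlier (sed here is equivalent to the perl one-liner above):

```shell
# Simulate a fragment of pg_filedump -D output and strip it down to
# COPY-ready lines, exactly as the grep | perl pipeline above does.
cat > /tmp/dump_sample.txt <<'EOF'
 Item   1 -- Length:   56  Offset: 8136 (0x1fc8)  Flags: NORMAL
COPY: 123	t	Text test test	2017-01-17 16:25:03.448488
 Item   2 -- Length:   52  Offset: 8080 (0x1f90)  Flags: NORMAL
COPY: 456	\N	Ololo troolo	\N
EOF

# Keep only the decoded tuples, dropping headers and item metadata.
grep '^COPY' /tmp/dump_sample.txt | sed 's/^COPY: //' > /tmp/copy_sample.txt
cat /tmp/copy_sample.txt
```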
Of course, this was a somewhat simplified example; in practice things are more complicated. First, the list of supported types is currently somewhat limited:

static ParseCallbackTableItem callback_table[] = {
	{ "smallserial", decode_smallint },
	{ "smallint", decode_smallint },
	{ "int", decode_int },
	{ "serial", decode_int },
	{ "bigint", decode_bigint },
	{ "bigserial", decode_bigint },
	{ "time", decode_time },
	{ "timetz", decode_timetz },
	{ "date", decode_date },
	{ "timestamp", decode_timestamp },
	{ "float4", decode_float4 },
	{ "float8", decode_float8 },
	{ "float", decode_float8 },
	{ "bool", decode_bool },
	{ "uuid", decode_uuid },
	{ "macaddr", decode_macaddr },

	/* internally all string types are stored the same way */
	{ "char", decode_string },
	{ "varchar", decode_string },
	{ "text", decode_string },
	{ "json", decode_string },
	{ "xml", decode_string },

	{ NULL, NULL },
};

Second, TOAST is currently not supported. If a string is stored uncompressed or compressed in-place, pg_filedump will recover it successfully (provided the compressed data has not been corrupted). However, if the string has been moved to an external TOAST table, you will get just "(TOASTED)" instead of the string. In principle, TOAST support is not an insurmountable task: you just need to teach pg_filedump to parse the catalog and find the corresponding TOAST table. It's just that no one has done it yet. Perhaps TOAST support will be added in future versions of pg_filedump.
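For the record, finding the TOAST table by hand is not hard while the catalog is still readable: TOAST tables live in the pg_toast schema and are named after the OID of their main table. A minimal sketch of the naming convention (16393 is just this example's value; note that a table's OID and its relfilenode only coincide until the first rewrite):

```shell
# TOAST tables follow the convention pg_toast.pg_toast_<main table OID>.
MAIN_TABLE_OID=16393   # from: select oid from pg_class where relname = 'tt';

TOAST_TABLE="pg_toast.pg_toast_$MAIN_TABLE_OID"
echo "$TOAST_TABLE"
```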
Finally, in practice the database schema sometimes changes: columns appear in and disappear from tables. Dropping columns is not a big problem, because physically the column remains in the tuple; it is just always null. Adding columns is a bit trickier, because it means tuples within the same table can have a variable number of attributes. If the number of attributes in a tuple does not match the number specified by the user, pg_filedump just shows a warning along with the partially decoded data and moves on to the next tuple. In practice this means that parsing pg_filedump's output will be a bit harder, or that you will have to run it several times with different attribute lists.
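One hypothetical way to cope with a mixed attribute count after a schema change is to split the recovered lines by their number of tab-separated fields and load each group with a separate COPY FROM. A sketch under the assumption that the recovered rows are tab-separated, as above (the file names and the extra fifth column are made up for illustration):

```shell
# Route recovered rows into per-arity files: rows with 4 fields go to
# /tmp/copy.4.txt, rows with 5 fields to /tmp/copy.5.txt, and so on.
printf '123\tt\tText test test\t2017-01-17 16:25:03.448488\n456\t\\N\tOlolo troolo\t\\N\t\\N\n' > /tmp/copy_mixed.txt

rm -f /tmp/copy.4.txt /tmp/copy.5.txt
awk -F'\t' '{ print > ("/tmp/copy." NF ".txt") }' /tmp/copy_mixed.txt

wc -l /tmp/copy.4.txt /tmp/copy.5.txt
```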
In my opinion (and not only mine), as a last-resort data recovery tool this is better than nothing 🙂 If you have any ideas for further improving this functionality, or any comments and additions in general, I will be very interested to read them in the comments!
You may also be interested in:

Continued – Another new pg_filedump feature: restore PostgreSQL directory
