Home PostgreSQL Accessing tables from C extensions for Postgres

Accessing tables from C extensions for Postgres

by admin

Accessing tables from C extensions for Postgres

Hello everyone

This time I’m not talking about using Python or another CSS/HTML trick, and, alas, not about how I spent 5 years porting Vangers but about one important aspect of writing extensions for the wonderful PostgresSQL DBMS.
In fact, there are already quite a few articles on how to write a C extension for Postgres (for example, this one ), including the in Russian But, as a rule, they describe rather simple cases. In these articles and instructions, the authors implement functions that receive input data, process it somehow, and then return a single number, string, or user type. They don’t explain what to do if you need to run a regular table that exists in the database, or an index, from C code.
The C tables can be accessed through the well described but slow SPI (Server Programming Interface) There is also a very complicated way, through buffers and I will try to tell you about the compromise. I’ve tried to give you code examples with detailed explanations.


I assume you have already written your simple functions, and you have seen that they are declared in a clever way :


and you can’t just call a function like that. Let’s just take the function to_regclass :

Datummy_index_oid_datum = DirectFunctionCall1(to_regclass, CStringGetDatum("my_index"));Oidmy_index_oid = DatumGetObjectId(my_index_oid_datum);

In this code I call the function to_regclass , using a macro to convert the name of a database object (index, table, etc.) to its Oid (a unique number in the directory). This function has only one argument, so the macro with the telling name DirectFunctionCall1 has one at the end. In the file include/fmgr.h macros up to 9 arguments are declared. The arguments themselves are always of the universal type Datum , which is why the string "my_index" is cast to Datum by means of the function CStringGetDatum Postgres functions in principle communicate by means of Datum , so the result of our macro will be a value of type Datum After that we need to convert it to the type Oid using the macro DatumGetObjectId All possible conversions can be found here : include/postgres.h
I will also explain one more thing: in C it is common practice to declare variables at the beginning of a block, but I declare them where I start using them for clarity. In practice, you don’t write them that way.


Let me explain right away why SPI is slow. The point is that a request made with SPI goes through all the steps of parsing and planning. Besides, going the easy way, where there is no magic, does not seem interesting to me.
The next thing I want to say about names is that in Postgres they are confusing! Because of the long history of the project, there are a lot of strange names for types, methods and functions left in the code.
Before reading on, it’s a good idea to have a basic understanding of the Postgres MVCC. All the examples below only work in an existing transaction and you’ll need a lot more magic if you find yourself in a place where you don’t already have one.
So, let’s say we just want to run through a table which contains two fields : int id and text nickname, and output them in the log. First, we need to open a heap (table) with a certain locking :

RangeVar*table_rv = makeRangeVar("public", "my_table", -1);Relationtable_heap = heap_openrv(table_rv, AccessShareLock);

Instead of a function heap_openrv you can use heap_open which has the first argument of Oid table (you can get it with the function in the first part of the article). I think the purpose of RangeVar is intuitively clear, but let’s dwell on the locks. The lock types are declared in the file include/storage/lockdefs.h with fairly self-explanatory comments. You can see this information in table :

AccessShareLock SELECT
ShareRowExclusiveLock like EXCLUSIVE MODE, but allows ROW SHARE
AccessExclusiveLock ALTER TABLE, DROP TABLE, VACUUM FULL, and unqualified LOCK TABLE

Since we only wanted to run through the plate, that is to execute, SeqScan , we select AccessShareLock After we have opened the heap, we need to initialize the table scan process :

HeapScanDescheapScan = heap_beginscan(sr_plans_heap, SnapshotSelf, 0, (ScanKey) NULL);

As expected, we give our heap as the first argument to this function, but SnapshotSelf requires some explanation. The way MVCC works in Postgres assumes that there can be multiple versions of the same table row at any given time, and it is the snapshot (transaction snapshot) that records which ones we can see and which ones we cannot. Except for SnapshotSelf , i.e. the current snapshot of the transaction, there is, for example, SnapshotAny which, by substituting it, we could also see all deleted and changed tuples (table rows). You can see the other views in the include/utils/tqual.h The following arguments to heap_beginscanare the number of search keys (ScanKey) and the keys themselves. The search keys (ScanKey) are essentially conditions, i.e., what you write in WHERE You don’t really need search keys to work with heap, because you can always check the conditions yourself in your code. But for index search we can’t do without their initialization and use.
And now most importantly, appear cycle :

Datum values[2];bool nulls[2];for (;;){HeapTuple local_tuple;local_tuple = heap_getnext(heapScan, ForwardScanDirection);if (local_tuple == NULL)break;heap_deform_tuple(local_tuple, table_heap-> rd_att, values, nulls);elog(WARNING, "Test id:%i nick:%s", DatumGetInt32(values[0]), TextDatumGetCString(values[1]));}

In this loop we call the function heap_getnext which will get the next tuple until we get a null pointer back. The function heap_getnext gets our HeapScanDesc and directions of scanning, two will be relevant for us : this direct scan – ForwardScanDirection and the reverse – BackwardScanDirection Now we have to decompress the tuple and access its fields, to do that we call heap_deform_tuple where we pass our tuple, then its description (which we take from the heap) and two arrays (one for the values and the other for the NULL values). Then, using functions we are already familiar with, we convert the elements of the array values (consisting of Datum ) to regular C types.
And now don’t forget to close our heap scan and close the heap itself:

heap_endscan(heapScan);heap_close(sr_plans_heap, AccessShareLock);

We close the heap with the same type of lock that we opened it with.

Working with the index

The index search API will be similar to the heap search, but will require more lines of code to initialize. In the code we’ll try to output messages only for strings where the first argument gives the answer to the main question of life, the universe and everything. As for heap, first of all here is a bit of code with all the preparatory work:

RangeVar *table_rv = makeRangeVar("public", "my_table", -1);Relation table_heap = heap_openrv(table_rv, AccessShareLock);table_idx_oid = DatumGetObjectId(DirectFunctionCall1(to_regclass, StringGetDatum("my_table_idx")));Relation table_idx_rel = index_open(table_idx_oid, AccessShareLock);indexScan = index_beginscan(table_heap, table_idx_rel, SnapshotSelf, 1, 0);ScanKeyDatakey;ScanKeyInit(key, 1, BTEqualStrategyNumber, F_INT4EQ, Int32GetDatum(42));index_rescan(indexScan, key, 1, NULL, 0);

So, to search for an index, just like in the last example, we need to open the heap (table) to which the index is linked. With to_regclass we find the oid for our index my_table_idx, then we open it with index_open , finding the one we want Relation Then we initialize the index scanning process index_beginscan , here the main difference from heap_beginscan is that we will have 1 search key (ScanKey).
ScanKeyInit , as the name implies, initializes the key to be searched. The first argument is the key itself (type ScanKeyData ), then we specify a serial number of the argument, which will be used to find the key (numbering from 1), then comes the search strategy. This is actually similar to the operator in the condition (you can see the rest of the strategies here include/access/startnum.h ), after that we directly specify the oid of the function that will perform our comparison operation (these oid are declared in the file include/utils/fmgroids.h ). Finally, our last argument is Datum which should contain the value we want to search for.
Next comes another new function index_rescan , and serves to start an index search. One index_beginscan is not enough here. This function takes as input a list of keys (we have only one), the number of those keys, then the keys to sort and the number of keys to sort (they are used for the condition ORDER BY which is not present in this example). All preparations are done and we can show the main loop, but it will be very similar to heap:

for (;;){HeapTuple local_tuple;ItemPointer tid = index_getnext_tid(indexScan, ForwardScanDirection);if (tid == NULL)break;local_tuple = index_fetch_heap(indexScan);heap_deform_tuple(local_tuple, table_heap-> rd_att, values, nulls);elog(WARNING, "Test id:%i nick:%s", DatumGetInt32(values[0]), DatumGetCString(PG_DETOAST_DATUM(values[1])));}

Since we are now running on an index and not on the heap itself, we get ItemPointer, a special pointer to an entry in the index, (if you are interested in the details, see the corresponding documentation www.postgresql.org/docs/9.5/static/storage-page-layout.html or directly to the file include/storage/bufpage.h ), using which we also have to get the tuple from the heap.In this loop index_getnext_tid is functionally similar to heap_getnext and only index_fetch_heap is added, and the rest is completely similar.
To finish our operation, as you can guess, we need to close the index search, the index itself and the open heap:

index_endscan(indexScan);index_close(table_idx_rel, heap_lock);heap_close(table_heap, heap_lock);

Changing data

So, we learned how to do SeqScan and IndexScan, i.e.to search our table and even use an index to do it, but now how do we add something to it? To do this, we will need the following functions simple_heap_insert and index_insert
Before you change anything in the table and in the linked indexes, you have to open them with the necessary locks, in the way shown earlier, then you can paste :

values[0] = Int32GetDatum(42);values[1] = CStringGetDatum("First q");tuple = heap_form_tuple(table_heap-> rd_att, values, nulls);simple_heap_insert(table_heap, tuple);index_insert(table_idx_rel, values, nulls, (tuple-> t_self), table_heap, UNIQUE_CHECK_NO);

Here we do the reverse operation, i.e. we form a tuple from the values and nulls arrays,
and then we add it to the heap and then add the corresponding record to the index. After the previous explanations this code should be clear to you.
To update the tuple you need to do the following :

values[0] = Int32GetDatum(42);replaces[0] = true;newtuple = heap_modify_tuple(local_tuple, RelationGetDescr(table_heap), values, nulls, replaces);simple_heap_update(table_heap, newtuple-> t_self, newtuple);

We get an array of boolean variables replaces, which stores information about which field has changed. Then we form a new tuple based on the old one, but with our modifications with heap_modify_tuple And at the very end, we do the update itself simple_heap_update Since we have a new tuple and the old one is marked as deleted, we also need to add an entry to the index for the new tuple in the way shown earlier.
Now it is easy to delete tuple with the function simple_heap_delete and it is self-explanatory. It should be noted that it is not necessary to delete the corresponding record in the index, this will happen automatically when you do the cleanup operation VACUUM


We have learned how to access a table from the C code, including the index. I tried to describe each function and its purpose in detail, but, if something was not clear, please ask in the comments, I will try to answer and supplement the article.

You may also like