In coding, introspection refers to the ability of some systems to query and expose information on their own structure. Typical examples are being able to query an object’s methods or properties (eg. Python’s
In a DB system, it typically refers to the mechanism by which schema information regarding tables, attributes, foreign keys, indices, data types, etc… can be programmatically queried.
This is useful in many ways, eg:
In this article we will explore some options for making use of the introspection capabilities of PostgreSQL.
There are two main devices to query information of the objects defined in a Postgres database. The first one is the information schema, which is defined in the SQL standard and thus expected to be portable and remain stable, but cannot provide information about posgres-specific features. As with many aspects of the SQL standard, there are vendor-specific issues (most notably Oracle does not implement it out of the box). If you are using introspection as a part of a library, and do not need to get into postgres-specific information this approach gives you a better chance for future compatibility accross RDBMS and even PostgreSQL versions.
The other approach involves querying the so called System Catalogs. These are tables belonging to the
pg_catalog schema. For example, the
pg_catalog.pg_class (pseudo-)table catalogs tables and most everything else that has columns or is otherwise similar to a table (views, materialized or not…). This approach is version dependent, but I would be surprised to see major changes in the near future.
This is the approach we will be focusing on in this article, because the tooling and coding ergonomics from PostgreSQL are more convenient, as you will see in the nexts sections.
psql command-line client is a very powerful and often overlooked utility (as many other command_-line tools). Typing
\? after connecting will show a plethora of commands that let you inspect the DB. What most people do not know, though, is that these commands are implemented as regular SQL queries to the system catalogs and that you can actually see the code just by invoking the
psql client with the
-E option. For example:
PGPASSWORD=<password> psql -E -U <user> -h <host> <db>
And then typing for the description of the
pg__catalog.pg_class table itself:
********* QUERY ********** SELECT n.nspname as "Schema", c.relname as "Name", CASE c.relkind WHEN 'r' THEN 'table' WHEN 'v' THEN 'view' WHEN 'm' THEN 'materialized view' WHEN 'i' THEN 'index' WHEN 'S' THEN 'sequence' WHEN 's' THEN 'special' WHEN 'f' THEN 'foreign table' WHEN 'p' THEN 'partitioned table' WHEN 'I' THEN 'partitioned index' END as "Type", pg_catalog.pg_get_userbyid(c.relowner) as "Owner", pg_catalog.pg_size_pretty(pg_catalog.pg_table_size(c.oid)) as "Size", pg_catalog.obj_description(c.oid, 'pg_class') as "Description" FROM pg_catalog.pg_class c LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace WHERE c.relkind IN ('r','p','s','') AND n.nspname !~ '^pg_toast' AND c.relname OPERATOR(pg_catalog.~) '^(pg_class)$' AND n.nspname OPERATOR(pg_catalog.~) '^(pg_catalog)$' ORDER BY 1,2; ************************** List of relations Schema | Name | Type | Owner | Size | Description ------------|----------|-------|----------|--------|------------- pg_catalog | pg_class | table | postgres | 136 kB | (1 row)
Gives you a quite descriptive (and corner-case complete) template to start your own code from. For example, in the former query we could replace the
^(pg_class)$ regex with some other. Bear in mind that this trick is only helpful with the system catalog approach.
Many objects in the system catalogs have some sort of “unique id” in the form of an
oid attribute. It is sometimes convenient to know that you can turn descriptive names into such
oids by casting into the
regclass data type.
For example, in a somewhat circular turn of events, the attributes of the catalog table storing attribute information can be queried by name as:
In the result of that query, we can see that attrelid should be an
attnum | attname | Type -----------|---------------|----------- 1 | attrelid | oid 2 | attname | name ... 20 | attoptions | text 21 | attfdwoptions | text
without the “regclass” cast, querying by name would mean joining with the
pg_class and filtering by name. There are other types that will get you an oid from a string description for other objects (
regprocedure for procedures,
regtype for types, …).
Another interesting utility for the
pg_catalog approach is the ability to translate definitions into SQL DDL. We saw one of them (
format_type) in the previous example, but there are many of them (constraints, function source code …).
Just refer to the section in the manual for more.
As a sidenote, it might be useful to know that we can inspect the data types of any provided query by pretending to turn it into a temporary table. This might be useful for user-provided queries in external tools (injection caveats apply)…
As usual, good SW practices apply to DB code, too, and it is easy to isolate any incompatible code just by defining a clear interface in your library: instead of querying for the catalog everywhere, define just a set of views or functions that expose the introspection information to the rest of your code and work as an API. This way, any future change in system catalogs will not propagate further than those specific views. For example, if your application needs to know about tables and attribute data types, instead of querying the catalog from many places, define a view that works as in interface between the system catalogs and your code. As an example:
CREATE OR REPLACE VIEW table_columns AS WITH table_oids AS ( SELECT c.relname, c.oid FROM pg_catalog.pg_class c LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace WHERE pg_catalog.pg_table_is_visible(c.oid) AND relkind = 'r'), column_types AS ( SELECT toids.relname AS "tablename", a.attname as "column", pg_catalog.format_type(a.atttypid, a.atttypmod) as "datatype" FROM pg_catalog.pg_attribute a, table_oids toids WHERE a.attnum > 0 AND NOT a.attisdropped AND a.attrelid = toids.oid) SELECT * FROM column_types;
I will be assembling some such utility views I find useful in the future in this gist.
UPDATE Dec. 15th 2022: For any real use case, check syonfox’s solution (see comments) documented here. It is way more powerful than my solution above, which I’ll only leave here just to keep things simple in this article.