Foreign Data Wrappers for Data Synchronization

Posted on October 2, 2018

The standard way of copying databases (or just tables) between PostgreSQL servers seems to be through backups (with their many options). This article describes another approach that I have been using lately, based on foreign data wrappers. The two servers need to be able to connect to each other (albeit only during synchronization), but the method does not need shell access to either of them and avoids generating intermediate files. The steps involved are:

  1. Install the foreign data wrapper extension for your database (just once in the target server).
  2. Set up the foreign server connection in the target server, pointing to the source server.
  3. Set up a user mapping.
  4. Import the foreign tables (or schema) into a “proxy” schema.
  5. Create materialized views for the desired tables.

Install the Foreign Data Wrapper

Foreign Data Wrappers are a mechanism that allows presenting external data sources as PostgreSQL tables. Note that they are not limited to foreign PostgreSQL databases: there are foreign data wrappers for other DB servers and other sources of data, including CSV files and even Twitter streams. Once the data is exposed in Postgres, all the power of SQL becomes available for your data, so they are quite a feature for data management and integration.

FDWs are installed as extensions, one for each kind of data source. Of course, the standard distribution includes an FDW for connecting to external PostgreSQL databases:

CREATE EXTENSION postgres_fdw;
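
You can double check that the extension is actually installed by querying the catalog (a quick sanity check, nothing more):

SELECT extname, extversion FROM pg_extension WHERE extname = 'postgres_fdw';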

Creating the server

Once the extension is installed, the remote server needs to be set up:

CREATE SERVER remote_server
  FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (host 'host_or_ip', dbname 'db_name');
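
Other connection options, such as the port, can be specified as well if the source server does not listen on the default one. A minimal sketch, with placeholder values:

CREATE SERVER remote_server
  FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (host 'host_or_ip', port '5433', dbname 'db_name');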

Creating the user mapping

In order to allow for greater flexibility in terms of permissions, the remote server needs a user mapping, which will map users between the source and target servers. For every mapped user, the following statement should be used:

CREATE USER MAPPING FOR postgres SERVER remote_server
    OPTIONS (password 'pwd', "user" 'postgres');
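
If a non-superuser role is also going to query the foreign tables, it needs its own mapping plus usage rights on the foreign server. A sketch, where report_user is a hypothetical local role and remote_role a hypothetical role on the source server:

CREATE USER MAPPING FOR report_user SERVER remote_server
    OPTIONS (password 'pwd', "user" 'remote_role');
GRANT USAGE ON FOREIGN SERVER remote_server TO report_user;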

Create the foreign tables

Once the database and server are linked, we can start creating the foreign tables. Notice that foreign tables are just “proxies” for the external tables (think of them as symlinks in the file system or pointers in a programming language). That means creating them is just a matter of defining their structure; no data is transferred, and hence it should be fast. The downside is that the definition of the foreign tables has to be written on the target server (much like writing the table create script).

In order to make the process easier, PostgreSQL has a command that imports the foreign table definitions automatically:

IMPORT FOREIGN SCHEMA source_schema FROM SERVER remote_server INTO proxy_schema;
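
Note that the schema that receives the foreign tables has to exist on the target server beforehand, so the import may need to be preceded by something like:

CREATE SCHEMA IF NOT EXISTS proxy_schema;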

If you only want to import some of the tables in the schema, you can use:

IMPORT FOREIGN SCHEMA public LIMIT TO ( table1, table2 )
FROM SERVER remote_server INTO proxy_schema;
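
The opposite filter also exists: to bring in everything but a few tables, IMPORT FOREIGN SCHEMA accepts an EXCEPT list (big_log_table is just a made-up name here):

IMPORT FOREIGN SCHEMA public EXCEPT ( big_log_table )
FROM SERVER remote_server INTO proxy_schema;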

Just refer to the IMPORT FOREIGN SCHEMA documentation for other options.

You can verify that the tables have been imported by typing \det inside the psql CLI.

Instantiate materialized views

As stated before, foreign tables are just a proxy for the real data. In order to be able to work independently of the source server, the actual data needs to be copied. The easiest way to do so while keeping the ability to update the data later is through materialized views. You can think of them as new tables with a refresh mechanism. In particular, that means that the original indices over the data will be lost, so new indices should be created.

CREATE MATERIALIZED VIEW view_name AS SELECT * FROM proxy_schema.table_name;
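
Recreating the indices is then just a matter of regular CREATE INDEX statements on the view. A sketch, assuming a hypothetical id column that identifies rows uniquely (a unique index also makes concurrent refreshes possible, as shown below):

CREATE UNIQUE INDEX idx_view_name_id ON view_name (id);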

Whenever the data needs to be refreshed, just:

REFRESH MATERIALIZED VIEW view_name;
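
By default the refresh locks the view against reads while it runs. If the view has a unique index (like the one sketched above), it can be refreshed without blocking readers, at the cost of taking longer:

REFRESH MATERIALIZED VIEW CONCURRENTLY view_name;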

In the command-line client

The following commands might be useful if you use the psql client:

  • \det <pattern> lists foreign tables
  • \des <pattern> lists foreign servers
  • \deu <pattern> lists user mappings
  • \dew <pattern> lists foreign-data wrappers
  • \dm <pattern> lists materialized views
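
The pattern argument is optional and takes the usual psql wildcards, so listing only the foreign tables inside the proxy schema would look like:

\det proxy_schema.*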

Helper function

Depending on how many tables you wish to import, something along the lines of the following anonymous code block might be useful. It just creates the materialized views and indices, but it can be adapted to whatever is needed.

DO $$
DECLARE r record;
BEGIN
    FOR r IN SELECT tname FROM (VALUES 
            ('table1'),
            ('table2'), 
            ('...'), 
            ('tableN')) AS x(tname)
    LOOP
        -- String constants separated only by whitespace with a newline are concatenated by SQL
        EXECUTE format('CREATE MATERIALIZED VIEW IF NOT EXISTS %s AS '
            'SELECT * FROM proxy_schema.%s',
             r.tname, r.tname);
        -- Index by geometry (PostGIS), just an example
        EXECUTE format('CREATE INDEX IF NOT EXISTS sidx_%s ON %s USING GIST (geom)',
             r.tname, r.tname);
    END LOOP;
END$$;
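
In the same spirit, refreshing everything afterwards can also be scripted by iterating over the pg_matviews catalog view. A sketch that refreshes every materialized view in the database (filter by schemaname if that is too broad):

DO $$
DECLARE r record;
BEGIN
    FOR r IN SELECT schemaname, matviewname FROM pg_matviews
    LOOP
        -- Refresh each materialized view found in the catalog
        EXECUTE format('REFRESH MATERIALIZED VIEW %I.%I', r.schemaname, r.matviewname);
    END LOOP;
END$$;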

Conclusion and final warnings

This method can be convenient if data has to be synced frequently, as it just boils down to refreshing the materialized views. This can also be done from within other PL/pgSQL functions, and needs no external tools (apart from a client) or intermediate files.

It is not suitable for every situation, though. In particular, the data in materialized views is not backed up (pg_dump generates the view create script and a REFRESH statement). That means that if the original server is unavailable at restore time, the data will be lost. This can be avoided by using regular tables populated from the proxy tables instead of materialized views.
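
For completeness, a sketch of that alternative: an ordinary table created from the proxy table and re-filled on every sync (table_name is a placeholder, and the truncate/insert pair would normally go inside a transaction):

CREATE TABLE table_name AS SELECT * FROM proxy_schema.table_name;

-- On every sync, replace the local contents with fresh data from the source
TRUNCATE table_name;
INSERT INTO table_name SELECT * FROM proxy_schema.table_name;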