The standard way of copying databases (or just tables) between PostgreSQL servers seems to be through backups (with their many options). This article describes another way that I have been using lately: foreign data wrappers. The two servers need to be able to connect to each other (albeit only during the synchronization), but no shell access to either of them is required and no intermediate files are generated. The steps involved are described below.
Foreign Data Wrappers are a mechanism that allows presenting external data sources as PostgreSQL tables. Note that they are not limited to foreign PostgreSQL databases: there are foreign data wrappers for other DB servers and other sources of data, including CSV files and even Twitter streams. Once the data is presented in Postgres, all the power of SQL becomes available for it, so FDWs are quite a feature for data management and integration.
FDWs are installed as extensions, one for each kind of data source. Naturally, the FDW for connecting to external PostgreSQL databases ships with the standard distribution:
CREATE EXTENSION postgres_fdw;
Once the extension is installed, the remote server needs to be set up:
CREATE SERVER remote_server FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host 'host_or_ip', dbname 'db_name');
In order to allow for greater flexibility in terms of permissions, the remote server needs a user mapping, which maps users between the source and target servers. For every mapped user, a statement like the following should be used:
CREATE USER MAPPING FOR postgres SERVER remote_server OPTIONS (password 'pwd', "user" 'postgres');
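If a mapped user is not a superuser, it also needs permission to use the foreign server. A minimal sketch, where app_user is a hypothetical non-superuser role:

GRANT USAGE ON FOREIGN SERVER remote_server TO app_user;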
Once the database and server are linked, we can start creating the foreign tables. Notice that foreign tables are just “proxies” for the external tables (think of them as symlinks in the file system or pointers in a programming language). That means creating them is just a matter of defining their structure; no data is transferred, so it should be fast. The downside is that the definition of each foreign table has to be written on the target server (much like writing the table creation script).
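For reference, manually defining a single foreign table looks like this (the column list and names are just placeholders, with remote_server being the server created above):

CREATE FOREIGN TABLE proxy_schema.table_name (
    id integer,
    name text
) SERVER remote_server OPTIONS (schema_name 'public', table_name 'table_name');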
In order to make the process easier, PostgreSQL (9.5 and later) has a command that imports the whole foreign structure in one go:
IMPORT FOREIGN SCHEMA source_schema FROM SERVER source_server INTO proxy_schema;
If you just want to import some tables of the schema, you can use:
IMPORT FOREIGN SCHEMA public LIMIT TO ( table1, table2 ) FROM SERVER source_server INTO proxy_schema;
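Conversely, an EXCEPT list imports everything in the schema except the listed tables (big_table here is just a placeholder name):

IMPORT FOREIGN SCHEMA public EXCEPT ( big_table ) FROM SERVER source_server INTO proxy_schema;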
Just refer to the documentation for other options.
You can verify that the tables have been imported by typing \det inside the psql client.
As stated before, foreign tables are just a proxy for the real data. In order to be able to work independently of the source server, the actual data needs to be copied. The easiest way to do so, while retaining the ability to refresh the data later, is through materialized views. You can think of them as new tables with a refresh mechanism. In particular, that means that the indexes defined on the original tables are lost, so new indexes should be created.
CREATE MATERIALIZED VIEW view_name AS SELECT * FROM proxy_schema.table_name;
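If the view should be created first and populated later (say, during a scheduled sync window), it can be created empty:

CREATE MATERIALIZED VIEW view_name AS SELECT * FROM proxy_schema.table_name WITH NO DATA;

Note that an unpopulated materialized view cannot be queried until it has been refreshed at least once.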
Whenever the data needs to be refreshed, just:
REFRESH MATERIALIZED VIEW view_name;
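If the view must stay readable while it is being refreshed, PostgreSQL (9.4 and later) can refresh it concurrently. This requires a unique index on the materialized view; the id column here is an assumption about the table:

CREATE UNIQUE INDEX uidx_view_name ON view_name (id);
REFRESH MATERIALIZED VIEW CONCURRENTLY view_name;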
The following commands might be useful if you use the psql client:
\det <pattern>   lists foreign tables
\des <pattern>   lists foreign servers
\deu <pattern>   lists user mappings
\dew <pattern>   lists foreign-data wrappers
\dm <pattern>    lists materialized views
Depending on how many tables you wish to import, something along the lines of the following anonymous code block might be useful. It just creates the materialized views and indexes, but it can be adapted to whatever is needed.
DO $$
DECLARE
    r record;
BEGIN
    FOR r IN SELECT tname FROM (VALUES ('table1'), ('table2'), ('...'), ('tableN')) AS x(tname)
    LOOP
        -- SQL automatically concatenates adjacent string literals
        -- separated by a line break
        EXECUTE format('CREATE MATERIALIZED VIEW IF NOT EXISTS %I AS '
                       'SELECT * FROM proxy_schema.%I', r.tname, r.tname);
        -- Index by geometry (PostGIS), just an example
        EXECUTE format('CREATE INDEX IF NOT EXISTS %I ON %I USING GIST (geom)',
                       'sidx_' || r.tname, r.tname);
    END LOOP;
END$$;
This method can be convenient if the data has to be synced frequently, as syncing boils down to refreshing the materialized views. It can also be used from within other PL/pgSQL functions, and needs no external tools (apart from a client) or intermediate files.
It is not for every situation, though. In particular, the data in materialized views is not backed up (pg_dump generates the view creation script plus a REFRESH MATERIALIZED VIEW command). That means that if the original server is unavailable at restore time, the data will be lost. This can be avoided by copying the data into regular tables instead of using materialized views.
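A minimal sketch of that alternative: copy the proxy data into a plain table, which pg_dump backs up normally (the table names are placeholders):

CREATE TABLE local_table AS SELECT * FROM proxy_schema.table_name;
-- Later, to re-sync from the source server:
TRUNCATE local_table;
INSERT INTO local_table SELECT * FROM proxy_schema.table_name;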