Note: most of the content of this article is already in this blog post, which is far more detailed. I’m just writing this for people who want to get directly to the solution and do not care for Perl code.
As its name suggests, it is something that combines the functionality of an insert and an update.
bag data structure).
Well, first thing is to check your PostgreSQL version. Versions 9.5 will have this functionality built-in (as many other RDBMS do) through the use of the
INSERT ... ON CONFLICT clause. See the discussion.
At the time of writing this, 9.5 is not available, and anyway many of us are forced to work with legacy versions, so here I will discuss a couple of ways to achive this functionality. Notice that geting this right is not easy, since concurrency and transaction issues must be taken into account. In particular, doing this in your code:
Will not work even inside a transaction. The default isolation level for transactions will only guarantee that other connections do not see your intermediate changes and that all of your changes succed or fail at once. (That is, the A(tomic) and I(solated) in ACID). If two different transactions query the db concurrently for a new ID, they might both get the same ID, and the transaction will fail. Setting the transaction level to SERIALIZABLE would work (think of transactions running inside a mutual exclusion region), but the application code should be ready to retry the transaction.
So if you want a general solution that works for all of the 3 cases stated previously, you should probably write a stored procedure in the DB like the documentation suggests:
CREATE TABLE db (a INT PRIMARY KEY, b TEXT);
CREATE FUNCTION merge_db(key INT, data TEXT) RETURNS VOID AS
-- first try to update the key
UPDATE db SET b = data WHERE a = key;
IF found THEN
-- not there, so try to insert the key
-- if someone else inserts the same key concurrently,
-- we could get a unique-key failure
INSERT INTO db(a,b) VALUES (key, data);
EXCEPTION WHEN unique_violation THEN
-- Do nothing, and loop to try the UPDATE again.
SELECT merge_db(1, 'david');
SELECT merge_db(1, 'dennis');
This solution has the advantage of delegating db-specific code into the DB and simplifying the SQL queries on the application side.
If you know in advance that your process will not run concurrently, then all of the above can be greatly simplified. For example, if just want to make sure that a record exists in the db (and not fail with a duplicate key entry the second time), a combination of an INSERT and a SELECT will work, making use of the fact that INSERTS will accept a sub-SELECT instead of a VALUES clause:
… that is more concise, but not that different from checking the DB state inside the application code.