Join gathers keys from multiple tables, and then uses those keys to perform a target action.
< schema.json denorm create-join > output.sql
For a full working example, see Join example.
See Join config for documentation generated from JSONSchema.
consistency
There are two modes. The default is immedidate.
"immediate"
The target is updated at the end of the statement.
"deferred"
The target is updated at the end of the transaction.
Deferring work involves overhead. It can be useful in a couple cases
- Deduplicating work. For example, if multiple "levels" of records (e.g. a record and children and grandchildren) are affected in the same transaction.
- Reducing lock duraton on target records.
Context propogates PostgreSQL settings through async joins.
Recall that custom settings must be two-part dot-delimited. See Customized Options.
id
The ID is used to name database objects. To prevent naming conflicts, the ID should be unique within a schema.
The target primary key is specified by key
, which is an array of the column
names and types.
[{ "name": "id", "type": "bigint" }]
If targetTarget.key
is specified, key
can be ommitted, and inferred from
that.
lock
Whether to use a lock table.
Using upserts in REPEATABLE READ
transactions and multi-table joins can be
susceptible to ordering conditions.
To prevent these, denorm can use a value lock table on the target key.
schema
If specified, created generated objects in the this schema. If not specified, objects are created and referenced without schema qualifiers.
tables
A map of IDs to source table definitions.
targetQuery
This query the gathered keys. Placeholders take the form of ${key}
. Literal
$
characters are escaped as $$
.
Placeholders:
key
- table (or parenthesized table expression) of keystable
- table name, or empty if deferred
"SELECT * FROM ${key}"
targetTable
See Target table.
The target table that will receive changes.
columns
The column names, in the same order as returned by the target query.
key
The column names of the unique key of the table. Used for asynchronous joins.
schema
The schema name of the table. If not specified, the table is referenced without qualification.
table
The table name.
The source table from which changes will be propogated.
Either the target key (direct relationship with the target) or the join (transitive relationship with the target) must be specified.
The 16-bit advisory lock prefix to use for queueing, if the join is asynchronous. By default, the lock space is generated from the ID and table ID.
The base lock ID is in the comment on the lock
column.
join
The ID of the table to join to.
joinMode
There are two modes for joining tables
"sync"
Join to all dependency records in the current transaction.
"async"
If tables have an 1:N relationship with a very large N — say, tens of thousands — it may not be feasible to process all records a single transaction. Denorm allows the join to happen over multiple transactions.
Use columns
for the relevant data on the table, and joinKey
to indicate a
unique key on the foreign table. These are used to track the iteration state.
(See comments in Performance section).
Building on the book example, suppose each book had a genre, and the genre's
name is to be included in the target table. A genre may have hundreds of
thousands of books, so we'll chunk updates to genre
by iterating through
related book
records by id
.
book_full.yml
tables:
author:
join: author.id = book_author.author_id
joinDep: book_author
name: book_author
schema: public
book:
columns:
- name: id
name: book
schema: public
targetKey: [book.id]
book_author:
name: book_author
schema: public
targetKey: [book_author.book_id]
genre:
join: book
joinMode: async
joinOn: book.genre_id = genre.id
joinKey: [id]
key: [id]
name: genre
schema: public
See additional comments in Asynchronous joins.
joinOn
The conditional expression for joining.
joinOther
Expressions to add to the joins, before joining to this table. This can add extra context. Care should be taken to ensure that tables referenced here are tracked and monitored for changes elsewhere.
table
The table name. If not specified, the table is a "pseudo table." This is usefull for asynchronous backfills.
targetKey
SQL expressions for the target key.
schema
The schema name. If unspecified, the table is referenced without schema qualification.
For asynchronous joins, updates will not automatically affect the target table.
Instead, the state of the join is tracked in a table (e.g.
book_full__que__genre
) and must be processed by a worker.
-- Find an incomplete author change and refresh the target for up to 1000 corresponding
-- book_author records.
-- Return whether additional processing remains to be done.
SELECT book_full__pcs__genre(1000);
This function should be called periodically. For lower latency updates, workers
can listen to the public.book_full__que__genre
topic which notified whenever a
join requires processing.
Errors in updating the target no longer fail the original transaction. Ensure that the query does not have errors, else they will halt asynchronous updates.
The dependency table should have an btree index that covers the foreign key and its own unique key, in that order.
In the earlier example, that index would be:
CREATE INDEX ON book (genre_id, id);
Take careful note of this requirement, as such indices do not usually include the second part (a unique key on the table itself). However, this is essential for good performance as it allows the join to continue where it left off, without unnecessary scans.
Denorm can be leveraged to create an asynchronous fill of the entire table.
Add a tables entry (suggested name: all) with join
, joinMode: async
, and
joinKey
.
book_full.yml
tables:
all:
join: book
joinMode: async
joinKey: [id]
author:
join: book_author
joinOn: author.id = book_author.author_id
name: author
book:
name: book
targetKey: [book.id]
book_author:
name: book_author
targetKey: [book_author.book_id]
name: book_author
Then run
SELECT test__rfs__all();
and after successive test__pcs__all()
, the table will be backfilled/refreshed.