[WIP] - Partitioning support #78

ggam · 2024-05-01T11:29:46Z

resolves #168

Problem

This is a draft PR adding partitioning support for Postgres as stated on #168. It doesn't use dbt coding best practices and lacks, but it works. I'm opening it to gather feedback on how to proceed.

Syntax is copied from on dbt-bigquery partitioning:

{{ config(
    materialized = 'table',
    partition_by = {
      "field": "created_at",
      "granularity": "month"
    }
) }}

select generate_series(current_date - interval '1000 day', current_date, '1 month'::interval)::date as created_at,
        'hello' as dummy_text

Benefits this brings:

Postgres doesn't allow to partition an existing table. If dbt creates an initially partitioned table, a DBA can then manually tune them or use pg_partman extension over it. That's not possible unless dbt is able to create partitioned tables.
When using insert+write incremental strategy, you can change the access method of old partitions to Citus columnar or Hydra, which don't allow deletes or updates. Incremental methods are currently not currently usable on those environments.

Moreover, creating a custom incremental strategy replacing partitions is then relatively easy as shown in #168. That incremental strategy is out of scope for this PR as there are a lot of possible variations.

IMO this patch is already in a stage that can be reviewed. I would help on some best practices and adding tests though.

Solution

To Do:

Contract config is not enforced
Tests
Partitioning means executing the SQL model twice, or temporarily storing it twice. This need to be documented.
Partition names concatenate the start day of the partition (regardless of granularity). Names longer than the maximum 64 characters are still not handlded correctly.
Add a default partition to handle uncovered ranges. Without specific partition support on incremental strategies, one must handle creation of future partitions. This can be done via a cron job or manually. Having a default partition means inserts won't fail. Handle incremental partitions
handle adding of new fields to the parent table

Checklist

I have read the contributing guide and understand what's expected of me
I have run this code in development and it appears to resolve the stated issue
This PR includes tests, or tests are not required/relevant for this PR
This PR has no interface changes (e.g. macros, cli, logs, json artifacts, config files, adapter interface, etc) or this PR has already received feedback and approval from Product or DX

ggam · 2024-05-01T11:34:14Z

dbt/include/postgres/macros/adapters.sql

-      {{ adapter.dispatch('get_column_names', 'dbt')() }}
-    )
-    {%- set sql = get_select_subquery(sql) %}
+  {% if config.get('partition_by') != None %}


I separated the logic for partitioning as it's easier to read. When partioning is not enabled, the code is exactly the same as before.

ggam · 2024-05-01T11:35:56Z

dbt/include/postgres/macros/adapters.sql

+    alter table {{ from_relation }} rename to {{ target_name }};
+
+    {# If the relation is partitioned, rename the subtables #}
+    {% set existing_partitions_query %}


Partitions are created with the __dbt_tmp suffix and so they need to be individually renamed. Sadly I don't think the config variable is available here so I don't know if there's a way to determine if the relation is partitioned.

HyuHiguchi · 2024-08-02T21:36:01Z

dbt/include/postgres/macros/materializations/incremental_strategies.sql

+{% endmacro %}
+
+{% macro postgres__get_incremental_delete_insert_sql(arg_dict) %}
+  {{ create_incremental_missing_partitions(arg_dict) }}


This was parsed but not executed.
When I wrote the following, it was executed and the partition was created.

{% do return( create_incremental_missing_partitions(arg_dict) + default__get_incremental_delete_insert_sql(arg_dict) ) %}

AniPatel · 2024-10-09T11:51:11Z

Do you have any plans for when these changes will be available in the master branch?

cla-bot bot added the cla:yes label May 1, 2024

ggam force-pushed the main branch from 135d274 to 23a0854 Compare May 1, 2024 11:30

ggam mentioned this pull request May 1, 2024

Add Table Partitioning Option for PostgreSQL #168

Open

ggam commented May 1, 2024

View reviewed changes

ggam force-pushed the main branch from dbd2a9a to 1bf527c Compare May 1, 2024 16:44

WIP - Partitioning support

b46ab1c

ggam force-pushed the main branch from 1bf527c to b46ab1c Compare May 1, 2024 16:45

ggam added 2 commits May 1, 2024 18:55

refactor

dc630af

Rough handling of incremental partitions

8361eeb

njuguna-n mentioned this pull request May 17, 2024

Improve BRAC model performance medic/cht-pipeline#82

Closed

HyuHiguchi reviewed Aug 2, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WIP] - Partitioning support #78

[WIP] - Partitioning support #78

ggam commented May 1, 2024 •

edited

Loading

ggam May 1, 2024

ggam May 1, 2024

HyuHiguchi Aug 2, 2024

AniPatel commented Oct 9, 2024

[WIP] - Partitioning support #78

Are you sure you want to change the base?

[WIP] - Partitioning support #78

Conversation

ggam commented May 1, 2024 • edited Loading

Problem

Solution

Checklist

ggam May 1, 2024

Choose a reason for hiding this comment

ggam May 1, 2024

Choose a reason for hiding this comment

HyuHiguchi Aug 2, 2024

Choose a reason for hiding this comment

AniPatel commented Oct 9, 2024

ggam commented May 1, 2024 •

edited

Loading