Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] - Partitioning support #78

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from
Draft

[WIP] - Partitioning support #78

wants to merge 3 commits into from

Conversation

ggam
Copy link

@ggam ggam commented May 1, 2024

resolves #168

Problem

This is a draft PR adding partitioning support for Postgres as stated on #168. It doesn't use dbt coding best practices and lacks, but it works. I'm opening it to gather feedback on how to proceed.

Syntax is copied from on dbt-bigquery partitioning:

{{ config(
    materialized = 'table',
    partition_by = {
      "field": "created_at",
      "granularity": "month"
    }
) }}

select generate_series(current_date - interval '1000 day', current_date, '1 month'::interval)::date as created_at,
        'hello' as dummy_text

Benefits this brings:

  • Postgres doesn't allow to partition an existing table. If dbt creates an initially partitioned table, a DBA can then manually tune them or use pg_partman extension over it. That's not possible unless dbt is able to create partitioned tables.
  • When using insert+write incremental strategy, you can change the access method of old partitions to Citus columnar or Hydra, which don't allow deletes or updates. Incremental methods are currently not currently usable on those environments.

Moreover, creating a custom incremental strategy replacing partitions is then relatively easy as shown in #168. That incremental strategy is out of scope for this PR as there are a lot of possible variations.

IMO this patch is already in a stage that can be reviewed. I would help on some best practices and adding tests though.

Solution

To Do:

  • Contract config is not enforced
  • Tests
  • Partitioning means executing the SQL model twice, or temporarily storing it twice. This need to be documented.
  • Partition names concatenate the start day of the partition (regardless of granularity). Names longer than the maximum 64 characters are still not handlded correctly.
  • Add a default partition to handle uncovered ranges. Without specific partition support on incremental strategies, one must handle creation of future partitions. This can be done via a cron job or manually. Having a default partition means inserts won't fail. Handle incremental partitions
  • handle adding of new fields to the parent table

Checklist

  • I have read the contributing guide and understand what's expected of me
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • This PR has no interface changes (e.g. macros, cli, logs, json artifacts, config files, adapter interface, etc) or this PR has already received feedback and approval from Product or DX

{{ adapter.dispatch('get_column_names', 'dbt')() }}
)
{%- set sql = get_select_subquery(sql) %}
{% if config.get('partition_by') != None %}
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I separated the logic for partitioning as it's easier to read. When partioning is not enabled, the code is exactly the same as before.

alter table {{ from_relation }} rename to {{ target_name }};

{# If the relation is partitioned, rename the subtables #}
{% set existing_partitions_query %}
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partitions are created with the __dbt_tmp suffix and so they need to be individually renamed. Sadly I don't think the config variable is available here so I don't know if there's a way to determine if the relation is partitioned.

{% endmacro %}

{% macro postgres__get_incremental_delete_insert_sql(arg_dict) %}
{{ create_incremental_missing_partitions(arg_dict) }}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was parsed but not executed.
When I wrote the following, it was executed and the partition was created.

  {% do return(
create_incremental_missing_partitions(arg_dict) + default__get_incremental_delete_insert_sql(arg_dict)
) %}

@AniPatel
Copy link

AniPatel commented Oct 9, 2024

Do you have any plans for when these changes will be available in the master branch?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add Table Partitioning Option for PostgreSQL
3 participants