[Implementation] define snapshots as yml #10368

Closed
graciegoheen opened this issue Jun 26, 2024 · 3 comments
Labels
enhancement New feature or request snapshots Issues related to dbt's snapshot functionality

Comments

graciegoheen commented Jun 26, 2024

Describe the feature

From #10246

Provide a new way for folks to define snapshots that doesn't require the Jinja block.

Acceptance criteria

  • I can define my snapshot as a top-level yml config
# snapshots/my_snapshots.yml
snapshots:
  - name: orders_snapshot
    config:
      tags: finance
    from: source('jaffle_shop', 'orders') # could also be a ref
    unique_key: id
    strategy: timestamp
    updated_at: updated_at
  • from: indicates which resource I'm snapshotting; it can be a source or a ref. The snapshot "logic" is then select * from whatever from: points to. You can snapshot an ephemeral model if you want your snapshot to contain additional logic.
  • we support the historic way of defining snapshots for backwards compatibility
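
To make the `from:` rule above concrete, here is a minimal Python sketch of how a YAML-style definition could map onto the historic Jinja block. It is not dbt's implementation: `render_snapshot_block` is a hypothetical helper, and treating every key other than `name`, `from`, and `config` as snapshot config is an assumption of the sketch. A real legacy block may also need further config, such as a target schema.

```python
def render_snapshot_block(definition: dict) -> str:
    """Render a YAML-style snapshot definition as a legacy Jinja snapshot block.

    Hypothetical helper for illustration only; not part of dbt.
    """
    # Assumption: keys other than name/from/config are snapshot config values.
    config = dict(definition.get("config", {}))
    for key, value in definition.items():
        if key not in ("name", "from", "config"):
            config[key] = value

    # Sort for a stable, readable config(...) call.
    config_args = ", ".join(f"{k}={v!r}" for k, v in sorted(config.items()))

    # The snapshot "logic" is just select * from whatever from: points to.
    return (
        f"{{% snapshot {definition['name']} %}}\n"
        f"{{{{ config({config_args}) }}}}\n"
        f"select * from {{{{ {definition['from']} }}}}\n"
        f"{{% endsnapshot %}}"
    )


# The orders_snapshot example from the acceptance criteria, as a parsed dict.
orders = {
    "name": "orders_snapshot",
    "config": {"tags": "finance"},
    "from": "source('jaffle_shop', 'orders')",
    "unique_key": "id",
    "strategy": "timestamp",
    "updated_at": "updated_at",
}

print(render_snapshot_block(orders))
```

Because the generated body is always a plain `select *`, any extra logic has to live upstream, which is why the issue suggests snapshotting an ephemeral model in that case.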

Open questions

  • snapshots could be defined in yml in the models folder (like sources) to allow for better organization (they are currently siloed in the snapshots folder), or they could remain in the snapshots folder
@graciegoheen graciegoheen added enhancement New feature or request triage labels Jun 26, 2024
@graciegoheen graciegoheen changed the title [Feature] define snapshots as yml [Implementation] define snapshots as yml Jun 26, 2024
@graciegoheen graciegoheen added snapshots Issues related to dbt's snapshot functionality user docs [docs.getdbt.com] Needs better documentation and removed triage labels Jun 26, 2024
@jenna-jordan

Personally, I think that snapshots could be defined in the yaml block of the source or model they are on top of. So something like:

sources:
  - name: my_source
    database: database_here
    schema: schema_here
    tables:
      - name: my_source_table_i_want_to_snapshot
        description: "A source table that needs historical data preserved"
        snapshot: true # false by default
        config:
          unique_key: id
          strategy: timestamp
          updated_at: updated_at

and

models:
  - name: my_model_i_want_to_snapshot
    description: "A model that I need historical data preserved for"
    latest_version: 1
    snapshot: true # false by default
    config:
      contract:
        enforced: true
      unique_key: id
      strategy: timestamp
      updated_at: updated_at
    versions:
      - v: 1

Then you could move snapshots to functioning more like versions. See: #7018 (comment)

For me the most important thing to preserve is that snapshots remain a way to capture historical data, are never dropped as part of the dbt build, and can stay outside of the idempotency paradigm of dbt models. But otherwise, I think the developer flow introduced with model versions is the best path for a next generation of dbt snapshots. Model versions should also be accounted for in snapshots that function this way - you won't want a snapshot per model version, but you will want to denote what version a model is on (latest_version) when a snapshot was taken (so, an additional metadata field).
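
As a rough illustration of the `snapshot: true` idea for sources, the Python sketch below walks an already-parsed sources file and collects the flagged tables. Everything here is hypothetical: `flagged_source_snapshots` is not a dbt function, and the `snapshot` key is only the flag proposed in this comment, assumed to default to false.

```python
def flagged_source_snapshots(parsed: dict) -> list:
    """Return (source name, table name, config) for each table flagged snapshot: true.

    Hypothetical helper; `snapshot` is the proposed flag, not an existing dbt key.
    """
    found = []
    for source in parsed.get("sources", []):
        for table in source.get("tables", []):
            # Assumption: the flag is false when omitted.
            if table.get("snapshot", False):
                found.append((source["name"], table["name"], table.get("config", {})))
    return found


# The sources example above as a parsed dict, plus one unflagged table.
parsed_sources = {
    "sources": [
        {
            "name": "my_source",
            "database": "database_here",
            "schema": "schema_here",
            "tables": [
                {
                    "name": "my_source_table_i_want_to_snapshot",
                    "snapshot": True,
                    "config": {
                        "unique_key": "id",
                        "strategy": "timestamp",
                        "updated_at": "updated_at",
                    },
                },
                {"name": "some_other_table"},
            ],
        }
    ]
}

for source_name, table_name, config in flagged_source_snapshots(parsed_sources):
    print(source_name, table_name, config["strategy"])
```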


tommyh commented Aug 23, 2024

I think the # snapshots/my_snapshots.yml direction makes a lot of sense.

A few things I'm curious about:

  1. Could it support a unique_key that is not a column of the source model, e.g. unique_key: "{{ dbt_utils.generate_surrogate_key(['field_a', 'field_b']) }}"? Or is there some way that a surrogate key could be added as a column to the snapshot table?

  2. When I am snapshotting multiple tables from the same source, the majority of the snapshot config is the same between those models. Would I be able to do this with the new implementation:

# dbt_project.yml
snapshots:
  my_dbt_project:
    jaffle_shop:
       +unique_key: id
       +strategy: timestamp
       +updated_at: updated_at

# snapshots/my_snapshots.yml
snapshots:
  - name: orders_snapshot
    config:
      tags: finance
    from: source('jaffle_shop', 'orders')

  - name: customers_snapshot
    config:
      tags: finance
    from: source('jaffle_shop', 'customers')
    unique_key: customer_id # override the default
  3. Would I be able to define a .sql file for a very custom snapshot (similar to tests vs custom generic tests)?
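
The behavior asked about in question 2 can be sketched in Python as a simple layered merge. This is only an illustration of the hoped-for semantics, not how dbt resolves config: `resolve_snapshot_config` is a hypothetical helper, and it assumes that snapshot-level values win over project-level defaults and that the leading `+` in dbt_project.yml keys is just a prefix to strip.

```python
def resolve_snapshot_config(project_defaults: dict, snapshot: dict) -> dict:
    """Merge project-level defaults with one snapshot's own settings.

    Hypothetical helper; assumes the more specific value overrides the default.
    """
    # Strip the "+" prefix used for config keys in dbt_project.yml.
    resolved = {key.lstrip("+"): value for key, value in project_defaults.items()}

    # Values in the snapshot's config block override the defaults.
    resolved.update(snapshot.get("config", {}))

    # Assumption: other top-level keys (e.g. unique_key) are overrides too.
    for key, value in snapshot.items():
        if key not in ("name", "from", "config"):
            resolved[key] = value
    return resolved


# Defaults from the dbt_project.yml example above.
defaults = {"+unique_key": "id", "+strategy": "timestamp", "+updated_at": "updated_at"}

orders = {"name": "orders_snapshot", "config": {"tags": "finance"}}
customers = {
    "name": "customers_snapshot",
    "config": {"tags": "finance"},
    "unique_key": "customer_id",  # override the default
}

print(resolve_snapshot_config(defaults, orders))
print(resolve_snapshot_config(defaults, customers))
```

Under these assumptions, orders_snapshot inherits all three defaults, while customers_snapshot inherits strategy and updated_at but keeps its own unique_key.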

@graciegoheen graciegoheen removed the user docs [docs.getdbt.com] Needs better documentation label Oct 17, 2024
@graciegoheen

Completed #10151
