Support pulling in multiple manifests from single bucket #31

Open
akromish opened this issue Feb 28, 2024 · 6 comments
Labels
enhancement New feature or request

Comments

@akromish

akromish commented Feb 28, 2024

Currently, dbt-loom supports pulling in a manifest from cloud storage using bucket name + object name.

However, for organizations with n dbt-core projects that need to peer with each other, adding an entry to each repo gets difficult. I propose that in the S3 and GCP clients, we add a method that allows for specifying just the bucket name. From there, dbt-loom will iterate through all the manifests in the bucket and add them to the project.
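A minimal sketch of the proposed bucket-only mode for S3, assuming every `.json` object in the bucket (optionally under a prefix) is a dbt manifest. `iter_bucket_manifests` is a hypothetical helper, not part of dbt-loom. It only relies on the boto3 S3 client's `list_objects_v2` paginator and `get_object`, so a real `boto3.client("s3")` could be passed in.

```python
import json
from typing import Any, Dict, Iterator


def iter_bucket_manifests(s3_client: Any, bucket: str, prefix: str = "") -> Iterator[Dict[str, Any]]:
    """Hypothetical helper: yield every JSON manifest under `prefix` in `bucket`.

    `s3_client` is assumed to behave like a boto3 S3 client, exposing
    `get_paginator("list_objects_v2")` and `get_object(Bucket=..., Key=...)`.
    """
    paginator = s3_client.get_paginator("list_objects_v2")
    # list_objects_v2 returns at most 1,000 keys per call, so paginate.
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # An empty listing has no "Contents" key at all.
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if not key.endswith(".json"):
                continue  # assumption: only .json objects are manifests
            body = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
            yield json.loads(body)
```

With a real client this might be called as `iter_bucket_manifests(boto3.client("s3"), "my-manifests-bucket")`, where the bucket name is a placeholder. A GCS variant could follow the same shape using `google.cloud.storage.Client().list_blobs(bucket_name, prefix=...)` and `blob.download_as_bytes()`.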

I could take a first stab at implementing the S3 version.

Edit: Would actually prefer trying this in Artifactory first if this is something we want to do. I can implement single and multi-manifest JSON pulls from Artifactory.

@nicholasyager nicholasyager added the triage This issue is being investigated label Feb 28, 2024
@nicholasyager nicholasyager self-assigned this Feb 28, 2024
@nicholasyager
Owner

Hi @akromish! Thanks for making this issue.

Admittedly, I've not put too much thought into how dbt-loom ought to operate for large mesh topologies, particularly large meshes with a high degree of connectivity. Based on your comment around n projects needing to be added to multiple downstream configs, it makes me think of something like this (taken to an extreme, of course!)

flowchart
  a --> x
  b --> x
  c --> x

  a --> y
  b --> y
  c --> y

  a --> z
  b --> z
  c --> z

In this sort of paradigm, it would definitely make sense to move away from one-off ManifestReference declarations towards an approach that expects the reference in the ManifestReference to return one or more manifest files. For a path type, this could include glob support. For S3 and GCP this could be a bucket and object key, or a bucket, prefix, and suffix.
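For the path type, the glob idea could look roughly like the sketch below: one reference resolves to every matching file. `load_manifests_from_glob` and its default pattern `**/manifest.json` are hypothetical and not part of dbt-loom's `ManifestReference`; only `pathlib.Path.glob` is standard library.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def load_manifests_from_glob(root: str, pattern: str = "**/manifest.json") -> List[Dict[str, Any]]:
    """Hypothetical path-type resolver: one reference, many manifests.

    Every file under `root` matching the glob `pattern` is parsed as JSON.
    Paths are sorted so the load order is deterministic across runs.
    """
    paths = sorted(Path(root).glob(pattern))
    return [json.loads(p.read_text()) for p in paths if p.is_file()]
```

Sorting matters if later manifests can shadow earlier ones; an unordered glob would make that nondeterministic.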

In any case, I'd love to better understand what your project topology looks like, and whether this thinking is aligned with your needs.

@nicholasyager nicholasyager added the question Further information is requested label Feb 29, 2024
@akromish
Author

akromish commented Feb 29, 2024

Hey, learned today that you can add diagrams to GitHub comments lol!

So I see two cases where you might want to have multiple manifests pulled in:

  1. as you diagrammed, where there are top level projects, and then projects that import those top level projects

    This is the use case I'm interested in. For some context, what I want to achieve by doing this is to have one dbt repo on which I can use MetricFlow (mf) to query any metric in the data org.

    flowchart TB
      a --> x
      b --> x
      c --> x
      d --> x 
      
      mf --> |query| x
      linkStyle 4 stroke-width:2px,fill:none,stroke-dasharray: 5 5;
    
  2. use case where every repo is a sister repo

    This might be an unsupported use case, as I don't know how dbt would handle circular imports

    flowchart TB
        a --> b
        b --> a
    
    

As you said, we would want ManifestReference to pull multiple files, or have a collection of ManifestReferences.
I think bucket and prefix make sense, but do you think a suffix will be needed? I think we can fetch only .json files from the dbt-loom side. Same question for glob.

Thanks!

@nicholasyager
Owner

nicholasyager commented Mar 1, 2024

@akromish Thanks for the diagrams! 😍

Use case one definitely makes sense, and is really quite clever for bringing multiple projects' semantic models into one project. I, too, am a little hesitant about use case two. I believe (will have to confirm) that dbt-core 1.7.x allows circular dependencies at a project level, but not a model level (1.6.x did not allow circular project deps), so this should be doable. Edit: I was able to confirm that 1.7.x, as of the time of writing, does not allow circular project-level dependencies.
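Since dbt-core 1.7.x rejects circular project-level dependencies, a loader that pulls in many manifests at once might want to report a cycle such as a → b → a itself, with a clearer message. The sketch below is a plain depth-first search for a cycle; `find_project_cycle` is a hypothetical helper, not part of dbt-loom or dbt-core, and how the `deps` mapping is built from each project is left open.

```python
from typing import Dict, List, Optional, Set


def find_project_cycle(deps: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle in a project-level dependency graph, or None.

    `deps` maps a project name to the upstream projects it imports.
    Depth-first search: a node reached again while still on the current
    path (a back edge) closes a cycle.
    """
    on_path: Set[str] = set()  # nodes on the current DFS path
    done: Set[str] = set()     # nodes whose upstreams are fully explored
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        on_path.add(node)
        path.append(node)
        for upstream in deps.get(node, []):
            if upstream in on_path:
                # Back edge: the cycle is the path from `upstream` onward.
                return path[path.index(upstream):] + [upstream]
            if upstream not in done:
                cycle = visit(upstream)
                if cycle:
                    return cycle
        path.pop()
        on_path.remove(node)
        done.add(node)
        return None

    for project in deps:
        if project not in done:
            cycle = visit(project)
            if cycle:
                return cycle
    return None
```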

You've swayed me that this is useful functionality!

> I think bucket and prefix make sense, but do you think a suffix will be needed? I think we can fetch only .json files from the dbt-loom side. Same question for glob.

This is totally fair! My mind went to a scenario where people might modify the name of their manifest files. It can be added later if we need it.
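A bucket + prefix reference with an optional suffix could reduce to a simple key filter, sketched below with hypothetical names. Defaulting the suffix to `.json` covers the common case, while teams that rename their manifests can override it.

```python
from typing import Iterable, List


def select_manifest_keys(keys: Iterable[str], prefix: str = "", suffix: str = ".json") -> List[str]:
    """Hypothetical filter: the object keys a bucket+prefix(+suffix) reference loads.

    Keys are returned sorted so the load order does not depend on listing order.
    """
    return sorted(k for k in keys if k.startswith(prefix) and k.endswith(suffix))


keys = [
    "mesh/a/manifest.json",
    "mesh/b/manifest.json",
    "mesh/b/run_results.json",
    "other/c/manifest.json",
]
# Default suffix: any .json under the prefix, including run_results.json.
print(select_manifest_keys(keys, prefix="mesh/"))
# A narrower suffix skips other dbt artifacts.
print(select_manifest_keys(keys, prefix="mesh/", suffix="manifest.json"))
```

The second call hints at why a suffix may still earn its place: dbt writes other JSON artifacts, such as `run_results.json`, next to `manifest.json`.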

If you're still up for it, I'd love to see what you come up with. I'm not particularly familiar with Artifactory, but I'd be open to a contribution that provides support.

@geoHeil

geoHeil commented Mar 4, 2024

I intend to use dbt-loom in the context of Dagster, dbt-core, and branch deployments: https://docs.dagster.io/dagster-cloud/managing-deployments/branch-deployments

Individual domains will have their own dbt projects, and for each one there would be a main/feature-xxx branch.

It would be neat if such branching could be supported natively. For now, the consuming project needs to know the exact branch/key prefix when pulling in data from a feature branch of a still-unfinished source/reference model, e.g. during a testing phase.

Here, too, bringing everything into one bucket plus the additional branching logic would be needed.
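One way the branching logic might work: look for each project's manifest on the requested branch and fall back to `main` when that project has no such branch. The key layout `<project>/<branch>/manifest.json` and the helper `resolve_branch_manifests` are both hypothetical; nothing here reflects an existing dbt-loom or Dagster feature.

```python
from typing import Dict, Iterable


def resolve_branch_manifests(keys: Iterable[str], branch: str, fallback: str = "main") -> Dict[str, str]:
    """Map each project to the manifest key a consumer should load.

    Assumes a hypothetical key layout "<project>/<branch>/manifest.json".
    The project's `branch` manifest wins; otherwise its `fallback` one is used.
    Projects with neither are left out.
    """
    by_project: Dict[str, Dict[str, str]] = {}
    for key in keys:
        parts = key.split("/")
        if len(parts) != 3 or parts[2] != "manifest.json":
            continue  # not in the assumed layout
        project, key_branch, _ = parts
        by_project.setdefault(project, {})[key_branch] = key

    resolved: Dict[str, str] = {}
    for project, branches in by_project.items():
        chosen = branches.get(branch) or branches.get(fallback)
        if chosen:
            resolved[project] = chosen
    return resolved
```

Because the layout splits on `/`, branch names that themselves contain slashes (e.g. `feature/xxx`) would need a different encoding in the key.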

@nicholasyager
Owner

Hi @akromish 👋🏻 Just checking in to see if you've run into any snags on this. Let me know if you'd like another set of 👀

@nicholasyager nicholasyager added enhancement New feature or request and removed question Further information is requested triage This issue is being investigated labels Mar 27, 2024
@akromish
Author

Hey, sorry, got tied up with some other things; let me try to get a PR out next week.

Projects
None yet
Development

No branches or pull requests

3 participants