Query endpoint for SnowparkTableDataset #721

Open
ElenaKhaustova opened this issue Jun 6, 2024 · 3 comments
Comments

@ElenaKhaustova
Contributor

Description

The SnowparkTableDataset configuration does not have a query endpoint, so database-level SQL queries cannot be run at the catalog level. Users therefore have to do this at the database level: first execute a query to filter the data, and only then run the Kedro pipeline. Users expect it to work similarly to SQLQueryDataset and GBQQueryDataset, which do have a query endpoint.

https://docs.kedro.org/projects/kedro-datasets/en/kedro-datasets-3.0.1/api/kedro_datasets.snowflake.SnowparkTableDataset.html
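To make the gap concrete, here is a rough sketch of the two kinds of catalog entry, written as plain Python dicts so it runs without Kedro installed (the same keys would normally sit in `catalog.yml`). The table, database, schema, and credential names are made up for illustration, and `accepts_query` is a hypothetical helper, not part of Kedro.

```python
# Catalog entries as plain dicts; all names and values are hypothetical.

# Query-style dataset: the SQL lives in the catalog entry itself.
sql_query_entry = {
    "type": "pandas.SQLQueryDataset",
    "sql": "SELECT shuttle, shuttle_id FROM shuttle_details WHERE active = 1",
    "credentials": "db_credentials",
}

# Table-style dataset: only a table can be named, so any filtering has to
# happen in the database beforehand or inside a pipeline node.
snowpark_table_entry = {
    "type": "snowflake.SnowparkTableDataset",
    "table_name": "shuttle_details",
    "database": "spaceflights",
    "schema": "public",
    "credentials": "snowflake_credentials",
}


def accepts_query(entry: dict) -> bool:
    """Hypothetical helper: True if the entry carries its own SQL query."""
    return "sql" in entry
```

The request in this issue amounts to letting a Snowpark-backed entry carry a `sql`-like key in the same way the first entry does.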

We propose to:

  1. Explore the feasibility of adding a query endpoint in dataset configuration.
  2. Enhance documentation with tutorials and working examples of how to run SQL queries with Ibis in such cases instead: https://kedro.org/blog/sql-data-processing-in-kedro-ml-pipelines.

Context

  • "If I had a query functionality here, then I would have just put that query here and run it from the catalog."

[Screenshot: 2024-06-06 at 15 00 21]

@merelcht
Member

This seems very specific to the SnowparkTableDataset, so I personally wouldn't tackle this as part of the other catalog work. I'll move it to the kedro-plugins repo under the individual dataset improvements milestone.

@merelcht transferred this issue from kedro-org/kedro Jun 10, 2024
@merelcht added the "help wanted" label (Contribution task, outside help would be appreciated!) Jun 10, 2024
@ElenaKhaustova
Contributor Author

After discussing this with the team, we've decided to look through similar datasets to check whether it makes sense to extend their configuration with a query endpoint. As a potential solution to this issue, we can consider adding a SnowparkQueryDataset with a query endpoint.
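As a rough illustration only, a SnowparkQueryDataset could look something like the sketch below. None of this exists in kedro-datasets: the class body, the `sql` argument, and the `session_factory` parameter are assumptions. To stay runnable without Kedro or Snowflake installed, the class does not subclass Kedro's `AbstractDataset` and accepts any object with a Snowpark-like `sql(query)` method; a real implementation would presumably build a `snowflake.snowpark.Session` from credentials instead.

```python
from typing import Any, Callable, Protocol


class SupportsSql(Protocol):
    """Anything exposing a Snowpark-like ``sql`` method."""

    def sql(self, query: str) -> Any: ...


class SnowparkQueryDataset:
    """Hypothetical read-only dataset that loads the result of a SQL query."""

    def __init__(self, sql: str, session_factory: Callable[[], SupportsSql]) -> None:
        if not sql.strip():
            raise ValueError("'sql' must be a non-empty query string")
        self._sql = sql
        # Assumption: the caller supplies a way to obtain a session.
        self._session_factory = session_factory

    def load(self) -> Any:
        # Delegate to the session; with Snowpark this would return a DataFrame.
        return self._session_factory().sql(self._sql)

    def save(self, data: Any) -> None:
        # A query has no natural write target, so saving is rejected.
        raise NotImplementedError("SnowparkQueryDataset is read-only")

    def _describe(self) -> dict[str, Any]:
        return {"sql": self._sql}
```

Making the dataset read-only follows the usual split between table-style and query-style datasets; whether that is the right choice here is open for discussion.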

@merelcht removed the "help wanted" label (Contribution task, outside help would be appreciated!) Sep 16, 2024
@deepyaman
Member

> Enhance documentation with tutorials and working examples of how to run SQL queries with Ibis in such cases instead: https://kedro.org/blog/sql-data-processing-in-kedro-ml-pipelines.

@ElenaKhaustova @merelcht I'm going to move this into a new issue for tracking purposes (just found it while searching the issue tracker for Ibis).
