Kedro Viz & DeltaTableDataset: Unable to get file size. Object has no attribute '_protocol' #1893

Closed
julio-cmdr opened this issue May 7, 2024 · 7 comments
Labels: Issue: Bug Report; Python (Pull requests that update Python code); Requires user input

Comments

@julio-cmdr

Description

Hello!
I'm using a pandas.DeltaTableDataset and I'm getting the warning: Unable to get file size for the dataset DeltaTableDataset(...): 'DeltaTableDataset' object has no attribute '_protocol'.
I think this warning is raised by kedro-viz.

After disabling kedro-viz's auto-registered hooks in settings.py, no further warnings are raised.

Steps to Reproduce

  1. Set a DeltaTableDataset in catalog.yml and use it as a node input.

Your Environment

  • Windows
  • kedro-viz 9.0.0
  • kedro 0.19.3
  • kedro-datasets[pandas.DeltaTableDataset] 2.1.0
  • Python 3.11
@ravi-kumar-pilla added the "Python" label (Pull requests that update Python code) on Jul 1, 2024
@astrojuanlu
Member

Hi @julio-cmdr, sorry for the slow reply. It's been a while, but do you think you could go back and give a full traceback of the error you get? Would help us understand how to fix it.

@astrojuanlu
Member

The warning comes from here:

    try:
        file_path = get_filepath_str(
            PurePosixPath(dataset._filepath), dataset._protocol
        )
        return dataset._fs.size(file_path)
    except Exception as exc:
        logger.warning(
            "Unable to get file size for the dataset %s: %s", dataset, exc
        )

@julio-cmdr
Author

Hi @SajidAlamQB.
Many datasets have an attribute called "_protocol", but DeltaTableDataset doesn't (check its source code). I think this is why the warning is raised.
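
The failure mode described above can be sketched in a few lines. This is only an illustration, not the kedro-viz code or the eventual fix: `LocalFs`, `FileBackedDataset`, `TableDataset` and `get_file_size` are hypothetical stand-ins, and the only assumption carried over from the thread is that some datasets expose `_filepath`, `_protocol` and `_fs` while others do not.

```python
import os
from typing import Any, Optional


class LocalFs:
    """Hypothetical stand-in for a filesystem object with a size() method."""

    def size(self, path: str) -> int:
        return os.path.getsize(path)


class FileBackedDataset:
    """Hypothetical dataset that exposes the private attributes the hook reads."""

    def __init__(self, filepath: str) -> None:
        self._filepath = filepath
        self._protocol = "file"
        self._fs = LocalFs()


class TableDataset:
    """Hypothetical dataset with a path but no _protocol or _fs attribute."""

    def __init__(self, filepath: str) -> None:
        self._filepath = filepath


def get_file_size(dataset: Any) -> Optional[int]:
    """Return the file size in bytes, or None when it cannot be determined."""
    filepath = getattr(dataset, "_filepath", None)
    protocol = getattr(dataset, "_protocol", None)
    fs = getattr(dataset, "_fs", None)
    # Skip quietly when the dataset is not file-backed in the expected way,
    # instead of letting an AttributeError surface as a warning.
    if filepath is None or protocol is None or fs is None:
        return None
    try:
        return fs.size(str(filepath))
    except OSError:
        # Missing or unreadable file: treat the size as unknown.
        return None
```

With this shape, a dataset that lacks `_protocol` simply yields no size rather than a logged warning. It still reads private attributes, so it does not address the refactoring concern raised later in the thread.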

@ravi-kumar-pilla
Contributor

Hi @SajidAlamQB, I was responsible for using private attributes to get the file size in the stats hook. Could you check whether there are alternatives for determining the file size, and also refactor the stats hook to avoid using private attributes? Thank you.

@SajidAlamQB
Contributor

Thank you, I have a PR that will attempt to fix this, #2174.

@astrojuanlu
Member

astrojuanlu commented Nov 5, 2024

In line with what @ravi-kumar-pilla asked, is there a way we can make this a public, optional behavior of datasets?

For example, datasets now have a preview method. They could have a get_info method or similar.

Notice that in #1714 a user suggested using _describe for this. We ended up adding a preview method, but maybe we should try to unify all this "meta" information instead of adding more ad-hoc public methods.
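
To make the suggestion concrete, here is a rough sketch of what a public, optional "info" hook could look like. Everything in it is an assumption for illustration: `get_info` does not exist in kedro as far as this thread shows, and `SupportsInfo`, `CsvLikeDataset`, `OpaqueDataset` and `collect_info` are hypothetical names.

```python
from typing import Any, Dict, Optional, Protocol, runtime_checkable


@runtime_checkable
class SupportsInfo(Protocol):
    """Hypothetical optional interface: datasets may implement get_info()."""

    def get_info(self) -> Dict[str, Any]:
        ...


class CsvLikeDataset:
    """Hypothetical dataset that opts in and reports its own metadata."""

    def __init__(self, filepath: str, size_bytes: int) -> None:
        self._filepath = filepath
        self._size_bytes = size_bytes

    def get_info(self) -> Dict[str, Any]:
        return {"filepath": self._filepath, "file_size": self._size_bytes}


class OpaqueDataset:
    """Hypothetical dataset that does not implement get_info()."""


def collect_info(dataset: Any) -> Optional[Dict[str, Any]]:
    """Use only the public method; never touch private attributes."""
    if isinstance(dataset, SupportsInfo):
        return dataset.get_info()
    return None
```

A consumer such as a stats hook would then call `collect_info(dataset)` and skip datasets that return `None`, so each dataset decides what metadata it can report.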

@SajidAlamQB
Contributor

Completed in: #2174
