Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missing method when running data-pipeline/caids/get_caids.py #1620

Open
ignatiusm opened this issue Sep 5, 2024 · 3 comments · May be fixed by #1621
Open

Missing method when running data-pipeline/caids/get_caids.py #1620

ignatiusm opened this issue Sep 5, 2024 · 3 comments · May be fixed by #1621

Comments

@ignatiusm
Copy link
Contributor

ignatiusm commented Sep 5, 2024

I'm trying to run data-pipeline/caids/get_caids.py with a different dataset, but am encountering an issue.

When running lines 139 to 147 of the script (all_part_urls, and completed_part_urls), Hail errors out with the message:

    await f.url() async for f in await fs.listfiles(sharded_vcf_url) if f.name().startswith("part-")
                                                                        ^^^^^^
AttributeError: 'GoogleStorageFileListEntry' object has no attribute 'name'. Did you mean: '_name'?

Looking at the hail source code on line 483 of hail/python/hailtop/aiocloud/aiogoogle/client/storage_client.py I can see that the GoogleStorageFileListEntry class indeed does not have an async name method.

It seems like you were able to run these scripts to create the gnomad_v4 version of CAIDS data set earlier this year. I note this PR where there were updates to the get_caids.py script (and mentions that there have been "a number of Hail utils that have either been changed, removed or replaced since its last update"). I'd be grateful for any suggestions you have for addressing this 😃

I'm using GCP infrastructure with python v3.11 and hail v0.2.132.

@ignatiusm
Copy link
Contributor Author

Ah! Found the fix 🥳 I'll make a PR now 😄

@ignatiusm ignatiusm linked a pull request Sep 5, 2024 that will close this issue
@rileyhgrant rileyhgrant self-assigned this Sep 5, 2024
@rileyhgrant
Copy link
Contributor

Heya @ignatiusm, hope things are well!

Thanks for filing this issue, and for finding a fix for it yourself! We currently use a prior version of hail (0.2.127), we found a workaround to pin this in our pipeline dependencies (our convenience deployment tool wrapped hailctl which ships with hail, leading to the need for a slight workaround).

It appears as though this particular issue was introduced at some point between hail v0.2.127, and hail v0.2.132. As such, we will merge and accept your contribution sometime down the line when we bump the hail version in our pipeline.

Let me know if that seems reasonable, and thanks again! :)

@ignatiusm
Copy link
Contributor Author

Hi @rileyhgrant 👋 - that sounds super sensible. No worries from me :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants