Sites dataset prep pipeline (subset data) #3

Open
2 of 7 tasks
ignatiusm opened this issue Aug 20, 2024 · 0 comments
Assignees
Labels
enhancement New feature or request


ignatiusm commented Aug 20, 2024

Description

For the Autism CRC data, we need a pipeline that creates the required input data. This involves converting the gnomad_qc v2 data prep steps into a form that can be run on a Dataproc cluster, similar to the other gnomAD data steps.

Definition of done

  • Ensure Matthew has access to the Garvan forks of the gnomad_qc, gnomad_methods and gnomad-browser repos
  • Meet with Matthew about the Nature paper and its methods descriptions
  • Convert the gnomad_qc v2 data prep steps into functions
  • Create a gnomad-browser pipeline that can run the v2 data prep steps on subsetted data
  • Have Matthew review the results

Alternative Approach

I'm working backwards to see which data prep steps we can skip: first try to load data into the backend, then progressively include more of the data loading steps in reverse order:

  • CAIDs (ClinGen Canonical Allele IDs) prep step
  • MNVs (multinucleotide variants)
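
The reverse-order search described above could be sketched roughly as follows. This is only an illustration: the step names other than `mnvs` and `caids` are placeholders, the assumption that CAIDs is the final step and MNVs the one before it is inferred from the list order, and `backend_loads` is a hypothetical check standing in for "run these steps, then try loading the result into the backend".

```python
from typing import Callable, Iterator, List, Optional

# Forward pipeline order. Only "mnvs" and "caids" come from this issue;
# the earlier names are hypothetical placeholders.
PIPELINE_STEPS: List[str] = [
    "placeholder_step_1",
    "placeholder_step_2",
    "mnvs",
    "caids",
]


def trailing_suffixes(steps: List[str]) -> Iterator[List[str]]:
    """Yield progressively longer runs of steps, starting from the last one."""
    for n in range(1, len(steps) + 1):
        yield steps[-n:]


def shortest_working_suffix(
    steps: List[str],
    backend_loads: Callable[[List[str]], bool],
) -> Optional[List[str]]:
    """Return the shortest trailing run of steps for which the load succeeds.

    `backend_loads` is a hypothetical callback: it should run the given
    steps and report whether the backend accepted the resulting data.
    Returns None if even the full pipeline does not load.
    """
    for suffix in trailing_suffixes(steps):
        if backend_loads(suffix):
            return suffix
    return None
```

Any step that falls outside the returned run is a candidate for skipping, subject to checking that the loaded data is actually complete enough for the browser.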
@ignatiusm ignatiusm self-assigned this Aug 20, 2024
@ignatiusm ignatiusm added the enhancement New feature or request label Aug 20, 2024