Is your feature request related to a problem? Please describe.
I was wondering whether it is possible to also retrieve the data processing description that is present in a sample's record in GEO. See here for an example: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM6005004. That record holds a lot of information that we would like to see in the table that pysradb generates:
Status
Title
Sample type
Source name
Organism
Characteristics
Treatment protocol
Growth protocol
Extracted molecule
Extraction protocol
Library strategy
Library source
Library selection
Instrument model
Description
Data processing
Describe the solution you'd like
I like the table that is currently generated using the following: `df = db.sra_metadata(df["study_accession"], detailed = True, expand_sample_attributes = True, output_read_lengths = True)`
although I feel it is sometimes missing crucial information that is only included in GEO under the individual sample records. For example, in the record of the sample linked above you can find the following:
Sequenced reads were trimmed for adaptor sequence and low-quality sequence (bbduk; minlength=30, qtrim=rl, trimq=15)
Reads were then mapped to the reference genome of Mus musculus (GRCm38) using STAR aligner version 2.5.3a with parameters --quantMode GeneCounts --runThreadN 4
Assembly: GRCm38
It would be nice to get that into the sra_metadata table too, if that is possible. For now I could probably use GEOquery for that and then merge the two tables by GSM sample ID, although I would need to test that. In that case the hassle of including it here might be redundant. Still, it seems like a nice direction in which to expand this :)
Thank you for your work so far!
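The merge workaround mentioned above could look roughly like the following. This is only a sketch: it assumes the pysradb table carries the GSM ID in a column named `experiment_alias`, and that the GEO fields have already been collected into a second table keyed by a hypothetical `geo_accession` column. The run accessions and the second GSM ID are placeholders, not real records.

```python
import pandas as pd


def merge_sra_with_geo(sra_df, geo_df, sra_key="experiment_alias", geo_key="geo_accession"):
    """Left-join GEO sample fields onto the SRA metadata table by GSM ID.

    A left join keeps every SRA row, even when no GEO record was found for it.
    The default column names are assumptions, not confirmed pysradb output.
    """
    return sra_df.merge(geo_df, how="left", left_on=sra_key, right_on=geo_key)


# Placeholder SRA table (stand-in for the output of db.sra_metadata(...)).
sra_df = pd.DataFrame(
    {
        "run_accession": ["RUN_A", "RUN_B"],  # placeholders
        "experiment_alias": ["GSM6005004", "GSM_PLACEHOLDER"],
    }
)

# Placeholder GEO table, e.g. built from GEOquery output exported to CSV.
geo_df = pd.DataFrame(
    {
        "geo_accession": ["GSM6005004"],
        "data_processing": ["Assembly: GRCm38"],
    }
)

merged = merge_sra_with_geo(sra_df, geo_df)
```

Rows without a matching GEO record end up with missing values in the GEO columns, so they are easy to spot afterwards.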
ajandria changed the title from "[ENH]" to "[ENH] Include data processing steps, reference to which the reads were aligned or if possible lab protocol into the main table" on Apr 11, 2023
Thanks, this is a great suggestion! It is doable: once the experiment_alias is fetched, pysradb would need to make another request for the corresponding detailed GEO metadata. I currently do not have the bandwidth to do this, but pull requests are always welcome!
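For anyone picking this up, the extra request described above might be sketched as follows. This is not pysradb code: the function names are hypothetical, and it assumes that the GEO `acc.cgi` endpoint returns SOFT-formatted text when called with `targ=self&form=text&view=brief`, with sample fields on lines of the form `!Sample_<field> = <value>`.

```python
from collections import defaultdict
from urllib.parse import urlencode
from urllib.request import urlopen

GEO_URL = "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi"


def parse_soft_sample(text):
    """Collect '!Sample_<field> = <value>' lines into a dict.

    Fields that repeat (such as data_processing) are joined with '; '.
    """
    fields = defaultdict(list)
    for line in text.splitlines():
        if not line.startswith("!Sample_") or " = " not in line:
            continue
        key, value = line[len("!Sample_"):].split(" = ", 1)
        fields[key.strip()].append(value.strip())
    return {key: "; ".join(values) for key, values in fields.items()}


def fetch_geo_sample(gsm):
    """Request the SOFT text for one GSM accession (needs network access)."""
    query = urlencode({"acc": gsm, "targ": "self", "form": "text", "view": "brief"})
    with urlopen(f"{GEO_URL}?{query}", timeout=30) as response:
        return parse_soft_sample(response.read().decode("utf-8", errors="replace"))


# Offline example using lines modelled on the GSM6005004 record quoted in this issue.
example_soft = """^SAMPLE = GSM6005004
!Sample_data_processing = Reads were then mapped to the reference genome of Mus musculus (GRCm38)
!Sample_data_processing = Assembly: GRCm38
"""
parsed = parse_soft_sample(example_soft)
```

The resulting dict per GSM ID could then be turned into extra columns and joined onto the `sra_metadata` table via `experiment_alias`.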