Find full-test data #85
Comments
https://lomanlab.github.io/mockcommunity/ And @sofstam will help look for 'real' Illumina/Nanopore stuff :)
I have asked at our site if we are allowed to use some of our data as test data. Meanwhile, I found these for Nanopore:
@jfy133 I got a response from our site that we cannot use any of our data as test data right now. We need an ethical approval, which we will be working on during the fall.
OK, let's just look for already published stuff 👍
I will have a look at this! Sounds good with the dataset from this article. Since the dataset is focused on bacteria, it might be good to have test data for viruses as well? https://www.ebi.ac.uk/ena/browser/view/PRJNA670157?show=reads
Beyond the mock communities, I'm personally interested in using the CAMI data.
We also need to decide on the databases, and where to store them (presumably AWS...?)
I think Zenodo could be a good place if we want to publish the benchmark. Then every database has a DOI.
I fear the file sizes for some will be too large for Zenodo (50GB limit), but we can see.
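To make the size concern concrete, here is a minimal sketch of how one could check a set of database tarballs against that limit before uploading. It assumes the 50GB figure quoted above means decimal gigabytes and applies to the combined upload; the function names and any file paths passed in are hypothetical.

```python
from pathlib import Path

# The 50GB figure quoted in the thread, assumed here to be decimal GB.
ZENODO_LIMIT_BYTES = 50 * 10**9


def total_size(paths):
    """Sum the on-disk size in bytes of the given files."""
    return sum(Path(p).stat().st_size for p in paths)


def fits_limit(total_bytes, limit=ZENODO_LIMIT_BYTES):
    """Return True if the combined size is within the assumed limit."""
    return total_bytes <= limit
```

With real tarballs one would call something like `fits_limit(total_size(["kraken2_db.tar.gz", "bracken_db.tar.gz"]))`, where those file names are placeholders only.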
Minimum criteria for full-test data:
To 'borrow' from the MAG full-test data, we can pick 2-3 Illumina and 2-3 ONT samples/runs from here: https://www.ebi.ac.uk/ena/browser/view/PRJEB29152 It has both Illumina and Nanopore, the sequencing depth is >10M, and it is shotgun.
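As a rough sketch of the selection described above (both platforms, shotgun, depth above 10M, 2-3 runs each), the filter could look like the following. The field names follow the column names used in ENA run metadata reports as far as I know, but treat them and the value spellings as assumptions; the records and run accessions are hypothetical placeholders, not real runs from PRJEB29152.

```python
# ">10m" from the comment above, read here as more than 10 million reads.
MIN_READS = 10_000_000
WANTED_PLATFORMS = {"ILLUMINA", "OXFORD_NANOPORE"}


def meets_criteria(run):
    """True if the run is shotgun (WGS), deep enough, and from a wanted platform."""
    return (
        run["instrument_platform"] in WANTED_PLATFORMS
        and run["library_strategy"] == "WGS"
        and int(run["read_count"]) > MIN_READS
    )


def pick_runs(runs, per_platform=3):
    """Keep up to `per_platform` qualifying run accessions for each platform."""
    picked = {}
    for run in runs:
        if not meets_criteria(run):
            continue
        bucket = picked.setdefault(run["instrument_platform"], [])
        if len(bucket) < per_platform:
            bucket.append(run["run_accession"])
    return picked


# Hypothetical records for illustration only.
example_runs = [
    {"run_accession": "RUN_A", "instrument_platform": "ILLUMINA",
     "library_strategy": "WGS", "read_count": "25000000"},
    {"run_accession": "RUN_B", "instrument_platform": "OXFORD_NANOPORE",
     "library_strategy": "WGS", "read_count": "12000000"},
    {"run_accession": "RUN_C", "instrument_platform": "ILLUMINA",
     "library_strategy": "AMPLICON", "read_count": "30000000"},
    {"run_accession": "RUN_D", "instrument_platform": "ILLUMINA",
     "library_strategy": "WGS", "read_count": "4000000"},
]

print(pick_runs(example_runs))
# {'ILLUMINA': ['RUN_A'], 'OXFORD_NANOPORE': ['RUN_B']}
```

RUN_C is dropped because it is amplicon rather than shotgun, and RUN_D because it falls below the depth cutoff.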
https://www.nature.com/articles/s41597-019-0287-z I am posting the article here so we do not forget.
Meslier2022 is what we are going for: https://www.nature.com/articles/s41597-022-01762-z I did a test run:
With the following samplesheet:
And database sheet:
The database TARs are based on these instructions. And it mostly ran! Observations in the next comment.
Output files:
Overall we get 35% of reads classified with Bracken, so I think this is a good sign that this is a reasonable dataset.
Shall we close this?
Not yet, let's get the samplesheets and database sheets uploaded to test-datasets and then we can close it :)
Description of feature
These should be 3-5 'real life' samples that you would profile against.
Ideally these would be short-read/long-read pairs.