de_analysis.R: Creating counts dataframe changes counts variable from matrix to 'named' integer #95
Comments
Hi, thanks for this feedback. We will update the workflow as soon as possible. In the meantime, you can use the local copy of the script that you have edited by running the workflow from its main script, wherever that is located for you locally, e.g.
|
Also, are your ref_annotation and transcript from a publicly available source? This is just to help us recreate the error. |
Hello,
Error in dmDSdata(counts = counts, samples = coldata) :
all(samples$sample_id %in% colnames(counts)) is not TRUE
Calls: dmDSdata -> stopifnot
Execution halted
I'm investigating now to see what the issue with it is. I have half a mind to manually carry out the de_analysis.R to catch any errors and make changes as needed. Yes, my annotation and transcript are the X_tropicalis reference from the ftp. I'm using the genomic.fna and rna_from_genomic respectively for reference and transcripts. I've also tried using both the gff and gtf files. |
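The failing assertion in that message can be evaluated by hand to see which sample IDs have no matching column in the counts table. The sketch below is only an illustration: the `counts` and `samples` data frames are made-up stand-ins (all names and values are hypothetical), not the objects the workflow actually builds.

```r
# Hypothetical stand-ins for the objects passed to dmDSdata().
counts <- data.frame(
  gene_id    = c("g1", "g2"),
  feature_id = c("tx1", "tx2"),
  sample01   = c(10L, 3L),
  sample02   = c(7L, 0L)
)
samples <- data.frame(
  sample_id = c("sample01", "sample02", "sample03"),
  condition = c("control", "control", "treated")
)

# The same condition that stopifnot() reports as "not TRUE".
ids_ok <- all(samples$sample_id %in% colnames(counts))

# Sample IDs from the sample table with no matching counts column.
missing_ids <- setdiff(as.character(samples$sample_id), colnames(counts))

print(ids_ok)
print(missing_ids)
```

If `missing_ids` is non-empty, the sample sheet and the per-sample input data are out of step, which is one plausible way to end up with this error.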
Okay, I've done more testing and figured out something that's my fault. So I have changed this and importantly I've run it with an unchanged de_analysis.R to determine if my original error is related to my mistake. So far so good. It seems like it is indeed related! The workflow completed successfully! |
Ah great, let us know if you have any other questions or issues. |
Hello,
I also put all the fastq files in the same input folder and got the same errors as you. After seeing your message, I tried adding 6 subfolders to my input folder and named them with the barcodes corresponding to my 6 conditions. Finally, in each subfolder I put the fastq.gz files starting with the corresponding barcode. With this setup, I get an error telling me that my barcode format is wrong: “Invalid sample sheet: values in 'barcode' column are incorrect format”. Did I enter the wrong barcodes? |
Hi, the barcode column should only be in the format eg.
|
OK, thank you, I had misunderstood! It works well now, thank you!
From: "Sarah Griffiths" ***@***.***>
To: "epi2me-labs/wf-transcriptomes" ***@***.***>
Cc: "Natacha Clairet" ***@***.***>, "Comment" ***@***.***>
Sent: Monday 14 October 2024 17:35:29
Subject: Re: [epi2me-labs/wf-transcriptomes] de_analysis.R: Creating counts dataframe changes counts variable from matrix to 'named' integer (Issue #95)
Hi, the barcode column should only be in the format e.g. barcode_01. You can use the alias column to rename it, e.g. sample_01, and the condition column to indicate the conditions. See the README for an example and instructions:
barcode,alias,condition
barcode01,sample01,control
barcode02,sample02,control
barcode03,sample03,control
barcode04,sample04,treated
barcode05,sample05,treated
barcode06,sample06,treated
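A sample sheet like the one above can be sanity-checked in R before launching the workflow. This is a minimal sketch, not the workflow's own validation: the pattern `^barcode[0-9]+$` is an assumption inferred from the example rows, and the exact rule the workflow enforces may differ.

```r
# The sample sheet from the comment above, read from inline text.
sheet <- read.csv(text = "barcode,alias,condition
barcode01,sample01,control
barcode02,sample02,control
barcode03,sample03,control
barcode04,sample04,treated
barcode05,sample05,treated
barcode06,sample06,treated")

# ASSUMPTION: a valid barcode is the word "barcode" followed by digits.
barcodes <- as.character(sheet$barcode)
bad_barcodes <- barcodes[!grepl("^barcode[0-9]+$", barcodes)]

# Number of samples per condition.
per_condition <- table(sheet$condition)

print(bad_barcodes)
print(per_condition)
```

An empty `bad_barcodes` means every row matches the assumed pattern.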
|
Operating System
Other Linux (please specify below)
Other Linux
Red Hat Enterprise Linux release 8.6
Workflow Version
v1.1.1-g999fb4e
Workflow Execution
Command line (Cluster)
Other workflow execution
No response
EPI2ME Version
No response
CLI command run
nextflow run epi2me-labs/wf-transcriptomes --fastq fastqs_dir --sample_sheet sample_sheet.csv --de_analysis --ref_genome ref_genomic.fna --transcriptome_source precomputed --ref_transcriptome ref_rna.fna --ref_annotation ref.gtf --minimap2_index_opts '-k 15' --threads 32 --cdna_kit SQK-PCS111 --pychopper_opts '-U -y' -profile singularity
Workflow Execution - CLI Execution Profile
singularity
What happened?
Observed:
The workflow started with no issues and proceeded to the de_analysis.R step. However, this results in an error.
Error in data.frame(gene_id = txdf$GENEID, feature_id = txdf$TXNAME, cts) : arguments imply differing number of rows: 0, 42347
Execution halted
-- see logfile and below this for full error message.
Expected/Solution:
After trying a bunch of things, I decided to 'brute-force' it and look through the de_analysis.R script and manually run the commands myself to figure out what's happened.
Basically in line 70 this command:
cts <- cts[rownames(cts) %in% txdf$TXNAME, ]
broke the workflow for me. This line changed cts from a matrix into just a named integer vector. That resulted in the subsequent commands:
to be unsuccessful.
My understanding is that when subsetting a matrix the way line 70 does, R coerces the result to a vector if the subset ends up with a single row or column, because `[` defaults to drop=TRUE. So I fixed this by adding drop=FALSE to the command.
I checked it manually and this retained cts as a matrix.
So I propose a change to line 70 that looks like this:
cts <- cts[rownames(cts) %in% txdf$TXNAME, , drop = FALSE]
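The difference can be reproduced on a toy example. The sketch below uses a made-up one-column counts matrix (the transcript and sample names are hypothetical), not the workflow's real data, and only shows the base R subsetting behaviour described above.

```r
# Toy counts matrix: 3 transcripts x 1 sample (hypothetical names).
cts <- matrix(
  c(10L, 0L, 5L),
  ncol = 1,
  dimnames = list(c("tx1", "tx2", "tx3"), "sample01")
)
keep <- rownames(cts) %in% c("tx1", "tx3")

# Default drop = TRUE: the single column is dropped to a named vector.
dropped <- cts[keep, ]

# drop = FALSE: the result stays a 2 x 1 matrix.
kept <- cts[keep, , drop = FALSE]

print(is.matrix(dropped))
print(is.matrix(kept))
print(dim(kept))
```

With more than one sample column both forms return a matrix, which would explain why the problem only shows up in some runs.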
However at this point, I'm unsure how to proceed as the hot-fix I figured out manually can't be applied to the workflow as I currently run it. So I would appreciate any pointers on how to change the script locally and run it with the workflow (I'm assuming that you may be too busy to push out an update).
Observed Error:
Relevant log output
Application activity log entry
No response
Were you able to successfully run the latest version of the workflow with the demo data?
yes
Other demo data information
No response