RSeQC module tin.py creates RefseqID instead of ENSEMBLIDs of transcripts #1442

Open
AnissaE opened this issue Oct 28, 2024 · 0 comments
Labels
bug Something isn't working

AnissaE commented Oct 28, 2024

Description of the bug

Hi all,
I ran the pipeline to analyze bulk RNA-seq data, including the TIN module from RSeQC, in order to remove samples with a low median TIN (thanks for integrating it into the pipeline, it is definitely helpful!). I ran nf-core/rnaseq (see the command line below) and, after 21 hours, got one .xls output file per sample (see the attached .xls output for one sample). In the geneID column of that file, however, the entries are RefSeq-style identifiers, including transcript names such as rna1, rnaX, etc. (see the output of head -20 1117877.markdup.sorted.tin.xls below). After some googling, I found that the transcript names are expected to be Ensembl IDs. Is there an additional argument to change the transcript names, or did a problem occur during processing? The command I used and the output (.xls file) are given below. The pipeline finished successfully without any error; I labelled this issue as a bug by default.

Thanks for your help,
Anissa.
1117877.markdup.sorted.tin.xls

Command used and terminal output

~/nextflow run nf-core/rnaseq -profile singularity -r 3.6 --max_cpus ${THREADS} --max_memory ${MAX_MEM} --aligner star_salmon --input $SAMPLESHEET --outdir "${FILES}/results_TIN" --genome GRCh38 --gencode --gtf ${REF_DIR}/gencode.v39.annotation.gtf.gz -work-dir "${FILES}/results_TIN/work" --rseqc_modules 'bam_stat,inner_distance,infer_experiment,junction_annotation,junction_saturation,read_distribution,read_duplication,tin'

head -20 1117877.markdup.sorted.tin.xls
geneID chrom tx_start tx_end TIN
rna0 chr1 11873 14409 0.0
rna1 chr1 14361 29370 52.880286053751696
rna3 chr1 17368 17391 0.0
rna2 chr1 17368 17436 0.0
rna4 chr1 17408 17431 0.0
rna5 chr1 30365 30503 0.0
rna6 chr1 30437 30458 0.0
rna7 chr1 34610 36081 0.0
NM_001005484.1 chr1 69090 70008 0.0
rna9 chr1 120711 133748 15.569611580163949
rna10 chr1 134772 140566 17.925981206462087
rna13 chr1 142436 146418 28.571428571428534
rna11 chr1 142436 174392 18.35966859786253
rna12 chr1 142436 174392 20.947542906746364
rna14 chr1 142436 143602 0.0
rna15 chr1 146469 174392 22.22074678696301
rna16 chr1 146469 174392 23.745214873731445
rna17 chr1 149039 174392 18.276476252845494
rna18 chr1 153506 174392 18.220218733785686
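
To quantify how many rows of the table above carry each kind of identifier, a small script along these lines could help. This is only a sketch: the function names `classify_id` and `count_id_styles` are made up for illustration, the regular expressions are rough assumptions about what Ensembl, RefSeq, and `rnaN` identifiers look like (not an official grammar), and the parser simply splits on whitespace rather than relying on a particular delimiter.

```python
import re
from collections import Counter

# Rough, illustrative patterns (assumptions, not an exhaustive ID grammar).
ID_PATTERNS = [
    ("ensembl", re.compile(r"^ENS[A-Z]*T\d+(\.\d+)?")),   # e.g. ENST00000456328.2
    ("refseq", re.compile(r"^[NX][MR]_\d+(\.\d+)?$")),    # e.g. NM_001005484.1
    ("generic_rna", re.compile(r"^rna\d+$")),             # e.g. rna0, rna1
]


def classify_id(identifier):
    """Return a label for the naming style of one transcript identifier."""
    for label, pattern in ID_PATTERNS:
        if pattern.match(identifier):
            return label
    return "other"


def count_id_styles(lines):
    """Count naming styles in the first column of a tin.xls-like table."""
    counts = Counter()
    for i, line in enumerate(lines):
        fields = line.split()
        # Skip blank lines and the header row (first column named geneID).
        if not fields or (i == 0 and fields[0] == "geneID"):
            continue
        counts[classify_id(fields[0])] += 1
    return counts


# A few rows copied from the output above.
sample = """geneID chrom tx_start tx_end TIN
rna0 chr1 11873 14409 0.0
rna1 chr1 14361 29370 52.880286053751696
NM_001005484.1 chr1 69090 70008 0.0
""".splitlines()

print(count_id_styles(sample))
```

To run it on the full file, the same function can be given an open file handle, for example `with open("1117877.markdup.sorted.tin.xls") as fh: print(count_id_styles(fh))`. A table with no `ensembl` rows at all would suggest that the identifiers come from the annotation used to build the table rather than from a few odd records.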

Relevant files

No response

System information

Nextflow version 23.10.1 build 5891
System: Linux 4.18.0-553.22.1.el8_10.x86_64
Container engine: singularity
nf-core/rnaseq version 3.6

AnissaE added the bug label Oct 28, 2024