groHMM overhaul #165

Merged on Oct 23, 2024 (54 commits)
54 commits
1df35bd
style: Run styler
edmundmiller Sep 12, 2024
f7ddbb6
test: Write grohmm tests
edmundmiller Sep 13, 2024
21ba6f7
fix(grohmm): bam => bams
edmundmiller Sep 11, 2024
39265aa
fix(grohmm): Link up tuning files and samples
edmundmiller Sep 12, 2024
f431c73
refactor(grohmm): Add notes of brainstorming
edmundmiller Sep 13, 2024
727fdd1
feat: Add grohmm min and max tuning params
edmundmiller Sep 14, 2024
bb59488
fix: of => fromList
edmundmiller Sep 15, 2024
ab20315
refactor(grohmm): Use parameter tuning split
edmundmiller Sep 15, 2024
6e0b9a8
fix(grohmm): Use windowAnalysis
edmundmiller Sep 15, 2024
114136c
refactor: Setup transcript calling
edmundmiller Sep 15, 2024
d72402b
build: Build a seperate grohmm conda package
edmundmiller Sep 18, 2024
ce1c512
chore: Add working
edmundmiller Sep 18, 2024
6ba858f
build(grohmm): Add Seqera containers
edmundmiller Sep 19, 2024
46646f0
test(grohmm): Add chr7 gtf
edmundmiller Sep 19, 2024
2ad3b9c
test: Setup tests for grohmm
edmundmiller Sep 23, 2024
65d16d0
chore(grohmm): Add example tuning evals from tutorial
edmundmiller Sep 23, 2024
e221791
fix(grohmm): Try removing any genes from "random" Chromosome
edmundmiller Sep 23, 2024
0691cba
test(grohmm): Use kgChr7 gtf
edmundmiller Sep 23, 2024
692e688
test: Try it with refGene again
edmundmiller Sep 24, 2024
d965491
fix(grohmm): Set memory.limit
edmundmiller Sep 24, 2024
013978d
fix(grohmm): Remove keytype
edmundmiller Sep 24, 2024
af3cba8
refactor(grohmm): Get jobs running for every set of possibilities
edmundmiller Sep 24, 2024
a5e2bcc
test(grohmm): Test the whole subworkflow
edmundmiller Sep 24, 2024
7807899
fix(grohmm): Give up on taking a tuning file
edmundmiller Sep 24, 2024
d67acde
test(grohmm): Fix how the channel is created to avoid exhausting it
edmundmiller Sep 24, 2024
09212f9
fix(grohmm): Get transcript calling running
edmundmiller Sep 24, 2024
d2e4065
fix(grohmm): Clean up tuning file to match
edmundmiller Sep 25, 2024
2286b7a
fix(grohmm): Update groHMM with fix
edmundmiller Sep 25, 2024
9388a21
Add consensus bed as output for testing
edmundmiller Oct 17, 2024
b7567c8
test(grohmm): Update all the snapshots
edmundmiller Sep 25, 2024
baa008b
refactor(grohmm): Use each input
edmundmiller Sep 26, 2024
d9e94aa
feat(grohmm): Add Native MultiQC support
edmundmiller Sep 27, 2024
a535c8c
fix(grohmm): Update labels for parametertuning
edmundmiller Sep 28, 2024
2f1c81c
chore: Add a note about the subworkflow functionality
edmundmiller Sep 30, 2024
cd9151e
chore: oras => https
edmundmiller Oct 6, 2024
26626fa
refactor(grohmm): Use each input
edmundmiller Sep 26, 2024
9272060
test(grohmm): Write a failing test
edmundmiller Oct 7, 2024
3d5cf85
fix(grohmm): Update to work with CHM13
edmundmiller Oct 4, 2024
b8548a3
refactor: Replace jpg with png
edmundmiller Oct 4, 2024
0c7a9ba
test: Remove skip_tuning test
edmundmiller Oct 9, 2024
0958778
test: Bump snapshot
edmundmiller Oct 9, 2024
d17d9df
Update CHANGELOG
edmundmiller Oct 15, 2024
3c947b0
docs: Update groHMM docs
edmundmiller Oct 15, 2024
2b2b90f
fix(grohmm): Try calling no more than 10 cores
edmundmiller Oct 16, 2024
e101553
chore: Add a custom makeConsensusAnnotations function
edmundmiller Oct 18, 2024
3028ae1
Try creating a custom makeConsensusAnnotations and setup test
edmundmiller Oct 18, 2024
d4ac91f
Get custom running
edmundmiller Oct 18, 2024
3444c5b
Start over
edmundmiller Oct 18, 2024
508452b
refactor: Split into chunks
edmundmiller Oct 18, 2024
b538a16
More tests
edmundmiller Oct 19, 2024
ef677b1
fix(grohmm): Support gxf
edmundmiller Oct 19, 2024
afbb7b6
fix(chm13): Prevent some errors during transcriptcalling
edmundmiller Oct 21, 2024
5c6b24a
style(grohmm): Clean up
edmundmiller Oct 21, 2024
9c65113
chore: Move test data to test-datasets repo
edmundmiller Oct 22, 2024
6 changes: 5 additions & 1 deletion .editorconfig
@@ -8,7 +8,7 @@ trim_trailing_whitespace = true
 indent_size = 4
 indent_style = space
 
-[*.{md,yml,yaml,html,css,scss,js}]
+[*.{md,yml,yaml,html,css,scss,js,R,Rmd}]
 indent_size = 2
 
 # These files are edited and tested upstream in nf-core/modules
@@ -31,3 +31,7 @@ indent_size = unset
 # ignore python and markdown
 [*.{py,md}]
 indent_style = unset
+
+# Follow tidyverse style for R
+[*.{R,Rmd}]
+indent_size = 2
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -22,11 +22,16 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - [#137](https://github.com/nf-core/nascent/pull/137) - Use singularity containers for PINTS
 - [#142](https://github.com/nf-core/nascent/pull/142) - Updated CHM13 references
 - [#171](https://github.com/nf-core/nascent/pull/171) - Use assertAll in tests
+- [#165](https://github.com/nf-core/nascent/pull/165) - groHMM overhaul. Removed R mclapply calls and replaced with Nextflow scatter gather for parameter tuning. This creates a job for each parameter set.
 
 ### Fixed
 
 - [#170](https://github.com/nf-core/nascent/pull/170) - Remove "Access to undefined parameter forwardStranded" warnings
 
+### Removed
+
+- [[#165](https://github.com/nf-core/nascent/pull/165)] - Removed support for groHMM tuning files.
+
 ## v2.2.0 - 2024-03-05
 
 ### Added
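The scatter step described in the #165 CHANGELOG entry can be pictured as a table with one row per job. The following is a minimal sketch in base R, not the pipeline's actual Nextflow code. The grid values are illustrative rather than the pipeline defaults, and `sample.bam`, `genes.gtf`, and the `tune_` prefix are placeholder names. The flags are the ones defined in `bin/grohmm_parametertuning.R`.

```r
# Hypothetical tuning grid: the values are illustrative, not pipeline defaults.
grid <- expand.grid(
  ltprobb = c(-100L, -200L, -300L),
  uts = c(5L, 10L, 15L)
)

# One command line per parameter set; each row would become its own job.
# "sample.bam" and "genes.gtf" are placeholder file names.
commands <- sprintf(
  paste(
    "grohmm_parametertuning.R --bam_files sample.bam --gxf genes.gtf",
    "--ltprobb %d --uts %d --outprefix tune_%d_%d"
  ),
  grid$ltprobb, grid$uts, abs(grid$ltprobb), grid$uts
)

print(nrow(grid))
print(commands[1])
```

Because every row is independent, the scheduler can run the jobs in parallel instead of relying on `mclapply` inside a single R process.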
2 changes: 1 addition & 1 deletion assets/multiqc_config.yml
@@ -50,7 +50,7 @@ custom_data:
     plot_type: "image"
 sp:
   grohmm_plot:
-    fn: "*.tdplot_mqc.jpg"
+    fn: "*.tdplot_mqc.png"
 ignore_images: false
 
 export_plots: true
170 changes: 170 additions & 0 deletions bin/grohmm_parametertuning.R
@@ -0,0 +1,170 @@
#!/usr/bin/env Rscript

suppressPackageStartupMessages(library(argparse))
suppressPackageStartupMessages(library(GenomicFeatures))
suppressPackageStartupMessages(library(GenomicAlignments))
suppressPackageStartupMessages(library(groHMM))

parser <- ArgumentParser(description = "Run groHMM on some bam files")

parser$add_argument(
  "-i",
  "--bam_files",
  type = "character",
  nargs = "+",
  metavar = "path",
  help = "GRO-seq data in bam files.",
  required = TRUE
)
parser$add_argument(
  "-o",
  "--outdir",
  type = "character",
  default = "./",
  metavar = "path",
  help = "Output directory."
)
parser$add_argument(
  "-l",
  "--ltprobb",
  type = "integer",
  default = -200,
  metavar = "integer",
  # paste() returns the string; cat() would print it and return NULL
  help = paste(
    "Log-transformed transition probability of switching from transcribed",
    "state to non-transcribed state."
  )
)
parser$add_argument(
  "-u",
  "--uts",
  type = "integer",
  default = 5,
  metavar = "integer",
  help = paste(
    "Variance of the emission probability for reads in the",
    "non-transcribed state."
  )
)
parser$add_argument(
  "-p",
  "--outprefix",
  type = "character",
  default = "grohmm",
  metavar = "string",
  help = "Output prefix."
)
parser$add_argument(
  "-g",
  "--gxf",
  type = "character",
  default = NULL,
  metavar = "string",
  help = "GFF/GTF file used to create the TxDb.",
  required = TRUE
)
parser$add_argument(
  "-c",
  "--cores",
  type = "integer",
  default = 1,
  metavar = "integer",
  help = "Number of cores."
)
parser$add_argument(
  "-m",
  "--memory",
  type = "integer",
  metavar = "integer",
  help = "Amount of memory in MB."
)

args <- parser$parse_args()

options(mc.cores = getCores(args$cores))
# memory.limit() is a no-op stub outside Windows and in R >= 4.2 (it only warns)
memory.limit(size = args$memory)
setwd(args$outdir)

if (is.null(args$bam_files)) {
  # print_help() is a method of the parser object, not of the parsed args
  parser$print_help()
  stop("Please provide a bam file", call. = FALSE)
}

# Load alignment files
# TODO? CHANGE BASED ON PAIRED OR SINGLE END
alignments <- c()
for (bam in args$bam_files) {
  alignments <- append(
    alignments,
    as(readGAlignments(bam), "GRanges")
  )
  alignments <- keepStandardChromosomes(alignments, pruning.mode = "coarse")
}

print("Input transcript annotations")
kg_db <- makeTxDbFromGFF(args$gxf)
kg_tx <- transcripts(kg_db, columns = c("gene_id", "tx_id", "tx_name"))
print("Collapse annotations in preparation for overlap")
kg_consensus <- makeConsensusAnnotations(
  kg_tx,
  mc.cores = args$cores
)
print("Finished consensus annotations")

############
## TUNING ##
############
print("Starting tuning run")
tune <- data.frame(
  LtProbB = args$ltprobb,
  UTS = args$uts
)
# Per-strand read counts in 50 bp windows, passed to detectTranscripts() below
fp <- windowAnalysis(alignments, strand = "+", windowSize = 50)
fm <- windowAnalysis(alignments, strand = "-", windowSize = 50)
hmm <- detectTranscripts(
  Fp = fp,
  Fm = fm,
  reads = alignments,
  LtProbB = args$ltprobb,
  UTS = args$uts
)
print("Evaluating")
e <- evaluateHMMInAnnotations(hmm$transcripts, kg_consensus)

# Extract evaluation metrics and convert to a data frame
eval_metrics <- as.data.frame(e$eval)

# If eval_metrics is a list of lists, unlist it
if (is.list(eval_metrics[[1]])) {
  eval_metrics <- as.data.frame(t(sapply(e$eval, unlist)))
}

# Combine the tuning parameters with the evaluation metrics
tune <- cbind(tune, eval_metrics)

print(e$eval)
print(e)

# Write the combined data to a CSV file without row names
write.csv(tune, file = paste0(args$outprefix, ".tuning.csv"), row.names = FALSE)
# Write kg_consensus to a bed file for testing
export.bed(kg_consensus, con = paste0(args$outprefix, ".tuning.consensus.bed"))

########################
## CITE PACKAGES USED ##
########################
citation("groHMM")
citation("GenomicFeatures")
citation("GenomicAlignments")
citation("AnnotationDbi")

####################
## R SESSION INFO ##
####################
r_log_file <- "R_sessionInfo.log"
if (!file.exists(r_log_file)) {
  sink(r_log_file)
  a <- sessionInfo()
  print(a)
  sink()
}
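Since each job writes a one-row `<outprefix>.tuning.csv`, a gather step has to stack those files before a parameter set can be chosen. The sketch below shows one way to do that in base R. It rests on assumptions: the `errorRate` column name, the metric values (made up for illustration), and the choice to select on the lowest error rate are not confirmed by this diff.

```r
# Simulate the per-job outputs: each tuning job writes a one-row CSV.
# The metric values are made up; the errorRate column name is an assumption.
out_dir <- tempfile("tuning_")
dir.create(out_dir)
jobs <- data.frame(
  LtProbB = c(-100, -200, -300),
  UTS = c(5, 10, 15),
  errorRate = c(0.21, 0.17, 0.19)
)
for (i in seq_len(nrow(jobs))) {
  write.csv(
    jobs[i, ],
    file = file.path(out_dir, paste0("job", i, ".tuning.csv")),
    row.names = FALSE
  )
}

# Gather: read every per-job CSV and stack the rows into one table.
csv_files <- list.files(
  out_dir,
  pattern = "\\.tuning\\.csv$",
  full.names = TRUE
)
tuning <- do.call(rbind, lapply(csv_files, read.csv))

# Pick the parameter set with the lowest error rate (assumed criterion).
best <- tuning[which.min(tuning$errorRate), ]
print(best)
```

Keeping the per-job output to a single row with the parameters repeated alongside the metrics means the gather step needs no bookkeeping about which file came from which job.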