Update tests and documentation for 0.1.0 release
percyfal committed Dec 18, 2023
1 parent 7c500b7 commit ec14d15
Showing 8 changed files with 298 additions and 108 deletions.
23 changes: 8 additions & 15 deletions .github/workflows/R-CMD-check.yaml
@@ -4,19 +4,12 @@ name: R-CMD-check

jobs:
R-CMD-check:
runs-on: macOS-latest
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- uses: r-lib/actions/setup-r@master
- uses: r-lib/actions/setup-pandoc@master
- name: Install dependencies
run: |
install.packages(c("remotes", "rcmdcheck", "knitr"))
deps <- remotes::dev_package_deps(dependencies = TRUE)
install.packages(deps$package[!is.na(deps$available)])
if (!requireNamespace("BiocManager", quietly = TRUE)) {install.packages("BiocManager")}
BiocManager::install(deps$package[is.na(deps$available)])
shell: Rscript {0}
- name: Check
run: rcmdcheck::rcmdcheck(args = "--no-manual", error_on = "error")
shell: Rscript {0}
- uses: actions/checkout@v3
- uses: r-lib/actions/setup-r@v2
- uses: r-lib/actions/setup-r-dependencies@v2
with:
extra-packages: any::rcmdcheck
needs: check
- uses: r-lib/actions/check-r-package@v2
2 changes: 2 additions & 0 deletions .lintr
@@ -0,0 +1,2 @@
linters: linters_with_defaults(
indentation_linter(hanging_indent_style="tidy"))
52 changes: 52 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,52 @@
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.5.0
hooks:
- id: check-merge-conflict
- id: debug-statements
- id: mixed-line-ending
- id: detect-private-key
- id: check-case-conflict
- id: check-yaml
- id: trailing-whitespace
- repo: https://github.com/DavidAnson/markdownlint-cli2
rev: v0.11.0
hooks:
- id: markdownlint-cli2
files: \.(md|qmd)$
types: [file]
exclude: LICENSE.md
- id: markdownlint-cli2-fix
files: \.(md|qmd)$
types: [file]
exclude: LICENSE.md
- repo: https://github.com/lorenzwalthert/precommit
rev: v0.3.2.9027
hooks:
- id: style-files
name: style-files
description: style files with {styler}
entry: Rscript inst/hooks/exported/style-files.R
language: r
files: '(\.[rR]profile|\.[rR]|\.[rR]md|\.[rR]nw|\.[qQ]md)$'
exclude: |
(?x)^(
renv/activate\.R|
)$
minimum_pre_commit_version: "2.13.0"
- id: parsable-R
name: parsable-R
description: check if a .R file is parsable
entry: Rscript inst/hooks/exported/parsable-R.R
language: r
files: '\.[rR](md)?$'
minimum_pre_commit_version: "2.13.0"
- id: lintr
args: [--warn_only]
name: lintr
description: check if a `.R` file is lint free (using {lintr})
entry: Rscript inst/hooks/exported/lintr.R
language: r
files: '(\.[rR]profile|\.R|\.Rmd|\.Rnw|\.r|\.rmd|\.rnw)$'
exclude: 'renv/activate\.R'
minimum_pre_commit_version: "2.13.0"
4 changes: 2 additions & 2 deletions DESCRIPTION
@@ -1,7 +1,7 @@
Package: genecovr
Title: Gene body coverage analysis to evaluate genome assemblies
Version: 0.0.0.9013
Authors@R:
Version: 0.1.0
Authors@R:
person(given = "Per",
family = "Unneberg",
role = c("aut", "cre"),
79 changes: 73 additions & 6 deletions README.Rmd
@@ -19,21 +19,30 @@ knitr::opts_chunk$set(
[![R build status](https://github.com/NBISweden/genecovr/workflows/R-CMD-check/badge.svg)](https://github.com/NBISweden/genecovr/actions)
<!-- badges: end -->

Perform gene body coverage analyses in R to evaluate genome assembly
quality.
`genecovr` is an `R` package that provides plotting functions that
summarize gene transcript to genome alignments. The main purpose is to
assess the effect that polishing and scaffolding operations have on the
quality of a genome assembly. The gene transcript set is a large
sequence set consisting of assembled transcripts from RNA-seq data
generated in relation to a genome assembly project. Therefore,
`genecovr` serves as a complement to software such as
[BUSCO](https://busco.ezlab.org/), which evaluates genome assembly
quality using a smaller set of well-defined single-copy orthologs.

## Installation

You can install the released version of genecovr from [NBIS
GitHub](https://github.com/nbis) with:

``` r
# If necessary, uncomment to install devtools
# install.packages("devtools")
devtools::install_github("NBISweden/genecovr")
```

## Usage

## Quick usage
### genecovr script quick start

There is a helper script for generating basic plots located in
PACKAGE_DIR/bin/genecovr. Create a data input csv-delimited file with
@@ -52,8 +61,66 @@ script to generate plots:
PACKAGE_DIR/bin/genecovr indata.csv
```

## Vignette
#### Example

Alternatively, import the library as usual in an R script and use the
package functions. See the vignette for a minimum working example.
Example files are located in PACKAGE_DIR/inst/extdata: two psl files
containing gmap alignments, a fasta index for the transcript
sequences, and fasta indices for two different assembly versions:

- nonpolished.fai - fasta index for raw assembly
- polished.fai - fasta index for polished assembly
- transcripts.fai - fasta index for transcript sequences
- transcripts2nonpolished.psl - gmap alignments, transcripts to raw assembly
- transcripts2polished.psl - gmap alignments, transcripts to polished
assembly

Using these files and the labels `nonpol` and `pol` for the different
assemblies, a `genecovr` input file (called e.g., `assemblies.csv`)
would look as follows:

```
nonpol,transcripts2nonpolished.psl,nonpolished.fai,transcripts.fai
pol,transcripts2polished.psl,polished.fai,transcripts.fai
```

and the command to run would be:

```
genecovr assemblies.csv
```
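
The input file can also be generated from R rather than by hand. The
following is a minimal sketch, assuming the example files sit in the
current working directory; the data frame column names (`label`,
`mapping`, `assembly`, `transcripts`) are illustrative only and are
not written to the file, matching the header-less layout shown above.

```r
# Build the two-row, header-less input file for the genecovr script.
# Column names are hypothetical and only used inside this snippet.
assemblies <- data.frame(
  label = c("nonpol", "pol"),
  mapping = c("transcripts2nonpolished.psl", "transcripts2polished.psl"),
  assembly = c("nonpolished.fai", "polished.fai"),
  transcripts = c("transcripts.fai", "transcripts.fai")
)

# Write comma-separated rows without quotes, row names, or a header line.
write.table(
  assemblies,
  file = "assemblies.csv",
  sep = ",",
  quote = FALSE,
  row.names = FALSE,
  col.names = FALSE
)
```

For an installed package, the bare file names could be swapped for
full paths obtained with `system.file("extdata", ..., package = "genecovr")`.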

#### genecovr options

To list genecovr script options, type `genecovr -h`:

```
usage: genecovr [-h] [-v] [-p number]
[-d OUTPUT_DIRECTORY] [--height HEIGHT]
[--width WIDTH]
csvfile
positional arguments:
csvfile csv-delimited file with columns
1. data label
2. mapping file (supported formats: psl)
3. assembly file (fasta or fasta index)
4. transcript file (fasta or fasta index)
optional arguments:
-h, --help show this help message and exit
-v, --verbose print extra output
-p number, --cpus number
number of cpus [default 1]
-d OUTPUT_DIRECTORY, --output-directory OUTPUT_DIRECTORY
output directory
--height HEIGHT figure height in inches [default 6.0]
--width WIDTH figure width in inches [default 6.0]
```
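
The options listed above can be combined in a single call. As a hedged
sketch, the script could be driven from an R session with `system2()`;
the values chosen here (two cpus, an output directory named `results`,
8 by 8 inch figures) are arbitrary examples, and the call is only
attempted when `genecovr` is found on the `PATH`.

```r
# Arguments use only flags from the help text; the values are examples.
args <- c(
  "--cpus", "2",
  "--output-directory", "results",
  "--height", "8",
  "--width", "8",
  "assemblies.csv"
)

# Run the script only if it is installed and on the PATH.
genecovr_bin <- Sys.which("genecovr")
if (nzchar(genecovr_bin)) {
  status <- system2(genecovr_bin, args)
} else {
  message(
    "genecovr not found on PATH; would run: genecovr ",
    paste(args, collapse = " ")
  )
}
```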



### R package vignette

Alternatively, import the library in an R script and use the package
functions. See the vignette for a minimal working example.
19 changes: 13 additions & 6 deletions inst/bin/genecovr
@@ -131,6 +131,7 @@ apl <- AlignmentPairsList(
seqinfo.query=transcripts.sinfo[[x]])
}, BPPARAM=bpparam)
)

names(apl) <- names(psl.fn)

##------------------------------
@@ -183,8 +184,9 @@ save_plot(p, outfile)


## FIXME: number of levels should be parametrized via option
suppressPackageStartupMessages(library(dplyr))
outfile <- file.path(outdir, "qnuminsert")
x <- insertionSummary(apl, reduce=FALSE)
x <- dplyr::tibble(insertionSummary(apl, reduce=FALSE))
p <- ggplot(x, aes(id)) +
geom_bar(aes(fill=cuts)) +
scale_fill_viridis_d(name="qNumInsert", begin=1, end=0)
@@ -199,9 +201,8 @@ message("saving ", outfile)
write.csv(x, file=gzfile(outfile), row.names=FALSE)

## Also make plot based on gbc
suppressPackageStartupMessages(library(dplyr))
outfile <- file.path(outdir, "qnuminsert.gbc")
x <- insertionSummary(apl)
x <- dplyr::tibble(insertionSummary(apl))
p <- ggplot(x, aes(id)) +
geom_bar(aes(fill=cuts)) +
scale_fill_viridis_d(name="qNumInsert", begin=1, end=0)
@@ -211,6 +212,10 @@ save_plot(p, outfile)
##--------------------
## Save gbc data
##--------------------
x$revmap <- as.character(x$revmap)
x$hitCoverage <- as.character(x$hitCoverage)
x$hitStart <- as.character(x$hitStart)
x$hitEnd <- as.character(x$hitEnd)
outfile <- file.path(outdir, "gbcdata.tsv.gz")
message("saving ", outfile)
write.table(x, file=gzfile(outfile), row.names=FALSE, sep="\t")
@@ -313,11 +318,13 @@ p <- ggplot(data=subset(data, n.subjects>1),
outfile <- file.path(outdir, paste0("depth_breadth_seqlengths.mm", mm))
save_plot(p, outfile)


data$revmap <- as.character(data$revmap)
data$hitCoverage <- as.character(data$hitCoverage)
data$hitStart <- as.character(data$hitStart)
data$hitEnd <- as.character(data$hitEnd)
outfile <- file.path(outdir, "gene_body_coverage.csv.gz")
message("saving ", outfile)
write.csv(data, gzfile(outfile), row.names=FALSE)

write.csv(dplyr::tibble(data), gzfile(outfile), row.names=FALSE)

##############################
## Save Rdata of analysis
33 changes: 20 additions & 13 deletions tests/testthat/test-genecovr.R
@@ -1,29 +1,36 @@
skip("Skip until resolve issue of testing package binary")
pkg_path <- getwd()
tmp <- tempdir(check = TRUE)
withr::local_dir(tmp)
withr::local_temp_libpaths()
withr::local_path(file.path(.libPaths()[1], "genecovr", "bin"))
devtools::install(pkg=pkg_path, quick = TRUE, upgrade = "never")
devtools::install(pkg = pkg_path, quick = TRUE, upgrade = "never")

create_file <- function(fn) system.file("extdata", fn, package = "genecovr")

data <- as.data.frame(rbind(
c("nonpol", create_file("transcripts2nonpolished.psl"),
create_file("nonpolished.fai"), create_file("transcripts.fai")),
c("pol", create_file("transcripts2polished.psl"),
create_file("polished.fai"), create_file("transcripts.fai"))
c(
"nonpol", create_file("transcripts2nonpolished.psl"),
create_file("nonpolished.fai"), create_file("transcripts.fai")
),
c(
"pol", create_file("transcripts2polished.psl"),
create_file("polished.fai"), create_file("transcripts.fai")
)
))
colnames(data) <- NULL
withr::local_file(write.csv(data, file = "assemblies.csv", row.names = FALSE))

test_that("genecovr runs as expected", {
system(paste(paste0("R_LIBS_USER=", .libPaths()[1]),
"genecovr", "assemblies.csv"))
expect_equal(length(list.files(pattern = "*.png")), 13)
expect_equal(length(grep("Rplots", list.files(pattern = "*.pdf"),
invert = TRUE)), 13)
expect_equal(length(list.files(pattern = "*.csv.gz")), 4)
system(paste(
paste0("R_LIBS_USER=", .libPaths()[1]),
"genecovr", "assemblies.csv"
))
expect_equal(length(list.files(pattern = "*.png")), 15)
expect_equal(length(grep("Rplots", list.files(pattern = "*.pdf"),
invert = TRUE
)), 15)
expect_equal(length(list.files(pattern = "*.csv.gz")), 4)
expect_equal(length(list.files(pattern = "*.tsv.gz")), 1)
})

withr::defer(unlink(dir, recursive = TRUE))
withr::defer(unlink(tmp, recursive = TRUE))