add gradient_pos_rel_amt function
jeffkimbrel committed Sep 17, 2023
1 parent 5debdf8 commit bd8ac31
Showing 5 changed files with 88 additions and 8 deletions.
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -1,6 +1,6 @@
Package: qSIP2
Title: qSIP Analysis
Version: 0.4.0.9007
Version: 0.4.0.9008
Authors@R:
person("Jeff", "Kimbrel", , "[email protected]", role = c("aut", "cre"),
comment = c(ORCID = "YOUR-ORCID-ID"))
1 change: 1 addition & 0 deletions NAMESPACE
@@ -1,5 +1,6 @@
# Generated by roxygen2: do not edit by hand

export(add_gradient_pos_rel_amt)
export(add_isotopolog_label)
export(get_sample_counts)
export(gradient_pos_density_validation)
31 changes: 31 additions & 0 deletions R/add_gradient_pos_rel_amt.R
@@ -0,0 +1,31 @@
#' Add gradient_pos_rel_amt to data
#'
#' This function calculates the relative amount of each fraction compared to
#' the whole replicate, using either qPCR copies or DNA concentrations.
#'
#' @param data A dataframe or tibble
#' @param source_mat_id Name of the column that groups fractions into a replicate
#' @param amt Name of the column that has the qPCR or DNA amounts per fraction
#' @param overwrite Whether to overwrite an existing gradient_pos_rel_amt column
#'
#' @export
#'
#' @keywords sample_data

add_gradient_pos_rel_amt = function(data,
                                    source_mat_id = "source_mat_id",
                                    amt,
                                    overwrite = FALSE) {

  # refuse to clobber an existing column unless explicitly asked to
  if ("gradient_pos_rel_amt" %in% colnames(data)) {
    if (overwrite == FALSE) {
      stop("gradient_pos_rel_amt already exists! Set overwrite = TRUE if you want to overwrite")
    } else if (overwrite == TRUE) {
      message("gradient_pos_rel_amt already exists and will be overwritten")
    }
  }

  # each fraction's amount divided by the total amount in its replicate
  data |>
    dplyr::mutate(gradient_pos_rel_amt = !!as.name(amt) / sum(!!as.name(amt)),
                  .by = !!as.name(source_mat_id))
}
27 changes: 27 additions & 0 deletions man/add_gradient_pos_rel_amt.Rd


35 changes: 28 additions & 7 deletions vignettes/sample_data.Rmd
@@ -16,15 +16,15 @@ knitr::opts_chunk$set(
)
```

```{r setup}
# Background

In `qSIP2`, "sample data" refers to any high-level metadata associated with either an experiment or the individual fractions. This vignette shows the tools available to format and validate your sample data for the `qsip_sample_object` class in the `qSIP2` package.

```{r setup, message=FALSE}
library(dplyr)
library(qSIP2)
```

# Background

In `qSIP2`, "sample data" refers to any high level metadata associated with either an experiment or the individual fractions.

## What is a sample?

The word **sample** typically refers to the biological or environmental entity the DNA was isolated from as well as the single sequencing run tied to that **sample**. In qSIP, however, because there are multiple sequencing runs per biological subject, the term **sample** has historically been reserved for sequencing of each fraction. In practice, this means you will have many **samples** for each single biological subject.
@@ -43,7 +43,7 @@ To standardize the qSIP workflow, column names should adhere as closely to MISIP

In traditional qSIP the `isotope` field has been populated with either the light (e.g. 16O) or heavy (e.g. 18O) isotope depending on the substrate used in that rep or `source_mat_id`. In MISIP standards, only the heavy isotope is listed under the `isotope` field, and then a secondary field `isotopolog_label` is used to designate whether the replicate used a substrate with "natural abundance" (i.e. "light") or "isotopically labeled" (i.e. "heavy") isotopes.

In the `qSIP2` package, either method can be used. If the `isotopolog_label` is missing from your dataset then it will assume both the light and heavy isotopes are present in the `isotope` field. But, if you do have an `isotopolog_label` field, then only the heavy isotope designation is allowed in the `isotope` field and the dataframe will not pass validation checks.
In the `qSIP2` package, either method can be used. If the `isotopolog_label` field is missing from your dataset, the package assumes that both the light and heavy isotopes are present in the `isotope` field. If you do have an `isotopolog_label` field, however, only the heavy isotope designation is allowed in the `isotope` field, and the dataframe will not pass validation checks if light isotopes are listed there.

Conversion between these two formats can be done with the `add_isotopolog_label()` or `remove_isotopolog_label()` functions.

@@ -55,7 +55,8 @@ sample_data_nonMISIP %>%
count(isotope)
# new data has only one isotope and a mixture of isotopolog_label
df_with_labels = add_isotopolog_label(sample_data_nonMISIP, isotope = "isotope")
df_with_labels = add_isotopolog_label(sample_data_nonMISIP,
isotope = "isotope")
df_with_labels %>%
count(isotope, isotopolog_label)
@@ -65,6 +66,26 @@ remove_isotopolog_label(df_with_labels) %>%
count(isotope)
```

## Fraction relative amounts

A requirement for qSIP is the `gradient_pos_rel_amt` field, which gives the proportion of the whole replicate that is found in a given fraction. The preferred basis is qPCR copy numbers, but DNA concentrations can be used as well.

For example, if there are 100,000 total 16S copies in a replicate as determined by qPCR, and 15,000 copies in fraction 7, then the `gradient_pos_rel_amt` value for fraction 7 would be 0.15 (15,000 / 100,000). Similarly, if you used 25 ng of total DNA for density separation, and 3.75 ng of DNA was recovered from fraction 7, then `gradient_pos_rel_amt` would also be 0.15 (3.75 / 25).
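
The arithmetic above can be checked directly in R. This is only a sketch: the copy numbers for the fractions other than fraction 7 are made up so that the replicate total comes to 100,000, and the vector name `copies` is hypothetical.

```r
# Hypothetical qPCR copies per fraction for one replicate (illustrative values
# chosen so that the total is 100,000 and fraction 7 holds 15,000 copies)
copies = c(f5 = 20000, f6 = 30000, f7 = 15000, f8 = 35000)

# relative amount = copies in the fraction / total copies in the replicate
rel_amt = copies / sum(copies)

rel_amt[["f7"]]  # relative amount of fraction 7
sum(rel_amt)     # relative amounts of the complete replicate
```

The same division works for DNA concentrations, since only the ratio of each fraction to the replicate total matters.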

Ideally, the `gradient_pos_rel_amt` values for a given `source_mat_id` should add up to 1, but there are situations where the sum might be less than 1, for example if you removed some fractions because they didn't sequence well or for some other reason. If you have 16S copies or DNA concentrations for these removed fractions, their share is subtracted from the total. So, for our example above, if fraction 7 needed to be removed, then the total for the remaining fractions of that `source_mat_id` would only be 0.85. The total across the fractions within a `source_mat_id` should never be greater than 1.
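
To illustrate why the sum can drop below 1, here is a small base R sketch with made-up values (the `fractions` data frame and its column names `gradient_position` and `dna_ng` are hypothetical). The relative amounts are computed on the complete replicate first, and fraction 7 is dropped afterwards.

```r
# Hypothetical DNA amounts (ng) for the four fractions of one replicate;
# the values are illustrative and sum to 25 ng
fractions = data.frame(
  gradient_position = 5:8,
  dna_ng = c(5, 7.5, 3.75, 8.75)
)

# relative amounts based on the complete replicate
fractions$gradient_pos_rel_amt = fractions$dna_ng / sum(fractions$dna_ng)

# drop fraction 7 after the relative amounts have been calculated
kept = fractions[fractions$gradient_position != 7, ]

# the remaining relative amounts no longer account for the whole replicate
sum(kept$gradient_pos_rel_amt)
```

Had fraction 7 been removed before the division, the remaining values would have been rescaled to sum to 1, and the information that part of the replicate is missing would be lost.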

If your sample data tibble does not have a `gradient_pos_rel_amt` column, there is a function that can add it for you.

The `add_gradient_pos_rel_amt()` function can create this column from a column of either qPCR copies or DNA concentrations. Because this function cannot know whether any fractions are missing, the values per `source_mat_id` will always sum to 1. If you have fractions that you want removed, keep them in the dataframe when calling `add_gradient_pos_rel_amt()` and remove them afterwards.
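
A minimal sketch of that workflow follows, assuming a version of `qSIP2` that includes this function is installed. The `toy` data frame and its `gradient_position` and `qpcr_copies` columns are hypothetical and exist only for illustration; `source_mat_id` is the function's default grouping column.

```r
library(dplyr)
library(qSIP2)

# Hypothetical sample data: two replicates with four fractions each
toy = data.frame(
  source_mat_id = rep(c("rep_A", "rep_B"), each = 4),
  gradient_position = rep(5:8, times = 2),
  qpcr_copies = c(20000, 30000, 15000, 35000,
                  10000, 40000, 25000, 25000)
)

# add the column while all fractions are still present
with_rel = add_gradient_pos_rel_amt(toy, amt = "qpcr_copies")

# at this point each replicate accounts for its whole total
with_rel %>%
  summarise(total = sum(gradient_pos_rel_amt), .by = source_mat_id)

# only now remove the unwanted fraction (fraction 7 of rep_A)
filtered = with_rel %>%
  filter(!(source_mat_id == "rep_A" & gradient_position == 7))

filtered %>%
  summarise(total = sum(gradient_pos_rel_amt), .by = source_mat_id)
```

Calling the function a second time on `with_rel` stops with an error unless `overwrite = TRUE` is set, because the `gradient_pos_rel_amt` column already exists.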


# Make a qSIP sample data object