Commit

Add line breaks (#224)
Greatly reduced the number of lines longer than 80 characters in all R files except match_taxa.R, which we will likely refactor, as it is the one file with many long lines within the code itself.

Closes #188
ehwenk authored May 3, 2024
1 parent 10e909b commit 1bf0761
Showing 10 changed files with 407 additions and 191 deletions.
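The commit's goal of keeping lines within 80 characters can be checked mechanically. The sketch below is a hypothetical base-R helper (`find_long_lines` is not part of APCalign, and the `R` directory default is an assumption about the package layout) that lists every line in a directory's R files wider than a given limit. Dedicated tools such as a linter can do the same; this version just avoids extra dependencies.

```r
# Hypothetical helper, not part of APCalign: report every line in the R files
# of `dir` whose display width exceeds `max_width` characters.
find_long_lines <- function(dir = "R", max_width = 80) {
  files <- list.files(dir, pattern = "\\.[Rr]$", full.names = TRUE)
  # One data frame of offending lines per file
  hits <- lapply(files, function(f) {
    lines <- readLines(f, warn = FALSE)
    too_long <- which(nchar(lines, type = "width") > max_width)
    data.frame(
      file = rep(basename(f), length(too_long)),
      line = too_long,
      width = nchar(lines[too_long], type = "width"),
      stringsAsFactors = FALSE
    )
  })
  # Start from an empty frame so the result has the right columns
  # even when no file (or no line) matches
  empty <- data.frame(file = character(), line = integer(),
                      width = integer(), stringsAsFactors = FALSE)
  do.call(rbind, c(list(empty), hits))
}
```

Run from the package root, `find_long_lines("R")` returns one row per line over the limit, so a file such as match_taxa.R would stand out by its row count.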
3 changes: 2 additions & 1 deletion R/APCalign-package.R
@@ -10,7 +10,8 @@
#' @name APCalign
#' @docType package
#' @references If you have any questions, comments or suggestions, please
#' submit an issue at our [GitHub repository](https://github.com/traitecoevo/APCalign/issues)
#' submit an issue at our
#' [GitHub repository](https://github.com/traitecoevo/APCalign/issues)
#' @keywords internal
#' @section Functions:
#' **Standardise taxon names**
197 changes: 137 additions & 60 deletions R/align_taxa.R

Large diffs are not rendered by default.

10 changes: 7 additions & 3 deletions R/create_species_state_origin_matrix.R
@@ -1,13 +1,17 @@
#' Use the taxon distribution data from the APC to determine state level native and introduced origin status
#' Use the taxon distribution data from the APC to determine state level
#' native and introduced origin status
#'
#' This function processes the geographic data available in the APC and
#' returns state level native, introduced and more complicated origins status for all taxa.
#'
#'
#' @family diversity methods
#' @param resources the taxonomic resources required to make the summary statistics. Loading this can be slow, so call load_taxonomic_resources separately to greatly speed this function up and pass the resources in.
#' @param resources the taxonomic resources required to make the summary statistics.
#' Loading this can be slow, so call load_taxonomic_resources separately to greatly
#' speed this function up and pass the resources in.
#'
#' @return A tibble with columns representing each state and rows representing each species. The values in each cell represent the origin of the species in that state.
#' @return A tibble with columns representing each state and rows representing each
#' species. The values in each cell represent the origin of the species in that state.
#'
#'
#' @export
124 changes: 86 additions & 38 deletions R/create_taxonomic_update_lookup.R
@@ -1,49 +1,93 @@
#' Create a lookup table with the best-possible scientific name match for a list of Australian plant names
#' Create a lookup table with the best-possible scientific name match for a
#' list of Australian plant names
#'
#' This function takes a list of Australian plant names that need to be reconciled with current taxonomy and
#' generates a lookup table of the best-possible scientific name match for each input name.
#' It uses first the function `align_taxa`, then the function `update_taxonomy` to achieve the output.
#' This function takes a list of Australian plant names that need to be
#' reconciled with current taxonomy and
#' generates a lookup table of the best-possible scientific name match for
#' each input name.
#' It uses first the function `align_taxa`, then the function `update_taxonomy`
#' to achieve the output.
#'
#' @family taxonomic alignment functions
#'
#' @param taxa A list of Australian plant species that needs to be reconciled with current taxonomy.
#' @param stable_or_current_data either "stable" for a consistent version, or "current" for the leading edge version.
#' @param taxa A list of Australian plant species that needs to be reconciled
#' with current taxonomy.
#' @param stable_or_current_data either "stable" for a consistent version,
#' or "current" for the leading edge version.
#' @param version The version number of the dataset to use.
#' @param taxonomic_splits How to handle one_to_many taxonomic matches. Default is "return_all". The other options are "collapse_to_higher_taxon" and "most_likely_species". most_likely_species defaults to the original_name if that name is accepted by the APC; this will be right for certain species subsets, but make errors in other cases, use with caution.
#' @param full logical for whether the full lookup table is returned or just key columns
#' @param fuzzy_abs_dist The number of characters allowed to be different for a fuzzy match.
#' @param fuzzy_rel_dist The proportion of characters allowed to be different for a fuzzy match.
#' @param fuzzy_matches Fuzzy matches are turned on as a default. The relative and absolute distances allowed for fuzzy matches to species and infraspecific taxon names are defined by the parameters `fuzzy_abs_dist` and `fuzzy_rel_dist`
#' @param resources These are the taxonomic resources used for cleaning, this will default to loading them from a local place on your computer. If this is to be called repeatedly, it's much faster to load the resources using \code{\link{load_taxonomic_resources}} separately and pass the data in.
#' @param APNI_matches Name matches to the APNI (Australian Plant Names Index) are turned off as a default.
#' @param imprecise_fuzzy_matches Imprecise fuzzy matches uses the fuzzy matching function
#' with lenient levels set (absolute distance of 5 characters; relative distance = 0.25).
#' It offers a way to get a wider range of possible names, possibly corresponding to very distant spelling mistakes.
#' This is FALSE as default and all outputs should be checked as it often makes erroneous matches.
#' @param identifier A dataset, location or other identifier, which defaults to NA.
#' @param quiet Logical to indicate whether to display messages while aligning taxa.
#' @param output file path to save the output. If this file already exists, this function will check if it's a subset of the species passed in and try to add to this file. This can be useful for large and growing projects.
#' @return A lookup table containing the accepted and suggested names for each original name input, and additional taxonomic information such as taxon rank, taxonomic status, taxon IDs and genera.
#' @param taxonomic_splits How to handle one_to_many taxonomic matches.
#' Default is "return_all". The other options are "collapse_to_higher_taxon"
#' and "most_likely_species". most_likely_species defaults to the original_name
#' if that name is accepted by the APC; this will be right for certain species
#' subsets, but make errors in other cases, use with caution.
#' @param full logical for whether the full lookup table is returned or
#' just key columns
#' @param fuzzy_abs_dist The number of characters allowed to be different for
#' a fuzzy match.
#' @param fuzzy_rel_dist The proportion of characters allowed to be different
#' for a fuzzy match.
#' @param fuzzy_matches Fuzzy matches are turned on as a default. The relative
#' and absolute distances allowed for fuzzy matches to species and
#' infraspecific taxon names are defined by the parameters `fuzzy_abs_dist`
#' and `fuzzy_rel_dist`.
#' @param resources These are the taxonomic resources used for cleaning, this
#' will default to loading them from a local place on your computer. If this is
#' to be called repeatedly, it's much faster to load the resources using
#' \code{\link{load_taxonomic_resources}} separately and pass the data in.
#' @param APNI_matches Name matches to the APNI (Australian Plant Names Index)
#' are turned off as a default.
#' @param imprecise_fuzzy_matches Imprecise fuzzy matches uses the fuzzy
#' matching function with lenient levels set (absolute distance of
#' 5 characters; relative distance = 0.25).
#' It offers a way to get a wider range of possible names, possibly
#' corresponding to very distant spelling mistakes.
#' This is FALSE as default and all outputs should be checked as it often
#' makes erroneous matches.
#' @param identifier A dataset, location or other identifier,
#' which defaults to NA.
#' @param quiet Logical to indicate whether to display messages while
#' aligning taxa.
#' @param output file path to save the output. If this file already exists,
#' this function will check if it's a subset of the species passed in and try
#' to add to this file. This can be useful for large and growing projects.
#' @return A lookup table containing the accepted and suggested names for each
#' original name input, and additional taxonomic information such as taxon
#' rank, taxonomic status, taxon IDs and genera.
#' - original_name: the original plant name.
#' - aligned_name: the input plant name that has been aligned to a taxon name in the APC or APNI by the align_taxa function.
#' - aligned_name: the input plant name that has been aligned to a taxon name in
#' the APC or APNI by the align_taxa function.
#' - accepted_name: the APC-accepted plant name, when available.
#' - suggested_name: the suggested plant name to use. Identical to the accepted_name, when an accepted_name exists; otherwise the suggested_name is the aligned_name.
#' - genus: the genus of the accepted (or suggested) name; only APC-accepted genus names are filled in.
#' - family: the family of the accepted (or suggested) name; only APC-accepted family names are filled in.
#' - suggested_name: the suggested plant name to use. Identical to the
#' accepted_name, when an accepted_name exists;
#' otherwise the suggested_name is the aligned_name.
#' - genus: the genus of the accepted (or suggested) name;
#' only APC-accepted genus names are filled in.
#' - family: the family of the accepted (or suggested) name;
#' only APC-accepted family names are filled in.
#' - taxon_rank: the taxonomic rank of the suggested (and accepted) name.
#' - taxonomic_dataset: the source of the suggested (and accepted) names (APC or APNI).
#' - taxonomic_dataset: the source of the suggested (and accepted) names
#' (APC or APNI).
#' - taxonomic_status: the taxonomic status of the suggested (and accepted) name.
#' - taxonomic_status_aligned: the taxonomic status of the aligned name, before any taxonomic updates have been applied.
#' - aligned_reason: the explanation of a specific taxon name alignment (from an original name to an aligned name).
#' - update_reason: the explanation of a specific taxon name update (from an aligned name to an accepted or suggested name).
#' - taxonomic_status_aligned: the taxonomic status of the aligned name,
#' before any taxonomic updates have been applied.
#' - aligned_reason: the explanation of a specific taxon name alignment
#' (from an original name to an aligned name).
#' - update_reason: the explanation of a specific taxon name update
#' (from an aligned name to an accepted or suggested name).
#' - subclass: the subclass of the accepted name.
#' - taxon_distribution: the distribution of the accepted name; only filled in if an APC accepted_name is available.
#' - scientific_name_authorship: the authorship information for the accepted (or synonymous) name; available for both APC and APNI names.
#' - taxon_ID: the unique taxon concept identifier for the accepted_name; only filled in if an APC accepted_name is available.
#' - taxon_ID_genus: an identifier for the genus; only filled in if an APC-accepted genus name is available.
#' - scientific_name_ID: an identifier for the nomenclatural (not taxonomic) details of a scientific name; available for both APC and APNI names.
#' - taxon_distribution: the distribution of the accepted name;
#' only filled in if an APC accepted_name is available.
#' - scientific_name_authorship: the authorship information for the accepted
#' (or synonymous) name; available for both APC and APNI names.
#' - taxon_ID: the unique taxon concept identifier for the accepted_name;
#' only filled in if an APC accepted_name is available.
#' - taxon_ID_genus: an identifier for the genus;
#' only filled in if an APC-accepted genus name is available.
#' - scientific_name_ID: an identifier for the nomenclatural (not taxonomic)
#' details of a scientific name; available for both APC and APNI names.
#' - row_number: the row number of a specific original_name in the input.
#' - number_of_collapsed_taxa: when taxonomic_splits == "collapse_to_higher_taxon", the number of possible taxon names that have been collapsed.
#' - number_of_collapsed_taxa: when taxonomic_splits == "collapse_to_higher_taxon",
#' the number of possible taxon names that have been collapsed.
#'
#' @export
#'
@@ -96,8 +140,11 @@ create_taxonomic_update_lookup <- function(taxa,
updated_data %>%
dplyr::select(
dplyr::any_of(c(
"original_name", "aligned_name", "accepted_name", "suggested_name", "genus", "taxon_rank", "taxonomic_dataset", "taxonomic_status", "scientific_name", "aligned_reason", "update_reason",
"alternative_possible_names", "possible_names_collapsed", "number_of_collapsed_taxa"
"original_name", "aligned_name", "accepted_name", "suggested_name",
"genus", "taxon_rank", "taxonomic_dataset", "taxonomic_status",
"scientific_name", "aligned_reason", "update_reason",
"alternative_possible_names", "possible_names_collapsed",
"number_of_collapsed_taxa"
))
)
}
@@ -117,7 +164,8 @@ validate_taxonomic_splits_input <- function(taxonomic_splits) {
paste(
"Invalid input:",
taxonomic_splits,
". Valid inputs are 'return_all', 'collapse_to_higher_taxon', or 'most_likely_species'."
". Valid inputs are 'return_all', 'collapse_to_higher_taxon', or",
"'most_likely_species'."
)
)
}
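When a long message string is wrapped to fit within 80 characters, where the break falls matters in R: breaking inside a single string literal keeps the newline and the following indentation as part of the string, whereas splitting the text into separate `paste()` arguments lets `paste()` join the pieces with a single space. The sketch below illustrates both forms using the message text from the hunk above; the value `"bad_value"` is a made-up example input.

```r
# Example input; "bad_value" is a hypothetical invalid option
taxonomic_splits <- "bad_value"

# Breaking inside one string literal: the line break and the leading
# spaces on the next line become part of the message
msg_literal <- paste(
  "Invalid input:",
  taxonomic_splits,
  ". Valid inputs are 'return_all', 'collapse_to_higher_taxon', or
    'most_likely_species'."
)

# Splitting into separate arguments: paste() joins them with a single space
msg_pieces <- paste(
  "Invalid input:",
  taxonomic_splits,
  ". Valid inputs are 'return_all', 'collapse_to_higher_taxon', or",
  "'most_likely_species'."
)

grepl("\n", msg_literal, fixed = TRUE) # TRUE
grepl("\n", msg_pieces, fixed = TRUE)  # FALSE
```

The second form keeps the source within the line limit while leaving the message users see on a single line.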