bigsnpr

{bigsnpr} is an R package for the analysis of massive SNP arrays, primarily designed for human genetics. It enhances the features of package {bigstatsr} for the purpose of analyzing genotype data.

Quick demo

LIST OF FEATURES

Note that most of the algorithms of this package don't handle missing values. You can use snp_fastImpute() (taking a few hours for a chip of 15K x 300K) and snp_fastImputeSimple() (taking a few minutes only) to impute missing values of genotyped variants.
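As a minimal sketch of the faster option, the following imputes the small example dataset with missing values that ships with the package. It assumes the default imputation method is acceptable; the object names (`obj`, `G_imp`, and so on) are illustrative only.

```r
library(bigsnpr)

# Example data shipped with the package; it contains missing genotypes
obj <- snp_attachExtdata("example-missing.bed")
G <- obj$genotypes
n_missing_before <- sum(is.na(G[]))

# Simple per-variant imputation with the default method
G_imp <- snp_fastImputeSimple(G)
n_missing_after <- sum(is.na(G_imp[]))

c(before = n_missing_before, after = n_missing_after)
```

The returned object uses the same backing file as `G` with a different decoding code, so reading through `G` still reports the imputed entries as missing; use the returned object in downstream analyses.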

New! Package {bigsnpr} now provides functions that directly work on bed files with a few missing values. See new paper "Efficient toolkit implementing..".
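For illustration, here is a short sketch of working on a bed file directly, without imputing or converting it first. It uses the example file with missing values shipped with the package and computes allele frequencies with `bed_MAF()`; the variable names are placeholders.

```r
library(bigsnpr)

# Path to an example bed file (with missing values) shipped with the package
bedfile <- system.file("extdata", "example-missing.bed", package = "bigsnpr")

# Map the bed file; nothing is loaded into memory at this point
obj.bed <- bed(bedfile)
dim(obj.bed)

# Per-variant allele counts and frequencies, computed from the bed file itself
maf <- bed_MAF(obj.bed)
head(maf)
```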

Installation

In R, run

# install.packages("remotes")
remotes::install_github("privefl/bigsnpr")

or for the CRAN version

install.packages("bigsnpr")

Input formats

This package reads bed/bim/fam files (PLINK preferred format) using functions snp_readBed() and snp_readBed2(). Before reading into this package's special format, quality control and conversion can be done using PLINK, which can be called directly from R using snp_plinkQC() and snp_plinkKINGQC().
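A minimal sketch of the reading step, using the example bed file shipped with the package. The backing files are written to a temporary location here; in a real analysis you would likely pick a permanent path instead.

```r
library(bigsnpr)

# Example bed file shipped with the package (the bim/fam files sit next to it)
bedfile <- system.file("extdata", "example.bed", package = "bigsnpr")

# Convert to the package's format; the path of the created .rds file is returned
rdsfile <- snp_readBed(bedfile, backingfile = tempfile())

# Attach the data in the current R session
obj <- snp_attach(rdsfile)
G <- obj$genotypes
dim(G)
```

The conversion only needs to be done once; later sessions can call `snp_attach()` on the saved .rds file directly.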

This package can also read UK Biobank BGEN files using function snp_readBGEN(). This function takes around 40 minutes to read 1M variants for 400K individuals using 15 cores.

This package uses a class called bigSNP for representing SNP data. A bigSNP object is a list with some elements:

genotypes: An FBM.code256. Rows are samples and columns are SNPs. This stores genotype calls or dosages (rounded to 2 decimal places).
fam: A data.frame with some information on the individuals.
map: A data.frame with some information on the SNPs.
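To make the structure concrete, this sketch attaches the example dataset shipped with the package and checks how the three elements relate to each other. The object names are illustrative.

```r
library(bigsnpr)

# Attach the example dataset shipped with the package
obj <- snp_attachExtdata()
names(obj)

G <- obj$genotypes
class(G)

# fam has one row per row of the genotype matrix,
# map has one row per column of the genotype matrix
c(nrow(obj$fam), nrow(G))
c(nrow(obj$map), ncol(G))

head(obj$fam)
head(obj$map)
```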

Polygenic scores

Polygenic scores are one of the main focuses of this package. There are 3 main methods currently available:

Penalized regressions with individual-level data (see paper and tutorial).
Clumping and Thresholding (C+T) and Stacked C+T (SCT) with summary statistics and individual-level data (see paper and tutorial).
LDpred2 with summary statistics (see preprint and tutorial).
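As a rough illustration of the basic C+T idea (not the tutorial's workflow, and not SCT), the sketch below runs a GWAS on a simulated phenotype, clumps variants by that GWAS signal, keeps those passing a p-value threshold, and sums the weighted genotypes. The phenotype, the thresholds (r2 of 0.2, p < 0.01), and the use of the same individuals for every step are assumptions made to keep the example short; a real analysis would take summary statistics from an independent sample and tune the thresholds in a validation set.

```r
library(bigsnpr)

obj <- snp_attachExtdata()
G <- obj$genotypes

# Simulated phenotype, for illustration only
set.seed(1)
y <- rnorm(nrow(G))

# Marginal effect of each variant (stands in for external summary statistics)
gwas <- big_univLinReg(G, y)
lpval <- -predict(gwas)  # -log10(p-values)

# Clumping: keep the strongest variant within each group of correlated variants
ind.keep <- snp_clumping(G, infos.chr = obj$map$chromosome,
                         S = abs(gwas$score), thr.r2 = 0.2)

# Thresholding: among the clumped variants, keep those with p < 0.01
ind.sel <- ind.keep[which(lpval[ind.keep] > 2)]

# Polygenic score: weighted sum of genotypes over the selected variants
prs <- big_prodVec(G, gwas$estim[ind.sel], ind.col = ind.sel)
```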

Possible upcoming features

Multiple imputation for GWAS (https://doi.org/10.1371/journal.pgen.1006091).
More interactive (visual) QC.

You can request a feature by opening an issue.

Bug report

How to make a great R reproducible example?

Please open an issue if you find a bug.

If you want help using {bigstatsr}, please open an issue on {bigstatsr}'s repo or post on Stack Overflow with the tag bigstatsr.

I will always redirect you to GitHub issues if you email me, so that others can benefit from our discussion.

References

Privé, Florian, et al. "Efficient analysis of large-scale genome-wide data with two R packages: bigstatsr and bigsnpr." Bioinformatics 34.16 (2018): 2781-2787.
Privé, Florian, et al. "Efficient implementation of penalized regression for genetic risk prediction." Genetics 212.1 (2019): 65-74.
Privé, Florian, et al. "Making the most of Clumping and Thresholding for polygenic scores." The American Journal of Human Genetics 105.6 (2019): 1213-1221.
Privé, Florian, et al. "Efficient toolkit implementing best practices for principal component analysis of population genetic data." Bioinformatics (2020).
Privé, Florian, et al. "LDpred2: better, faster, stronger." BioRxiv (2020).
