LDetect requires python3 to run, all other requirements should be installed by pip (below).
The following command will install LDetect in python's site-packages
directory
pip install ldetect
LDetect comes with a set of example scripts, located in the examples/
directory. Example data is also provided in examples/example_data
.
Genetic maps are available at: https://github.com/joepickrell/1000-genomes-genetic-maps
1000 Genomes Phase 1 used as example: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20110521/
The idea is to split the chromosomes up into partitions by virtue of having large genetic distances (for example, so that they can run in parallel)
- Genetic map
- Number of individuals in reference panel
python3 P00_00_partition_chromosome.py <input_genetic_map> <n_individuals_in_ref_panel> <output_file>
python3 P00_00_partition_chromosome.py example_data/chr2.interpolated_genetic_map.gz 379 example_data/cov_matrix/scripts/chr2_partitions
This example will output the partitions to chr2_partitions. The columns are:
[start position] [stop position]
If the output directory for the covariance matrix is:
example_data/cov_matrix/
then downstream scripts EXPECT partition files to be in the scripts/
subdirectory and to be named chr<chr_number>_partitions
Script used to calculate the Wen and Stephens shrinkage estimator of the covariance matrix. Reads a VCF file from stdin.
- Reference panel (via stdin)
- Genetic map file
- Ouput covariance file [gzipped]
- List of individuals to use in the calculation
- Effective population size
- Cutoff beneath which covariance is not reported
tabix -h <input_ref_panel> <chr_number>:<start>-<stop> | python3 P00_01_calc_covariance.py <input_genetic_map> <input_individuals_file> <effective_population_size> <cov_cutoff> <output_cov_matrix_partition>
tabix -h ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20110521/ALL.chr2.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz 2:39967768-40067768 | python3 P00_01_calc_covariance.py example_data/chr2.interpolated_genetic_map.gz example_data/eurinds.txt 11418 1e-7 example_data/cov_matrix/chr2/chr2.39967768.40067768.gz
This example will output the covariance matrix to chr2.39967768.40067768.gz
. The columns are:
[snpid 1] [snpid 2] [position 1] [position 2] [genetic position 1] [genetic position 2] [empirical covariance] [shrinkage covariance]
NOTE: The effective population size in the example is set to 11418
, which is appropriate for European populations, and the cutoff below which the covariance is not reported was set to 1e-7
in previous work.
If the output directory for the covariance matrix is:
example_data/cov_matrix/
then downstream scripts EXPECT covariance matrix files to be in subdirectories named chr<chr_number>/
and to be named chr<chr_number>.<start>.<stop>.gz
, where <start>
and <stop>
correspond to values from the partitions file
- Covariance matrix from previous step
python3 P01_matrix_to_vector_pipeline.py --dataset_path=<path_to_covariance_matrix_root> --name=<chromosome_name> --out_fname=<vector_fname>
python3 P01_matrix_to_vector_pipeline.py --dataset_path=example_data/cov_matrix/ --name=chr2 --out_fname=example_data/vector/vector-EUR-chr2-39967768-40067768.txt.gz
This example will output the covariance matrix to vector-EUR-chr2-39967768-40067768.txt.gz
. The columns are:
[position] [diagonal_sum]
- Vector from previous step
- Covariance matrix
- Mean number of SNPs between breakpoints
python3 P02_minima_pipeline.py --input_fname=<vector_fname> --chr_name=<chromosome_name> --dataset_path=<path_to_covariance_matrix_root> --n_snps_bw_bpoints=<N_SNPs> --out_fname=<pickle_fname>
This file (P02_minima_pipeline.py) can be tweaked to remove all but the low-pass filter with local search algorithm in order to reduce total runtime.
python3 P02_minima_pipeline.py --input_fname=example_data/vector/vector-EUR-chr2-39967768-40067768.txt.gz --chr_name=chr2 --dataset_path=example_data/cov_matrix/ --n_snps_bw_bpoints=50 --out_fname=example_data/minima/minima-EUR-chr2-50-39967768-40067768.pickle
.pickle file with minima and various metrics for using uniform breakpoints, uniform breakpoints with local search, breakpoints with low-pass filter, and breakpoints with low-pass filter and local search.
- Pickle file from previous step
- Covariance matrix
- Which subset of breakpoints to extract. Subset is one of
['fourier', 'fourier_ls', 'uniform', 'uniform_ls']
python3 P03_extract_bpoints.py --name=<chromosome_name> --dataset_path=<path_to_covariance_matrix_root> --subset=<subset> --input_pickle_fname=<pickle_fname> > <output_file>
python3 P03_extract_bpoints.py --name=chr2 --dataset_path=example_data/cov_matrix/ --subset=fourier_ls --input_pickle_fname=example_data/minima/minima-EUR-chr2-50-39967768-40067768.pickle > example_data/bed/EUR-chr2-50-39967768-40067768.bed
.bed file with columns:
[chromosome name] [region start] [region stop]