Mutant library design

This subdirectory contains scripts and notebooks that generate primer pools used for Delta variant spike deep mutational scanning experiments. The design is run by the Snakefile, which writes results to ./results/. The code generates four separate primer pools, corresponding to the four primer-design notebooks listed under Notebooks.

The final primer sequences, formatted as ordering sheets for oPools on IDTdna, are the .csv files in ./results/ whose names end with oPool.

The file results/aggregated_mutations.csv indicates all mutations that are designed in each category.

Input data

  • GISAID_data/spikeprot0724.fasta contains an alignment of all spike proteins as downloaded from the Download tab of the EpiCov section of GISAID on July-26-2021. Note that the download yields a zipped .tar file; this file was then un-tarred and unzipped. Due to GISAID data sharing terms, this file is not actually included in the repo.

  • ./reference_sequences subdirectory contains SARS-CoV-2 spike reference sequences and lookup tables required to renumber positions between variants.

Scripts

  • scripts subdirectory contains scripts for generating data. Scripts work as follows:
    • spike_positive_selection_sites.py script uses SARS-CoV-2 spike protein selection data (as described in this paper) to filter for positions in spike that are undergoing positive selection.
    • filter_and_align_gisaid.py uses the GISAID_data/spikeprot0724.fasta sequences to align all SARS-CoV-2 spike sequences deposited in GISAID as of July-26-2021.
    • spike_alignment_counts.py extracts all mutations in the GISAID spike alignments relative to the Wuhan-1 sequence.
    • spike_mutcounts.py counts the number of independently reoccurring mutations on the SARS-CoV-2 phylogenetic tree available from UShER.
    • 2021Jan_create_primers.py and create_primers_del.py are scripts that create random or specific amino acid change primers, respectively.
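
As an illustration of the mutation-extraction step, here is a minimal sketch of how substitutions relative to a reference can be read off a pairwise-aligned protein sequence and tallied across an alignment. It is not the code in spike_alignment_counts.py: the function names, the handling of gaps, and the choice to ignore ambiguous `X` residues are all assumptions made for this example.

```python
from collections import Counter


def mutations_relative_to_reference(reference, aligned):
    """Return mutations such as 'D614G' between two aligned protein sequences.

    Hypothetical helper: positions are numbered along the ungapped reference,
    '-' in the aligned sequence is reported as a deletion, and insertions
    relative to the reference are skipped.
    """
    if len(reference) != len(aligned):
        raise ValueError("sequences must be aligned to the same length")
    mutations = []
    ref_pos = 0  # 1-based residue number in the ungapped reference
    for ref_aa, seq_aa in zip(reference, aligned):
        if ref_aa == "-":
            # Insertion relative to the reference: no reference number exists.
            continue
        ref_pos += 1
        # Assumption: 'X' marks an ambiguous residue and is not a mutation.
        if seq_aa != ref_aa and seq_aa != "X":
            mutations.append(f"{ref_aa}{ref_pos}{seq_aa}")
    return mutations


def count_mutations(reference, aligned_sequences):
    """Count how many aligned sequences carry each mutation."""
    counts = Counter()
    for seq in aligned_sequences:
        counts.update(mutations_relative_to_reference(reference, seq))
    return counts
```

For example, `count_mutations("MFVFL", ["MFAFL", "MFAFL", "MFVFL"])` tallies the substitution `V3A` twice. A real pipeline would also need to parse the FASTA file and map positions between variants using the lookup tables in ./reference_sequences.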

Notebooks

  • ./notebooks subdirectory contains notebooks used to generate primer pools found in ./results/primers.
    • gisaid_variant_primers.py.ipynb notebook generates specific amino acid primers for each mutation present in GISAID data.
    • usher_primers.py.ipynb notebook generates specific amino acid primers for each independently reoccurring mutation on the SARS-CoV-2 phylogenetic tree.
    • positive_selection_primers.py.ipynb notebook generates NNG/NNC primer pools for each position on spike that is undergoing positive selection.
    • paired_positive_selection_primers.py.ipynb notebook generates pools of NNG/NNC primers that introduce paired mutations for closely located sites that are undergoing positive selection.
    • oPool_primer_sheets.py.ipynb takes primer pools generated by the notebooks above and formats spreadsheets in accordance with the IDTdna oPool order input format.
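
To make the NNG/NNC design concrete, the following is a rough sketch of how a pair of degenerate primers could be built for one codon. It is not taken from the notebooks or from 2021Jan_create_primers.py: the function name, the fixed flank length, and the forward-strand-only output are assumptions for illustration, and real primer design would typically also tune flank length for melting temperature and produce reverse-complement primers.

```python
def nng_nnc_primers(coding_seq, site, flank=15):
    """Return two forward primers that randomize one codon as NNG and NNC.

    Hypothetical helper. `site` is the 1-based codon number in `coding_seq`,
    and `flank` is the number of unchanged nucleotides kept on each side.
    """
    start = 3 * (site - 1)  # 0-based index of the codon's first nucleotide
    end = start + 3
    if start - flank < 0 or end + flank > len(coding_seq):
        raise ValueError("not enough flanking sequence around this codon")
    upstream = coding_seq[start - flank:start]
    downstream = coding_seq[end:end + flank]
    # N stands for an equimolar mix of A, C, G and T at that position.
    return [upstream + codon + downstream for codon in ("NNG", "NNC")]
```

With a toy sequence, `nng_nnc_primers("ATGGCCAAGTTT", site=2, flank=3)` gives `["ATGNNGAAG", "ATGNNCAAG"]`. Restricting the third codon position to G or C still covers all 20 amino acids while excluding two of the three stop codons.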

Lab notebook

  • Bernadeta's lab notebook that includes all experiments done on this project can be found here.