Project in 3-state protein structure prediction with STRIDE

This repository contains following scripts:

- Sequence-only prediction workflow

all_scripts/python/Sequence/

Extract sequence and structure infromation from 3-line *.fasta files

Encode sequence information with either BLOSUM or OneHotEncoding

Train models, view cross-validation performance reports and confusion plots

Read 2-line *.fasta files and output predictions based on a selection of trained models

- PSSM - based prediction workflow

all_scripts/python/PSSM/

Split 3-line *.fasta files in preparation of PSI-BLAST processing

Bash script for PSI-BLAST query of the split files

Extract and scale Substitution or Frequency matrices from *.PSSM files and save them to a dictionary

Train and save final optimized models on PSSM data

Train on adjustible proportion of total dataset over a range of window sizes, print cross-validation performance reports and show confusion >plots

Test saved models on adjustible proportion of our own testing dataset, print performance report and show confusion plot

- Misc processing

all_scripts/python/50_proteins/

Run longest common substring on datasets in *.fasta format, print the IDs and save the list for exclusion

Stride parser that creates 8-state, 3-line *.fasta files from ALL chains in *.stride output files

Stride parser that creates (A-type) 3-state, 3-line *.fasta files from ONE chain in *.stride output files

- Working models

Linear SVC predictor:

complete_predictors/Sequence_only/LinearSVC_21/

Can be run locally from the folder on file named "testset.txt" :

PSSM-based predictor:

complete_predictors/PSSM_21/

Should be run from project folder runs on the pssm file in the same folder (specified by name), modify name of file and choose another model if needed

BLOSUM-based predictor:

complete_predictors/BLOSUM_21/BLOSUM_predict.py

Should be run from project folder, currently predicts on 50 first sequences from Stride testset, change if needed

- Saved optimized models to choose from:

models/

extract the rbfsvc_15 & rbfsvc_21 before running...

Name		Name	Last commit message	Last commit date
Latest commit History 171 Commits
all_scripts		all_scripts
complete_predictors		complete_predictors
datasets		datasets
diaries		diaries
models		models
project_report		project_report
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Project in 3-state protein structure prediction with STRIDE

This repository contains following scripts:

- Sequence-only prediction workflow

all_scripts/python/Sequence/

- PSSM - based prediction workflow

all_scripts/python/PSSM/

- Misc processing

all_scripts/python/50_proteins/

- Working models

Linear SVC predictor:

complete_predictors/Sequence_only/LinearSVC_21/

PSSM-based predictor:

complete_predictors/PSSM_21/

BLOSUM-based predictor:

complete_predictors/BLOSUM_21/BLOSUM_predict.py

- Saved optimized models to choose from:

models/

If neeeded, run one of the hefty /all_scripts/python/PSSM/classifiers/ for better models and predictions

About

Releases

Packages

Languages

Mropat/Project-MTLS

Folders and files

Latest commit

History

Repository files navigation

Project in 3-state protein structure prediction with STRIDE

This repository contains following scripts:

- Sequence-only prediction workflow

all_scripts/python/Sequence/

- PSSM - based prediction workflow

all_scripts/python/PSSM/

- Misc processing

all_scripts/python/50_proteins/

- Working models

Linear SVC predictor:

complete_predictors/Sequence_only/LinearSVC_21/

PSSM-based predictor:

complete_predictors/PSSM_21/

BLOSUM-based predictor:

complete_predictors/BLOSUM_21/BLOSUM_predict.py

- Saved optimized models to choose from:

models/

If neeeded, run one of the hefty /all_scripts/python/PSSM/classifiers/ for better models and predictions

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages