1. Split Data

In this module, we split the labeled training data into training and testing subsets.

The split is performed in split_data.ipynb. The testing dataset is a random sample of 15% of the single-cell dataset, stratified by phenotypic class. The training dataset is everything that remains after the testing samples are removed. We store the sample indexes for the training and testing subsets in indexes/ and later use them to load the corresponding subsets from the labeled data in 0.download_data/data/.
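To illustrate the stratified sampling described above, here is a minimal sketch that uses only the Python standard library. It is not the code in split_data.ipynb, which may use a different approach or library. The function name stratified_split, its parameters, the seed, and the class labels are all hypothetical and exist only for this illustration.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.15, seed=0):
    """Return (train_indexes, test_indexes), sampling each class separately.

    labels: one class label per sample (hypothetical input for this sketch).
    """
    rng = random.Random(seed)

    # Group sample indexes by their class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    # Draw the same fraction from every class so class proportions are kept.
    test_indexes = []
    for label in sorted(by_class):
        members = by_class[label]
        n_test = round(len(members) * test_fraction)
        test_indexes.extend(rng.sample(members, n_test))

    # Everything not sampled for testing becomes training data.
    test_set = set(test_indexes)
    train_indexes = [i for i in range(len(labels)) if i not in test_set]
    return train_indexes, sorted(test_indexes)


# Made-up labels: 100 samples of one class and 40 of another.
labels = ["class_a"] * 100 + ["class_b"] * 40
train_indexes, test_indexes = stratified_split(labels)
```

Because only indexes are returned, the same two lists can be saved once and reused later to select the matching rows from the labeled data, which mirrors the role of the files in indexes/.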

Step 1: Split Data

Use the commands below to create indexes for training and testing data subsets:

# Make sure you are located in 1.split_data
cd 1.split_data

# Activate phenotypic_profiling conda environment
conda activate phenotypic_profiling

# Split data
bash split_data.sh
