kipoiseq

Standard set of data-loaders for training and making predictions for DNA sequence-based models.

All dataloaders in kipoiseq.dataloaders decorated with @kipoi_dataloader (SeqIntervalDl and StringSeqIntervalDl) are compatible with Kipoi models and can be used directly when specifying a new model in model.yaml:

...
default_dataloader:
  defined_as: kipoiseq.dataloaders.SeqIntervalDl
  default_args:
    auto_resize_len: 1000 # override default args in SeqIntervalDl
    
dependencies:
  pip:
    - kipoiseq
...

Installation

pip install kipoiseq

Optional dependencies:

pip install cyvcf2 pyranges
conda install cyvcf2 pyranges

Getting started

from kipoiseq.dataloaders import SeqIntervalDl

dl = SeqIntervalDl.init_example()  # use the provided example files
# your own files
dl = SeqIntervalDl("intervals.bed", "genome.fa")

len(dl)  # length of the dataset

dl[0]  # get one instance; returns a dictionary:
# dict(inputs=<one-hot-encoded-array>,
#      targets=<additional columns in the bed file>,
#      metadata=dict(ranges=GenomicRanges(chr=..., start=..., end=...), ...))

data = dl.load_all()  # load the whole dataset (named `data` to avoid shadowing the built-in `all`)

# load batches of data
it = dl.batch_iter(32, num_workers=8)  # load batches of data in parallel using 8 workers
# returns a dictionary with all three keys: inputs, targets, metadata

it = dl.batch_train_iter(32, num_workers=8)
# returns a tuple: (inputs, targets), can be used directly with keras' `model.fit_generator`
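
The `inputs` entry above is described as a one-hot-encoded array. As a rough illustration of what that means, here is a minimal pure-Python sketch. It is not kipoiseq's encoder: the function name `one_hot_dna`, the channel order `ACGT`, and the choice to encode unknown bases such as `N` as all zeros are assumptions made for this example.

```python
# Illustrative sketch only: kipoiseq's own encoder may differ in
# alphabet order, dtype, and handling of ambiguous bases.
ALPHABET = "ACGT"  # assumed channel order


def one_hot_dna(seq, alphabet=ALPHABET):
    """Return a len(seq) x len(alphabet) nested list of 0/1 values.

    Bases outside the alphabet (for example 'N') become an all-zero row;
    this is an assumption of the sketch, not documented kipoiseq behavior.
    """
    index = {base: i for i, base in enumerate(alphabet)}
    encoded = []
    for base in seq.upper():
        row = [0] * len(alphabet)
        if base in index:
            row[index[base]] = 1
        encoded.append(row)
    return encoded


example = one_hot_dna("ACGTN")
for base, row in zip("ACGTN", example):
    print(base, row)
```

Each position in the sequence becomes one row with a single 1 in the column of its base, so a sequence of length 1000 yields a 1000 x 4 array.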

More info:

Follow the getting-started colab notebook.
See docs

How to write your own data-loaders

- Read the PyTorch Data Loading and Processing Tutorial to become more familiar with transforms and dataloaders.
- Read the code for SeqIntervalDl in kipoiseq/dataloaders/sequence.py.
  - You can skip the @kipoi_dataloader decorator and the long YAML docstring. These are only required if you want to use dataloaders in Kipoi's model.yaml files.
- Explore the available transforms (functional, class-based) or extractors (kipoiseq, genomelake).
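
To make the shape of a dataloader more concrete, the following is a minimal sketch of a map-style dataset that returns the same three keys (inputs, targets, metadata) shown in Getting started. It does not use kipoiseq or Kipoi base classes. The class `ToyIntervalDataset`, the helper `batch_iter`, and the in-memory `genome` dictionary standing in for a FASTA file are all hypothetical names introduced for this example.

```python
# Hypothetical toy dataset: not part of kipoiseq and not using its base classes.
class ToyIntervalDataset:
    def __init__(self, bed_lines, genome):
        # bed_lines: iterable of tab-separated "chrom, start, end, [extra columns]"
        # genome: dict mapping chromosome name -> sequence (stand-in for a FASTA file)
        self.intervals = []
        for line in bed_lines:
            if not line.strip():
                continue  # skip blank lines
            chrom, start, end, *extra = line.rstrip("\n").split("\t")
            self.intervals.append((chrom, int(start), int(end), extra))
        self.genome = genome

    def __len__(self):
        return len(self.intervals)

    def __getitem__(self, idx):
        chrom, start, end, extra = self.intervals[idx]
        return {
            # BED coordinates are 0-based and half-open, matching Python slicing.
            # The raw string is returned here; a transform could one-hot encode it.
            "inputs": self.genome[chrom][start:end],
            "targets": extra,  # additional BED columns, kept as strings
            "metadata": {"ranges": {"chr": chrom, "start": start, "end": end}},
        }


def batch_iter(dataset, batch_size):
    """Yield lists of consecutive examples, without parallel workers."""
    for i in range(0, len(dataset), batch_size):
        stop = min(i + batch_size, len(dataset))
        yield [dataset[j] for j in range(i, stop)]


genome = {"chr1": "ACGTACGTACGT"}
bed = ["chr1\t0\t4\t1", "chr1\t4\t8\t0", "chr1\t8\t12\t1"]
ds = ToyIntervalDataset(bed, genome)
batches = list(batch_iter(ds, 2))
print(len(ds), [len(b) for b in batches])
```

Implementing only `__len__` and `__getitem__` keeps the dataset independent of how it is batched, which is the same separation the PyTorch tutorial describes between datasets, transforms, and loaders.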
