Merge branch 'develop'
Julian de Ruiter committed May 11, 2017
2 parents ee11693 + aeb6935 commit 796c546
Showing 25 changed files with 304 additions and 114 deletions.
2 changes: 1 addition & 1 deletion .bumpversion.cfg
@@ -1,5 +1,5 @@
 [bumpversion]
-current_version = 0.3.1
+current_version = 0.3.2

 [bumpversion:file:setup.py]

6 changes: 3 additions & 3 deletions CONTRIBUTING.rst
@@ -15,7 +15,7 @@ Types of Contributions
 Report Bugs
 ~~~~~~~~~~~

-Report bugs at https://github.com/jrderuiter/imfusion/issues.
+Report bugs at https://github.com/nki-ccb/imfusion/issues.

 If you are reporting a bug, please include:

@@ -46,7 +46,7 @@ Submit Feedback
 ~~~~~~~~~~~~~~~

 The best way to send feedback is to file an issue at
-https://github.com/jrderuiter/imfusion/issues.
+https://github.com/nki-ccb/imfusion/issues.

 If you are proposing a feature:

@@ -105,5 +105,5 @@ Before you submit a pull request, check that it meets these guidelines:
    your new functionality into a function with a docstring, and add the
    feature to the list in README.rst.
 3. The pull request should work for Python 2.7, 3.4 and 3.5. Check
-   https://travis-ci.org/jrderuiter/imfusion/pull_requests
+   https://travis-ci.org/nki-ccb/imfusion/pull_requests
    and make sure that the tests pass for all supported Python versions.
7 changes: 7 additions & 0 deletions HISTORY.rst
@@ -2,6 +2,13 @@
 History
 =======

+0.3.2 (2017-05-11)
+------------------
+
+* Properly added star-fusion support to star aligner (was previously not
+  fully merged).
+* Changed documentation URLs to new repository.
+
 0.3.1 (2017-05-09)
 ------------------

2 changes: 1 addition & 1 deletion Makefile
@@ -82,7 +82,7 @@ dist: clean ## builds source and wheel package
 	python setup.py sdist bdist_wheel

 conda: clean-pyc ## build a conda release
-	conda build --python 3.5 -c bioconda -c r -c jrderuiter conda
+	conda build --python 3.5 -c bioconda conda

 conda-docker: clean-pyc
 	docker run -v `pwd`:/imfusion -t -i condaforge/linux-anvil /bin/sh -c 'cd /imfusion && ./scripts/conda_build_docker.sh'
14 changes: 5 additions & 9 deletions README.rst
@@ -1,9 +1,3 @@
-.. image:: https://img.shields.io/travis/jrderuiter/imfusion/develop.svg
-    :target: https://travis-ci.org/jrderuiter/imfusion
-
-.. image:: https://img.shields.io/coveralls/jrderuiter/imfusion/develop.svg
-    :target: https://coveralls.io/github/jrderuiter/imfusion
-
 IM-Fusion
 =========

@@ -42,12 +36,14 @@ Documentation
 =============

 IM-Fusion's documentation is available at
-`jrderuiter.github.io/imfusion <http://jrderuiter.github.io/imfusion/>`_.
+`nki-ccb.github.io/imfusion <http://nki-ccb.github.io/imfusion>`_.

 References
 ==========
-de Ruiter, JR. *et al.*, 2017. **"Identifying transposon insertions and
-their effects from RNA-sequencing data"** (*Under revision*).
+
+de Ruiter J.R., Kas S.M. *et al.* **"Identifying transposon insertions and their
+effects from RNA-sequencing data"** Nucleic Acids Research 2017, *in press*.
+

 License
 =======
4 changes: 2 additions & 2 deletions conda/meta.yaml
@@ -1,6 +1,6 @@
 package:
   name: imfusion
-  version: 0.3.1
+  version: 0.3.2

 build:
   number: 0
@@ -71,7 +71,7 @@ test:
     - featureCounts -v

 about:
-  home: https://github.com/jrderuiter/imfusion
+  home: https://github.com/nki-ccb/imfusion
   license: MIT
   summary: "IM-Fusion - Tool for identifying transposon insertions
     and their effects from RNA-sequencing data"
3 changes: 1 addition & 2 deletions docs/conf.py
@@ -255,8 +255,7 @@

 # One entry per manual page. List of tuples
 # (source start file, name, description, authors, manual section).
-man_pages = [(master_doc, 'im-fusion', u'IM-Fusion Documentation', [author],
-              1)]
+man_pages = [(master_doc, 'imfusion', u'IM-Fusion Documentation', [author], 1)]

 # If true, show URL addresses after external links.
 #man_show_urls = False
2 changes: 1 addition & 1 deletion docs/extras.rst
@@ -72,7 +72,7 @@ that this script does expect the required dependencies to be installed
 (FusionFilter, Repeatmasker and Blast).

 .. _FusionFilter wiki: https://github.com/FusionFilter/FusionFilter/wiki/Building-a-Custom-FusionFilter-Dataset
-.. _Python script: https://github.com/jrderuiter/imfusion/blob/develop/scripts/starfusion_build_reference.py
+.. _Python script: https://github.com/nki-ccb/imfusion/blob/develop/scripts/starfusion_build_reference.py

 Identifying fusions
 ~~~~~~~~~~~~~~~~~~~
7 changes: 6 additions & 1 deletion docs/getting_started.rst
@@ -1,7 +1,6 @@
 Getting started
 ===============

-
 Overview
 --------

@@ -39,6 +38,10 @@ supporting code is provided for interactive analyses (such as plotting
 differential expression) and manually running certain steps of the pipeline.
 For more details, see :doc:`api`.

+For the STAR aligner, we also support identifying endogenous gene fusions
+using STAR-Fusion. See :doc:`extras` for more details about this type of
+analysis.
+
 Required files
 --------------

@@ -65,6 +68,8 @@ be either ‘SD’ or ‘SA’ for splice-donor or splice-acceptor sites respect
 The field may also be left empty for other types of features, however these
 features will not be used by IM-Fusion.

+For generating the exon-level expression counts, IM-Fusion needs a
+flattened exon representation of the reference gene features (in GTF format).
 Example reference files
 -----------------------

13 changes: 12 additions & 1 deletion docs/index.rst
@@ -29,9 +29,19 @@ IM-Fusion has the following key features:
   single insertion in a specific sample, or determine the general
   effect of insertions on a given gene within the tumor cohort.

+Additionally, by integrating STAR-Fusion into its STAR-based insertion
+identification pipeline, IM-Fusion enables simultaneous identification of
+transposon insertions and endogenous gene fusions. As shown in our manuscript,
+this approach can identify endogenous gene-fusions driving tumorigenesis, which
+are impossible to identify using targeted DNA-based transposon-sequencing
+approaches.
+
+.. _STAR-Fusion: https://github.com/STAR-Fusion/STAR-Fusion/wiki
+
 For more details on the approach and a comparison with existing DNA-sequencing
 approaches, please see our paper **"Identifying transposon insertions and
-their effects from RNA-sequencing data"** (*Currently under revision*).
+their effects from RNA-sequencing data"** (Nucleic Acids Research 2017,
+*in press*).

 .. toctree::
    :maxdepth: 2
@@ -41,6 +51,7 @@ their effects from RNA-sequencing data"** (*Currently under revision*).
    installation
    getting_started
    usage
+   extras
    api
    extras
    contributing
13 changes: 7 additions & 6 deletions docs/usage.rst
@@ -14,7 +14,7 @@ format) into a new Fasta file and builds the indices needed for alignment.
 Separate sub-commands are provided for each supported aligner (currently STAR
 and Tophat-Fusion).

-The basic command for building a (STAR-based) reference is as follows:
+The basic command for building a STAR-based reference is as follows:

 .. code:: bash

@@ -90,7 +90,7 @@ reference, whilst the ``--output_dir`` argument specifies where the
 sample output should be written.

 An optional ``--assemble`` argument indicates whether IM-Fusion should perform
-a reference-guided transcript assembly. If given, IM-Fusion runs Stringtie
+a reference-guided transcript assembly. If given, IM-Fusion runs StringTie
 after the RNA-seq alignment to detect novel gene transcripts based on the
 RNA-seq alignment. The results of this assembly are subsequently used in the
 insertion detection step to annotate insertions that involve novel transcripts.
@@ -105,10 +105,11 @@ The command for using Tophat-Fusion is nearly identical:
     --output_dir output/sample_s1 \
     --tophat_threads 4

-However, both aligners do have some aligner-specific arguments concerning the
-alignment. See the help of the respective sub-commands for more details. For
-STAR, special attention should be paid to memory usage, as STAR requires
-approximately 30GB of memory (per process) for loading the reference genome.
+However, as was the case when building the reference genomes, both aligners do
+have some aligner-specific arguments concerning the alignment. See the help of
+the respective sub-commands for more details. Again, for STAR special
+attention should be paid to memory usage, as STAR requires approximately
+30GB of memory for loading the reference genome.

 Quantifying expression (per sample)
 -----------------------------------
75 changes: 43 additions & 32 deletions scripts/starfusion_build_reference.py
@@ -21,28 +21,41 @@ def main():
     tmp_dir.mkdir(parents=True, exist_ok=False)

     # Filter the patch chromosomes from the gtf, as these are
-    # likely not present in the fasta.
-    logging.info('- Generating cDNA sequences')
+    # likely not present in the fasta. Note we also filter rows without
+    # a transcript_id, as these seem to be problematic for STAR-Fusion.
+    logging.info('- Filtering GTF file')

     gtf_path = tmp_dir / 'ref.gtf'
+    tmp_gtf_path = gtf_path.with_suffix('.gtf.tmp')
+
+    with tmp_gtf_path.open('wb') as file_:
+        check_call(
+            ['grep', '-v', r'^\(MG\|JH\|GL\)', str(args.gtf)], stdout=file_)
+
     with gtf_path.open('wb') as file_:
-        check_call(['grep', '-v', r'^\(MG\|JH\|GL\)', str(args.gtf)],
-                   stdout=file_)
+        check_call(['grep', 'transcript_id', str(tmp_gtf_path)], stdout=file_)
+
+    tmp_gtf_path.unlink()

     # Create cDNA_seqs file.
+    logging.info('- Generating cDNA sequences')
+
     cdna_path = tmp_dir / 'cDNA_seqs.fa'
     with cdna_path.open('wb') as file_:
         script_path = args.ff_path / 'util' / 'gtf_file_to_cDNA_seqs.pl'
-        check_call(['perl', str(script_path), str(gtf_path), str(args.fasta)],
-                   stdout=file_)
+        check_call(
+            ['perl', str(script_path), str(gtf_path), str(args.fasta)],
+            stdout=file_)

     # Build masked cDNA_seqs file using RepeatMasker.
     # Note: requires library to be installed from http://www.girinst.org.
     logging.info('- Masking repeats')

     masked_path = cdna_path.with_suffix('.fa.masked')
-    check_call([str(args.rm_path / 'RepeatMasker'), '-pa', str(args.threads), '-s',
-                '-species', 'mouse', '-xsmall', str(cdna_path)])
+    check_call([
+        str(args.rm_path / 'RepeatMasker'), '-pa', str(args.threads), '-s',
+        '-species', 'mouse', '-xsmall', str(cdna_path)
+    ])

     # Create blastpairs.
     logging.info('- Creating blast pairs')
@@ -51,37 +64,35 @@ def main():

     pair_path = tmp_dir / 'blast_pairs.outfmt6'
     with pair_path.open('wb') as file_:
-        check_call(['blastn',
-                    '-query', str(cdna_path),
-                    '-db', str(masked_path),
-                    '-max_target_seqs', '10000',
-                    '-outfmt', '6',
-                    '-evalue', '1e-3',
-                    '-lcase_masking',
-                    '-num_threads', str(args.threads),
-                    '-word_size', '11'],
-                   stdout=file_)
+        check_call(
+            [
+                'blastn', '-query', str(cdna_path), '-db', str(masked_path),
+                '-max_target_seqs', '10000', '-outfmt', '6', '-evalue', '1e-3',
+                '-lcase_masking', '-num_threads', str(args.threads),
+                '-word_size', '11'
+            ],
+            stdout=file_)

     pair_gz_path = pair_path.with_suffix('.gene_syms.outfmt6.gz')
     with gzip.open(str(pair_gz_path), 'wb') as file_:
         script_path = (args.ff_path / 'util' /
                        'blast_outfmt6_replace_trans_id_w_gene_symbol.pl')
-        check_call(['perl', str(script_path), str(cdna_path), str(pair_path)],
-                   stdout=file_)
+        check_call(
+            ['perl', str(script_path), str(cdna_path), str(pair_path)],
+            stdout=file_)

     # Prepare library.
     logging.info('- Preparing library')
-    script_path = args.ff_path / 'util' / 'prep_genome_lib.pl'
-    check_call(['perl', str(script_path),
-                '--genome_fa', str(args.fasta),
-                '--gtf', str(gtf_path),
-                '--blast_pairs', str(pair_gz_path),
-                '--cdna_fa', str(cdna_path),
-                '--CPU', str(args.threads),
-                '--max_readlength', str(args.read_length),
-                '--output_dir', str(args.output_dir)])
-
-    # shutil.rmtree(str(args.tmp_dir))
+    script_path = args.ff_path / 'prep_genome_lib.pl'
+    check_call([
+        'perl', str(script_path), '--genome_fa', str(args.fasta), '--gtf',
+        str(gtf_path), '--blast_pairs', str(pair_gz_path), '--cdna_fa',
+        str(cdna_path), '--CPU', str(args.threads), '--max_readlength',
+        str(args.read_length), '--output_dir', str(args.output_dir)
+    ])
+
+    shutil.rmtree(str(args.tmp_dir))
+

 def parse_args():
     """Parses command line arguments."""
@@ -91,7 +102,7 @@ def parse_args():
     parser.add_argument('--fasta', required=True, type=Path)
     parser.add_argument('--gtf', required=True, type=Path)
     parser.add_argument('--output_dir', required=True, type=Path)

     parser.add_argument('--ff_path', required=False, default='', type=Path)
     parser.add_argument('--rm_path', required=False, default='', type=Path)

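Note on the first hunk of scripts/starfusion_build_reference.py: the two grep passes first drop rows whose contig name starts with MG, JH or GL, and then keep only rows that mention transcript_id. A minimal pure-Python sketch of the same filter is given below. The helper name `filter_gtf_lines` and the example rows are hypothetical and only serve as illustration; the script itself shells out to grep.

```python
import re

# Contig-name prefixes that the first grep pass in the script excludes.
PATCH_PREFIX = re.compile(r'^(MG|JH|GL)')


def filter_gtf_lines(lines):
    """Yields the GTF lines that would survive both grep passes.

    Hypothetical helper, not part of the script in this commit.
    """
    for line in lines:
        if PATCH_PREFIX.match(line):
            continue  # First pass: inverted match on the contig prefixes.
        if 'transcript_id' not in line:
            continue  # Second pass: keep only rows mentioning transcript_id.
        yield line


# Made-up example rows (tab-separated GTF fields).
example = [
    '#!genome-build GRCm38\n',
    '1\thavana\tgene\t1\t100\t.\t+\t.\tgene_id "G1";\n',
    '1\thavana\texon\t1\t100\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n',
    'JH584299.1\tensembl\texon\t1\t50\t.\t+\t.\tgene_id "G2"; transcript_id "T2";\n',
]
kept = list(filter_gtf_lines(example))
```

As with the second grep pass, header comment lines and gene-level rows are dropped as well whenever they do not contain the text transcript_id.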
4 changes: 2 additions & 2 deletions setup.py
@@ -27,12 +27,12 @@

 setuptools.setup(
     name='imfusion',
-    version='0.3.1',
+    version='0.3.2',
     description=('Tool for identifying transposon insertions in '
                  'Insertional Mutagenesis screens from gene-transposon '
                  'fusions using single- and paired-end RNA-sequencing data.'),
     long_description=README + '\n\n' + HISTORY,
-    url='https://github.com/jrderuiter/im-fusion',
+    url='https://github.com/nki-ccb/imfusion',
     author='Julian de Ruiter',
     author_email='[email protected]',
     license='MIT license',
2 changes: 1 addition & 1 deletion src/imfusion/__init__.py
@@ -2,4 +2,4 @@

 __author__ = 'Julian de Ruiter'
 __email__ = '[email protected]'
-__version__ = '0.3.1'
+__version__ = '0.3.2'
5 changes: 4 additions & 1 deletion src/imfusion/build/indexers/base.py
@@ -249,7 +249,10 @@ def configure_args(cls, parser):
             help='Path to the transposon features (tsv).')

         base_group.add_argument(
-            '--output_dir', type=pathlib.Path, required=True)
+            '--output_dir',
+            type=pathlib.Path,
+            required=True,
+            help='Path to write the built reference.')

         # Optional blacklist arguments.
         blacklist_group = parser.add_argument_group('Blacklist arguments')
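Note on the src/imfusion/build/indexers/base.py hunk: the only functional change is the added help string, which argparse then shows in the generated usage text. A small stand-alone sketch of that effect follows. The program name and the group title 'Basic arguments' are assumptions made for illustration; only the `--output_dir` definition is taken from the diff.

```python
import argparse
import pathlib

# Stand-alone parser; prog and group title are illustrative assumptions.
parser = argparse.ArgumentParser(prog='imfusion-build')
base_group = parser.add_argument_group('Basic arguments')

# Argument definition as added in this commit.
base_group.add_argument(
    '--output_dir',
    type=pathlib.Path,
    required=True,
    help='Path to write the built reference.')

# The help string now appears in the generated help output.
help_text = parser.format_help()

# The value is converted to a pathlib.Path by the type callable.
args = parser.parse_args(['--output_dir', 'out/reference'])
```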
1 change: 1 addition & 0 deletions src/imfusion/external/__init__.py
@@ -0,0 +1 @@
+# -*- coding: utf-8 -*-
24 changes: 24 additions & 0 deletions src/imfusion/external/star_fusion.py
@@ -0,0 +1,24 @@
+"""Module containing functions for calling star-fusion."""
+
+from .util import run_command
+
+
+def star_fusion(junction_path, reference_path, output_dir=None, log_path=None):
+    """Identifies endogenous fusions from an existing STAR alignment.
+
+    Parameters
+    ----------
+    reference_path : pathlib.Path
+        Path to the reference genome.
+    out_base_path : pathlib.Path
+        Base output path for the built index.
+    log_path : pathlib.Path
+        Where to write the log output.
+    """
+
+    args = [
+        'STAR-Fusion', '--genome_lib_dir', str(reference_path), '-J',
+        str(junction_path), '--output_dir', str(output_dir)
+    ]
+    run_command(args=args, log_path=log_path)
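Note on src/imfusion/external/star_fusion.py: the docstring documents `out_base_path`, which is not in the signature, and omits `junction_path` and `output_dir`. The sketch below isolates how the argument list is assembled, so the resulting command line can be inspected without running STAR-Fusion. The helper name `build_star_fusion_args` and all paths are hypothetical; the flags are copied from the diff above.

```python
from pathlib import Path


def build_star_fusion_args(junction_path, reference_path, output_dir):
    """Builds the same argument list as star_fusion() in the diff.

    Hypothetical helper for illustration; it does not call STAR-Fusion.
    """
    return [
        'STAR-Fusion', '--genome_lib_dir', str(reference_path), '-J',
        str(junction_path), '--output_dir', str(output_dir)
    ]


# All paths below are made-up examples.
args = build_star_fusion_args(
    junction_path=Path('sample') / 'Chimeric.out.junction',
    reference_path=Path('reference') / 'star_fusion',
    output_dir=Path('sample') / 'star_fusion')

print(' '.join(args))
```

One thing this makes visible: with the default `output_dir=None`, `str(output_dir)` evaluates to the literal string 'None', so callers would presumably always want to pass an explicit output directory.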
2 changes: 1 addition & 1 deletion src/imfusion/insertions/aligners/base.py
@@ -98,7 +98,7 @@ def configure_args(cls, parser):
             type=pathlib.Path,
             required=True,
             help='Path to the index of the augmented reference '
-            'generated by im-fusion build.')
+            'generated by imfusion-build.')

         base_group.add_argument(
             '--output_dir',