convert from loom (fixes #15); expanded filtering; new PBMC dataset + demo movie; better tests (fixes #21)
Benedikt Obermayer committed Oct 18, 2019
1 parent 1d03aa3 commit 0befa91
Showing 12 changed files with 304 additions and 121 deletions.
9 changes: 9 additions & 0 deletions HISTORY.rst
@@ -2,6 +2,15 @@
History
=======

------
v0.7.0
------

- added conversion from .loom files
- cell filtering also supports downsampling
- added PBMC dataset hosted on figshare
- added demo movie

------
v0.6.0
------
32 changes: 25 additions & 7 deletions README.rst
@@ -23,7 +23,11 @@ SCelVis: Easy Single-Cell Visualization
.. image:: https://zenodo.org/badge/185944510.svg
:target: https://zenodo.org/badge/latestdoi/185944510

You can find the URL for the demo linked to on the top right of the Github repository page.
|
.. image:: scelvis/assets/movie.gif
:height: 400px
:align: center

------------
Installation
@@ -52,12 +56,13 @@ A Docker container is also available via `Quay.io/Biocontainers <https://quay.io
Tutorial
--------

explore a simulated dummy dataset or 1000 cells from a 1:1 Mixture of Fresh Frozen Human (HEK293T) and Mouse (NIH3T3) Cells (10X v3 chemistry)
explore 1000 cells from a 1:1 Mixture of Fresh Frozen Human (HEK293T) and Mouse (NIH3T3) Cells (10X v3 chemistry) or a published dataset of ~14000 IFN-beta treated and control PBMCs from 8 donors (`GSE96583 <https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE96583>`_; see `Kang et al. <https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE96583>`_)

.. code-block:: shell
$ scelvis run --data-source /path/to/scelvis/examples/dummy.h5ad
$ scelvis run --data-source /path/to/scelvis/examples/hgmm_1k.h5ad
$ scelvis run --data-source https://files.figshare.com/18037739/pbmc.h5ad
and then point your browser to http://0.0.0.0:8050/.

@@ -70,12 +75,14 @@ Data sets are provided as HDF5 files (`anndata <https://anndata.readthedocs.io/e

For the input you can either specify one HDF5 file or a directory containing multiple such files.

You can use ``scanpy`` to create this HDF5 file directly or use the ``scelvis convert`` command for converting your single-cell pipeline output.
You can use `scanpy <http://scanpy.rtfd.io>`_ to create this HDF5 file directly or use the ``scelvis convert`` command for converting your single-cell pipeline output.

HDF5 Input
----------

for HDF5 input, you can do your analysis with `scanpy <http://scanpy.rtfd.io>`_ to create an anndata object ``ad``. SCelVis will use embedding coordinates from ``ad.obsm``, cell annotation from ``ad.obs`` and expression data directly from ``ad.X`` (this should contain normalized and log-transformed expression values for all genes). Information about the dataset will be extracted from strings stored in ``ad.uns['about_title']``, ``ad.uns['about_short_title']`` and ``ad.uns['about_readme']`` (assumed to be Markdown). Information about marker genes will be taken from entries starting with ``marker_`` in ``ad.uns``: entries called ``marker_gene`` (required!), ``marker_cluster``, ``marker_padj``, ``marker_LFC`` will create a table with the columns ``gene``, ``cluster``, ``padj``, and ``LFC``.
for HDF5 input, you can do your analysis with `scanpy <http://scanpy.rtfd.io>`_ to create an anndata object ``ad``. SCelVis will use embedding coordinates from ``ad.obsm``, cell annotation from ``ad.obs`` and expression data directly from ``ad.X`` (this should contain normalized and log-transformed expression values for all genes). If present, information about the dataset will be extracted from strings stored in ``ad.uns['about_title']``, ``ad.uns['about_short_title']`` and ``ad.uns['about_readme']`` (assumed to be Markdown). Information about marker genes will be taken either from the ``rank_genes_groups`` slot in ``ad.uns`` or from entries starting with ``marker_`` in ``ad.uns``: entries called ``marker_gene`` (required!), ``marker_cluster``, ``marker_padj``, ``marker_LFC`` will create a table with the columns ``gene``, ``cluster``, ``padj``, and ``LFC``.

If you prepared your data with ``Seurat`` (v2), you can use ``Convert(from = sobj, to = "anndata", filename = "data.h5ad")`` to get an HDF5 file.
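The ``ad.uns`` layout described in the HDF5 Input paragraph can be sketched in plain Python. This is only an illustration: the helper name ``build_uns``, the input shape (a list of dicts) and the example genes are hypothetical; only the key names (``about_title``, ``about_short_title``, ``about_readme``, ``marker_gene``, ``marker_cluster``, ``marker_padj``, ``marker_LFC``) are taken from the README text.

```python
def build_uns(title, short_title, readme_md, markers):
    """Collect dataset info and marker genes under the key names the README lists.

    `markers` is a list of dicts with the keys gene, cluster, padj, LFC
    (a hypothetical input shape chosen for this illustration).
    """
    if any("gene" not in m for m in markers):
        # the README marks ``marker_gene`` as required
        raise ValueError("every marker entry needs a 'gene'")
    uns = {
        "about_title": title,
        "about_short_title": short_title,
        "about_readme": readme_md,  # assumed to be Markdown
    }
    for column in ("gene", "cluster", "padj", "LFC"):
        # only write a marker_* entry when every row provides that column
        if all(column in m for m in markers):
            uns["marker_" + column] = [m[column] for m in markers]
    return uns


example_uns = build_uns(
    "Example dataset",
    "example",
    "# About\n\nToy data for illustration.",
    [
        {"gene": "CD3E", "cluster": "T", "padj": 0.001, "LFC": 2.5},
        {"gene": "MS4A1", "cluster": "B", "padj": 0.002, "LFC": 3.1},
    ],
)
```

With anndata installed, a dictionary like this would typically be merged into an existing object with ``ad.uns.update(example_uns)`` before writing the ``.h5ad`` file, possibly after converting the lists to NumPy arrays.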

Text Input
----------
@@ -122,7 +129,18 @@ For "raw" text input, you need to prepare at least three files in the input dire
$ scelvis convert --input-dir text_input --output data/text_input.h5ad --about-md text_input.md
in ``examples/dummy_raw.zip`` and ``examples/dummy_about.md`` we provide raw data for the dummy dataset.
in ``examples/dummy_raw.zip`` and ``examples/dummy_about.md`` we provide raw data for a simulated dummy dataset.

Loom Input
----------

for `loompy <http://loompy.org>`_ or `loomR <https://github.com/mojaveazure/loomR>`_ input, you can convert your data like this:

.. code-block:: shell
$ scelvis convert --i input.loom -m markers.tsv -a about.md -o loom_input.h5ad
if you prepared your data with ``Seurat`` (v3), you can use ``as.loom(sobj, filename="output.loom")`` to get a ``.loom`` file and then convert to ``.h5ad`` with the above command.

CellRanger Input
----------------
@@ -142,7 +160,7 @@ Alternatively, the output directory of ``CellRanger`` can be used. This is the d
EOF
$ scelvis convert --input-dir cellranger-out --output data/cellranger_input.h5ad --about-md cellranger.md
In ``examples/hgmm_1k_raw.zip`` we provide ``CellRanger`` output for the 1k 1:1 human mouse mix. Specifically, from the `outs` folder we selected
In ``examples/hgmm_1k_raw`` we provide ``CellRanger`` output for the 1k 1:1 human mouse mix. Specifically, from the ``outs`` folder we selected
- ``filtered_feature_bc_matrix.h5``
- tSNE and PCA projections from ``analysis/tsne`` and ``analysis/pca``
1 change: 1 addition & 0 deletions requirements/base.txt
@@ -14,6 +14,7 @@ numpy
pandas
anndata
scanpy
loompy

# Caching functionality for Flask.
flask-caching
17 changes: 12 additions & 5 deletions scelvis/app.py
@@ -145,10 +145,16 @@ def find(name, path):
logger.info("Looking for %s file", cellranger_needle)
needle_path = find(cellranger_needle, tmpdir)
if needle_path is None:
raw_needle = "coords.tsv"
logger.info("Looking for %s file", raw_needle)
needle_path = find(raw_needle, tmpdir)
format_ = "text"
text_needle = "coords.tsv"
logger.info("Looking for %s file", text_needle)
needle_path = find(text_needle, tmpdir)
if needle_path is None:
loom_needle = "data.loom"
logger.info("Looking for %s file", loom_needle)
needle_path = find(loom_needle, tmpdir)
format_ = "loom"
else:
format_ = "text"
else:
format_ = "cell-ranger"
input_dir = os.path.dirname(needle_path)
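
The nested needle lookup in this hunk (CellRanger first, then text, then loom) could also be written as a flat priority list. The following is a minimal sketch under stated assumptions, not the code from ``scelvis/app.py``: the body of ``find`` is a guess at its behaviour, ``NEEDLES`` and ``detect_format`` are hypothetical names, and the CellRanger needle is assumed to be the matrix file named in the README because the diff does not show the value of ``cellranger_needle``. Unlike the hunk, the sketch returns ``(None, None)`` when no needle is found, so a caller can report an unsupported upload before calling ``os.path.dirname``.

```python
import os
import tempfile

# (file name to look for, format label); list order encodes priority.
# The first entry is an assumption, see the note above.
NEEDLES = [
    ("filtered_feature_bc_matrix.h5", "cell-ranger"),
    ("coords.tsv", "text"),
    ("data.loom", "loom"),
]


def find(name, path):
    """Return the path of the first file called `name` below `path`, or None."""
    for root, _dirs, files in os.walk(path):
        if name in files:
            return os.path.join(root, name)
    return None


def detect_format(tmpdir):
    """Return (needle_path, format) for the first needle found, else (None, None)."""
    for needle, format_ in NEEDLES:
        needle_path = find(needle, tmpdir)
        if needle_path is not None:
            return needle_path, format_
    return None, None


with tempfile.TemporaryDirectory() as tmpdir:
    # simulate an unpacked upload that only contains a loom file
    os.makedirs(os.path.join(tmpdir, "upload"))
    open(os.path.join(tmpdir, "upload", "data.loom"), "w").close()
    found_path, found_format = detect_format(tmpdir)
    # a directory without any needle yields (None, None)
    os.makedirs(os.path.join(tmpdir, "empty"))
    empty_result = detect_format(os.path.join(tmpdir, "empty"))
```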
@@ -183,7 +189,8 @@ def find(name, path):
return """
<!doctype html>
<title>Convert File</title>
<h1>Upload ZIP or TAR.GZ of CellRanger Output</h1>
<h1>Upload ZIP or TAR.GZ of your data</h1>
<p>either containing CellRanger output, raw text files or a data.loom file</p>
<p>
The server will return a <tt>.h5ad</tt> file that you can upload into the SCelVis visualization.
</p>
Binary file modified scelvis/assets/cells.png
Binary file added scelvis/assets/movie.gif
182 changes: 115 additions & 67 deletions scelvis/callbacks.py
@@ -477,6 +477,12 @@ def toggle_filter_cells_controls(n, is_open):
def register_update_filter_cells_controls(app, token):
@app.callback(
[
Output("%s_filter_cells_ncells_div" % token, "style"),
Output("%s_filter_cells_ncells" % token, "marks"),
Output("%s_filter_cells_ncells" % token, "min"),
Output("%s_filter_cells_ncells" % token, "max"),
Output("%s_filter_cells_ncells" % token, "value"),
Output("%s_filter_cells_ncells" % token, "step"),
Output("%s_filter_cells_choice_div" % token, "style"),
Output("%s_filter_cells_choice" % token, "options"),
Output("%s_filter_cells_choice" % token, "value"),
@@ -493,56 +499,73 @@ def register_update_filter_cells_controls(app, token):
def update_filter_cells_controls(pathname, attribute, filters_json):
_, kwargs = get_route(pathname)
data = store.load_data(kwargs.get("dataset"))
hidden_slider = ({"display": "none"}, {0: "0", 1: "1"}, 0, 1, 1, 0)
hidden_checklist = ({"display": "none"}, [], None)
hidden_rangeslider = ({"display": "none"}, {0: "0", 1: "1"}, 0, 1, [0, 1], 0)

if attribute is None or attribute == "None":
return (
{"display": "none"},
[],
None,
{"display": "none"},
{0: "0", 1: "1"},
0,
1,
[0, 1],
0,
)
return hidden_slider + hidden_checklist + hidden_rangeslider
filters = json.loads(filters_json)
values = data.ad.obs_vector(attribute)
if not pd.api.types.is_numeric_dtype(values):
categories = list(data.ad.obs[attribute].cat.categories)
return (
{"display": "block"},
[{"label": v, "value": v} for v in categories],
filters[attribute] if attribute in filters else categories,
{"display": "none"},
{0: "0", 1: "1"},
0,
1,
[0, 1],
0,
)
else:
range_min = values.min()
range_max = values.max()
if attribute == "ncells":
ncells_tot = data.ad.obs.shape[0]
if attribute in filters:
val_min = filters[attribute][0]
val_max = filters[attribute][1]
ncells_selected = filters[attribute]
else:
val_min = range_min
val_max = range_max
ncells_selected = ncells_tot
return (
{"display": "none"},
[],
None,
{"display": "block"},
dict(
(int(t) if t % 1 == 0 else t, "{0:g}".format(t))
for t in ui.common.auto_tick([range_min, range_max], max_tick=4, tf_inside=True)
),
range_min,
range_max,
[val_min, val_max],
(range_max - range_min) / 1000,
(
{"display": "block"},
dict(
(int(t) if t % 1 == 0 else t, "{0:g}".format(t))
for t in ui.common.auto_tick([0, ncells_tot], max_tick=4, tf_inside=True)
),
0,
ncells_tot,
ncells_selected,
ncells_tot / 1000,
)
+ hidden_checklist
+ hidden_rangeslider
)
else:
values = data.ad.obs_vector(attribute)
if not pd.api.types.is_numeric_dtype(values):
categories = list(data.ad.obs[attribute].cat.categories)
return (
hidden_slider
+ (
{"display": "block"},
[{"label": v, "value": v} for v in categories],
filters[attribute] if attribute in filters else categories,
)
+ hidden_rangeslider
)
else:
range_min = values.min()
range_max = values.max()
if attribute in filters:
val_min = filters[attribute][0]
val_max = filters[attribute][1]
else:
val_min = range_min
val_max = range_max
return (
hidden_slider
+ hidden_checklist
+ (
{"display": "block"},
dict(
(int(t) if t % 1 == 0 else t, "{0:g}".format(t))
for t in ui.common.auto_tick(
[range_min, range_max], max_tick=4, tf_inside=True
)
),
range_min,
range_max,
[val_min, val_max],
(range_max - range_min) / 1000,
)
)
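
The pattern in this hunk is that the callback always returns 15 values (6 for the ``ncells`` slider, 3 for the checklist, 6 for the range slider), built by concatenating one visible group with hidden placeholders for the other two. A simplified stand-alone sketch follows. It uses plain lists instead of ``data.ad.obs``, replaces ``ui.common.auto_tick`` with two end-point marks, and the function name ``filter_controls`` is hypothetical.

```python
SHOW = {"display": "block"}
HIDE = {"display": "none"}

# Placeholders for controls that are not shown (same shapes as in the diff).
HIDDEN_SLIDER = (HIDE, {0: "0", 1: "1"}, 0, 1, 1, 0)  # style, marks, min, max, value, step
HIDDEN_CHECKLIST = (HIDE, [], None)  # style, options, value
HIDDEN_RANGESLIDER = (HIDE, {0: "0", 1: "1"}, 0, 1, [0, 1], 0)


def filter_controls(attribute, values, n_total, selected=None):
    """Return the 15-tuple of outputs: slider + checklist + range slider."""
    if attribute is None or attribute == "None":
        return HIDDEN_SLIDER + HIDDEN_CHECKLIST + HIDDEN_RANGESLIDER
    if attribute == "ncells":
        # downsampling: a single-value slider from 0 to the total cell count
        slider = (
            SHOW,
            {0: "0", n_total: str(n_total)},
            0,
            n_total,
            n_total if selected is None else selected,
            n_total / 1000,
        )
        return slider + HIDDEN_CHECKLIST + HIDDEN_RANGESLIDER
    if not all(isinstance(v, (int, float)) for v in values):
        # categorical attribute: one checkbox per category
        categories = sorted(set(values))
        checklist = (
            SHOW,
            [{"label": c, "value": c} for c in categories],
            categories if selected is None else selected,
        )
        return HIDDEN_SLIDER + checklist + HIDDEN_RANGESLIDER
    # numeric attribute: a two-handle range slider over the observed range
    lo, hi = min(values), max(values)
    rangeslider = (
        SHOW,
        {lo: "{0:g}".format(lo), hi: "{0:g}".format(hi)},
        lo,
        hi,
        [lo, hi] if selected is None else selected,
        (hi - lo) / 1000,
    )
    return HIDDEN_SLIDER + HIDDEN_CHECKLIST + rangeslider


nothing = filter_controls(None, [], 2000)
downsample = filter_controls("ncells", [], 2000, selected=500)
by_cluster = filter_controls("cluster", ["B", "A", "B"], 2000)
by_range = filter_controls("n_genes", [10, 50, 100], 2000)
```

Keeping the three groups in a fixed order means the tuple positions always line up with the ``Output`` list of the callback, whichever control is visible.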


def register_update_filter_cells_filters(app):
@@ -554,9 +577,11 @@ def register_update_filter_cells_filters(app):
],
[
Input("url", "pathname"),
Input("meta_filter_cells_ncells", "value"),
Input("meta_filter_cells_choice", "value"),
Input("meta_filter_cells_range", "value"),
Input("meta_filter_cells_reset", "n_clicks"),
Input("expression_filter_cells_ncells", "value"),
Input("expression_filter_cells_choice", "value"),
Input("expression_filter_cells_range", "value"),
Input("expression_filter_cells_reset", "n_clicks"),
@@ -569,9 +594,11 @@ def register_update_filter_cells_filters(app):
)
def update_filter_cells_filters(
pathname,
meta_ncells_value,
meta_cat_value,
meta_range_value,
meta_reset_n,
expression_ncells_value,
expression_cat_value,
expression_range_value,
expression_reset_n,
@@ -584,26 +611,43 @@ def update_filter_cells_filters(
ctx = dash.callback_context

filters = json.loads(filters_json)
active_filters = set()
# if reset button was hit, remove entries in filters_json
attributes = list(filters.keys())
status = "active filters: "
# if reset button was hit, check all boxes using stored values in filters_json
attributes = filters.keys()
if ctx.triggered and "reset" in ctx.triggered[0]["prop_id"]:
for attribute in list(attributes):
for attribute in attributes:
del filters[attribute]
return (json.dumps(filters), status, status)

for cat_value, range_value, attribute in [
(meta_cat_value, meta_range_value, meta_attribute),
(expression_cat_value, expression_range_value, expression_attribute),
# else update filters_json depending on inputs
for ncells_value, cat_value, range_value, attribute in [
(meta_ncells_value, meta_cat_value, meta_range_value, meta_attribute),
(
expression_ncells_value,
expression_cat_value,
expression_range_value,
expression_attribute,
),
]:
if attribute is not None and attribute != "None":
values = data.ad.obs_vector(attribute)
if not pd.api.types.is_numeric_dtype(values):
filters[attribute] = sorted(cat_value)
if attribute == "ncells":
filters[attribute] = ncells_value
ncells_tot = data.ad.obs.shape[0]
if ncells_value < ncells_tot:
active_filters.add(attribute)
else:
filters[attribute] = range_value

status += ", ".join(attributes)
values = data.ad.obs_vector(attribute)
if not pd.api.types.is_numeric_dtype(values):
filters[attribute] = cat_value
if cat_value is not None and set(cat_value) != set(values):
active_filters.add(attribute)
else:
filters[attribute] = range_value
if range_value[0] > values.min() or range_value[1] < values.max():
active_filters.add(attribute)

status += ", ".join(active_filters)
return (json.dumps(filters), status, status)
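
The status line built in this hunk only lists filters that actually exclude cells: an ``ncells`` value below the total, a categorical selection that differs from the full set of categories, or a numeric range narrower than the observed range. A minimal sketch of that rule on plain Python data is shown below. The function name ``active_filter_names`` and the ``columns`` dict (standing in for ``data.ad.obs_vector``) are hypothetical, and a list is used instead of a set so that the order of names in the status string is stable.

```python
def active_filter_names(filters, columns, n_total):
    """Return the names of all filters that would exclude at least some cells."""
    active = []
    for attribute, selected in filters.items():
        if attribute == "ncells":
            # downsampling is active when fewer than all cells are requested
            if selected < n_total:
                active.append(attribute)
            continue
        values = columns[attribute]
        if all(isinstance(v, (int, float)) for v in values):
            # numeric: active when the selected range is narrower than the data range
            if selected[0] > min(values) or selected[1] < max(values):
                active.append(attribute)
        elif selected is not None and set(selected) != set(values):
            # categorical: active when not every category is selected
            active.append(attribute)
    return active


example_columns = {"cluster": ["A", "B", "A"], "n_genes": [10, 50, 100]}
example_filters = {"ncells": 500, "cluster": ["A", "B"], "n_genes": [20, 100]}
example_active = active_filter_names(example_filters, example_columns, n_total=1000)
example_status = "active filters: " + ", ".join(example_active)
```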


@@ -623,19 +667,23 @@ def activate_filter_cells_reset(pathname, filters_json):
else:
filters = {}
disabled = True
attributes = filters.keys()
for attribute in attributes:
values = data.ad.obs_vector(attribute)
if not pd.api.types.is_numeric_dtype(values):
if filters[attribute] != list(data.ad.obs[attribute].cat.categories):
for attribute, selected in filters.items():
if attribute == "ncells":
ncells_tot = data.ad.obs.shape[0]
if selected < ncells_tot:
disabled = False
else:
range_min = values.min()
range_max = values.max()
val_min = filters[attribute][0]
val_max = filters[attribute][1]
if val_min > range_min or val_max < range_max:
disabled = False
values = data.ad.obs_vector(attribute)
if not pd.api.types.is_numeric_dtype(values):
if sorted(selected) != sorted(data.ad.obs[attribute].cat.categories):
disabled = False
else:
range_min = values.min()
range_max = values.max()
val_min = selected[0]
val_max = selected[1]
if val_min > range_min or val_max < range_max:
disabled = False

return (disabled, disabled)
