All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Name components using UPI hashes.
- Run multiple iterations of multiplet recovery during the graph step, specified using `--max-refinement-recursion-depth`.
- Specify the maximum number of edges that can be removed between two sub-components during multiplet recovery using `--max-edges-to-split`.
- Support for MultiGraphs in `pmds_layout`.
- Support multiple targets in `plot_colocalization_diff_volcano` and `plot_colocalization_diff_heatmap`.
- If demultiplexing has a success rate lower than 50%, the command will exit with a status of 1. This prevents further pipeline stages from being run on what is probably bad data.
- Add `depth` column to the `discarded_edgelist.parquet` output of the GRAPH stage that indicates at which refinement iteration the edge is removed.
- Add `edges_removed_in_multiplet_recovery_first_iteration`, `edges_removed_in_multiplet_recovery_refinement` and `fraction_edges_removed_in_refinement` to graph report.json.
- Add `is_potential_doublet` and `n_edges_to_split_doublet` columns to adata.obs.
- Add `fraction_potential_doublets` and `n_edges_to_split_potential_doublets` to annotate report.json.
- Add `--max-edges-to-split` option to `graph` to specify the maximum number of edges that can be removed between two sub-components during multiplet recovery.
- Add `abundance_colocalization_plot` function to make scatter plots of selected marker-pairs' abundance.
- Add `plot_polarity_diff_volcano` to make statistical comparison plots of selected component groups.
- Add `get_differential_polarity` to statistically compare polarity scores of selected component groups.
- Remove the `components_recovered.csv` output from the GRAPH stage.
- Better error message when the number of nodes is lower than the number of requested dimensions in `pmds_layout`.
- Improved memory usage when aggregating PXL files with precomputed layouts.
- Bump polars to stable 1.x series
- Fix a qc report crash issue when the layout stage is run in a pipeline due to an unsupported parameter type.
- Bump `umi_tools` version requirements
- Add minimum marker count `colocalization_min_marker_count` parameter to calculate colocalization score.
- Add `density_scatter_plot` function to make two-marker abundance scatter plots with pseudo-density coloring.
- Add `wpmds` option in `pmds_layout` to compute edge weighted layouts. This is now set as the default layout algorithm.
- Add `dsb_normalization` function for normalization of marker abundance.
- Add a `Fraction of Outlier Cells` metric to the QC report.
- Add a `Panel Version` metadata field to the QC report.
- Add support for datasets generated using the `human-sc-immunology-spatial-proteomics-2` panel.
- The default value for `normalize_counts` in `local_g` is now `False` instead of `True`.
- The default transformation for the calculation of the colocalization score is now `rate-diff` instead of `log1p`.
- Rename `edge_rank_plot` function to `molecule_rank_plot`.
- Fix a bug in `compute_transition_probabilities` when `k>1` where the stochastic matrix was not correctly row-normalized.
- Fix a bug in `local_g` when `use_weights=False` where the adjacency matrix was not correctly expanded if `k>1`.
- Fix a bug where `a_pixels_per_b_pixel` summary statistics were equal to the `b_pixels_per_a_pixel` statistics.
- `collapse` will return exit code 137 when one of the child processes is killed by the system (e.g. because it is using too much memory). This allows e.g. Nextflow to retry the process with more memory automatically.
- Hide the `Sample Description` metadata field in the QC report when no value is available.
- Fix an issue where boolean parameters were formatted as integers in the Parameters section of the QC report.
- Fix a bug in aggregating files with precomputed layouts, where the lazy-loading of the layouts was not working correctly.
- Remove the `Pixel Version` metadata field from the QC report.
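
The exit code 137 reported by `collapse` follows the common shell convention of 128 plus the signal number (SIGKILL is signal 9). Below is a minimal sketch of how a parent process could translate a killed child into that exit code. The helper name `exit_code_for_child` is hypothetical and is not part of pixelator; it only illustrates the convention.

```python
SIGKILL = 9  # POSIX signal number; sent e.g. by the kernel out-of-memory killer


def exit_code_for_child(returncode: int) -> int:
    """Map a subprocess-style return code to a shell-style exit code.

    Python's subprocess module reports a child terminated by signal N
    as returncode == -N; shells conventionally report it as 128 + N.
    """
    if returncode < 0:
        return 128 + (-returncode)
    return returncode


# A child killed by SIGKILL maps to 128 + 9
killed_exit_code = exit_code_for_child(-SIGKILL)
```

A workflow manager such as Nextflow can then match on this exit code to decide whether to retry the task with more memory.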
- Poor performance when writing many small layouts to pxl file (~45x speed-up). This should almost only impact test scenarios, since most real components should be large enough for this not to be an issue.
- Add `rate_diff_transformation` function with `rate-diff` alias as an alternative option for transforming marker counts before colocalization calculation.
- Add `local_g` function to compute spatial autocorrelation of marker counts per node.
- Add `compute_transition_probabilities` function to compute transition probabilities for k-step random walks for node pairs in a graph.
- Add QC plot showing UMIs per UPIA vs Tau.
- Add plot functions showing edge rank and cell counts.
- Add 2D and 3D graph plot functions.
- Add heatmap plot functions showing colocalization and differential colocalization.
- Add volcano plot (value difference vs log p-value) function for differential colocalization.
- Add a function to calculate the differential colocalization between two conditions.
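
For the k-step random-walk transition probabilities listed above, one common construction is to row-normalize the adjacency matrix into a stochastic matrix P and take its k-th power. The sketch below assumes that construction and uses hypothetical helper names. It is not pixelator's `compute_transition_probabilities` implementation, whose details are not described here.

```python
def row_normalize(matrix):
    """Divide each row by its sum so rows sum to 1 (all-zero rows stay zero)."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([v / total if total else 0.0 for v in row])
    return normalized


def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [
        [sum(a[i][m] * b[m][j] for m in range(n)) for j in range(n)]
        for i in range(n)
    ]


def k_step_transition_probabilities(adjacency, k):
    """Return P^k, where P is the row-normalized adjacency matrix."""
    if k < 1:
        raise ValueError("k must be >= 1")
    p = row_normalize(adjacency)
    result = p
    for _ in range(k - 1):
        result = matmul(result, p)
    return result


# Path graph on three nodes: 0 - 1 - 2
adjacency = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
two_step = k_step_transition_probabilities(adjacency, k=2)
```

Because P is row-stochastic, every row of P^k also sums to 1, so each row can be read as a probability distribution over the nodes reached after k steps.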
- Performance improvements and reduced bundle size in QC report.
- Improved console output in verbose mode.
- Improved logging from multiprocessing jobs.
- Improved runtime for graph creation.
- Added PMDS layout algorithm.
- Add `--sample_name` option to `single-cell amplicon` to overwrite the name derived from the input filename.
- Add `--skip-input-checks` option to `single-cell amplicon` to make input filename checks warnings instead of errors.
- `PixelDataset` instances are now written to disk without creating intermediate files on-disk.
- A nice string representation for the `Graph` class, to let you know how many nodes and edges there are in the current graph object instance.
- Metric to collect molecules (edges) in cells with outlier distributions of antibodies (aggregates).
- Provide typed interfaces for all per-stage report files using pydantic.
- Centralize pixelator intermediate file lookup and access.
- Add a `precomputed_layouts` property to `PixelDataset` to allow for loading precomputed layouts.
- Add `pixelator single-cell layout` stage to pixelator, which allows users to compute layouts for a PXL file that can then be used to visualize the graph in 2D or 3D downstream.
- Add minimum marker count `polarization_min_marker_count` parameter to calculate Polarity Score.
- Add "log1p" as an alternative for `PolarizationNormalizationTypes`.
- Add `convert_indices_to_integers` option when creating graphs.
- Add a feature flag module to aid in the development of new features.
- Change name and description of `Avg. Reads per Cell` and `Avg. Reads Usable per Cell` in QC report.
- The output name of the `.pxl` file from the `annotate` step is now `*.annotated.dataset.pxl`.
- The output name of the `.pxl` file from the `analysis` step is now `*.analysis.dataset.pxl`.
- The term `edges` in `metrics` and `adata` is now replaced with `molecules`.
- Renaming of variables in per-stage JSON reports.
- Changed name of TCRb to TCRVb5 antibody in human-immunology-panel file and bumped to version 0.5.0.
- Renaming of component metrics in adata.
- Use MPX graph compatible permutation strategy when calculating Moran's I related statistics.
- Marker filtering is now done after count transformation in polarization score calculation.
- Use the input read count at the annotate stage for the `fraction_antibody_reads_in_outliers` metric denominator instead of the total raw input reads.
- Use common analysis engine to orchestrate running different "per component" analyses, like polarization and colocalization analysis (yielding a roughly 3x speed-up over the previous approach).
- The default transformation for the calculation of the polarity score is now `log1p` instead of `clr`.
- Fix a bug in how discarded UMIs are calculated and reported.
- Fix deflated counts in the edgelist after collapse.
- Fix a bug where an `r1` or `r2` in the directory part of a read file would break file name sanity checks.
- Fix a bug where the wrong `r1` or `r2` in the filename would be removed when multiple matches are present.
- Logging would cause deadlocks in multiprocessing scenarios; this has been resolved by switching to a server/client-based logging system.
- Fix a bug in the amplicon stage where read suffixes were not correctly recognized.
- Ensure deterministic results from `pmds_layout` (given a set seed).
- Fix an issue with the `fraction_antibody_reads_usable_per_cell` metric where the denominator read count was not correctly averaged with the cell count.
- Remove multi-sample processing from all `single-cell` subcommands.
- Remove `--input1_pattern` and `--input2_pattern` from `single-cell amplicon` command.
- Self-correlations, e.g. CD8 vs CD8, are no longer part of the colocalization results, as these values will always be undefined.
- Remove `umi_unique_count` and `upi_unique_count` from `edgelist`.
- Remove `umi` and `median_umi_degree` from `component` metrics.
- Remove `normalized_rel` and `denoised` from `obsm` in `anndata`.
- Remove the `denoise` function.
- Remove cell type selector in QC report for UMAP colored by molecule count plots.
- Remove `clr` as a transformation option in `pixelator analysis`.
- Uninitialized value for `--polarization-n-permutations`
- Bug in README shield formatting
This release introduces two major changes in pixelator:
- the Graph backend has been switched from using igraph to using networkx
- the license has been changed from GPL2.0 to MIT
- Experimental 3D heatmap plotting feature.
- Optional caching of layouts to speed up computations in some scenarios.
- `experimental` mark that can be added to functions that are not yet production ready.
- The underlying graph instance, e.g. a networkx `Graph` instance, is exposed as a property called `raw` from the pixelator `Graph` class.
- Monte Carlo permutations supported when calculating Moran's I (`morans_z_sim`) in `polarization_scores`.
- The default (and only) graph backend in pixelator is now based on networkx.
- Rename `mean_reads` and `median_reads` in adata.obs to `mean_reads_per_molecule` and `median_reads_per_molecule` respectively.
- Drop support for python 3.8 and 3.9.
- Change output format of `collapse` from csv to parquet.
- Change input and output format of `graph` from csv to parquet.
- Change input format of `annotate` from csv to parquet.
- Rename the report to "qc report"
- Add a Reads per Molecule frequency figure to the sequencing section of the qc report.
- Remove placeholder warning of missing data for not yet implemented features.
- Change "Median antibody molecules per cell" to "Average antibody molecules per cell" in the qc report.
- Refactoring of the graph backend implementations module.
- Speeding up the `amplicon` step by roughly 3x.
- Nicer error messages when there are no components valid for computing colocalization.
- A bunch of warnings.
- `graph` no longer outputs the raw edge list.
- igraph has been dropped as a graph backend for pixelator.
- Fixed broken pixeldataset aggregation for more than two samples.
- Fixed a bug in graph generation caused by accidentally writing the index to the parquet file. For backwards compatibility, if there is a column named `index` in the edgelist, this will be removed and the user will get a warning indicating that this has happened.
- Fixed a bug in filtering pixeldataset causing it to return the wrong types.
- Fixed a bug in graph layout generation due to incorrect data frame concatenation.
- Add support for Python 3.11.
- Add early enablement work for a networkx backend for the graph stage.
- Fix report color axis in report figures not updating when selecting markers or cell types.
- Remove placeholder links in report tooltips.
- Fix a bug where aggregating data did not add the correct sample, and unique component columns.
- Lazy option for edge list loading (`pixeldataset.edgelist_lazy`), which returns a `polars` `LazyFrame` that can be used to operate on the edge list without reading all of it into memory.
- Option (`ignore_edgelists`) to skip the edge lists when aggregating files. This defaults to `False`.
- Types on the edge list in memory will utilize the `pandas` `category` type for string, and `uint16` for numeric values to lower the memory consumption when working with the edge list
- Remove `--pbs1` and `--pbs2` commandline arguments to `pixelator single-cell adapterqc`.
. - Restructure report figures.
- Improve metric names and tooltips in the report.
- Synchronize zoom level between the scatter plots in cell annotations section of the report.
- Add report placeholder for missing cell annotation data
- Add `Fraction of discarded UMIs` and `Avg. Reads per Molecule` metrics to the report.
- Fix an issue where pixelator --version would return 0.0.0 when installing in editable mode.
- Unpin igraph dependency to allow for newer versions of igraph to be used.
- Cleanup README and point to the external documentation site.
- Change PyPi package name to pixelgen-pixelator.
- Fix an issue where `--keep-workdirs` option for pytest was not available when running pytest without restricting the testdir to `tests/integration`.
- Fix an issue where pixelator --version would return 0.0.0.
- `clr` and `relative` transformation options for the colocalization computations in `analysis`
- First public release of pixelator.