- Hotfix greater than/less than operations in PDBManager oligomer selection to include equality. #408.
- Fixes progress bar for `download_pdb_multiprocessing`. #394
- Add support for DSSP >4. Backwards compatibility is still supported. #355. Fixes #353.
- Fixes bug where RSA features are missing from nodes with insertion codes. #355. Fixes #354.
- Fix bug where the `deprotonate` argument is not wired up to `graphein.protein.graphs.construct_graphs`. #375
- Add missing modified residue `AYA` to constants #390
- Fix bug where the `deprotonate` argument is not wired up to `graphein.protein.graphs.construct_graphs` #375
- Fix cluster file loading bug in `pdb_data.py` #396
- Uses `cpdb` as default PDB file parser for improved performance. #323.
- Improves storage of hetatm data in `graphein.protein.tensor.io.protein_to_pyg` #397.
- set logging to false by default and added mmcif support #402
- add metadata options for uniprot, ecnumber and CATH code to pdb manager #398
- bumped logging level down from `INFO` to `DEBUG` at several places to reduce output length #391
- exposed `fill_value` and `bfactor` option to `protein_to_pyg` function. #385 and #388
- Updated Foldcomp datasets with improved setup function and updated database choices such as ESMAtlas. #382
- Resolve issue with notebook version and `pluggy` in Dockerfile. #372
- Remove `typing_extension` as dependency since we now primarily support Python >=3.8 and `Literal` is included in `typing` there.
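
A minimal sketch of what this means in practice: on Python 3.8 and later, `Literal` (and `get_args`) can be imported from the standard-library `typing` module, so no extra package is needed. The `Granularity` alias and `is_valid_granularity` function below are hypothetical names used only for illustration; they are not part of Graphein.

```python
from typing import Literal, get_args

# Hypothetical alias for illustration only; not a Graphein type.
Granularity = Literal["atom", "centroids", "CA"]


def is_valid_granularity(value: str) -> bool:
    """Return True if `value` is one of the options listed in `Granularity`."""
    # get_args unpacks the allowed values from the Literal type.
    return value in get_args(Granularity)
```
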
- Fixes bug in pdb_manager for clustering sequences via mmseqs #377
- Remove hydrogen isotopes as well in `graphein.protein.graphs.deprotonate_structure`. #337
- Fixes bug in sidechain torsion angle computation for structures containing `PYL`/other non-standard amino acids (#357). Fixes #356.
- Replaces RCSB PDB FTP urls with new API. #364
- In Pandas `1.2.0` and later, the default value of `regex` for `Series.str.replace()` will change from `True` to `False`, so regular expressions must now be requested explicitly to suppress a `FutureWarning`. By @StevenAZy (#359)
- Improves the tensor->PDB writer (`graphein.protein.tensor.io.to_pdb`) by automatically unravelling residue-level b-factor predictions/annotations (#352).
- Adds support for PyG 2.4+ (#350)
- Fixes `add_sequence_neighbour_vector` to have a zero vector when no neighbor is feasible. Extend to handle insertion codes (#336).
- Fixes edge case in FoldComp database download if target directory has same name as database (#339)
- Pins BioPandas version to latest
- [Feature] - #305 Adds the `add_virtual_beta_carbon_vector` function inspired by RFdiffusion and ProteinMPNN.
- Chain selections are now specified with either `"all"` or a list of strings (e.g. `["A", "B"]`) rather than a single selection string (e.g. `"AB"`). This is a necessary change due to MMTF support which can have multicharacter chain identifiers. #307
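
For code that still passes the old single-string form, a small migration shim along these lines may help. This is only a sketch: `normalise_chain_selection` is a hypothetical helper, not part of Graphein, and it assumes every legacy chain identifier is a single character.

```python
from typing import List, Union


def normalise_chain_selection(selection: Union[str, List[str]]) -> Union[str, List[str]]:
    """Hypothetical helper: convert a legacy selection such as "AB" to ["A", "B"].

    The special value "all" and existing lists are passed through unchanged.
    """
    if selection == "all":
        return "all"
    if isinstance(selection, str):
        # Legacy format: one character per chain identifier.
        return list(selection)
    # Already a list; multi-character identifiers are preserved as given.
    return list(selection)
```

For example, `normalise_chain_selection("AB")` gives `["A", "B"]`, while a list containing a multi-character identifier is returned as it was.
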
- [Bugfix] - #305 Fixes `add_k_nn_edges` for the case when some residues were dropped before (e.g. when some alt_locs are removed).
- [Bugfix] - #305 Removes obsolete `remove_insertions` in `rgroup_df` construction.
- [Bugfix] - #305 Fixes the construction of geometric features when beta-carbons or side chains are missing in non-glycine residues (for example in `H:CYS:104` in 3SE8).
- [Bugfix] - #305 Fixes data types of geometric feature vectors: `object` -> `float`.
- [Bugfix] - #301 Fixes the conversion of undirected NetworkX graph to directed PyG data.
- [Bugfix] - #334 Fixes the corner case of the NetworkX -> PyG conversion when input graph has no edges.
- Adds missing `stage` parameter to `graphein.ml.datasets.foldcomp_data.FoldCompDataModule.setup()`. #310
- Ensures exporting groups of PDB chains with PDBManager selects the first model for multi-model structures. #311
- Fixes bug with exporting PDBs with only one splitting strategy in PDBManager #311
- Fixes incorrect jaxtyping syntax for variable size dimensions #312
- Fixes shape of angle embeddings for `graphein.protein.tensor.angles.alpha/kappa`. #315
- Fixes initialisation of `Protein` objects. #317 #318
- Fixes incorrect `rad` and `embed` argument logic in `graphein.protein.tensor.angles.dihedrals/sidechain_torsion` #321
- Fixes incorrect start padding in pNeRF output #321
- Fixes `pyyaml` breaking installation #328
- Fixes setting ID for PyG data objects when loading from a path to a `.pdb` file #332
- Adds transform composition to FoldComp Dataset #312
- Adds entry point for biopandas dataframes in `graphein.protein.tensor.io.protein_to_pyg`. #310
- Adds support for `.ent` files to `graphein.protein.graphs.read_pdb_to_dataframe`. #310
- Obsolete residues with no replacement are now returned by `graphein.protein.utils.get_obsolete_mapping`. #310
- Adds the ability to store a dictionary of HETATM positions in `Data`/`Protein` objects created in the `graphein.protein.tensor` module. #307
- Improved handling of non-standard residues in the `graphein.protein.tensor` module. #307
- Insertions retained by default in the `graphein.protein.tensor` module. I.e. `insertions=True` is now the default behaviour. #307
- `plot_pyg_data` now also plots some geometric features if present. #305
- Adds transform composition to FoldComp Dataset #312
- Improve FoldComp dataloading performance and include B factors (pLDDT) in output. #313 #315
- Add new helper functions to PDBManager #322 (@amorehead)
- Add non-standard 'CYX' to `RESI_THREE_TO_1`.
- [PDBManager] - #272 Adds a utility for creating custom dataset splits from the PDB.
- [FoldComp Dataset] - #284 - Create ML datasets from FoldComp databases.
- [ESM] - #284 - Wrapper for ESMFold batch folding & embedding.
- [Downloads] MMTF downloading now supported in download utilities. #272
- The `pdb_path` argument to many functions (e.g. `graphein.protein.graphs.construct_graph`) has been renamed to `path` as this can now accept MMTF files in addition to PDB files.
- `Protein` tensors have coordinates renamed from `Protein.x` to `Protein.coords`. #272
- Tensor types are now defined using `jaxtyping`, removing the `torchtyping` dependency #272
- Drops explicit Python 3.7 support. Colab now runs on 3.8+. #272
- Dockerfile now builds from `pytorch/pytorch:1.13.0-cuda11.6-cudnn8-runtime` (replaces `pytorch/pytorch:1.9.1-cuda11.1-cudnn8-runtime`) #272
- Missing `os` import fixed in #297. Fixes #296
- [Metrics] - #245 Adds a selection of structural metrics relevant to protein structures.
- [Tensor Operations] - #244 Adds suite of utilities for working directly with tensor-based representations of proteins (graphein.protein.tensor).
- [Tensor Operations] - #244 Adds suite of utilities for working with ESMfold (graphein.protein.folding_utils).
- [Feature] - #277 Adds support for pathlib paths for protein graph creation. #269
- [Logging] - #221 Adds global control of logging with `graphein.verbose(enabled=False)`.
- [Logging] - #242 Adds control of protein graph construction logging. Resolves #238
- [Bugfix] - #222 Fixes entrypoint for user-defined `df_processing_funcs` (#216)
- [Feature] - #263 Adds control of Alt Loc selection strategy. N.b. Default `ProteinGraphConfig` changed to include insertions by default (`insertions=True`) and `alt_locs="max_occupancy"`.
- [Feature] - #264 Adds entrypoint to `graphein.protein.graphs.construct_graph` for passing in a BioPandas dataframe directly.
- [Feature] - #229 Adds support for filtering KNN edges based on self-loops and chain membership. Contribution by @anton-bushuiev.
- [Feature] - #234 Adds support for aggregating node features over residues (`graphein.protein.features.sequence.utils.aggregate_feature_over_residues`).
- [Bugfix] - #234 fixes use of nullcontext in silent graph construction.
- [Bugfix] - #234 Fixes division by zero errors for edge colouring in visualisation.
- [Bugfix] - #254 Fix peptide bond addition for all atom graphs.
- [Bugfix] - #223 Fix handling of insertions in protein graphs. Insertions are now given IDs like: `A:SER:12:A`. Contribution by @manonreau.
- [Bugfix] - #229 Fixes bug in KNN edge computation. Contribution by @anton-bushuiev.
- [Bugfix] - #220 Fixes edge metadata conversion to PyG. Contribution by @manonreau.
- [Bugfix] - #220 Fixes centroid atom grouping & avoids unnecessary edge computation where none are found. Contribution by @manonreau.
- [Bugfix] - #268 Fixes 'sequence' metadata feature for atomistic graphs, removing duplicate residues. Contribution by @kamurani.
- [Bugfix] - #234 - Fixes bugs and improves `conversion.convert_nx_to_pyg` and `visualisation.plot_pyg_data`. Removes distance matrix (`dist_mat`) from default set of features converted to tensor.
- [Improvement] - #234 - Adds `parse_aggregation_type` to retrieve aggregation functions.
- [Bugfix] - #281 - Bugfix for nx->PyG conversion for graphs containing edges without "kind" attributes. Contribution by @rg314.
- [Improvement] - #234 - Adds 1 to 3 mappings to `graphein.protein.resi_atoms`.
- [Tensor Module] - #244 Documents new graphein.protein.tensor module.
- [CI] - #244 Updates to intersphinx maps
- [CI] - #244 CI now runs for python 3.8, 3.9 and torch 1.12.0 and 1.13.0
- [CI] - #244 Separate builds for core library and library with DL dependencies.
- [Licence] - #244 Bump to 2023
- [Bugfix] - #206 Fixes `KeyError` when using `graphein.protein.edges.distance.node_coords`
- [Bugfix] - Includes missing data files in `MANIFEST.in` #205
- [Bugfix] - #208 - Resolves SSL issues with RegNetwork.
- [Feature] - #208 Adds support for loading local PDB files with `ProteinGraphDataset` and `InMemoryProteinGraphDataset` via a new `pdb_paths` parameter. `self.raw_dir` is set to the root path (`self.pdb_path`) of the `pdb_paths` list; there should be only one root path, i.e. all PDB files should be under the same folder. PDB files are then loaded from `self.pdb_path` instead of `self.raw`. To download from AF2 or the PDB instead, set `pdb_paths` to `None` to restore the previous behaviour.
- [Bugfix] - #208 explicitly installs `jupyter_contrib_nbextensions` in Docker.
- [Feature] - #186 adds support for scaling node sizes in plots by a computed feature. Contribution by @cimranm
- [Feature] - #189 adds support for parallelised download from the PDB.
- [Feature] - #189 adds support for: van der waals interactions, vdw clashes, pi-stacking interactions, t_stacking interactions, backbone carbonyl-carbonyl interactions, salt bridges
- [Feature] - #189 adds a `residue_id` column to PDB dfs to enable easier accounting in atom graphs.
- [Feature] - #189 refactors torch geometric datasets to use parallelised download for faster dataset preparation.
- [Patch] - #187 updates sequence retrieval due to UniProt API changes.
- [Patch] - #189 fixes bug where chains and PDB identifiers were not properly aligned in `ml.ProteinGraphDataset`.
- [Patch] - #201 Adds missing `MSE` to `graphein.protein.resi_atoms.RESI_NAMES`, `graphein.protein.resi_atoms.RESI_THREE_TO_1`. #200
- [Patch] - #201 Fixes bug where check for same-chain always evaluates as False. #199
- [Patch] - #201 Fixes bug where deprotonation would only remove hydrogens based on `atom_name` rather than `element_symbol`. #198
- [Patch] - #201 Fixes bug in ProteinGraphDataset input validation.
- #189 refactors PDB download util. Now returns path to download file, does not accept a config object but instead receives the output directory path directly.
- [Feature] - #165 adds support for direct AF2 graph construction.
- [Feature] - #165 adds support for selecting model indices from PDB files.
- [Feature] - #165 adds support for extracting interface subgraphs from complexes.
- [Feature] - #165 adds support for computing the radius of gyration of a structure.
- [Feature] - #165 adds support for adding distances to protein edges.
- [Feature] - #165 adds support for fully connected edges in protein graphs.
- [Feature] - #165 adds support for distance window-based edges for protein graphs.
- [Feature] - #165 adds support for transformer-like positional encoding of protein sequences.
- [Feature] - #165 adds support for plddt-like colouring of AF2 graphs
- [Feature] - #165 adds support for plotting PyG Data object (e.g. for logging to WandB).
- [Feature] - #170 Adds support for viewing edges in `graphein.protein.visualisation.asteroid_plot`. Contribution by @avivko.
- [Patch] - #178 Fixes #171 and optimizes `graphein.protein.features.nodes.dssp`. Contribution by @avivko.
- [Patch] - #174 prevents insertions always being removed. Resolves #173. Contribution by @OliverT1.
- [Patch] - #165 Refactors HETATM selections.
- [Feature] - #165 adds additional graph-level molecule features.
- [Feature] - #165 adds support for generating conformers (and 3D graphs) from SMILES inputs
- [Feature] - #163 Adds support for molecule graph generation from an RDKit.Chem.Mol input.
- [Feature] - #163 Adds support for multiprocess molecule graph construction.
- [Feature] - #165 adds support for 3D RNA graph construction.
- [Feature] - #165 adds support for generating RNA SS from sequence using the Nussinov Algorithm.
- [Patch] - #163 uses `tqdm.contrib.process_map` instead of `multiprocessing.Pool.map` to provide progress bars in multiprocessing.
- [Fix] - #165 makes returned subgraphs editable objects rather than views
- [Fix] - #165 fixes global logging set to "debug".
- [Fix] - #165 uses rich progress for protein graph construction.
- [Fix] - #165 sets saner default for node size in 3d plotly plots
- [Dependency] - #165 Changes CLI to use rich-click instead of click for prettier formatting.
- [Package] - #165 Adds support for logging with loguru and rich
- [Package] - Pin BioPandas version to 0.4.1 to support additional parsing features.
- #165 adds RNA SS edges into graphein.protein.edges.base_pairing
- #163 changes separate filetype input paths to `graphein.molecule.graphs.construct_graph`. Interface is simplified to simply `path="some/path.extension"` instead of separate inputs like `mol2_path=...` and `sdf_path=...`.
- [Patch] - #158 changes the eigenvector computation method from `nx.eigenvector_centrality` to `nx.eigenvector_centrality_numpy`.
- [Feature] - #154 adds a way of checking that DSSP is executable before trying to use it. #154
- [Feature] - #157 adds support for small molecule graphs using RDKit. Resolves #155.
- [Feature] - #159 adds support for conversion to Jraph graphs for JAX users.
- #157 refactors config matching operators from `graphein.protein.config` to `graphein.utils.config`
- #157 refactors config parsing operators from `graphein.utils.config` to `graphein.utils.config_parser`
- [Feature] - #141 adds edge construction based on sequence distance.
- [Feature] - #143 adds equality and isomorphism testing functions between graphs, nodes and edges (#142)
- [Feature] - #144 adds support for chain-level and secondary structure-level graphs with associated visualisation tools and tutorial. Resolves #128
- [Feature] - #144 adds support for chord diagram visualisations.
- [Feature] - #144 adds support for automagically downloading new PDB files for obsolete structures.
- [Feature] - #150 adds support for hydrogen bond donor and acceptor counts node features. #145
- [Misc] - #144 makes visualisation functions accessible in the `graphein.protein` namespace. #138
- [Bugfix] - #147 fixes error in `add_distance_threshold` introduced in v1.2.1 that would prevent the edges being added to the graph. #146
- [Bugfix] - #149 fixes a bug in `add_beta_carbon_vector` that would cause coordinates to be extracted for multiple positions if the residue has an altloc. Resolves #148
- [Feature] - #124 adds support for vector features associated with protein geometry. #120 #122
- [Feature] - #124 adds visualisation of vector features in 3D graph plots.
- [Feature] - #121 adds functions for saving graph data to PDB files.
- [Bugfix] - #136 changes generator comprehension when updating coordinates in subgraphs to list comprehension to allow pickling
- [Bugfix] - #136 fixes bug in edge construction functions using chain selections where nodes from unselected chains would be added to the graph.
- #124 refactors `graphein.protein.graphs.compute_rgroup_dataframe` and moves it to `graphein.protein.utils`. All internal references have been moved accordingly.
- [Feature] - #104 adds support for asteroid plots and distance matrix visualisation.
- [Feature] - #104 adds support for protein graph analytics (`graphein.protein.analysis`)
- [Feature] - #110 adds support for secondary structure & surface-based subgraphs
- [Feature] - #113 adds CLI support(!)
- [Feature] - #116 adds support for onehot-encoded amino acid features as node attributes.
- [Feature] - #119 Adds plotly-based visualisation for PPI Graphs
- [Bugfix] - #110 fixes minor bug in `asa` where it would fail if added as a first/only dssp feature.
- [Bugfix] - #110 Adds install for DSSP in Dockerfile
- [Bugfix] - #110 Adds conda install & DSSP to tests
- [Bugfix] - #119 Delaunay Triangulation computed over all atoms by default. Adds an option to restrict it to certain atom types.
- [Bugfix] - #119 Minor fixes to stability of RNA Graph Plotting
- [Bugfix] - #119 add tolerance parameter to add_atomic_edges
- [Documentation] - #104 Adds notebooks for visualisation, RNA SS Graphs, protein graph analytics
- [Documentation] - #119 Overhaul of docs & tutorial notebooks. Adds interactive plots to docs, improves docstrings, doc formatting, doc requirements.
- #119 - Refactor RNA Graph constants from graphein.rna.graphs to graphein.rna.constants. Only problematic if constants were accessed directly. All internal references have been moved accordingly.
- [Bugfix] - #107 improves robustness of removing insertions and hetatms, resolves #98
- [Packaging] - #108 fixes version mismatches in pytorch_geometric in docker install
- [Packaging] - #100 adds docker support.
- [Feature] - #96 Adds support for extracting subgraphs
- [Packaging] - #101 adds support for devcontainers for remote development.
- [Bugfixes] - #95 adds improved robustness for edge construction functions in certain edge cases. Insertions in the PDB were occasionally not picked up due to a brittle implementation. Resolves #74 and #98
- [Improvement] - #79 Replaces `Literal` references with `typing_extensions.Literal` for Python 3.7 support.
- [Bug] Adds a fix for #74. Adding a disulfide bond to a protein with no disulphide bonds would fail. This was fixed by adding a check for the presence of a minimum of two CYS residues.
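
The guard described in the last entry can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the library's actual code: `has_enough_cys_for_disulfide` is a hypothetical name, and residues are assumed to be given as three-letter codes.

```python
from typing import Sequence


def has_enough_cys_for_disulfide(residue_names: Sequence[str], minimum: int = 2) -> bool:
    """Hypothetical check: a disulfide bond needs at least two CYS residues.

    `residue_names` is assumed to hold three-letter residue codes.
    """
    # Count cysteines; with fewer than `minimum`, no bond can be formed.
    n_cys = sum(1 for name in residue_names if name == "CYS")
    return n_cys >= minimum
```

A caller would skip disulfide edge construction entirely when this returns `False`, rather than attempting to pair residues that do not exist.
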