Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP, API: Add simple API for KDA #83

Merged
merged 12 commits into from
Aug 8, 2024
Merged

WIP, API: Add simple API for KDA #83

merged 12 commits into from
Aug 8, 2024

Conversation

nawtrey
Copy link
Collaborator

@nawtrey nawtrey commented Mar 26, 2024

Description

  • Add core.py with new class object KineticModel used to create and store
    model information for kinetic diagrams

  • Add KineticModel import to __init__.py so new API is accessible by importing kda

  • Fixes issue ENH: Improve KDA API #68

Todos

  • Add documentation for KineticModel
  • Add documentation for KineticModel methods
  • Update main README to reflect API changes
  • Add, modify, etc. tests such that core.py code is covered
  • Update docs (i.e. `` instead of `)
  • Add core.py to docs (i.e. api.rst)
  • Update usage.rst to match README
  • Test KDA paper code to ensure there are no backwards incompatible changes

Upstream Todos (handle separately)

Fix issue #56 -- remove the usage of "name"/"val" dict keys for all kda functions. I don't think this is a particularly useful feature and it will likely require a lot of code to keep it working. Instead we should simply store the edge value under the NetworkX default "weight" key. There is no reason to store "k12" as an edge weight under "name" for edge (0, 1) when the information is already available in the edge tuple. Also, if we do this we can likely remove both graph_utils.generate_edges(G) and graph_utils.retrieve_rate_matrix(K) and simply use the NetworkX built-in functions (e.g. nx.from_numpy_array(K, create_using=nx.MultiDiGraph)).

Comments

I'm still working out how to handle some of the cycle-based tasks. I need to figure out a good solution for cycles that allows the user to either input the cycle info (i.e. direction, substrate transport per cycle, etc.) or input general state info and automatically determine the cycle info. I haven't come up with anything novel so I'll likely leave that for later.

As it stands the API is already much easier to use than before, especially for transition fluxes (see script below).

Testing/Example Script

Testing Script
import numpy as np

import kda

if __name__ == "__main__":

	K = np.array(
	    [
	        [0, 1, 0, 1],
	        [1, 0, 1, 1],
	        [0, 1, 0, 1],
	        [1, 1, 1, 0],
	    ]
	)

	model = kda.KineticModel(K=K, G=None)
	model.build_cycles()
	model.build_partial_diagrams()
	model.build_directional_diagrams()
	model.build_state_probabilities(symbolic=True)

	print(model.K)
	print(model.G)
	print(model.cycles)
	print(len(model.partial_diagrams))
	print(model.get_partial_diagram_count())
	print(len(model.directional_diagrams))
	print(model.get_directional_diagram_count())

	for (i, j) in model.G.edges():
		flux = model.transition_flux(i=i, j=j, net=False, symbolic=True)
		print(f">>> j_{i}{j} = {flux}")

Status

  • Ready to go

@nawtrey nawtrey added the API API-related changes label Mar 26, 2024
@nawtrey nawtrey marked this pull request as draft March 26, 2024 03:14
@nawtrey
Copy link
Collaborator Author

nawtrey commented Mar 27, 2024

Cycle/Operational Flux Solution 1

Regarding the storage of cycle information -- I think an initial solution might be to simply create a method that lets the user "register" a cycle with all of its information. Something like the following might work:

model = kda.KineticModel(K, G=None)
model.register_cycle(identifier="1", cycle=[1, 2, 3, 4], order=[1, 2], substrate=["H"])

Currently the cycles are stored in a KineticModel as a list of lists of ints, but I could find a way to integrate build_cycles() with register_cycle() so they don't store redundant info. For example, I could update build_cycles() to create a dictionary to store all the cycles info and register_cycle() could simply operate on that dictionary as the user registers cycles. For now I think that's what I will do, but eventually it might be worth making a new object for the cycles if there are enough cycle-specific methods and data we want to store.

As an example of what we want to be able to store, we use the following dictionary to store the cycle info for the EmrE analysis:

# define the cycles manually (for repeatability)
EmrE_cycles = {
	1: {"cycle": [0, 1, 3, 2, 4, 5, 7, 6], "order": [6, 0]}, 
	2: {"cycle": [0, 6, 7, 5, 4, 2], "order": [6, 0]}, 
	3: {"cycle": [0, 6, 7, 5, 3, 2], "order": [6, 0]}, 
	4: {"cycle": [0, 6, 7, 5, 3, 1], "order": [6, 0]},
        5: {"cycle": [0, 2, 4, 5, 3, 1, 7, 6], "order": [6, 0]},
	6: {"cycle": [0, 6, 7, 1, 3, 2], "order": [6, 0]}, 
	7: {"cycle": [0, 6, 7, 1], "order": [6, 0]}, 
	8: {"cycle": [0, 2, 3, 1, 7, 5, 4, 6], "order": [6, 0]}, 
	9: {"cycle": [0, 6, 4, 5, 7, 1], "order": [6, 0]}, 
	10: {"cycle": [0, 6, 4, 5, 3, 2], "order": [6, 0]}, 
	11: {"cycle": [0, 6, 4, 5, 3, 1], "order": [6, 0]}, 
        12: {"cycle": [0, 1, 7, 5, 3, 2, 4, 6], "order": [6, 0]},
	13: {"cycle": [0, 6, 4, 2, 3, 1], "order": [6, 0]}, 
	14: {"cycle": [0, 6, 4, 2], "order": [6, 0]}, 
	15: {"cycle": [0, 1, 3, 5, 7, 6, 4, 2], "order": [0, 2]}, 
	16: {"cycle": [0, 2, 4, 6, 7, 1], "order": [0, 2]}, 
	17: {"cycle": [0, 2, 4, 5, 7, 1], "order": [0, 2]}, 
	18: {"cycle": [0, 2, 4, 5, 3, 1], "order": [0, 2]}, 
	19: {"cycle": [0, 2, 3, 5, 7, 1], "order": [0, 2]}, 
	20: {"cycle": [0, 1, 7, 6, 4, 5, 3, 2], "order": [0, 2]},
        21: {"cycle": [0, 1, 3, 2], "order": [0, 2]},
	22: {"cycle": [1, 7, 6, 4, 5, 3], "order": [1, 3]}, 
	23: {"cycle": [1, 7, 6, 4, 2, 3], "order": [2, 4]}, 
	24: {"cycle": [1, 7, 5, 4, 2, 3], "order": [2, 4]}, 
	25: {"cycle": [1, 7, 5, 3], "order": [1, 3]}, 
	26: {"cycle": [2, 3, 5, 7, 6, 4], "order": [2, 4]}, 
	27: {"cycle": [2, 3, 5, 4], "order": [2, 4]},
        28: {"cycle": [6, 7, 5, 4], "order": [7, 5]},
	}

This is very tedious to create and error-prone, but the benefit of having all the info up front is it makes analysis a lot easier. So I think the goal should be to find a way to get all this info into the KineticModel as simply as possible (from a user perspective) so it is a better experience down the road.

@nawtrey
Copy link
Collaborator Author

nawtrey commented May 27, 2024

Cycle/Operational Flux Solution 2

Perhaps we can require the user to submit the node positions for
their graph at creation time as well as a CW/CCW choice.

This provide us a few benefits:

  1. The node positions can be used to determine the correct
    cycle order for a given cycle since the cycles are basically
    just sets of nodes where their positions create polygons.
  2. Since we can use the node positions to determine if a cycle
    is CW/CCW, the user would never need to input cycle "order".

This would require a little more info up front, but if the user
ever wanted to graph their diagram (very likely) they would
need this information anyways. It would also be readily available
now that it is stored in the graph nodes
(e.g. pos = nx.get_node_attributes(G,'pos')). The main caveat
is that whatever algorithm we use for determining CW/CCW cycles
would have to be able to handle a range of polygons including
both concave and convex polygons. It looks like there are simple
ways to determine this (see here, and particularly this answer).

Now, this feature is only useful in the context of summing
net cycle fluxes to create operational flux expressions. To
fully automate this procedure, we would need to accompany
the CW/CCW cycle handler function with a function that can
assign meaning and labels to cycles. Perhaps register_cycles()
could simply allow a user to store 1. the cycle node list, 2.
the cycle label (e.g. "1", "2", etc.) and 3. the ligand turnover
information (e.g. "H", "D", "HD").

With both of these functions implemented (CW/CCW cycle function,
register cycles function) a user could do something like the
following:

# minimum required diagram info
K = np.array(
    [
        [0, 1, 0, 1],
        [1, 0, 1, 1],
        [0, 1, 0, 1],
        [1, 1, 1, 0],
    ]
)
pos = {0: [1, 1], 1: [-1, 1], 2: [-1, -1], 3: [1, -1]}

# create the model
model = kda.KineticModel(K, G=None, pos=pos, CCW=True)
model.build_cycles()

# register the cycles
cycle_labels = ["a", "b", "c"]
cycle_substrates = [["R"], ["R", "L"], ["L"]]
for cycle, label, substrate in zip(model.cycles, labels, cycle_substrates):
  model.register_cycle(cycle=cycle, label=label, substrate=substrate)

# calculate the operational fluxes
J_L = model.operational_flux(substrate="L", symbolic=True)
J_R = model.operational_flux(substrate="R", symbolic=True)

Behind the scenes, KineticModel.operational_flux() would take the sum
of the correct net cycle fluxes by

  1. collecting the cycles that contribute to the flux of L or R
    (using the registered cycles info),
  2. determining the node order for each contributing cycle according to
    the chosen CCW=True convention (using the node positions/polygon algo),
  3. finding the net cycle flux for each cycle using existing methods, and
  4. taking the sum of the net cycle fluxes.

I like this solution a fair amount more because it requires less information
up front than manually entering all cycles, the info it needs is already
required for plotting diagrams, and it is more inline with the theory as
desribed by Hill (where you choose a cycle convention up front). It also
avoids the risk of upstream cycle-related changes resulting in issues for
KDA users since the cycle directions will not rely on upstream code, making
it easier to reproduce identical results across users and time (this has been
an issue in the past, particularly for kinetic diagrams with multiple
Hamiltonian cycles).

@nawtrey
Copy link
Collaborator Author

nawtrey commented Jun 12, 2024

Tests are in now, only 2 lines are uncovered but they are low priority.

With the size of this PR I'm going to shelve the cycle flux considerations for later and focus on wrapping this up so it is available for use.

* Add `core.py` with new class object
`KineticModel` used to create and store
model information for kinetic diagrams

* Add `KineticModel` import to `__init__.py`
so new API is accessible by importing `kda`

* Addresses issue #68
* Update existing tests to leverage the new API

* Add `test_3_state_model_symbolic` to verify
`KineticModel.transition_flux` symbolic results

* Add `test_3_state_model_numeric` to verify
`KineticModel.transition_flux` numeric results

* Add `test_transition_flux_matching_indices` and
`test_transition_flux_mismatched_symbolic` to
verify appropriate errors are raised in
`KineticModel.transition_flux`
* Add docstring to `KineticModel`

* Add docstrings to all `KineticModel` methods
* Change `transition_flux` to `get_transition_flux`
to follow method naming convention

* Update call signature of `get_transition_flux`
so initial/final states are more clear (`i` -> `state_i`)

* Misc documentation changes
* Remove redundant code from `test_kda.py`.
Code was previously moved to `conftest.py`.

* Add hack to
`KineticModel.build_state_probabilities`
to handle `key` parameters. Can probably
find a more elegant solution but this works
for now.
* Update/fix documentation for `core.py`
classes and methods

* Update documentation in `__init__.py`

* Add `core.py` to `api.rst`
@nawtrey
Copy link
Collaborator Author

nawtrey commented Aug 8, 2024

Locally the KDA paper code runs without error:

KDA paper code completed in 179.15 seconds

I'll check off the associated task.

* Updates `README` to include details about
calculating transition fluxes with the new API.
Similar changes are made to `usage.rst`.

* Fixes a minor documentation error in `core.py`
* Adds `test_invalid_inputs` to verify a
`RuntimeError` is raised when no inputs
are given when building a `KineticModel`

* Remove unused code in `test_kda.py`
@nawtrey
Copy link
Collaborator Author

nawtrey commented Aug 8, 2024

Okay I think this is ready to go at this point. The API is working as intended and already makes things much easier for a user, especially if transition fluxes are needed. The kda manuscript code still runs without issue, all tests pass (with 100% patch coverage) and the doc pages look good. Any further changes should probably be made in targeted PR's for clarity.

I'll open a separate issue regarding the cycle fluxes API and maybe one for plotting as well. For most users, the diagram plotting is probably not crucial but it would be nice to print the kinetic diagram and cycles.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API API-related changes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ENH: Change graphs to store rates as edge weights instead of attributes
1 participant