
Version 3 with cached cross chunk edges #454

Open: wants to merge 105 commits into main from pcgv3

Conversation

@akhileshh (Contributor) commented Aug 6, 2023

  • Adds a new column family for cached cross chunk edges.
  • Adds a MaxAgeGCRule to the existing column family that stores supervoxel cross chunk edges; these edges are only needed during ingest and are eventually garbage-collected (see the sketch below this list).
  • Edits make use of cached cross chunk edges.
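
Below is a minimal sketch of setting up a column family with a max-age garbage collection rule using the standard `google-cloud-bigtable` client; the project, instance, table, and column family names here are placeholders (not the ones used by pychunkedgraph), and the 30-day age is an arbitrary illustration.

```python
import datetime

from google.cloud import bigtable
from google.cloud.bigtable import column_family

client = bigtable.Client(project="example-project", admin=True)
table = client.instance("example-instance").table("example-chunkedgraph")

# Cells older than max_age become *eligible* for deletion; BigTable's
# background garbage collection removes them eventually, not immediately.
gc_rule = column_family.MaxAgeGCRule(datetime.timedelta(days=30))
table.column_family("supervoxel_cross_edges", gc_rule=gc_rule).create()
```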

Summary of changes in pychunkedgraph.ingest:

  • Layer 2 creation is mostly unchanged; it stores cross chunk edges with supervoxels.
    • The column family used to store these edges now has a max-age garbage collection rule.
    • During ingest, these edges are used to cache higher-layer cross chunk edges; they are eventually deleted by BigTable's garbage collection routines.
  • When ingesting layer 3, cross chunk edges for children (layer 2) are updated and "lifted" using the supervoxel cross chunk edges mentioned above; the lifted edges live in a different column family and are therefore retained forever.
    • At the same time, cross chunk edges for parents at layer 3 are created by merging the cross edges of their children; these are intermediate and will be lifted when ingesting the next parent layer.
  • For each layer > 3 until root layer:
    • Update children cross chunk edges by "lifting" the edges created during the previous layer ingest.
    • Add parent cross chunk edges by merging children cross chunk edges; they will be updated when ingesting the next layer.

This assumes all chunks at the lower layer have been created before the current layer is created, so we can no longer queue parent chunk jobs automatically when their child chunks are complete.

We must now ingest/create one layer at a time.
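
A rough sketch of this per-layer lift-then-merge flow is below; the helper names on `cg` (`get_children`, `read_cached_cross_edges`, `get_parents_at_layer`, `write_cached_cross_edges`) are hypothetical stand-ins for the actual pychunkedgraph calls, not the real API.

```python
import numpy as np

def ingest_parent_layer(cg, layer: int, parent_ids) -> None:
    # Illustrative sketch only: assumes cached cross chunk edges are stored per
    # node as {edge_layer: (N, 2) array of (node, partner)}.
    for parent in parent_ids:
        merged = {}
        for child in cg.get_children(parent):
            cached = cg.read_cached_cross_edges(child)
            lifted = {}
            for edge_layer, edges in cached.items():
                # "Lift": replace partner nodes (column 1) with their parents at
                # the children's layer, so edges reference peers of `child`.
                partners = cg.get_parents_at_layer(edges[:, 1], layer - 1)
                lifted[edge_layer] = np.column_stack((edges[:, 0], partners))
                merged.setdefault(edge_layer, []).append(lifted[edge_layer])
            cg.write_cached_cross_edges(child, lifted)

        # The parent's intermediate cross edges are the union of its children's
        # lifted edges; they get lifted again when the next layer is ingested.
        parent_edges = {}
        for edge_layer, chunks in merged.items():
            edges = np.concatenate(chunks)
            edges[:, 0] = parent  # the new parent now owns these cross edges
            parent_edges[edge_layer] = np.unique(edges, axis=0)
        cg.write_cached_cross_edges(parent, parent_edges)
```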

Summary of changes in pychunkedgraph.graph.edits:

  • Edits are expected to be faster now; going to layer 2 to extract cross chunk edges is no longer necessary since they're cached at each layer.
  • During an edit, these cached cross chunk edges must be updated in both directions - to and from the newly created nodes and their existing neighbors (see the sketch after this list).
    • Most changes in this module are to handle this step.
    • Caching these edges has also made the edits logic simpler and cleaner.
    • When updating new cross edges, we need to ensure descendants get replaced by the highest parent.
    • For splits, we need to filter out inactive cross edges after the local graph is read from bucket storage.
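
A self-contained sketch of the two bookkeeping steps above: replacing outdated descendants with their highest parent, and keeping the cache symmetric between a new node and its neighbors. The plain-dict cache and helper names here are illustrative, not the module's real internals.

```python
import numpy as np

def lift_partners(edges: np.ndarray, latest_parent: dict) -> np.ndarray:
    # Replace partner IDs (column 1) that refer to outdated descendants with
    # their current highest parent; `latest_parent` maps old ID -> new parent ID.
    lifted = edges.copy()
    lifted[:, 1] = [latest_parent.get(int(p), int(p)) for p in edges[:, 1]]
    return np.unique(lifted, axis=0)

def symmetric_update(cache: dict, new_id: int, partners_by_layer: dict) -> None:
    # Update cached cross chunk edges in both directions: the new node records
    # its partners, and every partner records an edge back to the new node.
    cache[new_id] = partners_by_layer
    for layer, edges in partners_by_layer.items():
        for partner in np.unique(edges[:, 1]):
            partner_d = cache.setdefault(int(partner), {})
            back = np.array([[partner, new_id]], dtype=np.uint64)
            prev = partner_d.get(layer, np.empty((0, 2), dtype=np.uint64))
            partner_d[layer] = np.unique(np.concatenate([prev, back]), axis=0)
```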

@akhileshh requested a review from sdorkenw on August 6, 2023 20:04
@akhileshh changed the title from "WIP" to "WIP V3" on Aug 11, 2023
@akhileshh marked this pull request as ready for review on August 23, 2023 22:56
@akhileshh changed the title from "WIP V3" to "Version 3 with cached cross chunk edges" on Aug 23, 2023
@akhileshh requested a review from fcollman on August 24, 2023 00:43

def parents_multiple(self, node_ids: np.ndarray, *, time_stamp: datetime = None):
node_ids = np.array(node_ids, dtype=NODE_ID)

@nkemnitz (Contributor) commented Sep 6, 2023

Just saw this here (and in some other places), same as in #458: np.array will create a copy by default. np.asarray avoids the copy if the requirements are already met.
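
A small standalone illustration of this point (not code from the PR):

```python
import numpy as np

ids = np.arange(5, dtype=np.uint64)

a = np.array(ids, dtype=np.uint64)    # always allocates a new array
b = np.asarray(ids, dtype=np.uint64)  # no copy: dtype already matches

print(a is ids, np.shares_memory(a, ids))  # False False
print(b is ids, np.shares_memory(b, ids))  # True True
```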


@sdorkenw (Contributor) left a comment

Overall this looks good besides the one point - a tricky one though - that I marked

new_cx_edges_d[layer] = edges
assert np.all(edges[:, 0] == new_id)
cg.cache.cross_chunk_edges_cache[new_id] = new_cx_edges_d
entries = _update_neighbor_cross_edges(

@sdorkenw (Contributor) commented Sep 8, 2023

I think this here can introduce problems if a neighboring node is a neighbor to multiple new_l2_ids.

_update_neighbor_cross_edges looks right to me. It writes a complete new set of L2 edges for a node. But if the same node is updated multiple times, then only the last update is reflected. Maybe the logic here takes care of this somehow, but then it still introduces multiple unnecessary writes.

So, if I am correct about this, the solution would be to consolidate this call across all new_l2_ids to only make one call per neighboring node id.
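
A minimal sketch of the suggested consolidation, assuming neighbor updates can be accumulated first and then written once per neighboring node; `get_new_cx_edges` and `write_neighbor_cross_edges` are hypothetical stand-ins (the PR's real helper is `_update_neighbor_cross_edges`, whose exact signature isn't shown in this excerpt), and the returned mutation entries are likewise an assumption.

```python
from collections import defaultdict
import numpy as np

def consolidated_neighbor_updates(new_l2_ids, get_new_cx_edges, write_neighbor_cross_edges):
    # Collect back-edges per neighboring node across *all* new L2 IDs first ...
    pending = defaultdict(lambda: defaultdict(list))  # neighbor -> layer -> [edge arrays]
    for new_id in new_l2_ids:
        for layer, edges in get_new_cx_edges(new_id).items():
            for neighbor in np.unique(edges[:, 1]):
                pending[int(neighbor)][layer].append(
                    np.array([[neighbor, new_id]], dtype=np.uint64)
                )

    # ... then write each neighbor exactly once, instead of once per
    # (neighbor, new_id) pair where a later write clobbers an earlier one.
    entries = []
    for neighbor, by_layer in pending.items():
        merged = {
            layer: np.unique(np.concatenate(chunks), axis=0)
            for layer, chunks in by_layer.items()
        }
        entries.extend(write_neighbor_cross_edges(neighbor, merged))
    return entries
```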

new_cx_edges_d[layer] = edges
assert np.all(edges[:, 0] == new_id)
cg.cache.cross_chunk_edges_cache[new_id] = new_cx_edges_d
entries = _update_neighbor_cross_edges(

same issue as above

akhileshh and others added 24 commits December 5, 2024 20:32
* feat: convert edges to ocdbt

* feat: worker function to convert edges to ocdbt

* feat: ocdbt option, consolidate ingest cli

* fix(ingest): move fn to utils

* fix(ingest): move ocdbt setup to a fn

* add tensorstore req, fix build kaniko cache

* feat: copy fake_edges to column family 4

* feat: upgrade atomic chunks

* fix: rename abstract module to parent

* feat: upgrade higher layers, docs

* feat: upgrade cli, move common fns to utils

* add copy_fake_edges in upgrade fn

* handle earliest_timestamp, add test flag to upgrade

* fix: fake_edges serialize np.uint64

* add get_operation method, fix timestamp in repair, check for parent

* check for l2 ids invalidated by edit retries

* remove unnecessary parent assert

* remove unused vars

* ignore invalid ids, assert parent after earliest_ts

* check for ids invalidated by retries in higher layers

* parallelize update_cross_edges

* overwrite graph version, create col family 4

* improve status print formatting

* remove unused code, consolidate small common module

* efficient check for chunks not done

* check for empty chunks, use get_parents

* efficient get_edit_ts call by batching all children

* reduce earliest_ts calls

* combine bigtable calls, use numpy unique

* add completion rate command

* fix: ignore children without cross edges

* add span option to rate calculation

* reduce mem usage with global vars

* optimize cross edge reading

* use existing layer var

* limit cx edge reading above given layer

* fix: read for earliest_ts check only if true

* filter cross edges fn with timestamps

* remove git from dockerignore, print stats

* shuffle for better distribution of ids

* fix: use different var name for layer

* increase bigtable read timeout

* add message with assert

* fix: make span option int

* handle skipped connections

* fix: read cross edges at layer >= node_layer

* handle another case of skipped nodes

* check for unique parent count

* update graph_id in meta

* uncomment line

* make repair easier to use

* add sanity check for edits

* add sanity check for each layer

* add layers flag for cx edges

* use better names for functions and vars, update types, fix docs
* feat(ingest): use temporarily cached cross chunk edges

* fix: switch to using partners vector instead of 2d edges array

* fix(edits): l2 - use and store cx edges that become relevant only at l2

* chore: rename counterpart to partner

* fix: update partner cx edges

* feat(edits): use layer relevant partners

* fix tests

* persist cross chunk layers with each node

* fix: update cross chunk layers in edits

* fix: update cross layer from old ids in l2

* update deprecated utcnow

* fix split tests

* Bump version: 3.0.0 → 3.0.1

* fix: missed timestamp arg

* update docs, remove unnecessary methods

* revert structural changes

* fix new tests; revert bumpversion.cfg
@akhileshh force-pushed the pcgv3 branch 2 times, most recently from 033ba89 to 47f2d2f on December 16, 2024 20:33