Version 3 with cached cross chunk edges #454
base: main
Conversation
pychunkedgraph/graph/cache.py
Outdated
```python
def parents_multiple(self, node_ids: np.ndarray, *, time_stamp: datetime = None):
    node_ids = np.array(node_ids, dtype=NODE_ID)
```
Just saw this here (and in some other places), same as in #458: `np.array` will by default create a copy. `np.asarray` will avoid the copy if the requirements are already met.
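To illustrate the difference, a minimal sketch (`NODE_ID` is assumed here to be `np.uint64`, as the dtype alias is used in the snippet above):

```python
import numpy as np

NODE_ID = np.uint64  # assumed dtype alias for illustration

ids = np.arange(5, dtype=NODE_ID)

copied = np.array(ids, dtype=NODE_ID)     # always allocates a new array
aliased = np.asarray(ids, dtype=NODE_ID)  # dtype already matches: no copy

assert copied is not ids  # np.array made a copy
assert aliased is ids     # np.asarray returned the input unchanged
```

When callers already pass `np.uint64` arrays, `np.asarray` makes the conversion a no-op instead of a per-call allocation.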
Overall this looks good, besides the one point (a tricky one, though) that I marked.
pychunkedgraph/graph/edits.py
Outdated
```python
new_cx_edges_d[layer] = edges
assert np.all(edges[:, 0] == new_id)
cg.cache.cross_chunk_edges_cache[new_id] = new_cx_edges_d
entries = _update_neighbor_cross_edges(
```
I think this can introduce problems if a neighboring node is a neighbor to multiple `new_l2_ids`. `_update_neighbor_cross_edges` looks right to me: it writes a complete new set of L2 edges for a node. But if the same node is updated multiple times, only the last update is reflected. Maybe the logic here takes care of this somehow, but even then it introduces multiple unnecessary writes.
So, if I am correct about this, the solution would be to consolidate this call across all `new_l2_ids`, so that only one call is made per neighboring node id.
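The consolidation idea can be sketched as follows. The data shape (`new_l2_id -> {neighbor_id: edges}`) and the function name are assumptions for illustration, not the actual pychunkedgraph API: group pending updates by neighbor id first, then issue one write per neighbor with the complete merged set.

```python
from collections import defaultdict

def consolidate_neighbor_updates(updates_per_new_id):
    """Merge cross-edge updates so each neighboring node is written once.

    `updates_per_new_id` maps new_l2_id -> {neighbor_id: edges}; this shape
    is hypothetical, chosen only to show the regrouping step.
    """
    per_neighbor = defaultdict(dict)
    for new_id, neighbor_edges in updates_per_new_id.items():
        for neighbor_id, edges in neighbor_edges.items():
            per_neighbor[neighbor_id][new_id] = edges
    # One entry per neighbor, so the caller performs exactly one write each.
    return dict(per_neighbor)

# neighbor 7 is adjacent to both new ids 101 and 102
updates = {
    101: {7: [(7, 101)]},
    102: {7: [(7, 102)], 8: [(8, 102)]},
}
merged = consolidate_neighbor_updates(updates)
assert merged[7] == {101: [(7, 101)], 102: [(7, 102)]}
assert merged[8] == {102: [(8, 102)]}
```

With this grouping, neighbor 7 receives a single write containing edges from both new ids, instead of two writes where the second silently replaces the first.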
pychunkedgraph/graph/edits.py
Outdated
```python
new_cx_edges_d[layer] = edges
assert np.all(edges[:, 0] == new_id)
cg.cache.cross_chunk_edges_cache[new_id] = new_cx_edges_d
entries = _update_neighbor_cross_edges(
```
Same issue as above.
* feat: convert edges to ocdbt
* feat: worker function to convert edges to ocdbt
* feat: ocdbt option, consolidate ingest cli
* fix(ingest): move fn to utils
* fix(ingest): move ocdbt setup to a fn
* add tensorstore req, fix build kaniko cache
* feat: copy fake_edges to column family 4
* feat: upgrade atomic chunks
* fix: rename abstract module to parent
* feat: upgrade higher layers, docs
* feat: upgrade cli, move common fns to utils
* add copy_fake_edges in upgrade fn
* handle earliest_timestamp, add test flag to upgrade
* fix: fake_edges serialize np.uint64
* add get_operation method, fix timestamp in repair, check for parent
* check for l2 ids invalidated by edit retries
* remove unnecessary parent assert
* remove unused vars
* ignore invalid ids, assert parent after earliest_ts
* check for ids invalidated by retries in higher layers
* parallelize update_cross_edges
* overwrite graph version, create col family 4
* improve status print formatting
* remove unused code, consolidate small common module
* efficient check for chunks not done
* check for empty chunks, use get_parents
* efficient get_edit_ts call by batching all children
* reduce earliest_ts calls
* combine bigtable calls, use numpy unique
* add completion rate command
* fix: ignore children without cross edges
* add span option to rate calculation
* reduce mem usage with global vars
* optimize cross edge reading
* use existing layer var
* limit cx edge reading above given layer
* fix: read for earliest_ts check only if true
* filter cross edges fn with timestamps
* remove git from dockerignore, print stats
* shuffle for better distribution of ids
* fix: use different var name for layer
* increase bigtable read timeout
* add message with assert
* fix: make span option int
* handle skipped connections
* fix: read cross edges at layer >= node_layer
* handle another case of skipped nodes
* check for unique parent count
* update graph_id in meta
* uncomment line
* make repair easier to use
* add sanity check for edits
* add sanity check for each layer
* add layers flag for cx edges
* use better names for functions and vars, update types, fix docs

* feat(ingest): use temporarily cached cross chunk edges
* fix: switch to using partners vector instead of 2d edges array
* fix(edits): l2 - use and store cx edges that become relevant only at l2
* chore: rename counterpart to partner
* fix: update partner cx edges
* feat(edits): use layer relevant partners
* fix tests
* persist cross chunk layers with each node
* fix: update cross chunk layers in edits
* fix: update cross layer from old ids in l2
* update deprecated utcnow
* fix split tests
* Bump version: 3.0.0 → 3.0.1
* fix: missed timestamp arg
* update docs, remove unnecessary methods
* revert structural changes
* fix new tests; revert bumpversion.cfg
`MaxAgeGCRule` for the previous column family with supervoxel cross chunk edges; these are only needed during ingest and they get deleted eventually.

Summary of changes in `pychunkedgraph.ingest`:

Layers > 3 up to the root layer: this assumes all chunks at the lower layer have been created before creating the current layer, so we can no longer queue parent chunk jobs automatically when their children chunks are complete.
We must now ingest/create one layer at a time.
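Since parent chunk jobs are no longer queued automatically, a driver has to advance one layer at a time. A minimal sketch of that control flow, where `queue_layer_jobs` and `layer_complete` are hypothetical callables standing in for the actual ingest entry points:

```python
import time

def ingest_layers(queue_layer_jobs, layer_complete, n_layers, poll_seconds=1.0):
    """Queue each layer's chunk jobs only after the layer below is fully done.

    `queue_layer_jobs` and `layer_complete` are hypothetical; the real ingest
    CLI in pychunkedgraph exposes its own commands for this.
    """
    for layer in range(3, n_layers + 1):  # atomic chunks (layer 2) ingested first
        queue_layer_jobs(layer)
        while not layer_complete(layer):
            time.sleep(poll_seconds)  # wait before starting the next layer

# usage sketch with in-memory stand-ins
done = set()
order = []
ingest_layers(
    queue_layer_jobs=lambda layer: (order.append(layer), done.add(layer)),
    layer_complete=lambda layer: layer in done,
    n_layers=6,
    poll_seconds=0,
)
assert order == [3, 4, 5, 6]  # strictly one layer at a time, bottom up
```

The sequential loop is the key point: layer N+1 cannot start until every chunk in layer N reports complete.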
Summary of changes in `pychunkedgraph.graph.edits`: