Incremental "mark alive" pass for cyclic GC #126511

Draft: wants to merge 7 commits into 3.13
Conversation

@nascheme (Member) commented on Nov 6, 2024

This adds a "mark alive" pass to the cyclic GC, run incrementally in order to reduce pause times for full GC collections. The pass starts from known GC roots and uses tp_traverse to mark everything reachable from them as alive. Objects marked this way are skipped when the next full (generation 2) collection happens.
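
For illustration, here is a minimal sketch of how such a tp_traverse-based marking pass can be structured; it is not the code in this PR. The `gc_is_alive()`/`gc_set_alive()` helpers and the `worklist` type are assumptions standing in for whatever flag bit and work queue the real implementation uses:

```c
#include <Python.h>

/* Assumed helpers, not real CPython API. */
typedef struct worklist worklist;            /* assumed: stack of PyObject*  */
int  gc_is_alive(PyObject *op);              /* assumed: test "alive" bit    */
void gc_set_alive(PyObject *op);             /* assumed: set "alive" bit     */
void worklist_push(worklist *wl, PyObject *op);
PyObject *worklist_pop(worklist *wl);        /* NULL when the stack is empty */

/* visitproc callback: mark a newly reached object and queue it so that its
 * own referents get traversed later. */
static int
mark_alive_visit(PyObject *op, void *arg)
{
    if (op == NULL || !PyObject_IS_GC(op)) {
        return 0;               /* not tracked by the cyclic GC */
    }
    if (gc_is_alive(op)) {
        return 0;               /* already marked: stop the walk here */
    }
    gc_set_alive(op);           /* reachable from a root, so keep it */
    worklist_push((worklist *)arg, op);
    return 0;
}

/* Mark everything transitively reachable from 'root'. */
static void
mark_alive_from_root(PyObject *root, worklist *stack)
{
    mark_alive_visit(root, stack);
    PyObject *op;
    while ((op = worklist_pop(stack)) != NULL) {
        traverseproc traverse = Py_TYPE(op)->tp_traverse;
        if (traverse != NULL) {
            traverse(op, mark_alive_visit, stack);
        }
    }
}
```

Using an explicit work stack instead of recursing inside the visit callback is what allows the pass to stop after a fixed budget of objects and resume on a later increment.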

Based on my benchmarking, it is quite effective at reducing GC pause times (latency). Here are some timing stats from a benchmark I ran, first with the "mark alive" feature turned off:

gc times: total 3.846s mark 0.001s max 77711us avg 371us 
gc timing full Q50: 14438.00
gc timing full Q75: 16572.00
gc timing full Q90: 23492.00
gc timing full Q95: 31689.00
gc timing full Q99: 41860.00 

Meaning of terms:

  • total - total time spent inside the cyclic GC
  • mark - time spent inside the "mark alive" process
  • max - maximum GC pause
  • avg - average GC pause

The "gc timing full" are the times taken for full (generation 2) GC collections. Qxx is the quantile of the time, units of microseconds.

With the "mark alive" feature turned on:

gc times: total 5.664s mark 3.938s max 16287us avg 616us
gc timing full Q50: 1112.02
gc timing full Q75: 1113.28
gc timing full Q90: 1232.18
gc timing full Q95: 1286.10
gc timing full Q99: 2176.05 

This benchmarking shows that the overall time spent in the GC has increased (from 3.8 s to 5.7 s in this run) but the pause times have decreased dramatically: the 99th-percentile pause time is about 19x shorter. It's possible that with additional optimization the overall time can be further reduced. If it can't be made comparable in overall cost, I think this could be turned on via a setting like PYTHON_GC_PRESET=min-latency, as proposed in gh-124772.

This is still a WIP. I would like to compare the pause times and overall performance with the incremental GC that is in the 3.14 and main branches.

Labels: DO-NOT-MERGE, interpreter-core (Objects, Python, Grammar, and Parser dirs), type-feature (A feature request or enhancement)