bm-20241025-azure-x86_64-brandtbucher-justin_no_externs-3.14.0a1+-5791853-pystats-gc_traversal-vs-base.md

Execution counts

Execution counts for Tier 1 instructions.

The "miss ratio" column shows the percentage of an instruction's executions that ended in deoptimization. When an instruction deoptimizes, the base unspecialized instruction is not counted.

Name Base Count Head Count Change
LOAD_FAST 1,740 1,740 0.0%
STORE_FAST 1,140 1,140 0.0%
LOAD_CONST 780 780 0.0%
PUSH_NULL 540 540 0.0%
POP_TOP 480 480 0.0%
LOAD_GLOBAL_MODULE 480 480 0.0%
RESUME_CHECK 480 480 0.0%
LOAD_ATTR_MODULE 360 360 0.0%
RETURN_VALUE 300 300 0.0%
LOAD_FAST_LOAD_FAST 300 300 0.0%
CALL_NON_PY_GENERAL 300 300 0.0%
CALL_PY_EXACT_ARGS 300 300 0.0%
CALL 260 260 0.0%
LOAD_ATTR 260 260 0.0%
LOAD_ATTR_INSTANCE_VALUE 240 240 0.0%
GET_ITER 180 180 0.0%
BUILD_LIST 180 180 0.0%
RETURN_CONST 180 180 0.0%
CALL_BUILTIN_CLASS 180 180 0.0%
FOR_ITER_RANGE 180 180 0.0%
LOAD_GLOBAL_BUILTIN 180 180 0.0%
BUILD_TUPLE 120 120 0.0%
CALL_FUNCTION_EX 120 120 0.0%
JUMP_BACKWARD 120 120 0.0%
LOAD_DEREF 120 120 0.0%
LOAD_GLOBAL 120 120 0.0%
POP_JUMP_IF_FALSE 120 120 0.0%
POP_JUMP_IF_NOT_NONE 120 120 0.0%
CALL_BUILTIN_FAST_WITH_KEYWORDS 120 120 0.0%
LOAD_ATTR_METHOD_NO_DICT 120 120 0.0%
LOAD_ATTR_METHOD_WITH_VALUES 120 120 0.0%
TO_BOOL 100 100 0.0%
BINARY_OP_ADD_FLOAT 60 60 0.0%
MAKE_FUNCTION 60 60 0.0%
NOP 60 60 0.0%
BINARY_OP 60 60 0.0%
CALL_INTRINSIC_1 60 60 0.0%
COPY_FREE_VARS 60 60 0.0%
FOR_ITER 60 60 0.0%
IS_OP 60 60 0.0%
JUMP_FORWARD 60 60 0.0%
LIST_EXTEND 60 60 0.0%
MAKE_CELL 60 60 0.0%
POP_JUMP_IF_TRUE 60 60 0.0%
SET_FUNCTION_ATTRIBUTE 60 60 0.0%
STORE_DEREF 60 60 0.0%
STORE_FAST_STORE_FAST 60 60 0.0%
BINARY_OP_SUBTRACT_FLOAT 60 60 0.0%
BINARY_SUBSCR_TUPLE_INT 60 60 0.0%
CALL_METHOD_DESCRIPTOR_NOARGS 60 60 0.0%
CALL_METHOD_DESCRIPTOR_O 60 60 0.0%
CALL_PY_GENERAL 60 60 0.0%
COMPARE_OP_INT 60 60 0.0%
TO_BOOL_BOOL 60 60 0.0%
UNPACK_SEQUENCE_TWO_TUPLE 60 60 0.0%
ENTER_EXECUTOR 60 60 0.0%
BINARY_SUBSCR 20 20 0.0%
UNPACK_SEQUENCE 20 20 0.0%

Pair counts

Pair counts for top 100 opcode pairs

Pairs of specialized operations that deoptimize and are then followed by the corresponding unspecialized instruction are not counted as pairs.

Not included in comparative output.

Predecessor/Successor Pairs

Top 5 predecessors and successors of each Tier 1 opcode.

This does not include the unspecialized instructions that occur after a specialized instruction deoptimizes.

Not included in comparative output.

Specialization stats

Specialization stats by family

BINARY_OP

specialization stats for BINARY_OP family
deferred: Lists the number of "deferred" (i.e. not specialized) instructions executed.
hit: Specialized instructions that complete.
miss: Specialized instructions that deopt.

Kind Base Count Base Ratio Head Count Head Ratio Change
deferred 60 1.5% 60 1.5% 0.0%
hit 3,780 96.9% 3,780 96.9% 0.0%
miss 60 1.5% 60 1.5% 0.0%
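
As a rough illustration of how the ratio columns relate to the counts, the sketch below recomputes the BINARY_OP ratios from the three counts listed. It assumes the denominator is the sum of the listed kinds, which matches this family; other families in this report appear to have rows omitted from the comparison, so the same check would not necessarily hold for them. The variable names are illustrative only.

```python
# Counts for the BINARY_OP family, copied from the table above.
binary_op = {"deferred": 60, "hit": 3_780, "miss": 60}

# Assumption: each ratio is the kind's count divided by the family total.
total = sum(binary_op.values())
ratios = {kind: f"{count / total:.1%}" for kind, count in binary_op.items()}

print(total)   # 3900
print(ratios)  # {'deferred': '1.5%', 'hit': '96.9%', 'miss': '1.5%'}
```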

BINARY_SUBSCR

specialization stats for BINARY_SUBSCR family
hit: Specialized instructions that complete.

Kind Base Count Base Ratio Head Count Head Ratio Change
hit 60 75.0% 60 75.0% 0.0%
Success Base Count Base Ratio Head Count Head Ratio Change
Success 20 100.0% 20 100.0% 0.0%
Failure 0 0.0% 0 0.0%

CALL

specialization stats for CALL family
hit: Specialized instructions that complete.

Kind Base Count Base Ratio Head Count Head Ratio Change
hit 64,380 99.6% 64,380 99.6% 0.0%
Success Base Count Base Ratio Head Count Head Ratio Change
Success 260 100.0% 260 100.0% 0.0%
Failure 0 0.0% 0 0.0%

COMPARE_OP

specialization stats for COMPARE_OP family
hit: Specialized instructions that complete.

Kind Base Count Base Ratio Head Count Head Ratio Change
hit 1,920 100.0% 1,920 100.0% 0.0%

FOR_ITER

specialization stats for FOR_ITER family
deferred: Lists the number of "deferred" (i.e. not specialized) instructions executed.
hit: Specialized instructions that complete.

Kind Base Count Base Ratio Head Count Head Ratio Change
deferred 60 25.0% 60 25.0% 0.0%
hit 180 75.0% 180 75.0% 0.0%

LOAD_ATTR

specialization stats for LOAD_ATTR family
deferred: Lists the number of "deferred" (i.e. not specialized) instructions executed.
hit: Specialized instructions that complete.

Kind Base Count Base Ratio Head Count Head Ratio Change
deferred 60 5.5% 60 5.5% 0.0%
hit 840 76.4% 840 76.4% 0.0%
Success Base Count Base Ratio Head Count Head Ratio Change
Success 180 90.0% 180 90.0% 0.0%
Failure 20 10.0% 20 10.0% 0.0%

LOAD_GLOBAL

specialization stats for LOAD_GLOBAL family
hit: Specialized instructions that complete.

Kind Base Count Base Ratio Head Count Head Ratio Change
hit 660 84.6% 660 84.6% 0.0%
Success Base Count Base Ratio Head Count Head Ratio Change
Success 120 100.0% 120 100.0% 0.0%
Failure 0 0.0% 0 0.0%

STORE_SUBSCR

specialization stats for STORE_SUBSCR family
hit: Specialized instructions that complete.

Kind Base Count Base Ratio Head Count Head Ratio Change
hit 29,970,000 100.0% 29,970,000 100.0% 0.0%

TO_BOOL

specialization stats for TO_BOOL family
deferred: Lists the number of "deferred" (i.e. not specialized) instructions executed.
hit: Specialized instructions that complete.

Kind Base Count Base Ratio Head Count Head Ratio Change
deferred 60 37.5% 60 37.5% 0.0%
hit 60 37.5% 60 37.5% 0.0%
Success Base Count Base Ratio Head Count Head Ratio Change
Success 20 50.0% 20 50.0% 0.0%
Failure 20 50.0% 20 50.0% 0.0%
Failure kind Base Count Base Ratio Head Count Head Ratio Change
sequence 20 100.0% 20 100.0% 0.0%

UNPACK_SEQUENCE

specialization stats for UNPACK_SEQUENCE family
hit: Specialized instructions that complete.

Kind Base Count Base Ratio Head Count Head Ratio Change
hit 60 75.0% 60 75.0% 0.0%
Success Base Count Base Ratio Head Count Head Ratio Change
Success 20 100.0% 20 100.0% 0.0%
Failure 0 0.0% 0 0.0%

Specialization effectiveness

specialization effectiveness

All entries are execution counts. They should add up to the total number of Tier 1 instructions executed.

Basic: Instructions that are not and cannot be specialized, e.g. LOAD_FAST.
Not specialized: Instructions that could be specialized but aren't, e.g. LOAD_ATTR, BINARY_SLICE.
Specialized hits: Specialized instructions, e.g. LOAD_ATTR_MODULE, that complete.
Specialized misses: Specialized instructions, e.g. LOAD_ATTR_MODULE, that deopt.

Instructions Base Count Base Ratio Head Count Head Ratio Change
Basic 7,320 61.9% 7,320 61.9% 0.0%
Not specialized 900 7.6% 900 7.6% 0.0%
Specialized hits 3,540 29.9% 3,540 29.9% 0.0%
Specialized misses 60 0.5% 60 0.5% 0.0%
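
The statement that the categories add up to the total number of Tier 1 instructions executed can be checked by hand. The sketch below sums the four counts and recomputes each ratio against that sum. The total of 11,820 also equals the sum of the Head Count column in the execution counts table at the top of this report. The dictionary name is illustrative.

```python
# Execution counts per category, copied from the table above.
effectiveness = {
    "Basic": 7_320,
    "Not specialized": 900,
    "Specialized hits": 3_540,
    "Specialized misses": 60,
}

# The categories are expected to partition all Tier 1 instructions executed.
tier1_total = sum(effectiveness.values())
shares = {name: f"{count / tier1_total:.1%}" for name, count in effectiveness.items()}

print(tier1_total)  # 11820
print(shares)
```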

Deferred by instruction

Breakdown of deferred (not specialized) instruction counts by family
Name Base Count Base Ratio Head Count Head Ratio Change
TO_BOOL 60 25.0% 60 25.0% 0.0%
BINARY_OP 60 25.0% 60 25.0% 0.0%
FOR_ITER 60 25.0% 60 25.0% 0.0%
LOAD_ATTR 60 25.0% 60 25.0% 0.0%
BINARY_SLICE 0 0.0% 0 0.0%
STORE_SLICE 0 0.0% 0 0.0%
CACHE 0 0.0% 0 0.0%
BINARY_SUBSCR 0 0.0% 0 0.0%
GET_ITER 0 0.0% 0 0.0%
MAKE_FUNCTION 0 0.0% 0 0.0%

Misses by instruction

Breakdown of misses (specialized deopts) instruction counts by family
Name Base Count Base Ratio Head Count Head Ratio Change
BINARY_OP_ADD_FLOAT 60 100.0% 60 100.0% 0.0%
CACHE 0 0.0% 0 0.0%
GET_ITER 0 0.0% 0 0.0%
MAKE_FUNCTION 0 0.0% 0 0.0%
NOP 0 0.0% 0 0.0%
POP_TOP 0 0.0% 0 0.0%
PUSH_NULL 0 0.0% 0 0.0%
RETURN_VALUE 0 0.0% 0 0.0%
BUILD_LIST 0 0.0% 0 0.0%
BUILD_TUPLE 0 0.0% 0 0.0%

Call stats

Inlined calls and frame stats

This shows what fraction of calls to Python functions are inlined (i.e. not having a call at the C level) and for those that are not, where the call comes from. The various categories overlap.

Also includes the count of frame objects created.

Base Count Base Ratio Head Count Head Ratio Change
Calls to PyEval_EvalDefault 60 12.5% 60 12.5% 0.0%
Calls to Python functions inlined 420 87.5% 420 87.5% 0.0%
Calls via PyEval_EvalFrame (total) 60 12.5% 60 12.5% 0.0%
Calls via PyEval_EvalFrame (vector) 60 12.5% 60 12.5% 0.0%
Calls via PyEval_EvalFrame (generator) 0 0.0% 0 0.0%
Calls via PyEval_EvalFrame (legacy) 0 0.0% 0 0.0%
Calls via PyEval_EvalFrame (function vectorcall) 60 12.5% 60 12.5% 0.0%
Calls via PyEval_EvalFrame (build class) 0 0.0% 0 0.0%
Calls via PyEval_EvalFrame (slot) 0 0.0% 0 0.0%
Calls via PyEval_EvalFrame (function ex) 60 12.5% 60 12.5% 0.0%
Calls via PyEval_EvalFrame (api) 0 0.0% 0 0.0%
Calls via PyEval_EvalFrame (method) 0 0.0% 0 0.0%
Frame objects created 0 0.0% 0 0.0%
Frames pushed 480 100.0% 480 100.0% 0.0%
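
A minimal sketch of the inlined-call fraction, assuming the denominator is the sum of inlined calls and calls that went through the C-level evaluation function. In this run that sum equals the "Frames pushed" count. The variable names are illustrative.

```python
# Counts copied from the table above.
inlined_calls = 420  # "Calls to Python functions inlined"
c_level_calls = 60   # "Calls to PyEval_EvalDefault"
frames_pushed = 480  # "Frames pushed"

# Assumption: every Python-function call is either inlined or made at the C level.
total_calls = inlined_calls + c_level_calls
inlined_fraction = inlined_calls / total_calls

print(total_calls == frames_pushed)  # True
print(f"{inlined_fraction:.1%}")     # 87.5%
```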

Object stats

Allocations, frees and dict materializations

Below, "allocations" means "allocations that are not from a freelist". Total allocations = "Allocations from freelist" + "Allocations".

"Inline values" is the number of values arrays inlined into objects.

The cache hit/miss numbers are for the MRO cache, split into dunder and other names.
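
The allocation identity stated above can be checked against the counts reported in this section's table. The sketch below assumes the ratio column for the allocation rows is taken against total allocations (freelist plus non-freelist), which is consistent with the reported percentages. The names are illustrative.

```python
# Counts copied from the object stats table in this section.
from_freelist = 65_300      # "Allocations from freelist"
non_freelist = 16_935_880   # "Allocations"
by_size = {
    "to 512 bytes": 16_879_660,
    "to 4 kbytes": 26_940,
    "over 4 kbytes": 29_280,
}

# Total allocations = "Allocations from freelist" + "Allocations".
total_allocations = from_freelist + non_freelist

# The size classes partition the non-freelist allocations.
print(sum(by_size.values()) == non_freelist)  # True

size_shares = {k: f"{v / total_allocations:.1%}" for k, v in by_size.items()}
print(total_allocations)  # 17001180
print(size_shares)
```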

Base Count Base Ratio Head Count Head Ratio Change
Method cache collisions 21 16 -23.8%
Method cache misses 29 24 -17.2%
Method cache hits 191 196 2.6%
Immortal increfs 30,312,429 28.3% 30,312,424 28.3% -0.0%
Immortal decrefs 57,304,742 37.9% 57,304,737 37.9% -0.0%
Mortal increfs 76,910,969 71.7% 76,910,964 71.7% -0.0%
Mortal decrefs 93,791,316 62.1% 93,791,311 62.1% -0.0%
Allocations from freelist 65,300 0.4% 65,300 0.4% 0.0%
Frees to freelist 69,380 69,380 0.0%
Allocations 16,935,880 99.6% 16,935,880 99.6% 0.0%
Allocations to 512 bytes 16,879,660 99.3% 16,879,660 99.3% 0.0%
Allocations to 4 kbytes 26,940 0.2% 26,940 0.2% 0.0%
Allocations over 4 kbytes 29,280 0.2% 29,280 0.2% 0.0%
Frees 16,942,161 16,942,161 0.0%
Inline values 0 0
Interpreter mortal increfs 3,960 0.0% 3,960 0.0% 0.0%
Interpreter mortal decrefs 5,260 0.0% 5,260 0.0% 0.0%
Interpreter immortal increfs 2,220 0.0% 2,220 0.0% 0.0%
Interpreter immortal decrefs 1,620 0.0% 1,620 0.0% 0.0%
Materialize dict (on request) 0 0
Materialize dict (new key) 0 0
Materialize dict (too big) 0 0
Materialize dict (str subclass) 0 0
Method cache dunder hits 0 0
Method cache dunder misses 0 0
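
The Change column can be reproduced as the relative difference between head and base. The sketch below is an illustration rather than the report generator's own code. `percent_change` is a hypothetical helper, and returning `None` when the base is zero is an assumption made to mirror the blank Change cells in rows where both counts are 0.

```python
from typing import Optional


def percent_change(base: int, head: int) -> Optional[str]:
    """Relative change from base to head, formatted like the Change column."""
    if base == 0:
        return None  # assumption: the report leaves the cell blank
    return f"{(head - base) / base:.1%}"


# Rows copied from the table above: (base count, head count).
rows = {
    "Method cache collisions": (21, 16),
    "Method cache misses": (29, 24),
    "Method cache hits": (191, 196),
    "Immortal increfs": (30_312_429, 30_312_424),
    "Inline values": (0, 0),
}
changes = {name: percent_change(base, head) for name, (base, head) in rows.items()}
print(changes)
```

A tiny negative difference, such as the five fewer immortal increfs, rounds to "-0.0%", which is why that value appears in the table.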

GC stats

GC collections and effectiveness

Collected/visits gives some measure of efficiency.

Generation Base Collections Base Objects collected Base Object visits Head Collections Head Objects collected Head Object visits
0 0 0 0 0 0 0
1 0 0 0 0 0 0
2 3,840 160 4,500,842,200 3,840 160 4,500,842,200
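
To make the collected/visits measure concrete, the sketch below computes it for generation 2, the only generation with activity in this run. It also derives visits per collection, which is not a column in the report and is included only as an illustrative extra. The names are assumptions.

```python
# Generation 2 figures, identical for base and head in the table above.
collections = 3_840
objects_collected = 160
object_visits = 4_500_842_200

# "Collected/visits": objects reclaimed per object visited during traversal.
collected_per_visit = objects_collected / object_visits

# Illustrative extra: average traversal work per collection.
visits_per_collection = object_visits / collections

print(f"{collected_per_visit:.2e}")
print(f"{visits_per_collection:,.0f}")
```

Read this way, the run spends over a million object visits per collection while reclaiming very few objects.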

Optimization (Tier 2) stats

statistics about the Tier 2 optimizer
Optimization attempts: The number of times a potential trace is identified. Specifically, this occurs in the JUMP_BACKWARD instruction when the counter reaches a threshold.
Traces created: The number of traces that were successfully created.
Trace stack overflow: A trace is truncated because it would require more than 5 stack frames.
Trace stack underflow: A potential trace is abandoned because it pops more frames than it pushes.
Trace too long: A trace is truncated because it is longer than the instruction buffer.
Trace too short: A potential trace is abandoned because it is too short.
Inner loop found: A trace is truncated because it has an inner loop.
Recursive call: A trace is truncated because it has a recursive call.
Low confidence: A trace is abandoned because the likelihood of the jump to top being taken is too low.
Executors invalidated: The number of executors that were invalidated due to watched dictionary changes.
Traces executed: The number of traces that were executed.
Uops executed: The total number of uops (micro-operations) that were executed.

Base Count Base Ratio Head Count Head Ratio Change
Optimization attempts 60 60 0.0%
Traces created 60 100.0% 60 100.0% 0.0%
Trace stack overflow 0 0.0% 0 0.0%
Trace stack underflow 0 0.0% 0 0.0%
Trace too long 0 0.0% 0 0.0%
Trace too short 0 0.0% 0 0.0%
Inner loop found 0 0.0% 0 0.0%
Recursive call 0 0.0% 0 0.0%
Low confidence 0 0.0% 0 0.0%
Executors invalidated 0 0.0% 0 0.0%
Traces executed 179,940 179,940 0.0%
Uops executed 391,807,440 217,743.4% 391,807,440 217,743.4% 0.0%
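
The 217,743.4% ratio on the "Uops executed" row looks odd at first. It is consistent with uops executed divided by traces executed, expressed as a percentage. That reading is inferred from the numbers rather than stated in the report. A short check, with illustrative names:

```python
# Counts copied from the Tier 2 table above.
traces_executed = 179_940
uops_executed = 391_807_440

# Inferred relationship: ratio = uops executed / traces executed.
uops_per_trace = uops_executed / traces_executed

print(f"{uops_per_trace:,.1f} uops per trace execution")
print(f"{uops_per_trace * 100:,.1f}%")  # 217,743.4%
```

In other words, each trace execution runs about 2,177 uops on average, which fits a trace that loops back to its top many times before exiting.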
Optimizer attempts: The number of times the trace optimizer (_Py_uop_analyze_and_optimize) was run.
Optimizer successes: The number of traces that were successfully optimized.
Optimizer no memory: The number of optimizations that failed due to no memory.
Remove globals builtins changed: The builtins changed during optimization.
Remove globals incorrect keys: The keys in the globals dictionary aren't what was expected.

Base Count Base Ratio Head Count Head Ratio Change
Optimizer attempts 60 60 0.0%
Optimizer successes 60 100.0% 60 100.0% 0.0%
Optimizer no memory 0 0.0% 0 0.0%
Remove globals builtins changed 0 0.0% 0 0.0%
Remove globals incorrect keys 0 0.0% 0 0.0%

Trace length histogram

trace length histogram
Range Base Count Base Ratio Head Count Head Ratio Change
<= 1 0 0.0% 0 0.0%
<= 2 0 0.0% 0 0.0%
<= 4 0 0.0% 0 0.0%
<= 8 0 0.0% 0 0.0%
<= 16 0 0.0% 0 0.0%
<= 32 0 0.0% 0 0.0%
<= 64 0 0.0% 0 0.0%
<= 128 60 100.0% 60 100.0% 0.0%
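
The bucket labels suggest power-of-two upper bounds. The sketch below shows one way a trace length could be mapped to such a label. It is an illustration under that assumption, not the interpreter's own bucketing code, and `bucket_upper_bound` is a hypothetical helper.

```python
def bucket_upper_bound(length: int) -> int:
    """Smallest power of two that is >= length (assumed bucket boundary)."""
    if length < 1:
        raise ValueError("length must be at least 1")
    return 1 << (length - 1).bit_length()


# A few sample lengths and the histogram row each would fall into.
labels = {n: f"<= {bucket_upper_bound(n)}" for n in (1, 2, 3, 64, 65, 100, 128)}
print(labels)
```

Under this reading, all 60 traces in the table above had between 65 and 128 uops before optimization.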

Optimized trace length histogram

optimized trace length histogram
Range Base Count Base Ratio Head Count Head Ratio Change
<= 1 0 0.0% 0 0.0%
<= 2 0 0.0% 0 0.0%
<= 4 0 0.0% 0 0.0%
<= 8 0 0.0% 0 0.0%
<= 16 0 0.0% 0 0.0%
<= 32 0 0.0% 0 0.0%
<= 64 60 100.0% 60 100.0% 0.0%

Trace run length histogram

trace run length histogram
Range Base Count Base Ratio Head Count Head Ratio Change
<= 1 0 0.0% 0 0.0%

Uop execution stats

uop execution stats
Name Base Count Head Count Change
_SET_IP 30,159,240 30,159,240 0.0%
_CHECK_VALIDITY 30,159,240 30,159,240 0.0%
_CHECK_PERIODIC 30,099,300 30,099,300 0.0%
_GUARD_NOT_EXHAUSTED_RANGE 30,091,860 30,091,860 0.0%
_ITER_CHECK_RANGE 30,091,860 30,091,860 0.0%
_MAKE_WARM 30,091,860 30,091,860 0.0%
_ITER_NEXT_RANGE 30,031,800 30,031,800 0.0%
_LOAD_FAST_3 30,029,940 30,029,940 0.0%
_STORE_FAST_4 29,971,860 29,971,860 0.0%
_STORE_SUBSCR_LIST_INT 29,970,000 29,970,000 0.0%
_LOAD_FAST_1 29,970,000 29,970,000 0.0%
_LOAD_FAST_4 29,970,000 29,970,000 0.0%
_JUMP_TO_TOP 29,911,920 29,911,920 0.0%
_EXIT_TRACE 179,940 179,940 0.0%
_START_EXECUTOR 179,940 179,940 0.0%
_LOAD_FAST_2 121,740 121,740 0.0%
_POP_TOP 61,800 61,800 0.0%
_CHECK_FUNCTION 61,800 61,800 0.0%
_LOAD_CONST_INLINE_BORROW 61,800 61,800 0.0%
_STORE_FAST_2 61,800 61,800 0.0%
_GET_ITER 59,940 59,940 0.0%
_BUILD_LIST 59,940 59,940 0.0%
_BINARY_OP 59,940 59,940 0.0%
_CALL_BUILTIN_CLASS 59,940 59,940 0.0%
_CHECK_VALIDITY_AND_SET_IP 59,940 59,940 0.0%
_LOAD_CONST_INLINE_BORROW_WITH_NULL 59,940 59,940 0.0%
_STORE_FAST_1 59,940 59,940 0.0%
_STORE_FAST_3 59,940 59,940 0.0%
_PUSH_NULL 7,440 7,440 0.0%
_LOAD_CONST_INLINE 7,440 7,440 0.0%
_CALL_BUILTIN_FAST_WITH_KEYWORDS 3,720 3,720 0.0%
_CALL_NON_PY_GENERAL 3,720 3,720 0.0%
_CHECK_IS_NOT_PY_CALLABLE 3,720 3,720 0.0%
_LOAD_FAST_6 3,720 3,720 0.0%
_BINARY_OP_ADD_FLOAT 1,860 1,860 0.0%
_BINARY_OP_SUBTRACT_FLOAT 1,860 1,860 0.0%
_COMPARE_OP_INT 1,860 1,860 0.0%
_GUARD_BOTH_FLOAT 1,860 1,860 0.0%
_GUARD_IS_NOT_NONE_POP 1,860 1,860 0.0%
_GUARD_IS_TRUE_POP 1,860 1,860 0.0%
_GUARD_NOS_FLOAT 1,860 1,860 0.0%
_GUARD_NOS_INT 1,860 1,860 0.0%
_LOAD_FAST_5 1,860 1,860 0.0%
_STORE_FAST_5 1,860 1,860 0.0%
_STORE_FAST_6 1,860 1,860 0.0%

Pair counts

Pair counts for top 100 Non-JIT uop pairs

Pairs of specialized operations that deoptimize and are then followed by the corresponding unspecialized instruction are not counted as pairs.

Not included in comparative output.

Unsupported opcodes

unsupported opcodes

Optimizer errored out with opcode

Optimization stopped after encountering this opcode

Rare events

Counts of rare/unlikely events
set class: Setting an object's class, obj.__class__ = ...
set bases: Setting the bases of a class, cls.__bases__ = ...
set eval frame func: Setting the PEP 523 frame eval function _PyInterpreterState_SetFrameEvalFunc()
builtin dict: Modifying the builtins, __builtins__.__dict__[var] = ...
func modification: Modifying a function, e.g. func.__defaults__ = ..., etc.
watched dict modification: A watched dict has been modified
watched globals modification: A watched globals() dict has been modified

Event Base Count Head Count Change
set class 0 0
set bases 0 0
set eval frame func 0 0
builtin dict 0 0
func modification 0 0
watched dict modification 0 0
watched globals modification 0 0
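
For readers unfamiliar with these events, the sketch below performs four of the Python-level operations named above. It only shows what the operations look like. Whether each one increments the corresponding counter depends on running a stats-enabled build, which this snippet does not verify. All class and function names are made up for the example.

```python
import builtins


class A:
    def who(self):
        return "A"


class B:
    def who(self):
        return "B"


class C(A):
    pass


obj = A()
obj.__class__ = B       # "set class": reassign an instance's class
C.__bases__ = (B,)      # "set bases": replace a class's base classes


def greet(name="world"):
    return f"hello {name}"


greet.__defaults__ = ("pystats",)   # "func modification"

builtins.__dict__["answer"] = 42    # "builtin dict": add a name to the builtins
result = (obj.who(), C().who(), greet(), answer)
del builtins.__dict__["answer"]     # undo the builtins change

print(result)  # ('B', 'B', 'hello pystats', 42)
```

All seven counters are 0 in this run, so the benchmark performed none of these operations.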

Meta stats

Meta statistics
Base Count Head Count Change
Number of data files 20 20 0.0%

Stats gathered on: 2024-10-26