Efficient storage of code object debug information #324

markshannon · 2022-03-18T12:52:02Z

markshannon
Mar 18, 2022
Collaborator

Code objects include a lot of information, much of it only used for tracing and debugging of various sorts.
It would be good if that information used less space when we aren't using it.

Classifying the fields of a code object (ignoring computed attributes and int fields), we have:

Needed for normal execution:

co_code : The bytecode
co_consts: Constants
co_names: Names of attributes and global variables.
co_exceptiontable: The exception handler table.
co_localsplusnames: Parameter names.

Debug info, all bytes objects:

co_localspluskinds: Describes the "kind" of locals, used for signatures and locals()
co_linetable: Line information
co_endlinetable: Ditto, for fancy traceback printing
co_columntable: More stuff for fancy traceback printing

Currently this information uses about 8 bytes per instruction, in large part due to having column table entries for each cache entry.

A single table

I would like to replace the four debug objects with a single bytes object with the following format:
[0] 0 if compact, 1 if expanded.
[1 - 3] len(co_localspluskinds) (16 million max local variables seems sufficient)
[4 - 4+len] co_localspluskinds
[...] Location info in compact or expanded form.
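A minimal sketch of how that header could be packed and unpacked. The byte order of the 3-byte length field is not specified above, so little-endian is an assumption here, and the function names are hypothetical.

```python
def pack_debug_table(kinds: bytes, location_info: bytes, expanded: bool = False) -> bytes:
    """Build the single table: flag byte, 3-byte length, kinds, location info."""
    if len(kinds) >= 1 << 24:
        raise ValueError("too many locals for a 3-byte length field")
    # Byte [0]: 0 if compact, 1 if expanded.
    # Bytes [1 - 3]: len(co_localspluskinds); little-endian is an assumption.
    header = bytes([1 if expanded else 0]) + len(kinds).to_bytes(3, "little")
    return header + kinds + location_info


def unpack_debug_table(table: bytes):
    """Split the table back into (expanded, kinds, location_info)."""
    expanded = table[0] == 1
    n_kinds = int.from_bytes(table[1:4], "little")
    kinds = table[4:4 + n_kinds]
    location_info = table[4 + n_kinds:]
    return expanded, kinds, location_info
```

The location info is treated as an opaque byte string here; its layout depends on whether the flag byte says compact or expanded.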

The expanded form should be an easily searchable table; one fixed-width entry per instruction.

Offset of instruction start (including preceding EXTENDED_ARGs)
Instruction length (including EXTENDED_ARGs prefix and inline cache)
Resumption point (where we currently have RESUME). 1 bit
Line start, for tracing line events. 1 bit.
Start line
Start column
End line
End column

I think we can impose some reasonable limits on the size of the fields, even in the expanded form.

E.g. We know the instruction length cannot exceed 10 or so, so we can fit it into 4 bits. It might make sense, then, to limit the start offset to 28 bits and fit both into 32 bits.

Then we can fit the whole entry into 96 bits:

Offset start and length: 32 bits
Start line (30 bits), resumption and line-start flags: 32 bits
Start column (8 bits), end line delta (16 bits), end column (8 bits): 32 bits (or 10/12/10 bits?)
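To make the 96-bit entry concrete, here is a sketch that packs one entry into three 32-bit words using the 8/16/8 split for the last word. The position of each field inside its word and the little-endian word order are assumptions, and `pack_entry` / `unpack_entry` are hypothetical names.

```python
import struct

ENTRY = struct.Struct("<III")  # three 32-bit words = 96 bits per instruction


def pack_entry(offset, length, start_line, resume, line_start,
               start_col, end_line, end_col):
    """Pack one expanded-table entry; raises ValueError if a field overflows."""
    end_delta = end_line - start_line
    if not (0 <= offset < 1 << 28 and 0 <= length < 1 << 4):
        raise ValueError("offset or length out of range")
    if not (0 <= start_line < 1 << 30 and 0 <= end_delta < 1 << 16):
        raise ValueError("line out of range")
    if not (0 <= start_col < 1 << 8 and 0 <= end_col < 1 << 8):
        raise ValueError("column out of range")
    word0 = (offset << 4) | length                       # 28 + 4 bits
    word1 = (start_line << 2) | (int(bool(resume)) << 1) | int(bool(line_start))
    word2 = (start_col << 24) | (end_delta << 8) | end_col  # 8 + 16 + 8 bits
    return ENTRY.pack(word0, word1, word2)


def unpack_entry(data, index=0):
    """Return the fields of entry number `index` in the same order as pack_entry."""
    word0, word1, word2 = ENTRY.unpack_from(data, index * ENTRY.size)
    start_line = word1 >> 2
    end_line = start_line + ((word2 >> 8) & 0xFFFF)
    return (word0 >> 4, word0 & 0xF, start_line,
            bool(word1 & 2), bool(word1 & 1),
            word2 >> 24, end_line, word2 & 0xFF)
```

Columns above 255 would not fit with this split, which is presumably the motivation for the 10/12/10 alternative.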

This should save memory, as most tables would remain in their compact form, and it saves 3 pointers and 3 bytes object headers per code object.
It should also speed up tracing and debugging, as once expanded, the table is quickly searchable.
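As an illustration of why fixed-width entries are quickly searchable, the sketch below binary-searches an expanded table for the entry covering a given code offset. It assumes 12-byte entries whose first little-endian 32-bit word holds the start offset in the high 28 bits and the instruction length in the low 4 bits; that layout and the name `find_entry` are assumptions, not part of the proposal.

```python
import struct

ENTRY_SIZE = 12  # 96 bits per entry


def find_entry(table: bytes, target: int) -> int:
    """Return the index of the entry covering `target`, or -1 if none does."""
    lo, hi = 0, len(table) // ENTRY_SIZE
    while lo < hi:
        mid = (lo + hi) // 2
        (word0,) = struct.unpack_from("<I", table, mid * ENTRY_SIZE)
        offset, length = word0 >> 4, word0 & 0xF
        if target < offset:
            hi = mid
        elif target >= offset + length:
            lo = mid + 1
        else:
            return mid
    return -1
```

Because each entry carries its own length, the lookup never has to read a neighbouring entry to decide whether the target falls inside it.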

iritkatriel · 2022-03-18T14:08:35Z

iritkatriel
Mar 18, 2022
Maintainer

Instruction length (including EXTENDED_ARGs prefix and inline cache)

What would this be used for? Can it not be derived from the offset and the next instruction's offset?

0 replies

markshannon · 2022-03-18T14:55:00Z

markshannon
Mar 18, 2022
Collaborator Author

It could be derived, but it makes code using the table simpler if each entry is self contained.

0 replies

markshannon · 2022-03-22T15:28:10Z

markshannon
Mar 22, 2022
Collaborator Author

It looks like we will need the "Resumption point" bits for instrumentation, so it probably makes sense to implement this before instrumentation, to avoid adding yet another bytes object to the code object. (We're keeping RESUME so this is not necessary for instrumentation.)

Let's do this in two steps:

Create the "expanded" array first, as that is the one we will need for instrumentation.
Implement the compressed form.

0 replies

markshannon · 2022-04-04T15:50:26Z

markshannon
Apr 4, 2022
Collaborator Author

Here's a possible scheme for the compact format, taking advantage of patterns in most Python code.
It is a bit convoluted, but should be compact and reasonably fast to unpack. Packing might be slowed down a bit by looking for spans.

The data is organized into 4-bit nybbles.

The first nybble describes the format. The following zero or more nybbles contain necessary data.

| Code | Meaning | Following data | Length (nybbles) |
| --- | --- | --- | --- |
| 0-5 | Short, same line. `start_line == end_line == previous_line`. | 1 nybble: `start_column - (16*code)`. 1 nybble: `end_column - start_column`. | 3 |
| 6 | One line (shorter form). `start_line == end_line` | Variable length signed int: `start_line - prev_line`. 2 nybbles: `start_column`. 1 nybble: `end_column - start_column`. | 5+ |
| 7 | One line (longer form). `start_line == end_line` | Variable length signed int: `start_line - prev_line`. 2 nybbles: `start_column`. 2 nybbles: `end_column`. | 6+ |
| 8-10 | Indented. One line. `start_line == end_line == previous_line+(code-7)` | 1 nybble: `start_column/4`. 1 nybble: `end_column - start_column`. | 3 |
| 11 | Span. From start of earlier location to end of earlier location. | 2 nybbles: 2 unsigned backward offsets in table of locations (1 nybble each) | 3 |
| 12 | Two lines | Variable length signed int: `start_line - prev_line`. Variable length unsigned int: `end_line - start_line`. 2 nybbles: `start_column`. 2 nybbles: `end_column`. | 7+ |
| 13-14 | Unused | --- | --- |
| 15 | No location info | None | 1 |

Variable length unsigned ints are encoded by splitting into 3-bit chunks and adding 8 to all but the last.
So 7 is encoded as 7, and 26 is encoded as 11, 2.
Variable length signed ints are encoded by converting them to `(abs(val)<<1) | (1 if val < 0 else 0)`, then encoding the result as unsigned.
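The variable length int rules above can be sketched as follows. Chunks are emitted most significant first, which is what the "26 is encoded as 11, 2" example implies; the function names are hypothetical.

```python
def encode_unsigned(value: int) -> list:
    """Split into 3-bit chunks, most significant first; add 8 to all but the last."""
    if value < 0:
        raise ValueError("unsigned varint cannot be negative")
    chunks = [value & 7]
    value >>= 3
    while value:
        chunks.append(value & 7)
        value >>= 3
    chunks.reverse()
    return [c | 8 for c in chunks[:-1]] + [chunks[-1]]


def encode_signed(value: int) -> list:
    """Move the sign into the lowest bit, then encode as unsigned."""
    return encode_unsigned((abs(value) << 1) | (1 if value < 0 else 0))


def decode_unsigned(nybbles, pos=0):
    """Return (value, next_pos); a nybble below 8 ends the int."""
    value = 0
    while True:
        nybble = nybbles[pos]
        pos += 1
        value = (value << 3) | (nybble & 7)
        if not nybble & 8:
            return value, pos


def decode_signed(nybbles, pos=0):
    raw, pos = decode_unsigned(nybbles, pos)
    magnitude = raw >> 1
    return (-magnitude if raw & 1 else magnitude), pos
```

With these definitions `encode_unsigned(7)` gives `[7]` and `encode_unsigned(26)` gives `[11, 2]`, matching the examples in the text.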

1 reply

gvanrossum Apr 4, 2022
Maintainer

@pablogsal ^^

pablogsal · 2022-04-04T18:27:13Z

pablogsal
Apr 4, 2022
Collaborator

We need to be careful about the unpacking part because calling PyFrame_GetLineNumber is already one of the heaviest parts of profilers, especially since 3.10.

The format looks convoluted but sensible. Beyond that, we should get some numbers on how much this format compresses the existing one and on the difference in decoding speed. I guess that since most cases will fall under codes 0-7, which require little processing, it may be fast.

On the other hand, with this format, getting the current line number requires parsing all the other debugging information (the column numbers) too, which may slow down unpacking compared to only having to deal with separate structs for line numbers and column numbers.

0 replies

markshannon · 2022-04-05T11:17:47Z

markshannon
Apr 5, 2022
Collaborator Author

@pablogsal The idea is to expand the table only once, when first needed. Scanning it repeatedly would be rather slow, as you point out.
The compact form can only be scanned forward, and in tandem with the bytecode.

I think the following are true:

All code objects need debug info
We only need it occasionally
If we do need the debug info we may well need it often

Therefore, it makes sense (to me) to have compact debug info for all code objects, which when first used is expanded into a form suitable for random access.
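A rough sketch of that expand-on-first-use idea, in Python rather than C for brevity. The class name, the `expand` callback, and the `expansions` counter are all hypothetical and exist only to show that the expansion cost is paid once.

```python
from typing import Callable, Optional, Sequence


class LazyLocations:
    """Holds compact debug info and expands it on first random access."""

    def __init__(self, compact: bytes, expand: Callable[[bytes], Sequence]):
        self._compact: Optional[bytes] = compact
        self._expand = expand
        self._expanded: Optional[Sequence] = None
        self.expansions = 0  # for illustration only

    def entry(self, index: int):
        if self._expanded is None:
            # First use: pay the expansion cost once, then drop the compact
            # form (replacing rather than keeping both is an assumption).
            self._expanded = self._expand(self._compact)
            self._compact = None
            self.expansions += 1
        return self._expanded[index]
```

Code objects that are never traced or debugged keep only the compact bytes, while repeated lookups after the first one are plain indexing.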

3 replies

pablogsal Apr 5, 2022
Collaborator

The idea is to expand the table only once, when first needed

What would the expansion look like?

For instance, as I mentioned, many profilers just call PyFrame_GetLineNumber constantly. How are we storing the expansion to ensure that call is as fast as possible?

markshannon Apr 5, 2022
Collaborator Author

#324 (comment) under the section "A single table"

pablogsal Apr 5, 2022
Collaborator

Ah, I see. The other problem we would have is the one that we always have for out-of-process debuggers and profilers: we need to ensure that we provide some functions that can be copied into these tools to decode the compressed format from the raw bytes, as these cannot force the analysed process to unpack the table.

markshannon · 2022-04-07T18:13:51Z

markshannon
Apr 7, 2022
Collaborator Author

Although the above implementation is very compact, it is difficult to scan, and needs to be expanded, which needs extra code.
Not only that, it can only be parsed in conjunction with the bytecode array as the length of each instruction is not available from the location table alone.

I think we would be better off doing something like the current line number table, which can be scanned backwards as well as forwards.

To be able to scan backwards we need an additional bit per entry as a marker. Using 1 bit out of 4 is excessive, so we are back to using bytes rather than nybbles. Given the smallest viable entry (if location info is present) is two bytes, then we have a few bits spare to add the instruction delta, making the table self contained.

| Format | First byte | Subsequent bytes | Meaning |
| --- | --- | --- | --- |
| Short | 1nnnnbbb | 0ccceeee | Same line as previous entry. Column = `nnnn`*8 + `ccc`, end column = column + `eeee`. `n` < 10 |
| Indent | 11xxxbbb | 0ccceeee | One line. Line = previous + `xxx`-1. Column = `ccc`*4, end column = column + `eeee`. 2 <= `x` < 6 |
| Long form | 11110bbb | signed varint `l`, varint `e`, unsigned varint `c`, unsigned varint `x` | Line = previous + `l`, end_line = line + `e`, column = `c`, end_column = `x` |
| None | 11111bbb | --- | No location info |

bbb+1 is the number of code units covered

The variable-sized int encoding requires every byte to start with a 0 bit, and one more bit is needed to signal extension, so varints are composed of 6-bit chunks.
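Here is a sketch of a decoder for the byte-oriented format, plus the backward step that the marker bit makes possible. Several details are not pinned down above and are assumptions in this code: bit 6 of a varint byte is taken as the extension flag, chunks are most significant first, signed varints keep the sign in the lowest bit (as in the earlier nybble scheme), and varint `e` is treated as unsigned.

```python
def read_varint(data, pos):
    """Unsigned varint: 6-bit chunks, bit 6 set on all but the last byte (assumed)."""
    value = 0
    while True:
        byte = data[pos]
        pos += 1
        value = (value << 6) | (byte & 0x3F)
        if not byte & 0x40:
            return value, pos


def read_signed_varint(data, pos):
    raw, pos = read_varint(data, pos)
    magnitude = raw >> 1
    return (-magnitude if raw & 1 else magnitude), pos


def decode_locations(data, first_line):
    """Yield (code_units, location); location is None or
    (line, end_line, column, end_column)."""
    line = first_line
    pos = 0
    while pos < len(data):
        first = data[pos]
        pos += 1
        if not first & 0x80:
            raise ValueError("entry does not start with a marker byte")
        code = (first >> 3) & 0xF      # the nnnn field
        units = (first & 7) + 1        # bbb + 1 code units
        if code == 15:                 # None
            yield units, None
        elif code == 14:               # Long form
            delta, pos = read_signed_varint(data, pos)
            end_delta, pos = read_varint(data, pos)
            column, pos = read_varint(data, pos)
            end_column, pos = read_varint(data, pos)
            line += delta
            yield units, (line, line + end_delta, column, end_column)
        else:
            second = data[pos]
            pos += 1
            ccc, eeee = (second >> 4) & 7, second & 0xF
            if code < 10:              # Short: same line as previous entry
                column = code * 8 + ccc
            else:                      # Indent: xxx is the low 3 bits of nnnn
                line += (code & 7) - 1
                column = ccc * 4
            yield units, (line, line, column, column + eeee)


def previous_entry_start(data, pos):
    """Scan backwards from `pos` to the nearest byte with the marker bit set."""
    pos -= 1
    while not data[pos] & 0x80:
        pos -= 1
    return pos
```

Only the first byte of an entry has its top bit set, so `previous_entry_start` can find an entry boundary without consulting the bytecode.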

0 replies

markshannon · 2022-04-08T13:29:27Z

markshannon
Apr 8, 2022
Collaborator Author

Assuming two thirds of instructions are handled by either the "short" or "indent" codes, and the remainder by the "long" form, we need about 3 bytes per instruction. That is not as good as #324 (comment), at about 2 bytes per instruction, but it is less than half the current size.
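The arithmetic behind that estimate, as a small sketch. It assumes a "short" or "indent" entry takes 2 bytes and a "long" entry takes its minimum of 5 bytes (one marker byte plus four single-byte varints); real long entries can be larger, so this is a lower bound for that share.

```python
from fractions import Fraction

SHORT_OR_INDENT_BYTES = 2  # marker byte + one data byte
LONG_FORM_MIN_BYTES = 5    # marker byte + four one-byte varints (assumed minimum)
CURRENT_BYTES = 8          # "about 8 bytes per instruction" today


def average_bytes(short_share: Fraction) -> Fraction:
    """Expected bytes per entry for a given share of short/indent entries."""
    return (short_share * SHORT_OR_INDENT_BYTES
            + (1 - short_share) * LONG_FORM_MIN_BYTES)


estimate = average_bytes(Fraction(2, 3))  # 4/3 + 5/3 = 3
```

This treats each entry as covering one instruction; entries that cover several code units would push the per-instruction figure lower.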

0 replies
