document CLVM back references #324

Merged
merged 1 commit into from
Sep 24, 2024
59 changes: 59 additions & 0 deletions docs/clvm.md
@@ -249,6 +249,11 @@ The number of skipped bits is also the number of total bytes the size is encoded

The number of size bytes includes the first.


:::note

It is possible, although discouraged, to encode the length of an atom in more bytes than necessary, i.e. with unnecessary leading zeroes. This is similar to [UTF-8 overlong encoding](https://en.wikipedia.org/wiki/UTF-8#Overlong_encodings). As a result, it is not safe to compare CLVM programs in serialized form, since identical programs may serialize to different bytes and compare unequal. To compare programs, compare their tree hashes instead.

:::
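
To illustrate the note above, here is a minimal Python sketch that decodes the same atom from a canonical and an overlong encoding. The helper names `read_atom` and `atom_tree_hash` are made up for this example and are not part of any Chia library. The hash assumes the usual CLVM tree hash convention for atoms (SHA-256 over a `0x01` prefix followed by the atom bytes).

```python
import hashlib


def read_atom(blob: bytes) -> bytes:
    """Decode a single serialized atom (hypothetical helper, not a library API)."""
    first = blob[0]
    if first == 0x80:
        return b""  # empty atom (NIL)
    if first <= 0x7F:
        return bytes([first])  # small single-byte atoms encode as themselves
    # Count the leading 1 bits: this is the number of size bytes, including the first.
    size_bytes = 0
    mask = 0x80
    while first & mask:
        size_bytes += 1
        first &= 0xFF ^ mask
        mask >>= 1
    # The remaining bits of the first byte plus the following size bytes form the length.
    size = int.from_bytes(bytes([first]) + blob[1:size_bytes], "big")
    atom = blob[size_bytes:size_bytes + size]
    if len(atom) != size:
        raise ValueError("truncated atom")
    return atom


def atom_tree_hash(atom: bytes) -> bytes:
    """Tree hash of an atom, assuming the sha256(0x01 || atom) convention."""
    return hashlib.sha256(b"\x01" + atom).digest()


canonical = bytes.fromhex("86666f6f626172")    # 1 size byte: length 6
overlong = bytes.fromhex("c006666f6f626172")   # 2 size bytes: length 0x0006

print(canonical == overlong)                        # the serialized bytes differ
print(read_atom(canonical) == read_atom(overlong))  # the decoded atoms are the same
```

Both buffers decode to the atom `"foobar"`, so their tree hashes match even though a byte-wise comparison of the serialized forms does not.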

### Cons Pairs
@@ -259,6 +264,60 @@ For example, `(1 . 2)` would be represented as `0xFF0102`. Once you read `0xFF`,

Lists are typically chains of cons pairs that end in a nil terminator.

### Back references

As of the hard fork at block height 5 496 000, CLVM serialization was extended with *back references*. A back reference refers to a previously parsed CLVM structure, which is then duplicated in the deserialized output. This feature is also sometimes referred to as CLVM compression.

The compression comes from collapsing repeated structures: a structure only needs to be included once and can then be referred back to every time it is repeated. This is especially helpful in a block generator, where the same puzzle reveal may be included multiple times for coins secured by the same puzzle. The curried parameters are not repeated, but the underlying puzzle is, so it can be replaced by a back reference.

A back reference is introduced by a `0xFE` byte. This byte is followed by an atom that is interpreted as a *path*. The path points into a tree of previously parsed expressions (the environment). The lookup works the same way as a path lookup into the CLVM execution [Environment](#Environment).
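
The path lookup can be sketched in a few lines of Python. This is only an illustration under some assumptions: pairs are modelled as 2-tuples, atoms as `bytes`, NIL as the empty atom, and the path atom is read as an unsigned big-endian integer. The function name `lookup` is hypothetical.

```python
NIL = b""  # the empty atom


def lookup(env, path: bytes):
    """Follow a CLVM-style path into env (hypothetical helper for illustration)."""
    p = int.from_bytes(path, "big")
    if p == 0:
        return NIL  # mirrors environment lookup, where path 0 yields NIL
    node = env
    # Bits are consumed from the least significant end:
    # 0 means "first" (left), 1 means "rest" (right). A remaining 1 means "stop here".
    while p > 1:
        if not isinstance(node, tuple):
            raise ValueError("path descends into an atom")
        node = node[1] if p & 1 else node[0]
        p >>= 1
    return node


env = (b"A", (b"B", NIL))
print(lookup(env, b"\x01"))  # the whole environment
print(lookup(env, b"\x02"))  # first element
print(lookup(env, b"\x05"))  # second element
```

With this environment, path `1` selects the whole tree, `2` selects `"A"`, `3` selects the rest of the list and `5` selects `"B"`.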

CLVM trees are parsed bottom-up, left to right. As each atom is parsed, it is prepended to the environment. As each pair is parsed, the top two values are popped off the environment and formed into a pair, which is then prepended to the environment. Each back reference performs a path lookup into the environment and prepends the resulting subtree to the environment.

For example, the buffer `ff86666f6f626172fe01` is a serialization of `("foobar" . ("foobar" . NIL))`. It is parsed in the order shown in the tree below:

```
    [3]
   /   \
  1     2 (backref)
```

The environment looks like this after each step:

1. parse atom "foobar"
```
       []
      /  \
     /    \
"foobar"  NIL
```

2. parse back reference `01`
```
                []
               /  \
              /    \
             /      \
            /        \
           /          \
         []            []
        /  \          /  \
       /    \        /    \
"foobar"    NIL "foobar"  NIL
```

3. parse pair: pop the top 2 items and form a pair
```
        []
       /  \
      /    \
     /      \
"foobar"     []
            /  \
           /    \
     "foobar"   NIL
```
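
The parsing rules and the worked example above can be sketched as a small Python deserializer. This is a simplified illustration, not the reference implementation: pairs are modelled as 2-tuples, atoms as `bytes`, NIL as the empty atom, and the names `read_atom`, `lookup` and `deserialize` are hypothetical. It does no validation beyond detecting truncated atoms.

```python
NIL = b""  # the empty atom; pairs are modelled as 2-tuples


def read_atom(blob: bytes, pos: int):
    """Read one atom starting at blob[pos]; return (atom, new_pos)."""
    first = blob[pos]
    if first == 0x80:
        return NIL, pos + 1
    if first <= 0x7F:
        return bytes([first]), pos + 1
    # The number of leading 1 bits is the number of size bytes (including the first).
    size_bytes, mask = 0, 0x80
    while first & mask:
        size_bytes += 1
        first &= 0xFF ^ mask
        mask >>= 1
    size = int.from_bytes(bytes([first]) + blob[pos + 1:pos + size_bytes], "big")
    start = pos + size_bytes
    atom = blob[start:start + size]
    if len(atom) != size:
        raise ValueError("truncated atom")
    return atom, start + size


def lookup(env, path: bytes):
    """Path lookup: bits are read from the least significant end, 0 = first, 1 = rest."""
    p = int.from_bytes(path, "big")
    if p == 0:
        return NIL
    node = env
    while p > 1:
        if not isinstance(node, tuple):
            raise ValueError("path descends into an atom")
        node = node[1] if p & 1 else node[0]
        p >>= 1
    return node


def deserialize(blob: bytes):
    pos = 0
    env = NIL          # environment: a list of previously parsed values
    ops = ["parse"]    # pending work, processed as a stack
    while ops:
        op = ops.pop()
        if op == "cons":
            # Pop the top two values, form a pair, prepend it to the environment.
            right, env = env
            left, env = env
            env = ((left, right), env)
        elif blob[pos] == 0xFF:
            # A pair: parse the left side, then the right side, then combine them.
            pos += 1
            ops += ["cons", "parse", "parse"]
        elif blob[pos] == 0xFE:
            # A back reference: the following atom is a path into the environment.
            path, pos = read_atom(blob, pos + 1)
            env = (lookup(env, path), env)
        else:
            atom, pos = read_atom(blob, pos)
            env = (atom, env)
    return env[0]


compressed = bytes.fromhex("ff86666f6f626172fe01")
plain = bytes.fromhex("ff86666f6f626172ff86666f6f62617280")
print(deserialize(compressed) == deserialize(plain))
print(len(compressed), len(plain))
```

Both buffers deserialize to the same tree, `("foobar" . ("foobar" . NIL))`; the version with the back reference is 10 bytes long, while the version without it is 17 bytes.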

## Programs as Parameters

CLVM does not have operators for defining and calling functions. However, it does allow programs to be passed into the environment, as well as executing a value as a program with a new environment.