Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

spec: HEAD blob response should include content-length header #482

Conversation

waynr
Copy link
Contributor

@waynr waynr commented Oct 14, 2023

My distribution implementation recently began passing all conformance tests, so I began testing my registry by pushing images using docker. My registry implementation happens to strictly adhere to the OCI image spec in terms of required fields, including on the OCI descriptor.

So when docker tried pushing image manifests without the size field in the config descriptor, my registry returned a MANIFEST_INVALID error in its response.

I submitted a couple issues to moby:

And after digging in for a while, I figured out what was going on.

Just before pushing a manifest docker issues a HEAD PUT /v2/<repository>/blobs/<digest> request for the manifest config blob and seems to use the content-length header in the response to that request to set the config descriptor size field.

If there is no content-length header in the response, the field presumably gets set to 0. In combination with the omitempty tag on the Descriptor type's size field and the fact that whatever json marshalling library is being used considers 0 to be empty/unset, when docker serializes the Descriptor struct it omits the size field entirely.

As weird as it is for me to say this given how much time I wasted on this, I think the omitempty behavior is right for the Descriptor type, otherwise my registry would have unknowingly accepted manifests with incorrectly 0-valued descriptor size fields. It wouldn't help registries implemented in weaker-typed languages to remove the omitempty because those languages lack the ability to effectively express the difference between optional and required values anyway (not impossible, just harder to do than in a strongly-typed language).

So because there is a major distribution client with this behavior I am proposing that we update the spec to suggest that it's a good idea to set the content-length field. I might even go as far as to say it should be a MUST rather than a SHOULD, but I suspect folks might balk at that.

@sudo-bmitch
Copy link
Contributor

sudo-bmitch commented Oct 14, 2023

Just before pushing a manifest docker issues a HEAD PUT /v2/<repository>/blobs/<digest> request for the manifest config blob and seems to use the content-length header in the response to that request to set the config descriptor size field.

I think this should be treated as a bug in Docker since it knows the digest and size of the blob it just pushed. They shouldn't need to depend on the registry for this value.

If there is no content-length header in the response, the field presumably gets set to 0. In combination with the omitempty tag on the Descriptor type's size field and the fact that whatever json marshalling library is being used considers 0 to be empty/unset, when docker serializes the Descriptor struct it omits the size field entirely.

I think that's also a bug in Docker, since we do not specify size as omitempty, but it may be allowed for the Docker media types.

As weird as it is for me to say this given how much time I wasted on this, I think the omitempty behavior is right for the Descriptor type, otherwise my registry would have unknowingly accepted manifests with incorrectly 0-valued descriptor size fields. It wouldn't help registries implemented in weaker-typed languages to remove the omitempty because those languages lack the ability to effectively express the difference between optional and required values anyway (not impossible, just harder to do than in a strongly-typed language).

It's been in the image-spec as a required field so I don't think omitempty would be right for this. But as mentioned above, we don't have any say on the Docker media types, they can do this in their own descriptor.

So because there is a major distribution client with this behavior I am proposing that we update the spec to suggest that it's a good idea to set the content-length field. I might even go as far as to say it should be a MUST rather than a SHOULD, but I suspect folks might balk at that.

This looks like a good change to me. I suspect it's the current behavior in most, if not all well know registries (I did a spot check on distribution, zot, and quay). I think we also want it on the GET request, but there the HTTP RFCs may cover us, and clients have the content to check the actual length of the body. Checking for how this was handled for HEAD requests, it looks like the RFCs specifically call out that implementations may exclude that header:

@waynr
Copy link
Contributor Author

waynr commented Oct 15, 2023

It's been in the image-spec as a required field so I don't think omitempty would be right for this. But as mentioned above,

My argument was for anyone who is building a descriptor, omitempty causing the field to be omitted is a better signal to the consumer of that descriptor that something is wrong than a size of 0. A 0 size descriptor is always going to be wrong, isn't it? If we do allow a 0 size then that puts a burden on all consumers of descriptors to validate the manifest in more tedious ways than simply relying on a non-zero value being correct. Though maybe it doesn't really matter. My registry doesn't really have any use for this field since it keeps track of the number of bytes consumed as they are forwarded through to the storage backend and stores that value in the blobs table. But it definitely would have accepted incorrect manifests if the size field had just been incorrectly serialized with a value of 0 rather than omitted.

The fundamental problem here is that golang just doesn't have a strong enough type system to cleanly express the difference between a default/zero value for a type and that type being entirely optional (my recollection of kubernetes api libraries was that they use references to numeric types to express set vs unset values, but that's gross in its own right).

we don't have any say on the Docker media types, they can do this in their own descriptor.

Also, for what it's worth I think Docker is using this type at least in some places rather than defining their own.

https://datatracker.ietf.org/doc/html/rfc7231#section-4.3.2

The HEAD method is identical to GET except that the server MUST NOT
send a message body in the response (i.e., the response terminates at
the end of the header section).

I thought this was probably invalid/bad when I was looking at it yesterday morning: https://github.com/distribution/distribution/blob/c78d6f99ae5505e4328a323221ad012eacdf0754/registry/handlers/blob.go#L35-L36

🙃

@waynr
Copy link
Contributor Author

waynr commented Oct 15, 2023

https://datatracker.ietf.org/doc/html/rfc7231#section-4.3.2

The HEAD method is identical to GET except that the server MUST NOT
send a message body in the response (i.e., the response terminates at
the end of the header section).
I thought this was probably invalid/bad when I was looking at it yesterday morning: https://github.com/distribution/distribution/blob/c78d6f99ae5505e4328a323221ad012eacdf0754/registry/handlers/blob.go#L35-L36

After writing this I thought to myself there might be some hidden/implicit handling of content body going on behind the scenes and sure enough, there is:

https://cs.opensource.google/go/go/+/refs/tags/go1.21.3:src/net/http/fs.go;l=352

...so I stand corrected.

@tianon
Copy link
Member

tianon commented Oct 15, 2023

A 0 size descriptor is always going to be wrong, isn't it? If we do allow a 0 size then that puts a burden on all consumers of descriptors to validate the manifest in more tedious ways than simply relying on a non-zero value being correct.

https://explore.ggcr.dev/?image=tianon/scratch:zerobytes 👀

(It's technically valid and has a well known digest, but not a MUST in the spec so not totally supported across all registries, and has been the subject of many previous heated debates in the OCI.)

@waynr
Copy link
Contributor Author

waynr commented Oct 15, 2023

https://explore.ggcr.dev/?image=tianon/scratch:zerobytes 👀

Well shoot, I stand corrected again :).

@thaJeztah
Copy link
Member

I was wondering to what extent the spec should describe Content-Length at all, as it's already part of the HTTP specification, and as such would either be repeating that specification (or potentially be incomplete or diverge from it). A quick search showed me that there's already parts of the spec that describe this field though, so perhaps that ship has sailed (maybe those parts should be evaluated though). (All that said; perhaps we should look at adding a section about the scope of the spec, and refer to the relevant HTTP RFCs , to at least acknowledge that where the OCI spec doesn't define details, users must refer to the HTTP specification).

Specifying that this header "SHOULD" be included on its own probably don't fix much, a that doesn't make it required, so clients wouldn't be able to depend on its being there (and still require alternatives).

@sudo-bmitch
Copy link
Contributor

All that said; perhaps we should look at adding a section about the scope of the spec, and refer to the relevant HTTP RFCs , to at least acknowledge that where the OCI spec doesn't define details, users must refer to the HTTP specification.

I'd be a strong proponent of that, and something I'm considering for a significant redesign of the spec (after a 1.1 release).

Specifying that this header "SHOULD" be included on its own probably don't fix much, a that doesn't make it required, so clients wouldn't be able to depend on its being there (and still require alternatives).

Clients shouldn't depend on it, but since the HTTP RFC's specifically call out this field as something that MAY be excluded, this felt like a hint to implementers to say it SHOULD be included.

@waynr
Copy link
Contributor Author

waynr commented Oct 19, 2023

I was wondering to what extent the spec should describe Content-Length at all

To the extent that in order for registries to be compatible with a major client implementation they really do need to include it.

Specifying that this header "SHOULD" be included on its own probably don't fix much,

The alternative to including this in the spec is that the next person to work on a registry implementation has to go through the same pain as I went through last week very tediously inferring the actual behavior of the client (and the client code itself is too disorganized, spread across multiple repos, to really understand what it's doing).

a that doesn't make it required, so clients wouldn't be able to depend on its being there (and still require alternatives).

We could specify that clients SHOULD NOT depend on it being there and clarify that the reason for registries to implement it is that there are popular clients in the wild that do depend on it (even if we fix those clients moving forward, older versions will continue to persist pretty far into the future).

@sudo-bmitch sudo-bmitch merged commit f218235 into opencontainers:main Nov 1, 2023
5 checks passed
@jdolitsky jdolitsky mentioned this pull request Nov 7, 2023
8 tasks
@jdolitsky jdolitsky mentioned this pull request Jan 11, 2024
8 tasks
@sudo-bmitch sudo-bmitch mentioned this pull request Feb 1, 2024
8 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants