Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RequestMetadata: Add an additional_handlers field for additional tool details #306

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

ColdrickSotK
Copy link

@ColdrickSotK ColdrickSotK commented Aug 6, 2024

It is possible for more than one tool to be involved in the sending of a gRPC request, for example when a client-side proxy is used it is interesting to know the name and version of both the proxy and the original client tool.

To facilitate this in RequestMetadata, this PR adds a repeated additional_handlers field containing ToolDetails messages, allowing proxies and other related tools to also be recorded in the metadata sent with requests without modifying the original values.

Copy link

google-cla bot commented Aug 6, 2024

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

Copy link
Collaborator

@EdSchouten EdSchouten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though Protobuf does support recursion, I think we should avoid it if we can, especially in this case where the recursion is linear.

Please make the existing tool details field repeated.

@ColdrickSotK
Copy link
Author

Even though Protobuf does support recursion, I think we should avoid it if we can, especially in this case where the recursion is linear.

Please make the existing tool details field repeated.

I had a memory of some discussion about potentially having a tree structure of tool details in the June monthly meeting, but just using a simple repeated field is fine for my use case.

Is a repeated field wire-compatible with a non-optional scalar field like tool_details, or do we need to deprecate the old field in favour of a repeated one?

@EdSchouten
Copy link
Collaborator

EdSchouten commented Aug 6, 2024

For Protobuf’s binary format it is 100% safe to upgrade a singular field to a repeated one. In fact, a repeated field that only has a single element is encoded exactly the same way as a singular field. When an old client unmarshals a new message that contains multiple elements, the last one is retained. See:

https://protobuf.dev/programming-guides/encoding/#last-one-wins

The only annoyance is Protobuf’s JSON format. At least the Go protojson package is too stubborn to permit parsing scalars as lists or vice versa. But especially for RequestMetadata this should never be an issue. The protocol only documents circumstances in which this message is encoded in binary form, base64 encoded, and then attached to a HTTP header. So especially for that message I’d be far less concerned. Even if gRPC-Web is used, this message in particular is never JSON encoded.

If we want to be super conservative, we could add a separate repeated field and at the same time deprecate the old one. This is also what we did for the multiple digest functions for remote execution in the capabilities response. But again, I don’t think that is worth the hassle.

@ColdrickSotK ColdrickSotK force-pushed the sotk/request-metadata/tool-details-tree branch from c11dd37 to fbf86d2 Compare August 6, 2024 16:03
Copy link
Collaborator

@EdSchouten EdSchouten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but please run the git hook script to regenerate the .pb.go files.

@ColdrickSotK ColdrickSotK force-pushed the sotk/request-metadata/tool-details-tree branch from fbf86d2 to 2de9d44 Compare August 7, 2024 10:52
@ColdrickSotK ColdrickSotK changed the title RequestMetadata: Make ToolDetails a recursive structure RequestMetadata: Make tool_details repeated Aug 7, 2024
@EdSchouten
Copy link
Collaborator

EdSchouten commented Aug 7, 2024

I think it looks good to go. But let’s give it a final round of discussion during the monthly gathering.

Copy link
Contributor

@sluongng sluongng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a breaking change.

Instead of changing the existing field, could you please add a newer field and mark the old one as deprecated?

To clarify, any client/server implementations would need to be backward compatible because their counterpart could be using multiple different versions of remote apis. For example: our RBE server supports a wide range of Bazel versions: from 5.0 to the latest 8.x on HEAD. So we cannot have tool_details being a singular struct in some requests and an array in others.

@EdSchouten
Copy link
Collaborator

So we cannot have tool_details being a singular struct in some requests and an array in others.

Please see the discussion we had above, which specifically discusses the compatibility aspects of Protobuf when upgrading singular fields to repeated fields.

@sluongng
Copy link
Contributor

sluongng commented Aug 7, 2024

Right. But the generated Go code would still flip around between a struct and a slice and thus, is a breaking change for the Go module.

@EdSchouten
Copy link
Collaborator

Right. But the generated Go code would still flip around between a struct and a slice and thus, is a breaking change for the Go module.

Which is completely acceptable, because that's what go mod is supposed to cover. Existing code will continue to build. If people do a go get -u, then they can fix those compilation issues at the same time.

@bergsieker
Copy link
Collaborator

In fact, a repeated field that only has a single element is encoded exactly the same way as a singular field. When an old client unmarshals a new message that contains multiple elements, the last one is retained. See:

https://protobuf.dev/programming-guides/encoding/#last-one-wins

As I read that link, the "last one wins" behavior is only true for primitive types--strings and numbers. For nested messages, as is the case here, the subfields will actually be merged. Merging will almost certainly break whatever metrics/dashboards people have built around this data.

@bergsieker
Copy link
Collaborator

Should we specify any concept of ordering in this repeated field? For example, if we're talking about Bazel plus one proxy, I'd probably want Bazel's information to be in position 0 as the "originator" of the request. I'm guessing that the easiest implementation will just have successive layers of handlers append to the end, but it may be useful to codify that in the API itself.

Combined with the issues of merging fields for legacy clients, I'm wondering if we should keep the existing scalar field as-is, and add a "repeated ToolDetails additional_handlers" that can be populated with anything that proxied/modified the request along the way.

@ColdrickSotK
Copy link
Author

Combined with the issues of merging fields for legacy clients, I'm wondering if we should keep the existing scalar field as-is, and add a "repeated ToolDetails additional_handlers" that can be populated with anything that proxied/modified the request along the way.

It sounds like this might be the most frictionless way forward with this change, I'll update the MR.

@ColdrickSotK ColdrickSotK force-pushed the sotk/request-metadata/tool-details-tree branch 2 times, most recently from 0b190b7 to c47c044 Compare October 3, 2024 10:55
@ColdrickSotK ColdrickSotK changed the title RequestMetadata: Make tool_details repeated RequestMetadata: Add an additional_handlers field for additional tool details Oct 3, 2024
@@ -2141,4 +2141,7 @@ message RequestMetadata {
// There is no expectation that this value will have any particular structure,
// or equality across invocations, though some client tools may offer these guarantees.
string configuration_id = 7;

// The details of other tools involved in handling the request, eg. client-side proxies.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please clarify whether or not this field should duplicate the information in the tool_details field above, and whether there's any ordering requirement (e.g., the order of additional_handlers implies what order the handlers actually executed in).

My personal opinion is that additional_handlers should NOT duplicate tool_details, and that there is no ordering requirement.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with your opinion on this, updated the comment to make this clear.

@@ -2141,4 +2141,7 @@ message RequestMetadata {
// There is no expectation that this value will have any particular structure,
// or equality across invocations, though some client tools may offer these guarantees.
string configuration_id = 7;

// The details of other tools involved in handling the request, eg. client-side proxies.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please clarify whether tool_details is the handler closest to the user or closest to the server. For instance, if the request originates with Bazel and passes through proxy1 and proxy2 before reaching the server, should tool_details be Bazel or proxy2?

My opinion is that it should be Bazel as the originator of the request.

@EdSchouten
Copy link
Collaborator

As I read that link, the "last one wins" behavior is only true for primitive types--strings and numbers. For nested messages, as is the case here, the subfields will actually be merged. Merging will almost certainly break whatever metrics/dashboards people have built around this data.

What's wrong with merging in this specific case? ToolDetails contains two simple string fields. In this case both of them will simply be overwritten.

@bergsieker
Copy link
Collaborator

As I read that link, the "last one wins" behavior is only true for primitive types--strings and numbers. For nested messages, as is the case here, the subfields will actually be merged. Merging will almost certainly break whatever metrics/dashboards people have built around this data.

What's wrong with merging in this specific case? ToolDetails contains two simple string fields. In this case both of them will simply be overwritten.

Ah, you are right. I had misread that documentation as "the string fields will be concatenated." On a second reading that's clearly not the case.

I think there's still some awkwardness with ordering--IIRC there's no guarantee of the serialization order, so servers expecting a singular field could technically end up with either entry 0 or entry 1. It would also be possible to end up with a tool_name from one tool and a tool_version from the other.

@EdSchouten
Copy link
Collaborator

EdSchouten commented Oct 3, 2024

I think there's still some awkwardness with ordering--IIRC there's no guarantee of the serialization order, so servers expecting a singular field could technically end up with either entry 0 or entry 1.

I don't think that's the case. It's literally "the last one wins". It is impossible to serialize Protobuf messages in such a way that list entries are stored in a different order in the byte stream as they were listed in the original array. So a legacy server will have tool_details.${field} be equal to a client's tool_details[n].${field}, where n is the highest value where the resulting string is non-empty.

It would also be possible to end up with a tool_name from one tool and a tool_version from the other.

Yes, but this only happens if the last entry/entries leave one of tool or tool_version unset. Because then it will use the one from the entry before, or before that one, or [...]. Not ideal, but arguably also not the end of the world. In my opinion it's a better tradeoff than needing to hold on to historical baggage.

It is possible for more than one tool to be involved in the sending of a
gRPC request, for example when a client-side proxy is used it is
interesting to know the name and version of both the proxy and the
original client tool.

To facilitate this in RequestMetadata, this commit adds a new repeated
`additional_handlers` field to the message which can contain details of
other tools involved in the handling of a request.
@ColdrickSotK ColdrickSotK force-pushed the sotk/request-metadata/tool-details-tree branch from c47c044 to 43408cc Compare October 4, 2024 12:44
@ColdrickSotK
Copy link
Author

It would also be possible to end up with a tool_name from one tool and a tool_version from the other.

Yes, but this only happens if the last entry/entries leave one of tool or tool_version unset. Because then it will use the one from the entry before, or before that one, or [...]. Not ideal, but arguably also not the end of the world. In my opinion it's a better tradeoff than needing to hold on to historical baggage.

It's not the end of the world, but it seems like the kind of bug that makes another bug harder to investigate. And this merging behaviour means that when clients using the repeated field are talking to a server using the old proto, the most useful information in the chain (ie. the original invoker) gets lost. Granted I doubt that many use-cases will ever end up in that usage pattern.

I like the idea of having a separate field for additional related tools; it keeps this as strictly an additional feature, without impacting existing workflows. It also seems to map well onto the intended usage to me, the tool_details field specifies details of the most relevant tool to the actual request, with the new field being additional supplementary metadata to track exactly where the request has been.


// The details of other tools involved in handling the request, eg. client-side proxies.
// This field MUST NOT include the details of the tool which initially invoked the gRPC
// request (ie. it MUST NOT duplicate `RequestMetadata.tool_details`). The order of
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "i.e., it MUST NOT" (note punctuation of "i.e.,")

// client -> proxy1 -> proxy2 -> server
//
// `RequestMetadata.tool_details` SHOULD contain a ToolDetails message about `client`,
// and `RequestMetadata.additional_handlers` MAY contain two ToolDetails messages,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that both tool_details and additional_handlers probably both SHOULD (not MAY) contain the required information. Technically all of this is optional, in an optional header, but within that scope I don't see a good rationale for making one SHOULD and the other MAY.

Copy link
Collaborator

@bergsieker bergsieker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two small comments and then I'm good with this, pending approval from other reviewers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants