-
Notifications
You must be signed in to change notification settings - Fork 114
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ETag validity on composite resource #157
Comments
Do you mean This is a very valid point. What about computing a composite ETag? cc @smyrman |
@rs Composite ETag can't be cached in the DB, so we need to recompute it on the fly, compare with the supplied ETag and return 304 if they match. I guess if we have some context ( but not Go context.Context) exported in the request, and we put some important values there to let the following middle-ware functions make decisions based on them. |
Okay, I think now I am understanding why middle-ware path is not considered here. In Go |
Yes. That’s why you can set a custom ResponseSender on Computing a checksum of all the Etags shouldn’t be too expensive. |
@rs Thanks, then custom
What do you mean by this? I thought the ETags are check-sums themselves. My thinking is computing ETag for composite and list resources will be the right way ( and is cheap, compared to network latency), and this might be done in the |
Etag is an opac string. We can combine all etags of the (sub-)resources using the hash algo of our choosing to create a composite etag. Why do you want to fix this issue outside of rest layer? It is a real problem that need to be addressed. |
Well, now that it is clear we can't make post Now, why would you want to combine all ETags, is it speed the only concern? Wouldn't be the same if you only calculate ETag (md5 hash) to the composite resource response? Combining multiple-ETags seems more work. |
Summing n string is less expensive than summing n maps with arbitrary number of fields (and subfields) stored as Functionally, I agree, both approaches are equivalent. |
Just looked at the code. Accessing the Etag during projection evaluation might be tricky actually. I think We have the same problem with |
What summing algorithm do you propose for Etags? Append all Etags and then do final md5 on the whole byte array, or XORing them all? |
I would go for md5. The xor route would be interesting but would require to either guarantee all Etags are same length or use padding. All rest-layer Etags are currently same size but it could change in the future or we may want to support pre-existing Etags with different format. |
Will the md5 hashed Etag be set in the header, while the raw MD5s are made available as One -case to be aware of:
|
One solution would be to have a composite Etag with a separator like |
I think it is best if each resource brings its own |
I’m not a fan of poluting the payload with metadata. |
Okay, but we are polluting the payload in |
What’s wrong with the composite etag described above? |
There is one particular use-case, where you get I understand that all this is optional, so maybe the best place to do this formatting is in the |
I suppose we have three or four cases for the composite e-tag:
I suppose this will let you update the root-resource without a data race, but not an embedded sub-resource, e.g. if you want to fetch all resources through one big request with embeds, and update individual resources later without a re-fetch.
Neither am I. However, as @Dragomir-Ivanov points out, the |
Btw, making the header values e-tag of format |
@rs Any thoughts on "polluting" the sub-resource payload with |
I have no better solution to propose so let’s do that :/ |
I was tinkering with the |
I suppose that means that rs/rest-layer-mongo#20 would need to change to a compatible format?
Have you actually observed this? From me just skimming the code, it looks like the order is pretty much guaranteed (unless they come in a random order from the Storer layer). Expanded references are inserted at a given map field location, so evaluation order should not matter.
Connections are inserted at a given slice index:
|
I suppose it is true that they can come from the DB at different times, but does that matter? |
Maybe I am not understanding the code enough, but for composite resource's ETag we will need to hash with MD5: root resource ETag, sub-resource A ETag, sub-resource B ETag, etc. We will need to apply the MD5 hash in the same order, for us to have the same composite-resource ETag. We will need ether store&compute this hash after all sub-resources have been obtained from storage, or use algorithm where ordering doesn't matter (XOR). |
Ok, I follow you now. I was assuming you would do the first, compute the hash after all sub-resources have been fetched. If you want to do the incremental hash while you are fetching resources, then yes, you would need to use another algorithm than MD5. |
I was thinking about the order being consistent in the final JSON response, not in terms of evaluation order, which is what you ask for -- sorry for the confusion. |
In the light of this, xor is the way to go. We can handle different sizes using padding when necessary. |
Before proposing any code, lets try to agree on a way forward with this. This is what we currently have in
So we are already polluting the item, and there is no mention we wanted Now what if, when PUT/POST an item we check for HTTP Header What about requesting Then for composite ETag, if I will stop here for now, and we can discuss |
Why a new query param and not just a projection field? |
Can I recommend that we splitting the issue in several parts to make it easier to solve? Part AI think the first problem to solve is
@rs idea to prefix with an MD5 could help resolve this regardless if sub-resources are polluted with
E.g. For the list view, we change the implementation to let the header E-tag be a MD5 sum of the entire payload (also in case of HEAD, which means we must do more work for HEAD requests). This probably requires some interface changes, like moving Part B
We create a new issue to optionally include DB |
Part A is the original issue basically... |
+1 I would put the db etag first in your example. |
Guys, I was hoping we would focus on the @rs In order to achieve @smyrman I fail to understand what |
Actually neither, it helps with better cache miss rate! If you don't have either E-tag or no-cache headers, as you have discovered with clients, you get fake browser cache hits when you have embedded items. E.g. in our case where we have a "dashboard"-like resource with multiple "widgets" as sub-resources. If we add or remove or re-order "widgets" on the dashboard, the dashboard won't correctly refresh without deleting cache in the browser because the browser think it's up to date. The composite E-tag with both the db-Etag and MD5 sum of the full payload would make the browser |
Actually I am advocating in case of Lets focus on the |
@smyrman I think that two-part ETag is unnecessary. Perhaps if you can give me some concrete example, I can find it useful. |
With the two-part ETag you keep the opportunity to use the db etag part to perform conditional writes. |
For the itemList case, you just need an MD5 sum of the full Json payload (no db E-tag, as there is no relevant one to use), and everything should just work, so that's a good place to start. The two-part is only valid for the detail view for now (when ID is given in the URL), and, as @rs mentiones, you can then do conditional writes with the same E-tag with the root resource, e.g. if I want to PATCH the "description" field of my "dashboard" only if nobody else have done so since my last read. |
@rs I am also talking about the same thing, however I fail to see why:
Keep in mind, that we already have polluted @smyrman There might be very well the case, when one extract a list of items, then wants to update one or more items from the list, without any re-fetch. So there are cases where |
Let's imagine you have the following projection {
"title": "A title",
"description": "Some description",
"author": {
"name": "John Doe"
}
} Because author is a sub-resource, you can't just give the root (db) etag, you have to add the content of author so if the author name change you get a different etag. So as defined, you would MD5 the whole response instead of trying to get individual etags, which by itself, could be your final etag. But now you want to PATCH the description in a CAS way. Normally you would use the Etag, but if it is an MD5 of a certain projection, we lost the ability to perform conditional writes. So using If we return both the root (db) etag and the MD5 of the response like Said otherwise, we don't break one existing feature to fix the caching. |
I totally agree with this, however what I am saying is this. Following the already implemented logic in the
Trying to update/PATCH the All I am proposing is to extend the current logic we have for Browser caching will not be compromised, because the HTTP ETag header will be fingerprinf of the whole HTTP body ( root + sub-resources ). Don't you think that is more flexible, as one can patch either to |
Do you think it is practical to use the etag of a projected sub-resource to update it? In your example you don't even have it's ID and you would have to guess its path. The item list use-case is already kind of stretched, but at least it's part of the same resource. I think the main use-case is to use the standardized etag to perform conditional mutations without having to pick the etag into a custom field of the payload. |
I vave haver disputed that, but this is already implemented. I am all for embedding more For this issue there is two cases:
The example from @Dragomir-Ivanov might not be complete, but if you embed the full sub-resource including the ID, then having the E-tag would be useful. Anyway I suggest we create a new issue / PR to address embeding of _etag in subresources, and to better control the inclusion of them in the payload, because I think this requires a bit more thought-process, not to mention the code changes. I am not against it, if I come across that way. Does this make sense? |
@smyrman The first post in this issue is about ETag on a composite resource, so I personally think this thread is good for discussing that matter. There is no code yet, nor PR result from this ( but I am willing to do the necessary changes when we settle on a solution ).
When supplying the I understand your concern about not braking the API usage, but if we are to brake something, better be before v1.0. In a single root item case(no sub-resources), we can have the HTTP response ETag header to be equal to the item's |
Keep in mind that in PUT/PATCH case, the user will need to set the |
Sure, but the HTTP spec says that you are supposed to use the value of the Again, I don't like having |
@rs That was my idea the whole time. We should make this Now are you talking about single resource without sub-resources or with sub-resources.
|
In the ResponseFormatter/SendFormatter this is currently an impossible call to make, as the peojection is already applied. To keep both simpler code and a consistent user experience, it might be better to always go for case 2 and use the two-part ID. |
That breaks the spec, does it not? |
Yes, maybe it breaks the spec. It will be ETag: "[root item tag]-[body md5 sum]". |
It does not break the spec and is not confusing to the user. The fact that it’s a composite etag is opaque to the user. The user still send the full etag, it is rest layer which choose to only use the first part for mutations. |
And I agree with @smyrman, we should start by just fixing the caching bug using composite etags. Let’s handle etag switch in a different issue. |
Sorry, I was only reffing to the quotes around both pars - we should have only one set of quotes. |
Lets suppose we have this request:
If I execute this in a row, first time I get 200 then 304, which is as it is supposed to be.
But then I change some field values of
services
andvisits
behind the scenes.Then retry the query above, I still get 304, although the result of the query IS CHANGED.
ETag comparison has its place on updating concrete resources, but I think we should leave generation of ETags for compound resources, and for resouseLists to be generated in a middleware function executed after
rest-layer
.The text was updated successfully, but these errors were encountered: