Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ability to disallow repeated keys in CBOR #2681

Open
wants to merge 1 commit into
base: dev
Choose a base branch
from

Conversation

timmc
Copy link
Contributor

@timmc timmc commented May 20, 2024

Fixes #2662 by adding a visitKey method to CompositeDecoder; map and set serializers should call this so that decoders have an opportunity to throw an error when a duplicate key is detected. Calls to this method are added to:

  • MapLikeSerializer.readElement to handle maps
  • CborReader.decodeElementIndex to handle data classes

A new config option Cbor.forbidDuplicateKeys can be set to true to enable this new behavior. This can form the basis of a Strict Mode in the future.

* duplicate keys are encountered.
*/
@ExperimentalSerializationApi
public fun visitKey(key: Any?) { }
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not happy with this name but couldn't think of anything better -- suggestions very welcome. :-)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking duplication of keys should be more like a kind of common logics, and should be provided via a well-implemented utility class, rather than a function in top-level interface for each format to implement. IMO, most of them will be just some copy-paste.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xiaozhikang0916 The challenge is that the behaviour needs to be configurable. There is no central configuration mechanism in the library, only formats and SerialDescriptors. There is also the need to do this in an ABI/API compatible way. Formats already need to do various things to play nice/support the various library features. This would just be another one. It also makes sense as various formats have their own duplication semantics.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I initially tried plumbing the information through on the deserialization layer and it went very badly. This is only the start of what would be required: dev...timmc:timmc/all-strict (although in practice it would be a config object, not a single boolean -- this was just a failed proof of concept.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean the duplication check in other formats ( if they need ), would share some common logic, like holding a set and putting keys in it. These common logic can be extracted to a utility class.


On the other hand, I am against adding this function in Decoding interface. It is implemented by each format with different behaviours, and only for internal using, which means not public, and users writing custom serializers should never call this function.

IMO, you may probably leave some instructions for format developers about how to achieve duplication check, rather than a public default funtion in this base interface, and make it open for users.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be possible to circumvent the MapSerializer if you have a map specific decoder. This decoder could intercept the decoding of the key, and then check there (before the result is returned to the serializer).

Copy link
Contributor Author

@timmc timmc Nov 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to implement whatever is needed, but I don't have a good understanding of what would be acceptable to the maintainers. It sounds like visitKey isn't well-liked, and the options might be:

  • A. Each format that wants a duplicate-detection option provides its own replacement for a MapLikeSerializer-like serializer, fully reimplementing the logic. (This would be done via a serializers module, right?)
  • B. Copy MapLikeSerializer to a new utility class NoDuplicatesMapLikeSerializer and add the duplicate detection; formats use this in their serializers module if desired.
  • C. Like B, but just change MapLikeSerializer to make readElement non-final so that it can be overridden by classes like HashMapSerializer (or copies of them) rather than copying the implementation.
  • D. Add some other extension point to CompositeDecoder that is either special-purposed for duplicate key detection or is more abstract and allows for a wide range of behaviors (including duplicate key detection).
  • E. Change the map builder to reject duplicate keys (haven't looked into feasibility here)

Which of these seems most palatable? Are there other possibilities?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One concern I have with anything that involves the serializers module is that it risks interference with application-configured serializers. Each relevant format would have to document warnings like "if you specify an alternative serializer for maps, you'll need to reimplement duplicate key detection". Having it be performed in shared code (i.e. MapLikeSerializer) would at least allow the warning to recommend subclassing that shared code.

import kotlinx.serialization.DuplicateMapKeyException
import kotlin.test.Test

class CborStrictModeTest {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: Add a (passing) test for duplicate key detection when deserializing a data class. I have a test locally that fails -- I think I need to add visitKey call to some inlined code somewhere, but I haven't found the appropriate callsite yet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I need to modify the serialization plugin's code generator in order to do this! Where does that code even live?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@timmc The plugin code lives in the main Kotlin repository, but you don't need to go there.

I suppose you mean to detect multiple keys in object serializiation (like this Json { "a": "something", "a": "something else" }). To support that your format will need to plug in to decodeElementIndex and mark/record seen indices. Note that for some formats collections can be validly expressed as repeated key/value pairs (protobuf does this), so depending on how CBOR works you may need to do a slightly more extensive check.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. I had previously tried modifying decodeElementIndex and wasn't getting the results I expected. Maybe I'll take another crack at it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh! I see, the decodeElementIndex in CborReader. Got it.

@timmc timmc mentioned this pull request May 20, 2024
@timmc timmc force-pushed the timmc/cbor-strict branch from 4f56296 to d5b4bba Compare November 30, 2024 04:38
Fixes Kotlin#2662 by adding
a `visitKey` method to `CompositeDecoder`; map and set serializers should
call this so that decoders have an opportunity to throw an error when a
duplicate key is detected. Calls to this method are added to:

- `MapLikeSerializer.readElement` to handle maps
- `CborReader.decodeElementIndex` to handle data classes

A new config option `forbidDuplicateKeys` can be set to true to enable this
new behavior. This can form the basis of a Strict Mode in the future.
@timmc timmc force-pushed the timmc/cbor-strict branch from d5b4bba to d14f488 Compare November 30, 2024 14:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants