Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FMWK-459 Fix requiring id property from entities by AerospikeCache #752

Merged
merged 24 commits into from
Jun 20, 2024

Conversation

agrgr
Copy link

@agrgr agrgr commented Jun 10, 2024

No description provided.

@agrgr agrgr added the bug label Jun 10, 2024
@agrgr agrgr requested review from roimenashe and reugn June 10, 2024 10:35
@roimenashe
Copy link
Collaborator

roimenashe commented Jun 10, 2024

@agrgr As we talked about, I'm not sure we want to allow it - if new versions of Spring Data Aerospike doesn't allow to store/read a POJO without @id field why would the Spring Data Cache be any different?

Also according to AerospikeCache's code we are always treating the given key as an Aerospike Key so it doesn't support cache by bins or basically anything other than the Aerospike Key itself (it probably shouldn't).

I don't think this change makes much sense, we need to ask the guys who requested this change for their motivation - if we missing a validation or accidentally broke the API that's a different story and can be fixed in a different way (possibly a simple upgrade to latest) but I think we shouldn't include this change and keep validating @id when interacting with Spring Cache (otherwise providing a none PK field would still fail but with less self-explanatory error).

@agrgr
Copy link
Author

agrgr commented Jun 10, 2024

I don't think this change makes much sense, we need to ask the guys who requested this change for their motivation

Asked them on GitHub, let's freeze this PR for now and wait. We will also continue discussions in the meantime.

@roimenashe
Copy link
Collaborator

@agrgr After consideration, let's go with this approach which says Spring Data and Spring Cache are separated and we are not enforcing schema annotations in Spring Cache (in this case @id) and set the key from the given key directly. Once you change this PR to "Ready for review" I'll review and let's merge

@agrgr
Copy link
Author

agrgr commented Jun 10, 2024

Added using plain hash codes from Object, let's discuss whether to use cryptographic hash functions or external libraries etc. to approach the balance of computational complexity and collision resistance.

@agrgr agrgr marked this pull request as ready for review June 11, 2024 11:18
@roimenashe
Copy link
Collaborator

@agrgr Few questions about the new hashcode change:

  1. Do we use hashcode on every key, even simple supported keys (currently String but potentially integer and byte[] in the future) according to the code - it looks like this is the case, why? Isn’t it better to store the key as the original key (e.g. String) if it’s not a compound key (combination of multiple method params)
  2. This introduce a breaking change, if we are going with that approach I think we should consider adding this in the next major release of Spring Data Aerospike, existing keys will be unreachable and the same record will be inserted as a new key:
    1. If existing record doesn’t have ttl (0 = use server's default = no expiration) they will be kept in the database as zombies (no one will read them and no one will delete them)
    2. It will trigger a read on the main datasource which will create a spike after the upgrade)
  3. What is the main motivation of switching to hashcode? Less likely to collide? same “string” values are considered as the same record and might be ok to collide from the user’s perspective (considered that same record, meaning read from the cache if exists), if the user overrides hashcode/equals basically the same object with the same values could be considered as 2 different objects which we do not necessarily want (will be read twice from the main datasource, inserted twice to Aerospike's cache instead of a second cache read).
  4. It's also worth mentioning that:
    1. Hashcode is harder to debug (can’t see the actual key value in the database/AQL/IDE), you can’t reverse hashcode to the original value, you have to get the original value convert to hashcode and compare (when sometimes you don’t know the original value, that’s what you are looking for) -> a bad user experience
    2. Digest is always a fixed size of 20 bytes, hashcode will result in less storage (in record’s metadata) only if sendKey is true (which is the default in Spring Data but not always the case)

@agrgr
Copy link
Author

agrgr commented Jun 16, 2024

Few questions about the new hashcode change:

  1. Added using Kryo serialization (which is optimized for performance) for every key which now makes the process unified and platform independent, and faster than standard JVM serialization. It can be configured for the particular needs. The result of serialization is hashed using SHA-256, again, for consistency (e.g., for distributed cache systems when different computers write cache).
  2. Yes, it is a breaking change.
    2.1. There can be zombie records after upgrade, however, it is a one-time event, and cache likely has mechanisms of regular cleaning.
    2.2. Yes, there likely will be one spike in creating new records after upgrade.
  3. The main motivation of switching to serialization and hashcode is high consistency on different platforms, less collisions and faster speed (needs benchmarks). The format and purpose of the toString() method implementation in Java are platform-independent, however, the actual value produced by toString() can vary between different JVM instances or runs of the same program due to the nature of hashCode() and memory allocation differences.
    4.1. Not sure there is a situation where the actual value of cache key must be human-readable. Besides, with the current approach a customer can use our API to test what cache key they are going to get.

@roimenashe
Copy link
Collaborator

roimenashe commented Jun 17, 2024

@agrgr Are we sure we want to introduce lz4-java to the project? last commit was more than a year ago and the latest release 1.8.0 is from 2021, it reminds me the blockhound library where we had troubles with it but no one maintained it so eventually we stopped using it, and also it may introduce security vulnerabilities.
Is there a better - more popular and actively maintained library for fast serialization/compression?

@agrgr
Copy link
Author

agrgr commented Jun 18, 2024

Is there a better - more popular and actively maintained library for fast serialization/compression?

@roimenashe, as far as I can say, it actually is the most popular vulnerabilities-free (at least, at the moment) xxHash library for Java. Despite the latest release in 2021, it does not have notifications about stopped support, and the latest commit is a year ago. Besides, we would be able to replace this implementation with another one if needed as AerospikeCacheKeyProcessor is designed to return AerospikeCacheKey which is implementation-agnostic (can take a String or a long number).

It looks like Apache Commons has 32 bit xxHash implementation, however, it has no 64 bit, and it is not mentioned on xxHash page.

Let's discuss, theoretically we can introduce something else instead of very fast xxHash (like Murmur3), however, not sure whether the fact that the library had latest release 3 years ago is decisive.

@agrgr
Copy link
Author

agrgr commented Jun 18, 2024

Is there a better - more popular and actively maintained library for fast serialization/compression?

@roimenashe, replaced hashing with Apache's Murmur3 128 bit. The library looks up-to-date, they have a recent release.

@agrgr
Copy link
Author

agrgr commented Jun 20, 2024

Tested overriding AerospikeCacheKeyProcessor bean with a custom implementation in a demo project

AerospikeCacheConfiguration defaultCacheConfiguration) {
this(aerospikeClient, aerospikeConverter, defaultCacheConfiguration, new LinkedHashMap<>());
AerospikeCacheConfiguration defaultCacheConfiguration,
AerospikeCacheKeyProcessor cacheKeyProcessor) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding AerospikeCacheKeyProcessor to both constructors introduces a breaking change (the user needs to add this param when defining AerospikeCacheManager bean).

We probably need to update the "Caching with Spring Boot and Aerospike" blog posts:

  1. https://medium.com/aerospike-developer-blog/caching-with-spring-boot-and-aerospike-17b91267d6c
  2. https://aerospike.com/blog/caching-with-spring-boot-and-aerospike/

Comment on lines +11 to +12
public class AerospikeCacheKeyProcessorImpl implements AerospikeCacheKeyProcessor {

Copy link
Collaborator

@roimenashe roimenashe Jun 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like this class is a singleton so why not use a bean by marking this class as @Component, to execute configureKryo() on initialization we can use @PostConstruct or an alternative

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sense here is this class is an implementation of AerospikeCacheKeyProcessor. If users want to use a different implementation, they can do so by overriding AerospikeCacheKeyProcessor bean

/**
* Wrapper class used for caching with methods that receive either a String or a long number
*/
public class AerospikeCacheKey {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why only String and long? if the purpose is to align with Aerospike supported PK types - "blob" (byte[]) should be supported as well: https://aerospike.com/docs/reference/limitations

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a wrapper class that receives hash of the cache key. However, we can also support byte array, added it

@agrgr agrgr merged commit 2d2af38 into main Jun 20, 2024
8 checks passed
@agrgr agrgr deleted the FMWK-459-cache-entities-without-id branch June 20, 2024 11:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

AerospikeCache requires ID property on entity even though key has already been provided
3 participants