Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(sync): Let the clients set layers UUID #2233

Open
wants to merge 10 commits into
base: master
Choose a base branch
from
Open

Conversation

almet
Copy link
Member

@almet almet commented Oct 22, 2024

This make it possible to synchronize datalayers before their creation on the server, allowing at the same time to solve issues related to them not being saved (e.g. duplication of geometries)

This does the following:

  • Introduce a new datalayer._future_uuid property, used before the server actually saves the datalayer ;
  • Add this information to the metadata used for syncing ;
  • Change the default route for datalayer creation to contain an UUID ;
  • Use this UUID as the source of truth when saving the datalayer on the server.

default=uuid.uuid4, primary_key=True, serialize=False, unique=True
),
),
]
Copy link
Member Author

@almet almet Oct 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm actually unsure this migration is needed, but this is what I got after updating the model. I'm curious about your feedback on this ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess because we've set it to editable ? But I'm not sure yet why we needed this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we needed to make it editable? Or why the migration is required?

On the former: My understanding is that we need to make it editable, otherwise it's not possible (to my knowledge) to have it provided by the caller on creation. I'll actually try another time to do it without editable=True, as it seems better indeed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I thought editable was mean to allow or not to pass it through the form, but I may be wrong, so you'll tell us the truth soon :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That link says:

If False, the field will not be displayed in the admin or any other ModelForm. They are also skipped during model validation. Default is True.

(My goal was to actually not skip the validation)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we let the DB validate it ? As it will, at the end, no ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it's safer to check the inputs before passing them to the DB 💇

@almet almet marked this pull request as ready for review October 23, 2024 14:44
@@ -61,6 +61,13 @@ export class DataLayer {
this._isDirty = false
this._isDeleted = false
this.setUmapId(data.id)

if (!this.umap_id) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we reuse createdOnServer here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We now can, this is actually a leftover from a previous iteration, where I couldn't actually rely on createdOnServer. Let's change this!

@@ -36,7 +36,7 @@ const LAYER_MAP = LAYER_TYPES.reduce((acc, klass) => {
}, {})

export class DataLayer {
constructor(map, data) {
constructor(map, data, sync, future_uuid) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get why sync is needed here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, at this time, I did not understand the need of future_uuid, can you elaborate a bit on why we can't just have an id/uuid ?

Copy link
Member Author

@almet almet Oct 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not understand the need of future_uuid / why we can't just have an id/uuid ?

We need a way to distinguish between three different scenarii here:

  1. Creation happened locally, but is not yet synced on the server ;
  2. Creation is coming from a remote peer ;
  3. Creation is coming from an already saved map, with existing layers.

future_uuid is meant to solve the first two, when a layer is created locally on the map, but the server isn't yet aware of it. In this case, we need to decide on the uuid locally, but we need a way to know that the server doesn't know yet about it, so we can ask it to save. That's the purpose of this field.

In an earlier iteration, I tried to track the state of the server by using the version info provided by the server, but I believe this design is better because it's more explicit that the uuid isn't yet saved. (Also, the other way of doing it didn't cut it because we weren't able to differentiate between a future UUID and an already existing one)

don't get why sync is needed here.

Ah :-) sync is a flag to tell you if you need to replicate the creation on other peers via the sync protocol. It is not used at the moment, but it is passed by map.createDataLayer(). I can remove it!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can remove it!

Thanks! I'd prefer the sync attribute to stay outside data classes as much as possible

  1. Creation is coming from a remote peer

Isn't actually what the sync attribute is for ? Or otherwise I don't get why we need to distinguish this case from the others (what do we need to do/prevent for this case ?).

I'm in favor of the goal of this PR, which is to move the uuid creation from the backend to the frontend. Doing so, I'm expecting that the front client that creates a new datalayer is the source of truth, and both other clients and server should follow.

And even for the other cases, it's not clear to me why we need to distinguish them. For me, always doing a PUT to the server would be good (as we are the source of truth when we save now), and then when creating a datalayer with data coming from the server, I'd expect that the server only act as a storage, so as soon as the datalayer is created on the frontend, it behave like any datalayer.

I feel like I'm missing something obvious, but I can't find it ;)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what I did in this PR is one way to do it. It's not perfect, but it works without adding complexity to the overall design.

Isn't actually what the sync attribute is for ? Or otherwise I don't get why we need to distinguish this case from the others (what do we need to do/prevent for this case ?).

Sort of, but sync is missing part of the info I need: the fact that it's not saved yet on the server. This is not provided by the sync property, as it's just saying that it's coming from another peer. It doesn't say if that peer has saved the data yet (or not).

I think that _future_uuid is an elegant way to solve this: it stores the uuid so it's possible to pass it to the server on save, and and the same time it's telling us that this datalayer hasn't been saved yet on the server (info that was provided before by looking at this.umap_id !== undefined.

And even for the other cases, it's not clear to me why we need to distinguish them. For me, always doing a PUT to the server would be good (as we are the source of truth when we save now), and then when creating a datalayer with data coming from the server, I'd expect that the server only act as a storage, so as soon as the datalayer is created on the frontend, it behave like any datalayer.

I agree that it would be the ultimate goal to merge the two /create/ and /update endpoints.

I will have another look at this tomorrow, to see if it's possible to merge the two DataLayerCreate and DataLayerUpdate classes together. The reason I didn't do it yet, is because it feels to be a bigger step.

With that being said, having these two endpoints merged in one might not remove the requirement of the future_uuid field, as we will still need to communicate about the datalayer (with other peers) before it's saved to the server, and convey the fact that it's not saved yet.

it's not clear to me why we need to distinguish them.

Why do we need to know that it's not saved yet? For the same reasons we check if umap_id is defined or not: because if it's not saved we don't make a request to the server asking for it. Instead, we create it.

I'd expect that the server only act as a storage, so as soon as the datalayer is created on the frontend, it behave like any datalayer.

I agree, and I believe that's what's happening right now in this PR 🙌

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I still don't get the _future_uuid pattern :(

But I'm certainly missing the problem we are trying to solve.

For what I understand, I'd:

  • make any peer always PUT (when dirty, of course)
  • on reset/delete, always send a DELETE to the server, and don't care if it's a 404 (or rely on reference_version if we want to be explicit, but I'm not sure it's worth it)

I'm also tempted to not sync to other peers the datalayer unless it's persisted (if client A creates a datalayer without saving it, then sync, then client B adds data in this datalayer without saving, then client A cancel the datalayer creation, what happen to client B data ?).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also tempted to not sync to other peers the datalayer unless it's persisted (if client A creates a datalayer without saving it, then sync, then client B adds data in this datalayer without saving, then client A cancel the datalayer creation, what happen to client B data ?).

The delete of the datalayer will also be synced, so we're good here 👍

Copy link
Member Author

@almet almet Oct 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I'm certainly missing the problem we are trying to solve.

The problem is: we want the ability to sync layers creation and property update before server-save.

It's not at odds to what you're proposing here:

For what I understand, I'd:

  • make any peer always PUT (when dirty, of course)
  • on reset/delete, always send a DELETE to the server, and don't care if it's a 404 (or rely on reference_version if we want to be explicit, but I'm not sure it's worth it)

As I said yesterday I want to have another look at how to merge the two views together, to always do a PUT.

The only bit that stays in the way is the _future_uuid field, as merging the two views won't make the _future_uuid concept go away.

The current code looks at dl.umap_id == undefined in a few places and uses this information to act differently pre-save. I believe these cases to be here for a reason, and _future_uuid is just a way to not get in the way of it. We could do this with another field like not_saved_on_server or similar, but that will stay conceptually the same.

set _reference_version(version){
console.debug("set _reference_version", version)
if (version !== undefined) {
this.__reference_version = version
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason we use double underscore here ?

Copy link
Member Author

@almet almet Oct 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a need to react to the change of the _reference_version, to be able to unset the _future_uuid property.

Specifically, this is useful when receiving a (potentially new) reference version from another peer (this is a signal that a save has been triggered, and as such everybody needs to know that the old "future uuid" is now the "real uuid".).

That's why I'm using a setter here. the double underscore is to avoid infinite recursions!

map_id: this.map.options.umap_id,
pk: this.umap_id,
pk: this.umap_id || this._future_uuid,
created: this.createdOnServer,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of passing created, I'd better make here a condition and ask for datalayer_update or datalayer_create depending on it

@@ -563,4 +563,9 @@ export const SCHEMA = {
type: Object,
impacts: ['data'],
},

_reference_version: {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On my understanding, this property does not stand in the json metadata, and I understand the schema as a (simple to be refined) representation of it.

Copy link
Member Author

@almet almet Oct 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we now check that the properties are part of the schema before applying them in the updaters, the replication of the _reference_version field wasn't, well… replicated 🙄 .

As a solution, I added it to the schema. My way of thinking about the schema isn't only a representation of the JSON metadata, it's a way to tell how the application should react when a property is updated on the map / datalayer.

As such, some properties might not belong to the JSON files, but still need to trigger changes. I believe we'll need to set properties specific to the sync.

I see different ways to solve the same problem:

  • Let them in the schema here, and mark them with a specific thing, to tell they are sync-related.
  • Create another schema, with these specific fields
  • My favorite: special-case these in the updaters. After-all, it's part of the sync protocol, so it could be done there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or we may actually put the reference_version in the json instead of in the headers ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not ! As this might cause some other problems down the line, is it okay to ship this as-is, and iterate on the subject in a later PR?

@@ -1,7 +1,7 @@
import * as Utils from '../utils.js'
import { HybridLogicalClock } from './hlc.js'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment unrelated to this PR, but this "biome noise" is a bit of a pity, while the main scope of biome in my understanding is to remove all formatting noise from PR. How is it that this passed the CI ? @davidbgk any idea ?

@@ -31,8 +31,8 @@ class BaseUpdater {
}
}

getDataLayerFromID(layerId) {
if (layerId) return this.map.getDataLayerByUmapId(layerId)
getDataLayerFromID(layerId, future_uuid) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we totally replace the layerId by the uuid ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, the layerId is an UUID already. Here, we have potentially two different UUIDs. These are from calls in different times:

  • A layerId is provided when the updated property of the layer is from a layer that's already saved on the server
  • A futureUUID is provided when we're dealing with yet-to-be-saved layers.

I believe we need a way to tell them apart, and we also won't look for them the same way (see getDataLayerByUmapId, which is looking using dl._future_uuid if it's provided, otherwise dl.umap_id)

upsert({ value, metadata }) {
console.log('upsert', value, metadata)
// Upsert only happens when a new datalayer is created.
this.map.createDataLayer(value, false, metadata.future_uuid)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I would expect the uuid to be in the value, not separate.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could also be in the value, but since it's not yet saved on the server I believe it's better to keep it this way. After all, metadata is that: a way to identify the data. Once it will be saved, then it will be part of the data (and also be sent in the metadata)

@almet
Copy link
Member Author

almet commented Oct 24, 2024

Just so things are clear in terms of who does what: @yohanboniface will be taking over the work on this PR, and merge it once it's ready.

Just noting a few tasks that still need to be done:

  • Check if the editable=True is required on the model uuid field
  • Try to merge Django views for datalayer create and update
  • Remove console calls
  • Add a functional test checking if layers are synchronised (e.g if points belong to the proper layer on all peers)
  • Squash "fixup" commits together (with git rebase -i master)

almet and others added 9 commits October 24, 2024 12:39
This make it possible to synchronize datalayers before their creation on
the server, allowing at the same time to solve issues related to them
not being saved (e.g. duplication of geometries)
Co-authored-by: Alexis Métaireau <[email protected]>
Otherwise, the test is sometimes flaky and fails.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants