Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement optimistic get_path() in LocalFileIdManager #38

Merged
merged 1 commit into from
Oct 28, 2022

Conversation

dlqqq
Copy link
Collaborator

@dlqqq dlqqq commented Oct 28, 2022

  • removes autosync_interval trait
  • removes sync_all() from public API
  • adds test to ensure _sync_all() isn't called in best case (no files moved out-of-band)
  • closes optimistic get_path() #27

- removes autosync_interval trait
- removes sync_all() from public API
@dlqqq
Copy link
Collaborator Author

dlqqq commented Oct 28, 2022

@kevin-bates I wanted you to see this since you were the original person who had brought this proposal back from the dead. Thank you again for reminding me to tackle this. This is a huge win and it would likely have been forgotten if you didn't bring it up. 😁

Feel free to leave some review comments, but I'll have to merge and cut a release now. RTC is expecting a file ID release very quickly so they can have lead time to find any issues with the current implementation.

@dlqqq dlqqq merged commit 328d893 into jupyter-server:main Oct 28, 2022
path, ino = row
stat_info = self._stat(path)

if stat_info and ino == stat_info.ino and self._check_timestamps(stat_info):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dlqqq - I think the requirement that timestamps must match (both here and below) is too strict and matching inodes should be sufficient, otherwise, a file that is modified out-of-band (e.g., touch my_notebook.ipynb) will orphan all of its external referencing object (comments, etc.).

The code currently makes a statement that the following scenarios are the same but takes the side of scenario A (which is far less likely than scenario B):

Scenario A:
rm foo/my_notebook.ipynb, touch foo/my_notebook.ipynb (i.e., create same-named file) and inode happens to be preserved between the operations

Scenario B:
touch foo/my_notebook.ipynb (i.e., modify file)

If there are a dozen comments referencing the FileID for foo/my_notebook.ipynb, they have all been orphaned. Yes, that's the correct thing to do in scenario A, but definitely not correct in scenario B, and I think we should err on the side of (much) higher probability.

Will watchdog help? If scenario A occurs while the server is running? yes - as it will send a delete event, at which time the row will be dropped. But we're in the same boat if the server is not running, and I still believe we should err on the side that only timestamp updates occurred due to scenario B.

That is, we should be optimistic here as well. 😄

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct in observing that an out-of-band edit would cause a new file ID to be generated in this case. This is because the initial design philosophy for file ID was to provide the strongest possible guarantees by default, even covering weird edge cases like deleting a file and then creating a new one at the same path. However, note that currently, this quirk is specific to ext* filesystems. Windows and MacOS both support file creation timestamps, and we are currently using those.

This would be fixed by implementing crtime (file creation time) for ext4. This still won't work for ext3 and under, but we might be OK with that. I'm surprised I hadn't made an issue for this, so here you go: #40

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right on. This is a good example of the slippery slope that I've alluded to in the past.

You're choosing this "weird edge case" (that also requires the new file to have the same inode as the old file!) over simple (everyday) modifications. Could you forgo the timestamp checks against modification time when creation time is not available and you know the inodes are equal? All references should not be orphaned because a file was modified and the user happened to be on some filesystem that doesn't track crtime.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

optimistic get_path()
2 participants