-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
implement optimistic get_path() in LocalFileIdManager #38
Conversation
- removes autosync_interval trait - removes sync_all() from public API
@kevin-bates I wanted you to see this since you were the original person who had brought this proposal back from the dead. Thank you again for reminding me to tackle this. This is a huge win and it would likely have been forgotten if you didn't bring it up. 😁 Feel free to leave some review comments, but I'll have to merge and cut a release now. RTC is expecting a file ID release very quickly so they can have lead time to find any issues with the current implementation. |
path, ino = row | ||
stat_info = self._stat(path) | ||
|
||
if stat_info and ino == stat_info.ino and self._check_timestamps(stat_info): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dlqqq - I think the requirement that timestamps must match (both here and below) is too strict and matching inodes should be sufficient, otherwise, a file that is modified out-of-band (e.g., touch my_notebook.ipynb
) will orphan all of its external referencing object (comments, etc.).
The code currently makes a statement that the following scenarios are the same but takes the side of scenario A (which is far less likely than scenario B):
Scenario A:
rm foo/my_notebook.ipynb
, touch foo/my_notebook.ipynb
(i.e., create same-named file) and inode happens to be preserved between the operations
Scenario B:
touch foo/my_notebook.ipynb
(i.e., modify file)
If there are a dozen comments referencing the FileID for foo/my_notebook.ipynb
, they have all been orphaned. Yes, that's the correct thing to do in scenario A, but definitely not correct in scenario B, and I think we should err on the side of (much) higher probability.
Will watchdog
help? If scenario A occurs while the server is running? yes - as it will send a delete event, at which time the row will be dropped. But we're in the same boat if the server is not running, and I still believe we should err on the side that only timestamp updates occurred due to scenario B.
That is, we should be optimistic here as well. 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are correct in observing that an out-of-band edit would cause a new file ID to be generated in this case. This is because the initial design philosophy for file ID was to provide the strongest possible guarantees by default, even covering weird edge cases like deleting a file and then creating a new one at the same path. However, note that currently, this quirk is specific to ext*
filesystems. Windows and MacOS both support file creation timestamps, and we are currently using those.
This would be fixed by implementing crtime
(file creation time) for ext4. This still won't work for ext3 and under, but we might be OK with that. I'm surprised I hadn't made an issue for this, so here you go: #40
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right on. This is a good example of the slippery slope that I've alluded to in the past.
You're choosing this "weird edge case" (that also requires the new file to have the same inode as the old file!) over simple (everyday) modifications. Could you forgo the timestamp checks against modification time when creation time is not available and you know the inodes are equal? All references should not be orphaned because a file was modified and the user happened to be on some filesystem that doesn't track crtime
.
autosync_interval
traitsync_all()
from public API_sync_all()
isn't called in best case (no files moved out-of-band)