WIP: fs type detection for linux, dev/inode cache keys #1842

ThomasWaldmann · 2016-11-13T03:43:19Z

fstype(path) -> determine filesystem type (at least the most important ones with stable inodes)
has_stable_inodes(path) -> True/False

Use for files cache.
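
To make the two helpers a bit more concrete, here is a minimal pure-Python sketch. It is not the code from this PR (which asks statfs from Cython); it assumes a Linux-style mount table in the format of /proc/mounts is passed in as text, and `fstype_from_mounts`, `has_stable_inodes_sketch` and the `STABLE_INODE_FS` whitelist are hypothetical names chosen for illustration.

```python
import os

# Assumed whitelist: only a few popular filesystems are trusted (illustrative).
STABLE_INODE_FS = {"ext2", "ext3", "ext4", "btrfs", "xfs", "zfs"}


def fstype_from_mounts(path, mounts_text):
    """Return the fs type of the longest mount point containing path, or None.

    mounts_text is expected to look like /proc/mounts:
    "<device> <mount point> <fs type> <options> ...", one mount per line.
    Escaped characters in mount points (e.g. \\040 for a space) are not decoded.
    """
    path = os.path.abspath(path)
    best_mount_point, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        # ">=" lets a later mount on the same mount point win (overmounts).
        if (path + "/").startswith(prefix) and len(mount_point) >= len(best_mount_point):
            best_mount_point, best_type = mount_point, fs_type
    return best_type


def has_stable_inodes_sketch(path, mounts_text):
    """True only for whitelisted filesystems; unknown ones count as unstable."""
    return fstype_from_mounts(path, mounts_text) in STABLE_INODE_FS
```

On Linux the mount table could be read with `open("/proc/mounts").read()`; anything not on the whitelist simply falls back to "not stable", which matches the idea of keeping path-based keys for unsupported cases.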

codecov-io · 2016-11-13T03:59:00Z

Codecov Report

❗ No coverage uploaded for pull request base (master@df8205a).
The diff coverage is 77.27%.

@@            Coverage Diff            @@
##             master    #1842   +/-   ##
=========================================
  Coverage          ?   82.34%           
=========================================
  Files             ?       21           
  Lines             ?     6769           
  Branches          ?     1164           
=========================================
  Hits              ?     5574           
  Misses            ?      887           
  Partials          ?      308

Impacted Files	Coverage Δ
src/borg/platform/__init__.py	`76.47% <100%> (ø)`
src/borg/platform/base.py	`70.66% <50%> (ø)`
src/borg/archive.py	`82.16% <75%> (ø)`
src/borg/cache.py	`85.31% <83.33%> (ø)`

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update df8205a...caf45f3. Read the comment docs.

enkore · 2016-11-13T10:37:13Z

Hmmm. I'm not sure that this is a good direction: this isn't average platform-specific code, it's highly platform-specific and hard to test and maintain. I'd say for #909 it would be much easier to make it a flag or env var. Not only because this seems quite difficult to get right, but also because only a few cases really profit from #909: things contained in the same archive are already indexed through their (dev, ino) for hardlink handling, which I'd expect to be a more typical use case than the rsync/hardlink importer.

ThomasWaldmann · 2016-11-13T12:55:04Z

Well, at first I also rather disliked the slightly different APIs, header files, etc. on different platforms. But then I realized that we don't need to support all the platforms / all the filesystems. If a platform is not supported, it will just key on the path, as before. That's also the reason why I only whitelisted a few popular filesystems for Linux.

Also, I realized that giving cmdline options to control inode vs. path only works if all source filesystems behave the same way, and not if you back up a stable-inode fs that has an unstable-inode fs mounted somewhere.

Yes, the main motivation for me to do this was the rsnapshot/rsync+hardlink importer.

There are some other motivations, though:

keys: packb((dev, ino)) is likely smaller/faster than sha256(path).
renamed dirs / differently mounted filesystems: path-based keys can't cope with that, while it is no problem for inode-based keys. I guess this was the main motivation for filing #909 (file_known_and_unchanged is wasteful and should be converted to inodes) back then.
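
A small sketch of both points, under some assumptions: `struct.pack` stands in for msgpack's `packb` (which would produce a variable-length, often shorter encoding), and `inode_key` / `path_key` are hypothetical helper names, not borg functions.

```python
import hashlib
import os
import struct
import tempfile


def inode_key(st):
    # Two unsigned 64-bit integers: 16 bytes, versus 32 bytes for a SHA-256 digest.
    return struct.pack("<QQ", st.st_dev, st.st_ino)


def path_key(path):
    return hashlib.sha256(os.fsencode(path)).digest()


with tempfile.TemporaryDirectory() as tmp:
    old = os.path.join(tmp, "old_name")
    new = os.path.join(tmp, "new_name")
    with open(old, "w") as f:
        f.write("data")
    before = (inode_key(os.stat(old)), path_key(old))
    os.rename(old, new)  # same file, new path
    after = (inode_key(os.stat(new)), path_key(new))

inode_key_survives_rename = before[0] == after[0]
path_key_survives_rename = before[1] == after[1]
```

On a filesystem with stable inodes the inode-based key is unchanged by the rename, while the path-based key is not, so a path-keyed files cache would treat the renamed file as new.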

enkore · 2016-11-13T13:01:05Z

Ok let's try it

ThomasWaldmann · 2016-11-13T15:07:06Z

src/borg/cache.py

 from .remote import cache_if_remote

+FS_WITH_STABLE_INODES = {'extfs', 'btrfs', 'xfs', 'zfs', }


hmm, move this to platform code?

guess it depends on the platform's implementation of a filesystem whether it has stable inodes?

Maybe just make the platform API more specific, e.g. fs_inodes_stable(path)

ThomasWaldmann · 2016-11-13T15:08:02Z

src/borg/cache.py

@@ -452,25 +455,34 @@ def chunk_decref(self, id, stats):
        else:
            stats.update(-size, -csize, False)

-    def file_known_and_unchanged(self, path_hash, st, ignore_inode=False):
+    def file_cache_key(self, hash, path, st, ignore_inode):
+        if ignore_inode or fstype(path) not in FS_WITH_STABLE_INODES:


the fstype() call could be done less frequently (only at fs boundaries) and the result kept somewhere.
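
One way this could look, as a sketch only: remember the answer per `st_dev`, so the probe runs once per filesystem instead of once per file. It assumes `st_dev` changes at every filesystem boundary; `InodeStabilityCache` and the probe callable are hypothetical names.

```python
from types import SimpleNamespace


class InodeStabilityCache:
    """Remember the per-filesystem answer, keyed by st_dev."""

    def __init__(self, probe):
        self._probe = probe  # callable(path) -> bool, e.g. an fstype whitelist check
        self._by_dev = {}

    def stable(self, path, st):
        try:
            return self._by_dev[st.st_dev]
        except KeyError:
            # First file seen on this device: probe once and keep the result.
            result = self._by_dev[st.st_dev] = self._probe(path)
            return result


# Tiny demonstration with fake stat results and a fake probe.
probe_calls = []


def fake_probe(path):
    probe_calls.append(path)
    return path.startswith("/data")


cache = InodeStabilityCache(fake_probe)
on_data = SimpleNamespace(st_dev=1)
on_nfs = SimpleNamespace(st_dev=2)
answers = [
    cache.stable("/data/a", on_data),
    cache.stable("/data/b", on_data),  # served from the cache, no probe
    cache.stable("/mnt/nfs/c", on_nfs),
]
```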

enkore · 2016-11-13T15:08:39Z

src/borg/cache.py

-    def file_known_and_unchanged(self, path_hash, st, ignore_inode=False):
+    def file_cache_key(self, hash, path, st, ignore_inode):
+        if ignore_inode or fstype(path) not in FS_WITH_STABLE_INODES:
+            # we don't use the path directly (but its hash) to safe memory


*save

v- ditto

enkore · 2016-11-13T15:10:52Z

src/borg/cache.py

+    def file_cache_key(self, hash, path, st, ignore_inode):
+        if ignore_inode or fstype(path) not in FS_WITH_STABLE_INODES:
+            # we don't use the path directly (but its hash) to safe memory
+            cache_key = hash(safe_encode(path))


This would be incompatible with existing files caches (incompatible in the sense of rechunking / data is worthless). Since the keys are different it would also ~double the on-disk size until 10 (cache TTL) archives are created.

Also, hash() is pseudo-random for each Python invocation.

So I guess this bit is more of a placeholder for the actual hash we used before? ;)

ad 1: that's the price (a one-time problem per data set). The size issue can be avoided by either deleting the files cache or using the env var to decrease the number of kept generations.

ad 2: hash is a param of this function, not the builtin.
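
To illustrate the point about the `hash` parameter: Python's builtin `hash()` of str/bytes is salted per process, so a persisted cache key needs a deterministic function passed in by the caller. The sketch below is not the PR's code. `sha256_digest` is a stand-in for whatever hash the caller provides, `os.fsencode` stands in for `safe_encode`, and the `inodes_stable` argument is a hypothetical replacement for the `fstype(path)` lookup.

```python
import hashlib
import os
import struct
from types import SimpleNamespace


def file_cache_key(key_hash, path, st, ignore_inode, inodes_stable):
    """Pick a files-cache key; key_hash must be deterministic across runs."""
    if ignore_inode or not inodes_stable:
        # Hash the path rather than storing it, to save memory.
        return key_hash(os.fsencode(path))
    return struct.pack("<QQ", st.st_dev, st.st_ino)


def sha256_digest(data):
    return hashlib.sha256(data).digest()


st = SimpleNamespace(st_dev=2049, st_ino=131)
by_path = file_cache_key(sha256_digest, "/home/user/a.txt", st, False, False)
by_inode = file_cache_key(sha256_digest, "/home/user/a.txt", st, False, True)
```

The two key shapes differ, which is why switching an existing files cache over means a one-time miss for every entry.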

ThomasWaldmann · 2016-11-14T03:19:51Z

fixed & rebased.

ThomasWaldmann · 2016-11-19T17:48:44Z

the fstype detection could also be useful for skipping some tests when a feature is not supported or known-broken on some filesystem. See e.g. the atime trouble in #1820.

ThomasWaldmann · 2016-11-20T17:48:16Z

any other feedback on current code before I rebase / resolve conflicts?

enkore · 2016-11-20T18:59:26Z

lgtm

ThomasWaldmann · 2016-11-20T20:38:23Z

man statfs -> f_fsid (and read the "below"). Seems like it was made for what we want.

it still sucks as everybody uses a different include file, long vs 2 ints, etc. :(

e.g. btrfs has stable inode numbers on linux, but elsewhere it could maybe be implemented in a strange way that does not have stable inode numbers.

ThomasWaldmann · 2016-11-20T22:22:58Z

src/borg/platform/linux.pyx

@@ -261,6 +261,7 @@ def umount(mountpoint):
 cdef extern from "sys/statfs.h":
    struct statfs_t "statfs":
        long f_type
+        # fsid_t f_fsid


I could not get this working; I always got compiler errors here ...

ThomasWaldmann · 2016-11-20T22:23:11Z

src/borg/platform/linux.pyx

-    return MAGIC_TO_NAME.get(buf.f_type)
+    return dict(
+        fstype=MAGIC_TO_NAME.get(buf.f_type),
+        #fsid=...,


or there.

My goal was to have either a bytestring of appropriate size or an int in the fsid value.
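
For comparison, a sketch of how an fsid can be obtained from plain Python and turned into a fixed-size bytestring. This uses `os.statvfs` (its `f_fsid` attribute exists since Python 3.7), not the Cython `statfs` binding from this PR. Whether the value corresponds to the `fsid_t` discussed here, and how unique it is, depends on platform and filesystem; some filesystems may report 0. `fsid_bytes` is a hypothetical helper name.

```python
import os


def fsid_bytes(path):
    """Return the filesystem ID reported by statvfs as 8 big-endian bytes."""
    fsid = os.statvfs(path).f_fsid  # plain int; Unix only
    # Mask to 64 bits so the conversion cannot overflow.
    return (fsid & 0xFFFFFFFFFFFFFFFF).to_bytes(8, "big")


root_fsid = fsid_bytes("/")
```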

textshell · 2017-01-28T21:24:03Z

random idea from irc: have an external program to query for maj:min, and ship an example / contrib that uses blkid to get the info. It would only work as root, but most backups that care about these things run as root anyway.

That would allow us to just use something that really is able to look into file systems etc and is maintained by people who work with low-level file system aspects. Also it does have a lot more bits, so it should be safer.

On Linux the script could do something like this ($1 being the device node):
blkid -o export $1 | grep -e "^UUID" -e "^TYPE" | sha1sum
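
The same idea as a Python sketch, in case the helper were written as a script rather than a shell one-liner. It assumes `blkid -o export` prints `KEY=value` lines; unlike the grep above it matches `UUID=` and `TYPE=` exactly (so e.g. `UUID_SUB=` is ignored). `identity_from_blkid_export` and `device_identity` are hypothetical names.

```python
import hashlib
import subprocess


def identity_from_blkid_export(export_text):
    """Hash the UUID= and TYPE= lines of `blkid -o export` output."""
    wanted = [line for line in export_text.splitlines()
              if line.startswith(("UUID=", "TYPE="))]
    # Re-add the newlines so the input matches what sha1sum would see.
    data = "".join(line + "\n" for line in wanted)
    return hashlib.sha1(data.encode()).hexdigest()


def device_identity(device):
    """Run blkid on a device node; usually needs root."""
    out = subprocess.run(["blkid", "-o", "export", device],
                         capture_output=True, text=True, check=True).stdout
    return identity_from_blkid_export(out)
```

Finding the device node for a given mount point is left to the caller here.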

enkore · 2017-01-28T21:34:31Z

blkid probably wants a device path... not sure how easy it would be to get that from a mountpoint (in a somewhat portable manner). Or leave it to the script.

ThomasWaldmann · 2017-02-27T17:37:05Z

another issue that came up:
assume files cache has key (dev_uuid, inode_no) -> value (mtime_ns, size).
assume a specific disk has a file "foo" sitting on inode X of mtime M and size S and content Cfoo.
now we delete foo and write "bar" with content Cbar != Cfoo.
inode X gets reused, size is S and mtime is M (this could happen if the modification of "foo", its deletion, and the creation of "bar" all happen within mtime granularity).

Then the modification check would assume foo == bar, which is not the case.

Note: a similar thing could happen with the current code if the filename stays the same but the content is exchanged within mtime granularity (and the size also stays the same).
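
The inode-reuse scenario can be replayed with a toy files cache. This is only a simulation with fake stat results; `remember` and `known_and_unchanged` are hypothetical helpers and the numbers are made up.

```python
from types import SimpleNamespace

files_cache = {}  # (dev, ino) -> (mtime_ns, size)


def remember(st):
    files_cache[(st.st_dev, st.st_ino)] = (st.st_mtime_ns, st.st_size)


def known_and_unchanged(st):
    return files_cache.get((st.st_dev, st.st_ino)) == (st.st_mtime_ns, st.st_size)


# "foo" is backed up, then deleted. "bar" is created, lands on the same inode,
# and has the same size and (within mtime granularity) the same mtime.
foo = SimpleNamespace(st_dev=2049, st_ino=77,
                      st_mtime_ns=1_488_216_000_000_000_000, st_size=4096)
remember(foo)
bar = SimpleNamespace(st_dev=2049, st_ino=77,
                      st_mtime_ns=1_488_216_000_000_000_000, st_size=4096)

false_hit = known_and_unchanged(bar)  # the check cannot tell bar from foo
other_inode = SimpleNamespace(st_dev=2049, st_ino=78,
                              st_mtime_ns=1_488_216_000_000_000_000, st_size=4096)
miss = known_and_unchanged(other_inode)
```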

ThomasWaldmann · 2018-07-02T15:18:21Z

I opened #3946 to refer to this stale / blocked PR, so it can be closed.

ThomasWaldmann mentioned this pull request Nov 13, 2016

file_known_and_unchanged is wasteful and should be converted to inodes #909

Open

ThomasWaldmann changed the title ~~implement fs type detection for linux, dummy fallback~~ WIP: fs type detection for linux, dev/inode cache keys Nov 13, 2016

ThomasWaldmann force-pushed the fs-detect branch from 99ad908 to e73e00e Compare November 13, 2016 15:05

ThomasWaldmann force-pushed the fs-detect branch from e73e00e to 72f28b3 Compare November 14, 2016 03:19

ThomasWaldmann mentioned this pull request Nov 19, 2016

mtime cache misbehaving on nested backups #1860

Closed

ThomasWaldmann added 3 commits November 20, 2016 21:55

implement fs type detection for linux, dummy fallback

9e9e579

use (dev, inode) as files cache key

6c1ed70

determination of inode stability is platform dependent

c19c1bc

e.g. btrfs has stable inode numbers on linux, but elsewhere it could maybe be implemented in a strange way that does not have stable inode numbers.

ThomasWaldmann force-pushed the fs-detect branch from 72f28b3 to c19c1bc Compare November 20, 2016 21:05

use more generic fsinfo instead of fstype

caf45f3

ThomasWaldmann added the help wanted label Nov 20, 2016

enkore added the stale label May 15, 2017

ThomasWaldmann mentioned this pull request Jul 2, 2018

fs type detection for linux, dev/inode cache keys #3946

Open

ThomasWaldmann closed this Jul 2, 2018

ThomasWaldmann deleted the fs-detect branch October 6, 2020 15:52
