Document how to find/delete duplicate archived files #12
The file logged as successfully transferred to tape by dCache is the last one, so that's the one that should be kept.
The most common way for duplicates to be detected will likely be logging from tsmtapehints.pl: since it extracts the full file list, it was trivial to do duplicate detection at the same time.
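The duplicate detection that falls out of extracting the full file list can be sketched as below. This is a minimal illustration, not the actual tsmtapehints.pl logic: the entry format (a filename paired with the archive description) and the function name are assumptions for the example.

```python
from collections import defaultdict

def find_duplicates(entries):
    """Group archive entries by file name and report names stored
    more than once.  Each entry is a (filename, description) tuple,
    a hypothetical format for illustration; the real hint file
    layout may differ."""
    seen = defaultdict(list)
    for name, description in entries:
        seen[name].append(description)
    # Only names archived more than once are duplicates.
    return {name: descs for name, descs in seen.items() if len(descs) > 1}

# Example: one file archived twice in different dsmc sessions.
entries = [
    ("0000ABCD", "2015-01-01 10:00:00"),
    ("0000ABCD", "2015-02-03 14:30:00"),
    ("0000EF01", "2015-01-01 10:00:00"),
]
print(find_duplicates(entries))
```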
For clarity, the procedure is as follows, assuming old files written by ENDIT with a static description are present:
One reason for duplicates can be dCache being shut down while tsmarchiver.pl is running dsmc to archive files. Files that are successfully archived while the ENDIT dCache plugin is not running will not be marked as successfully migrated to tape, and dCache will retry the operation.
With more modern versions of the ENDIT daemons, all files are written with a description that is the time when that particular dsmc write session was started. This can be used to uniquely identify a single duplicate, thus avoiding the tedious manual procedure described previously. An automated procedure can look like this:
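Since each dsmc session stamps its files with a unique session-start description, the older copy of a duplicate can be targeted by description alone. The sketch below generates the delete commands; the exact dsmc invocation (in particular the `-description` option to `dsmc delete archive`) and the assumption that descriptions sort chronologically as strings are assumptions for this example, so verify against your dsmc version before running anything.

```python
def deletion_commands(filename, descriptions):
    """Given all archive descriptions recorded for a duplicated file,
    keep the copy from the newest dsmc session (the one dCache logged
    as successfully flushed) and emit delete commands for the rest.
    Assumes descriptions are session-start timestamps that sort
    chronologically as strings, e.g. '2015-02-03 14:30:00'."""
    keep = max(descriptions)  # newest session wins
    return [
        f'dsmc delete archive {filename} -description="{d}"'
        for d in sorted(descriptions)
        if d != keep
    ]

cmds = deletion_commands("0000ABCD", ["2015-01-01 10:00:00", "2015-02-03 14:30:00"])
for c in cmds:
    print(c)
```

Printing the commands instead of executing them allows the output to be reviewed before deleting anything from the archive.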
The most common cause for duplicates today is pools being restarted while doing pool-to-pool migrations, i.e. moving tape data between instances. This causes the pool to re-transfer the files on disk, creating duplicates. After such a migration we recommend running tsmtapehints to generate a fresh hint file, and checking the output/log to see whether any duplicates were found.
Warn if we find files with a size mismatch on retrieve errors. There are some corner cases where this helps pinpoint the reason for repeated retries. The most common case is duplicate files, as described in #12.
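The size check behind such a warning can be sketched as below. This is a hedged illustration of the general idea, not the ENDIT implementation; the function name and message wording are made up for the example.

```python
import os
import tempfile

def check_retrieved_size(path, expected_size):
    """Compare the on-disk size of a retrieved file against the size
    dCache expects.  A mismatch often means the wrong (duplicate)
    archive copy was retrieved, as discussed in issue #12."""
    actual = os.path.getsize(path)
    if actual != expected_size:
        print(f"Warning: {path}: size {actual} does not match "
              f"expected {expected_size}; possible duplicate archive copy")
        return False
    return True

# Demo with a temporary file of known size (5 bytes).
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"12345")
ok = check_retrieved_size(f.name, 5)
bad = check_retrieved_size(f.name, 9)
os.unlink(f.name)
```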
Older ENDIT versions seem to have been prone to archiving files multiple times when certain error conditions were triggered.
This is likely due to old bugs, but we need to document how to detect whether this has happened and, most importantly, how to clean up afterwards.
The procedure is something like: