Journal READ Corruption Issue #52
Which OS and filesystem? I'd like to try to reproduce in the same environment!
OS X 10.9.1 - Mac OS Extended (Journaled), java version "1.7.0_40"
Just hit it again after I had to kill the java process:

READ 2013-reg-313-fa2d6b37-b7d5-4d5c-b3READ 2013-reg-338-50668ff9-c152-4ddb-888d-236f4321f147

I am having to 'restart' (basically kill the JVM) the server pretty regularly during development. Could that be contributing to this? Maybe something with the BufferedWriters and the termination sequence?
In theory the cache should be able to recover from an incomplete journal.
I suspect the problem is that we recover from the incomplete journal, but we don't delete the truncated line before starting to write new lines into the file?
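The fix hypothesized above can be sketched in isolation. This is a minimal standalone illustration, not DiskLruCache's actual code (the class and method names below are mine): after replaying the valid prefix of the journal, truncate the file to the end of the last complete line before appending, so fresh records never land after a partial line.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;

public class JournalRecovery {
    // Truncate the journal to the end of its last complete line, so that
    // subsequent appends never start in the middle of a partial record.
    static void truncateIncompleteTail(File journal) throws IOException {
        byte[] bytes = Files.readAllBytes(journal.toPath());
        int end = bytes.length;
        // Walk backwards to the last '\n'; everything after it is a partial line.
        while (end > 0 && bytes[end - 1] != '\n') {
            end--;
        }
        if (end != bytes.length) {
            try (RandomAccessFile raf = new RandomAccessFile(journal, "rw")) {
                raf.setLength(end); // drop the truncated trailing record
            }
        }
    }
}
```

Without a step like this, the next append produces exactly the run-together `READ ...READ ...` lines shown in this thread.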
Yeah, as it stands, it looks like any time an incomplete or junk line exists, the entire cache gets destroyed. I suppose this is because a cache of unknown state is not super useful, but in my use case, I'm not sure how to avoid this scenario.
I'm seeing a variation on this issue on Android. The behaviour is the same in that there is a corrupted line in the journal and that is causing the cache to be deleted on opening. In my case the line looks like this:

READ 96644ef5b41c75bd081e1737bREAD e9211ead93e7e0c88e4f79b0d5af4a45

This causes the method readJournalLine() to throw when it encounters it, at which point the whole cache gets deleted. It would be much more useful to me to just drop the invalid line rather than deleting the whole cache (particularly if it is just a read that is corrupted). Can anyone see any reason why this wouldn't work?
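Dropping the bad line instead of deleting everything is straightforward to sketch. Below is a standalone, hypothetical lenient parser, not the library's readJournalLine() (the class name and validation rules are my own assumptions): lines whose opcode is unrecognized, or whose key contains another opcode token (the signature of overwritten bytes seen above), are simply skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class LenientJournal {
    static final Set<String> OPS = Set.of("CLEAN", "DIRTY", "READ", "REMOVE");

    // Parse journal lines leniently: malformed lines (like the run-together
    // "READ xxREAD yy" above, or "RREAD ...") are dropped rather than
    // causing the whole cache to be deleted.
    static List<String[]> parse(List<String> lines) {
        List<String[]> ops = new ArrayList<>();
        for (String line : lines) {
            int sp = line.indexOf(' ');
            if (sp == -1 || !OPS.contains(line.substring(0, sp))) {
                continue; // unknown opcode: drop the line
            }
            String rest = line.substring(sp + 1);
            // A key containing another opcode token indicates overwritten bytes.
            boolean corrupted = false;
            for (String op : OPS) {
                if (rest.contains(op)) corrupted = true;
            }
            if (corrupted) continue;
            ops.add(new String[] { line.substring(0, sp), rest });
        }
        return ops;
    }
}
```

Note the trade-off raised earlier in the thread: dropping a CLEAN line silently loses an entry, but dropping a corrupted READ line only loses LRU ordering information, which seems far preferable to nuking the cache.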
Late to the game here, but I am seeing the same issue. My Android application will at some point delete my entire cache and start from scratch! This particular cache has no methods exposed to remove entries, so they should always be there. I appear to be hitting this "corrupt journal" issue and my cache is getting nuked. I'm unclear on what is causing this state or how to fix it.
I finally caught the issue occurring in the debugger. What's very odd is that if I pause before cache.delete, use the debugger to issue a call to rebuild the journal, and then restart, my cache comes up without issues, yet there's a "bad" line left in the journal. It's unclear why the journal is set to rebuild itself based on directory contents, or why nuking the whole cache is preferable to merely missing an entry. I added a line of code to flush the cache after my writes, and this isn't solving the issue. I think the corruption occurred while my app was idle and I killed the task (I do this regularly during debugging, of course). On first seeing the corrupt journal and opening the file I see:

After the rebuild of the journal (and successfully bringing up the cache) I see the following corruption still in the journal (the bad line moved to the end of the journal):

CLEAN popegregoryiii 135

Would appreciate any ideas. Like others, I have multiple directories (one for each cache) and there may be a few thousand files in each directory.
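For reference, the rebuild-from-directory-contents idea this comment describes can be sketched as below. This is a hypothetical helper under my own naming, not the library's rebuildJournal() (which, as far as I know, rewrites the journal from its in-memory entry map rather than the directory): list the entry files on disk and emit one CLEAN record per entry, discarding whatever the corrupt journal said.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class JournalRebuild {
    // Hypothetical rebuild: derive journal contents from the files actually
    // present in the cache directory, emitting one CLEAN line per entry.
    static List<String> rebuildFromDirectory(File dir) {
        List<String> lines = new ArrayList<>();
        File[] files = dir.listFiles();
        if (files == null) {
            return lines; // not a directory, or I/O error listing it
        }
        Arrays.sort(files); // deterministic order for the sketch
        for (File f : files) {
            String name = f.getName();
            if (name.startsWith("journal")) {
                continue; // skip journal and any journal.tmp left behind
            }
            lines.add("CLEAN " + name + " " + f.length());
        }
        return lines;
    }
}
```

A rebuild like this answers the "why nuke everything?" question: the entry files themselves are intact; only the bookkeeping was corrupted.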
Looks like this is a rather old problem and it still persists today. After taking a look at the code, I might have some idea of the root cause. Assuming you don't have multiple instances of DiskLruCache on the same directory (across processes/threads), there's still a chance that the journal file can get corrupted, because the operations on the primary journal file are not atomic. For example, this line operates directly on the primary journal file. If the process crashes or gets killed when …
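One way to close that window is to never rewrite the journal in place: write the full contents to a sibling temporary file, then rename it over the primary journal. On POSIX filesystems a rename is atomic, so a crash mid-write leaves the old journal untouched instead of half-overwritten. A minimal sketch under that assumption (class and method names are mine, not the library's):

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;
import java.util.List;

public class AtomicRewrite {
    // Write the journal to journal.tmp, then rename it over the primary
    // file. Readers see either the old complete journal or the new one,
    // never a partially written mixture of the two.
    static void rewrite(File journal, List<String> lines) throws IOException {
        File tmp = new File(journal.getParentFile(), journal.getName() + ".tmp");
        try (Writer w = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(tmp), StandardCharsets.UTF_8))) {
            for (String line : lines) {
                w.write(line);
                w.write('\n');
            }
        }
        Files.move(tmp.toPath(), journal.toPath(), StandardCopyOption.REPLACE_EXISTING);
    }
}
```

This doesn't help with the normal append path (rewriting the whole file per READ would be too slow), which is why appends also need the truncate-on-recovery handling discussed earlier in the thread.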
I'm seeing unpredictable but consistent corruption of journal files. The point of corruption is always a READ line in the journal. I'm trying to wrap my head around how this could happen and wondering if anyone has suggestions.
I've created an object cache that wraps the DiskLruCache. Here are my put and get methods:
My caches are on the order of 3,000 objects when I start seeing the behavior. This object cache is accessed by multiple threads, but it looks to me like the DiskLruCache itself is doing the necessary synchronization. I've seen the cache get wiped out several times, and I've managed to capture a cache in the state where it has an invalid entry:
READ 2013-reg-c8d10fe6-8c07-4b33-8d03-b7addf69038d
READ 2013-reg-191ff930-47e3-48b0-92f2-1da067e982d4
READ 2013-reg-b6ff7fd9-7da5-4f71-946f-e5d560060794
RREAD 2013-pst-72fe30af-b568-48cd-8582-bfbaa0e2ae6b
READ 2013-pst-5440007e-4f48-46c0-bec2-3ab4779a4c3e
READ 2013-pst-0117fbeb-a140-43ba-8dd2-4b7fc0f55211
In every case where the cache has become corrupted, the behavior is similar: part of a READ entry is written over with another READ. This isn't at the very end of the cache state; it's about 85% of the way through.