
Failure recovery #31

Open
albe opened this issue Jul 23, 2018 · 5 comments
Labels
documentation, enhancement, P: Index (Affects the indexing layer), P: Storage (Affects the storage layer)

Comments


albe commented Jul 23, 2018

After a crash that may have left broken records behind, the storage should heal itself.

This can be achieved with the following steps (see the sketch after the list):

  • truncate all indexes to valid file sizes (currently this throws `new Error('Index file is corrupt!')`)
  • check if the partition contains an invalid document (an unfinished write); if so, truncate the partition (currently this throws `new Error('Can only truncate on valid document boundaries.')`)
  • check if the partition contains more documents than its index; if so, reindex the missing documents
  • check if the partition contains fewer documents than its index; if so, truncate the index
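
A rough sketch of what such a self-healing routine could look like, assuming hypothetical `partition` and `index` objects; none of the method names below are the project's actual API:

```js
function recoverStorage(partition, index) {
    // 1. Truncate the index to the last complete entry instead of throwing
    //    'Index file is corrupt!'.
    index.truncate(index.lastValidEntryCount());

    // 2. If the partition ends in an unfinished write, cut it off at the last
    //    valid document boundary instead of throwing
    //    'Can only truncate on valid document boundaries.'.
    const validEnd = partition.lastValidPosition();
    if (validEnd < partition.size()) {
        partition.truncate(validEnd);
    }

    // 3. The partition contains more documents than the index knows about:
    //    reindex the missing ones.
    if (partition.documentCount() > index.length) {
        for (const document of partition.readDocumentsFrom(index.length)) {
            index.add(document.position, document.size);
        }
    }

    // 4. The partition contains fewer documents than the index: drop the
    //    dangling index entries.
    if (partition.documentCount() < index.length) {
        index.truncate(partition.documentCount());
    }
}
```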

albe commented Sep 20, 2019

See https://cseweb.ucsd.edu/~swanson/papers/DAC2011PowerCut.pdf for a paper researching the behavior of different SSDs on power failure.


albe commented Sep 21, 2019

To effectively check for corrupted documents, a checksum is necessary (see #72); otherwise only unfinished writes can be detected. However, filesystems typically increase the file size first and then write the contents, so the file size can be correct while the contents are corrupted. This can only be detected when the corruption breaks the serialization format, so there is still a chance that a document gets deserialized that was never fully written.
The checksum only needs to be verified at startup, so general read performance is not affected.

The checksum should incorporate the previous document's checksum in order to also be able to guarantee immutability of the whole partition.
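
A minimal sketch of such checksum chaining, where each document's checksum also covers the previous document's checksum, so modifying any earlier document invalidates everything after it. The record layout (`data`/`checksum` fields) is purely illustrative, not the actual storage format:

```js
const crypto = require('crypto');

// Each checksum covers the previous checksum plus the current document.
function chainedChecksum(previousChecksum, serializedDocument) {
    return crypto.createHash('sha256')
        .update(previousChecksum)
        .update(serializedDocument)
        .digest('hex');
}

// On write: carry the last checksum forward and store it with the document.
function appendDocument(records, serializedDocument) {
    const previous = records.length > 0 ? records[records.length - 1].checksum : '';
    records.push({
        data: serializedDocument,
        checksum: chainedChecksum(previous, serializedDocument)
    });
}

// At startup: recompute the chain once and compare against the stored values.
function verifyPartition(records) {
    let previous = '';
    for (const record of records) {
        previous = chainedChecksum(previous, record.data);
        if (previous !== record.checksum) {
            return false;
        }
    }
    return true;
}
```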


albe commented Oct 5, 2019

After thinking more about checksums, this should be solely a serializer concern and hence fully pluggable. Baking a checksum into the document format has a couple of consequences:

  • it decreases write performance, because every write needs to calculate the checksum first
  • the rules for when to verify the checksum need to be configurable, or read performance suffers as well
  • it adds more complexity to the storage layer and requires additional trade-offs for the document header decision (see discussion Add checksum and other metadata to document prefix #72)
  • some serialization formats might already contain a checksum, so the work would be duplicated
  • some use-cases might require stricter checksums than others (just a parity byte, crc32 or a shaX sum); dictating one would rule out all the others
  • the choice of checksum algorithm is hardcoded into the storage format (i.e. the partition format version), and changing it would be a hard backwards-compatibility break
  • for JSON serialization, the common error case (a torn write) is already ruled out by the document no longer being deserializable

So checksums only play a role in the following use-cases:

  • data is transferred over a medium that may corrupt single bytes
  • a custom serialization format is used that is less strict than JSON and could technically deserialize incomplete records (msgpack, protobuf)
  • guaranteeing the immutability of the store by checksumming over previous documents

For all those use cases, wrapping the serializer methods is good enough and relatively easy to achieve. Maybe the common use cases (custom serialization format, immutability guarantees) should be shown in the documentation; a sketch follows below.
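
A sketch of the checksum as a pure serializer concern: a wrapper around an arbitrary serializer that prepends a checksum on serialize and verifies it on deserialize. The `serialize`/`deserialize` interface and the SHA-1 choice are assumptions for illustration, not the project's actual serializer API:

```js
const crypto = require('crypto');

const jsonSerializer = {
    serialize: (document) => JSON.stringify(document),
    deserialize: (string) => JSON.parse(string)
};

// Wraps any serializer with a checksum prefix; swap the hash for a parity
// byte, crc32 or shaXsum depending on how strict the use-case needs to be.
function withChecksum(serializer) {
    const checksum = (data) => crypto.createHash('sha1').update(data).digest('hex');
    return {
        serialize(document) {
            const data = serializer.serialize(document);
            return checksum(data) + '|' + data;
        },
        deserialize(string) {
            const separator = string.indexOf('|');
            const expected = string.slice(0, separator);
            const data = string.slice(separator + 1);
            if (checksum(data) !== expected) {
                throw new Error('Document is corrupt!');
            }
            return serializer.deserialize(data);
        }
    };
}

// Drop-in replacement wherever a plain serializer is accepted.
const checkedSerializer = withChecksum(jsonSerializer);
```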

albe added the P: Index (Affects the indexing layer) and P: Storage (Affects the storage layer) labels Oct 5, 2019

albe commented Jun 27, 2020

Requires #24 in order to fix the global index in case it is broken.


albe commented Feb 22, 2021

With #145 included, the next steps are roughly as follows (see the sketch after the list):

  • truncating should not throw an exception if the truncate position is at a valid document boundary (directly following a separator) [✔️ Truncate torn writes #151]
  • on opening a storage, all partitions should be checked for torn writes, i.e. whether the last document is not terminated with a document separator
  • for all partitions, the document sequence number of the last valid document should be returned; if there is a torn write, the sequence number of that torn write should be returned as well
  • (if there were torn writes) the whole storage should be truncated after the lowest torn write sequence number* [✔️ Automatically repair torn writes on opening #155]
  • check if the primary index is up to date with the highest sequence number of all partitions; if not, reindex all documents following the last indexed document (by scanning all partitions backwards for the first document with a sequence number lower than or equal to the last indexed one)

*Another option would be to only truncate the individual torn writes and keep potentially successfully written later documents in other partitions. However, that would mean documents go missing in between and sequence numbers have holes. Also, indexes would still potentially point to non-existing/wrong documents.
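
A rough sketch of how this repair-on-open flow could fit together. The partition/index methods used here (`firstTornWriteSequenceNumber`, `truncateAfter`, `lastSequenceNumber`, `documentsAfter`, `add`) are placeholders, not the actual implementation of #151/#155:

```js
function repairOnOpen(partitions, primaryIndex) {
    // Sequence numbers of the first torn write in each partition, i.e. a last
    // document that is not terminated by the document separator.
    const tornSequenceNumbers = partitions
        .map(partition => partition.firstTornWriteSequenceNumber()) // null if clean
        .filter(sequenceNumber => sequenceNumber !== null);

    if (tornSequenceNumbers.length > 0) {
        // Truncate the whole storage after the lowest torn write sequence
        // number, so sequence numbers stay contiguous across partitions.
        const lastGoodSequenceNumber = Math.min(...tornSequenceNumbers) - 1;
        for (const partition of partitions) {
            partition.truncateAfter(lastGoodSequenceNumber);
        }
        primaryIndex.truncateAfter(lastGoodSequenceNumber);
    }

    // If any partition contains documents the primary index does not know
    // about, reindex everything following the last indexed sequence number.
    const lastIndexed = primaryIndex.lastSequenceNumber();
    const highest = Math.max(...partitions.map(partition => partition.lastSequenceNumber()));
    if (highest > lastIndexed) {
        const missing = [];
        for (const partition of partitions) {
            missing.push(...partition.documentsAfter(lastIndexed));
        }
        missing.sort((a, b) => a.sequenceNumber - b.sequenceNumber);
        for (const document of missing) {
            primaryIndex.add(document.sequenceNumber, document.position);
        }
    }
}
```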
