
Failure recovery #31

Open
albe opened this issue Jul 23, 2018 · 5 comments
Labels
documentation, enhancement, P: Index (Affects the indexing layer), P: Storage (Affects the storage layer)

Comments


albe commented Jul 23, 2018

After a crash that may have left broken records behind, the storage should heal itself.

This can be achieved with the following steps (see the sketch after the list):

  • truncate all indexes to valid file sizes (currently this throws `new Error('Index file is corrupt!')`)
  • check if the partition contains an invalid document (an unfinished write); if so, truncate the partition (currently this throws `new Error('Can only truncate on valid document boundaries.')`)
  • check if the partition contains more documents than its index; if so, reindex the missing documents
  • check if the partition contains fewer documents than its index; if so, truncate the index
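
A rough sketch of what such a self-healing routine could look like, assuming hypothetical `partition` and `index` objects; none of the method names below are the project's actual API:

```js
function recoverStorage(partition, index) {
    // 1. Truncate the index to the last complete entry instead of throwing
    //    'Index file is corrupt!'.
    index.truncate(index.lastValidEntryCount());

    // 2. If the partition ends in an unfinished write, cut it off at the last
    //    valid document boundary instead of throwing
    //    'Can only truncate on valid document boundaries.'.
    const validEnd = partition.lastValidPosition();
    if (validEnd < partition.size()) {
        partition.truncate(validEnd);
    }

    // 3. The partition contains more documents than the index knows about:
    //    reindex the missing ones.
    if (partition.documentCount() > index.length) {
        for (const document of partition.readDocumentsFrom(index.length)) {
            index.add(document.position, document.size);
        }
    }

    // 4. The partition contains fewer documents than the index: drop the
    //    dangling index entries.
    if (partition.documentCount() < index.length) {
        index.truncate(partition.documentCount());
    }
}
```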

albe commented Sep 20, 2019

See https://cseweb.ucsd.edu/~swanson/papers/DAC2011PowerCut.pdf for a paper researching the behavior of different SSDs on power failure.


albe commented Sep 21, 2019

To effectively check for corrupted documents, a checksum is necessary (see #72); otherwise only unfinished writes can be detected. However, filesystems typically increase the file size first and then write the contents, so the file size can be correct while the contents are corrupted. This can only be detected when the corruption breaks the serialization format, so there is still a chance that a document gets deserialized that was never fully written.
The checksum only needs to be verified at startup, so general read performance is not affected.

The checksum should incorporate the previous document's checksum in order to also be able to guarantee immutability of the whole partition.
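
A minimal sketch of such checksum chaining, where each document's checksum also covers the previous document's checksum, so modifying any earlier document invalidates everything after it. The record layout (`data`/`checksum` fields) is purely illustrative, not the actual storage format:

```js
const crypto = require('crypto');

// Each checksum covers the previous checksum plus the current document.
function chainedChecksum(previousChecksum, serializedDocument) {
    return crypto.createHash('sha256')
        .update(previousChecksum)
        .update(serializedDocument)
        .digest('hex');
}

// On write: carry the last checksum forward and store it with the document.
function appendDocument(records, serializedDocument) {
    const previous = records.length > 0 ? records[records.length - 1].checksum : '';
    records.push({
        data: serializedDocument,
        checksum: chainedChecksum(previous, serializedDocument)
    });
}

// At startup: recompute the chain once and compare against the stored values.
function verifyPartition(records) {
    let previous = '';
    for (const record of records) {
        previous = chainedChecksum(previous, record.data);
        if (previous !== record.checksum) {
            return false;
        }
    }
    return true;
}
```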


albe commented Oct 5, 2019

After thinking more about checksums, this should be solely a serializer concern and hence fully pluggable. Baking a checksum into the document format has a couple of consequences:

  • it decreases write performance, because every write needs to calculate the checksum first
  • the rules for when to verify the checksum need to be configurable, or read performance suffers as well
  • it adds more complexity to the storage layer and requires additional trade-offs for the document header decision (see discussion Add checksum and other metadata to document prefix #72)
  • some serialization formats might already contain a checksum, so the work would be duplicated
  • some use-cases might require stricter checksums than others (just a parity byte, crc32 or a shaX sum); dictating one would rule out all the others
  • the choice of checksum algorithm is hardcoded into the storage format (i.e. the partition format version), and changing it would be a hard backwards-compatibility break
  • for JSON serialization, the common error case (a torn write) is already ruled out by the document no longer being deserializable

So checksums only play a role in the following use-cases:

  • data is transferred over a medium that may corrupt single bytes
  • a custom serialization format is used that is less strict than JSON and could technically deserialize incomplete records (msgpack, protobuf)
  • guaranteeing the immutability of the store by checksumming over previous documents

For all those use cases, wrapping the serializer methods is good enough and relatively easy to achieve. Maybe the common use cases (custom serialization format, immutability guarantees) should be shown in the documentation; a sketch follows below.
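
A sketch of the checksum as a pure serializer concern: a wrapper around an arbitrary serializer that prepends a checksum on serialize and verifies it on deserialize. The `serialize`/`deserialize` interface and the SHA-1 choice are assumptions for illustration, not the project's actual serializer API:

```js
const crypto = require('crypto');

const jsonSerializer = {
    serialize: (document) => JSON.stringify(document),
    deserialize: (string) => JSON.parse(string)
};

// Wraps any serializer with a checksum prefix; swap the hash for a parity
// byte, crc32 or shaXsum depending on how strict the use-case needs to be.
function withChecksum(serializer) {
    const checksum = (data) => crypto.createHash('sha1').update(data).digest('hex');
    return {
        serialize(document) {
            const data = serializer.serialize(document);
            return checksum(data) + '|' + data;
        },
        deserialize(string) {
            const separator = string.indexOf('|');
            const expected = string.slice(0, separator);
            const data = string.slice(separator + 1);
            if (checksum(data) !== expected) {
                throw new Error('Document is corrupt!');
            }
            return serializer.deserialize(data);
        }
    };
}

// Drop-in replacement wherever a plain serializer is accepted.
const checkedSerializer = withChecksum(jsonSerializer);
```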

albe added the P: Index (Affects the indexing layer) and P: Storage (Affects the storage layer) labels Oct 5, 2019

albe commented Jun 27, 2020

Requires #24 in order to fix the global index in case it is broken.


albe commented Feb 22, 2021

With #145 included, the next steps are roughly as follows (see the sketch after the list):

  • truncating should not throw an exception if the truncate position is at a valid document boundary (directly following a separator) [✔️ Truncate torn writes #151]
  • on opening a storage, all partitions should be checked for torn writes, i.e. whether the last document is not terminated with a document separator
  • for all partitions, the document sequence number of the last valid document should be returned; if there is a torn write, the sequence number of that torn write should be returned as well
  • (if there were torn writes) the whole storage should be truncated after the lowest torn write sequence number* [✔️ Automatically repair torn writes on opening #155]
  • check if the primary index is up to date with the highest sequence number of all partitions; if not, reindex all documents following the last indexed document (by scanning all partitions backwards for the first document with a sequence number lower than or equal to the last indexed one)

*Another option would be to only truncate the individual torn writes and keep potentially successfully written later documents in other partitions. However, that would mean documents go missing in between and sequence numbers have holes. Also, indexes would still potentially point to non-existing/wrong documents.
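
A rough sketch of how this repair-on-open flow could fit together. The partition/index methods used here (`firstTornWriteSequenceNumber`, `truncateAfter`, `lastSequenceNumber`, `documentsAfter`, `add`) are placeholders, not the actual implementation of #151/#155:

```js
function repairOnOpen(partitions, primaryIndex) {
    // Sequence numbers of the first torn write in each partition, i.e. a last
    // document that is not terminated by the document separator.
    const tornSequenceNumbers = partitions
        .map(partition => partition.firstTornWriteSequenceNumber()) // null if clean
        .filter(sequenceNumber => sequenceNumber !== null);

    if (tornSequenceNumbers.length > 0) {
        // Truncate the whole storage after the lowest torn write sequence
        // number, so sequence numbers stay contiguous across partitions.
        const lastGoodSequenceNumber = Math.min(...tornSequenceNumbers) - 1;
        for (const partition of partitions) {
            partition.truncateAfter(lastGoodSequenceNumber);
        }
        primaryIndex.truncateAfter(lastGoodSequenceNumber);
    }

    // If any partition contains documents the primary index does not know
    // about, reindex everything following the last indexed sequence number.
    const lastIndexed = primaryIndex.lastSequenceNumber();
    const highest = Math.max(...partitions.map(partition => partition.lastSequenceNumber()));
    if (highest > lastIndexed) {
        const missing = [];
        for (const partition of partitions) {
            missing.push(...partition.documentsAfter(lastIndexed));
        }
        missing.sort((a, b) => a.sequenceNumber - b.sequenceNumber);
        for (const document of missing) {
            primaryIndex.add(document.sequenceNumber, document.position);
        }
    }
}
```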
