Handle & track block write / verification failures #1074
Keeping track of a percentage should be pretty straightforward, as we only need to count blocks failed against blocks written (basically incrementing counters, which should be trivial with the "new" block-write-streams). It shouldn't even be too hard to keep the blocks that failed to be written, as long as it isn't a high percentage failing (or we just keep the block addresses; then even that wouldn't be much of a problem until we hit really high percentages on large drives).
Maybe keep a precise count up to a certain limit, and if that limit is surpassed, just keep and report a percentage (e.g. 25% failed)? If a lot of blocks failed, detailed information is not very useful anymore.
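A minimal sketch of that capped tracking idea (the class name, the cap, and the failure pattern are illustrative assumptions, not Etcher's actual code):

```python
class FailureTracker:
    """Track failed block writes: exact addresses up to a limit,
    then degrade to plain counters plus a percentage."""

    def __init__(self, max_addresses=65536):
        self.max_addresses = max_addresses
        self.failed_addresses = []  # block numbers, kept while under the cap
        self.failed = 0
        self.total = 0

    def record(self, block_number, ok):
        self.total += 1
        if not ok:
            self.failed += 1
            if self.failed_addresses is not None:
                if len(self.failed_addresses) < self.max_addresses:
                    self.failed_addresses.append(block_number)
                else:
                    self.failed_addresses = None  # too many: keep counters only

    @property
    def percent_failed(self):
        return 100.0 * self.failed / self.total if self.total else 0.0


tracker = FailureTracker(max_addresses=4)
for n in range(10):
    tracker.record(n, ok=(n % 2 == 0))  # pretend every odd block fails
print(tracker.percent_failed)    # 50.0
print(tracker.failed_addresses)  # None: 5 failures exceeded the cap of 4
```

Once the cap is hit the per-address detail is dropped for good, matching the point above that a detailed list stops being useful past a certain failure rate.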
Yeah, exactly, that is the challenge of this feature.
Oh, sorry – I somehow missed that. Well, if we want to track which blocks didn't check out while verifying, we will need an entirely different hashing mechanism, like Merkle trees (used in BitTorrent, IPFS, and other filesystems) or rolling hashes such as Rabin fingerprinting (used by LBFS and the Dat project, and probably a better choice for what we're thinking about here). CRC32 (which is prone to collisions), MD5, and the SHA family (or similar) won't do us much good in this case, since a single whole-disk digest can't tell us which blocks differ. I'm starting to think it could actually make sense to drop the full-disk CRC / MD5 / SHA / etc. checksumming entirely, and only verify the source image with those (i.e. if a file of the same basename, but a checksum extension, exists next to it). Then we could calculate Rabin fingerprints of the block stream while writing, and verify the flashed device against those afterwards – that would give us the ability to determine exactly which blocks were corrupted, compute a percentage, etc. Following that, we'd basically have some more options.
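For illustration, a toy Rabin–Karp-style polynomial rolling hash (not real Rabin fingerprinting, which works over GF(2) with an irreducible polynomial; the base, modulus, and window size here are arbitrary assumptions):

```python
BASE, MOD, WINDOW = 257, (1 << 61) - 1, 48


def rolling_hashes(data, window=WINDOW):
    """Yield the hash of every `window`-byte window in O(1) per step."""
    if len(data) < window:
        return
    h = 0
    for b in data[:window]:
        h = (h * BASE + b) % MOD
    yield h
    top = pow(BASE, window - 1, MOD)  # weight of the byte leaving the window
    for i in range(window, len(data)):
        h = ((h - data[i - window] * top) * BASE + data[i]) % MOD
        yield h


data = bytes(range(256)) * 4          # repeating content, as a stand-in stream
hashes = list(rolling_hashes(data))
# identical windows hash identically, wherever they occur in the stream:
print(hashes[0] == hashes[256])       # True
```

The rolling property is what makes content-defined chunking cheap: each new window hash is derived from the previous one instead of being recomputed from scratch.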
But presumably any rolling-hash or fingerprinting scheme would have to trade off the block size used against the memory needed to store the whole result for a potentially multi-gigabyte disk image, which might have been streamed from the internet? Pinging @petrosagg as he might want to join in, continuing the conversation from #735.
I think the memory requirements should be low enough to just keep the hashes in memory – except for the smallest block sizes (which would be terribly inefficient to write anyway).
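A back-of-the-envelope version of that estimate (assuming 16-byte MD5-sized digests per block and an 8 GiB image; both are illustrative, not fixed choices):

```python
def hash_memory_bytes(image_bytes, block_size, digest_size=16):
    """Memory needed to keep one digest per block in RAM."""
    blocks = -(-image_bytes // block_size)  # ceiling division
    return blocks * digest_size


GiB = 1024 ** 3
for block_size in (512, 4096, 65536, 1024 ** 2):
    mem = hash_memory_bytes(8 * GiB, block_size)
    print(f"{block_size:>8} B blocks -> {mem / 1024 ** 2:,.1f} MiB of digests")
```

Only the 512-byte case (256 MiB of digests for 8 GiB) looks uncomfortable; at the block sizes actually sensible for writing, the digest table stays in the low megabytes.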
So we need to support MD5 and other common checksum algorithms for the downloading phase, in the case of images we know about that are hosted in the cloud. When the user attempts to stream an image with extended information, we calculate the checksum they tell us about as part of the downloading phase and compare it with what they've told us once the download completes. In the meantime, another way to go would be what Tizen already provides: their XML file contains checksums (usually SHA-1 or SHA-256) for every block range. Keeping that in mind, we can calculate the checksum of X blocks at a time and store it as we go, then recalculate and compare on read-back (thus no need for rolling hashes). Of course this means we're not doing per-block checksums (otherwise I guess it'd be wasteful, although I'd like to see some numbers), so we can't be that precise in our results. As far as I remember, the block size we use in …
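A sketch of that per-block-range checksumming (the range size, block size, and helper names are assumptions; Tizen's bmap XML format itself is not reproduced here):

```python
import hashlib


def range_checksums(data, blocks_per_range=8, block_size=4096):
    """SHA-256 over each group of `blocks_per_range` blocks, stored as we go."""
    chunk = blocks_per_range * block_size
    return [hashlib.sha256(data[i:i + chunk]).hexdigest()
            for i in range(0, len(data), chunk)]


source = bytes(4096 * 64)          # 64 zero blocks as a stand-in image
device = bytearray(source)
device[4096 * 20] ^= 0xFF          # simulate corruption in block 20

bad = [i for i, (a, b) in
       enumerate(zip(range_checksums(source), range_checksums(bytes(device))))
       if a != b]
print(bad)  # [2]: block 20 falls in range 2 (blocks 16-23)
```

This narrows a mismatch down to an 8-block range rather than a single block, which is exactly the precision trade-off described above.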
Indeed, I just added an issue regarding that to balena-io-modules/blockmap#6
Yup, I was only suggesting dropping them for the verification step, when reading back from the flashed device, but still verifying the source with them, if that makes sense?
I think we could still do per-block rolling hashes when using bmaps (in addition to checking the bmap region checksums), as some mapped regions might be quite large, and only having to rewrite a few blocks is probably plenty faster than having to rewrite an entire mapped region.
I've been talking nonsense here, I realised: looking at the table above, we can just as well hash every block with MD5 or SHA or whatever floats our boat and keep it in memory for the block sizes we use. Don't know why my mind was going to those complicated places before.
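A sketch of that simpler approach (one MD5 digest per block, kept in memory between the write and verify passes; the 4 KiB block size and helper names are assumptions):

```python
import hashlib

BLOCK_SIZE = 4096  # assumed write block size


def block_digests(data, block_size=BLOCK_SIZE):
    """One MD5 digest per block, small enough to keep in RAM."""
    return [hashlib.md5(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]


source = bytes(BLOCK_SIZE * 32)
device = bytearray(source)
device[BLOCK_SIZE * 7 + 1] = 0xAB       # simulate a corrupted block on read-back

expected = block_digests(source)        # computed while writing
actual = block_digests(bytes(device))   # computed while verifying
failed = [i for i, (e, a) in enumerate(zip(expected, actual)) if e != a]
print(failed)  # [7]
```

Unlike the range-grouped variant, this pinpoints the exact failed block, at the cost of one digest per block rather than per range.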
New issue to track suggestions made in #735 (comment) to keep & expose data on blocks which failed to be written during the flashing of an image.