Allow embedding a CRC/MD5 checksum in filename, and check the calculated checksum against it #1113

Open
ThomDietrich opened this issue Feb 18, 2017 · 22 comments

Comments

@ThomDietrich

I've recently switched over our image flashing recommendation to Etcher. Great piece of software.
I see a lot of issues regarding the creation of a checksum for the image. Could you please explain how this checksum is validated and how I can provide a checksum for an image file, so the user can be sure the file he/she's flashing is indeed identical to the one I've created and uploaded? I'm sure there is already a solution in place that I haven't caught up with yet 😄

Thanks!

@jviotti
Contributor

jviotti commented Feb 19, 2017

Hi @ThomDietrich ,

Thanks for reaching out. The way validation works is the following:

  • Etcher will calculate a CRC32 checksum of the selected image while it's flashed to the drive
  • Once the image has been flashed, Etcher will re-calculate a CRC32 checksum of the drive
  • If the CRC32 checksum calculated from the drive matches the one we initially calculated from the image, then we know the flash was successful

This means that you don't have to pass a checksum to Etcher at all (it does it all under the hood). There is still a chance that the image the user downloaded is not complete (e.g. the download stopped halfway through). In that case, you can provide the right CRC32 checksum of the image on your website, and ask users to compare it with the final checksum they see after flashing the image.
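A minimal sketch of the validation flow described above, assuming the image and the target drive can both be treated as ordinary byte streams (an in-memory io.BytesIO stands in for the drive). The function names and the 1 MiB chunk size are hypothetical and this is not Etcher's actual writer code; it only illustrates accumulating a CRC32 chunk by chunk while writing, then recomputing it over the same number of bytes read back.

```python
import io
import zlib

# Hypothetical chunk size; the real writer may use a different one.
CHUNK_SIZE = 1024 * 1024


def flash_with_crc32(source, drive, chunk_size=CHUNK_SIZE):
    """Copy `source` to `drive`, accumulating a CRC32 of every byte written."""
    crc = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        drive.write(chunk)
        crc = zlib.crc32(chunk, crc)  # running CRC32 over the stream
    return crc


def crc32_of_drive(drive, length, chunk_size=CHUNK_SIZE):
    """Re-read the first `length` bytes of `drive` and return their CRC32."""
    drive.seek(0)
    crc = 0
    remaining = length
    while remaining > 0:
        chunk = drive.read(min(chunk_size, remaining))
        if not chunk:
            break  # drive returned fewer bytes than were written
        crc = zlib.crc32(chunk, crc)
        remaining -= len(chunk)
    return crc


def flash_and_validate(image_bytes):
    """Flash an in-memory image to a simulated drive and compare both CRC32 values."""
    source = io.BytesIO(image_bytes)
    drive = io.BytesIO()  # stands in for the target drive
    written_crc = flash_with_crc32(source, drive)
    read_back_crc = crc32_of_drive(drive, len(image_bytes))
    return written_crc, read_back_crc, written_crc == read_back_crc


if __name__ == "__main__":
    written, read_back, ok = flash_and_validate(b"example image" * 100_000)
    print(f"written={written:08x} read_back={read_back:08x} valid={ok}")
```

Reading back exactly as many bytes as were written matters because a real drive is usually larger than the image. Note that this comparison catches a bad write, not an incomplete download, which is why a publisher-supplied checksum is still useful.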

We have some plans to provide an Etcher "catalog", where users will be able to browse and discover images that Etcher knows about, which will contain the real image checksum, and that will be validated as well. Check https://resin.io/blog/the-future-of-etcher/ for details!

Please re-open if you have any other questions!

@ThomDietrich
Author

ThomDietrich commented Feb 19, 2017

Hello @jviotti,
thank you for the detailed explanation. Matches my assumptions and that sounds good so far.

you can provide the right CRC32 checksum of the image on your website

This is the one aspect I was curious about. If this is the current situation, I need to rephrase this issue as a Feature Request (please re-open):

Would you consider comparing the initially calculated checksum with the one provided in the image filename or in a separate checksum file?

The easiest approach would be a short CRC32 hash directly in the filename: imagefile-c45ad668.img. This feature is probably a one-liner without any risk. You just need to show an additional success indication in the UI if the file hash matches any part of the filename.
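A sketch of the proposed check, assuming the computed CRC32 is formatted as 8 lowercase hex digits like the c45ad668 example. The helper names are hypothetical. In keeping with the proposal, a match only adds a positive indicator; the absence of a match says nothing about the image.

```python
import zlib


def crc32_hex(data: bytes) -> str:
    """Format a CRC32 as 8 lowercase hex digits (e.g. 'c45ad668')."""
    return format(zlib.crc32(data), "08x")


def filename_contains_checksum(filename: str, checksum_hex: str) -> bool:
    """True if the computed checksum appears anywhere in the filename, ignoring case."""
    return checksum_hex.lower() in filename.lower()


if __name__ == "__main__":
    data = b"example image contents"
    checksum = crc32_hex(data)
    print(filename_contains_checksum(f"imagefile-{checksum}.img", checksum))  # True
    print(filename_contains_checksum("imagefile.img", checksum))  # False
```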

A more sophisticated way to supply a checksum is via an extra file (MD5SUMS, SHA1SUMS, SHA256SUMS or SHA512SUMS, imagefile.img.sfv, imagefile.img.md5, ...). These are provided by many distributors but probably never actually downloaded by the end user, hence not as important for Etcher to check against.

Wdyt?
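For the separate-file variant: MD5SUMS/SHA256SUMS files as written by the coreutils md5sum/sha256sum tools list one "<hexdigest>  <filename>" entry per line, with "*" in place of the second space in binary mode. A hedged sketch of parsing such a file and checking an image against it follows. The function names are hypothetical, .sfv files (filename first, CRC32 last) are not handled, and the image is hashed in memory for brevity.

```python
import hashlib


def parse_sums_file(text: str) -> dict:
    """Parse md5sum/sha256sum-style lines ('<hexdigest>  <filename>') into {filename: digest}."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        digest, _, name = line.partition(" ")
        # The second separator character is ' ' (text mode) or '*' (binary mode).
        if name[:1] in (" ", "*"):
            name = name[1:]
        entries[name] = digest.lower()
    return entries


def verify_against_sums(filename: str, data: bytes, sums_text: str, algorithm: str = "sha256"):
    """True/False if `filename` is listed in the sums file, None if it is not listed."""
    expected = parse_sums_file(sums_text).get(filename)
    if expected is None:
        return None
    return hashlib.new(algorithm, data).hexdigest() == expected
```

Returning None for an unlisted file keeps "no checksum available" distinct from "checksum mismatch", which fits the idea of only adding a positive indicator.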

@alexandrosm
Contributor

alexandrosm commented Feb 19, 2017 via email

@ThomDietrich
Author

ThomDietrich commented Feb 19, 2017

Hey Alexandros, true that! My images carry a date and a git commit reference in their filenames. One can't simply try to match these and show false negatives.

If you read again, you'll see that I suggested showing an additional indicator IF the CRC was found in the filename, not the other way around. I think that would still be valuable feedback for the end user, and whichever way you look at it: "Better than Nothing".

The catalog/cloud index feature sounds great but the use cases are probably a bit different.

@alexandrosm
Contributor

alexandrosm commented Feb 19, 2017 via email

@ThomDietrich
Author

ThomDietrich commented Feb 19, 2017

Bingo. Exactly how I imagined it.

We could still discuss whether a less visible "checksum not found" text/icon/tooltip element might be useful otherwise. It would give the user the "wait, but there is" moment and promote the feature.

On a side note: From what I've seen in this and other issues, you guys are doing a great job managing this tracker ;) Thanks

@jviotti
Contributor

jviotti commented Feb 20, 2017

Thanks a lot @ThomDietrich, I like the suggestion. Let's re-open and re-word the title.

@jviotti jviotti reopened this Feb 20, 2017
@jviotti jviotti changed the title from "CRC/MD5 checksum in filename or extra file" to "Allow embedding a CRC/MD5 checksum in filename, and check the calculated checksum against it" Feb 20, 2017
@jviotti jviotti added this to the Backlog milestone Feb 20, 2017
@jviotti
Contributor

jviotti commented Feb 20, 2017

We need two things to accomplish this task:

  • Make the writer accept an external checksum for internal comparison purposes /cc @jhermsmeier

Notice that this checksum approach will not work in the case of images with bmaps.

  • Attempt to find a checksum in the file name, and if so, pass it to the writer

BTW, @jhermsmeier has been doing work on the writer to make it accept and handle many types of checksum algorithms. In order to not make this feature complex, I propose attempting to detect only a handful of well-known algorithms, like MD5 and CRC32.
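A sketch of the proposal to detect only a handful of well-known formats in the filename, assuming a checksum appears as a standalone hex token: 8 hex digits for CRC32, 32 for MD5. The regex and function names are hypothetical. Two limitations are worth seeing in code: an 8-digit date such as 20170418 is also a CRC32 candidate and would report a mismatch (the false-negative risk raised earlier in the thread), and a hex run of any other length, such as 99feab3456fd9a73, is ignored entirely.

```python
import re

# A hex run of exactly 8 (CRC32) or 32 (MD5) characters that is not glued to other
# letters or digits. For a 32-digit run the 8-digit branch fails its lookahead,
# so the regex backtracks into the 32-digit branch.
_HEX_TOKEN = re.compile(
    r"(?<![0-9A-Za-z])([0-9A-Fa-f]{8}|[0-9A-Fa-f]{32})(?![0-9A-Za-z])"
)


def find_embedded_checksums(filename: str) -> list:
    """Return (algorithm, lowercase hex) candidates found as standalone tokens."""
    candidates = []
    for match in _HEX_TOKEN.finditer(filename):
        token = match.group(1).lower()
        candidates.append(("crc32" if len(token) == 8 else "md5", token))
    return candidates


def embedded_checksum_status(filename: str, computed: dict):
    """Compare computed digests, e.g. {'crc32': 'c45ad668'}, with the filename's candidates.

    Returns True on a match, False if candidates exist for a computed algorithm but none
    match, and None if the filename holds no candidate for any computed algorithm.
    `computed` values are expected in lowercase hex.
    """
    relevant = [(alg, tok) for alg, tok in find_embedded_checksums(filename) if alg in computed]
    if not relevant:
        return None
    return any(computed[alg] == tok for alg, tok in relevant)


if __name__ == "__main__":
    print(find_embedded_checksums("imagefile-c45ad668.img"))  # [('crc32', 'c45ad668')]
    print(find_embedded_checksums("myimage-99feab3456fd9a73.img"))  # []
```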

@lurch
Contributor

lurch commented Feb 20, 2017

Attempt to find a checksum in the file name, and if so, pass it to the writer

I think @ThomDietrich was actually suggesting exactly the opposite - rather than specifically looking for a checksum in the filename (e.g. by using a regex), instead get the checksum that the writer computed (luckily @jhermsmeier recently added functionality to get it to compute multiple checksum types at once), and then see if the calculated checksum was present in the filename (and display an additional confirmation if so).
i.e. if the CRC32 checksum is ab3456fd9 and the filename is myimage-99feab3456fd9a73.img then that should be a "match", which it wouldn't be if you try to identify the checksum-part of the filename first.

@lurch
Contributor

lurch commented Feb 20, 2017

P.S. @ThomDietrich There was a proposal to embed the image checksum as separate metadata within e.g. a zip file (see #707 ) but that's been dropped in favour of storing the metadata in the online catalog mentioned earlier.

@jviotti
Contributor

jviotti commented Feb 20, 2017

Oh, I see, my bad :) Let me rephrase the plan:

  • Use the returned calculated checksum (if validation was enabled), and check whether it's present in the filename. If it is, present extra confirmation information

I'm still worried about the possibility of the writer to handle multiple checksum algorithms. Maybe we should always calculate one (probably CRC32, since it's less error-prone), and optionally calculate other ones.
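A sketch of the "always CRC32, optionally others" idea, computing every requested digest in a single pass so the image is only read once. The compute_checksums name and its chunk-iterable interface are assumptions, not the writer's real API; the optional algorithm names are whatever hashlib.new accepts (e.g. "md5", "sha256").

```python
import hashlib
import zlib


def compute_checksums(chunks, extra_algorithms=()):
    """Return {'crc32': ..., <algorithm>: ...} computed in one pass over `chunks`.

    CRC32 is always computed; `extra_algorithms` are optional hashlib names
    such as 'md5' or 'sha256'.
    """
    crc = 0
    hashers = {name: hashlib.new(name) for name in extra_algorithms}
    for chunk in chunks:
        crc = zlib.crc32(chunk, crc)
        for hasher in hashers.values():
            hasher.update(chunk)  # every digest sees the same bytes, read only once
    result = {"crc32": format(crc, "08x")}
    result.update({name: hasher.hexdigest() for name, hasher in hashers.items()})
    return result


if __name__ == "__main__":
    print(compute_checksums([b"hello ", b"world"], extra_algorithms=("md5", "sha256")))
```

Keeping CRC32 unconditional means the filename check always has a short value to look for, while heavier digests cost extra CPU only when requested.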

@ThomDietrich
Author

ThomDietrich commented Feb 20, 2017

Yes that's how I intended it. CRC32 for the image filename is probably the right choice (main reason being the short hash length), for the catalog metadata thing you would probably go with SHA256/512? Does it really matter? Who knows :)

@lurch your example is actually a bit evil 😄 That's clearly not a CRC32 string and would horribly trick the discussed functionality

@lurch
Contributor

lurch commented Feb 20, 2017

I'm still worried about the possibility of the writer to handle multiple checksum algorithms.

Just out of curiosity: in what sense? (didn't @jhermsmeier add unit-tests?)

@jhermsmeier jhermsmeier self-assigned this Mar 6, 2017
ThomDietrich added a commit to openhab/openhabian that referenced this issue Mar 17, 2017
balena-io/etcher#1113
Signed-off-by: Thomas Dietrich <Thomas.Dietrich@tu-ilmenau.de>
@ThomasKaiser

Using Etcher 1.0.0-rc4 I flashed something called Armbian_5.27_Clearfogbase_Debian_jessie_default_4.4.63_8df8e50e.img and got this:
[Screenshot, 2017-04-23 22:30:42: Etcher's completion screen showing a CRC32 checksum]

When/how is this CRC32 check supposed to work? :)

@jviotti
Contributor

jviotti commented Apr 24, 2017

Hi @ThomasKaiser ,

That is a CRC32 checksum of what was written to the drive. Etcher makes use of it internally during the validation phase, so you don't have to worry about that. However, you can validate it manually by calculating a CRC32 checksum of your image (there is a crc32 command on my Mac) and comparing it with what we show there.
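The manual check can be sketched as a small script that computes a CRC32 over an image file in chunks and prints it as 8 hex digits, similar in spirit to the crc32 command mentioned above (its exact output format is not assumed here). Since the value Etcher shows is a checksum of what was written to the drive, it presumably has to be compared with the CRC32 of the uncompressed image rather than of a compressed download. The chunk size is illustrative.

```python
import sys
import zlib


def crc32_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """CRC32 of a file as 8 lowercase hex digits, read in chunks to bound memory use."""
    crc = 0
    with open(path, "rb") as image:
        for chunk in iter(lambda: image.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return format(crc, "08x")


if __name__ == "__main__":
    # Usage: python crc32_image.py image.img [more.img ...]
    for path in sys.argv[1:]:
        print(f"{crc32_of_file(path)}  {path}")
```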

@ThomasKaiser

Etcher makes use of it internally during the validation phase, so you don't have to worry about that

Well, like @ThomDietrich, we seem to worry about this part (corrupted downloads) too. I thought that by now Etcher would do something more if it spotted the CRC32 value as part of the image filename?

@jviotti
Contributor

jviotti commented Apr 24, 2017

Yeah, we have plans to implement this. Keep in mind that in v2, Etcher will feature an image catalog, and that will ensure complete downloads as well.

@ThomasKaiser

Yeah, we have plans to implement this.

Ok, but it's not ready yet it seems? I was under the impression you had already implemented something like that. I really like @ThomDietrich's idea because it's so easy to implement (both adding the CRC32 checksum to the image filename on the publisher's side and, in Etcher, simply comparing your internal checksum with a part of the image filename).
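On the publisher's side, a trailing 8-hex-digit token like the one in the Armbian filename above (..._8df8e50e.img) could be produced with something like the following sketch. The embed_crc32 helper is hypothetical, assumes the CRC32 is taken over the raw, uncompressed image bytes, and reads the whole image into memory for brevity; a real build script would checksum the file in chunks.

```python
import os
import zlib


def embed_crc32(filename: str, data: bytes) -> str:
    """Insert '_<crc32>' before the extension: 'image.img' -> 'image_<8 hex digits>.img'.

    Hypothetical naming helper; `data` should be the raw, uncompressed image bytes,
    since that is what ends up on the drive.
    """
    stem, ext = os.path.splitext(filename)
    return f"{stem}_{zlib.crc32(data):08x}{ext}"


if __name__ == "__main__":
    image = b"raw image bytes"
    print(embed_crc32("Armbian_5.27_Clearfogbase_Debian_jessie_default_4.4.63.img", image))
```

For a compressed download such as image.img.xz, os.path.splitext only splits off ".xz", so the naming of compressed artifacts would need separate handling.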

But to be honest, situations where users end up with corrupted downloads are rare; the majority of problems relate to broken burning processes, and Etcher catches them all, even if users then complain about Etcher instead of questioning their SD cards :)

Keep in mind that in v2, Etcher will feature an image catalog, and that will ensure complete downloads as well.

How do 'interested third parties' like us (Armbian) get into this catalog?

@lurch
Contributor

lurch commented Apr 24, 2017

Ok, but it's not ready yet it seems?

Correct, that's why this issue is still open ;-)

How do 'interested third parties' like us (Armbian) get into this catalog?

Rest assured that we'll be providing more details when we're ready to start accepting entries. We're currently focusing on polishing Etcher in preparation for its 1.0 release :)

@ThomDietrich
Author

ThomDietrich commented Dec 10, 2017

Hey guys, hey @alexandrosm and @jviotti,
just wanted to check in and see whether you have reconsidered this feature at some point? It feels rather small and non-invasive. It would be great to see it implemented, independently of your catalog idea. Here is the gist of it: #1113 (comment)

@jviotti
Contributor

jviotti commented Dec 15, 2017

Hi @ThomDietrich , we are definitely considering it. @jhermsmeier and @Shou will be working on this very soon (keep an eye on any PR that references this ticket)

@alexandrosm
Contributor

alexandrosm commented Aug 8, 2018 via email
