W3irdN3rd: md5 (or even crc32) is fine to rule out accidental corruption
Curiously, even a raw sum works. Some 8-bit BASIC programs do the bare minimum and verify that the sum adds up before running the binary data they copied from BASIC to raw hardware.
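To make the difference between a raw sum and crc32 concrete, here is a minimal Python sketch. The helper name byte_sum and the sample strings are made up for illustration; zlib.crc32 is the standard library call. It only shows the general idea and is not how any particular BASIC loader did it.

```python
import zlib

def byte_sum(data: bytes) -> int:
    """Raw 8-bit checksum: add every byte and keep the low 8 bits."""
    return sum(data) & 0xFF

original = b"HELLO WORLD"
flipped = b"HELLO WORLE"   # last byte changed
swapped = b"HELLO WORDL"   # last two bytes swapped, same total

# A changed byte alters the sum, so the raw sum catches it.
print(byte_sum(original) == byte_sum(flipped))
# Swapped bytes add up to the same total, so the raw sum misses it.
print(byte_sum(original) == byte_sum(swapped))
# crc32 depends on byte order, so it catches the swap.
print(zlib.crc32(original) == zlib.crc32(swapped))
```

So a raw sum is fine for catching a mistyped or flipped byte, but anything order-related slips through, which is where crc32 earns its keep.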
Regardless, corruption is actually quite common. A little OT history: back when the first communication links were being set up (serial and/or phone), some transfers of data would be fine, but many would have issues, with bits flipped in several places. Eventually it was figured out to be noise on the line, but the pattern was fractal in nature and difficult to predict. Ultimately it was impossible to remove.
So they started adding parity bits. You'd have a start bit, 8 bits of data, then a parity bit to count even/odd (later schemes added more check bits for ECC), and if the check failed you just requested the byte again. Finally there's a stop bit to say you're done.
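A rough Python sketch of that framing, assuming even parity and one stop bit (real serial links let you configure both). The function names even_parity_bit, frame and check are made up for the example.

```python
def even_parity_bit(byte: int) -> int:
    """Return the bit that makes the total number of 1s even."""
    return bin(byte & 0xFF).count("1") % 2

def frame(byte: int) -> list:
    """Start bit (0), 8 data bits LSB first, parity bit, stop bit (1)."""
    data = [(byte >> i) & 1 for i in range(8)]
    return [0] + data + [even_parity_bit(byte)] + [1]

def check(bits: list) -> bool:
    """True if the start/stop bits are right and the parity holds."""
    if len(bits) != 11 or bits[0] != 0 or bits[-1] != 1:
        return False
    # Data bits plus parity bit must contain an even number of 1s.
    return sum(bits[1:10]) % 2 == 0

good = frame(0x41)           # the letter 'A'
one_flip = list(good)
one_flip[3] ^= 1             # noise flips one data bit
two_flips = list(one_flip)
two_flips[5] ^= 1            # noise flips a second data bit

print(check(good))       # frame arrives intact
print(check(one_flip))   # single flip is caught, so ask for a resend
print(check(two_flips))  # two flips cancel out and slip through
```

The last case is why a single parity bit is only a first line of defence: it catches any odd number of flipped bits but is blind to an even number.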
There's tons of ECC built into normal communications, so a few errors are detected and corrected along the way, and only total failures need a repeat. At the end the ECC is discarded and you get the final data. So while it's very common to have corruption during transmission, a lot of the data integrity work is invisible to what we end users usually experience. Off hand I think Ethernet uses CRC32 for basic checks, while IPv4 itself only carries a 16-bit header checksum.
While reading up on error-correcting codes a bit, I found a paper on the Hubble satellite. Apparently they added a 6bit something Reed-Solomon code, which resulted in being able to use half the power or double the distance in effectiveness, since errors during transmission would be detected and corrected. Used on common media, it means discs with scratches, and even holes, can remain readable, as it transparently fixes the data during reading. (Or it should try, at least.)
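Reed-Solomon is too long to show in a forum post, so as a stand-in here is Hamming(7,4), a much simpler error-correcting code. To be clear, this is not what Hubble or optical discs use; it just shows the principle that a few extra bits let the receiver fix an error without asking for a resend. The function names are made up for the example.

```python
def hamming74_encode(d):
    """4 data bits -> 7-bit codeword [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4   # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(c):
    """Fix up to one flipped bit, then return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    # The three checks together spell out the 1-based position of the bad bit.
    pos = s1 + 2 * s2 + 4 * s3
    if pos:
        c[pos - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]

data = [1, 0, 1, 1]
sent = hamming74_encode(data)
received = list(sent)
received[4] ^= 1             # noise flips one bit in transit
print(hamming74_decode(received) == data)
```

Any single flipped bit out of the 7 gets repaired. Two flips in the same codeword will fool it, which is why real systems use much stronger codes and spread the data around so a scratch doesn't hit one codeword too hard.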
I guess the little TLDR: errors and corruption are quite common, but the infrastructure and overhead mean the final product is hopefully the correct data. Were that not the case, I think the world would be a lot more chaotic.