Hi, I am using MD5 for my personal collection. I have some unique dump files, and I copy them to external hard drives for backup purposes. But before I copy them, problems sometimes happen on my main machine, like shutdowns, freezes, etc. So basically I don't have any security concerns; my whole question is about data corruption. Am I safe? Is MD5 enough to detect data corruption? Even after a shutdown, if the MD5 hashes are the same, could those files still be corrupt? Thanks for all the answers.
This question / problem has been solved by Lin545
Yes. But there is also bit-rot on hard disks and in memory. That's why you might also think about ZFS/Btrfs and ECC RAM (a Xeon/server motherboard, or an AMD/ASUS board, plus unbuffered ECC). Also, don't confuse the two: RAID does not help against this, backups do.
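For example (a minimal sketch, assuming a Btrfs filesystem mounted at /data and a ZFS pool named tank, both placeholder names), these filesystems can verify every block against its checksum with a scrub:

[code]
# Btrfs: start a scrub and check its progress (usually needs root)
btrfs scrub start /data
btrfs scrub status /data

# ZFS: scrub the pool and review the result
zpool scrub tank
zpool status tank
[/code]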
Post edited June 04, 2016 by Lin545
Lin545: Yes. But there is also bit-rot on hard disks and in memory. That's why you might also think about ZFS/Btrfs and ECC RAM (a Xeon/server motherboard, or an AMD/ASUS board, plus unbuffered ECC). Also, don't confuse the two: RAID does not help against this, backups do.
Still, some people are saying MD5 is weak and that you need SHA-2 (512) or SHA-3 (512) :D
Skysect: Still, some people are saying MD5 is weak and that you need SHA-2 (512) or SHA-3 (512) :D
I think those people confuse attacks against the hash (engineered hash collisions) with the chance of an accidental collision. :)
Skysect: Still, some people are saying MD5 is weak and that you need SHA-2 (512) or SHA-3 (512) :D
Lin545: I think those people confuse attacks against the hash (engineered hash collisions) with the chance of an accidental collision. :)
You are probably right. I have written countless times that I don't have a security problem; no one even has access to those files except me. But people still reply without reading.
Skysect: Still, some people are saying MD5 is weak and that you need SHA-2 (512) or SHA-3 (512) :D
Lin545: I think those people confuse attacks against the hash (engineered hash collisions) with the chance of an accidental collision. :)
But... isn't MD5 also weaker as a checksum for data integrity than other methods?
(Aren't collisions just non-unique matches, whether random or caused by errors?)
Post edited June 04, 2016 by phaolo
phaolo: But... isn't MD5 also weaker as a checksum for data integrity than other methods?
(Aren't collisions just non-unique matches? Is the term related to attacks only?)
Yes, it's weaker, but it doesn't matter in practice. The probability that a random corruption produces a file with the same hash is extremely low.
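As a rough back-of-the-envelope figure (assuming MD5 behaves like a random 128-bit function for this purpose): there are 2^128 possible digests, so the chance that a corrupted file lands on the very same one is about 1 in 3.4 × 10^38.

[code]
# number of possible 128-bit digests
echo '2^128' | bc
# 340282366920938463463374607431768211456
[/code]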
phaolo: But... isn't MD5 also weaker as a checksum for data integrity than other methods?
(I thought that collisions were just non-unique matches)
It's a trade-off between CPU load and security. Depending on the attack angle, a more primitive hash may turn out to be more efficient. For example, the chance that a random bit-flip produces a hash collision is unrealistically small.

To make an analogy: sure, you can shoot a fly out of the sky with a MOAB, but the MOAB also adds cost, weight, etc. A regular swatter is enough for the fly. I can't imagine someone purposely forging a garbage-filled file that exactly matches your file's hash, and then finding a way to slip it into your system, just to trick bit-rot detection... CPU time is why some anti-rot file systems even use CRC32, an even lazier check than MD5.

Of course, if you expose your file system to the outside, things change. In particular, if the filesystem is distributed and there is a chance to combine bit-rot detection with protection against forgery, then SHA-2 justifies itself. But for a purely local filesystem, a more complex hash may (benchmarking rocks!) introduce unwanted and unjustified lag/overhead in everyday work. Security almost always works against usability.
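If you want to see that overhead for yourself, a quick sketch (assuming GNU coreutils and a large test file named bigfile.bin, which is just a placeholder):

[code]
# compare a cheap CRC against MD5 and SHA-2 on the same file
time cksum bigfile.bin
time md5sum bigfile.bin
time sha512sum bigfile.bin
[/code]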
Post edited June 04, 2016 by Lin545
phaolo: But... isn't MD5 also weaker as a checksum for data integrity than other methods?
(Aren't collisions just non-unique matches? Is the term related to attacks only?)
mk47at: Yes, it's weaker, but it doesn't matter in practice. The probability that a random corruption produces a file with the same hash is extremely low.
For personal files from a trusted source, MD5 is just fine for verifying that the files are uncorrupted: random corruption only changes a few bits or bytes, and even that is enough to completely change the sum.
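A small sketch of that, if you want to try it (original.bin is a placeholder name, and the offset is arbitrary; if that byte already happens to be 'X', pick another one):

[code]
# copy the file and overwrite a single byte (conv=notrunc keeps the size)
cp original.bin damaged.bin
printf 'X' | dd of=damaged.bin bs=1 seek=100 count=1 conv=notrunc

# the two sums will differ completely, not just in one position
md5sum original.bin damaged.bin
[/code]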

If, on the other hand, you're distributing source code and possibly binaries, say for a widely used distribution, hash collisions can be engineered: an attacker keeps adding extra bytes or making further changes until the sum equals an official one. That is where the more secure hashes are used; alternatively, publish multiple hashes from different algorithms, since faking both is far harder than faking just one.
rtcvb32:
I know, but that was not the question. I specifically wrote “random corruption”. Skysect has mentioned several times that he is only concerned about bitrot.
Skysect: Hi, I am using MD5 for my personal collection. I have some unique dump files, and I copy them to external hard drives for backup purposes. But before I copy them, problems sometimes happen on my main machine, like shutdowns, freezes, etc. So basically I don't have any security concerns; my whole question is about data corruption. Am I safe? Is MD5 enough to detect data corruption? Even after a shutdown, if the MD5 hashes are the same, could those files still be corrupt? Thanks for all the answers.
If this is just for comparing files against corruption in your private collection, then MD5 hash sums are fine.
Depending on the size of your files, you may want to archive to multiple destinations (a second hard drive, an NFS file share, optical media, etc.). If you also choose to back up your files to the cloud, MD5 checksums start to look weak.
So, using md5sum for your needs is pretty simple, assuming you have md5sum available. :P You can also build a batch file to do the job.

[code]
#saves sums to file.md5
md5sum * >file.md5

#checks files noted in file.md5 against current files
md5sum -c file.md5
[/code]

The file will look something like this:

[code]
d8e2876f27b56216f1d334aac2cf910b *magicmaker/setup_magicmaker_2.11.0.15.exe
8289cdb475f2a5665db2d8f21955144e *magicmaker/setup_magicmaker_2.4.0.5.exe
a83969fa8846416bcea2ffc3945f4127 *magicmaker/setup_magicmaker_2.6.0.7.exe
[/code]
rtcvb32: So, using md5sum for your needs is pretty simple, assuming you have md5sum available. :P You can also build a batch file to do the job.

[code]
#saves sums to file.md5
md5sum * >file.md5

[/code]
You may want to add an additional > in case you plan on collecting all the hash sums in one file across several runs:
>> appends to the file, while a single > overwrites any existing data.

Try this:

[code]
md5sum * >>file.md5
[/code]
Post edited June 05, 2016 by morrowslant
morrowslant: You may want to add an additional > in case you plan on collecting all the hash sums in one file across several runs:
>> appends to the file, while a single > overwrites any existing data.
True. Although ensuring the lines are unique is important too, so use either sort -u (if supported) or uniq to make sure you don't end up checking a single file against the same MD5 several times.
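Something along these lines (a sketch, using the file.md5 from the earlier example):

[code]
# drop duplicate lines so each file is only checked once
sort -u file.md5 > file.md5.tmp && mv file.md5.tmp file.md5
md5sum -c file.md5
[/code]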
Thanks guys. I asked the same question on 5 different big websites and got the most definitive answer on GOG again :D You are truly awesome.