[Slackbuilds-users] MD5 hash sums

Thu Aug 23 10:58:50 UTC 2018

On Thu, Aug 23, 2018 at 01:45:44AM -0400, T3 slider wrote:
> On Wed, Aug 22, 2018, 11:15 PM David O'Shaughnessy <lists at osh.id.au> wrote:
> 
> > For an attacker to change the upstream source archive without
> > changing the MD5 requires a 2nd preimage attack, which as far as
> > I understand is not computationally feasible at present.  This is
> > different to a much simpler collision attack, where the attacker
> > generates two _new_ archives with new (and matching) MD5s.
> >
> 
> The download files do not necessarily have to be tar archives, and
> in some cases (generally those with multiple download files and
> therefore multiple checksums), individual files can be included for
> download.  Intentional PDF collisions have been around for ages
> (see https://www.mscs.dal.ca/~selinger/md5collision/ ), so if a
> SlackBuild includes some documentation as a download link, and the
> upstream server has since been compromised, a user could definitely
> be stuck with a malicious file even if the SlackBuild maintainer
> did everything right and verified upstream signatures.

The reason the SlackBuild MD5s do not fall victim to the "PDF
collision" attack is a subtle distinction between what cryptographers
define as a "collision" attack vs. a "second pre-image" attack.

The PDF attack you describe above is a "collision" attack.  A
collision attack is defined by cryptographers as "find any two files,
A and B, which produce the same hash" [1].  That is how the PDF
attack worked; it was just constrained to "the A and B files need to
be valid PDF files".  The attacker gets to choose both files.  But a
collision attack does not allow an attacker (MITM, etc.) to simply
substitute the upstream file with a new file that matches the hash
of the original file.

In order for an attacker to substitute a different upstream file for
the original file, the attacker has to find a new file that matches
the MD5 hash of the existing file.  This attack is what
cryptographers define as a "second pre-image" attack.  It means
"given a file G, find another file C which has an identical hash"
[2].  This is the attack that a MITM ISP or a cracker breaking in to
an upstream repository has to perform to substitute a different file
that matches the MD5 in the SBO.
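
As a rough illustration of why the two attacks differ in cost, here
is a toy Python sketch.  It is not an attack on real MD5: it
truncates MD5 to 16 bits (the BITS constant, my own choice) so both
searches finish quickly, and it uses plain brute force, whereas the
published MD5 collisions exploit structural weaknesses.  The
function names are made up for this example.

```python
import hashlib
from itertools import count

BITS = 16  # toy digest size (assumption for the demo); real MD5 is 128 bits


def toy_hash(data: bytes) -> int:
    """Return the top BITS bits of the MD5 digest as an integer."""
    full = int.from_bytes(hashlib.md5(data).digest(), "big")
    return full >> (128 - BITS)


def find_collision():
    """Collision: the attacker is free to pick BOTH inputs.

    Birthday search, expected on the order of 2**(BITS/2) tries.
    Returns (first_input, second_input, tries).
    """
    seen = {}
    for i in count():
        msg = b"a-%d" % i
        h = toy_hash(msg)
        if h in seen:
            return seen[h], msg, i + 1
        seen[h] = msg


def find_second_preimage(given: bytes):
    """Second pre-image: 'given' is fixed in advance (the file G).

    Brute force, expected on the order of 2**BITS tries.
    Returns (other_input, tries).
    """
    target = toy_hash(given)
    for i in count():
        msg = b"c-%d" % i
        if msg != given and toy_hash(msg) == target:
            return msg, i + 1


if __name__ == "__main__":
    a, b, tries = find_collision()
    print("collision after", tries, "tries:", a, b)
    c, tries = find_second_preimage(b"the original upstream file G")
    print("second pre-image after", tries, "tries:", c)
```

On a typical run the collision search should need far fewer tries
than the second pre-image search, and the gap widens rapidly as BITS
grows towards the full 128.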

The reason is that the SBO files contain an MD5 sum of a known file G
(the original).  The SBO file is in turn protected by a GPG
signature.  If a user verifies that the downloaded SBO and GPG
signature are valid, then the user knows that the SBO has not been
altered.  Therefore the user knows the MD5 sum within the SBO has
also not been altered (by a MITM or other attack).

Therefore, in order for the downloaded file (tar ball, zip, pdf,
whatever) to be altered, but still match the MD5 sum in the SBO, a
second pre-image attack is required.  This is because the valid MD5
sum in the SBO is now the sum of a single given file G, and the MITM
has to find a new file C which hashes to the same MD5 as the given
file G.
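
For concreteness, the check the user ends up performing looks
roughly like the Python sketch below.  It assumes the .info file has
already been GPG-verified by other means, and that the checksums sit
in an MD5SUM="..." field; the helper names are hypothetical and not
part of any SBO tool.

```python
import hashlib
import re


def md5_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5sums_from_info(info_text):
    """Return the hex digests listed in the MD5SUM="..." field, in order.

    The field may hold several sums split over lines with backslash
    continuations; only the 32-hex-digit tokens are extracted.
    """
    match = re.search(r'^MD5SUM="([^"]*)"', info_text, re.MULTILINE)
    if match is None:
        return []
    return re.findall(r"[0-9a-fA-F]{32}", match.group(1))


def file_matches_md5(path, expected_md5):
    """True if the file at 'path' hashes to the trusted MD5 sum."""
    return md5_of_file(path) == expected_md5.lower()
```

The trusted value comes from the signed .info file, never from the
download server, which is what forces the attacker into the second
pre-image setting.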

MD5 has not (yet) been shown to be vulnerable to this attack (second
pre-image) in any feasible timeframe [3][4].  Yes, a day will likely
arrive when it falls to a second pre-image attack, so upgrading to a
stronger hash function is also a good idea.  But the MD5s are not
(yet) a security risk.

> It is probably more difficult to generate tarballs
> with collisions but I'm guessing it isn't quite as difficult as we're
> pretending it is, and it's irrelevant since unzipped files can be passed as
> download links.

Cryptographers don't care what the stream of bytes represents.
The attack types are defined over the set of "all possible byte
combinations of inputs to the hash function".

> Simply put, this is bad security. Anyone who disagrees doesn't understand
> the problem.

Well, anyone who confuses collision attacks (which have been shown
against MD5) with second pre-image attacks (which is what the GPG
protected MD5 hashes in the SBO force an attacker to achieve) is also
not fully up to speed on what the different attacks mean and which
security properties they break, and falls into the trap of
"criticis[ing] MD5 and SHA1 for the wrong reasons" [4].

> Can't we just add an optional sha256sum in the .info file and
> maintainers can gradually add these checksums to their SlackBuilds,
> targeting some point in the future that these fields will be required (and
> possibly the md5s retired)?

Upgrading to sha256 is not a bad idea, but the MD5s do not yet
present a "the sky is falling" risk given the way they are verified
via the GPG signature protecting the SBO file.
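
If such an optional field were added, a gradual migration could be
sketched along these lines in Python.  Note that "SHA256SUM" is a
hypothetical field name taken from the proposal above, not something
the .info format defines today, and check_download is a made-up
helper.

```python
import hashlib

# Strongest algorithm first.  "SHA256SUM" is a hypothetical field name;
# only "MD5SUM" exists in the .info files under discussion.
PREFERRED = [("SHA256SUM", "sha256"), ("MD5SUM", "md5")]


def check_download(data: bytes, fields: dict) -> str:
    """Verify 'data' against the strongest checksum present in 'fields'.

    Returns the name of the field that was used.  Raises ValueError on
    a mismatch or when no known checksum field is present.
    """
    for field, algo in PREFERRED:
        expected = fields.get(field)
        if expected:
            actual = hashlib.new(algo, data).hexdigest()
            if actual != expected.lower():
                raise ValueError(field + " mismatch")
            return field
    raise ValueError("no checksum field present")
```

A maintainer who has not yet added the new field would still be
covered by the MD5 path, and once the stronger sum is present it
takes precedence.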

[1] https://en.wikipedia.org/wiki/Collision_attack
[2] https://en.wikipedia.org/wiki/Preimage_attack
[3] http://www.cs.cmu.edu/~perspectives/md5.html (paragraph beginning
"To understand the implications" up through the paragraph ending
"Perspectives requires only second preimage resistance of MD5")

[4] https://en.wikibooks.org/wiki/Cryptography/Breaking_Hash_Algorithms
  "The MD5 and SHA-1 hash functions, in applications that do not
   actually require collision resistance, are still considered
   adequate.

   Many people criticise MD5 and SHA1 for the wrong reasons. [4]
   There is no known practical or almost-practical preimage attack on
   MD5 or SHA-1, much less second-preimage attacks, only collision
   attacks.[5][6]"