Suppose you receive a software update with a SHA-256 checksum next to the download link. You hash the file, compare it to the published value, and they match. What have you actually proven? Only that the bytes you downloaded are the bytes the checksum was computed over. If an attacker can change the file, they can also change the published checksum to match. A plain hash protects against accidental corruption, not against a deliberate adversary who controls the channel.
The property you usually want is stronger: that the message came from someone who holds a particular secret, and that not a single bit has changed since they produced it. That property is called message authentication, and the standard tool for it with symmetric keys is the keyed-hash message authentication code, or HMAC.
Integrity versus authenticity
These two words get used loosely, so it is worth pinning them down. Integrity means the data has not been altered. Authenticity means you know who it came from. A checksum gives you integrity against noise on the wire. It gives you nothing against an adversary, because the checksum function is public and unkeyed.
A message authentication code gives you both at once, but only relative to a shared key. The sender computes a tag over the message using a key both parties know. The receiver recomputes the tag with the same key and checks that it matches. Because producing a valid tag requires the key, a matching tag tells the receiver two things: the message is unchanged, and it was produced by someone holding the key.
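As a minimal sketch of that exchange, assuming Python's standard `hmac` and `hashlib` modules and HMAC-SHA-256 as the MAC: the key, the messages, and the helper names `make_tag` and `check_tag` are hypothetical placeholders for illustration.

```python
import hashlib
import hmac

# Hypothetical shared key; a real key would come from a secure random source.
SHARED_KEY = b"example-shared-key"

def make_tag(key: bytes, message: bytes) -> bytes:
    """Sender side: compute a tag over the message under the shared key."""
    return hmac.new(key, message, hashlib.sha256).digest()

def check_tag(key: bytes, message: bytes, tag: bytes) -> bool:
    """Receiver side: recompute the tag and compare it in constant time."""
    return hmac.compare_digest(make_tag(key, message), tag)

message = b"transfer 100 to account 42"
tag = make_tag(SHARED_KEY, message)

print(check_tag(SHARED_KEY, message, tag))                        # True
print(check_tag(SHARED_KEY, b"transfer 900 to account 42", tag))  # False: message altered
print(check_tag(b"some-other-key", message, tag))                 # False: wrong key
```

A changed message and a wrong key fail in the same way, which is the point: the receiver learns only that the tag is or is not valid under the shared key.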
A MAC proves the sender holds the shared key. It does not prove which of the two parties sent the message, because both hold the same key. For non-repudiation, where you need to prove a specific individual signed something, you need a digital signature with a private key. A MAC is symmetric; a signature is asymmetric.
The tempting design that does not work
The obvious way to turn a hash into a keyed function is to prepend the key to the message and hash the whole thing: tag = H(key || message). It looks fine. The key is secret, so an attacker who does not know it cannot compute the tag. Unfortunately, this construction is broken for MD5, SHA-1, and the full-width SHA-2 functions such as SHA-256 and SHA-512, and the reason is a property called length extension.
Hashes like SHA-256 are built on the Merkle-Damgård construction. They process a message in fixed-size blocks, and the output is simply the function's internal state after the last block. That is the flaw an attacker exploits. If you know H(key || message) and the length of the key (or can guess it by trying a few candidates), you can set the hash function's internal state to the published digest and keep feeding it more data. You produce a valid tag for key || message || padding || extra, where padding is the length padding the hash appended to the original input, without ever knowing the key.
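The attack can be sketched in Python. The standard `hashlib` module does not expose its internal state, so this sketch assumes a small hand-written SHA-256 compression function. The names `compress`, `md_padding`, and `extend`, along with the key and message, are hypothetical and for illustration only. The forger's function never receives the key, only the digest and the combined length.

```python
import hashlib
import struct

MASK = 0xFFFFFFFF

def _first_primes(n):
    primes, candidate = [], 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes

def _icbrt(value):
    """Floor of the cube root, by binary search on integers."""
    lo, hi = 0, 1 << (value.bit_length() // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= value:
            lo = mid
        else:
            hi = mid - 1
    return lo

# Round constants: first 32 fractional bits of the cube roots of the first 64 primes.
K = [_icbrt(p << 96) & MASK for p in _first_primes(64)]

def _rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK

def compress(state, block):
    """One SHA-256 compression step: 8-word state plus a 64-byte block."""
    w = list(struct.unpack(">16I", block))
    for i in range(16, 64):
        s0 = _rotr(w[i - 15], 7) ^ _rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
        s1 = _rotr(w[i - 2], 17) ^ _rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
        w.append((w[i - 16] + s0 + w[i - 7] + s1) & MASK)
    a, b, c, d, e, f, g, h = state
    for i in range(64):
        big_s1 = _rotr(e, 6) ^ _rotr(e, 11) ^ _rotr(e, 25)
        ch = (e & f) ^ (~e & g)
        t1 = (h + big_s1 + ch + K[i] + w[i]) & MASK
        big_s0 = _rotr(a, 2) ^ _rotr(a, 13) ^ _rotr(a, 22)
        maj = (a & b) ^ (a & c) ^ (b & c)
        t2 = (big_s0 + maj) & MASK
        a, b, c, d, e, f, g, h = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
    return [(x + y) & MASK for x, y in zip(state, (a, b, c, d, e, f, g, h))]

def md_padding(total_len):
    """Padding SHA-256 appends to an input of total_len bytes."""
    zeros = (55 - total_len) % 64
    return b"\x80" + b"\x00" * zeros + struct.pack(">Q", total_len * 8)

def extend(digest, known_len, extra):
    """Forge H(key || msg || glue || extra) from H(key || msg) alone.

    known_len is len(key) + len(msg); the key itself is never used.
    """
    glue = md_padding(known_len)
    state = list(struct.unpack(">8I", digest))  # resume from the published digest
    data = extra + md_padding(known_len + len(glue) + len(extra))
    for i in range(0, len(data), 64):
        state = compress(state, data[i:i + 64])
    return glue, struct.pack(">8I", *state)

# Victim side: the broken prefix-key "MAC".
key, msg = b"hypothetical-secret", b"user=alice"
tag = hashlib.sha256(key + msg).digest()

# Attacker side: knows msg, tag, and the combined length, but not the key.
glue, forged_tag = extend(tag, len(key) + len(msg), b"&admin=true")
forged_msg = msg + glue + b"&admin=true"

# The victim's own check accepts the forged message.
assert forged_tag == hashlib.sha256(key + forged_msg).digest()
```

The final assertion compares the forged tag against `hashlib`'s own SHA-256, so the hand-written compression function is checked against the real one as a side effect.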
This is not theoretical. The Flickr API was vulnerable to exactly this in 2009, allowing forged API calls against a signing scheme that prepended a secret to the request parameters. The lesson is that you cannot reason about cryptographic constructions from the outside. A scheme can look secure and fail to a structural property of the primitive underneath it.
How HMAC is actually built
HMAC, defined in RFC 2104 and analyzed by Mihir Bellare, Ran Canetti, and Hugo Krawczyk, sidesteps length extension by hashing twice with two derived keys. The structure is:
HMAC(K, m) = H( (K' XOR opad) || H( (K' XOR ipad) || m ) )
Here K' is the key zero-padded to the hash's block size (a key longer than one block is hashed first), ipad is the byte 0x36 repeated to fill a block, and opad is the byte 0x5c repeated likewise. The inner hash binds the key to the message. The outer hash then hashes that result again with a different keyed prefix. Because the output of HMAC is the digest of a digest, an attacker cannot continue the computation from a published tag. The internal state they would need is the inner hash, which they never see.
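The formula translates almost line for line into code. The following is a sketch for study rather than production use, assuming SHA-256 as H with its 64-byte block size; the function name `hmac_sha256` and the sample inputs are hypothetical. It is checked against Python's standard `hmac` module.

```python
import hashlib
import hmac

BLOCK_SIZE = 64  # SHA-256 processes 64-byte blocks

def hmac_sha256(key: bytes, message: bytes) -> bytes:
    # Derive K': hash keys longer than a block, then zero-pad to the block size.
    if len(key) > BLOCK_SIZE:
        key = hashlib.sha256(key).digest()
    key = key.ljust(BLOCK_SIZE, b"\x00")

    # Two derived keys: K' XOR ipad and K' XOR opad.
    inner_key = bytes(b ^ 0x36 for b in key)
    outer_key = bytes(b ^ 0x5C for b in key)

    # Inner hash binds the key to the message; outer hash wraps the result.
    inner = hashlib.sha256(inner_key + message).digest()
    return hashlib.sha256(outer_key + inner).digest()

# Agreement with the standard library, for a short key and an over-long key.
for sample_key in (b"short-key", b"k" * 200):
    expected = hmac.new(sample_key, b"hello", hashlib.sha256).digest()
    assert hmac_sha256(sample_key, b"hello") == expected
```

In real code, calling `hmac.new` directly is the safer choice; the hand-rolled version is only there to make the two nested hashes visible.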
The deeper result is what makes HMAC trustworthy in practice: it is provably secure as long as the underlying hash's compression function behaves as a pseudorandom function. This is why HMAC-SHA-1 remained safe for authentication well after SHA-1 collisions became practical. A collision attack lets you find two messages with the same hash; it does not let you recover the HMAC key or forge a tag without it. Construction matters as much as the primitive.
Verifying tags without leaking timing
There is a subtle trap in the verification step. The natural way to compare the received tag against the computed one is a byte-by-byte comparison that stops at the first mismatch. That early exit leaks information. An attacker measuring how long verification takes can learn how many leading bytes they guessed correctly, then forge a valid tag one byte at a time.
The fix is a constant-time comparison that always examines every byte regardless of where the first difference is. Most cryptographic libraries ship one, such as hmac.compare_digest in Python or crypto.timingSafeEqual in Node. This is the same discipline that applies across symmetric cryptography, covered in our piece on constant-time programming. A correct algorithm with a careless comparison is still exploitable.
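To make the difference concrete, here is a sketch contrasting the two comparisons in Python. `leaky_equal` is a deliberately flawed, hypothetical helper shown only to illustrate the early exit; `verify` is an assumed wrapper around the standard library's `hmac.compare_digest`.

```python
import hashlib
import hmac

def leaky_equal(a: bytes, b: bytes) -> bool:
    """Do NOT use: returns at the first mismatch, so run time depends on
    how many leading bytes of the guess are correct."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False  # early exit is the timing leak
    return True

def verify(key: bytes, message: bytes, received_tag: bytes) -> bool:
    """Recompute the tag and compare without a data-dependent early exit."""
    expected = hmac.new(key, message, hashlib.sha256).digest()
    return hmac.compare_digest(expected, received_tag)

key = b"example-shared-key"  # hypothetical key for illustration
message = b'{"event": "payment.succeeded"}'
good_tag = hmac.new(key, message, hashlib.sha256).digest()
bad_tag = good_tag[:-1] + bytes([good_tag[-1] ^ 0x01])  # last byte flipped

print(verify(key, message, good_tag))  # True
print(verify(key, message, bad_tag))   # False
```

Both functions return the same answers; they differ only in how long they take to reach them, which is exactly what the attacker measures.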
Where HMAC shows up
Once you know the shape of it, HMAC is visible across the protocols you use daily.
| Context | What HMAC does there |
|---|---|
| TLS record layer | Older cipher suites use HMAC to authenticate each record; modern AEAD suites fold this into the cipher itself. |
| Key derivation | HKDF is built entirely on HMAC, using it to extract and expand key material from a shared secret. |
| JWT tokens | The HS256 signing algorithm is HMAC-SHA-256 over the base64url-encoded token header and payload. |
| TOTP codes | The six-digit codes from authenticator apps are an HMAC of a time-step counter derived from the current timestamp, truncated to digits. |
| API request signing | Cloud provider request signatures authenticate the request body and headers under your secret key. |
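As one worked example from the table, the TOTP row can be sketched in a few lines of Python. This follows the truncation scheme of RFC 4226 (HOTP) and the 30-second time step of RFC 6238 (TOTP), with HMAC-SHA-1 as the default those documents use. The function names `hotp` and `totp` are hypothetical, and the secret is the published RFC test key, not a real credential.

```python
import hashlib
import hmac
import struct
import time

def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC the 8-byte big-endian counter, then truncate to decimal digits."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # low nibble of the last byte picks a 4-byte window
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

def totp(secret: bytes, timestamp=None, step: int = 30, digits: int = 6) -> str:
    """TOTP is HOTP with the counter set to the number of elapsed time steps."""
    if timestamp is None:
        timestamp = time.time()
    return hotp(secret, int(timestamp) // step, digits)

rfc_secret = b"12345678901234567890"  # test key from the RFC appendices

print(hotp(rfc_secret, 0))             # 755224 (RFC 4226 test vector)
print(totp(rfc_secret, 59, digits=8))  # 94287082 (RFC 6238 test vector)
```

The server and the authenticator app run the same computation over the same shared secret, so a matching code shows the app holds the key for the current time window.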
When not to reach for it
HMAC authenticates; it does not encrypt. A message protected only by HMAC is fully readable, just tamper-evident. If you need both confidentiality and authenticity, the modern answer is an authenticated encryption (AEAD) mode like AES-GCM or ChaCha20-Poly1305, which integrates the authentication tag into the cipher and removes the chance of combining encryption and a MAC incorrectly. HMAC remains the right tool when there is nothing to hide and everything to verify, and both ends can share a key: a file passed between two of your own services, a webhook payload, or a session cookie that must not be forged.
The takeaway
A hash answers "did this change?" An HMAC answers "did this change, and was it produced by someone holding the key?" The gap between those two questions is where most real attacks live, and the design of HMAC, two nested hashes with two derived keys, exists because the simpler answer to that gap turned out to be forgeable. The next time you see a SHA-256 sum sitting next to a download with no key behind it, you know exactly what it does and does not promise.