HKDF: Turning One Secret Into Many, Correctly

A common task in applied cryptography looks deceptively simple: "I have a shared secret. I need two keys from it — one for encryption, one for authentication." The wrong way to solve this is to hash the secret and slice it in half. The right way is HKDF, and the reason it exists tells you something important about why amateur cryptography breaks.

HKDF is the HMAC-based Extract-and-Expand Key Derivation Function, specified in RFC 5869 by Hugo Krawczyk and Pasi Eronen in 2010. It's the key derivation function used in TLS 1.3, the Signal Protocol, Noise, and MLS, and it is the default choice in most protocols designed since. If you do anything with shared secrets in a cryptographic context, you almost certainly want HKDF.

The function does one specific job: take some input keying material (which may vary in how much entropy it has and how that entropy is structured), plus optional context information, and produce one or more independent-looking output keys of any requested length. The problem is narrow, but the design choices in HKDF matter a lot.

The Problem It Solves

Suppose you've just completed a Diffie-Hellman key exchange. You have a shared secret — let's call it Z — that's the same on both ends. Z is 32 bytes long. Now you need:

A 32-byte AES key for encrypting messages
A 32-byte HMAC key for authenticating messages
A 16-byte IV seed for some specific cipher mode
Possibly later: more keys for new sessions, key rotation, etc.

What you don't want to do is use Z directly as your AES key. Z is the output of a Diffie-Hellman operation, and DH outputs are not uniformly random byte strings. They encode elements of a specific algebraic structure: to an adversary who can't break DH, Z is hard to distinguish from a random group element, but that is not the same as a random 32-byte string. An X25519 output, for example, is a field element below 2^255 - 19, so its most significant bit is always zero. Ciphers and MACs are analyzed under the assumption that their keys are uniform, and a structured key breaks that assumption.
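A quick simulation makes the encoding point concrete. The sketch below does not perform Diffie-Hellman; it assumes a uniformly random integer below p = 2^255 - 19 as a stand-in for an X25519 output, encodes it as 32 little-endian bytes the way RFC 7748 does, and counts how often the top bit is set.

```python
import secrets

# Field prime used by X25519 (RFC 7748): outputs are integers mod P.
P = 2**255 - 19


def simulated_x25519_output() -> bytes:
    """Stand-in for an X25519 output: a uniform integer below P, encoded as
    32 little-endian bytes. This models the encoding only, not real DH."""
    return secrets.randbelow(P).to_bytes(32, "little")


samples = [simulated_x25519_output() for _ in range(10_000)]

# Bit 7 of byte 31 is the most significant bit of the 256-bit string.
# For a uniform 32-byte key it would be set about half the time.
top_bit_set = sum(1 for s in samples if s[31] & 0x80)
print(f"top bit set in {top_bit_set} of {len(samples)} samples")
```

Because every value is below 2^255, the count is always zero. One missing bit is not itself an attack, but it shows that "hard to distinguish for a DH adversary" and "uniform key bytes" are different properties.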

You also don't want to derive your two 32-byte keys by splitting SHA-512(Z) in half. That's the kind of thing that looks fine and is brittle in subtle ways. Knowing one half may not reveal the other, but the construction's only security argument models the hash as a random oracle, and it gives you no way to bind each key to its purpose or to derive further keys later without inventing another ad-hoc rule.

The actual goal

Given an input that has enough entropy, however unevenly it is spread across the bytes, produce any number of independent-looking keys whose security reduces to the input's underlying entropy. The reduction needs to be tight, and it needs to hold across many output keys.

Extract-Then-Expand

HKDF achieves this in two phases. The separation is the conceptual core of the design.

Phase 1: Extract

The Extract step compresses the input keying material into a fixed-size, uniformly-random-looking value called a pseudorandom key (PRK). It uses HMAC for this:

PRK = HMAC-Hash(salt, IKM)

where IKM is the input keying material (your DH output Z, for example) and salt is a non-secret value that helps remove structural biases. Note that the salt, not the IKM, is the HMAC key here. The salt is optional in HKDF; if you don't supply one, a string of HashLen zero bytes (32 for SHA-256) is used.
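Extract is a single HMAC call. A minimal standard-library sketch, assuming SHA-256 by default (the function name hkdf_extract is ours, not a library API):

```python
import hashlib
import hmac
from typing import Optional


def hkdf_extract(salt: Optional[bytes], ikm: bytes, hash_name: str = "sha256") -> bytes:
    """HKDF-Extract (RFC 5869): PRK = HMAC-Hash(salt, IKM)."""
    hash_len = hashlib.new(hash_name).digest_size
    if not salt:
        # No salt supplied: RFC 5869 uses HashLen zero bytes instead.
        salt = b"\x00" * hash_len
    # The salt is the HMAC key; the input keying material is the message.
    return hmac.new(salt, ikm, hash_name).digest()


prk = hkdf_extract(None, b"\x0b" * 22)  # placeholder IKM
assert len(prk) == 32  # always HashLen, whatever the IKM length
```

Whatever the length or structure of the IKM, the PRK comes out as exactly one hash output, which is what lets the Expand step treat it as a uniform key.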

The cryptographic argument for Extract is what makes HKDF rigorous. It assumes HMAC behaves like a "computational extractor": given an input with sufficient min-entropy, the output is computationally indistinguishable from a uniform random string of hash-output length. This is a weaker, and therefore more conservative, assumption than treating SHA-256 as a random oracle.

Phase 2: Expand

Once you have a uniform-looking PRK, the Expand step generates output keys of any length you need:

T(0) = empty string
T(1) = HMAC-Hash(PRK, T(0) | info | 0x01)
T(2) = HMAC-Hash(PRK, T(1) | info | 0x02)
T(3) = HMAC-Hash(PRK, T(2) | info | 0x03)
...
OKM  = T(1) | T(2) | T(3) | ... truncated to L bytes

where | denotes concatenation, the counter is a single byte, and info is an optional context string that binds the output to a specific purpose. Because the counter is one byte, the output length L can be at most 255 times the hash output length (8,160 bytes for SHA-256).
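The Expand loop translates almost line for line into Python. This is a standard-library sketch for illustration, not a replacement for a vetted library; the names hkdf_expand and hkdf are ours.

```python
import hashlib
import hmac


def hkdf_expand(prk: bytes, info: bytes, length: int, hash_name: str = "sha256") -> bytes:
    """HKDF-Expand (RFC 5869): OKM = T(1) | T(2) | ..., truncated to `length` bytes."""
    hash_len = hashlib.new(hash_name).digest_size
    if length > 255 * hash_len:
        raise ValueError(f"length must be at most {255 * hash_len} bytes")
    okm = b""
    t = b""  # T(0) is the empty string
    counter = 1
    while len(okm) < length:
        # T(i) = HMAC-Hash(PRK, T(i-1) | info | i), with i encoded as one byte.
        t = hmac.new(prk, t + info + bytes([counter]), hash_name).digest()
        okm += t
        counter += 1
    return okm[:length]


def hkdf(salt: bytes, ikm: bytes, info: bytes, length: int) -> bytes:
    """HKDF-SHA256: Extract (empty salt means HashLen zero bytes), then Expand."""
    prk = hmac.new(salt or b"\x00" * 32, ikm, "sha256").digest()
    return hkdf_expand(prk, info, length)
```

Test Case 1 in RFC 5869 Appendix A (SHA-256 with a 42-byte output, so two blocks and a truncation) is a good check that the byte layout is right.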

The info parameter is how you derive multiple independent keys from the same shared secret. By using different info strings (for example, "encrypt" for the AES key and "auth" for the HMAC key) you get outputs that are computationally independent: an attacker who somehow learns one derived key learns nothing useful about keys derived under different info strings.

A Concrete Example

Using HKDF-SHA256 in Python with the cryptography library:

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

shared_secret = b"..."  # 32 bytes from DH
salt = b"specific-session-salt"

encrypt_key = HKDF(
    algorithm=hashes.SHA256(),
    length=32,
    salt=salt,
    info=b"haven-session-v1-encrypt",
).derive(shared_secret)

auth_key = HKDF(
    algorithm=hashes.SHA256(),
    length=32,
    salt=salt,
    info=b"haven-session-v1-auth",
).derive(shared_secret)

Two HKDF calls with different info strings produce two independent-looking keys. Note that this is conceptually equivalent to one Extract followed by two Expands; libraries often expose the combined API for convenience.
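To see the equivalence without the cryptography package, here is a stdlib-only sketch that runs Extract once and Expand twice with the same salt and info labels as the example above. The shared secret is a hypothetical fixed stand-in for a DH output, and because each key is exactly one SHA-256 block, Expand reduces to its first block, T(1). Given the same inputs, the two keys should match what the two HKDF calls produce.

```python
import hashlib
import hmac

# Hypothetical stand-in for the 32-byte DH output; the salt matches the example above.
shared_secret = bytes(range(32))
salt = b"specific-session-salt"


def expand_one_block(prk: bytes, info: bytes) -> bytes:
    """HKDF-Expand for outputs of at most 32 bytes with SHA-256: only T(1) is needed."""
    return hmac.new(prk, info + b"\x01", hashlib.sha256).digest()


# Extract once: PRK = HMAC-SHA256(salt, IKM).
prk = hmac.new(salt, shared_secret, hashlib.sha256).digest()

# Expand twice, once per purpose.
encrypt_key = expand_one_block(prk, b"haven-session-v1-encrypt")
auth_key = expand_one_block(prk, b"haven-session-v1-auth")
assert encrypt_key != auth_key
```

Caching the PRK like this saves one HMAC per derived key, which is why some libraries expose Extract and Expand separately.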

What HKDF Is Not

HKDF is frequently confused with password-based key derivation functions like PBKDF2, scrypt, and Argon2. They look superficially similar but solve different problems.

Function	Designed For	Slow on Purpose?
HKDF	Deriving keys from cryptographic input (DH outputs, hash outputs, other uniform-ish material)	No — fast by design
PBKDF2	Deriving keys from passwords (low-entropy human input)	Yes — iteration count slows attacks
scrypt	Same as PBKDF2, additionally memory-hard	Yes — slow + memory-expensive
Argon2	Modern password KDF, winner of the 2015 Password Hashing Competition	Yes — tunable time and memory cost

The mistake to avoid: using HKDF to derive keys from passwords. HKDF is fast — which means an attacker who steals your stored "key derivation" output can brute-force the password at full speed. For passwords, you need a slow KDF (Argon2 or scrypt). For cryptographic key material that already has high entropy, you want a fast KDF (HKDF), because slowing it down provides no security benefit and adds latency to every operation.

It's also legitimate to chain them: use Argon2 to convert a password into a high-entropy key, then use HKDF to derive multiple sub-keys from that result.
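A minimal sketch of that chain, with one substitution: Argon2 is not in Python's standard library, so this uses hashlib.scrypt (the memory-hard KDF from the table) in its place. The scrypt cost parameters are illustrative rather than tuned recommendations, and the info labels are hypothetical.

```python
import hashlib
import hmac
import os
from typing import Tuple


def password_to_subkeys(password: str, salt: bytes) -> Tuple[bytes, bytes]:
    """Slow password KDF first, then HKDF-SHA256 to split the result into sub-keys.

    scrypt stands in for Argon2; n/r/p are illustrative, not tuned values.
    """
    # Step 1: deliberately slow, memory-hard stretching of a low-entropy password.
    master = hashlib.scrypt(
        password.encode("utf-8"), salt=salt, n=2**14, r=8, p=1,
        maxmem=64 * 1024 * 1024, dklen=32,
    )
    # Step 2: fast HKDF. Extract with the default all-zero salt...
    prk = hmac.new(b"\x00" * 32, master, hashlib.sha256).digest()

    # ...then one single-block Expand per purpose (hypothetical labels).
    def expand(info: bytes) -> bytes:
        return hmac.new(prk, info + b"\x01", hashlib.sha256).digest()

    return expand(b"example-app v1 encrypt"), expand(b"example-app v1 auth")


salt = os.urandom(16)  # per-user random salt, stored alongside the account record
enc_key, auth_key = password_to_subkeys("correct horse battery staple", salt)
```

Only the scrypt call is slow. Once the password has been stretched, each additional sub-key costs one HMAC.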

Common Pitfalls

Forgetting the info parameter

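The most common HKDF mistake is calling it with an empty info string and trying to derive multiple keys by varying something else. Varying only the output length does not work at all: with the same PRK and info, a shorter output is simply a prefix of a longer one, so a 16-byte "key" is the first half of the 32-byte one. Varying only the salt does produce different keys, but it couples your security to operational discipline you may not have. Use distinct, structured info strings, something like "protocol-name v1 purpose", and the keys are computationally independent by construction.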
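The prefix behavior is easy to check. This sketch implements HKDF-Expand with SHA-256 from the standard library and uses an all-zero placeholder PRK; the info labels are hypothetical.

```python
import hashlib
import hmac


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """HKDF-Expand with SHA-256 (RFC 5869)."""
    if length > 255 * 32:
        raise ValueError("output too long for HKDF-SHA256")
    okm, t, i = b"", b"", 1
    while len(okm) < length:
        t = hmac.new(prk, t + info + bytes([i]), hashlib.sha256).digest()
        okm += t
        i += 1
    return okm[:length]


prk = bytes(32)  # placeholder PRK; any fixed value shows the same effect

# Same PRK and info, different lengths: the shorter key is a prefix of the longer one.
key_16 = hkdf_expand(prk, b"", 16)
key_32 = hkdf_expand(prk, b"", 32)
assert key_32[:16] == key_16

# Distinct info labels give unrelated outputs instead.
enc = hkdf_expand(prk, b"example-proto v1 encrypt", 32)
mac = hkdf_expand(prk, b"example-proto v1 auth", 32)
assert enc[:16] != mac[:16]
```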

Salt confusion

Salt in HKDF is not the same as salt in password hashing. In password hashing, the salt is what defeats precomputation attacks. In HKDF, the salt is for entropy extraction: it strengthens the Extract step when the input keying material has structure, such as a DH output. A constant (or absent) salt is fine when the IKM is already close to uniform; a random salt, chosen independently of the IKM, gives a stronger guarantee when it isn't. No salt can compensate for IKM that simply lacks entropy.

Confusing IKM and the PRK

Some APIs let you call Expand directly, skipping Extract. This is correct only if your input is already a uniformly-random key (e.g., the output of a previous HKDF, or a value from a CSPRNG). It is wrong if your input is a DH output or other structured cryptographic material — in that case, you need Extract first.

Using the wrong hash

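HKDF can use any hash that works with HMAC. SHA-256 is the most common choice. SHA-384 and SHA-512 are appropriate when you want a larger PRK and security margin, or when the surrounding protocol calls for them; TLS 1.3's TLS_AES_256_GCM_SHA384 cipher suite, for example, runs HKDF with SHA-384. HKDF-SHA1 is not known to be broken, since HMAC's security does not rest on collision resistance, but choosing it in a new design signals that the rest of the system is also outdated.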

Where HKDF Sits in Real Protocols

In TLS 1.3, an HKDF-based key schedule replaces the TLS 1.2 PRF with a clean, formally analyzed structure. Every TLS 1.3 session derives a hierarchy of secrets via HKDF: early, handshake, and master secrets, then traffic, exporter, and resumption secrets. Secrets are combined with HKDF-Extract and split with HKDF-Expand-Label, whose distinct labels document each secret's purpose, such as "c hs traffic" for the client handshake traffic secret.
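A sketch of HKDF-Expand-Label as RFC 8446 (Section 7.1) defines it: the info passed to HKDF-Expand is a serialized HkdfLabel struct containing the output length, the label prefixed with "tls13 ", and a context value. The traffic secret below is a placeholder, and the code assumes a SHA-256 cipher suite.

```python
import hashlib
import hmac
import struct


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """HKDF-Expand with SHA-256 (RFC 5869)."""
    if length > 255 * 32:
        raise ValueError("output too long for HKDF-SHA256")
    okm, t, i = b"", b"", 1
    while len(okm) < length:
        t = hmac.new(prk, t + info + bytes([i]), hashlib.sha256).digest()
        okm += t
        i += 1
    return okm[:length]


def hkdf_label(label: bytes, context: bytes, length: int) -> bytes:
    """Serialize TLS 1.3's HkdfLabel struct (RFC 8446, Section 7.1)."""
    full_label = b"tls13 " + label
    return (
        struct.pack(">H", length)                  # uint16 length
        + bytes([len(full_label)]) + full_label    # opaque label<7..255>
        + bytes([len(context)]) + context          # opaque context<0..255>
    )


def hkdf_expand_label(secret: bytes, label: bytes, context: bytes, length: int) -> bytes:
    """HKDF-Expand-Label: HKDF-Expand with the serialized HkdfLabel as info."""
    return hkdf_expand(secret, hkdf_label(label, context, length), length)


# Placeholder traffic secret; a real one comes out of the key schedule.
traffic_secret = bytes(32)
write_key = hkdf_expand_label(traffic_secret, b"key", b"", 16)  # AES-128-GCM key
write_iv = hkdf_expand_label(traffic_secret, b"iv", b"", 12)
```

Folding the output length into the info is a deliberate design choice: unlike plain HKDF-Expand, asking for a different length also changes the label, so outputs of different lengths are not prefixes of each other.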

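In the Signal Protocol's Double Ratchet, HKDF is used wherever new DH output enters the system: in the specification's recommended instantiation, the root-chain step runs HKDF with the current root key as salt and the DH output as IKM to produce a new root key and chain key, and message encryption uses HKDF to expand each message key into an encryption key, an authentication key, and an IV. The symmetric chain step in between is plain HMAC with distinct constant inputs. The protocol's forward secrecy depends on these KDF steps being one-way and producing independent keys at each ratchet step.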
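A sketch of the two chain steps as the Double Ratchet specification recommends instantiating them with SHA-256. The info string is a hypothetical application-specific value, and a real implementation also needs the DH ratchet itself, header handling, and skipped-message bookkeeping, all of which this omits.

```python
import hashlib
import hmac
from typing import Tuple

# Hypothetical application-specific info string; the spec leaves this to the application.
ROOT_INFO = b"example-app ratchet v1"


def kdf_rk(root_key: bytes, dh_out: bytes) -> Tuple[bytes, bytes]:
    """Root-chain step: HKDF-SHA256 with the root key as salt and the DH output as IKM.

    Returns (new_root_key, new_chain_key), split from 64 bytes of HKDF output.
    """
    prk = hmac.new(root_key, dh_out, hashlib.sha256).digest()                # Extract
    t1 = hmac.new(prk, ROOT_INFO + b"\x01", hashlib.sha256).digest()         # Expand, T(1)
    t2 = hmac.new(prk, t1 + ROOT_INFO + b"\x02", hashlib.sha256).digest()    # Expand, T(2)
    okm = t1 + t2
    return okm[:32], okm[32:]


def kdf_ck(chain_key: bytes) -> Tuple[bytes, bytes]:
    """Symmetric-chain step: plain HMAC with distinct constant inputs, not HKDF.

    Returns (next_chain_key, message_key).
    """
    message_key = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    next_chain_key = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    return next_chain_key, message_key


# Placeholder inputs: one root step, then two symmetric steps along the new chain.
root_key, chain_key = kdf_rk(bytes(32), bytes(range(32)))
chain_key, mk1 = kdf_ck(chain_key)
chain_key, mk2 = kdf_ck(chain_key)
```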

In the MLS protocol (RFC 9420), HKDF drives both the per-epoch key schedule and the derivation of path secrets along the ratchet tree, so the cryptographic safety arguments for MLS group operations route through HKDF's properties.

The general lesson: any time you find yourself thinking "I have a secret, I want to turn it into other secrets," that's a key derivation function. For high-entropy input, HKDF is the standard, widely analyzed answer to that problem.

Where Haven Fits

Haven's session-key derivation uses HKDF-SHA256 throughout. The MLS protocol layer uses HKDF as part of the IETF specification; our auth-credential derivation chains Argon2 (for the password → high-entropy step) with HKDF (for splitting the result into the multiple keys we need per session). The info labels are explicit and versioned, so changes to the derivation logic produce different keys and don't silently break existing sessions.

Related reading: forward secrecy describes the protocol property that HKDF makes possible at each ratchet step.