Encryption answers one question: can the adversary read the contents? For data at rest in someone else's hands, that is necessary but far from sufficient. There is a second channel that encryption leaves wide open, and it is the sequence of locations you access. Cryptographers call it the access pattern, and it leaks far more than intuition suggests.
Why the Access Pattern Is a Leak
Imagine an encrypted medical database hosted by a cloud provider. Every row is ciphertext. But the provider sees which row your query reads. If a particular row is fetched every time a clinic processes an oncology referral, the provider learns which row corresponds to oncology patients without decrypting anything. Repeated access, co-occurrence of accesses, the timing of reads against external events, all of it accumulates into inference.
This is not hypothetical. A line of research on searchable encryption has produced a series of leakage-abuse attacks, where a server that only observes access patterns and query frequencies recovers a substantial fraction of the supposedly hidden queries and even plaintext. The lesson the field absorbed is blunt: hiding contents while exposing access patterns is often not enough, and for some workloads it is barely better than no protection at all.
The adversary is the storage itself, or anyone watching it: a cloud provider, a compromised operating system observing memory, a court order served on the host. They cannot decrypt your data. ORAM assumes they will instead study the trail of which encrypted blocks you read and write, and it sets out to make that trail tell them nothing.
The Definition That Sounds Impossible
Oblivious RAM, first formalized by Oded Goldreich and Rafail Ostrovsky in work spanning the late 1980s and 1990s, sets an aggressive goal. The physical sequence of accesses the server observes must be indistinguishable regardless of what the program is actually doing. Two completely different logical access patterns, the same number of operations apart, must produce server-visible traces that are statistically identical.
Put differently: whether you read the same record a thousand times or a thousand different records once, the server must not be able to tell the two cases apart. The only thing it may learn is the total number of accesses.
Achieving this requires two things on every single access: you must touch more locations than you actually need, and you must shuffle data around so that the same logical item lives somewhere new each time you fetch it. There is no way to read one item and reveal nothing without doing extra work.
From Naive to Practical
The simplest scheme that works is also absurd: to read one item obliviously, scan the entire dataset, decrypting and re-encrypting everything as you go. The server sees a full sweep every time and learns nothing about which item you wanted. For a million records, that is a million accesses to retrieve one. Correct, and useless.
Decades of work went into bringing the overhead down from linear to logarithmic. The hierarchical constructions of Goldreich and Ostrovsky reduced it dramatically. The landmark for practitioners arrived in 2013 with Path ORAM, by Emil Stefanov and colleagues, which is simple enough to implement and efficient enough to deploy.
How Path ORAM Works
Path ORAM stores encrypted data blocks in the nodes of a binary tree held on the server. The client keeps two small pieces of state locally: a position map that records which leaf each block is currently assigned to, and a small stash of blocks held temporarily in memory. The invariant is that every block lives somewhere on the path from the root to its assigned leaf.
To access a block, the client does the following. It looks up the block's current leaf, then reads every node along the entire root-to-leaf path, the whole path, not just the node it wants. The block it needs is somewhere on that path. It takes the block, then immediately reassigns it to a fresh random leaf and updates the position map. Finally it writes the path back, pushing blocks as far down toward their leaves as it can and keeping any that do not fit in the stash.
The server sees a read of a full path and a write of a full path. Because the block was just remapped to a new random leaf, the next access to the same block reads a different, unpredictable path. The server cannot link two accesses to the same item, and it cannot tell a repeated access from a fresh one. The bandwidth cost is on the order of the logarithm of the database size per access, which for realistic sizes is a few hundred block transfers rather than a million.
The Cost Is Not Optional
It is tempting to hope that a cleverer scheme could drive the overhead to nearly nothing. It cannot, and this is a proven result, not an engineering limitation. The original Goldreich-Ostrovsky bound established that oblivious access must pay a logarithmic penalty, and a 2018 result by Kasper Green Larsen and Jesper Buus Nielsen confirmed that logarithmic overhead is essentially the best achievable in the natural model. Obliviousness has a price floor that no future breakthrough will erase.
| Approach | Accesses to read one item | Client state |
|---|---|---|
| Encryption only | 1 (but leaks the pattern) | Key only |
| Naive full scan | Entire dataset | Key only |
| Path ORAM | About log of dataset size | Position map and a small stash |
How ORAM Differs From PIR
People often confuse Oblivious RAM with Private Information Retrieval, because both hide which item a client wants. The difference is who owns the data and what operations are allowed. PIR is about querying a database the server holds, typically read-only and often a public corpus, where the goal is that the server cannot tell which entry you retrieved, and the client need keep little to no per-item state. ORAM is about your own data that you have outsourced for storage, supporting both reads and writes, and it requires the client to maintain state, the position map and stash, to keep accesses unlinkable over time. PIR protects a query against a database; ORAM protects an ongoing read-write workload against the storage holding it.
Where It Actually Shows Up
ORAM is not a consumer feature you toggle on. It is infrastructure that lives inside specific high-assurance systems. It has been used to harden access to data inside secure enclaves, where a compromised operating system can watch an enclave's memory access pattern even though it cannot read the enclave's encrypted contents, exactly the access-pattern leak ORAM is built to defeat. It underpins research systems for oblivious cloud storage and private database queries, and it is part of the toolkit for building services that hold encrypted user data without being able to profile how that data is used.
For most threat models, the right defenses are simpler and cheaper: strong end-to-end encryption, minimizing what data exists at all, and resisting the metadata collection that access patterns are one example of. ORAM matters because it marks the far end of the spectrum, the point where you assume the storage operator is the adversary and you refuse to leak even the shape of your usage.
That framing is the useful takeaway even if you never deploy an ORAM yourself. Privacy is not one property, it is a stack of them. Content confidentiality is the layer everyone thinks of. Access-pattern privacy is the layer underneath, the one that quietly leaks when the layer above is doing its job perfectly. Knowing the difference is what separates a system that looks private from one that is, and at Haven we design against the leaks that survive encryption, not just the ones it already covers.