Social graph data — the map of who knows whom — is among the most sensitive data a messaging service could collect. Governments have used it to identify dissident networks. Advertisers use it to infer interests and relationships. Data breaches that expose contact graphs leak information about people who never consented to be in the database.
Contact discovery is the process by which a messaging app determines which of your contacts also use the service. Every messaging app that supports address-book-based friend finding has to solve this problem. How they solve it varies dramatically, with significant privacy consequences.
The Naive Approach (and Why It's Still Common)
The simplest contact discovery implementation: upload your entire address book to the server, query it against the user database, return a list of matches. Simple, fast, works. Also uploads the phone numbers and names of everyone in your contacts — including people who don't use the service and never agreed to have their information sent to this company.
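The upload-and-compare flow is essentially a set intersection on the server. A minimal sketch (hypothetical names, not any app's actual code) makes the privacy cost visible: every number in the address book reaches the server, registered or not.

```python
# Naive server-side contact discovery: plaintext set intersection.
# (Illustrative sketch; `registered_users` and `discover` are hypothetical.)
registered_users = {"+15551230001", "+15559990002"}

def discover(uploaded_address_book: list[str]) -> list[str]:
    # The full plaintext address book is visible to the server here,
    # including numbers of people who never signed up.
    return [n for n in uploaded_address_book if n in registered_users]

print(discover(["+15551230001", "+15557771234"]))  # ['+15551230001']
```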
This approach was standard for years. WhatsApp's early implementation worked this way. Facebook's "People You May Know" feature has been extensively analyzed for how deeply contact upload data feeds its social graph inference engine.
When you upload your address book, you're sharing data about other people. Your contact list contains their phone numbers, names, and possibly email addresses. None of those people consented to have their information sent to this service. Contact discovery is one of the few places where an individual user's privacy choice has direct consequences for non-users.
Hashing: Better, but Not Enough
A common improvement: instead of uploading raw phone numbers, hash them first and upload the hashes. The server computes hashes of registered users' phone numbers, compares, and returns matches. Your plaintext contacts aren't sent. Progress.
The problem is that phone numbers are a small, enumerable space. There are approximately 10 billion possible 10-digit phone numbers, and most are unassigned or follow geographic patterns. An attacker who obtains hashed phone numbers can reverse them by computing hashes of all plausible numbers. In 2019, researchers demonstrated this attack against WhatsApp and Telegram, recovering the phone numbers associated with hashed entries in roughly an hour using commodity hardware.
Simple hashing of phone numbers is not meaningfully better than plaintext for a server-side attacker or a breach scenario. It raises the bar against passive surveillance of network traffic, but not much more.
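The reversal attack is easy to sketch. Because real numbers cluster in known prefixes, an attacker can hash every candidate in a plausible range and match against the leaked set; the example below scans a single toy exchange prefix (all numbers and prefixes are illustrative).

```python
import hashlib

# Leaked "anonymized" upload: SHA-256 hashes of contacts' phone numbers.
leaked = {hashlib.sha256(n.encode()).hexdigest()
          for n in ["+15551230042", "+15551234567"]}

# Attacker: hash every number in a plausible range and compare.
# A real attack sweeps whole national numbering plans; this sketch
# scans one 10,000-number block to stay fast.
recovered = []
for suffix in range(1_230_000, 1_240_000):
    candidate = f"+1555{suffix:07d}"
    if hashlib.sha256(candidate.encode()).hexdigest() in leaked:
        recovered.append(candidate)

print(recovered)  # both "hashed" numbers, recovered by brute force
```

The attacker never needs to break SHA-256; the input space is simply too small for hashing alone to hide anything.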
Signal's OPRF Approach
Signal's current contact discovery system uses an Oblivious Pseudorandom Function (OPRF) implemented inside a Trusted Execution Environment (Intel SGX). The protocol works as follows:
- Signal's server holds the user database encrypted under a key only accessible inside SGX.
- The client submits its contacts as blinded OPRF inputs; the server evaluates the pseudorandom function under its secret key, contributing the key to the computation without ever seeing the plaintext inputs.
- The client receives outputs it can check against the server's published set, learning only which of its contacts are registered — not the full user database, and not exposing plaintext contact data to the server.
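The blind-evaluate-unblind flow above can be sketched with a toy discrete-log OPRF. This is not Signal's actual construction (which uses different primitives and runs inside SGX); the group parameters here are deliberately tiny demo values.

```python
import hashlib
import secrets

# Toy safe-prime group (demo parameters only; real deployments use
# elliptic-curve groups, not a 12-bit prime).
P = 2039            # safe prime: P = 2*Q + 1
Q = (P - 1) // 2    # prime order of the quadratic-residue subgroup

def hash_to_group(x: str) -> int:
    """Map an input into the order-Q subgroup (squaring forces membership)."""
    h = int.from_bytes(hashlib.sha256(x.encode()).digest(), "big")
    return pow(h % P, 2, P)

def client_blind(phone: str):
    r = secrets.randbelow(Q - 1) + 1          # random blinding factor
    return pow(hash_to_group(phone), r, P), r

def server_evaluate(blinded: int, k: int) -> int:
    # Server raises to its secret key without ever seeing the input.
    return pow(blinded, k, P)

def client_unblind(evaluated: int, r: int) -> int:
    r_inv = pow(r, -1, Q)                     # r is invertible mod Q
    return pow(evaluated, r_inv, P)           # yields H(phone)^k

# Server precomputes the OPRF output for each registered user's number.
server_key = secrets.randbelow(Q - 1) + 1
registered = {pow(hash_to_group(n), server_key, P)
              for n in ["+15551230001", "+15551230002"]}

# Client checks one contact without revealing it to the server.
blinded, r = client_blind("+15551230001")
oprf_out = client_unblind(server_evaluate(blinded, server_key), r)
print(oprf_out in registered)   # True: this contact is registered
```

The key property: the server only ever sees `H(phone)^r` for a random `r`, which is statistically unlinkable to the phone number, yet the client ends up with `H(phone)^k` and can compare it against the published set.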
The SGX component is a trust assumption — Intel SGX has had vulnerabilities, and trusting hardware attestation has limits. But it's categorically more privacy-preserving than upload-and-compare. Signal's contact discovery protocol has been published and analyzed publicly; the academic paper is available.
The core property Signal's approach achieves: the server learns neither your contacts' phone numbers nor which users are in your address book. You learn which of your contacts are Signal users. Neither party learns more than that. (See the Signal blog post "Technology preview: Private contact discovery for Signal".)
What Different Apps Actually Do
| App | Contact Discovery Method | Non-user Data Uploaded? |
|---|---|---|
| Signal | OPRF + SGX trusted execution | ✓ No (by design) |
| WhatsApp | Hashed phone numbers; historical plaintext upload | ✗ Yes (hash reversible) |
| Telegram | Hashed phone numbers | ✗ Yes (hash reversible) |
| iMessage | Query Apple servers for registered addresses | ~ Apple receives query hashes |
| Email-based systems | User supplies address directly; no contact scan | ✓ No (no address book scan) |
Email-based messaging systems, including Haven, sidestep the contact discovery problem because there's no automatic address book scan — users initiate contact by entering or importing addresses directly. This is less convenient than "find all your friends automatically," but it avoids uploading your address book entirely.
The Enumeration Attack
Contact discovery creates a second privacy risk beyond upload: enumeration. If a service returns a match/no-match for any queried phone number, an attacker can query every number in a prefix range to build a list of all users. This lets someone determine whether a specific person uses a service — potentially sensitive information for, say, a journalist using Signal, or an activist using a particular encrypted app.
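Any unthrottled match/no-match oracle makes this attack a simple loop. In the sketch below, `check_registered` stands in for a hypothetical discovery endpoint with no rate limiting; sweeping a prefix range yields every user in that range.

```python
# Enumeration sketch: probing a match/no-match oracle over a prefix
# range harvests the service's user list. `check_registered` is a
# hypothetical stand-in for an unthrottled discovery endpoint.
def check_registered(number: str) -> bool:
    return number in {"+15551230007", "+15551230042"}  # toy user database

prefix = "+1555123"
harvested = [f"{prefix}{i:04d}" for i in range(10_000)
             if check_registered(f"{prefix}{i:04d}")]
print(harvested)  # every registered user in the probed range
```

Rate limiting, query quotas, and cryptographic designs like Signal's raise the cost of exactly this loop.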
WhatsApp has faced multiple reports of bulk enumeration used to harvest phone numbers of users in specific regions. Services that don't rate-limit contact discovery queries or implement anti-enumeration measures enable this attack. Signal's OPRF approach limits enumeration by making it cryptographically expensive to probe numbers not in your own contact list.
The Problem of Non-Consenting Third Parties
Even well-intentioned contact discovery implementations have a structural problem: the privacy decision belongs to the user installing the app, but the data exposed belongs partly to other people. If you upload your contacts, you expose the phone numbers and names of everyone in your address book. They had no say.
GDPR Article 6 requires a lawful basis for processing personal data. When a messaging app processes the phone numbers of non-users through contact discovery, those non-users are data subjects who haven't provided consent. Several data protection authorities in the EU have investigated this specifically — the Irish Data Protection Commission investigated WhatsApp's contact discovery practices, resulting in a €225 million fine in 2021 (though primarily related to transparency, not contact discovery specifically).
The structural fix is what Signal implemented: cryptographic contact discovery that doesn't require transmitting contacts to the server. The economic incentive against it is that social graph data is valuable. Services that collect it have something to sell or use; services that don't, don't.
What You Can Do
Practically, your options depend on what tools your contacts use. If everyone you communicate with uses Signal, the contact discovery problem is largely addressed. For other services:
- Deny contact access if the app offers manual contact entry — most do, just with more friction. You can find people by username or phone number without uploading your full address book.
- For services that require contact upload, consider what's in your address book. If it contains contacts for people who would be put at risk by having their relationship to you exposed — sources, medical professionals, legal contacts — the upload decision carries real stakes.
- Prefer email-based identity over phone-number-based identity for sensitive contacts. Email addresses are not enumerable in the same way phone numbers are, and email-based systems typically don't scan address books.
Contact discovery is one of many places where metadata privacy matters as much as content privacy. Your social graph — who you know and contact — can reveal as much as the content of your messages in many contexts. It deserves the same scrutiny.