Why Voice Calls Are Still Mostly Plaintext (And the Protocols That Fix It)

The phone system was designed in an era when wiretapping required physical access to copper wires. Decades later, the underlying signaling protocol is unchanged — and researchers have demonstrated live interception of calls and texts from anywhere in the world using nothing but a laptop and a telecom industry access credential.

When people think about encrypted communication, they usually think about messaging apps. Signal, WhatsApp, iMessage — the conversation around end-to-end encryption has centered on text. But voice calls carry everything messaging does: intimate conversations, business negotiations, journalist sources, medical consultations. And the protection most voice calls receive is far weaker than most people assume.

SS7: The Protocol That Still Runs the Phone Network

The Public Switched Telephone Network (PSTN) — the global infrastructure that connects traditional phone calls — uses a signaling protocol called SS7 (Signaling System No. 7). SS7 was designed in the 1970s and standardized through the 1980s, built for an era when access to telecom switching equipment required physical presence at a secure facility. The protocol has no authentication between nodes: any switch that connects to the SS7 network can send messages claiming to be any other switch.
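The missing authentication can be pictured with a deliberately simplified toy model. Nothing here is real SS7 encoding: the class names, the "update_location" label, and the node names are all hypothetical, chosen only to show what it means for a network to trust a sender's self-declared identity.

```python
from dataclasses import dataclass, field


@dataclass
class SignalingMessage:
    claimed_origin: str  # self-declared sender; nothing in the toy network verifies it
    operation: str       # hypothetical operation label
    subscriber: str      # phone number the message is about
    serving_node: str    # node that should now receive the subscriber's traffic


@dataclass
class ToyHomeRegister:
    """Toy stand-in for the database that tracks where a subscriber is reachable."""
    routes: dict = field(default_factory=dict)

    def handle(self, msg: SignalingMessage) -> None:
        # No signature, no shared secret, no check that claimed_origin is genuine:
        # the message is trusted simply because it arrived on the signaling network.
        if msg.operation == "update_location":
            self.routes[msg.subscriber] = msg.serving_node


register = ToyHomeRegister()

# Legitimate registration by the subscriber's own carrier.
register.handle(
    SignalingMessage("carrier-a-switch", "update_location", "+15550100", "carrier-a-switch")
)

# A node with network access claims to be a roaming partner and re-points the subscriber.
register.handle(
    SignalingMessage("roaming-partner", "update_location", "+15550100", "attacker-node")
)

print(register.routes["+15550100"])
```

In this sketch the second message wins because the register has no way to tell a genuine sender from an impostor, which is the architectural problem that message filtering can narrow but not remove.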

In 2014, researchers Karsten Nohl and Tobias Engel demonstrated at the Chaos Communication Congress that anyone with access to an SS7 connection (available for purchase from certain international telecom resellers) could intercept calls and text messages, track a phone's location in real time, and redirect calls to a recording device, anywhere in the world. The attacker needed only the target's phone number; the target did not have to answer a call, click a link, or install anything.

The vulnerability has since been exploited beyond proof-of-concept. The U.S. Cybersecurity and Infrastructure Security Agency (CISA) issued advisories noting that SS7 weaknesses are actively used. Carriers have implemented some mitigations — filtering certain classes of SS7 messages — but the fundamental architecture remains. Any call that traverses the PSTN is potentially interceptable via SS7 at the signaling layer, regardless of what your carrier tells you about "call encryption."

VoIP and SIP: A Partial Improvement

Voice over IP (VoIP) protocols, particularly SIP (Session Initiation Protocol), replaced circuit-switched PSTN infrastructure for much of the industry over the past twenty years. SIP handles call setup, routing, and teardown over IP networks. The actual audio is typically carried by RTP (Real-time Transport Protocol) or its encrypted variant, SRTP (Secure Real-time Transport Protocol).

The split between signaling and media is important. SRTP encrypts the audio stream in transit — meaning someone who can observe the network packets won't hear your voice. However, SRTP says nothing about whether the signaling layer (SIP) is encrypted. SIP traffic is often transmitted over plain UDP or TCP; an attacker observing SIP signaling can learn who called whom, when, and for how long — the exact metadata profile that intelligence agencies have historically found more valuable than call content.
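To make the metadata leak concrete, here is a minimal sketch of what a passive observer could pull out of one unencrypted SIP INVITE. The captured message, addresses, and the helper name extract_call_metadata are invented for illustration; real traffic has more headers and edge cases than this parser handles.

```python
import re

# A made-up INVITE as it might appear on the wire when SIP runs over plain UDP.
CAPTURED_INVITE = (
    "INVITE sip:bob@example.org SIP/2.0\r\n"
    "Via: SIP/2.0/UDP 192.0.2.10:5060;branch=z9hG4bK776asdhds\r\n"
    'From: "Alice" <sip:alice@example.com>;tag=1928301774\r\n'
    "To: <sip:bob@example.org>\r\n"
    "Call-ID: a84b4c76e66710@192.0.2.10\r\n"
    "CSeq: 314159 INVITE\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)


def extract_call_metadata(raw: str) -> dict:
    """Return who is calling whom, read straight from the plaintext headers."""
    head = raw.split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")
    method = lines[0].split(" ", 1)[0]

    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    def uri(value: str):
        # Pull the sip: or sips: URI out of a From/To header value.
        match = re.search(r"sips?:[^>;\s]+", value)
        return match.group(0) if match else None

    return {
        "method": method,
        "caller": uri(headers.get("from", "")),
        "callee": uri(headers.get("to", "")),
        "call_id": headers.get("call-id"),
    }


metadata = extract_call_metadata(CAPTURED_INVITE)
print(metadata)
```

Pairing the INVITE with the later BYE for the same Call-ID would also give the call's duration, all without touching the SRTP-protected audio.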

Transport vs. End-to-End

In a typical hosted deployment, SRTP encrypts only the leg between your device and the VoIP provider's server. The provider decrypts the audio there and may or may not re-encrypt it for the other party. True end-to-end encryption means neither the provider nor any intermediate server can access the audio. These are different guarantees with different threat models.

Most corporate VoIP systems, Zoom calls, and carrier-grade VoIP infrastructure use SRTP for the media stream. But the call still passes through the provider's infrastructure in plaintext at some point, making those systems legally compellable through a standard wiretap order — no SS7 attack required.
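The difference between the two guarantees can be sketched in a few lines. The cipher below is a toy (XOR against a SHA-256-derived keystream), used only because it needs nothing outside the standard library; it is not SRTP and is not secure. All variable names are illustrative.

```python
import hashlib
import secrets


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256-derived keystream. Applying it twice decrypts.

    Illustration only: this is not SRTP and must not be used for real traffic.
    """
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


audio = b"meet at the usual place"

# Transport (hop-by-hop) encryption: one key per leg, and the provider holds both.
key_alice_server = secrets.token_bytes(32)
key_server_bob = secrets.token_bytes(32)
leg_one = toy_cipher(key_alice_server, audio)
seen_by_provider = toy_cipher(key_alice_server, leg_one)  # provider decrypts here
leg_two = toy_cipher(key_server_bob, seen_by_provider)
heard_by_bob = toy_cipher(key_server_bob, leg_two)

# End-to-end encryption: the key exists only on the two endpoints.
key_alice_bob = secrets.token_bytes(32)
relayed_by_provider = toy_cipher(key_alice_bob, audio)  # provider forwards ciphertext
e2e_heard_by_bob = toy_cipher(key_alice_bob, relayed_by_provider)

print("provider sees (transport):", seen_by_provider)
print("provider sees (end-to-end):", relayed_by_provider.hex())
```

In the first flow the plaintext exists on the provider's server, which is the point a wiretap order targets. In the second the provider only ever handles ciphertext.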

ZRTP: End-to-End Encryption Without a Key Server

ZRTP (RFC 6189, "ZRTP: Media Path Key Agreement for Unicast Secure RTP") was designed by Phil Zimmermann, the same engineer who created PGP, specifically to provide end-to-end encrypted voice without trusting any intermediary. ZRTP performs a Diffie-Hellman key exchange directly between the two endpoints at call setup time, in the media path rather than through the signaling server. No key escrow, no PKI dependency, no central authority.
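The core idea of a Diffie-Hellman exchange fits in a short sketch. This is not ZRTP's wire protocol: the group below (the prime 2**127 - 1 with base 3) is a toy chosen so the numbers stay readable, whereas real ZRTP uses far larger groups or elliptic curves and adds steps such as a hash commitment. The function names are illustrative.

```python
import hashlib
import secrets

# Toy parameters: 2**127 - 1 is prime, but far too small for real-world use.
P = 2**127 - 1
G = 3


def make_keypair():
    """Generate a fresh (private, public) pair for a single call."""
    private = secrets.randbelow(P - 3) + 2  # uniform in [2, P - 2]
    public = pow(G, private, P)
    return private, public


def session_key(my_private: int, their_public: int) -> bytes:
    """Combine our private value with the peer's public value and hash the result."""
    shared = pow(their_public, my_private, P)
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()


# Each endpoint generates its own pair; only the public halves cross the network.
alice_private, alice_public = make_keypair()
bob_private, bob_public = make_keypair()

alice_key = session_key(alice_private, bob_public)
bob_key = session_key(bob_private, alice_public)

print(alice_key == bob_key)
```

Both sides arrive at the same key without it ever being transmitted, and no server needs to hold or issue anything. What plain Diffie-Hellman does not do is tell either side who is on the other end, which is the gap the next mechanism closes.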

The key verification mechanism is a "Short Authentication String" (SAS): a short hash of the session key material, displayed to both parties. If you read the SAS aloud to each other and the strings match, you have cryptographic confirmation that no man-in-the-middle intercept occurred. The phone call itself becomes the authentication channel. This is elegant: an attacker in the middle ends up with a different key on each side, so the two displayed strings differ, and hiding that would require convincingly imitating both speakers' voices in real time while they read the strings out.
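A simplified sketch of how a short string can be derived from key material follows. It is not the exact derivation in RFC 6189: the "SAS" label, the use of plain SHA-256, and the function name are assumptions made for illustration. The rendering takes the leftmost 20 bits of the hash as four characters from a 32-symbol alphabet (the z-base-32 ordering is assumed here).

```python
import hashlib

# 32 symbols, so each character carries 5 bits.
ALPHABET = "ybndrfg8ejkmcpqxot1uwisza345h769"


def short_auth_string(key_material: bytes) -> str:
    """Render the leftmost 20 bits of a hash of the key material as 4 characters."""
    digest = hashlib.sha256(b"SAS" + key_material).digest()
    bits = int.from_bytes(digest[:4], "big") >> 12  # keep the top 20 of 32 bits
    return "".join(ALPHABET[(bits >> shift) & 0x1F] for shift in (15, 10, 5, 0))


# Honest call: both endpoints hold the same key material, so both screens agree.
shared = b"key material both endpoints derived"
print(short_auth_string(shared), short_auth_string(shared))

# Intercepted call: each endpoint unknowingly shares a different key with the attacker.
alice_side = b"key material shared by alice and attacker"
bob_side = b"key material shared by bob and attacker"
print(short_auth_string(alice_side), short_auth_string(bob_side))
```

With 20 bits there is roughly a one-in-a-million chance that two unrelated keys produce the same string, which is why a four-character comparison read aloud is enough to catch an interception attempt.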

ZRTP also provides forward secrecy: each call generates ephemeral keys that are deleted after the call ends, so a later compromise of a device or of any longer-lived secret doesn't expose past conversations. The Signal Protocol provides the same property for messages.

How Modern Apps Handle Voice Encryption

The major encrypted messaging apps each take a different approach to voice calls:

Service	Call Encryption	E2E (No Provider Access)	Metadata Protected
Signal	Signal Protocol (SRTP + custom key exchange)	✓ Yes	~ Partial (sealed sender)
WhatsApp	Signal Protocol	✓ Yes (audio content)	✗ No (Meta collects metadata)
FaceTime	SRTP with IDS key exchange	✓ Yes	✗ No (Apple sees who called whom)
Standard VoIP / SIP	SRTP (usually)	✗ No (provider decrypt/re-encrypt)	✗ No
PSTN (regular call)	None (SS7 plaintext)	✗ No	✗ No

Signal negotiates fresh keys for each call over the same Signal Protocol channel used for messaging, so the exchange is authenticated by the long-term identity keys and covered by the same key verification model. This means a call to a verified Signal contact provides both forward secrecy and identity assurance: the audio can't be read in transit, and you know it's really the person you think it is.

What WebRTC Adds to the Picture

WebRTC, the browser-based real-time communication standard, mandates DTLS-SRTP for all media streams. DTLS (Datagram Transport Layer Security) is TLS adapted for UDP; here it negotiates the SRTP keys. Every browser-based video or audio call using WebRTC is therefore encrypted in transit. A direct peer-to-peer WebRTC call is encrypted between the two browsers, but group calls and many one-to-one calls are routed through a media server. In that case DTLS-SRTP is not end-to-end in the way Signal is: it protects each participant's leg to the media server, which may decrypt and re-mix audio for conferencing.
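One place the DTLS-SRTP requirement becomes visible is the session description (SDP) that endpoints exchange during setup: the media line names a UDP/TLS/RTP profile, and an a=fingerprint attribute carries the hash of the certificate used in the DTLS handshake. The sketch below checks for both. The two sample offers, the placeholder fingerprint, and the function name are invented for illustration, and the parser ignores most of what real SDP contains.

```python
# Made-up, heavily trimmed offers. The fingerprint is a placeholder, not a real hash.
WEBRTC_OFFER = "\r\n".join([
    "v=0",
    "o=- 1234567890 2 IN IP4 127.0.0.1",
    "s=-",
    "t=0 0",
    "m=audio 9 UDP/TLS/RTP/SAVPF 111",
    "c=IN IP4 0.0.0.0",
    "a=fingerprint:sha-256 " + ":".join(["AB"] * 32),
    "a=setup:actpass",
    "a=rtpmap:111 opus/48000/2",
]) + "\r\n"

PLAIN_RTP_OFFER = "\r\n".join([
    "v=0",
    "o=- 1234567890 2 IN IP4 192.0.2.10",
    "s=-",
    "t=0 0",
    "m=audio 49170 RTP/AVP 0",
    "c=IN IP4 192.0.2.10",
]) + "\r\n"


def describe_media_security(sdp: str) -> list:
    """List each media section with its transport profile and DTLS fingerprint, if any."""
    session_fingerprint = None
    sections = []
    current = None
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m="):
            kind, _port, profile = line[2:].split()[:3]
            # A session-level fingerprint applies to every media section below it.
            current = {"kind": kind, "profile": profile, "fingerprint": session_fingerprint}
            sections.append(current)
        elif line.startswith("a=fingerprint:"):
            value = line[len("a=fingerprint:"):]
            if current is None:
                session_fingerprint = value
            else:
                current["fingerprint"] = value
    for section in sections:
        section["dtls_srtp"] = (
            section["profile"].startswith("UDP/TLS/RTP/")
            and section["fingerprint"] is not None
        )
    return sections


for offer in (WEBRTC_OFFER, PLAIN_RTP_OFFER):
    for section in describe_media_security(offer):
        print(section["kind"], section["profile"], section["dtls_srtp"])
```

Note what the check does and does not establish: the fingerprint identifies whoever terminates the DTLS handshake. When that party is a media server rather than the other caller, the stream is encrypted to the server, not end to end.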

Some WebRTC implementations support insertable streams, which allow end-to-end encryption even through conferencing infrastructure — Google Meet's client-side encrypted rooms and certain Jitsi configurations use this approach. But it's opt-in and not universally deployed.

The Cellular Network's Role

Modern LTE and 5G networks encrypt the radio link between your phone and the cell tower. This protects against local interception — someone with a stingray/IMSI catcher nearby can't trivially decode your audio on LTE the way they could on 2G. But radio-link encryption is transport security, not end-to-end: the call is decrypted at the tower, re-encrypted (or not) through the carrier's internal network, and the carrier's switching infrastructure has full access.

The carrier can be legally compelled. In the US, CALEA (Communications Assistance for Law Enforcement Act, 1994) requires telecom carriers and VoIP providers serving the public to build lawful intercept capability into their infrastructure. Any call passing through a CALEA-compliant carrier is interceptable by law enforcement with a court order.

What Genuine Voice Privacy Requires

End-to-end encrypted voice calls — where only the two endpoints can access the audio, regardless of what the provider or carrier can see — require all of the following:

Key exchange that bypasses the provider — the session key must be negotiated directly between endpoints, not derived from server-held key material
Forward secrecy — ephemeral keys generated per call, discarded after
Authentication — both parties confirm the other's identity to prevent man-in-the-middle interception
Signaling encryption — even if the audio is protected, unencrypted SIP or PSTN signaling leaks who called whom and when

Signal satisfies all four. Most other calling systems satisfy one or two. A regular phone call satisfies none.

"The telephone network was designed for universal access and reliability, not for privacy. Retrofitting privacy onto a system designed without it requires replacing the protocol, not patching it." — A recurring observation in telecommunications security research

If your voice communication carries anything sensitive — legal strategy, medical information, source protection, financial discussions — the protocol your call traverses matters as much as the app you use. The most private call is one that never touches the PSTN, encrypted end-to-end with verified keys between two endpoints that trust each other. Everything else involves a trade-off you should understand before making it.

For further reading on related protocols, see our posts on forward secrecy and why metadata is often more revealing than content.