What Smart Speakers Actually Send Home

A voice assistant has to listen for its wake word before you say it. That single requirement is the whole privacy problem, and most of what companies tell you about it is true in a narrow, carefully worded way that leaves out the parts that matter.

Amazon, Google, and Apple each say the same thing when asked how their smart speakers work: the device is always listening for its wake word locally, on-device, and nothing is sent to the cloud until that word is detected. That's accurate. It's also not the whole story, and the gap between "accurate" and "the whole story" is where most of the actual privacy exposure lives.

Wake-Word Detection Is a Probability Model, Not a Switch

The on-device model that listens for "Alexa" or "Hey Google" is a small neural network trained to recognize an acoustic pattern. It is not a perfect classifier. It produces a confidence score, and when that score crosses a threshold, the device starts streaming audio to the cloud for full processing. The threshold is tuned to minimize false negatives (missing a real wake word) at the cost of some false positives.
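The trade-off described above can be sketched with a few lines of Python. This is a minimal illustration, not how any vendor's detector is built: the scores, the labels, and the `count_errors` helper are all made up for the example. It shows only that lowering the trigger threshold rescues quiet or mumbled wake words while letting more non-wake audio through.

```python
# Illustrative only: scores and labels are invented, not measured from a real device.
def count_errors(scored_clips, threshold):
    """Return (false_positives, false_negatives) for a given trigger threshold."""
    false_positives = sum(
        1 for score, is_wake in scored_clips if score >= threshold and not is_wake
    )
    false_negatives = sum(
        1 for score, is_wake in scored_clips if score < threshold and is_wake
    )
    return false_positives, false_negatives

# (confidence score from a hypothetical on-device model, was the wake word really spoken)
CLIPS = [
    (0.97, True),
    (0.91, True),
    (0.74, True),   # real wake word, mumbled
    (0.62, True),   # real wake word, from across the room
    (0.81, False),  # similar-sounding word from a TV
    (0.66, False),
    (0.40, False),
    (0.12, False),
]

for threshold in (0.9, 0.7, 0.6):
    fp, fn = count_errors(CLIPS, threshold)
    print(f"threshold={threshold}: false positives={fp}, false negatives={fn}")
```

With this toy data, a threshold of 0.9 misses two real wake words and triggers on nothing else, while 0.6 catches every real wake word and also streams two clips that never contained it. Every false positive in that second case is audio leaving the house.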

Those false positives are not rare. In 2019, reporting from Bloomberg (on Amazon) and Belgian broadcaster VRT NWS (on Google), combined with leaked internal documents, showed that both companies ran human review programs where contractors listened to snippets of audio flagged as ambiguous, to help retrain the wake-word models. Some of those snippets included arguments, medical details, and recordings that started well before any wake word was plausibly spoken. Apple's Siri had a parallel program, reported by The Guardian in 2019 and acknowledged by Apple later that year, in which graders heard accidental activations that captured private medical conversations and, according to a contractor quoted in that reporting, what sounded like drug deals.

What Changed After 2019

All three companies now let users keep their voice recordings out of human review, and default settings shifted toward auto-deleting recordings after a set window (commonly three to eighteen months, configurable). The review programs still exist for users whose recordings remain eligible, and human grading of audio to improve the recognition models is a permanent feature of how these systems get better, not a one-time cleanup.
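What an auto-delete window does and does not do can be sketched as a simple filter over stored recordings. The record layout, the `purge_expired` helper, and the dates below are assumptions made for illustration; no vendor's actual retention job is being described.

```python
from datetime import datetime, timedelta

def purge_expired(recordings, now, retention_days):
    """Return only the recordings still inside the retention window."""
    cutoff = now - timedelta(days=retention_days)
    return [r for r in recordings if r["captured_at"] >= cutoff]

# Hypothetical stored recordings; ids and timestamps are made up.
now = datetime(2024, 6, 1)
recordings = [
    {"id": "a", "captured_at": datetime(2024, 5, 30)},  # 2 days old
    {"id": "b", "captured_at": datetime(2024, 1, 15)},  # about 4.5 months old
    {"id": "c", "captured_at": datetime(2022, 11, 1)},  # about 19 months old
]

kept_3_months = [r["id"] for r in purge_expired(recordings, now, retention_days=90)]
kept_18_months = [r["id"] for r in purge_expired(recordings, now, retention_days=540)]

print(kept_3_months)   # recordings surviving a roughly three-month window
print(kept_18_months)  # recordings surviving a roughly eighteen-month window
```

Even under the shorter window, recording "a" is still stored. A retention setting shortens how long a capture sits on a server; it has no say over whether the capture happened.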

The Mute Button Doesn't Mean What You Think

Most smart speakers have a physical mute button or switch that disconnects the microphone array's power at the hardware level, which is a real and verifiable control. What it does not do is change how the device behaves the rest of the time. When unmuted, the device is processing raw audio locally, continuously, to check it against the wake-word model. That local processing happens whether or not you ever say the wake word. The question worth asking isn't "is my mic hot." It's "what happens to the audio buffer that gets analyzed and then normally discarded, and under what conditions does that discard not happen."

Manufacturers generally keep a rolling audio buffer of a few seconds on-device so that when the wake word is detected, the audio sent for processing can include the wake word itself and the moment just before it. That buffer is designed to be ephemeral and overwritten continuously. It is also, structurally, exactly the kind of local cache that shows up in forensic device extractions and in vulnerability research on smart speaker firmware.
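A rolling buffer of this kind is usually some form of ring buffer. The sketch below uses Python's `collections.deque` with a fixed `maxlen` to show the idea; the `PreRollBuffer` class, the `run` helper, and the string "frames" are hypothetical stand-ins, and real firmware would work on raw audio samples in a lower-level language.

```python
from collections import deque

class PreRollBuffer:
    """Keeps only the most recent `capacity` audio frames; older frames are dropped."""

    def __init__(self, capacity):
        self.frames = deque(maxlen=capacity)

    def push(self, frame):
        # Once the deque is full, appending discards the oldest frame.
        self.frames.append(frame)

    def snapshot(self):
        # What would be handed to the uploader if the wake word fired right now.
        return list(self.frames)

def run(frames, capacity, is_wake_frame):
    """Feed frames through the buffer; return the pre-roll at the first trigger, else None."""
    buffer = PreRollBuffer(capacity)
    for frame in frames:
        buffer.push(frame)
        if is_wake_frame(frame):
            return buffer.snapshot()
    return None

frames = ["f0", "f1", "f2", "f3", "WAKE", "f5"]
captured = run(frames, capacity=3, is_wake_frame=lambda f: f == "WAKE")
print(captured)  # ['f2', 'f3', 'WAKE']
```

Two things follow from the structure. The frames before the trigger ("f2" and "f3" here) travel with the wake word, so a false activation uploads a little of whatever came before it. And when nothing triggers, the most recent frames are still sitting in memory until newer audio overwrites them.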

Third-Party Skills and Actions Widen the Trust Boundary

Voice assistant platforms support third-party integrations (Alexa Skills, Google Actions) built by developers who are not Amazon or Google. Academic research, including a widely cited 2021 study of the Alexa Skill ecosystem from Ruhr University Bochum and North Carolina State University, found that a meaningful share of published Skills had ambiguous or inconsistent privacy policies, and that the certification process for these integrations does not verify runtime behavior after approval. A malicious or careless Skill can request permissions it does not need, register an invocation name that sounds like a trusted Skill's so that it gets launched by mistake ("voice squatting"), keep listening after pretending to hand control back to the assistant ("voice masquerading"), or silently log more than it needs to.

Control	What it actually limits
Hardware mute switch	Cuts microphone power. Reliable, but only protects you while it's engaged.
Voice recording auto-delete	Limits retention window in the cloud. Does not stop initial capture or short-term local buffering.
Opt out of human review	Removes your audio from contractor grading pools. Does not affect automated processing.
Skill/Action permission review	Limits what a specific integration can request. Does not audit what it does after approval.

Reducing the Actual Exposure

If you use a smart speaker and want to keep the convenience while cutting real exposure, a few changes do most of the work. Turn off human review of voice recordings in the account privacy settings (all three major platforms let you do this without disabling the assistant). Set auto-delete to the shortest available window rather than the default. Physically mute the device during conversations you would not want transcribed, since the mute switch is one of the few controls that is enforced in hardware rather than policy. Review installed Skills and Actions periodically and remove ones you don't actively use, since an unused integration is exposure with no offsetting benefit. And place the device away from where your most sensitive conversations happen, which sounds obvious but is the single most effective control and the one people skip.

The privacy question with voice assistants was never really "is it recording me." It's "who gets to listen later, for how long, and under what conditions does that quietly change."

The Trade-Off Doesn't Fully Go Away

None of this makes a smart speaker as private as a device with no always-on microphone. The core design (a local model deciding, in real time, whether to start streaming your audio to a company's servers) means some amount of trust in that decision boundary is unavoidable if you use the product. What's changed since 2019 is that the controls to limit downstream retention and human review are real, documented, and worth actually configuring instead of accepting the defaults, which are optimized for the company's model training pipeline, not for limiting your exposure.

Haven doesn't have a voice assistant and isn't trying to compete with one. The relevant lesson carries over to any always-on service you invite into your home: read what "processed locally" actually promises, and check the retention settings instead of trusting the marketing page.