Canary Tokens: How to Know When Your Files Have Been Accessed

Most security tools tell you what to prevent. Canary tokens tell you what has already happened — the moment it happens. They're one of the few defensive techniques that work against attackers who have already bypassed your perimeter.

A canary token is a deliberately placed lure — a file, URL, credential, or identifier — that contains a hidden tracking mechanism. When an attacker (or anyone else) accesses it, the token fires a notification to the owner. The name comes from the "canary in the coal mine" idiom: the canary's death is your earliest warning of a dangerous environment.

The concept predates the modern security industry. Clifford Stoll famously described a precursor in his 1989 book The Cuckoo's Egg, where he planted fake files to track a hacker who had infiltrated systems at Lawrence Berkeley National Laboratory. The attacker opened the files while searching for information; Stoll was watching. Modern implementations scale far better, and a free hosted service removes the need to run any server of your own.

The core technology behind most canary tokens is simple: a unique URL that, when fetched, logs the request (source IP address, timestamp, user agent) and sends an alert to the token owner. DNS-based tokens work the same way, except that the lookup of a unique hostname is what gets logged, along with the resolver that performed it. Embed the URL or hostname in a document, a spreadsheet macro, an email, or an image tag, anywhere an attacker would interact with it, and you have a passive detector.
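To make the mechanism concrete, here is a minimal sketch of an HTTP token endpoint using only the Python standard library. It is an illustration under stated assumptions, not how canarytokens.org is implemented: the in-memory TOKENS registry, the function names, and the print-based alert are all hypothetical stand-ins.

```python
import secrets
from datetime import datetime, timezone
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit

# Hypothetical in-memory registry: token -> memo describing where it was planted.
TOKENS = {}

def new_token(memo):
    """Create an unguessable token and remember what it was planted in."""
    token = secrets.token_urlsafe(16)
    TOKENS[token] = memo
    return token

def build_alert(path, client_ip, user_agent, now=None):
    """Return an alert record if the request path ends in a known token, else None."""
    token = urlsplit(path).path.strip("/").split("/")[-1]
    memo = TOKENS.get(token)
    if memo is None:
        return None
    now = now or datetime.now(timezone.utc)
    return {
        "memo": memo,
        "token": token,
        "source_ip": client_ip,
        "user_agent": user_agent,
        "timestamp": now.isoformat(),
    }

class CanaryHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        alert = build_alert(
            self.path, self.client_address[0], self.headers.get("User-Agent", "")
        )
        if alert:
            # Placeholder: a real service would email or page the token owner.
            print("CANARY FIRED:", alert)
        # Answer every request identically so the response reveals nothing.
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"ok")

def serve(port=8080):
    """Run the sketch locally; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), CanaryHandler).serve_forever()
```

Calling new_token("salary doc on share X") returns the path component to embed, and serve() starts the listener. Keeping build_alert separate from the handler makes the logging logic easy to check without opening a socket.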

Token Types and What They Detect

Thinkst Canary popularized the free canarytokens.org service, which generates tokens in a wide variety of formats. Each type is suited to a different detection scenario.

Token Type	Best For Detecting	Trigger Mechanism
HTTP URL	Files shared online, documents with embedded images, phishing lure verification	HTTP GET request to unique URL
DNS	Access from environments where HTTP is blocked or monitored	DNS lookup of unique subdomain
Word / PDF document	Document exfiltration; attacker opens a file from a shared drive or stolen backup	Office remote template fetch or PDF JS action triggers HTTP/DNS lookup
AWS API credentials	Credential theft from code repositories, config files, or CI/CD systems	Any AWS API call made with the fake key is logged by CloudTrail and forwarded to the token owner as an alert
Email link	Confirming whether forwarded or stolen emails have been read	Tracking pixel or unique link in email body
Cloned website	Detecting when someone attempts to phish your users with a cloned site	JavaScript on the original site reports when it loads in the clone context
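The DNS row can be sketched in a few lines. The idea is that the token lives in the hostname itself, so the lookup alone identifies which token was touched. The zone canary.example.com and both function names below are assumptions for illustration; a real deployment needs an authoritative name server for the zone that logs incoming queries.

```python
import secrets

# Hypothetical zone whose authoritative name server you control and log.
CANARY_ZONE = "canary.example.com"

def make_dns_token():
    """Return (token, hostname). Lowercase hex is safe inside a DNS label."""
    token = secrets.token_hex(8)
    return token, f"{token}.{CANARY_ZONE}"

def token_from_query(qname):
    """Extract the token from a queried name, or None if it is outside the zone."""
    name = qname.rstrip(".").lower()  # DNS names are case-insensitive
    suffix = "." + CANARY_ZONE
    if not name.endswith(suffix):
        return None
    labels = name[: -len(suffix)].split(".")
    # The token is the label immediately left of the zone; labels further
    # left are ignored here.
    return labels[-1] or None
```

Anything that resolves the hostname fires the token, even when no HTTP request ever leaves the network.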

Practical Deployment Scenarios

Detecting Stolen Backups

Place a Word document with a canary token in your backup archives. Name it something plausible: Employees_Salaries_2026.docx or DB_Credentials_PROD.docx. The name should look like exactly the kind of file an attacker who just exfiltrated your backups would open first. If anyone opens it — whether in your backup environment or after exfiltration — you get an alert with their IP and timestamp.

This technique is effective because attackers who successfully exfiltrate data often open files immediately after download to verify they've captured something valuable. The canary fires before they've had time to cover their tracks.

Detecting Credential Leaks in Code

AWS API keys are a notorious source of credential leaks: developers accidentally commit them to public GitHub repositories, and automated scanners can harvest them within minutes, sometimes faster. A canary AWS key left in a plausible location (a commented-out test config, a sample .env file in a documentation repository) will trigger an alert if anyone attempts to use it. An alert confirms that someone is harvesting credentials from that kind of location, even though no real credentials were ever stored there.
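To see why planted keys get picked up so readily, consider how little a harvesting scanner needs. The sketch below matches strings shaped like long-term AWS access key IDs, which begin with the prefix AKIA. The exact pattern (prefix plus 16 uppercase letters or digits) is an assumption chosen for illustration, and the sample key is the placeholder value used in AWS's own documentation, not a real credential.

```python
import re

# Assumed shape of a long-term access key ID: "AKIA" + 16 uppercase
# alphanumeric characters (20 characters in total).
ACCESS_KEY_RE = re.compile(r"\b(AKIA[0-9A-Z]{16})\b")

def find_access_key_ids(text):
    """Return every substring of text that looks like an access key ID."""
    return ACCESS_KEY_RE.findall(text)

# A sample .env fragment using AWS's documentation placeholder key.
sample_env = """
# sample config
AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
AWS_REGION=us-east-1
"""

print(find_access_key_ids(sample_env))
```

The same function is useful defensively: run it over your own repositories to find places where real keys once lived, which are natural spots for a canary key.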

Insider Threat Detection

Canary tokens placed in sensitive directories (HR folders, financial data, executive files) provide a passive audit trail without active monitoring software. If someone on your network opens a canary-bearing document, you see the source IP, the time, and in many cases the user agent. This isn't a substitute for proper access controls, but it adds an alert layer for cases where users with legitimate network access open data they have no reason to touch.

Placement principle

The value of a canary token comes from placement: it should be exactly where an attacker would look, bearing a name that signals high value. A canary in the wrong location, with an uninteresting name, won't get touched. Invest thought in naming and placement — that's where most of the design work is.

What Alert Data You Get

When a canary token fires, you typically receive:

Timestamp (UTC)
Source IP address — often a VPN exit node, Tor exit node, or cloud provider, but sometimes the actual attacker IP
User agent — the browser or application that made the request
DNS resolver (for DNS-type tokens) — can indicate ISP or corporate network
Geographic location (approximate, based on IP)

A sophisticated attacker using Tor or a VPN will obscure their real IP — you'll see a Tor exit node or a cloud provider datacenter address, not a home IP. This is still useful: the timestamp confirms access occurred, and the user agent may reveal the operating system and application. Combined with other evidence (logs, access records), this narrows the investigation.
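A first triage step on the source IP can be automated. The sketch below uses Python's ipaddress module to separate internal sources (a hint toward an insider or an already-compromised host) from external ones. The function name and the three labels are hypothetical; note also that is_private covers more than the RFC 1918 ranges, including documentation and other reserved blocks.

```python
import ipaddress

def classify_source(ip_text):
    """Label an alert's source IP as 'loopback', 'internal', or 'external'."""
    ip = ipaddress.ip_address(ip_text)
    if ip.is_loopback:
        return "loopback"   # the token fired on the machine hosting the logger
    if ip.is_private:
        return "internal"   # private or reserved range: likely inside your network
    return "external"       # publicly routable: VPN, Tor exit, cloud host, or attacker

for source in ("10.1.2.3", "8.8.8.8"):
    print(source, "->", classify_source(source))
```

An "external" result still needs the caution described above: it identifies where the request came from, not who sent it.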

Limitations and Failure Modes

Canary tokens are not reliable against all attackers. Several scenarios reduce or eliminate their effectiveness:

Canary-aware adversaries: attackers who know canary tokens exist may recognize suspicious filenames or avoid opening unknown files entirely. This is a real limitation against sophisticated red teams and nation-state operators.
Offline environments: if the attacker operates in an air-gapped or firewalled environment, HTTP-based tokens won't fire. DNS tokens are more robust here, since DNS often passes through firewalls when HTTP doesn't — but a truly isolated network blocks both.
Preview panes: some email clients and file managers automatically preview documents, which may fire the canary when an authorized user merely selects or scrolls past the file without deliberately opening it. Calibrate your alert thresholds accordingly.
False attribution: the IP address you receive is the request origin, not necessarily the attacker's location. Don't over-attribute based on IP alone.
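One way to calibrate for noisy triggers such as preview panes is to suppress repeat alerts for the same token and source within a short window. The sketch below is one possible approach, not a feature of any particular service; the class name and the 10-minute default are assumptions.

```python
from datetime import datetime, timedelta, timezone

class AlertDeduplicator:
    """Notify once per (token, source IP) pair within a time window."""

    def __init__(self, window=timedelta(minutes=10)):
        self.window = window
        self._last_notified = {}

    def should_notify(self, token, source_ip, when):
        key = (token, source_ip)
        last = self._last_notified.get(key)
        if last is not None and when - last <= self.window:
            return False  # repeat hit inside the window: stay quiet
        # Only notified events reset the clock, so a steady stream of hits
        # still produces at least one alert per window.
        self._last_notified[key] = when
        return True

dedup = AlertDeduplicator()
start = datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc)
print(dedup.should_notify("tok", "10.0.0.5", start))
print(dedup.should_notify("tok", "10.0.0.5", start + timedelta(minutes=2)))
```

Every hit should still be logged even when the notification is suppressed, since the full sequence of timestamps can matter during an investigation.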

Canary tokens don't replace intrusion detection — they complement it. Think of them as a passive tripwire layer that fires when other controls have already failed. They're most valuable precisely when you don't know you've been breached.

Getting Started

The easiest entry point is canarytokens.org (operated by Thinkst Canary), which generates tokens for free with no account required. You provide an email address for alerts and a memo (for your own reference), and it generates the token. For organizational use, Thinkst offers a self-hosted open-source version and a commercial product with centralized management.

A reasonable starting deployment for an individual or small team:

One Word document canary in each important shared drive location, named to attract attention
One DNS canary embedded in any externally accessible config files or documentation
One AWS credential canary in any code repository that historically might have had real credentials

The marginal cost of deploying canary tokens is almost zero. The marginal value — knowing immediately when a breach has occurred rather than weeks later during a forensic investigation — can be significant. They're one of the highest-leverage passive defensive tools available, and they scale from individual use to enterprise deployment without architectural complexity.

For organizations doing more systematic breach detection work, canary tokens pair well with monitoring your digital footprint — knowing what's exposed lets you plant canaries in the right places.