A SQL injection vulnerability exists whenever an application builds a database query by gluing untrusted input directly into the query string. The database receives one blob of text and has no way to know which parts were the programmer's intent and which parts came from an attacker. If the input contains SQL syntax, the database happily executes it. The root cause is the same one behind cross-site scripting: a confusion between code and data.
The Canonical Example
Imagine a login that builds its query like this, in pseudocode:
query = "SELECT * FROM users WHERE email = '" + input + "' AND password = '" + pw + "'"
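A normal email produces a normal query. But suppose the attacker types ' OR '1'='1' -- into the email field. The single quote "breaks out" of the string literal the programmer intended, the OR '1'='1' condition is always true, and the -- starts a SQL comment that discards the rest of the statement, including the password check. (Without the comment, AND binds more tightly than OR, so the password comparison would still apply.) The WHERE clause now matches every row in the table, and because many login handlers simply take the first row returned, the attacker is frequently logged in as the first user, often an administrator. No password required. Everything after the quote was interpreted as SQL rather than data.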
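To make the break-out concrete, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The table layout, the sample rows, and the helper name build_login_query are assumptions made for illustration. The payload ends with a SQL comment so that the password check is discarded.

```python
import sqlite3

def build_login_query(email: str, pw: str) -> str:
    # Vulnerable on purpose: untrusted input is concatenated into the SQL text.
    return ("SELECT * FROM users WHERE email = '" + email +
            "' AND password = '" + pw + "'")

# Hypothetical schema and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, password TEXT)")
conn.executemany(
    "INSERT INTO users (email, password) VALUES (?, ?)",
    [("admin@example.com", "s3cret"), ("bob@example.com", "hunter2")],
)

# Honest attempt with the wrong password: no rows come back.
honest = conn.execute(build_login_query("admin@example.com", "wrong")).fetchall()

# Injected attempt: the quote closes the literal, OR '1'='1' is always true,
# and the trailing comment removes the password check.
payload = "' OR '1'='1' -- "
injected_sql = build_login_query(payload, "anything")
injected = conn.execute(injected_sql).fetchall()

print(injected_sql)
print(len(honest), len(injected))
```

Printing injected_sql shows the statement exactly as the database receives it, which is usually the quickest way to see where the programmer's text ends and the attacker's begins.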
From that foothold the techniques escalate quickly: UNION SELECT to pull data from other tables, stacked queries to run multiple statements, and in some configurations calls to functions that read files or execute operating-system commands.
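As a rough illustration of the UNION SELECT step, the sketch below uses a hypothetical product search backed by an in-memory SQLite database. The tables, the search_products helper, and the payload are all assumptions. Stacked queries and file or command execution depend on the database engine and driver, so they are not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (name TEXT, price TEXT);
    CREATE TABLE users (email TEXT, password TEXT);
    INSERT INTO products VALUES ('kettle', '25');
    INSERT INTO users VALUES ('admin@example.com', 's3cret');
""")

def search_products(term: str):
    # Vulnerable on purpose: the search term is concatenated into the query.
    sql = "SELECT name, price FROM products WHERE name LIKE '%" + term + "%'"
    return conn.execute(sql).fetchall()

normal = search_products("kettle")

# The payload closes the LIKE literal, appends a second SELECT with the same
# number of columns, and comments out the leftover %' at the end.
payload = "zzz' UNION SELECT email, password FROM users -- "
leaked = search_products(payload)

print(normal)
print(leaked)
```

The attacker needs the injected SELECT to return the same number of columns as the original one, which in practice is found by trial and error.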
Blind and Time-Based Variants
The above assumes the application shows query results back to the attacker. Modern apps often don't — they just return "login failed" or a generic error. That doesn't make them safe; it makes the attack blind.
- Boolean-based blind — the attacker injects a condition and watches whether the page behaves differently when it is true versus false, extracting data one bit at a time.
- Time-based blind — the attacker injects something like IF(condition, SLEEP(5), 0). If the response takes five seconds, the condition was true. Slow, but fully automatable.
- Out-of-band — the injected query makes the database open a network connection (a DNS lookup, an HTTP request) to a server the attacker controls, leaking data through a side channel even when no result ever appears on the page.
Tools like sqlmap automate all of these, which is why "we don't display query results" is not a defense.
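The boolean-based variant can be sketched in a few lines. In this illustration the hypothetical account_exists function stands in for a page that only reveals whether a row matched, and extract_password plays the attacker, asking one yes-or-no question per request. The schema, names, and alphabet are assumptions, and this sketch guesses a character at a time with equality tests rather than doing a bitwise search.

```python
import sqlite3
import string

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('admin@example.com', 's3cret')")

def account_exists(email: str) -> bool:
    # Vulnerable on purpose. The caller only ever learns True or False,
    # never the query results themselves.
    sql = "SELECT 1 FROM users WHERE email = '" + email + "'"
    return conn.execute(sql).fetchone() is not None

def extract_password(target: str, max_len: int = 32) -> str:
    alphabet = string.ascii_letters + string.digits
    recovered = ""
    for pos in range(1, max_len + 1):
        for ch in alphabet:
            # Inject an extra condition; the app's closing quote finishes it.
            probe = f"{target}' AND substr(password, {pos}, 1) = '{ch}"
            if account_exists(probe):
                recovered += ch
                break
        else:
            break  # no character matched at this position: end of the value
    return recovered

print(extract_password("admin@example.com"))
```

Each request leaks a single true-or-false answer, yet a short loop recovers the whole value, which is the sense in which the attack is slow but automatable.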
The Defenses That Don't Quite Work
Two intuitive fixes are widely deployed and widely insufficient on their own.
Manually escaping quotes seems to fix the canonical example, but it is a game of whack-a-mole. Numeric contexts need no quotes to exploit, character-set quirks can smuggle quote characters past naive escapers, and one forgotten field reopens the door. Escaping is the right tool for output encoding in HTML; it is the wrong primary tool for SQL.
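The numeric-context problem is easy to reproduce. In the sketch below, the hypothetical escape_quotes helper doubles every single quote, yet the injected value contains no quotes at all. Table and function names are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, owner TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "alice"), (2, "bob"), (3, "carol")],
)

def escape_quotes(value: str) -> str:
    # Naive escaper: doubles single quotes, the SQL way to quote a quote.
    return value.replace("'", "''")

def get_order(order_id: str):
    # Vulnerable on purpose: the value is escaped, but it lands in a numeric
    # context where no quote is needed to change the query's structure.
    sql = "SELECT id, owner FROM orders WHERE id = " + escape_quotes(order_id)
    return conn.execute(sql).fetchall()

one = get_order("1")
everything = get_order("1 OR 1=1")

print(one)
print(len(everything))
```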
Blocklisting keywords — rejecting input that contains SELECT or UNION — breaks legitimate input (anyone named "Mr. Select") and is trivially bypassed with comments, case variation, and encoding. Web application firewalls help as a noisy outer layer but should never be the thing standing between an attacker and your database.
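A small sketch shows both failure modes at once. The BLOCKED list and passes_blocklist function are hypothetical. The bypass shown here is the simplest one available, a payload that needs none of the listed keywords, rather than the comment, case, or encoding tricks mentioned above.

```python
# Hypothetical keyword blocklist, checked case-insensitively.
BLOCKED = ("select", "union", "drop")

def passes_blocklist(value: str) -> bool:
    lowered = value.lower()
    return not any(word in lowered for word in BLOCKED)

# False positive: a legitimate name is rejected.
legit_allowed = passes_blocklist("Mr. Select")

# False negative: a working injection payload contains no blocked keyword.
attack_allowed = passes_blocklist("' OR '1'='1' -- ")

print(legit_allowed, attack_allowed)
```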
The Defense That Actually Works: Parameterized Queries
The real fix is structural. Use parameterized queries (also called prepared statements). Instead of building a query string, you send the database a query template with placeholders, and then send the parameter values separately:
SELECT * FROM users WHERE email = ? AND password = ?
The database parses and plans the query before it ever sees the values. The parameters are bound afterward as pure data — they can never change the query's structure no matter what characters they contain. An attacker typing ' OR '1'='1 simply searches for a user whose email is literally the string ' OR '1'='1, finds none, and fails. This is not escaping; it is a clean separation of code from data at the protocol level.
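In Python's built-in sqlite3 module, that template-plus-values pattern looks roughly like the sketch below. The schema and the find_user helper are assumptions for illustration, and the plaintext password column only mirrors the example query. Other drivers use different placeholder styles, such as %s or named parameters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES (?, ?)", ("admin@example.com", "s3cret"))

def find_user(email: str, pw: str):
    # The SQL text is a fixed template; the values travel separately and are
    # bound as data, so they cannot alter the statement's structure.
    sql = "SELECT email FROM users WHERE email = ? AND password = ?"
    return conn.execute(sql, (email, pw)).fetchall()

ok = find_user("admin@example.com", "s3cret")

# The payload is compared as a literal string and matches no row.
attack = find_user("' OR '1'='1", "anything")

print(ok)
print(attack)
```

Note that the query string passed to execute never changes between calls; only the tuple of values does.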
| Approach | Stops injection? | Notes |
|---|---|---|
| Manual string escaping | Partly | Fragile; one miss reopens it |
| Keyword blocklist / WAF | No | Bypassable; outer layer only |
| Parameterized queries | Yes | The correct primary defense |
| ORM with bound params | Yes | Uses parameterization under the hood — until you drop to raw SQL |
Modern ORMs (object-relational mappers) parameterize by default, which is why frameworks have quietly eliminated most injection in well-written applications. The danger returns the instant a developer drops to raw SQL for a query the ORM can't express — that hand-built string is exactly where the next breach hides.
Defense in Depth Around the Query
Parameterization closes the hole, but the principle of assuming a layer will fail applies here too. Limit the blast radius:
- Least privilege — the application's database account should be able to read and write only the tables it needs, never DROP tables or read system catalogs.
- Input validation — validate type, length, and format. A user ID field should accept only digits. This is a complement to parameterization, not a substitute.
- Don't leak errors — verbose database errors hand the attacker a map. Log them server-side; show users nothing useful.
- Encrypt sensitive columns — if a breach happens anyway, encrypted-at-rest fields limit what the attacker walks away with.
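Two of these layers, input validation and quiet error handling, can be sketched together around a parameterized query. The names parse_user_id and handle_request, the ten-digit limit, the schema, and the response strings are all assumptions for illustration. Least privilege and column encryption are configured in the database and key management, so they are not shown.

```python
import logging
import re
import sqlite3

log = logging.getLogger("app.db")

# Assumed format: 1 to 10 ASCII digits.
USER_ID_RE = re.compile(r"[0-9]{1,10}")

def parse_user_id(raw: str) -> int:
    if not USER_ID_RE.fullmatch(raw):
        raise ValueError("user id must be 1 to 10 digits")
    return int(raw)

def handle_request(conn: sqlite3.Connection, raw_id: str) -> str:
    try:
        user_id = parse_user_id(raw_id)
    except ValueError:
        return "invalid request"
    try:
        # Validation complements the bound parameter; it does not replace it.
        row = conn.execute(
            "SELECT email FROM users WHERE id = ?", (user_id,)
        ).fetchone()
    except sqlite3.Error:
        # Full details go to the server log; the user sees nothing useful.
        log.exception("user lookup failed")
        return "something went wrong"
    return row[0] if row else "not found"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'a@example.com')")

print(handle_request(conn, "1"))
print(handle_request(conn, "1 OR 1=1"))
```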
Why It Connects to Privacy
Many of the breach headlines that expose millions of email addresses, password hashes, and personal records trace back to SQL injection or stolen credentials. The data exposed is the metadata of your life: who you are, where you signed up, what you bought. This is why data minimization matters at the architecture level: a database that never stored your plaintext message contents, your social graph, or your recovery details cannot leak them, no matter how the query is built.
At Haven, the server stores ciphertext it cannot read and derives nothing sensitive from a plaintext passphrase, so even a worst-case database compromise yields encrypted blobs rather than readable mail and messages. SQL injection defense is still mandatory — parameterized queries everywhere, least-privilege accounts — but the strongest protection against a database breach is designing so that the database never holds the secrets in the first place.