Introduction
URL encoding (percent-encoding) converts unsafe characters into a format that can be transmitted over the internet. URLs can only contain a limited set of characters. Anything else must be encoded.
Failing to encode URLs properly leads to broken links, security vulnerabilities, and data corruption. This guide covers best practices, common pitfalls, and security considerations.
How It Works
Each unsafe character is replaced by %XX, where XX is the hexadecimal value of the byte. For ASCII characters, that is the ASCII code.
Character types:
Safe (never need encoding):
A-Z a-z 0-9 - _ . ~
Reserved (have meaning in URLs):
: / ? # [ ] @ ! $ & ' ( ) * + , ; =
Must be encoded:
Space and anything not safe/reserved
Encoding process:
Character: @
ASCII code: 64 (decimal)
Hex: 40
Encoded: %40
Character: Space
ASCII code: 32 (decimal)
Hex: 20
Encoded: %20
Example transformation:
Input: hello world & test
Step 1: Start with the raw string: hello world & test
Step 2: Identify characters that must be encoded (space, &)
Step 3: Convert to hex
space → 0x20 → %20
& → 0x26 → %26
Output: hello%20world%20%26%20test
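The steps above can be sketched as a small function. This is a minimal illustration, not how any browser implements it: percentEncode and UNRESERVED are hypothetical names, and the sketch encodes everything outside the safe set listed earlier (so it is slightly stricter than encodeURIComponent, which also leaves ! * ' ( ) alone).

```javascript
// Hypothetical helper: percent-encode every character outside the unreserved set.
const UNRESERVED = /^[A-Za-z0-9\-_.~]$/;

function percentEncode(input) {
  let out = '';
  // for...of iterates by code point, so multi-byte characters stay intact
  for (const ch of input) {
    if (UNRESERVED.test(ch)) {
      out += ch;
      continue;
    }
    // Convert the character to its UTF-8 bytes, then write each byte as %XX
    for (const byte of new TextEncoder().encode(ch)) {
      out += '%' + byte.toString(16).toUpperCase().padStart(2, '0');
    }
  }
  return out;
}

console.log(percentEncode('@'));                  // %40
console.log(percentEncode('hello world & test')); // hello%20world%20%26%20test
```

In real code, prefer the built-in encodeURIComponent(); the sketch only makes the character-to-hex step visible.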
Why reserved characters need encoding:
URLs have structure: scheme://host:port/path?query=value#fragment
Each symbol has meaning:
- ? starts the query string
- & separates parameters
- # starts the fragment
- / separates path segments
- : separates scheme and port
If your data contains these characters, you must encode them. Otherwise the browser interprets them as structure, not data.
Example of what breaks:
// Bad: User searches for "cats & dogs"
const badUrl = `/search?q=cats & dogs`;
// Parser treats the raw "&" as a parameter separator
// Thinks " dogs" is a separate parameter
// Query becomes: q="cats ", " dogs"=""
// Good: Encode the spaces and the &
const goodUrl = `/search?q=cats%20%26%20dogs`;
// Parser sees: /search?q=cats%20%26%20dogs
// Query becomes: q="cats & dogs"
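To see the split happen, the two query strings can be run through the standard URLSearchParams parser. This is a small sketch; the variable names are only illustrative.

```javascript
// Parse the raw and the encoded query strings with URLSearchParams
const rawQuery = new URLSearchParams('q=cats & dogs');
const encodedQuery = new URLSearchParams('q=cats%20%26%20dogs');

// The raw "&" splits the string into two parameters
console.log([...rawQuery.keys()]);    // two keys: "q" and " dogs"
console.log(rawQuery.get('q'));       // "cats " (truncated at the &)

// The encoded version keeps the value in one piece
console.log(encodedQuery.get('q'));   // cats & dogs
```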
UTF-8 and multi-byte characters:
Non-ASCII characters (emoji, Chinese, accents) encode to multiple %XX sequences:
café → caf%C3%A9 (é = two bytes: C3 A9)
🚀 → %F0%9F%9A%80 (four bytes)
encodeURIComponent() performs the UTF-8 conversion automatically, so you never need to compute the bytes yourself.
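The byte sequences above can be checked directly. In this sketch, utf8Hex is a hypothetical helper that lists the UTF-8 bytes of a string; the characters are written as escapes so the example does not depend on the file's own encoding.

```javascript
// Hypothetical helper: list the UTF-8 bytes of a string as uppercase hex pairs
function utf8Hex(text) {
  return Array.from(new TextEncoder().encode(text), (b) =>
    b.toString(16).toUpperCase().padStart(2, '0')
  );
}

console.log(utf8Hex('\u00E9'));                   // é is two bytes: C3, A9
console.log(encodeURIComponent('caf\u00E9'));     // caf%C3%A9
console.log(encodeURIComponent('\u{1F680}'));     // %F0%9F%9A%80 (rocket emoji, four bytes)
console.log(decodeURIComponent('caf%C3%A9') === 'caf\u00E9'); // true
```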
Best Practices
1. Always Encode User Input in URLs
Why: User input can contain special characters that break URL structure. Failure to encode creates security vulnerabilities and broken functionality.
How to implement:
// Bad: Direct string interpolation
const searchQuery = "cats & dogs";
const unsafeUrl = `/search?q=${searchQuery}`;
// Result: /search?q=cats & dogs
// Broken: "&" interpreted as parameter separator
// Good: Encode user input
const safeUrl = `/search?q=${encodeURIComponent(searchQuery)}`;
// Result: /search?q=cats%20%26%20dogs
// Works correctly
Critical: Never trust user input. Always encode it before including in URLs.
2. Use encodeURIComponent for Query Parameters
Why: encodeURIComponent encodes all special characters except -_.!~*'(). This is correct for query parameter values and path segments.
How to implement:
// Correct usage
const params = {
  name: "John Doe",
  email: "john+tag@example.com",
  url: "https://example.com/page?id=123"
};
// Encode keys as well as values
const queryString = Object.entries(params)
  .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
  .join('&');
// Result: name=John%20Doe&email=john%2Btag%40example.com&url=https%3A%2F%2Fexample.com%2Fpage%3Fid%3D123
Don’t use encodeURI() for parameter values: it leaves &, =, and + unencoded, so those characters in your data are read as query structure.
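The difference between the two functions is easy to check on a value that contains &, =, and +. The sample value and variable names below are only illustrative.

```javascript
const value = 'a&b=c+d';

// encodeURI leaves the reserved characters untouched
console.log(encodeURI(value));          // a&b=c+d
// encodeURIComponent escapes them
console.log(encodeURIComponent(value)); // a%26b%3Dc%2Bd

// Effect on a parsed query string
const broken = new URLSearchParams(`x=${encodeURI(value)}`);
const fixed = new URLSearchParams(`x=${encodeURIComponent(value)}`);
console.log(broken.get('x')); // a   (the rest became a second parameter "b")
console.log(fixed.get('x'));  // a&b=c+d
```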
3. Understand Reserved vs Unreserved Characters
Why: Knowing which characters need encoding prevents double-encoding and malformed URLs.
Character classification:
Unreserved (never need encoding):
A-Z a-z 0-9 - _ . ~
Reserved (have special meaning, encode in data):
: / ? # [ ] @ ! $ & ' ( ) * + , ; =
Must be encoded (unsafe in URLs):
Space " < > % { } | \ ^ `
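As a rough illustration, the three groups can be expressed as a lookup. classify and RESERVED are hypothetical names for this sketch, and the labels simply mirror the headings of the lists above.

```javascript
// Hypothetical helper: classify a single character using the three lists above
const RESERVED = ":/?#[]@!$&'()*+,;=";

function classify(ch) {
  if (/^[A-Za-z0-9\-_.~]$/.test(ch)) return 'unreserved';
  if (RESERVED.includes(ch)) return 'reserved';
  return 'must-encode';
}

console.log(classify('a')); // unreserved
console.log(classify('~')); // unreserved
console.log(classify('&')); // reserved
console.log(classify(' ')); // must-encode
console.log(classify('%')); // must-encode
```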
Example:
// Space must be encoded
"hello world" → "hello%20world"
// @ is reserved, encode in data
"user@example.com" → "user%40example.com"
// / is reserved; leave it unencoded when it separates path segments
"/api/users" → "/api/users" (no encoding)
4. Never Double-Encode URLs
Why: Double-encoding produces incorrect URLs that must be decoded twice. Hard to debug and causes data corruption.
How to implement:
// Bad: Encoding already-encoded data
const encoded = encodeURIComponent("hello world"); // "hello%20world"
const doubleEncoded = encodeURIComponent(encoded); // "hello%2520world"
// %20 became %2520 - wrong!
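// Good: Keep values raw and encode exactly once, where the URL is built
const rawValue = "hello world";
const searchUrl = `/search?q=${encodeURIComponent(rawValue)}`; // "/search?q=hello%20world"
// Avoid "decode first, then encode" as a general fix:
// decodeURIComponent throws URIError on input such as "100%",
// and a raw value that really contains "%20" would be silently changed.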
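Detection: %25 followed by two hex digits (for example %2520) usually means a value was encoded twice, because the % from the first pass became %25. A lone %25 can simply be an encoded percent sign.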
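A detection check along these lines could look like the following. looksDoubleEncoded is a hypothetical helper, and it is only a heuristic: a raw value that literally contains the text "%20" also produces %2520 after one correct encoding.

```javascript
// Hypothetical heuristic: "%25" followed by two hex digits suggests an encoded "%XX"
function looksDoubleEncoded(value) {
  return /%25[0-9A-Fa-f]{2}/.test(value);
}

console.log(looksDoubleEncoded('hello%2520world')); // true
console.log(looksDoubleEncoded('hello%20world'));   // false
console.log(looksDoubleEncoded('100%25'));          // false (just an encoded percent sign)
```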
5. Use URL Builder Libraries for Complex URLs
Why: Manual string concatenation is error-prone. Libraries handle encoding, parameter ordering, and edge cases correctly.
How to implement:
// Manual (error-prone)
const manualUrl = `https://api.example.com/search?q=${encodeURIComponent(query)}&page=${page}`;
// Using URL API (recommended)
const url = new URL('https://api.example.com/search');
url.searchParams.set('q', query); // Automatically encoded
url.searchParams.set('page', page);
// Result: https://api.example.com/search?q=encoded+value&page=1
// (searchParams uses form encoding, so spaces are written as +)
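For a concrete run of the URL API, here is a self-contained sketch with sample values filled in (the search term and page number are illustrative only). Note that searchParams serializes spaces as +, which its own parser reads back as spaces.

```javascript
const apiUrl = new URL('https://api.example.com/search');
apiUrl.searchParams.set('q', 'cats & dogs');
apiUrl.searchParams.set('page', 2); // non-string values are converted to strings

console.log(apiUrl.href);
// https://api.example.com/search?q=cats+%26+dogs&page=2

// Reading the value back returns the original text
console.log(apiUrl.searchParams.get('q')); // cats & dogs
```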
# Python: Use urllib.parse
from urllib.parse import urlencode
params = {'q': 'search term', 'filter': 'active'}
query_string = urlencode(params)
# Result: q=search+term&filter=active
Common Pitfalls
Not Encoding at All
The problem:
// No encoding - dangerous
const username = "user@example.com";
fetch(`/api/users/${username}`);
// Result: /api/users/user@example.com
// Fragile: @ is a reserved character, and input containing /, ? or # would change the URL structure
Why it’s bad:
- Breaks URL parsing (reserved chars have special meaning)
- Security risk (URL injection attacks)
- Data corruption (special chars lost or misinterpreted)
The fix:
// Properly encoded
const username = "user@example.com";
fetch(`/api/users/${encodeURIComponent(username)}`);
// Result: /api/users/user%40example.com
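The risk is clearer with input that contains structural characters. In this sketch the base URL and the input string are made up for illustration; the standard URL parser shows how each version is interpreted.

```javascript
const base = 'https://example.com';
const input = 'a/b?c#d'; // hypothetical username containing structural characters

// Unencoded: the input leaks into path, query, and fragment
const unsafe = new URL(`/api/users/${input}`, base);
console.log(unsafe.pathname); // /api/users/a/b
console.log(unsafe.search);   // ?c
console.log(unsafe.hash);     // #d

// Encoded: the whole input stays inside one path segment
const safe = new URL(`/api/users/${encodeURIComponent(input)}`, base);
console.log(safe.pathname);   // /api/users/a%2Fb%3Fc%23d
console.log(safe.search === '' && safe.hash === ''); // true
```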
Encoding the Entire URL
The problem:
// Wrong: Encoding the whole URL
const fullUrl = "https://example.com/search?q=hello world";
const encoded = encodeURIComponent(fullUrl);
// Result: https%3A%2F%2Fexample.com%2Fsearch%3Fq%3Dhello%20world
// Broken: Not a valid URL anymore
Why it’s bad: Encodes structural characters (://, /, ?, &) that should remain as-is.
The fix:
// Correct: Only encode the data parts
const baseUrl = "https://example.com/search";
const query = "hello world";
const fullUrl = `${baseUrl}?q=${encodeURIComponent(query)}`;
// Result: https://example.com/search?q=hello%20world
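One case where encoding a complete URL is correct is when that URL is itself data, such as the url parameter in the query-string example earlier. A minimal sketch, assuming a hypothetical /redirect endpoint:

```javascript
// The target URL is data here, so it is encoded as a single parameter value
const target = 'https://example.com/page?id=123';
const link = `https://example.com/redirect?url=${encodeURIComponent(target)}`;

console.log(link);
// https://example.com/redirect?url=https%3A%2F%2Fexample.com%2Fpage%3Fid%3D123

// The receiving side gets the original URL back intact
console.log(new URL(link).searchParams.get('url') === target); // true
```

The outer URL's own structure (scheme, host, path, the ? and =) is still left unencoded.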
Using + for Spaces (Outdated)
The problem:
// Old HTML form encoding
const encoded = "hello+world"; // + means space
// But in modern URLs:
const value = "1+1=2";
// Should be: 1%2B1%3D2
// Not: 1+1=2 (a form decoder reads + as a space and returns "1 1=2")
Why it’s bad: + is ambiguous. In query strings, form-style decoders (application/x-www-form-urlencoded) read it as a space; in path segments it is a literal plus sign.
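The ambiguity shows up when the same text is decoded by two standard JavaScript APIs. The sample strings are only illustrative.

```javascript
// decodeURIComponent follows percent-encoding only: + stays a plus sign
console.log(decodeURIComponent('hello+world'));              // hello+world

// URLSearchParams follows form encoding: + becomes a space
console.log(new URLSearchParams('q=hello+world').get('q'));  // hello world

// An unencoded + in data is therefore lost by form-style decoders
console.log(new URLSearchParams('q=1+1=2').get('q'));        // 1 1=2
console.log(new URLSearchParams('q=1%2B1%3D2').get('q'));    // 1+1=2
```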
The fix: Always use %20 for spaces. It works everywhere.
"hello world" → "hello%20world" // Standard
"hello world" → "hello+world" // Legacy, avoid
Quick Reference Checklist
Encoding user input:
- Always encode user input before adding to URLs
- Use encodeURIComponent() for query parameters and path segments
- Use URL API / libraries instead of manual string concatenation
- Never encode structural characters (://, /, ?, &)
- Check for double-encoding (avoid encoding already-encoded data)
Special cases:
- Use %20 for spaces (not +)
- Encode @ as %40 in user data
- Encode & as %26 to prevent parameter splitting
- Encode # as %23 to prevent fragment interpretation
Security:
- Never trust user input in URLs
- Validate URLs after encoding
- Use URL parsing libraries, not regex
- Log suspicious encoding patterns (multiple % signs)
Language-Specific Functions
JavaScript:
encodeURIComponent(str) // Use for data parts
encodeURI(str) // Use for complete URLs (rare)
decodeURIComponent(str) // Decode data parts
Python:
from urllib.parse import quote, quote_plus, unquote
quote(str) # Encode
quote_plus(str) # Encode with + for spaces (forms)
unquote(str) # Decode
PHP:
urlencode($str) // Encode for query params
rawurlencode($str) // RFC 3986 compliant (recommended)
urldecode($str) // Decode
Go:
import "net/url"
url.QueryEscape(str) // Encode for query params (spaces become +)
url.QueryUnescape(str) // Decode
Standards and References
- RFC 3986: Uniform Resource Identifier (URI) standard
- RFC 1738: Uniform Resource Locators (legacy)
- OWASP: URL Encoding Guide
- MDN: encodeURIComponent() reference
Summary
URL encoding is mandatory for any user-provided data in URLs. Use encodeURIComponent() for query parameters and path segments. Don’t encode the full URL structure. Avoid double-encoding.
Key takeaways:
- Always encode user input: Prevents broken URLs and security bugs
- Use encodeURIComponent: Correct function for data parts
- Don’t encode structure: Keep ://, /, ?, & as-is
- Use libraries: URL API handles encoding automatically
- Test with special chars: @, &, #, +, = must encode correctly