URL Encoding: A Complete Developer's Guide

Introduction

URL encoding (percent-encoding) converts unsafe characters into a format that can be transmitted over the internet. URLs can only contain a limited set of characters. Anything else must be encoded.

Failing to encode URLs properly leads to broken links, security vulnerabilities, and data corruption. This guide covers best practices, common pitfalls, and security considerations.

How It Works

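Each unsafe character is replaced by %XX, where XX is the hexadecimal value of the character's byte. ASCII characters occupy one byte, so they produce one %XX sequence. Other characters are converted to UTF-8 first and produce one %XX sequence per byte.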

Character types:

Safe (never need encoding):
  A-Z a-z 0-9 - _ . ~

Reserved (have meaning in URLs):
  : / ? # [ ] @ ! $ & ' ( ) * + , ; =

Must be encoded:
  Space and anything not safe/reserved

Encoding process:

Character: @
ASCII code: 64 (decimal)
Hex: 40
Encoded: %40

Character: Space
ASCII code: 32 (decimal)
Hex: 20
Encoded: %20

Example transformation:

Input:  hello world & test
Step 1: Identify unsafe chars (space, &)
Step 2: Convert each one to hex
        space → 0x20 → %20
        & → 0x26 → %26
Step 3: Substitute the encoded values into the string
Output: hello%20world%20%26%20test
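
The steps above can be sketched as a small hand-written encoder. This is an illustration of the process only, not a replacement for encodeURIComponent(); the name percentEncode is a hypothetical helper, and the sketch assumes every character outside the safe set should be encoded.

```javascript
// Characters that never need encoding (the "safe" set listed above).
const UNRESERVED = /^[A-Za-z0-9\-_.~]$/;

// Hypothetical helper: percent-encode every character outside the safe set.
function percentEncode(input) {
  const utf8 = new TextEncoder(); // converts strings to UTF-8 bytes
  let out = "";
  for (const ch of input) { // iterates by code point
    if (UNRESERVED.test(ch)) {
      out += ch;
      continue;
    }
    // One %XX sequence per UTF-8 byte of the character
    for (const byte of utf8.encode(ch)) {
      out += "%" + byte.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;
}

console.log(percentEncode("hello world & test")); // hello%20world%20%26%20test
console.log(percentEncode("@"));                  // %40
```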

Why reserved characters need encoding:

URLs have structure: scheme://host:port/path?query=value#fragment

Each symbol has meaning:

? starts query string
& separates parameters
# starts fragment
/ separates path segments
: separates scheme/port

If your data contains these characters, you must encode them. Otherwise the browser interprets them as structure, not data.
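
To see how a parser splits a URL along these symbols, the built-in URL class can be used. A small sketch; the URL and all of its values are made up for illustration.

```javascript
// Illustrative URL: host, port, path, query and fragment are placeholders
const parsed = new URL("https://example.com:8080/docs/intro?topic=encoding&lang=en#syntax");

console.log(parsed.protocol); // "https:"
console.log(parsed.hostname); // "example.com"
console.log(parsed.port);     // "8080"
console.log(parsed.pathname); // "/docs/intro"
console.log(parsed.search);   // "?topic=encoding&lang=en"
console.log(parsed.hash);     // "#syntax"
```

Any of these symbols appearing unencoded inside a value would move the boundary between two of the parts.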

Example of what breaks:

// Bad: User searches for "cats & dogs"
const badUrl = `/search?q=cats & dogs`;
// The & is read as a parameter separator
// Server receives two parameters: q="cats " and " dogs"=""

// Good: Encode the spaces and the &
const goodUrl = `/search?q=cats%20%26%20dogs`;
// The & arrives as data
// Server receives one parameter: q="cats & dogs"
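
The difference can be checked by parsing both URLs. A minimal sketch, assuming a placeholder origin (https://example.com) so the relative URLs can be parsed:

```javascript
// example.com is a placeholder origin, needed only to parse a relative URL
const broken = new URL("/search?q=cats & dogs", "https://example.com");
console.log(broken.searchParams.get("q"));    // "cats " (cut off at the &)
console.log([...broken.searchParams.keys()]); // [ "q", " dogs" ]

const fixed = new URL("/search?q=cats%20%26%20dogs", "https://example.com");
console.log(fixed.searchParams.get("q"));     // "cats & dogs"
```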

UTF-8 and multi-byte characters:

Non-ASCII characters (emoji, Chinese, accents) encode to multiple %XX sequences:

café → caf%C3%A9  (é = two bytes: C3 A9)
🚀 → %F0%9F%9A%80  (four bytes)

In JavaScript, encodeURIComponent() converts the string to UTF-8 and percent-encodes each byte, so no manual byte handling is needed.
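
A short sketch of the multi-byte behavior. One edge case worth knowing: a string containing a lone surrogate (half of a surrogate pair) cannot be converted to UTF-8, and encodeURIComponent() throws a URIError. The wrapper name tryEncode is a hypothetical helper.

```javascript
// Hypothetical wrapper: returns null instead of throwing on malformed strings
function tryEncode(value) {
  try {
    return encodeURIComponent(value);
  } catch (err) {
    if (err instanceof URIError) return null; // lone surrogate in the input
    throw err;
  }
}

console.log(tryEncode("café"));   // caf%C3%A9
console.log(tryEncode("🚀"));     // %F0%9F%9A%80
console.log(tryEncode("\uD83D")); // null (first half of an emoji only)
```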

Best Practices

1. Always Encode User Input in URLs

Why: User input can contain special characters that break URL structure. Failure to encode creates security vulnerabilities and broken functionality.

How to implement:

const searchQuery = "cats & dogs";

// Bad: Direct string interpolation
const badUrl = `/search?q=${searchQuery}`;
// Result: /search?q=cats & dogs
// Broken: "&" interpreted as parameter separator

// Good: Encode user input
const goodUrl = `/search?q=${encodeURIComponent(searchQuery)}`;
// Result: /search?q=cats%20%26%20dogs
// Works correctly

Critical: Never trust user input. Always encode it before including in URLs.
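
One way to make encoding hard to forget is a tagged template that encodes every interpolated value. This is a sketch of the idea, not an established API: encodedUrl is a hypothetical name, and it assumes every interpolated value is data (never URL structure).

```javascript
// Hypothetical tag: literal parts are kept as-is, interpolated values are encoded
function encodedUrl(strings, ...values) {
  let out = strings[0];
  for (let i = 0; i < values.length; i++) {
    out += encodeURIComponent(String(values[i])) + strings[i + 1];
  }
  return out;
}

const term = "cats & dogs"; // sample user input
console.log(encodedUrl`/search?q=${term}&page=${2}`);
// /search?q=cats%20%26%20dogs&page=2
```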

2. Use encodeURIComponent for Query Parameters

Why: encodeURIComponent encodes every character except letters, digits, and - _ . ! ~ * ' ( ). That makes it the right function for query parameter names, query parameter values, and individual path segments.

How to implement:

// Correct usage
const params = {
  name: "John Doe",
  email: "john+tag@example.com",
  url: "https://example.com/page?id=123"
};

const queryString = Object.entries(params)
  // Encode keys as well as values: keys can also contain special characters
  .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
  .join('&');
// Result: name=John%20Doe&email=john%2Btag%40example.com&url=https%3A%2F%2Fexample.com%2Fpage%3Fid%3D123

Don’t use encodeURI() for parameter values: it leaves &, =, +, ? and # unencoded, so data containing them changes the structure of the query string.
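
The difference between the two functions shows up when the parameter is read back. A minimal sketch with a made-up value and a placeholder origin:

```javascript
const rawValue = "a&b=c+d"; // sample value containing &, = and +

const viaEncodeURI = `/search?q=${encodeURI(rawValue)}`;
const viaComponent = `/search?q=${encodeURIComponent(rawValue)}`;
console.log(viaEncodeURI); // /search?q=a&b=c+d
console.log(viaComponent); // /search?q=a%26b%3Dc%2Bd

// Read the parameter back (example.com is a placeholder origin)
const readQ = (path) => new URL(path, "https://example.com").searchParams.get("q");
console.log(readQ(viaEncodeURI)); // a
console.log(readQ(viaComponent)); // a&b=c+d
```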

3. Understand Reserved vs Unreserved Characters

Why: Knowing which characters need encoding prevents double-encoding and malformed URLs.

Character classification:

Unreserved (never need encoding):
  A-Z a-z 0-9 - _ . ~

Reserved (have special meaning, encode in data):
  : / ? # [ ] @ ! $ & ' ( ) * + , ; =

Must be encoded (unsafe in URLs):
  Space " < > % { } | \ ^ `
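
One consequence of this classification: encodeURIComponent() leaves ! ' ( ) * untouched even though they are on the reserved list. A minimal sketch of a stricter encoder, assuming you need those five characters escaped as well (encodeRfc3986 is a hypothetical helper name):

```javascript
// Hypothetical helper: also escapes the five reserved characters
// that encodeURIComponent() leaves as-is
function encodeRfc3986(value) {
  return encodeURIComponent(value).replace(
    /[!'()*]/g,
    (ch) => "%" + ch.charCodeAt(0).toString(16).toUpperCase()
  );
}

console.log(encodeURIComponent("it's (ok)!*")); // it's%20(ok)!*
console.log(encodeRfc3986("it's (ok)!*"));      // it%27s%20%28ok%29%21%2A
```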

Example:

// Space must be encoded
"hello world" → "hello%20world"

// @ is reserved, encode in data
"user@example.com" → "user%40example.com"

// / is reserved: keep it as a path separator, encode it (%2F) only inside data
"/api/users" → "/api/users" (no encoding)

4. Never Double-Encode URLs

Why: Double-encoding produces incorrect URLs that must be decoded twice. Hard to debug and causes data corruption.

How to implement:

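// Bad: Encoding already-encoded data
const encoded = encodeURIComponent("hello world");  // "hello%20world"
const doubleEncoded = encodeURIComponent(encoded);  // "hello%2520world"
// %20 became %2520 - wrong!

// Better: Encode exactly once, at the point where the URL is built.
// If the state of the input is unknown, normalize it as a fallback:
function safeEncode(str) {
  try {
    // Decode first, then encode, so encoded input is not encoded twice
    return encodeURIComponent(decodeURIComponent(str));
  } catch (err) {
    // decodeURIComponent throws URIError on a stray % (e.g. "100%"),
    // so the input was raw text: encode it as-is
    return encodeURIComponent(str);
  }
}
// Caveat: raw text that really contains "%20" is decoded to a space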

Detection: A %25 followed by two hex digits (for example %2520) usually means the value was encoded twice, because the % of an existing escape became %25.
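
The detection rule can be sketched as a single pattern check. This is a heuristic under the assumption that the data never legitimately contains text like "%20"; looksDoubleEncoded is a hypothetical helper name.

```javascript
// Hypothetical heuristic: %25 followed by two hex digits suggests that an
// existing escape (such as %20) was encoded a second time
function looksDoubleEncoded(value) {
  return /%25[0-9A-Fa-f]{2}/.test(value);
}

console.log(looksDoubleEncoded("hello%2520world")); // true
console.log(looksDoubleEncoded("hello%20world"));   // false
console.log(looksDoubleEncoded("100%25"));          // false (an encoded literal %)
```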

5. Use URL Builder Libraries for Complex URLs

Why: Manual string concatenation is error-prone. Libraries handle encoding, parameter ordering, and edge cases correctly.

How to implement:

// Sample values
const query = "cats & dogs";
const page = 1;

// Manual (error-prone)
const manualUrl = `https://api.example.com/search?q=${encodeURIComponent(query)}&page=${page}`;

// Using URL API (recommended)
const url = new URL('https://api.example.com/search');
url.searchParams.set('q', query);  // Automatically encoded
url.searchParams.set('page', String(page));
// Result: https://api.example.com/search?q=cats+%26+dogs&page=1
// Note: searchParams uses form encoding, so spaces become + in the query string

# Python: Use urllib.parse
from urllib.parse import urlencode

params = {'q': 'search term', 'filter': 'active'}
query_string = urlencode(params)
# Result: q=search+term&filter=active (urlencode uses + for spaces by default)
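
A sketch of a small builder that combines both rules: path segments go through encodeURIComponent(), query parameters go through searchParams. The helper name buildUrl, the base URL, and the parameter names are all hypothetical, and the sketch assumes no segment is "." or "..".

```javascript
// Hypothetical helper: base URL, segments and parameter names are illustrative
function buildUrl(base, segments, params) {
  const url = new URL(base);
  // Encode each segment separately so a "/" inside data becomes %2F
  const encodedPath = segments.map((s) => encodeURIComponent(s)).join("/");
  url.pathname = url.pathname.replace(/\/$/, "") + "/" + encodedPath;
  for (const [key, value] of Object.entries(params)) {
    url.searchParams.set(key, String(value)); // form-encoded automatically
  }
  return url.toString();
}

console.log(buildUrl("https://api.example.com/v1", ["users", "a/b"], { q: "search term", page: 1 }));
// https://api.example.com/v1/users/a%2Fb?q=search+term&page=1
```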

Common Pitfalls

Not Encoding at All

The problem:

// No encoding - dangerous
const username = "user@example.com";
fetch(`/api/users/${username}`);
// Result: /api/users/user@example.com
// Fragile: @ is a reserved character, and a value containing /, ? or # would change the URL structure

Why it’s bad:

Breaks URL parsing (reserved chars have special meaning)
Security risk (URL injection attacks)
Data corruption (special chars lost or misinterpreted)

The fix:

// Properly encoded
const username = "user@example.com";
fetch(`/api/users/${encodeURIComponent(username)}`);
// Result: /api/users/user%40example.com
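
The URL injection risk is easiest to see with a value that contains path syntax. A minimal sketch, assuming a made-up /api/users/ route and a placeholder origin:

```javascript
const origin = "https://example.com"; // placeholder origin
const userValue = "../admin";         // hostile sample input

// Unencoded: the parser resolves ".." and the request leaves /api/users/
const unsafe = new URL(`/api/users/${userValue}`, origin);
console.log(unsafe.pathname); // /api/admin

// Encoded: the value stays inside a single path segment
const safe = new URL(`/api/users/${encodeURIComponent(userValue)}`, origin);
console.log(safe.pathname);   // /api/users/..%2Fadmin
```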

Encoding the Entire URL

The problem:

// Wrong: Encoding the whole URL
const fullUrl = "https://example.com/search?q=hello world";
const encoded = encodeURIComponent(fullUrl);
// Result: https%3A%2F%2Fexample.com%2Fsearch%3Fq%3Dhello%20world
// Broken: Not a valid URL anymore

Why it’s bad: Encodes structural characters (://, /, ?, &) that should remain as-is.

The fix:

// Correct: Only encode the data parts
const baseUrl = "https://example.com/search";
const query = "hello world";
const fullUrl = `${baseUrl}?q=${encodeURIComponent(query)}`;
// Result: https://example.com/search?q=hello%20world
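
The one case where a complete URL should pass through encodeURIComponent() is when that URL is itself data, for example the value of a query parameter. A short sketch; the login route and the parameter name "next" are assumptions for illustration:

```javascript
// The target URL is data here, so it is encoded as a whole
const target = "https://example.com/search?q=hello world";
const loginUrl = `https://example.com/login?next=${encodeURIComponent(target)}`;

console.log(loginUrl);
// https://example.com/login?next=https%3A%2F%2Fexample.com%2Fsearch%3Fq%3Dhello%20world

// Reading the parameter back restores the original string
console.log(new URL(loginUrl).searchParams.get("next"));
// https://example.com/search?q=hello world
```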

Using + for Spaces (Outdated)

The problem:

// Old HTML form encoding
const encoded = "hello+world";  // + means space

// But in modern URLs:
const value = "1+1=2";
// Should be: 1%2B1%3D2
// Not: 1+1=2 (loses the + sign)

Why it’s bad: + is ambiguous. Form decoders (application/x-www-form-urlencoded) turn a + in a query string into a space, while path segments and decodeURIComponent() treat it as a literal plus sign.
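
The ambiguity can be reproduced with two built-in decoders. A minimal sketch with sample values:

```javascript
// decodeURIComponent() leaves + alone
const viaDecodeURIComponent = decodeURIComponent("hello+world");
console.log(viaDecodeURIComponent); // hello+world

// URLSearchParams applies form decoding, so + becomes a space
const viaFormDecoder = new URLSearchParams("q=hello+world").get("q");
console.log(viaFormDecoder); // hello world

// A literal plus sign survives only if it was sent as %2B
const plusLost = new URLSearchParams("q=1+1=2").get("q");
const plusKept = new URLSearchParams("q=1%2B1%3D2").get("q");
console.log(plusLost); // 1 1=2
console.log(plusKept); // 1+1=2
```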

The fix: Use %20 for spaces and %2B for a literal plus sign when you build URLs by hand. Both decode the same way in paths and in query strings.

"hello world" → "hello%20world"  // Standard
"hello world" → "hello+world"    // Legacy, avoid

Quick Reference Checklist

Encoding user input:

Always encode user input before adding to URLs
Use encodeURIComponent() for query parameters and path segments
Use URL API / libraries instead of manual string concatenation
Never encode structural characters (://, /, ?, &)
Check for double-encoding (avoid encoding already-encoded data)

Special cases:

Use %20 for spaces (not +)
Encode @ as %40 in user data
Encode & as %26 to prevent parameter splitting
Encode # as %23 to prevent fragment interpretation

Security:

Never trust user input in URLs
Validate the final URL (scheme, host) after encoding
Use URL parsing libraries, not regex
Log suspicious encoding patterns (for example %25 followed by two hex digits)
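
A sketch of validation with a parser instead of a regex. The policy shown (https only, host allow-list) is an assumption chosen for illustration, and isAllowedUrl and the host names are hypothetical.

```javascript
// Hypothetical allow-list check; the host names are placeholders
function isAllowedUrl(input, allowedHosts) {
  let url;
  try {
    url = new URL(input);
  } catch {
    return false; // not an absolute, parseable URL
  }
  return url.protocol === "https:" && allowedHosts.includes(url.hostname);
}

const hosts = ["example.com"];
console.log(isAllowedUrl("https://example.com/path", hosts));       // true
console.log(isAllowedUrl("https://example.com@evil.test/", hosts)); // false
console.log(isAllowedUrl("javascript:alert(1)", hosts));            // false
```

The second call shows why string matching is risky: the text contains "example.com", but the parser reports evil.test as the host.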

Language-Specific Functions

JavaScript:

encodeURIComponent(str)  // Use for data parts
encodeURI(str)          // Use for complete URLs (rare)
decodeURIComponent(str) // Decode data parts

Python:

from urllib.parse import quote, quote_plus, unquote
quote(str, safe='')  # Encode a data part (safe='' also encodes /)
quote_plus(str)      # Encode with + for spaces (forms)
unquote(str)         # Decode

PHP:

urlencode($str)      // Encode for query params (space becomes +)
rawurlencode($str)   // RFC 3986 compliant (recommended, space becomes %20)
urldecode($str)      // Decode (also turns + into a space)

Go:

import "net/url"
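url.QueryEscape(str)   // Encode for query strings (space becomes +)
url.PathEscape(str)    // Encode a path segment (space becomes %20)
url.QueryUnescape(str) // Decode query strings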

Standards and References

RFC 3986: Uniform Resource Identifier (URI) standard
RFC 1738: Uniform Resource Locators (legacy)
OWASP: URL Encoding Guide
MDN: encodeURIComponent() reference

Summary

URL encoding is mandatory for any user-provided data in URLs. Use encodeURIComponent() for query parameters and path segments. Don’t encode the full URL structure. Avoid double-encoding.

Key takeaways:

Always encode user input: Prevents broken URLs and security bugs
Use encodeURIComponent: Correct function for data parts
Don’t encode structure: Keep ://, /, ?, & as-is
Use libraries: URL API handles encoding automatically
Test with special chars: @, &, #, +, = must encode correctly
