Introduction
A UUID (Universally Unique Identifier) is a 128-bit number designed to be unique across space and time. Unlike auto-increment IDs (1, 2, 3…), UUIDs can be generated anywhere without coordination, and the chance of two colliding is negligible in practice.
Format: 550e8400-e29b-41d4-a716-446655440000 (8-4-4-4-12 hexadecimal digits)
UUIDs solve critical problems in distributed systems, data imports, and security-conscious applications. This guide covers when to use them, which version to choose, and implementation best practices.
How It Works
A random UUID (version 4) carries 122 random bits (the remaining 6 bits encode the version and variant), which makes collisions vanishingly unlikely. Let’s look at the math.
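UUIDv4 (random) collision probability, using the birthday approximation p ≈ 1 - e^(-n^2 / (2N)) for n UUIDs drawn from N possible values:
- Total possible UUIDs: N = 2^122 (about 5.3 × 10^36)
- Generating 1 billion UUIDs per second for 100 years = 3.15 × 10^18 UUIDs, which gives roughly a 61% chance of at least one collision
- A one-in-a-billion chance of any collision takes about 1.03 × 10^14 (103 trillion) UUIDs
You’d need to generate a billion UUIDs every second for about 86 years to reach a 50% chance of a single collision. Real systems generate far fewer IDs than that, so collisions are effectively impossible in practice.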
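A quick way to check collision figures like these is to evaluate the birthday approximation directly for N = 2^122. This is a minimal sketch; the function names are illustrative, and it assumes 365.25-day years. With these inputs it should report roughly a 61% chance for the 100-year workload, about 86 years to reach 50%, and about 1.03 × 10^14 UUIDs for a one-in-a-billion chance.

```python
import math

# Number of distinct UUIDv4 values: 122 random bits
N = 2 ** 122
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def collision_probability(n: float) -> float:
    """Birthday approximation: P(at least one collision among n random UUIDs)."""
    # -expm1(-x) computes 1 - e^(-x) accurately even when x is tiny
    return -math.expm1(-n * n / (2 * N))


def ids_for_probability(p: float) -> float:
    """Invert the approximation: how many UUIDs give collision probability p."""
    return math.sqrt(2 * N * -math.log1p(-p))


# 1 billion UUIDs per second for 100 years
hundred_years = 1e9 * 100 * SECONDS_PER_YEAR

print(f"{hundred_years:.3g} UUIDs -> p = {collision_probability(hundred_years):.2f}")
print(f"50% chance after ~{ids_for_probability(0.5) / 1e9 / SECONDS_PER_YEAR:.0f} years at 1e9/s")
print(f"1-in-a-billion chance after ~{ids_for_probability(1e-9):.3g} UUIDs")
```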
Structure breakdown (UUIDv4):
550e8400-e29b-41d4-a716-446655440000
xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx
x = random hex digit (0-9, a-f)
4 = version number (v4)
y = variant bits (8, 9, a, or b)
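The version and variant positions can be read straight out of the 128-bit value. This sketch uses Python’s standard uuid module on the example above; the helper name describe is illustrative. Version is the 13th hex digit (bits 76 to 79), and the variant is the top two bits of the 17th hex digit, which are binary 10 whenever that digit is 8, 9, a, or b.

```python
import uuid


def describe(u: uuid.UUID) -> tuple[int, int]:
    """Return (version nibble, two variant bits) read from the raw 128-bit integer."""
    version = (u.int >> 76) & 0xF   # 13th hex digit: the "4" in xxxxxxxx-xxxx-4xxx-...
    variant = (u.int >> 62) & 0b11  # top two bits of the 17th hex digit ("y")
    return version, variant


example = uuid.UUID("550e8400-e29b-41d4-a716-446655440000")
print(describe(example))                                  # → (4, 2)
print(example.version, example.variant == uuid.RFC_4122)  # → 4 True
```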
Version comparison:
- UUIDv4: Random (122 bits of entropy). Default choice for most use cases.
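- UUIDv7: Unix-millisecond timestamp followed by random bits (standardized in RFC 9562, 2024). Best for high-throughput inserts.
- UUIDv5: Deterministic (SHA-1 hash of a namespace and a name). Same input always produces the same UUID.
- UUIDv1: Legacy. Embeds a timestamp and typically the MAC address. Avoid.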
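UUIDv5’s determinism is easy to see with Python’s standard uuid.uuid5(). In this sketch the URLs are hypothetical names; any string works as long as it is hashed under the same namespace.

```python
import uuid

# Same namespace + same name -> same UUID, on any machine, every time
a = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/users/42")
b = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/users/42")
c = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/users/43")

print(a == b)     # → True
print(a == c)     # → False
print(a.version)  # → 5
```

This is why v5 suits stable IDs derived from existing keys (URLs, natural keys) but not fresh records: it is only as unique as its input.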
The tradeoff: UUIDs are 128 bits (16 bytes) vs integers at 32-64 bits (4-8 bytes). This impacts index size and memory usage. For most applications, the benefits outweigh the cost.
Best Practices
1. Use UUIDv4 for General Purposes
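Why: Random UUIDs are unpredictable and collision-resistant. With 122 random bits, any two specific UUIDv4 values match with probability 1 in 2^122, and collisions across a whole system stay effectively zero at realistic volumes.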
How to implement:
// JavaScript (modern browsers; Node 19+ exposes crypto globally, older Node needs require('crypto'))
const id = crypto.randomUUID();
// → "9b1deb4d-3b7d-4bad-9bdd-2b0d7b3dcb6d"
# Python
import uuid
user_id = uuid.uuid4()
# → UUID('87b2a5e2-8b4f-4c0d-9e3f-1234567890ab')
-- PostgreSQL (gen_random_uuid() is built in from version 13; older versions need the pgcrypto extension)
CREATE TABLE users (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  email VARCHAR(255)
);
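When to use: Default choice for user IDs, file uploads, and distributed databases. For session tokens, prefer a dedicated secret generator (for example Python’s secrets.token_urlsafe()); the UUID RFCs warn against using UUIDs as security capabilities.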
2. Use UUIDs for Distributed Systems
Why: Multiple servers can generate IDs independently without database coordination. No single point of failure.
Problem with auto-increment:
Server A generates: user_id=1, user_id=2
Server B generates: user_id=1, user_id=2
Merge → Collision! Which user is which?
Solution with UUIDs:
Server A generates: 3c5e8a9f-...
Server B generates: 9d2b7f4c-...
Merge → No collision, IDs remain unique
Use cases:
- Multi-region databases
- Offline-first applications
- Microservices
- Data imports from external sources
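The merge scenario above can be simulated in a few lines. This is a minimal sketch in which two lists stand in for two hypothetical servers; generate_ids is an illustrative name, not a library function.

```python
import uuid


def generate_ids(count: int) -> list[uuid.UUID]:
    """Each 'server' mints IDs locally: no shared counter, no round trip to a database."""
    return [uuid.uuid4() for _ in range(count)]


server_a = generate_ids(10_000)
server_b = generate_ids(10_000)

# Merging is a plain union; no renumbering or ID remapping is needed
merged = set(server_a) | set(server_b)
print(len(merged))  # → 20000

# Auto-increment counterpart: both servers started their counters at 1
ints_a = [1, 2]
ints_b = [1, 2]
print(len(set(ints_a) | set(ints_b)))  # → 2 (four rows, only two distinct IDs)
```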
3. Store UUIDs as Binary, Not Text
Why: The text form (36 bytes) is 2.25 times the size of the binary form (16 bytes), which inflates indexes and slows queries.
How to implement:
-- PostgreSQL: Native UUID type (16 bytes)
CREATE TABLE users (
id UUID PRIMARY KEY,
created_at TIMESTAMP
);
-- MySQL: Use BINARY(16) for storage
CREATE TABLE users (
id BINARY(16) PRIMARY KEY,
created_at TIMESTAMP
);
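# Python (SQLAlchemy): store as 16 raw bytes
import uuid
from sqlalchemy import Column, LargeBinary
from sqlalchemy.orm import declarative_base
Base = declarative_base()
class User(Base):
    __tablename__ = "users"
    # Pass a callable so every row gets a fresh UUID; default=uuid.uuid4().bytes
    # would be evaluated once at import time and reused for every row
    id = Column(LargeBinary(16), primary_key=True, default=lambda: uuid.uuid4().bytes)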
Performance impact:
- Text UUIDs: 36 bytes per ID
- Binary UUIDs: 16 bytes per ID
- 10M records: 200MB savings in ID storage alone
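The size difference and the lossless conversion between the two forms can be checked with Python’s standard uuid module. This sketch ignores per-row and index overhead, which varies by database.

```python
import uuid

u = uuid.uuid4()
as_text = str(u)     # canonical 8-4-4-4-12 form
as_binary = u.bytes  # raw 16 bytes, e.g. for a BINARY(16) column

print(len(as_text), len(as_binary))     # → 36 16
print(uuid.UUID(bytes=as_binary) == u)  # → True (lossless round trip)

# Savings for 10 million IDs, ignoring per-row and index overhead
rows = 10_000_000
print((len(as_text) - len(as_binary)) * rows / 1e6, "MB")  # → 200.0 MB
```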
4. Use UUIDv7 for Time-Ordered Scenarios
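Why: UUIDv4 values are random, so each insert lands at a random position in a B-tree index, causing page splits and fragmentation. UUIDv7 (standardized in RFC 9562) puts a Unix millisecond timestamp in its leading bits, so new IDs sort roughly in creation order.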
When to use:
- High-throughput inserts (logs, events, analytics)
- Tables where creation order matters
- Systems with >10k inserts/second
How to implement:
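# Python 3.14+ ships uuid.uuid7() in the standard library
import uuid
event_id = uuid.uuid7()
# → Sortable by time while remaining unique
# Older Python versions: a third-party package, e.g. pip install uuid6, then: from uuid6 import uuid7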
Trade-off: 74 random bits instead of UUIDv4’s 122, and anyone holding the ID can read its creation time. Still collision-resistant for practical purposes.
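To make the layout concrete, here is a minimal from-scratch sketch of the RFC 9562 UUIDv7 bit layout: 48-bit Unix millisecond timestamp, 4-bit version (7), 12 random bits, 2-bit variant, 62 random bits. The name uuid7_sketch is hypothetical, and the sketch assumes you want illustration rather than production use: it does not keep IDs ordered within the same millisecond (RFC 9562 describes optional counter methods for that), so a maintained library or uuid.uuid7() is the better choice in real code.

```python
import os
import time
import uuid
from typing import Optional


def uuid7_sketch(unix_ms: Optional[int] = None) -> uuid.UUID:
    """Build a UUIDv7 from a millisecond timestamp plus 74 random bits."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits, 74 are used
    rand_a = rand >> 68                           # top 12 bits
    rand_b = rand & ((1 << 62) - 1)               # low 62 bits

    value = (unix_ms & ((1 << 48) - 1)) << 80  # bits 127..80: timestamp
    value |= 0x7 << 76                         # bits 79..76: version 7
    value |= rand_a << 64                      # bits 75..64: rand_a
    value |= 0b10 << 62                        # bits 63..62: RFC variant
    value |= rand_b                            # bits 61..0: rand_b
    return uuid.UUID(int=value)


first = uuid7_sketch(1_700_000_000_000)
second = uuid7_sketch(1_700_000_000_001)
print(first < second)   # → True (later millisecond sorts later)
print(first.int >> 80)  # → 1700000000000 (the timestamp is readable)
```

Putting the timestamp in the most significant bits is what makes byte order match creation order, and it is also why v7 reveals when a record was created.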
5. Don’t Rely on UUID Secrecy in URLs
Why: UUIDs are hard to guess, but they are identifiers, not secrets. They leak through logs, browser history, and Referer headers, and different responses (200 vs 404) confirm whether an entity exists.
Insecure approach:
GET /api/users/9b1deb4d-3b7d-4bad-9bdd-2b0d7b3dcb6d
→ 200 OK (user exists)
GET /api/users/aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa
→ 404 Not Found (user doesn't exist)
→ Responses confirm which IDs exist; a leaked (or guessable, e.g. v1) UUID is enough to probe
Secure approach:
GET /api/users/me (use session auth)
GET /api/users/john (use public slugs)
GET /api/users/9b1deb4d (use UUID + authorization check)
Additional security: Always validate authorization, not just UUID validity.
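A minimal sketch of "UUID plus authorization check", assuming a hypothetical in-memory USERS table and a hypothetical policy that users may only read their own record. The names User, USERS, and get_user are illustrative, not from any framework. Returning the same 404 for missing, malformed, and forbidden IDs means the response never confirms that someone else’s UUID exists.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass


@dataclass
class User:
    id: uuid.UUID
    email: str


# Hypothetical in-memory table standing in for a real database
USERS: dict[uuid.UUID, User] = {}


def get_user(requested_id: str, session_user_id: uuid.UUID) -> tuple[int, User | None]:
    """Return (HTTP status, body) for GET /api/users/<requested_id>."""
    try:
        user_id = uuid.UUID(requested_id)
    except ValueError:
        return 404, None  # malformed ID
    user = USERS.get(user_id)
    # Authorization check: knowing the UUID is not enough to read the record
    if user is None or user.id != session_user_id:
        return 404, None
    return 200, user


alice = User(uuid.uuid4(), "alice@example.com")
bob = User(uuid.uuid4(), "bob@example.com")
USERS.update({alice.id: alice, bob.id: bob})
print(get_user(str(alice.id), alice.id)[0])  # → 200
print(get_user(str(bob.id), alice.id)[0])    # → 404 (exists, but not Alice's)
```

Whether to answer 403 or 404 for forbidden records is a design choice; 404 avoids acting as an existence oracle.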
Common Pitfalls
Using UUIDv1 (Time-Based with MAC Address)
The problem:
# UUIDv1 embeds a timestamp and (usually) the host's MAC address
import uuid
v1 = uuid.uuid1()
# → 87b2a5e2-8b4f-11eb-9e3f-0242ac120002
#                           ^^^^^^^^^^^^
#                           node field: your MAC address leaked
Why it’s bad:
- Leaks machine MAC address (hardware identifier)
- Leaks timestamp (when UUID was created)
- Predictable sequence allows enumeration
- Privacy violation (GDPR concerns)
The fix: Use UUIDv4 (random) or UUIDv7 (time-ordered without MAC).
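To see what a v1 UUID gives away, this sketch decodes its timestamp and node fields with Python’s standard UUID.time and UUID.node attributes. The helper name inspect_v1 is illustrative; the input is the example UUID from the snippet above. UUIDv1 counts 100-nanosecond intervals since 1582-10-15, so the code subtracts the fixed offset to the Unix epoch.

```python
import uuid
from datetime import datetime, timedelta, timezone

# 100-ns intervals between the UUID epoch (1582-10-15) and the Unix epoch (1970-01-01)
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def inspect_v1(u: uuid.UUID) -> tuple[datetime, str]:
    """Recover the creation time and node (usually a MAC address) from a v1 UUID."""
    if u.version != 1:
        raise ValueError("not a version-1 UUID")
    unix_100ns = u.time - UUID_EPOCH_OFFSET
    created = datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(microseconds=unix_100ns // 10)
    # Format the 48-bit node field as six colon-separated bytes
    mac = ":".join(f"{(u.node >> shift) & 0xFF:02x}" for shift in range(40, -1, -8))
    return created, mac


created, mac = inspect_v1(uuid.UUID("87b2a5e2-8b4f-11eb-9e3f-0242ac120002"))
print(created.year, mac)  # → 2021 02:42:ac:12:00:02
```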
Treating UUIDs as Strings Everywhere
The problem:
# Bad: String comparison and storage
user_id = "9b1deb4d-3b7d-4bad-9bdd-2b0d7b3dcb6d" # 36 bytes
if user_id == another_id: # String comparison
...
Why it’s bad:
- Wastes memory (36 bytes vs 16 bytes)
- Slower comparisons (string vs binary)
- Larger indexes (affects query performance)
The fix:
# Good: Use UUID objects and binary storage
import uuid
user_id = uuid.UUID('9b1deb4d-3b7d-4bad-9bdd-2b0d7b3dcb6d')
# Convert to string only for display
print(f"User: {user_id}")
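Parsing into UUID objects also normalizes and validates input, which plain string comparison does not. This sketch uses Python’s standard uuid module; the variable names are illustrative.

```python
import uuid

from_form = "9B1DEB4D-3B7D-4BAD-9BDD-2B0D7B3DCB6D"  # upper-case, e.g. from user input
from_db = "9b1deb4d-3b7d-4bad-9bdd-2b0d7b3dcb6d"

print(from_form == from_db)                        # → False (string comparison is case-sensitive)
print(uuid.UUID(from_form) == uuid.UUID(from_db))  # → True (same 128-bit value)

# Parsing doubles as validation: malformed input raises ValueError
try:
    uuid.UUID("9b1deb4d-not-a-uuid")
except ValueError as exc:
    print("rejected:", exc)
```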
Generating UUIDs Client-Side Without Validation
The problem:
// Client generates UUID
const userId = crypto.randomUUID();
fetch('/api/users', {
method: 'POST',
body: JSON.stringify({ id: userId })
});
Why it’s bad:
- Malicious clients can provide duplicate or crafted UUIDs
- No server-side validation of uniqueness
- Can cause data corruption or conflicts
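The fix: Let the server generate and return the UUID. If clients must generate IDs (for example in offline-first apps), validate the format and enforce uniqueness on the server.
// Server generates and returns UUID
fetch('/api/users', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ email: 'user@example.com' })
});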
// Response: { id: "9b1deb4d-...", email: "user@example.com" }
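When clients do supply their own IDs, the server can at least parse, check, and de-duplicate them before accepting. A minimal sketch, assuming a hypothetical in-memory existing_ids set and a policy of accepting only version-4 UUIDs; accept_client_id is an illustrative name. In a real system the database’s PRIMARY KEY or UNIQUE constraint should be the final uniqueness check, since check-then-insert races under concurrency.

```python
import uuid

# Hypothetical set of IDs already stored (stand-in for a UNIQUE constraint)
existing_ids: set[uuid.UUID] = set()


def accept_client_id(raw: str) -> uuid.UUID:
    """Validate a client-supplied ID before using it as a primary key."""
    try:
        candidate = uuid.UUID(raw)
    except ValueError:
        raise ValueError("malformed UUID") from None
    # Reject other variants and versions (e.g. v1 leaks host details, v5 is guessable)
    if candidate.variant != uuid.RFC_4122 or candidate.version != 4:
        raise ValueError("expected a version-4 UUID")
    if candidate in existing_ids:
        raise ValueError("duplicate ID")
    existing_ids.add(candidate)
    return candidate
```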
UUID Versions Comparison
| Version | Use Case | Pros | Cons |
|---|---|---|---|
| UUIDv4 | General purpose | Random, unpredictable | Index fragmentation |
| UUIDv1 | Legacy systems | Time-based | Leaks MAC address and timestamp; not sortable as stored |
| UUIDv5 | Deterministic | Same input = same UUID | Same input always repeats the ID |
| UUIDv7 | High-throughput | Time-ordered, no MAC | Reveals creation time; newer (RFC 9562, 2024), so library support varies |
Quick Reference Checklist
Choosing UUID strategy:
- Use UUIDv4 for general applications (user IDs, files, records)
- Use UUIDv7 for time-series data (logs, events, analytics)
- Don’t use UUIDv1 (leaks MAC address)
- Store as binary (16 bytes), not text (36 bytes)
- Generate UUIDs server-side; if clients must generate them, validate on the server
Database schema:
- Use native UUID type (PostgreSQL, SQL Server)
- Use BINARY(16) if native UUID unavailable (MySQL)
- Add indexes on UUID columns used in JOINs
- Consider UUIDv7 for tables with >10k inserts/second
Security considerations:
- Never treat possession of a UUID as authorization
- Validate authorization separately from UUID validity
- Use public slugs for shareable URLs, not UUIDs
- Avoid predictable UUID patterns (never use sequential v1)
Standards and References
- RFC 9562 (2024): current UUID specification (v1 through v8, including time-ordered v6 and v7); obsoletes RFC 4122
- RFC 4122: original UUID specification (v1, v3, v4, v5)
- PostgreSQL: Native UUID type and gen_random_uuid()
- MySQL 8.0+: UUID_TO_BIN() and BIN_TO_UUID() functions
Summary
UUIDs provide globally unique identifiers that can be generated independently across distributed systems. Use UUIDv4 for general purposes. Store as binary (16 bytes) for optimal performance.
Key takeaways:
- UUIDv4 is default: Random, collision-resistant, no privacy leaks
- Store as binary: 16 bytes (binary) vs 36 bytes (text)
- UUIDv7 for high-throughput: Time-ordered IDs reduce index fragmentation
- Don’t use UUIDv1: Leaks MAC address and timestamps
- Generate server-side: Client-generated UUIDs are a security risk unless validated