Understanding UUIDs: The Universally Unique Identifier

Introduction

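A UUID (Universally Unique Identifier) is a 128-bit identifier designed to be unique across space and time without a central authority. Unlike auto-increment IDs (1, 2, 3…), UUIDs can be generated anywhere without coordination. Uniqueness is probabilistic rather than guaranteed, but for properly generated UUIDs a collision is so unlikely that it can be ignored in practice.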

Format: 550e8400-e29b-41d4-a716-446655440000 (8-4-4-4-12 hexadecimal digits)

UUIDs solve critical problems in distributed systems, data imports, and security-conscious applications. This guide covers when to use them, which version to choose, and implementation best practices.

How It Works

UUIDv4 fills 122 of its 128 bits with random data (the other 6 encode the version and variant), which makes collisions virtually impossible at realistic volumes. Let’s look at the math.

UUIDv4 (random) collision probability:

Total possible UUIDs: 2^122 (about 5.3 × 10^36)
Generating 1 billion UUIDs in total: collision probability ≈ 9.4 × 10^-20 (effectively zero)
Generating 1 billion UUIDs per second for 100 years = 3.15 × 10^18 UUIDs: collision probability ≈ 61%

You’d need to generate a billion UUIDs every second for roughly 86 years (about 2.7 × 10^18 UUIDs) to reach a 50% chance of a single collision.
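The figures above follow from the standard birthday-bound approximation, p ≈ 1 − exp(−n² / 2N), where N = 2^122 and n is the number of UUIDs generated. A minimal sketch for checking the arithmetic (the function names are illustrative, not from any library):

```python
import math

RANDOM_BITS = 122          # UUIDv4: 128 bits minus 4 version bits and 2 variant bits
SPACE = 2 ** RANDOM_BITS   # number of distinct UUIDv4 values


def collision_probability(n: int) -> float:
    """Birthday-bound approximation: p ~= 1 - exp(-n^2 / (2N))."""
    return -math.expm1(-(n * n) / (2 * SPACE))


def count_for_probability(p: float) -> float:
    """Approximate number of UUIDs needed to reach collision probability p."""
    return math.sqrt(2 * SPACE * math.log(1 / (1 - p)))


one_billion = 10 ** 9
seconds_per_year = 365.25 * 24 * 3600

# One billion UUIDs in total: vanishingly small probability
print(f"{collision_probability(one_billion):.2e}")

# How long a 1e9-per-second generator must run to reach a 50% chance
n_half = count_for_probability(0.5)
years = n_half / one_billion / seconds_per_year
print(f"{n_half:.2e} UUIDs, about {years:.0f} years at 1e9/s")
```

The approximation assumes a uniform random generator; a weak or badly seeded source makes collisions far more likely than the math suggests.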

Structure breakdown (UUIDv4):

550e8400-e29b-41d4-a716-446655440000
xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx

x = random hex digit (0-9, a-f)
4 = version number (v4)
y = variant bits (8, 9, a, or b)
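To see these fields in a real value, the example UUID can be parsed with Python's standard `uuid` module. This is a small sketch that only inspects the layout described above:

```python
import uuid

u = uuid.UUID("550e8400-e29b-41d4-a716-446655440000")

groups = str(u).split("-")
print([len(g) for g in groups])    # group lengths in hex digits (8-4-4-4-12)
print(groups[2][0])                # first digit of the third group: the version
print(groups[3][0])                # first digit of the fourth group: the variant (8, 9, a, or b)
print(u.version)                   # version as parsed by the library
print(u.variant == uuid.RFC_4122)  # True when the standard variant bits are set
print(len(u.bytes) * 8)            # total size in bits
```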

Version comparison:

UUIDv4: Random (122 bits of entropy). Default choice for most use cases.
UUIDv7: Time-ordered random (standardized in RFC 9562). Best for high-throughput inserts.
UUIDv5: Deterministic hashing (SHA-1 of a namespace and a name). Same input always produces same UUID.
UUIDv1: Legacy. Leaks MAC address and timestamp. Avoid.
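The difference between random and deterministic versions is easy to check with the standard library. In this sketch the names `example.com` and `example.org` are arbitrary inputs chosen for illustration:

```python
import uuid

# v4: every call draws fresh random bits
random_a = uuid.uuid4()
random_b = uuid.uuid4()

# v5: SHA-1 over a namespace UUID plus a name, so the result is repeatable
named_a = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")
named_b = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")
named_other = uuid.uuid5(uuid.NAMESPACE_DNS, "example.org")

print(random_a == random_b)       # two v4 values differ
print(named_a == named_b)         # same namespace + name gives the same v5 value
print(named_a == named_other)     # a different name gives a different v5 value
print(random_a.version, named_a.version)
```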

The tradeoff: UUIDs are 128 bits (16 bytes) vs integers at 32-64 bits (4-8 bytes). This impacts index size and memory usage. For most applications, the benefits outweigh the cost.

Best Practices

1. Use UUIDv4 for General Purposes

Why: Random UUIDs provide unpredictability and collision resistance. Any two UUIDv4 values match with probability 1 in 2^122, so collisions are effectively impossible at realistic volumes.

How to implement:

// JavaScript (modern browsers; Node 19+ exposes the same global `crypto`)
// Older Node (14.17+): const { randomUUID } = require('crypto');
const id = crypto.randomUUID();
// → "9b1deb4d-3b7d-4bad-9bdd-2b0d7b3dcb6d"

# Python
import uuid
user_id = uuid.uuid4()
# → UUID('87b2a5e2-8b4f-4c0d-9e3f-1234567890ab')

-- PostgreSQL 13+ (on older versions, gen_random_uuid() comes from the pgcrypto extension)
CREATE TABLE users (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    email VARCHAR(255)
);

When to use: Default choice for user IDs, session tokens, file uploads, distributed databases.

2. Use UUIDs for Distributed Systems

Why: Multiple servers can generate IDs independently without database coordination. No single point of failure.

Problem with auto-increment:

Server A generates: user_id=1, user_id=2
Server B generates: user_id=1, user_id=2
Merge → Collision! Which user is which?

Solution with UUIDs:

Server A generates: 3c5e8a9f-...
Server B generates: 9d2b7f4c-...
Merge → No collision, IDs remain unique
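The merge scenario can be simulated in a few lines. This is a toy sketch: the two "servers" are plain dictionaries and the helper names are made up for the example.

```python
import itertools
import uuid


def simulate(make_id_factory):
    """Two independent 'servers' each create two users; return the IDs both produced."""
    tables = []
    for emails in (["a1@example.com", "a2@example.com"],
                   ["b1@example.com", "b2@example.com"]):
        next_id = make_id_factory()            # each server gets its own generator
        tables.append({next_id(): email for email in emails})
    server_a, server_b = tables
    return server_a.keys() & server_b.keys()   # IDs that would clash on merge


def auto_increment():
    counter = itertools.count(1)               # every server starts counting at 1
    return lambda: next(counter)


def random_uuid():
    return uuid.uuid4                          # no shared state needed


print(simulate(auto_increment))  # both servers issued the same integers
print(simulate(random_uuid))     # no overlap expected
```

With counters, the overlap is guaranteed because each server starts from the same value; with UUIDv4, avoiding overlap needs no shared state at all.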

Use cases:

Multi-region databases
Offline-first applications
Microservices
Data imports from external sources

3. Store UUIDs as Binary, Not Text

Why: Text representation (36 bytes) is 2.25x larger than binary (16 bytes). Impacts index size and query performance.

How to implement:

-- PostgreSQL: Native UUID type (16 bytes)
CREATE TABLE users (
    id UUID PRIMARY KEY,
    created_at TIMESTAMP
);

-- MySQL: Use BINARY(16) for storage
CREATE TABLE users (
    id BINARY(16) PRIMARY KEY,
    created_at TIMESTAMP
);

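# Python: Store as bytes
import uuid
from sqlalchemy import Column, LargeBinary
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class User(Base):
    __tablename__ = "users"
    # Pass a callable so each row gets a fresh UUID; uuid.uuid4().bytes
    # without the lambda would be evaluated once, when the class is defined
    id = Column(LargeBinary(16), primary_key=True, default=lambda: uuid.uuid4().bytes)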

Performance impact:

Text UUIDs: 36 bytes per ID
Binary UUIDs: 16 bytes per ID
10M records: 200MB savings in ID storage alone
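The size difference and the savings estimate can be reproduced directly. A short sketch using only the standard library:

```python
import uuid

u = uuid.uuid4()
as_text = str(u)       # canonical 8-4-4-4-12 form
as_binary = u.bytes    # raw 16-byte form

print(len(as_text), len(as_binary))

# The binary form round-trips back to the same UUID
restored = uuid.UUID(bytes=as_binary)
assert restored == u

# Savings for 10 million rows, counting the ID column only
rows = 10_000_000
saved_bytes = rows * (len(as_text) - len(as_binary))
print(f"{saved_bytes / 1_000_000:.0f} MB saved")
```

This counts raw ID bytes only; real tables add per-row and per-index overhead, and every secondary index that stores the primary key repeats the cost.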

4. Use UUIDv7 for Time-Ordered Scenarios

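Why: UUIDv4 is random, so new rows land at arbitrary positions in a B-tree index, causing page splits and fragmentation. UUIDv7 (standardized in RFC 9562) starts with a millisecond timestamp, so new IDs sort after older ones.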

When to use:

High-throughput inserts (logs, events, analytics)
Tables where creation order matters
Systems with >10k inserts/second

How to implement:

# Python (requires uuidv7 package)
from uuidv7 import uuidv7

event_id = uuidv7()
# → Sortable by time while remaining unique

Trade-off: Fewer random bits than UUIDv4 (up to 74 vs. 122), though still collision-resistant, and the creation time can be read from the ID.
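To make the layout concrete, here is a hand-rolled sketch of a UUIDv7-style value following the field layout in RFC 9562: a 48-bit Unix timestamp in milliseconds, the 4-bit version, 12 random bits, the 2-bit variant, and 62 more random bits. The function name `uuid7_sketch` is hypothetical, and the sketch omits the optional monotonic counter, so prefer a maintained library in production.

```python
import os
import time
import uuid


def uuid7_sketch() -> uuid.UUID:
    """Illustrative UUIDv7 layout: timestamp first, random bits after."""
    unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")   # 80 random bits; 74 are used
    rand_a = (rand >> 62) & 0xFFF                  # 12 bits
    rand_b = rand & ((1 << 62) - 1)                # 62 bits

    value = (unix_ms & ((1 << 48) - 1)) << 80      # bits 127..80: timestamp
    value |= 0x7 << 76                             # bits 79..76: version 7
    value |= rand_a << 64                          # bits 75..64: random
    value |= 0b10 << 62                            # bits 63..62: variant
    value |= rand_b                                # bits 61..0: random
    return uuid.UUID(int=value)


first = uuid7_sketch()
time.sleep(0.005)            # make sure the millisecond timestamp advances
second = uuid7_sketch()

print(first, second)
print(first < second)        # later IDs sort after earlier ones
```

Values created within the same millisecond are ordered only by their random bits in this sketch, so strict ordering holds across milliseconds, not within one.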

5. Don’t Expose UUIDs in URLs for Security

Why: Even though UUIDs are unpredictable, exposing them in URLs can leak information about entity existence.

Insecure approach:

GET /api/users/9b1deb4d-3b7d-4bad-9bdd-2b0d7b3dcb6d
→ 200 OK (user exists)

GET /api/users/aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa
→ 404 Not Found (user doesn't exist)
→ Different responses let an attacker confirm which leaked or guessed UUIDs are valid

Secure approach:

GET /api/users/me (use session auth)
GET /api/users/john (use public slugs)
GET /api/users/9b1deb4d (use UUID + authorization check)

Additional security: Always validate authorization, not just UUID validity.
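A minimal sketch of that authorization check, assuming an in-memory store and a string requester ID (both hypothetical, as are the `Profile` and `fetch_profile` names). It returns the same status for "does not exist" and "not yours", so the response does not reveal which UUIDs are valid:

```python
import uuid
from dataclasses import dataclass


@dataclass
class Profile:
    id: uuid.UUID
    owner_id: str


def fetch_profile(store: dict, requester_id: str, raw_id: str):
    """Return (status, profile); missing and forbidden look identical."""
    try:
        profile_id = uuid.UUID(raw_id)
    except ValueError:
        return 404, None                      # malformed ID
    profile = store.get(profile_id)
    if profile is None or profile.owner_id != requester_id:
        return 404, None                      # unknown ID or not the owner
    return 200, profile


known = uuid.UUID("9b1deb4d-3b7d-4bad-9bdd-2b0d7b3dcb6d")
store = {known: Profile(id=known, owner_id="alice")}

print(fetch_profile(store, "alice", str(known))[0])      # owner is allowed
print(fetch_profile(store, "mallory", str(known))[0])    # valid ID, wrong user
print(fetch_profile(store, "mallory", "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa")[0])
```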

Common Pitfalls

Using UUIDv1 (Time-Based with MAC Address)

The problem:

# UUIDv1 includes MAC address and timestamp
import uuid
uuid1 = uuid.uuid1()
# → 87b2a5e2-8b4f-11eb-9e3f-0242ac120002
#                           ^^^^^^^^^^^^
#                           Node field: your MAC address leaked

Why it’s bad:

Leaks machine MAC address (hardware identifier)
Leaks timestamp (when UUID was created)
Predictable sequence allows enumeration
Privacy violation (GDPR concerns)

The fix: Use UUIDv4 (random) or UUIDv7 (time-ordered without MAC).
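To show how little effort the leak takes to exploit, this sketch decodes the example UUIDv1 from above using only the standard library. The constant is the number of 100-nanosecond intervals between the UUID epoch (1582-10-15) and the Unix epoch.

```python
import uuid
from datetime import datetime, timezone

leaked = uuid.UUID("87b2a5e2-8b4f-11eb-9e3f-0242ac120002")

# 100-ns intervals between 1582-10-15 (UUID epoch) and 1970-01-01 (Unix epoch)
UUID_EPOCH_OFFSET = 0x01B21DD213814000

# Recover the creation time from the 60-bit timestamp field
unix_seconds = (leaked.time - UUID_EPOCH_OFFSET) / 10_000_000
created_at = datetime.fromtimestamp(unix_seconds, tz=timezone.utc)

# Recover the 48-bit node field and format it like a MAC address
mac = ":".join(f"{(leaked.node >> shift) & 0xFF:02x}" for shift in range(40, -1, -8))

print(leaked.version)
print(created_at.isoformat())
print(mac)
```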

Treating UUIDs as Strings Everywhere

The problem:

# Bad: String comparison and storage
user_id = "9b1deb4d-3b7d-4bad-9bdd-2b0d7b3dcb6d"  # 36 bytes
if user_id == another_id:  # String comparison
    ...

Why it’s bad:

Wastes memory (36 bytes vs 16 bytes)
Slower comparisons (string vs binary)
Larger indexes (affects query performance)

The fix:

# Good: Use UUID objects and binary storage
import uuid
user_id = uuid.UUID('9b1deb4d-3b7d-4bad-9bdd-2b0d7b3dcb6d')
# Convert to string only for display
print(f"User: {user_id}")
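Besides size, UUID objects normalize formatting, which raw string comparison does not. A short sketch of the difference, using spellings that Python's `uuid.UUID` constructor accepts:

```python
import uuid

# Five spellings of the same identifier
raw_inputs = [
    "9b1deb4d-3b7d-4bad-9bdd-2b0d7b3dcb6d",
    "9B1DEB4D-3B7D-4BAD-9BDD-2B0D7B3DCB6D",
    "{9b1deb4d-3b7d-4bad-9bdd-2b0d7b3dcb6d}",
    "9b1deb4d3b7d4bad9bdd2b0d7b3dcb6d",
    "urn:uuid:9b1deb4d-3b7d-4bad-9bdd-2b0d7b3dcb6d",
]

print(len(set(raw_inputs)))          # as strings, all five look different

parsed = {uuid.UUID(text) for text in raw_inputs}
print(len(parsed))                   # as UUID objects, they are one value
print(str(next(iter(parsed))))       # canonical lowercase, hyphenated form

# Parsing also rejects malformed input early
try:
    uuid.UUID("not-a-uuid")
except ValueError as exc:
    print(f"rejected: {exc}")
```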

Generating UUIDs Client-Side Without Validation

The problem:

// Client generates UUID
const userId = crypto.randomUUID();
fetch('/api/users', {
    method: 'POST',
    body: JSON.stringify({ id: userId })
});

Why it’s bad:

Malicious clients can provide duplicate or crafted UUIDs
No server-side validation of uniqueness
Can cause data corruption or conflicts

The fix:

// Server generates and returns UUID
fetch('/api/users', {
    method: 'POST',
    body: JSON.stringify({ email: 'user@example.com' })
});
// Response: { id: "9b1deb4d-...", email: "user@example.com" }
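Where the client has to supply the ID (the offline-first case, for example), the server can at least validate it before use. A sketch under stated assumptions: `accept_client_id` and `DuplicateId` are hypothetical names, and the in-memory set stands in for what would normally be a primary-key or unique constraint in the database.

```python
import uuid


class DuplicateId(Exception):
    """Raised when a client-supplied ID is already taken."""


def accept_client_id(raw_id: str, existing_ids: set) -> uuid.UUID:
    """Validate a client-supplied UUID before using it as a key."""
    try:
        candidate = uuid.UUID(raw_id)
    except (ValueError, AttributeError, TypeError) as exc:
        raise ValueError("id is not a well-formed UUID") from exc

    # Reject crafted values that are not random (version 4) UUIDs
    if candidate.variant != uuid.RFC_4122 or candidate.version != 4:
        raise ValueError("id must be a version 4 UUID")

    # In a real system, let the database's unique constraint enforce this atomically
    if candidate in existing_ids:
        raise DuplicateId(str(candidate))

    existing_ids.add(candidate)
    return candidate


seen = set()
client_id = str(uuid.uuid4())
print(accept_client_id(client_id, seen))      # first use is accepted

try:
    accept_client_id(client_id, seen)         # replaying the same ID is rejected
except DuplicateId as exc:
    print(f"duplicate: {exc}")
```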

UUID Versions Comparison

Version	Use Case	Pros	Cons
UUIDv4	General purpose	Random, unpredictable	Index fragmentation
UUIDv1	Legacy systems	Time-ordered	Leaks MAC address
UUIDv5	Deterministic	Same input = same UUID	Not unique per generation
UUIDv7	High-throughput	Time-ordered, no MAC	Embeds creation time; library support still varies

Quick Reference Checklist

Choosing UUID strategy:

Use UUIDv4 for general applications (user IDs, sessions, files)
Use UUIDv7 for time-series data (logs, events, analytics)
Don’t use UUIDv1 (leaks MAC address)
Store as binary (16 bytes), not text (36 bytes)
Generate UUIDs server-side, not client-side

Database schema:

Use native UUID type (PostgreSQL, SQL Server)
Use BINARY(16) if native UUID unavailable (MySQL)
Add indexes on UUID columns used in JOINs
Consider UUIDv7 for tables with >10k inserts/second

Security considerations:

Don’t expose UUIDs as sole authorization mechanism
Validate authorization separately from UUID validity
Use public slugs for shareable URLs, not UUIDs
Avoid predictable UUID patterns (never use sequential v1)

Standards and References

RFC 4122: Original UUID specification (v1, v3, v4, v5), now obsoleted by RFC 9562
RFC 9562: Current UUID specification; adds UUIDv6, UUIDv7 (time-ordered), and UUIDv8 (custom layouts)
PostgreSQL: Native UUID type and gen_random_uuid()
MySQL 8.0+: UUID_TO_BIN() and BIN_TO_UUID() functions

Summary

UUIDs provide globally unique identifiers that can be generated independently across distributed systems. Use UUIDv4 for general purposes. Store as binary (16 bytes) for optimal performance.

Key takeaways:

UUIDv4 is default: Random, collision-resistant, no privacy leaks
Store as binary: 16 bytes (binary) vs 36 bytes (text)
UUIDv7 for high-throughput: Time ordering reduces index fragmentation
Don’t use UUIDv1: Leaks MAC address and timestamps
Generate server-side: Client-generated UUIDs are a security risk
