Understanding UUIDs: The Universal Unique Identifier

A comprehensive guide to UUIDs, their versions, use cases, and implementation.

conv4me
October 10, 2025

Introduction

A UUID (Universally Unique Identifier) is a 128-bit number designed to be unique across space and time without a central authority. Unlike auto-increment IDs (1, 2, 3…), UUIDs can be generated anywhere without coordination, and the chance of two of them colliding is negligible at realistic volumes.

Format: 550e8400-e29b-41d4-a716-446655440000 (8-4-4-4-12 hexadecimal digits)

UUIDs solve critical problems in distributed systems, data imports, and security-conscious applications. This guide covers when to use them, which version to choose, and implementation best practices.

How It Works

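A random (version 4) UUID fills 122 of its 128 bits with random data; the other 6 bits encode the version and variant. That is enough to make collisions vanishingly unlikely at realistic volumes. Let’s look at the math.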

UUIDv4 (random) collision probability:

  • Total possible UUIDs: 2^122 (about 5.3 × 10^36)
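  • Generating 1 billion UUIDs in total: collision probability of about 9.4 × 10^-20 (effectively zero)
  • Generating 1 billion UUIDs per second for 100 years = 3.15 × 10^18 UUIDs, at which point the probability of at least one collision is about 61%

You’d need to generate a billion UUIDs every second for roughly 86 years (about 2.7 × 10^18 UUIDs) to have a 50% chance of a single collision. Real workloads are many orders of magnitude smaller.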
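
Collision estimates like these can be checked with the birthday approximation, p ≈ 1 - exp(-n^2 / (2N)), where N = 2^122 is the number of possible UUIDv4 values and n is how many have been generated. A minimal sketch (the function names are illustrative, not part of any library):

```python
import math

RANDOM_BITS = 122            # random bits in a UUIDv4
SPACE = 2 ** RANDOM_BITS     # number of distinct UUIDv4 values


def collision_probability(n: int) -> float:
    """Approximate chance of at least one collision among n random UUIDv4s."""
    # Birthday approximation; expm1 keeps precision when the result is tiny.
    return -math.expm1(-(n * n) / (2 * SPACE))


def count_for_probability(p: float) -> float:
    """Approximate number of UUIDv4s needed to reach collision probability p."""
    return math.sqrt(2 * SPACE * math.log(1 / (1 - p)))


SECONDS_PER_YEAR = 31_557_600  # 365.25 days

one_billion = collision_probability(10**9)
century = collision_probability(10**9 * SECONDS_PER_YEAR * 100)
half = count_for_probability(0.5)

print(one_billion)                         # about 9.4e-20
print(century)                             # about 0.61
print(half / 10**9 / SECONDS_PER_YEAR)     # about 86 (years at 1 billion/second)
```

The approximation assumes every UUID comes from a good random source; a weak or badly seeded generator makes collisions far more likely than the math suggests.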

Structure breakdown (UUIDv4):

550e8400-e29b-41d4-a716-446655440000
xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx

x = random hex digit (0-9, a-f)
4 = version number (v4)
y = variant bits (8, 9, a, or b)
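
The version and variant fields can be read back from any UUID. A small sketch using Python's standard uuid module (the describe helper is a hypothetical name used only for illustration):

```python
import uuid


def describe(text: str) -> dict:
    """Return the structural fields of a UUID given in canonical text form."""
    u = uuid.UUID(text)
    return {
        "version": u.version,                        # 4 for random UUIDs
        "is_rfc_variant": u.variant == uuid.RFC_4122,
        "byte_length": len(u.bytes),                 # always 16
        "version_digit": text[14],                   # the "4" in 4xxx
        "variant_digit": text[19],                   # the "y" in yxxx
    }


info = describe("550e8400-e29b-41d4-a716-446655440000")
print(info)
```

For the example UUID above, the version digit is 4 and the variant digit is a, which falls in the allowed set (8, 9, a, b).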

Version comparison:

  • UUIDv4: Random (122 bits of entropy). Default choice for most use cases.
  • UUIDv7: Time-ordered random (standardized in RFC 9562). Best for high-throughput inserts.
  • UUIDv5: Deterministic (SHA-1 hash of a namespace and a name). Same input always produces same UUID.
  • UUIDv1: Legacy. Leaks MAC address and timestamp. Avoid.
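
To make the difference between deterministic and random versions concrete, here is a short sketch with the standard uuid module. The id_for_url helper and the example URLs are assumptions made for illustration:

```python
import uuid


def id_for_url(url: str) -> uuid.UUID:
    """Derive a stable UUIDv5 from a URL using the predefined URL namespace."""
    return uuid.uuid5(uuid.NAMESPACE_URL, url)


first = id_for_url("https://example.com/a")
again = id_for_url("https://example.com/a")
other = id_for_url("https://example.com/b")

print(first == again)   # same input, same UUID
print(first == other)   # different input, different UUID
print(uuid.uuid4() == uuid.uuid4())  # random UUIDs differ on every call
```

UUIDv5 suits cases where the same external key should always map to the same ID, such as de-duplicating imported records.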

The tradeoff: UUIDs are 128 bits (16 bytes) vs integers at 32-64 bits (4-8 bytes). This impacts index size and memory usage. For most applications, the benefits outweigh the cost.

Best Practices

1. Use UUIDv4 for General Purposes

Why: Random UUIDs provide unpredictability and collision resistance. The chance that any two given UUIDv4 values match is 1 in 2^122, so collisions are effectively impossible at realistic volumes.

How to implement:

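// JavaScript (modern browsers; Node 14.17+ via require('crypto'), available as a global in newer releases)
const id = crypto.randomUUID();
// → e.g. "9b1deb4d-3b7d-4bad-9bdd-2b0d7b3dcb6d"

# Python
import uuid
user_id = uuid.uuid4()
# → e.g. UUID('87b2a5e2-8b4f-4c0d-9e3f-1234567890ab')

-- PostgreSQL (gen_random_uuid() is built in since version 13; earlier versions need the pgcrypto extension)
CREATE TABLE users (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    email VARCHAR(255)
);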

When to use: Default choice for user IDs, session tokens, file uploads, distributed databases.

2. Use UUIDs for Distributed Systems

Why: Multiple servers can generate IDs independently without database coordination. No single point of failure.

Problem with auto-increment:

Server A generates: user_id=1, user_id=2
Server B generates: user_id=1, user_id=2
Merge → Collision! Which user is which?

Solution with UUIDs:

Server A generates: 3c5e8a9f-...
Server B generates: 9d2b7f4c-...
Merge → No collision, IDs remain unique

Use cases:

  • Multi-region databases
  • Offline-first applications
  • Microservices
  • Data imports from external sources
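
The merge scenario above can be simulated in a few lines. This is a toy sketch: the two "servers" are plain dictionaries and make_records is a hypothetical helper, not a real replication mechanism.

```python
import uuid
from itertools import count


def make_records(names, id_factory):
    """Build an {id: name} mapping, asking id_factory for each new ID."""
    return {id_factory(): name for name in names}


# Auto-increment: each server runs its own counter starting at 1.
counter_a, counter_b = count(1), count(1)
ints_a = make_records(["alice", "bob"], lambda: next(counter_a))
ints_b = make_records(["carol", "dave"], lambda: next(counter_b))
merged_ints = {**ints_a, **ints_b}   # server B silently overwrites server A

# UUIDv4: each server generates IDs independently.
uuids_a = make_records(["alice", "bob"], uuid.uuid4)
uuids_b = make_records(["carol", "dave"], uuid.uuid4)
merged_uuids = {**uuids_a, **uuids_b}

print(len(merged_ints))    # 2 records survive out of 4
print(len(merged_uuids))   # all 4 records survive
```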

3. Store UUIDs as Binary, Not Text

Why: Text representation (36 bytes) is 2.25x larger than binary (16 bytes). Impacts index size and query performance.

How to implement:

-- PostgreSQL: Native UUID type (16 bytes)
CREATE TABLE users (
    id UUID PRIMARY KEY,
    created_at TIMESTAMP
);

-- MySQL: Use BINARY(16) for storage
CREATE TABLE users (
    id BINARY(16) PRIMARY KEY,
    created_at TIMESTAMP
);
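
# Python: Store as bytes (SQLAlchemy 1.4+)
import uuid
from sqlalchemy import Column, LargeBinary
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class User(Base):
    __tablename__ = "users"
    # A callable default gives each row a fresh UUID (a bare uuid.uuid4().bytes is evaluated only once)
    id = Column(LargeBinary(16), primary_key=True, default=lambda: uuid.uuid4().bytes)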

Performance impact:

  • Text UUIDs: 36 bytes per ID
  • Binary UUIDs: 16 bytes per ID
  • 10M records: 200MB savings in ID storage alone
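
The size difference and the round trip between the two forms can be verified with the standard uuid module. A minimal sketch (the variable names are illustrative):

```python
import uuid

original = uuid.uuid4()

as_text = str(original)      # canonical 8-4-4-4-12 form
as_bytes = original.bytes    # raw 16-byte form, suitable for BINARY(16)

# Converting back from either representation yields the same UUID.
from_text = uuid.UUID(as_text)
from_bytes = uuid.UUID(bytes=as_bytes)

rows = 10_000_000
saved_bytes = (len(as_text) - len(as_bytes)) * rows

print(len(as_text), len(as_bytes))   # 36 16
print(saved_bytes // 1_000_000)      # 200 (MB saved across 10M rows)
```

The figure counts only the raw ID column; every secondary index that includes the ID repeats the saving.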

4. Use UUIDv7 for Time-Ordered Scenarios

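Why: UUIDv4 is random, so new rows land at random positions in a B-tree index, causing page splits and fragmentation. UUIDv7 (standardized in RFC 9562) starts with a millisecond timestamp, so new IDs sort roughly in creation order.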

When to use:

  • High-throughput inserts (logs, events, analytics)
  • Tables where creation order matters
  • Systems with >10k inserts/second

How to implement:

# Python (requires the uuidv7 package; Python 3.14+ also provides uuid.uuid7() in the standard library)
from uuidv7 import uuidv7

event_id = uuidv7()
# → Sortable by time while remaining unique

Trade-off: Fewer random bits than UUIDv4 (up to 74 instead of 122), but still collision-resistant. The embedded timestamp also reveals when each ID was created.
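
To show where the ordering comes from, here is a hand-rolled sketch of the UUIDv7 bit layout from RFC 9562: a 48-bit Unix millisecond timestamp, the version, 12 random bits, the variant, and 62 more random bits. The uuid7_sketch function is a hypothetical name for illustration only; it omits the monotonicity safeguards a production library may add, so prefer a maintained implementation in real code.

```python
import os
import time
import uuid
from typing import Optional


def uuid7_sketch(unix_ms: Optional[int] = None) -> uuid.UUID:
    """Build a UUIDv7-shaped value: timestamp first, random bits after."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")   # 80 random bits, 74 used
    rand_a = rand >> 68                            # top 12 bits
    rand_b = rand & ((1 << 62) - 1)                # low 62 bits

    value = (unix_ms & ((1 << 48) - 1)) << 80      # bits 127..80: timestamp
    value |= 0x7 << 76                             # bits 79..76: version 7
    value |= rand_a << 64                          # bits 75..64: random
    value |= 0b10 << 62                            # bits 63..62: variant
    value |= rand_b                                # bits 61..0: random
    return uuid.UUID(int=value)


earlier = uuid7_sketch(1_700_000_000_000)
later = uuid7_sketch(1_700_000_000_001)

print(earlier < later)         # a later timestamp sorts after an earlier one
print(earlier.int >> 80)       # the embedded millisecond timestamp
```

Because the timestamp occupies the most significant bits, sorting by the raw 16 bytes or by the canonical text form gives the same creation-time order.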

5. Don’t Expose UUIDs in URLs for Security

Why: Even though UUIDs are unpredictable, exposing them in URLs can leak information about entity existence.

Insecure approach:

GET /api/users/9b1deb4d-3b7d-4bad-9bdd-2b0d7b3dcb6d
→ 200 OK (user exists)

GET /api/users/aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa
→ 404 Not Found (user doesn't exist)
→ Differing responses let an attacker confirm which UUIDs exist

Secure approach:

GET /api/users/me (use session auth)
GET /api/users/john (use public slugs)
GET /api/users/9b1deb4d (use UUID + authorization check)

Additional security: Always validate authorization, not just UUID validity.
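
One way to combine a UUID lookup with an authorization check is to return the same error for "does not exist" and "not allowed", so the response reveals nothing about which IDs are valid. A minimal sketch, assuming a hypothetical in-memory USERS store and a simple rule that users may only read their own record:

```python
import uuid


class NotFound(Exception):
    """Raised for both missing and forbidden resources."""


# Hypothetical in-memory store standing in for a database table.
ALICE = uuid.UUID("9b1deb4d-3b7d-4bad-9bdd-2b0d7b3dcb6d")
BOB = uuid.uuid4()
USERS = {
    ALICE: {"email": "alice@example.com"},
    BOB: {"email": "bob@example.com"},
}


def get_user(requester_id: uuid.UUID, target_id: uuid.UUID) -> dict:
    """Return the target user only if the requester is allowed to see it."""
    user = USERS.get(target_id)
    # Same exception for "missing" and "forbidden": no existence leak.
    if user is None or requester_id != target_id:
        raise NotFound("user not found")
    return user


print(get_user(ALICE, ALICE)["email"])
```

Real applications usually have richer rules (roles, teams, sharing), but the principle is the same: knowing a UUID should never be enough to read the record.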

Common Pitfalls

Using UUIDv1 (Time-Based with MAC Address)

The problem:

# UUIDv1 includes MAC address and timestamp
import uuid
uuid1 = uuid.uuid1()
# → 87b2a5e2-8b4f-11eb-9e3f-0242ac120002
#                           ^^^^^^^^^^^^
#                           Your MAC address leaked

Why it’s bad:

  • Leaks machine MAC address (hardware identifier)
  • Leaks timestamp (when UUID was created)
  • Predictable sequence allows enumeration
  • Potential privacy issue (a hardware identifier may count as personal data under GDPR)

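The fix: Use UUIDv4 (random) or UUIDv7 (time-ordered without MAC). Note that UUIDv7 still embeds its creation time, so prefer UUIDv4 where timestamps must stay private.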
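
How much a UUIDv1 gives away is easy to demonstrate: both fields can be decoded with the standard library. A short sketch using the example value from above (inspect_v1 is a hypothetical helper name):

```python
import uuid
from datetime import datetime, timedelta, timezone

# UUIDv1 timestamps count 100-nanosecond intervals since 1582-10-15.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def inspect_v1(text: str):
    """Return the (MAC address, creation time) embedded in a UUIDv1."""
    u = uuid.UUID(text)
    if u.version != 1:
        raise ValueError("not a version 1 UUID")
    node_hex = f"{u.node:012x}"
    mac = ":".join(node_hex[i:i + 2] for i in range(0, 12, 2))
    created = GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)
    return mac, created


mac, created = inspect_v1("87b2a5e2-8b4f-11eb-9e3f-0242ac120002")
print(mac)       # 02:42:ac:12:00:02
print(created)   # the moment the UUID was generated
```

Anyone who sees the ID can run the same decoding, which is why version 1 is a poor fit for identifiers that leave your system.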

Treating UUIDs as Strings Everywhere

The problem:

# Bad: String comparison and storage
user_id = "9b1deb4d-3b7d-4bad-9bdd-2b0d7b3dcb6d"  # 36 bytes
if user_id == another_id:  # String comparison
    ...

Why it’s bad:

  • Wastes memory (36 bytes vs 16 bytes)
  • Slower comparisons (string vs binary)
  • Larger indexes (affects query performance)

The fix:

# Good: Use UUID objects and binary storage
import uuid
user_id = uuid.UUID('9b1deb4d-3b7d-4bad-9bdd-2b0d7b3dcb6d')
# Convert to string only for display
print(f"User: {user_id}")
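
A further reason to parse early: the same UUID has several valid text spellings, and plain string comparison treats them as different values. A brief sketch with the standard uuid module:

```python
import uuid

canonical = "9b1deb4d-3b7d-4bad-9bdd-2b0d7b3dcb6d"
upper = "9B1DEB4D-3B7D-4BAD-9BDD-2B0D7B3DCB6D"
compact = "9b1deb4d3b7d4bad9bdd2b0d7b3dcb6d"

# As strings, these are three different values.
strings_match = canonical == upper == compact

# As UUID objects, they are the same 128-bit number.
parsed = [uuid.UUID(text) for text in (canonical, upper, compact)]
objects_match = parsed[0] == parsed[1] == parsed[2]

print(strings_match)    # False
print(objects_match)    # True
print(str(parsed[1]))   # always rendered in lowercase canonical form
```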

Generating UUIDs Client-Side Without Validation

The problem:

// Client generates UUID
const userId = crypto.randomUUID();
fetch('/api/users', {
    method: 'POST',
    body: JSON.stringify({ id: userId })
});

Why it’s bad:

  • Malicious clients can provide duplicate or crafted UUIDs
  • No server-side validation of uniqueness
  • Can cause data corruption or conflicts

The fix:

// Server generates and returns UUID
fetch('/api/users', {
    method: 'POST',
    body: JSON.stringify({ email: 'user@example.com' })
});
// Response: { id: "9b1deb4d-...", email: "user@example.com" }
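
Where the client has to supply the ID (offline-first apps, listed earlier as a use case, are one example), the server can still validate it before accepting the record. A minimal sketch, assuming a hypothetical in-memory store; in a real system a PRIMARY KEY or UNIQUE constraint would enforce uniqueness:

```python
import uuid


def parse_client_uuid(raw) -> uuid.UUID:
    """Accept only well-formed, RFC-variant, version 4 UUIDs."""
    try:
        parsed = uuid.UUID(raw)
    except (ValueError, AttributeError, TypeError):
        raise ValueError("not a valid UUID") from None
    if parsed.variant != uuid.RFC_4122 or parsed.version != 4:
        raise ValueError("only UUIDv4 is accepted")
    return parsed


def create_user(store: dict, raw_id, email: str) -> uuid.UUID:
    """Insert a user under a client-supplied ID, rejecting duplicates."""
    user_id = parse_client_uuid(raw_id)
    if user_id in store:
        raise ValueError("duplicate id")
    store[user_id] = {"email": email}
    return user_id


store = {}
created_id = create_user(store, "9b1deb4d-3b7d-4bad-9bdd-2b0d7b3dcb6d", "user@example.com")
print(created_id)
```

Version and format checks do not prove the value is random, so the duplicate check (or database constraint) remains the real safeguard.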

UUID Versions Comparison

Version | Use Case        | Pros                   | Cons
UUIDv4  | General purpose | Random, unpredictable  | Index fragmentation
UUIDv1  | Legacy systems  | Time-based             | Leaks MAC address
UUIDv5  | Deterministic   | Same input = same UUID | Not unique per generation
UUIDv7  | High-throughput | Time-ordered, no MAC   | Embeds creation time; newer standard, so library support varies

Quick Reference Checklist

Choosing UUID strategy:

  • Use UUIDv4 for general applications (user IDs, sessions, files)
  • Use UUIDv7 for time-series data (logs, events, analytics)
  • Don’t use UUIDv1 (leaks MAC address)
  • Store as binary (16 bytes), not text (36 bytes)
  • Generate UUIDs server-side, not client-side

Database schema:

  • Use native UUID type (UUID in PostgreSQL, UNIQUEIDENTIFIER in SQL Server)
  • Use BINARY(16) if native UUID unavailable (MySQL)
  • Add indexes on UUID columns used in JOINs
  • Consider UUIDv7 for tables with >10k inserts/second

Security considerations:

  • Don’t rely on UUIDs as the sole authorization mechanism
  • Validate authorization separately from UUID validity
  • Use public slugs for shareable URLs, not UUIDs
  • Avoid predictable UUID patterns (never use sequential v1)

Standards and References

  • RFC 4122: Original UUID specification (v1, v3, v4, v5), now obsoleted by RFC 9562
  • RFC 9562: Current UUID specification; adds UUIDv6 and UUIDv7 (time-ordered) and UUIDv8 (custom formats)
  • PostgreSQL: Native UUID type and gen_random_uuid()
  • MySQL 8.0+: UUID_TO_BIN() and BIN_TO_UUID() functions

Summary

UUIDs provide globally unique identifiers that can be generated independently across distributed systems. Use UUIDv4 for general purposes. Store as binary (16 bytes) for optimal performance.

Key takeaways:

  1. UUIDv4 is default: Random, collision-resistant, no privacy leaks
  2. Store as binary: 16 bytes (binary) vs 36 bytes (text)
  3. UUIDv7 for high-throughput: Time ordering reduces index fragmentation
  4. Don’t use UUIDv1: Leaks MAC address and timestamps
  5. Generate server-side: Client-generated UUIDs are a security risk
