Introduction
Data serialization formats are everywhere. APIs use JSON. Config files use YAML or TOML. Excel exports CSV. Legacy systems use XML. Each format has strengths and tradeoffs.
Choosing the wrong format leads to parsing errors, performance issues, or unmaintainable code. This guide compares JSON, YAML, CSV, XML, and TOML. You’ll learn which format fits your use case and how to avoid common pitfalls.
How It Works
Each format has different syntax and use cases. Here’s how they compare.
JSON (JavaScript Object Notation)
{
  "user": {
    "name": "John Doe",
    "age": 30,
    "roles": ["admin", "developer"]
  }
}
Characteristics:
- Strict syntax (quotes required, no trailing commas)
- No comments allowed
- Nested objects and arrays
- Fast parsing (often several times faster than YAML, depending on the library)
- Native browser support
When it works best: APIs, web communication, NoSQL databases.
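The strictness rules above are easy to check against a real parser. Here is a minimal sketch using Python's standard `json` module; the helper name `is_valid_json` is made up for illustration.

```python
import json


def is_valid_json(text: str) -> bool:
    """Return True if `text` parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# Only the first input is well-formed JSON.
print(is_valid_json('{"name": "John Doe", "age": 30}'))   # well-formed
print(is_valid_json('{"name": "John Doe", "age": 30,}'))  # trailing comma
print(is_valid_json("{'name': 'John Doe'}"))              # single quotes
print(is_valid_json('{"age": 30} // comment'))            # comment
```

Rejecting these inputs outright is part of what makes JSON unambiguous between different producers and consumers.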
YAML (YAML Ain’t Markup Language)
user:
  name: John Doe
  age: 30
  roles:
    - admin
    - developer
Characteristics:
- Indentation-based structure (whitespace matters)
- Comments supported with #
- Quotes optional for most strings
- Human-readable
- Slower parsing than JSON
When it works best: Config files, CI/CD pipelines, infrastructure as code.
CSV (Comma-Separated Values)
name,age,role
John Doe,30,admin
Jane Smith,25,developer
Characteristics:
- Tabular data only (rows and columns)
- No nested structures
- Excel-compatible
- Simple parsing
- Extremely common for data exports
When it works best: Spreadsheets, data imports/exports, tabular datasets.
XML (eXtensible Markup Language)
<user>
  <name>John Doe</name>
  <age>30</age>
  <roles>
    <role>admin</role>
    <role>developer</role>
  </roles>
</user>
Characteristics:
- Tag-based structure (<tag>value</tag>)
- Supports attributes and namespaces
- Verbose (lots of repetition)
- Schema validation (XSD, DTD)
- Legacy systems rely on it
When it works best: SOAP APIs, legacy integrations, document standards (SVG, RSS).
TOML (Tom’s Obvious Minimal Language)
[user]
name = "John Doe"
age = 30
roles = ["admin", "developer"]
Characteristics:
- INI-file inspired syntax
- Comments supported with #
- Type-safe (dates, integers, floats)
- More structured than INI, simpler than YAML
- Growing adoption
When it works best: Application config (Rust, Go projects), package manifests.
Best Practices
1. Use JSON for APIs and Data Exchange
Why: JSON is the de facto standard for REST APIs. Nearly every mainstream language has built-in or well-supported JSON libraries. It’s fast to parse and unambiguous.
How to implement:
{
  "user": {
    "id": "550e8400-e29b-41d4-a716-446655440000",
    "name": "John Doe",
    "email": "john@example.com",
    "roles": ["admin", "developer"]
  }
}
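To show how a payload like this moves between programs, here is a small round-trip sketch with Python's standard `json` module. The `response` dictionary is a hypothetical API response that mirrors the sample above, not a real endpoint's output.

```python
import json

# Hypothetical API response body, mirroring the sample document above.
response = {
    "user": {
        "id": "550e8400-e29b-41d4-a716-446655440000",
        "name": "John Doe",
        "email": "john@example.com",
        "roles": ["admin", "developer"],
    }
}

body = json.dumps(response)  # Python dict -> JSON text for the wire
decoded = json.loads(body)   # JSON text -> Python dict on the receiving side

print(decoded["user"]["roles"])
```

The decoded structure equals the original, which is the property that makes JSON a safe default for data exchange.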
When to use:
- REST API requests/responses
- NoSQL database documents
- Web browser communication
- JavaScript/TypeScript config (package.json, tsconfig.json)
Performance: JSON parsing is typically several times faster than YAML parsing, though the exact ratio depends on the language and library.
2. Use CSV for Tabular Data
Why: CSV is the simplest format for row/column data. Excel and Google Sheets natively support it. Perfect for data imports and exports.
How to implement:
user_id,name,email,created_at
1,John Doe,john@example.com,2025-01-15
2,Jane Smith,jane@example.com,2025-01-16
When to use:
- Excel/Sheets data exports
- Database bulk imports
- Simple data transfer between systems
- Analytics and reporting
Key point: Always include a header row with column names.
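The header row is what lets a reader address columns by name instead of by position. A minimal sketch with Python's standard `csv` module, where `io.StringIO` stands in for an open file:

```python
import csv
import io

# The sample export from above; io.StringIO stands in for an open file.
data = io.StringIO(
    "user_id,name,email,created_at\n"
    "1,John Doe,john@example.com,2025-01-15\n"
    "2,Jane Smith,jane@example.com,2025-01-16\n"
)

# DictReader uses the first row as the keys for every following row.
rows = list(csv.DictReader(data))

for row in rows:
    # Every CSV value arrives as a string; convert types explicitly.
    print(int(row["user_id"]), row["name"], row["created_at"])
```

Because CSV carries no type information, numeric and date conversion is left to the consumer.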
3. Use YAML for Configuration Files
Why: YAML is readable and supports comments, which makes it a good fit for files that developers edit frequently.
How to implement:
# Application configuration
database:
  host: localhost
  port: 5432
  credentials:
    username: admin
    password: ${DB_PASSWORD} # From environment
features:
  - authentication
  - logging
  - monitoring
When to use:
- Docker Compose files
- Kubernetes manifests
- CI/CD pipelines (GitHub Actions, GitLab CI)
- Framework config (Rails, Symfony)
Key advantage: Comments for documentation.
4. Use TOML for Application Config
Why: TOML is simpler than YAML and has fewer edge cases. Its values are explicitly typed, so strings, numbers, and dates are never confused with one another.
How to implement:
[database]
host = "localhost"
port = 5432
[database.credentials]
username = "admin"
password = "${DB_PASSWORD}"
[features]
enabled = ["authentication", "logging", "monitoring"]
When to use:
- Rust projects (Cargo.toml)
- Go applications
- Python packages (pyproject.toml)
- Alternative to YAML when you want type safety
Key advantage: Less ambiguity than YAML (no implicit type coercion of unquoted values).
5. Avoid XML Unless Required
Why: XML is verbose and slower to parse. Use it only when interoperability with legacy systems requires it.
When forced to use XML:
<?xml version="1.0" encoding="UTF-8"?>
<users>
  <user id="1">
    <name>John Doe</name>
    <email>john@example.com</email>
  </user>
</users>
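If you do have to consume a document like this, a standard-library parser is usually enough. Here is a minimal sketch with Python's `xml.etree.ElementTree`; the variable names are illustrative. It reads both the `id` attribute and the child elements.

```python
import xml.etree.ElementTree as ET

# The sample document from above, as bytes so the encoding declaration applies.
xml_bytes = b"""<?xml version="1.0" encoding="UTF-8"?>
<users>
  <user id="1">
    <name>John Doe</name>
    <email>john@example.com</email>
  </user>
</users>"""

root = ET.fromstring(xml_bytes)

users = [
    {
        "id": user.get("id"),            # attribute on <user>
        "name": user.findtext("name"),   # text of the <name> child
        "email": user.findtext("email"),
    }
    for user in root.findall("user")
]

print(users)
```

As with CSV, all values arrive as strings unless a schema-aware tool converts them.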
When you must use it:
- SOAP web services
- Legacy system integrations
- Document standards (SVG, RSS, XHTML)
- Industries with XML mandates (finance, healthcare)
Better alternative: If you control both ends, use JSON instead.
Common Pitfalls
Using CSV for Nested Data
The problem:
user,name,role_1,role_2,role_3
1,John,admin,developer,
2,Jane,developer,,
Why it’s bad: CSV can’t represent nested structures. You end up with empty columns or data duplication.
The fix: Use JSON or XML for hierarchical data.
[
{"user": 1, "name": "John", "roles": ["admin", "developer"]},
{"user": 2, "name": "Jane", "roles": ["developer"]}
]
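If the data already exists in the wide CSV layout, converting it is straightforward. This is a sketch under the assumption that the role columns all share the `role_` prefix, as in the example above:

```python
import csv
import io
import json

# The problematic wide layout from above; io.StringIO stands in for a file.
wide_csv = io.StringIO(
    "user,name,role_1,role_2,role_3\n"
    "1,John,admin,developer,\n"
    "2,Jane,developer,,\n"
)

records = []
for row in csv.DictReader(wide_csv):
    # Gather the non-empty role_N columns into a single list.
    roles = [
        value
        for key, value in row.items()
        if key.startswith("role_") and value
    ]
    records.append({"user": int(row["user"]), "name": row["name"], "roles": roles})

print(json.dumps(records, indent=2))
```

The nested form has no empty placeholders and no fixed limit on the number of roles.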
YAML Type Coercion Surprises
The problem:
# Unexpected type conversions
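norway: NO # Parsed as boolean false by YAML 1.1 parsers
version: 1.0 # Parsed as float, not string
date: 2025-01-15 # Parsed as a date object by many parsers
Why it’s bad: YAML auto-detects types. Under YAML 1.1 rules, which parsers such as PyYAML still follow, NO becomes false. Version numbers become floats, so 1.10 silently turns into 1.1.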
The fix: Quote values when you need exact strings.
norway: "NO"
version: "1.0"
date: "2025-01-15"
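Quoting by hand is easy to forget, so a small check can help. The following is a rough sketch, not a YAML parser. The function name `risky_scalars` is hypothetical and it only understands simple `key: value` lines. The word list is the boolean set used by YAML 1.1 resolvers such as PyYAML's default.

```python
import re

# Unquoted words that YAML 1.1 resolvers (e.g. PyYAML's default) read as booleans.
YAML11_BOOL = re.compile(
    r"^(?:yes|Yes|YES|no|No|NO|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF)$"
)


def risky_scalars(yaml_text: str) -> list[tuple[int, str, str]]:
    """Return (line number, key, value) for values that may become booleans."""
    findings = []
    for number, line in enumerate(yaml_text.splitlines(), start=1):
        content = line.split(" #", 1)[0].strip()  # naive trailing-comment strip
        if content.startswith("#") or ":" not in content:
            continue
        key, _, value = content.partition(":")
        value = value.strip()
        if YAML11_BOOL.match(value):
            findings.append((number, key.strip(), value))
    return findings


sample = 'norway: NO # country code\nversion: "1.0"\nenabled: yes\nname: Norway'
print(risky_scalars(sample))
```

A dedicated linter such as yamllint is the more robust choice; the sketch only illustrates which values deserve quotes.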
XML Verbosity Overhead
The problem:
<!-- 47 characters, not counting whitespace -->
<user>
  <name>John Doe</name>
  <age>30</age>
</user>
// 28 characters
{"name":"John Doe","age":30}
Why it’s bad: XML is noticeably larger than JSON for the same data (about 1.7x in this example, and more with longer tag names). That means more bandwidth and slower parsing.
The fix: Use JSON unless you specifically need XML features (namespaces, schemas, attributes).
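Sizes like these are easier to measure than to count by hand. Here is a minimal sketch that serializes the same record both ways using only the Python standard library. The element-per-field XML layout is an assumption that matches the example above.

```python
import json
import xml.etree.ElementTree as ET

record = {"name": "John Doe", "age": 30}

# Compact JSON: no spaces after the separators.
json_text = json.dumps(record, separators=(",", ":"))

# Equivalent XML: one child element per field, no indentation.
user = ET.Element("user")
for key, value in record.items():
    ET.SubElement(user, key).text = str(value)
xml_text = ET.tostring(user, encoding="unicode")

print(len(json_text), json_text)
print(len(xml_text), xml_text)
```

Running the same comparison on a representative payload from your own system gives a more useful number than any general ratio.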
Format Comparison Table
| Format | Readability | Size | Parsing Speed | Comments | Nested Data | Use Case |
|---|---|---|---|---|---|---|
| JSON | Good | Small | Fast | ❌ No | ✅ Yes | APIs, web data |
| YAML | Excellent | Medium | Slow | ✅ Yes | ✅ Yes | Config files |
| CSV | Good | Smallest | Fastest | ❌ No | ❌ No | Tabular data |
| XML | Poor | Large | Slow | ✅ Yes | ✅ Yes | Legacy systems |
| TOML | Excellent | Medium | Medium | ✅ Yes | ✅ Yes | App config |
Quick Reference Checklist
Use JSON when:
- Building REST APIs
- Storing NoSQL documents
- Communicating with browsers
- Performance is critical
- Maximum compatibility needed
Use YAML when:
- Writing config files with comments
- Creating CI/CD pipelines
- Defining infrastructure (Docker, K8s)
- Humans edit files frequently
Use CSV when:
- Exporting tabular data
- Importing to Excel/Sheets
- Simple row/column datasets
- No nested structures needed
Use TOML when:
- Building Rust/Go applications
- Want type-safe config
- Need simpler alternative to YAML
- Avoiding YAML edge cases
Avoid XML unless:
- Required by legacy systems
- SOAP API integration
- Industry standards mandate it
Tools and Standards
Parsers:
- JSON: Native in JavaScript, json module (Python), encoding/json (Go)
- YAML: js-yaml (JS), PyYAML (Python), gopkg.in/yaml (Go)
- CSV: csv module (Python), csv-parser (Node), encoding/csv (Go)
- XML: DOM/SAX parsers in all languages
- TOML: toml (Python), toml (Rust), go-toml (Go)
Validators:
- JSON Schema: ajv, jsonschema
- YAML Lint: yamllint, prettier
- CSV: csvlint
- XML Schema: xmllint, XSD validators
Converters:
- Online: json2yaml.com, csvjson.com
- CLI: yq (YAML), jq (JSON), xq (XML)
Standards:
- JSON: RFC 8259 and ECMA-404
- YAML: YAML 1.2.2 specification
- CSV: RFC 4180
- XML: W3C XML 1.0
- TOML: TOML v1.0.0
Summary
JSON wins for APIs and data exchange. CSV wins for tabular data. YAML and TOML both work for config files; pick TOML if you want fewer edge cases. XML is mostly a legacy choice, so avoid it unless required.
Key takeaways:
- JSON for APIs: Fast, universal, unambiguous
- CSV for tables: Simple, Excel-compatible, no nesting
- YAML for config: Readable, comments, widely adopted
- TOML for config: Type-safe, simpler than YAML, growing
- XML when required: Verbose, legacy, avoid if possible