Data Formats Compared: JSON, YAML, CSV, XML, TOML

Compare modern data serialization formats and learn which one to use for your specific use case.

conv4me
October 11, 2025

Introduction

Data serialization formats are everywhere. APIs use JSON. Config files use YAML or TOML. Excel exports CSV. Legacy systems use XML. Each format has strengths and tradeoffs.

Choosing the wrong format leads to parsing errors, performance issues, or unmaintainable code. This guide compares JSON, YAML, CSV, XML, and TOML. You’ll learn which format fits your use case and how to avoid common pitfalls.

How It Works

Each format has different syntax and use cases. Here’s how they compare.

JSON (JavaScript Object Notation)

{
  "user": {
    "name": "John Doe",
    "age": 30,
    "roles": ["admin", "developer"]
  }
}

Characteristics:

  • Strict syntax (quotes required, no trailing commas)
  • No comments allowed
  • Nested objects and arrays
  • Fast parsing (often several times faster than YAML, depending on the parser)
  • Native browser support

When it works best: APIs, web communication, NoSQL databases.
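
As a minimal sketch of the strictness described above, the following uses Python's standard `json` module to parse the sample document and to show that a trailing comma is rejected. The trailing-comma input is an illustrative addition, not part of the original sample.

```python
import json

text = '{"user": {"name": "John Doe", "age": 30, "roles": ["admin", "developer"]}}'

# Parse the document into plain dicts, lists, strings, and numbers.
data = json.loads(text)
name = data["user"]["name"]
roles = data["user"]["roles"]

# A trailing comma is a syntax error in JSON, so the parser raises.
try:
    json.loads('{"roles": ["admin", "developer",]}')
    trailing_comma_ok = True
except json.JSONDecodeError:
    trailing_comma_ok = False

print(name, roles, trailing_comma_ok)
```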

YAML (YAML Ain’t Markup Language)

user:
  name: John Doe
  age: 30
  roles:
    - admin
    - developer

Characteristics:

  • Indentation-based structure (whitespace matters)
  • Comments supported with #
  • Quotes optional for most strings
  • Human-readable
  • Slower parsing than JSON

When it works best: Config files, CI/CD pipelines, infrastructure as code.

CSV (Comma-Separated Values)

name,age,role
John Doe,30,admin
Jane Smith,25,developer

Characteristics:

  • Tabular data only (rows and columns)
  • No nested structures
  • Excel-compatible
  • Simple parsing for basic files (quoted fields and embedded commas need a real CSV parser)
  • Extremely common for data exports

When it works best: Spreadsheets, data imports/exports, tabular datasets.
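
One consequence of the tabular model is that CSV carries no type information. The sketch below reads the sample with Python's standard `csv` module and shows that every field arrives as a string.

```python
import csv
import io

text = "name,age,role\nJohn Doe,30,admin\nJane Smith,25,developer\n"

# DictReader uses the first row as the column names.
rows = list(csv.DictReader(io.StringIO(text)))

# Every field is a string, so numeric columns must be converted explicitly.
ages = [int(row["age"]) for row in rows]

print(rows[0])
print(ages)
```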

XML (eXtensible Markup Language)

<user>
  <name>John Doe</name>
  <age>30</age>
  <roles>
    <role>admin</role>
    <role>developer</role>
  </roles>
</user>

Characteristics:

  • Tag-based structure (<tag>value</tag>)
  • Supports attributes and namespaces
  • Verbose (lots of repetition)
  • Schema validation (XSD, DTD)
  • Legacy systems rely on it

When it works best: SOAP APIs, legacy integrations, document standards (SVG, RSS).
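
For comparison with the JSON and CSV examples, here is a minimal sketch that reads the same sample with Python's standard `xml.etree.ElementTree`. As with CSV, element text is always a string and needs explicit conversion.

```python
import xml.etree.ElementTree as ET

text = """
<user>
  <name>John Doe</name>
  <age>30</age>
  <roles>
    <role>admin</role>
    <role>developer</role>
  </roles>
</user>
"""

root = ET.fromstring(text.strip())

name = root.findtext("name")
age = int(root.findtext("age"))  # element text is a string until converted
roles = [role.text for role in root.findall("roles/role")]

print(name, age, roles)
```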

TOML (Tom’s Obvious Minimal Language)

[user]
name = "John Doe"
age = 30
roles = ["admin", "developer"]

Characteristics:

  • INI-file inspired syntax
  • Comments supported with #
  • Explicitly typed values (strings, integers, floats, booleans, dates)
  • More structured than INI, simpler than YAML
  • Growing adoption

When it works best: Application config (Rust, Go projects), package manifests.

Best Practices

1. Use JSON for APIs and Data Exchange

Why: JSON is the universal standard for REST APIs. Every language has native or near-native JSON support. It’s fast to parse and unambiguous.

How to implement:

{
  "user": {
    "id": "550e8400-e29b-41d4-a716-446655440000",
    "name": "John Doe",
    "email": "john@example.com",
    "roles": ["admin", "developer"]
  }
}

When to use:

  • REST API requests/responses
  • NoSQL database documents
  • Web browser communication
  • JavaScript/TypeScript config (package.json, tsconfig.json)

Performance: JSON parsing is typically several times faster than YAML parsing; the exact ratio depends on the language and library.

2. Use CSV for Tabular Data

Why: CSV is the simplest format for row/column data. Excel and Google Sheets natively support it. Perfect for data imports and exports.

How to implement:

user_id,name,email,created_at
1,John Doe,john@example.com,2025-01-15
2,Jane Smith,jane@example.com,2025-01-16

When to use:

  • Excel/Sheets data exports
  • Database bulk imports
  • Simple data transfer between systems
  • Analytics and reporting

Key point: Always include a header row with column names.
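
A minimal sketch of writing such a file with Python's standard `csv` module. The name "Doe, John" is an illustrative value chosen to show that the writer quotes fields containing the delimiter, which hand-built string concatenation would get wrong.

```python
import csv
import io

rows = [
    {"user_id": 1, "name": "Doe, John", "email": "john@example.com"},
    {"user_id": 2, "name": "Jane Smith", "email": "jane@example.com"},
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["user_id", "name", "email"])
writer.writeheader()    # header row with the column names
writer.writerows(rows)  # fields containing commas are quoted automatically
output = buffer.getvalue()

print(output)
```

Reading `output` back with `csv.DictReader` restores "Doe, John" as a single field.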

3. Use YAML for Configuration Files

Why: YAML is readable and supports comments, which makes it a good fit for files that developers edit frequently.

How to implement:

# Application configuration
database:
  host: localhost
  port: 5432
  credentials:
    username: admin
    password: ${DB_PASSWORD}  # Placeholder: expanded by the loading tool, not by YAML itself

features:
  - authentication
  - logging
  - monitoring

When to use:

  • Docker Compose files
  • Kubernetes manifests
  • CI/CD pipelines (GitHub Actions, GitLab CI)
  • Framework config (Rails database.yml, Spring Boot application.yml)

Key advantage: Comments for documentation.
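
Placeholders such as ${DB_PASSWORD} are not expanded by the YAML format; the tool or application that loads the file has to do it. One possible approach, sketched here with Python's standard `string.Template`, is to substitute environment variables into the raw text before parsing. The value "s3cret" is hypothetical and set only so the example is self-contained.

```python
import os
from string import Template

raw = "password: ${DB_PASSWORD}\nhost: localhost\n"

# Hypothetical value, set here only to make the sketch runnable.
os.environ["DB_PASSWORD"] = "s3cret"

# substitute() raises KeyError when a referenced variable is missing,
# which surfaces configuration mistakes before the file is parsed.
expanded = Template(raw).substitute(os.environ)

print(expanded)
```

This treats every `$` in the file as the start of a placeholder, so a config that contains literal dollar signs would need them escaped as `$$`.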

4. Use TOML for Application Config

Why: TOML is simpler than YAML and has fewer edge cases. Its values are explicitly typed, so strings, numbers, and dates are syntactically distinct.

How to implement:

[database]
host = "localhost"
port = 5432

[database.credentials]
username = "admin"
password = "${DB_PASSWORD}"  # Placeholder: TOML does not expand variables; the application must

[features]
enabled = ["authentication", "logging", "monitoring"]

When to use:

  • Rust projects (Cargo.toml)
  • Go applications
  • Python packages (pyproject.toml)
  • Alternative to YAML when you want type safety

Key advantage: Less ambiguity than YAML (no weird type coercion).

5. Avoid XML Unless Required

Why: XML is verbose and slower to parse. Use it only when interoperability with legacy systems requires it.

How to implement:

<?xml version="1.0" encoding="UTF-8"?>
<users>
  <user id="1">
    <name>John Doe</name>
    <email>john@example.com</email>
  </user>
</users>

When you must use it:

  • SOAP web services
  • Legacy system integrations
  • Document standards (SVG, RSS, XHTML)
  • Industries with XML mandates (finance, healthcare)

Better alternative: If you control both ends, use JSON instead.

Common Pitfalls

Using CSV for Nested Data

The problem:

user,name,role_1,role_2,role_3
1,John,admin,developer,
2,Jane,developer,,

Why it’s bad: CSV can’t represent nested structures. You end up with empty columns or data duplication.

The fix: Use JSON or XML for hierarchical data.

[
  {"user": 1, "name": "John", "roles": ["admin", "developer"]},
  {"user": 2, "name": "Jane", "roles": ["developer"]}
]
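
If the data already exists in the wide CSV layout, one way to migrate it is to fold the role_N columns into a list. A minimal sketch with Python's standard `csv` and `json` modules:

```python
import csv
import io
import json

wide = "user,name,role_1,role_2,role_3\n1,John,admin,developer,\n2,Jane,developer,,\n"

records = []
for row in csv.DictReader(io.StringIO(wide)):
    # Collect the non-empty role_N columns into a single list.
    roles = [value for key, value in row.items() if key.startswith("role_") and value]
    records.append({"user": int(row["user"]), "name": row["name"], "roles": roles})

print(json.dumps(records))
```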

YAML Type Coercion Surprises

The problem:

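# Unexpected type conversions (YAML 1.1 parsers such as PyYAML)
norway: NO    # Parsed as boolean false
version: 1.0  # Parsed as float, not string
date: 2025-01-15  # Parsed as date object

Why it’s bad: YAML auto-detects types for unquoted values. Under YAML 1.1 rules, which many parsers still follow, NO becomes false, and version numbers such as 1.0 become floats.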

The fix: Quote values when you need exact strings.

norway: "NO"
version: "1.0"
date: "2025-01-15"
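
When YAML is generated from code, a simple guard can flag values that need quoting. The sketch below is a hypothetical helper (`needs_quotes` is not a library function) built on the boolean spellings listed in the YAML 1.1 type definition. Some parsers recognize only part of this list, so treat it as a conservative check.

```python
# Scalars that the YAML 1.1 boolean type definition resolves to
# true/false when they appear unquoted.
YAML11_BOOLS = {
    "y", "Y", "yes", "Yes", "YES", "n", "N", "no", "No", "NO",
    "true", "True", "TRUE", "false", "False", "FALSE",
    "on", "On", "ON", "off", "Off", "OFF",
}


def needs_quotes(value: str) -> bool:
    """Return True if an unquoted scalar may be read as a boolean."""
    return value in YAML11_BOOLS


country_codes = ["NO", "SE", "DK", "on"]
risky = [code for code in country_codes if needs_quotes(code)]

print(risky)
```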

XML Verbosity Overhead

The problem:

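XML (47 characters without whitespace):

<user>
  <name>John Doe</name>
  <age>30</age>
</user>

JSON (28 characters):

{"name":"John Doe","age":30}

Why it’s bad: In this example the XML is about 1.7x the size of the JSON, because every element name is repeated in a closing tag. Larger payloads cost more bandwidth and take longer to parse.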

The fix: Use JSON unless you specifically need XML features (namespaces, schemas, attributes).
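
The size difference can be measured directly rather than estimated. This sketch serializes one record both ways with Python's standard library and compares the lengths. It assumes compact output (no indentation) and a `<user>` root element for the XML.

```python
import json
import xml.etree.ElementTree as ET

record = {"name": "John Doe", "age": 30}

# Compact JSON: no spaces after separators.
json_text = json.dumps(record, separators=(",", ":"))

# Equivalent XML: a <user> root with one child element per field.
root = ET.Element("user")
for key, value in record.items():
    ET.SubElement(root, key).text = str(value)
xml_text = ET.tostring(root, encoding="unicode")

print(len(json_text), len(xml_text))
```

Running the same comparison on a representative sample of your own data gives a more useful number than any general ratio.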

Format Comparison Table

Format | Readability | Size | Parsing Speed | Comments | Nested Data | Use Case
JSON | Good | Small | Fast | ❌ No | ✅ Yes | APIs, web data
YAML | Excellent | Medium | Slow | ✅ Yes | ✅ Yes | Config files
CSV | Good | Smallest | Fastest | ❌ No | ❌ No | Tabular data
XML | Poor | Large | Slow | ✅ Yes | ✅ Yes | Legacy systems
TOML | Excellent | Medium | Medium | ✅ Yes | ✅ Yes | App config

Quick Reference Checklist

Use JSON when:

  • Building REST APIs
  • Storing NoSQL documents
  • Communicating with browsers
  • Performance is critical
  • Maximum compatibility needed

Use YAML when:

  • Writing config files with comments
  • Creating CI/CD pipelines
  • Defining infrastructure (Docker, K8s)
  • Humans edit files frequently

Use CSV when:

  • Exporting tabular data
  • Importing to Excel/Sheets
  • Simple row/column datasets
  • No nested structures needed

Use TOML when:

  • Building Rust/Go applications
  • Want type-safe config
  • Need simpler alternative to YAML
  • Avoiding YAML edge cases

Avoid XML unless:

  • Required by legacy systems
  • SOAP API integration
  • Industry standards mandate it

Tools and Standards

Parsers:

  • JSON: Native in JavaScript, json module (Python), encoding/json (Go)
  • YAML: js-yaml (JS), PyYAML (Python), gopkg.in/yaml.v3 (Go)
  • CSV: csv module (Python), csv-parser (Node), encoding/csv (Go)
  • XML: DOM/SAX parsers in all languages
  • TOML: tomllib (Python 3.11+ standard library) or toml (Python), toml (Rust), go-toml (Go)

Validators:

  • JSON Schema: ajv, jsonschema
  • YAML Lint: yamllint, prettier
  • CSV: csvlint
  • XML Schema: xmllint, XSD validators

Converters:

  • Online: json2yaml.com, csvjson.com
  • CLI: yq (YAML), jq (JSON), xq (XML)

Standards:

  • JSON: RFC 8259 and ECMA-404
  • YAML: YAML 1.2.2 specification
  • CSV: RFC 4180
  • XML: W3C XML 1.0
  • TOML: TOML v1.0.0

Summary

JSON wins for APIs and data exchange. CSV wins for tabular data. YAML and TOML both work for config files—pick TOML if you want fewer edge cases. XML is legacy—avoid unless required.

Key takeaways:

  1. JSON for APIs: Fast, universal, unambiguous
  2. CSV for tables: Simple, Excel-compatible, no nesting
  3. YAML for config: Readable, comments, widely adopted
  4. TOML for config: Type-safe, simpler than YAML, growing
  5. XML when required: Verbose, legacy, avoid if possible
