Humans and AI retrieve and consume content differently. In this post, I want to discuss how to balance content written for humans with content written for AI.
In earlier posts, I recommended structured content because it chunks cleanly, which helps AI understand and retrieve it. When we talk about structured content, we usually mean these document formats: Markdown, JSON, XML, and YAML.
So, which document formats work best for both humans and AI? Let’s take a look at each one:
Markdown (.md)
What it is:
Markdown is a lightweight markup language designed to make writing for the web simple and readable. It uses plain-text syntax (like # for headings or - for lists) that converts easily to HTML.
Example:
# Deploying to Cloud Run
Learn how to deploy your first app.
## Steps
1. Build your image
2. Push to Container Registry
3. Deploy with Cloud Run
Industry Example:
- Microsoft Learn and Google Developers both use Markdown as their primary authoring format.
- All articles on learn.microsoft.com are .md files stored in GitHub repos like microsoftdocs/azure-docs.
- AWS, GitHub, and OpenAI also use Markdown for documentation and developer guides.
Why humans like it:
- Clean, minimal, and intuitive — almost like writing an email.
- Easy to learn, edit, and version-control in Git.
- Highly readable even before rendering.
Why AI likes it:
- Semantically structured (headings, lists, tables) without layout noise.
- Perfect for chunking and embedding for retrieval-augmented generation (RAG) or Copilot ingestion.
- Mirrors the formats LLMs are trained on (GitHub, documentation, etc.).
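To make the chunking point concrete, here is a minimal sketch of heading-based chunking in Python. The function name `chunk_by_heading` and the chunk dictionary layout are my own choices for illustration, not part of any RAG library, and the sketch assumes simple Markdown: it would mistake a `#` line inside a code block for a heading.

```python
import re

def chunk_by_heading(markdown_text):
    """Split Markdown into one chunk per heading (hypothetical helper)."""
    chunks = []
    current = {"heading": None, "level": 0, "lines": []}
    for line in markdown_text.splitlines():
        match = re.match(r"^(#{1,6})\s+(.*)$", line)
        if match:
            # A new heading closes the previous chunk, if it holds anything.
            if current["heading"] is not None or current["lines"]:
                chunks.append(current)
            current = {
                "heading": match.group(2).strip(),
                "level": len(match.group(1)),
                "lines": [],
            }
        else:
            current["lines"].append(line)
    if current["heading"] is not None or current["lines"]:
        chunks.append(current)
    # Join each chunk's body lines and trim surrounding whitespace.
    return [
        {"heading": c["heading"], "level": c["level"], "text": "\n".join(c["lines"]).strip()}
        for c in chunks
    ]

doc = """# Deploying to Cloud Run
Learn how to deploy your first app.
## Steps
1. Build your image
2. Push to Container Registry
3. Deploy with Cloud Run
"""

chunks = chunk_by_heading(doc)
for chunk in chunks:
    print(chunk["level"], chunk["heading"], "->", repr(chunk["text"]))
```

Each chunk keeps its heading next to its body, so a retriever can embed the two together and return a self-describing passage.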
Trade-offs:
- Limited metadata support compared to JSON/YAML.
- Not ideal for representing complex relational data.
✅ Best for:
Readable documentation, tutorials, conceptual and how-to content consumed by both humans and AI.
JSON (.json)
What it is:
JavaScript Object Notation (JSON) is a structured data format using key–value pairs. It’s widely used for APIs, configurations, and machine-to-machine communication.
Example:
{
"title": "Deploy to Cloud Run",
"steps": [
"Build your image",
"Push to Container Registry",
"Deploy with Cloud Run"
],
"author": "Maggie Hu"
}
Industry Example:
| Use Case | Example | Purpose |
|---|---|---|
| Microsoft Learn Catalog | JSON for doc metadata | AI indexing and discovery |
| Google Vertex AI | JSON for prompt documentation | LLM instruction structuring |
| OpenAI Function Docs | JSON as documentation schema | Model understanding |
| Schema.org JSON-LD | JSON for semantic content | AI/web discoverability |
Why humans like it:
- Familiar to developers and easy to read for small datasets.
- Ideal for storing structured data or configuration.
Why AI likes it:
- Clear, unambiguous key-value structure for precise information retrieval.
- Ideal for embedding metadata and reasoning in structured formats.
- Widely supported as a structured input/output format by LLM APIs.
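As a small illustration of key-based retrieval, the sketch below loads the JSON example from this section with Python's standard `json` module and reads individual fields. The variable names are mine, chosen only for the example.

```python
import json

raw = """
{
  "title": "Deploy to Cloud Run",
  "steps": [
    "Build your image",
    "Push to Container Registry",
    "Deploy with Cloud Run"
  ],
  "author": "Maggie Hu"
}
"""

record = json.loads(raw)

# Fields are addressed by key or index, so no text search is needed.
title = record["title"]
second_step = record["steps"][1]
step_count = len(record["steps"])

print(title)
print(second_step)
print(step_count)
```

The same lookup against free-form prose would need pattern matching or a model call, which is where the precision of a key-value format pays off.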
Trade-offs:
- Harder for non-technical readers to interpret.
- Not suitable for long-form narrative text.
✅ Best for:
Metadata, structured data exchange, and AI pipelines requiring precise context.
YAML (.yml / .yaml)
What it is:
YAML (“YAML Ain’t Markup Language”) is a human-friendly data serialization format often used for configuration files. It’s similar to JSON but uses indentation instead of braces.
Example:
title: Deploy to Cloud Run
description: Learn how to deploy your first containerized app.
steps:
- Build your image
- Push to Container Registry
- Deploy with Cloud Run
author: Maggie Hu
Industry Example:
- Microsoft Learn, GitHub Pages (Jekyll), and Hugo/Docsy sites use YAML front matter at the top of Markdown files to store metadata like title, topic, author, and tags.
- Kubernetes manifests for infrastructure configuration (pods, deployments, secrets) are typically written in YAML.
- GitHub Actions uses YAML to describe CI/CD workflows (.github/workflows/main.yml).
Why humans like it:
- Clean indentation mirrors logical hierarchy.
- Excellent for connecting content with structured metadata.
- Easy to read and edit directly in Markdown front matter.
Why AI likes it:
- Provides machine-parsable structure with human-friendly syntax.
- Used widely for prompt templates, model configuration, and structured metadata ingestion.
Trade-offs:
- Sensitive to spacing and indentation errors.
- Can be ambiguous about data types: under YAML 1.1 rules, for example, an unquoted no or on is read as a boolean rather than a string.
✅ Best for:
Config files, front-matter metadata, and hybrid human–AI authoring systems.
XML (.xml)
What it is:
eXtensible Markup Language (XML) is a tag-based format for representing structured data hierarchies. It’s verbose but powerful for enforcing schema-based content consistency.
Example:
<task id="deploy-cloud-run">
<title>Deploy to Cloud Run</title>
<steps>
<step>Build your image</step>
<step>Push to Container Registry</step>
<step>Deploy with Cloud Run</step>
</steps>
</task>
Industry Example:
- IBM, the creator of DITA, and companies like Cisco, Oracle, and Adobe use XML-based DITA systems for large-scale technical documentation.
- Financial, aerospace, and medical industries rely on XML for regulated documentation and content validation (e.g., FAA, FDA compliance).
- Microsoft’s legacy MSDN and Office help systems were XML-based before their Markdown migration.
Why humans (used to) love it:
- Strict structure ensures consistency and reusability.
- Excellent for translation and compliance workflows.
Why AI doesn’t love it as much:
- Verbose, token-heavy, and less semantically clean for LLMs.
- Requires preprocessing to strip tags for content embedding.
- Complex to maintain for open collaboration.
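The preprocessing step can be sketched with Python's standard `xml.etree.ElementTree` module, using the XML example from this section. This is one simple way to strip tags, assuming well-formed input. A real DITA pipeline would likely need more, such as resolving content references.

```python
import xml.etree.ElementTree as ET

xml_source = """<task id="deploy-cloud-run">
  <title>Deploy to Cloud Run</title>
  <steps>
    <step>Build your image</step>
    <step>Push to Container Registry</step>
    <step>Deploy with Cloud Run</step>
  </steps>
</task>"""

root = ET.fromstring(xml_source)

# itertext() yields every text node, including the whitespace between
# tags, so blank pieces are filtered out.
pieces = [text.strip() for text in root.itertext() if text.strip()]
plain_text = "\n".join(pieces)

print(plain_text)
print(len(xml_source), "characters with tags,", len(plain_text), "without")
```

The character counts printed at the end give a rough sense of how much of the source is markup rather than content.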
Trade-offs:
- Ideal for governance and reuse, but hard to read.
- Better suited for enterprise content management systems than AI retrieval.
✅ Best for:
Regulated or legacy technical documentation requiring schema validation.
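Summary: Human vs. AI Alignment
| Format | Human fit | AI fit | Best for |
|---|---|---|---|
| Markdown | Clean and readable even before rendering | Semantic structure that chunks well | Documentation, tutorials, conceptual and how-to content |
| JSON | Familiar to developers, harder for non-technical readers | Unambiguous key-value structure | Metadata, structured data exchange, AI pipelines |
| YAML | Indentation mirrors logical hierarchy | Machine-parsable with human-friendly syntax | Config files and front-matter metadata |
| XML | Strict and consistent, but hard to read | Verbose and token-heavy | Regulated or legacy documentation requiring schema validation |
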
Takeaway
The best format for both humans and AI is Markdown enhanced with YAML or JSON metadata.
Markdown provides readability and natural structure for human writers, while YAML and JSON add the precision and hierarchy that AI systems rely on for retrieval, linking, and reasoning.
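Here is a minimal sketch of that combination: a Markdown page with a front-matter block, split into metadata and body in Python. The helper `split_front_matter` is hypothetical and deliberately simplified. It only handles flat `key: value` lines and is not a YAML parser, so a real pipeline would hand the metadata block to a proper YAML library.

```python
def split_front_matter(text):
    """Return (metadata, body) for Markdown with a leading '---' block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    # Find the closing '---' delimiter after the opening one.
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        return {}, text  # no closing delimiter: treat everything as body
    metadata = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:  # keep only flat "key: value" lines
            metadata[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return metadata, body

page = """---
title: Deploy to Cloud Run
author: Maggie Hu
---
# Deploying to Cloud Run
Learn how to deploy your first app.
"""

metadata, body = split_front_matter(page)
print(metadata)
print(body)
```

The metadata dictionary can feed filtering and linking in a retrieval index, while the body stays plain Markdown for human readers and for chunking.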
