Learning how to learn

Tag: content creation

Turn GitHub Copilot Into Your Documentation Co-Writer
For documentation writers managing large sets of content—enterprise knowledge bases, multi-product help portals, or internal wikis—the challenge goes beyond polishing individual sentences. You need to:
- Keep a consistent voice and style across hundreds of articles.
- Spot duplicate or overlapping topics
- Maintain accurate metadata and links
- Gain insights into content gaps and structure
This is where GitHub Copilot inside Visual Studio Code stands out. Unlike generic Gen-AI chatbots, Copilot has visibility across your entire content set, not just the file you’re editing. With carefully crafted prompts and instructions, that means you can ask it to:
- Highlight potential gaps, redundancies, or structural issues.
- Suggest rewrites that preserve consistency across articles.
- Surface related content to link or cross-reference.
In other words, Copilot isn’t just a text improver—it’s a content intelligence partner for documentation at scale. And if you’re already working in VS Code, it integrates directly into your workflow without requiring a new toolset.

What Can GitHub Copilot Do for Your Documentation

Once installed, GitHub Copilot can work directly on your .md, .html, .xml, or .yml files. Here’s how it helps across both single documents and large collections:

Refine Specific Text Blocks

Highlight a section and ask Copilot to improve the writing. This makes it easy to sharpen clarity and tone in targeted areas.

Suggest Edits Across the Entire Article

Use Copilot Chat to get suggestions for consistency and flow across an entire piece.

Fill in Metadata and Unfinished Sections

Copilot can auto-complete metadata fields or unfinished drafts, reducing the chance of missing key details.

Surface Relevant Links

While you’re writing, Copilot may suggest links to related articles in your repository—helping you connect content for the reader.

Spot Duplicates and Gaps (emerging use)

With tailored prompts, you can ask Copilot to scan for overlap between articles or flag areas where documentation is thin. This gives you content architecture insights, not just sentence-level edits.

What do you need to set up GitHub Copilot?

To set up GitHub Copilot, you will need:
- A GitHub account
- Installing Visual Studio Code (free)
- Installing the GitHub Copilot plugin.
Note: While GitHub Copilot offers a free tier, paid plans provide additional features and higher usage limits.

Why Copilot Is Different from Copilot in Word or other Gen-AI Chatbots

At first glance, you might think these features look similar to what Copilot in Word or other generative AI chatbots can do. But GitHub Copilot offers unique advantages for documentation work:
- Cross-Document Awareness
  Because it’s embedded in VS Code, Copilot has visibility into your entire local repo. For example, if you’re writing about pay-as-you-go billing in one article, it can pull phrasing or context from another relevant file almost instantly.
- Enterprise Content Intelligence
  With prompts, you can ask Copilot to analyze your portfolio: identify duplicate topics, find potential links, and even suggest improvements to your information architecture. This is especially valuable for knowledge bases and enterprise-scale content libraries.
- Code-Style Edit Reviews
  Visual Studio Code + GitHub Copilot has the ability to show suggested edits as code updates. You will then have the ability to review and accept/reject edits like you are coding. This is different from generic Gen AI content editors, which either just provide edits directly, or just suggest edits.
- Customizable Rules and Prompts
  You can set up an instruction.md file that defines rules for tone, heading style, or terminology. You can also create reusable prompt files and call them with / during chats. This ensures your writing is not just polished, but also consistent with your team’s standards.
Together, these capabilities transform GitHub Copilot from a document-level writing assistant into a documentation co-architect.

Limitations

Like any AI tool, GitHub Copilot isn’t perfect. Keep these in mind:

Always review suggestions
Like any other Gen AI tools, GitHub Copilot can hallucinate. Always review its suggestions and validate its edits.

Wrap-Up: Copilot as Your Content Partner

GitHub Copilot inside Visual Studio Code isn’t just another AI writing assistant—it’s a tool that scales with your entire content ecosystem.
- It refines text, polishes full articles, completes metadata, and suggests links.
- It leverages cross-document awareness to reveal gaps, duplicates, and structural improvements.
- It enforces custom rules and standards, ensuring consistency across hundreds of files.
And here’s where the real advantage comes in: with careful crafting of prompts and instruction files, Copilot becomes more than a reactive assistant. You can guide it to apply your team’s style, enforce terminology, highlight structural issues, and even surface information architecture insights. In other words, the quality of what Copilot gives you is shaped by the quality of what you feed it.

For content creators managing large sets of documentation, Copilot is more than a co-writer—it’s a content intelligence partner and co-architect. With thoughtful setup and prompt design, it helps you maintain quality, speed, and consistency—even at enterprise scale.

👉 Try it in your next documentation sprint and see how it transforms the way you manage your whole body of content.
September 11, 2025
Enhance Content Discoverability with Effective Metadata
In my previous posts on AI content optimization, I focused on two critical elements: how content chunking minimizes hallucination and how vector embeddings enable semantic matching. I covered recommendations like using structured formats, question-and-answer sets, and conversational language to improve both content quality and discoverability.

But I didn’t dive deep into the third pillar that makes RAG truly powerful: metadata. While chunking handles the “what” of your content and embeddings capture the “meaning,” metadata provides the essential “context” that transforms good retrieval into precise, relevant results.

The Three Pillars of RAG-Optimized Content

Chunking (The “What”): Breaks content into digestible, topic-focused pieces
- Structured formats with clear headings
- Single-topic chunks
- Consistent templates
Vector Embeddings (The “Meaning”): Captures semantic understanding
- Question-format headings
- Conversational language
- Semantic similarity matching
Metadata (The “Context”): Provides situational relevance
- Article type and intended audience
- Skill level and role requirements
- Date, version, and related topics
To understand why richer metadata can provide better context, we need to understand how vector embeddings are stored in vector database. After all, when RAG compare and retrieve chunks, it searches inside of the vector database to find semantic match.

So, what a vector record (the data entry) in a vector database looks like?

What’s Inside a Vector Record?

A vector record has three parts:

1. Unique ID A label that helps you quickly find the original content, which is stored separately.

2. The Vector A list of numbers that represents your content’s meaning (like a mathematical “fingerprint”). For example, text might become a list of 768 numbers.

Key rule: All vectors in the same collection must have the same length – you can’t mix different sizes.

3. Metadata Extra tags that add context, including:
- Platform info: Creation date, source, importance level
- Content details: Title, author, publication info
- Chunk info: Details about this specific piece of content
This structure lets you search by both meaning (vectors) and context (metadata) for more precise results. (Vector databases and metadata filtering)

How RAG Search Combines Vector Matching with Metadata Filtering

RAG (Retrieval-Augmented Generation) search combines vector similarity with metadata filtering to make your results both relevant and contextually appropriate. The RAG framework was first introduced by researchers at Meta in 2020 (see the original paper)::

Vector Similarity Matching When you ask a question, the system converts your question into a vector embedding (that same list of numbers we discussed). Then it searches the database for content vectors that are mathematically similar to your question vector. Think of it like finding documents that “mean” similar things to what you’re asking about.

Metadata Context Enhancement The system enhances similarity matching by also considering metadata context. These metadata filters can be set by users (when they specify requirements) or automatically by the system (based on context clues in the query). The system considers:
- Time relevance: “Only show me recent information from 2023 or later”
- Source credibility: “Only include content from verified authors or trusted platforms”
- Content type: “Focus on technical documentation, not blog posts”
- Geographic relevance: “Prioritize information relevant to my location”
This combined approach is also more efficient – metadata filtering can quickly eliminate irrelevant content before expensive similarity calculations.

The Combined Power Instead of getting thousands of somewhat-related results, you get a curated set of content that is both:
1. Semantically similar (the vector embeddings match your question’s meaning)
2. Contextually appropriate (the metadata ensures it meets your specific requirements)
For example, when you ask “How do I optimize database performance?” the system finds semantically similar content, then prioritizes results that match your context – returning recent technical articles by database experts while filtering out outdated blog posts or marketing content. You get the authoritative, current information you need.

What This Means for Content Creators

Understanding how metadata works in RAG systems reveals a crucial opportunity for content creators. Among the three types of metadata stored in vector databases, only one is truly under your control:

Automatically Generated Metadata:
- Chunk metadata: Created during content processing (chunk size, position, relationships)
- Platform metadata: Added by publishing systems (creation date, source URL, file type)
Creator-Controlled Metadata:
- Universal metadata: The contextual information you can strategically add to improve intent alignment
This is where you can make the biggest impact. By enriching your content with universal metadata, you help RAG systems understand not just what your content says, but who it’s for and how it should be used:
- Intended audience: “developers,” “business stakeholders,” “end users”
- Role requirements: “database administrator,” “product manager,” “customer support”
- Skill level: “beginner,” “intermediate,” “expert”
- Product version: “v2.1,” “legacy,” “beta”
- Customer intent: “troubleshooting,” “implementation,” “evaluation”
When you provide this contextual metadata, you’re essentially helping RAG systems deliver your content to the right person, at the right time, for the right purpose. The technical foundation we’ve explored – vector similarity plus metadata filtering – becomes much more powerful when content creators take advantage of universal metadata to improve intent alignment.

Your content doesn’t just need to be semantically relevant; it needs to be contextually perfect. Universal metadata is how you achieve that precision.
September 8, 2025
Reducing Hallucinations in Generative AI—What Content Creators Can Actually Do
If you’ve used ChatGPT or Copilot and received an answer that sounded confident but was completely wrong, you’ve experienced a hallucination. These misleading outputs are a known challenge in generative AI—and while some causes are technical, others are surprisingly content-driven.

As a content creator, you might think hallucinations are out of your hands. But here’s the truth: you have more influence than you realize.

Let’s break it down.

The Three Types of Hallucinations (And Where You Fit In)

Generative AI hallucinations typically fall into three practical categories. (Note: Academic research classifies these as “intrinsic” hallucinations that contradict the source/prompt, or “extrinsic” hallucinations that add unverifiable information. Our framework translates these concepts into actionable categories for content creators.)
1. Nonsensical Output
  The AI produces content that’s vague, incoherent, or just doesn’t make sense.
  Cause: Poorly written or ambiguous prompts.
  Your Role: Help users write better prompts by providing examples, templates, or guidance.
2. Factual Contradiction
  The AI gives answers that are clear and confident—but wrong, outdated, or misleading.
  Cause: The AI can’t find accurate or relevant information to base its response on.
  Your Role: Create high-quality, domain-specific content that’s easy for AI to find and understand.
3. Prompt Contradiction
  The AI’s response contradicts the user’s prompt, often due to internal safety filters or misalignment.
  Cause: Model-level restrictions or misinterpretation.
  Your Role: Limited—this is mostly a model design issue.
Where Does AI Get Its Information?

Where Does AI Get Its Information?

Modern AI systems increasingly use RAG (Retrieval-Augmented Generation) to ground their responses in real data. Instead of relying solely on training data, they actively search for and retrieve relevant content before generating answers. Learn more about how AI discovers and synthesizes content.

Depending on the system, AI pulls data from:
- Internal Knowledge Bases (e.g., enterprise documentation)
- The Public Web (e.g., websites, blogs, forums)
- Hybrid Systems (a mix of both)
If your content is published online, it becomes part of the “source of truth” that AI systems rely on. That means your work directly affects whether AI gives accurate answers—or hallucinates.

The Discovery–Accuracy Loop

Here’s how it works:
- If AI can’t find relevant content → it guesses based on general training data.
- If AI finds partial content → it fills in the gaps with assumptions.
- If AI finds complete and relevant content → it delivers accurate answers.
So what does this mean for you?

Your Real Impact as a Content Creator

You can’t control how AI is trained, but you can control two critical things:
1. The quality of content available for retrieval
2. The likelihood that your content gets discovered and indexed
And here’s the key insight:

This is where content creators have the greatest impact—by ensuring that content is not only high-quality and domain-specific, but also structured into discoverable chunks that AI systems can retrieve and interpret accurately.

Think of it like this: if your content is buried in long paragraphs, lacks clear headings, or isn’t tagged properly, AI might miss it—or misinterpret it. But if it’s chunked into clear, well-labeled sections, it’s far more likely to be picked up and used correctly. This shift from keywords to chunks is fundamental to how AI indexing differs from traditional search.

Actionable Tips for AI-Optimized Content

Structure for Chunking
- Use clear, descriptive headings that summarize the content below them
- Write headings as questions when possible (“How does X work?” instead of “X Overview”)
- Keep paragraphs focused on single concepts (3–5 sentences max)
- Create semantic sections that can stand alone as complete thoughts
- Include Q&A pairs for common queries—this mirrors how users interact with AI
- Use bullet points and numbered lists to break down complex information
Improve Discoverability
- Front-load key information in each section—AI often prioritizes early content
- Define technical terms clearly within your content, not just in glossaries
- Include contextual metadata through schema markup and structured data
- Write descriptive alt text for images and diagrams
Enhance Accuracy
- Date your content clearly, especially for time-sensitive information
- Link related concepts within your content to provide context
- Be explicit about scope —what your content covers and what it doesn’t
Understand Intent Alignment

AI systems are evolving to focus more on intent than just keyword matching. That means your content should address the “why” behind user queries—not just the “what.”

Think about the deeper purpose behind a search. Are users trying to solve a problem? Make a decision? Learn a concept? Your content should reflect that.

The Bottom Line

As AI continues to evolve from retrieval to generative systems, your role as a content creator becomes more critical—not less. By structuring your content for AI discoverability and comprehension, you’re not just improving search rankings; you’re actively reducing the likelihood that AI will hallucinate when answering questions in your domain.

So the next time you create or update content, ask yourself:

“Can an AI system easily find, understand, and accurately use this information?”

If the answer is yes, you’re part of the solution.
September 2, 2025
From retrieval to generative: How search evolution changes the future of content

Back in 2018, I wrapped up a grueling 10-course Data Science specialization with a capstone project: an app that predicted the next word based on user input. Mining text, calculating probabilities, generating predictions—the whole works. Sound familiar?

Fast forward to today, and I’m at Microsoft exploring how this same technology is reshaping content creation from an instructional design perspective—how do we create content that works for both human learning and AI systems?

Since ChatGPT exploded in November 2022, everyone’s talking about “AI-ready content.” But here’s what I kept wondering: why do we need chunk-sized content? Why do Metadata and Q&A formats suddenly matter more?

The Fundamental Shift: From Finding to Generating

Goals of traditional search vs. goals of Generative AI search

When you search in pre-AI time, the system is trying to find answers to your questions. It crawls the web, indexes content by keywords, and returns a list of links ranked by relevance. Your experience as a user? “I need to click through several links to piece together what I'm looking for.“

Generative AI search changes the search experience entirely. Instead of just finding existing content, it aims to generate new content tailored to your specific prompt. The result isn’t a list of links – it’s a synthesized, actionable response that directly answers your question. The user experience becomes: “I get actionable solutions, instantly.“

This isn’t a minor improvement – it’s a different paradigm.

_{Note: I’m simplifying the distinction for clarity, but the divide between “traditional search” and “generative AI search” isn’t as clear-cut as I’m describing. Even before November 2022, search engines were incorporating AI techniques like Google’s RankBrain (2015) and BERT (2019). What’s different now is the shift toward generating new content rather than just finding and ranking existing content.}

How the search processes actually work

As I’ve been studying this, I realized I didn’t fully understand how different the underlying processes really are. Let me break down what I’ve learned:

How does traditional search work?

Looking under the hood, traditional search follows a pretty straightforward path: bots crawl the internet, break content down into individual words and terms that get stored with document IDs, then match your search keywords against those indexed terms. Finally, relevance algorithms rank everything and serve up that familiar list of blue links.

Traditional web search process

How does generative AI search work?

This is where it gets fascinating (and more complex). While AI systems start the same way by scanning content across the internet, everything changes at the indexing stage.

Instead of cataloging keywords, AI breaks content into meaningful chunks and creates “vector embeddings,” which are essentially mathematical representations of meaning. The system then builds real-time connections and relationships between concepts, creating a web of understanding rather than just a keyword database.

When you ask a question, the AI finds relevant chunks based on meaning, not just keyword matches. Finally, instead of handing you links to sort through, AI synthesizes information from multiple sources to create a new, personalized response tailored to your specific question.

Generative AI index process

The big realization for me was that while traditional search treats your query as a collection of words to match, AI is trying to understand what you actually want to do.

What does this difference look like in practice?

Let’s see how this works with a simplified example:

Say we have three documents about computer performance:

Document 1: “Upgrading your computer's hard drive to a solid-state drive (SSD) can dramatically improve performance. SSDs provide faster boot times and quicker file access compared to traditional drives.“

Document 2: “Slow computer performance is often caused by too many programs running simultaneously. Close unnecessary background programs and disable startup applications to fix speed issues.“

Document 3: “Regular computer maintenance prevents performance problems. Clean temporary files, update software, and run system diagnostics to keep your computer running efficiently.“

Now someone searches: “How to make my computer faster?“

Traditional search breaks the question down into keywords like “make,” “computer,” and “faster,” then returns a ranked list of documents that contain those terms. You’d get some links to click through, and you’d have to piece together the answer yourself.

But generative AI understands you want actionable instructions and synthesizes information from all three sources into a comprehensive response: “Here are three approaches you can try: First, close unnecessary programs running in the background... Second, consider upgrading to an SSD for dramatic performance improvements... Third, maintain your system regularly by cleaning temporary files and updating software...“

How have the goals of content creation evolved?

This shift has forced me to rethink what “good content” even means. As a content creator and a learning developer, I used to focus primarily on content quality (accurate, clear, complete, fresh, accessible) and discoverability (keywords, clear headings, good formatting, internal links).

Now that generative AI is here, these fundamentals still matter, but there’s a third crucial goal: reducing AI hallucinations. When AI systems generate responses, they sometimes create information that sounds plausible but is actually incorrect or misleading. The structure and clarity of our source content plays a big role in whether AI produces accurate or fabricated information.

Goals of content creation for traditional search vs. for Generative AI search

Why this shift matters for content creators?

What surprised me most in my research was discovering that AI systems understand natural language better because large language models were trained on massive amounts of conversational data. This realization has already started changing how I create content—I’m experimenting with question-based headings and making sure each section focuses on one distinct topic.

But I’m still figuring out the bigger question: how do we measure whether these strategies work? How can we tell if our conversational language and Q&A formats truly help AI systems match user intent and generate better responses?

In my next post, I want to show you what I discovered when I dug into the technical details. The biggest eye-opener for me was realizing that when traditional searches remove “filler” words like “how to” from a user’s query, it’s stripping away crucial intent—the user wants actionable instructions, not just information.

The field is moving incredibly fast, and best practices are still being figured out by all of us. I’m sharing what I’ve learned so far, knowing that some of it might evolve as technology does.

August 12, 2025