
How LLM citations work: what ChatGPT, Claude, and Perplexity do (and don't) cite

A per-engine breakdown of citation mechanics. Which sources are used, what makes content citable, and concrete examples of cited vs ignored sites.

Written by
Richard van Leeuwen

Founder of Priso. 30+ years of web dev and e-commerce, full-time AI tools since 2022.

6 min read

If you understand how ChatGPT, Claude, and Perplexity work internally, you'll see why one site gets cited and another gets ignored. The differences are bigger than you'd think. What Perplexity loves, ChatGPT often skips. What Claude grabs, Gemini doesn't see.

Here's what we see per engine in our audits and in public research.

The basic mechanism

Every AI search engine works in three steps:

  1. Retrieval — a query hits an index, which returns 10-50 documents
  2. Re-ranking — the top N documents (often 3-10) are selected for relevance
  3. Synthesis — the model reads those N documents and writes one answer, with citation links

The model only cites what survived step 2. So your fight is: get into that top N.

Which index, which re-ranker, which model — that's where the differences live.
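The three steps above can be sketched in a few lines of Python. Everything here is illustrative: the function names, the toy word-overlap relevance score, and the document format are made up for the example, while real engines use learned rankers and proprietary APIs.

```python
def word_overlap(query, text):
    """Toy relevance score: count of shared words (real engines use learned rankers)."""
    return len(set(query.lower().split()) & set(text.lower().split()))

def retrieve(query, index, k=50):
    """Step 1: pull candidate documents from the search index."""
    hits = [doc for doc in index if word_overlap(query, doc["text"]) > 0]
    return hits[:k]

def rerank(query, candidates, n=5):
    """Step 2: keep only the top N most relevant candidates."""
    ranked = sorted(candidates, key=lambda d: word_overlap(query, d["text"]), reverse=True)
    return ranked[:n]

def synthesize(query, top_n):
    """Step 3: write one answer, citing only the documents that survived step 2."""
    return {
        "answer": f"Answer synthesized from {len(top_n)} sources.",
        "citations": [doc["url"] for doc in top_n],
    }

index = [
    {"url": "https://a.example/citations", "text": "how llm citations work"},
    {"url": "https://b.example/pasta", "text": "best pasta recipes"},
    {"url": "https://c.example/llm", "text": "llm citations explained"},
]
result = synthesize("llm citations", rerank("llm citations", retrieve("llm citations", index), n=2))
print(result["citations"])  # only the two surviving documents can be cited
```

The sketch makes the point concrete: `synthesize` can only cite what `rerank` lets through, so a page that never reaches the top N never appears in an answer.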

ChatGPT (OpenAI)

Index: Bing search index, plus its own real-time fetches via ChatGPT-User

User-agents:

  • GPTBot — training (not for citations)
  • OAI-SearchBot — builds the search index
  • ChatGPT-User — fetches in real time during a chat
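A robots.txt that keeps the training crawler out while staying visible in ChatGPT search could look like the sketch below. It is based on the user agents listed above; verify the current bot names against OpenAI's documentation before relying on it:

```
# Opt out of training, opt in to search indexing and live fetches
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /
```

Blocking OAI-SearchBot or ChatGPT-User, by contrast, removes you from citations entirely.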

What ChatGPT cites readily:

  • Wikipedia, Reddit, YouTube (in order of frequency per 2026 data)
  • Sites with strong Bing rankings for the query
  • Sites with FAQPage schema and clear Q&A structure
  • Recent content (last 6-12 months for news/tech queries)
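For the FAQPage signal, a minimal JSON-LD block might look like this (the question and answer text are placeholders):

```json
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "How do LLM citations work?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "AI engines retrieve candidate pages, re-rank the top results, and cite only the documents that survive re-ranking."
      }
    }
  ]
}
```

Embed it in a script tag with type "application/ld+json" and keep the visible on-page Q&A text identical to the schema text.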

What ChatGPT skips:

  • Sites that block OAI-SearchBot or ChatGPT-User
  • Pages without a clear H1 or definition sentence
  • JavaScript-rendered content without server-side rendering
  • Very long articles (>3000 words) — usually only the opening is taken

Notable detail: ChatGPT's Reddit citation share fell from ~60% to ~10% over six weeks in late 2025, after a single Bing parameter change. PR Newswire, Forbes, and Medium absorbed the displaced share. Citation patterns are volatile on a scale of weeks to months, not years.

Claude (Anthropic)

Index: own search layer (since 2025), with Brave as a fallback in some contexts

User-agents:

  • ClaudeBot — training
  • Claude-SearchBot — search index for Claude.ai
  • Claude-User — real-time fetch during a user query

What Claude cites readily:

  • Deep, technically grounded content
  • Documentation and how-tos (developer content scores high)
  • Academic sources and whitepapers
  • Sites with clear author bylines and bios
  • Long-form content with strong structure (Claude reads whole documents better than ChatGPT)

What Claude skips:

  • Marketing fluff without concrete claims
  • Sites without named authors
  • Pages with thin content or low information density
  • Sites whose robots.txt blocks Claude's bots

Notable detail: Claude is heavily used by developers and B2B users. For SaaS and developer tooling, a Claude citation can be more valuable than a ChatGPT citation in terms of intent.

Perplexity

Index: own index (primary) + Bing and Google APIs as supplements

User-agents:

  • PerplexityBot — primary crawler
  • Perplexity-User — real-time fetch

What Perplexity cites readily:

  • Recent content (Perplexity weighs freshness more than other engines)
  • Sites with strong topical authority
  • News and research sources
  • Structured data (tables, lists, comparisons)
  • Niche sources that directly answer a specific question

What Perplexity skips:

  • Outdated content without a clear date
  • Vague marketing content
  • Pure SEO-spam pages

Compliance caveat: Perplexity has been caught more than once ignoring robots.txt via undeclared stealth crawlers (Wired, August 2024 investigation). In 2026 Perplexity is more transparent, but if you want only the declared crawler in, enforce it with a Cloudflare WAF rule that validates the user agent, not just robots.txt.
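A sketch of such a WAF rule, written in Cloudflare's rules expression language and assuming its `cf.client.bot` field (true for verified good bots); check Cloudflare's current field reference before deploying:

```
(http.user_agent contains "PerplexityBot" and not cf.client.bot)
```

Paired with a Block action, this rejects requests that spoof the crawler's user agent while letting the verified bot through.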

Google AI Overviews

Index: Google's search index (same as organic)

What AI Overviews cites readily:

  • Reddit (#1 with 2.2% of all citations)
  • YouTube (#2)
  • Quora, LinkedIn, Wikipedia
  • Sites that rank in the top 10 for the related query
  • Content with strong E-E-A-T signals (experience, expertise, authoritativeness, trust)

Notable detail: 88% of AI Overviews cite 3+ sources, only 1% cite a single source. The top 20 domains capture 66% of all citations. Heavy concentration.

Gemini

Index: Google's search index

Heavy overlap with Google AI Overviews on sourcing. Gemini additionally pulls YouTube transcripts and Google Books where available.

Bing Copilot

Index: Bing

Copilot cites similarly to ChatGPT (logical — same index), but weights official sources (.gov, .edu, official brand domains) more heavily.

What actually makes content citable — patterns that work everywhere

After 100+ audits, four patterns hold across all engines:

1. Definition-first structure

Open every page with a direct definition or answer. Not "in this article..." but "X is Y that does Z, because...". Models surprisingly often lift this opening sentence verbatim.
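As a sketch, a definition-first opening in HTML, with placeholder wording that follows the "X is Y that does Z" pattern:

```html
<h1>How do LLM citations work?</h1>
<p>An LLM citation is a source link an AI engine attaches to its answer,
   drawn only from the documents that survive re-ranking.</p>
```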

2. Concrete claim with source

Vague claims get paraphrased or thrown out. Concrete claims with a source get cited. Compare:

❌ "Many people use AI search engines"
✅ "ChatGPT processes 250-500 million weekly queries (Similarweb, Q1 2026)"

3. Named author with external presence

Anonymous content or a generic "team Priso" byline doesn't work. An author with a name, title, and a LinkedIn or Wikipedia link gets weighted more heavily. This signal comes straight from Google's E-E-A-T guidelines and has been picked up by most AI systems.

4. Consistency across the web

Models build entity graphs. If your brand is "Priso" on your site, "Priso BV" on LinkedIn, and "Priso AI" in a news article, that confuses the model. Keep your name, description, and URL consistent across structured data, social profiles, and PR.
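One way to pin the entity down is Organization schema with sameAs links that tie every profile back to one canonical name (all values below are placeholders):

```json
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Priso",
  "url": "https://www.example.com",
  "sameAs": [
    "https://www.linkedin.com/company/example",
    "https://x.com/example"
  ]
}
```

Use the same name and url values here, in your social bios, and in press material.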

Anti-cloaking: serve the same content to bots and humans

A mistake we see too often: sites showing "AI-friendly" content to bots and commercial versions to users. Or vice versa. Google, Anthropic, and OpenAI all penalize cloaking. Keep it simple: same HTML for every visitor, server-side rendered where possible, no hidden text.

Examples from practice

Cited in our test queries (Priso audit data):

  • A Dutch SaaS startup with question-format blog posts, FAQ schema, and a founder byline with LinkedIn link → consistently cited in ChatGPT and Claude for "X tool Netherlands" queries
  • An e-commerce site with detailed product specs and structured comparison tables → consistently cited in Perplexity for "best X" searches

Not cited:

  • A large Dutch retailer with strong organic SEO but no author info, with a fully client-side rendered homepage → ignored by all AI engines
  • A blog-heavy SaaS site that accidentally blocks GPTBot and ClaudeBot via Cloudflare → invisible to ChatGPT and Claude, while Perplexity still cites

The second case is fixable in 5 minutes. The first is a multi-week project.

What to do today

  1. Test your site on 5 main queries in ChatGPT, Claude, and Perplexity. Are you cited?
  2. Check your robots.txt: open your site to OAI-SearchBot, Claude-SearchBot, PerplexityBot
  3. Apply definition-first structure to your top 5 pages
  4. Add author bios with LinkedIn link
  5. Run an audit on Priso — we test all of the above signals automatically
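Step 2 of the checklist can be automated with Python's standard library. The robots.txt below is inlined for the example; in practice, fetch your own site's file:

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt: blocks the training bot, allows everything else
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

AI_SEARCH_BOTS = ["OAI-SearchBot", "Claude-SearchBot", "PerplexityBot"]

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The three search crawlers fall under the * group and are allowed;
# GPTBot matches its own group and is blocked.
for bot in AI_SEARCH_BOTS + ["GPTBot"]:
    print(bot, "allowed:", parser.can_fetch(bot, "https://example.com/blog/"))
```

Run this against each robots.txt you control; any search crawler that comes back blocked is a five-minute fix.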

Check your citation readiness in 60 seconds

FAQ

How often does ChatGPT update its index? Continuously, but visible effects of site changes typically appear within 1-3 weeks for popular pages. Less-trafficked pages can take 4-8 weeks.

Does a citation always lead to a click? No. Many citations don't generate a click (zero-click answers). But the brand mention itself has value — comparable to a mention in a traditional outlet.

Does it work for non-English content? Yes. All major engines handle Dutch, German, French, and Spanish fluently. ChatGPT and Claude score well on nuance; Perplexity is more literal but fine.

Should I optimize per AI engine separately? 80% of signals work everywhere. The remaining 20% is engine-specific (e.g. freshness weighs more heavily on Perplexity, structured data on Google AI Overviews).


Written by Richard van Leeuwen, founder of Priso. Working with AI tooling since 2022, shipping AI agents in production since 2024.