How LLM citations work: what ChatGPT, Claude, and Perplexity do (and don't) cite
A per-engine breakdown of citation mechanics. Which sources are used, what makes content citable, and concrete examples of cited vs ignored sites.
If you understand how ChatGPT, Claude, and Perplexity work internally, you'll see why one site gets cited and another gets ignored. The differences are bigger than you'd think. What Perplexity loves, ChatGPT often skips. What Claude grabs, Gemini doesn't see.
Here's what we see per engine in our audits and in public research.
The basic mechanism
Every AI search engine works in three steps:
- Retrieval — a query hits an index, which returns 10-50 documents
- Re-ranking — the top N documents (often 3-10) are selected for relevance
- Synthesis — the model reads those N documents and writes one answer, with citation links
The model only cites what survived step 2. So your fight is: get into that top N.
Which index, which re-ranker, which model — that's where the differences live.
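The three steps above can be sketched as a toy pipeline. This is illustrative only — real engines use dense retrieval and learned re-rankers, and the keyword-overlap scoring below is a hypothetical placeholder — but it shows the key point: synthesis only ever sees what survives re-ranking.

```python
# Toy sketch of the retrieve -> re-rank -> synthesize pipeline.
# Scoring is naive keyword overlap; real engines use learned models.

def retrieve(index, query, k=50):
    """Step 1: pull up to k candidate documents from the index."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(doc.lower().split())), doc) for doc in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]

def rerank(candidates, n=3):
    """Step 2: keep only the top n -- everything else is invisible to the model."""
    return candidates[:n]

def synthesize(docs):
    """Step 3: the model writes one answer, citing only the surviving docs."""
    citations = [f"[{i + 1}] {doc}" for i, doc in enumerate(docs)]
    return "Answer based on:\n" + "\n".join(citations)

index = [
    "ChatGPT uses the Bing search index",
    "Claude has its own search layer",
    "Perplexity weighs freshness heavily",
    "Unrelated page about cooking pasta",
]
top_n = rerank(retrieve(index, "which search index does ChatGPT use"), n=2)
print(synthesize(top_n))
```

Note that the pasta page never reaches synthesis: a page that doesn't make the top N simply cannot be cited, no matter how good it is.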
ChatGPT (OpenAI)
Index: Bing search index, plus its own real-time fetches via ChatGPT-User
User-agents:
- `GPTBot` — training (not used for citations)
- `OAI-SearchBot` — builds the search index
- `ChatGPT-User` — fetches in real time during a chat
What ChatGPT cites readily:
- Wikipedia, Reddit, YouTube (in order of frequency per 2026 data)
- Sites with strong Bing rankings for the query
- Sites with FAQPage schema and clear Q&A structure
- Recent content (last 6-12 months for news/tech queries)
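The FAQPage schema mentioned above is plain JSON-LD in the page head. A minimal sketch — the question and answer text are placeholders, adapt them to your own content:

```json
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [{
    "@type": "Question",
    "name": "How does ChatGPT pick its citations?",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "ChatGPT cites documents that survive retrieval and re-ranking in its Bing-backed search index."
    }
  }]
}
```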
What ChatGPT skips:
- Sites that block `OAI-SearchBot` or `ChatGPT-User`
- Pages without a clear H1 or definition sentence
- JavaScript-rendered content without server-side rendering
- Very long articles (>3000 words) — usually only the opening is taken
Notable detail: ChatGPT's Reddit citation share fell from ~60% to ~10% in six weeks in late 2025, after a single Bing parameter change. PR Newswire, Forbes, and Medium absorbed the displaced share. Citation patterns are volatile — expect shifts over weeks to months, not years.
Claude (Anthropic)
Index: own search layer (since 2025), with Brave as a fallback in some contexts
User-agents:
- `ClaudeBot` — training
- `Claude-SearchBot` — search index for Claude.ai
- `Claude-User` — real-time fetch during a user query
What Claude cites readily:
- Deep, technically grounded content
- Documentation and how-tos (developer content scores high)
- Academic sources and whitepapers
- Sites with clear author bylines and bios
- Long-form content with strong structure (Claude reads whole documents better than ChatGPT)
What Claude skips:
- Marketing fluff without concrete claims
- Sites without named authors
- Pages with thin content or low information density
- Sites whose robots.txt blocks the Claude bots
Notable detail: Claude is heavily used by developers and B2B users. For SaaS and developer tooling, a Claude citation can be more valuable than a ChatGPT citation in terms of intent.
Perplexity
Index: own index (primary) + Bing and Google APIs as supplements
User-agents:
- `PerplexityBot` — primary crawler
- `Perplexity-User` — real-time fetch
What Perplexity cites readily:
- Recent content (Perplexity weighs freshness more than other engines)
- Sites with strong topical authority
- News and research sources
- Structured data (tables, lists, comparisons)
- Niche sources that directly answer a specific question
What Perplexity skips:
- Outdated content without a clear date
- Vague marketing content
- Pure SEO-spam pages
Compliance caveat: Perplexity has been caught multiple times ignoring robots.txt via stealth crawlers (Wired, August 2024 investigation). In 2026 Perplexity is more transparent, but if you want to be sure only the declared crawlers get in, enforce it with a Cloudflare WAF rule that validates user agents — don't rely on robots.txt alone.
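Robots.txt is a request, not a barrier, which is why the advice above is to validate crawlers at the edge. A minimal sketch of the idea — in Python rather than Cloudflare's own rule language — is to accept a request claiming a bot user agent only if its IP falls inside a published crawler range. The range below is a documentation placeholder, not Perplexity's real one; check the vendor's published IP list before using this pattern:

```python
import ipaddress

# Placeholder range (TEST-NET) -- substitute the IP ranges the vendor publishes.
TRUSTED_RANGES = {
    "perplexitybot": [ipaddress.ip_network("192.0.2.0/24")],
}

def is_genuine_crawler(user_agent: str, remote_ip: str) -> bool:
    """True only if a claimed bot UA comes from a trusted IP range."""
    ip = ipaddress.ip_address(remote_ip)
    for bot, ranges in TRUSTED_RANGES.items():
        if bot in user_agent.lower():
            return any(ip in net for net in ranges)
    return True  # not claiming to be a known bot: let normal rules apply

# A spoofed PerplexityBot from an untrusted IP is rejected:
print(is_genuine_crawler("Mozilla/5.0 PerplexityBot/1.0", "203.0.113.5"))  # False
print(is_genuine_crawler("Mozilla/5.0 PerplexityBot/1.0", "192.0.2.10"))   # True
```

The same check expressed as a Cloudflare WAF rule would match on the user-agent string and the verified-bot or IP-range condition.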
Google AI Overviews
Index: Google's search index (same as organic)
What AI Overviews cites readily:
- Reddit (#1 with 2.2% of all citations)
- YouTube (#2)
- Quora, LinkedIn, Wikipedia
- Sites that rank in the top 10 for the related query
- Content with strong E-E-A-T signals (experience, expertise, authoritativeness, trust)
Notable detail: 88% of AI Overviews cite 3+ sources, only 1% cite a single source. The top 20 domains capture 66% of all citations. Heavy concentration.
Gemini
Index: Google's search index
Heavy overlap with Google AI Overviews on sourcing. Gemini additionally pulls YouTube transcripts and Google Books where available.
Bing Copilot
Index: Bing
Copilot cites similarly to ChatGPT (logical — same index), but weighs official-domain sources (gov, edu, owned domains) heavier.
What actually makes content citable — patterns that work everywhere
After 100+ audits, four patterns hold across all engines:
1. Definition-first structure
Open every page with a direct definition or answer. Not "in this article..." but "X is Y that does Z, because...". Models surprisingly often lift this opening sentence verbatim.
2. Concrete claim with source
Vague claims get paraphrased or thrown out. Concrete claims with a source get cited. Compare:
❌ "Many people use AI search engines"
✅ "ChatGPT processes 250-500 million weekly queries (Similarweb, Q1 2026)"
3. Named author with external presence
Anonymous content or "team Priso" doesn't work. An author with a name, title, and LinkedIn or Wikipedia link gets weighted heavier. This signal comes straight from Google's E-E-A-T guidelines, and most AI systems appear to weigh it similarly.
4. Consistency across the web
Models build entity graphs. If your brand is "Priso" on your site, "Priso BV" on LinkedIn, and "Priso AI" in a news article, that confuses the model. Keep your name, description, and URL consistent across structured data, social profiles, and PR.
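One way to anchor that consistency is Organization markup with `sameAs` links, so every profile resolves to the same entity. A sketch — the URLs below are placeholders, substitute your own canonical domain and profiles:

```json
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Priso",
  "url": "https://www.priso.example",
  "sameAs": [
    "https://www.linkedin.com/company/priso-example",
    "https://x.com/priso_example"
  ]
}
```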
Anti-cloaking: serve the same content to bots and humans
A mistake we see too often: sites showing "AI-friendly" content to bots and commercial versions to users. Or vice versa. Google, Anthropic, and OpenAI all penalize cloaking. Keep it simple: same HTML for every visitor, server-side rendered where possible, no hidden text.
Examples from practice
Cited in our test queries (Priso audit data):
- A Dutch SaaS startup with question-format blog posts, FAQ schema, and a founder byline with LinkedIn link → consistently cited in ChatGPT and Claude for "X tool Netherlands" queries
- An e-commerce site with detailed product specs and structured comparison tables → showing up in Perplexity for "best X" searches
Not cited:
- A large Dutch retailer with strong organic SEO but no author info, with a fully client-side rendered homepage → ignored by all AI engines
- A blog-heavy SaaS site that accidentally blocks GPTBot and ClaudeBot via Cloudflare → invisible to ChatGPT and Claude, while Perplexity still cites
The second case is fixable in 5 minutes. The first is a multi-week project.
What to do today
- Test your site on 5 main queries in ChatGPT, Claude, and Perplexity. Are you cited?
- Check your robots.txt: open your site to `OAI-SearchBot`, `Claude-SearchBot`, and `PerplexityBot`
- Apply definition-first structure to your top 5 pages
- Add author bios with LinkedIn link
- Run an audit on Priso — we test all of the above signals automatically
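A robots.txt that admits the citation crawlers from the checklist above can look like this — a sketch; confirm the current user-agent strings against each vendor's documentation before deploying:

```
# Admit the search/citation crawlers
User-agent: OAI-SearchBot
Allow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /
```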
Check your citation readiness in 60 seconds
FAQ
How often does ChatGPT update its index? Continuously, but visible effects of site changes typically appear within 1-3 weeks for popular pages. Less-trafficked pages can take 4-8 weeks.
Does a citation always lead to a click? No. Many citations don't generate a click (zero-click answers). But the brand mention itself has value — comparable to a mention in a traditional outlet.
Does it work for non-English content? Yes. All major engines speak fluent Dutch, German, French, Spanish. ChatGPT and Claude score well on nuance; Perplexity is more literal but fine.
Should I optimize per AI engine separately? 80% of signals work everywhere. The remaining 20% is engine-specific (e.g. freshness weighs heavier on Perplexity, structured data heavier on Google AI Overviews).
Written by Richard van Leeuwen, founder of Priso. Working with AI tooling since 2022, shipping AI agents in production since 2024.