▎ Research Foundation
1 peer-reviewed paper · 4,500 words · 2026-05-01

The peer-reviewed science behind Generative Engine Optimization.

Most GEO/AEO content on the internet is opinion, anecdote, or vendor speculation. This page is different. It's the canonical research foundation for best-aeo-skill, focused on the one peer-reviewed paper that established GEO as a research field — and how every scoring weight in our skill traces back to it.

  • Primary paper: Aggarwal et al. (2024)
  • Tested queries: 10,000
  • Validated tactics: 9
  • Headline impact: +115% citations
▎ The State of GEO Research

Generative Engine Optimization (GEO) is a young field. The term was formally introduced in November 2023 with the arXiv preprint of "GEO: Generative Engine Optimization" by a Princeton-led team, and presented at KDD 2024 — the Association for Computing Machinery's premier data science conference. Before that paper, the entire literature on optimizing for AI-generated answers was practitioner blog posts and vendor whitepapers.

The Princeton paper changed that. It formalized GEO as a measurable discipline by:

  • Building GEO-bench, a 10,000-query benchmark spanning 9 domains (legal, history, science, business, etc.)
  • Defining Position-Adjusted Word Count (PAWC) and Subjective Impression as standardized citation-quality metrics
  • Empirically testing 9 distinct optimization tactics against this benchmark
  • Measuring per-tactic and per-domain effects with statistical rigor

Two years later (2026), the paper has accumulated hundreds of citations and remains the only widely-cited peer-reviewed work that quantifies which tactics actually move the needle on AI citation rates. Every other "GEO study" you'll see on the internet — from agencies, vendors, or commenters — either cites this paper or makes uncalibrated claims.

That's why best-aeo-skill operationalizes Princeton specifically. When the user asks "will this work?" — we can point to peer review, not anecdote.

▎ Princeton KDD 2024 — Deep Dive
▎ Primary citation
"GEO: Generative Engine Optimization"
Aggarwal, P., Murahari, V., Rajpurohit, T., Kalyan, A., Narasimhan, K., Deshpande, A.
Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '24) · ACM, August 2024

The setup

The team built GEO-bench: 10,000 user queries across 9 domains. For each query, they generated a baseline response using a generative engine (a synthesized answer with cited sources). Then they applied each of 9 candidate optimization tactics to the source content and re-ran the query — measuring whether the modified source got more visibility in the new synthesized response.

"Visibility" was operationalized two ways:

  • Position-Adjusted Word Count (PAWC) — how much of the synthesized answer is sourced from this page, weighted by where in the answer it appears (top-of-answer = higher weight)
  • Subjective Impression — judges' rating of how prominently the source is featured in the response

Both metrics moved together for most tactics. The paper reports composite "visibility uplift" percentages, which we use throughout this site.
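The paper's exact PAWC weighting scheme isn't reproduced on this page, but the idea can be sketched with a simple linear position decay. The decay function, the sentence-level attribution format, and the function signature below are all illustrative assumptions, not the paper's implementation:

```python
def pawc(answer_sentences, attributions, source_id):
    """Position-Adjusted Word Count (sketch): words attributed to
    source_id, weighted so earlier sentences count more.
    Uses a linear decay; the paper's exact weighting may differ."""
    n = len(answer_sentences)
    total = 0.0
    for pos, sentence in enumerate(answer_sentences):
        if attributions[pos] == source_id:
            weight = (n - pos) / n  # first sentence gets weight 1.0
            total += weight * len(sentence.split())
    return total
```

With a two-sentence answer where only the first sentence cites source A, that sentence's words count at full weight; the same words at the end of the answer would count for half, which is the "top-of-answer = higher weight" behavior described above.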

The headline result

The single most surprising finding from the paper:

Source emphasis — the simple act of bolding citations or framing them prominently — increased citation likelihood by +115%. This was the strongest effect of any tactic tested. (Aggarwal et al., 2024, Section 5.2)

Two implications:

  • You can more than double your AI citation rate with formatting alone, no new content needed
  • Most existing content on the internet under-emphasizes its sources, which is why "everyone" is dissatisfied with their AI search performance

The paper also identified two negative findings: tactics that reduce visibility. Keyword-stuffing was the most prominent — confirming that the same tactic that hurts in modern Google also hurts in generative engines, possibly more aggressively.

▎ The 9 Validated Tactics · measured impact
  1. Source emphasis (+115%) · Bold or otherwise emphasize cited sources, references, and attribution.
  2. Expert quotes (+41%) · Add 2-4 attributed quotations per ~1,000 words; use quotation marks with the speaker's name.
  3. Statistics (+40%) · Add numeric claims with sources; target ~1 stat per 200 words.
  4. Inline citations (+30%) · Reference primary sources at the point of claim, not only at the bottom of the page.
  5. Authority signaling (+25%) · Credential markup, named contributors, institutional affiliation.
  6. Improved fluency (+15%) · Natural language, reduced formulaic phrasing, varied sentence length.
  7. Easy-to-read (+12%) · Target Flesch-Kincaid grade 8-10; more academic prose loses general-audience AI visibility, oversimplified prose loses authority.
  8. Topic relevance (+10%) · One primary topic per page; avoid multi-topic mash-up content.
  9. Keyword stuffing (-22%) · Stuffing the page with target keywords reduces visibility.
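As an illustration of tactic 3's target (~1 stat per 200 words), a minimal density check could look like the following. The regex used here as a "numeric claim" matcher is a crude assumption for the sketch, not the skill's actual pattern:

```python
import re

def statistic_density(text, target_per_words=200):
    """Count numeric claims (percentages, counts, multipliers) and
    normalize against the ~1 stat per 200 words target (tactic 3)."""
    words = len(text.split())
    # Crude matcher for things like "25.11%", "10,000", "3.2x" --
    # an illustrative assumption, not the skill's real collector.
    stats = re.findall(r"\d[\d,.]*\s*(?:%|x|\u00d7)?", text)
    density = len(stats) / max(words, 1) * target_per_words
    return {"words": words, "stats": len(stats),
            "stats_per_200_words": round(density, 2)}
```

A page scoring well under 1.0 on `stats_per_200_words` is a candidate for the "add statistics" recommendation; well over that, and the page may be drifting toward the keyword-stuffing failure mode instead.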
▎ How best-aeo-skill operationalizes Princeton · 9 tactics → 9 measurable signals

A research paper is just text until someone implements it. We built best-aeo-skill as a one-to-one operationalization. Each Princeton tactic maps to a specific evidence collector in our scoring engine, and each collector maps to a numbered Rule in SKILL.md:

  • Source emphasis (+115%) → citation_check → Rules 12, 15
  • Expert quotes (+41%) → quote_extractor → Rules 13, 14
  • Statistics (+40%) → statistic_density → Rule 11
  • Inline citations (+30%) → citation_check → Rule 12
  • Authority signaling (+25%) → author_check → Rules 41, 56, 57
  • Improved fluency (+15%) → fluency_check → Rules 19, 20, 21
  • Easy-to-read (+12%) → readability → Rule 19
  • Topic relevance (+10%) → passage_score → Rule 35
  • Keyword stuffing (-22%) → hedge_density → Rule 91
When you run bestaeo audit, each finding the skill returns is grounded in this map. If a finding says "Add expert quotes — projected +12 GEO score," you can trace it to quote_extractor → Rule 13 → Aggarwal et al., 2024, Section 5.2, Tactic 2. No invented metrics.
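This kind of provenance trace can be represented as a plain lookup table, keyed by tactic. The structure and function below are an illustrative sketch of the idea, not the skill's internal data model; the uplift figures, collector names, and rule numbers are the ones documented on this page:

```python
TACTIC_MAP = {
    "source_emphasis":  {"uplift": 1.15,  "collector": "citation_check",    "rules": [12, 15]},
    "expert_quotes":    {"uplift": 0.41,  "collector": "quote_extractor",   "rules": [13, 14]},
    "statistics":       {"uplift": 0.40,  "collector": "statistic_density", "rules": [11]},
    "inline_citations": {"uplift": 0.30,  "collector": "citation_check",    "rules": [12]},
    "authority":        {"uplift": 0.25,  "collector": "author_check",      "rules": [41, 56, 57]},
    "fluency":          {"uplift": 0.15,  "collector": "fluency_check",     "rules": [19, 20, 21]},
    "readability":      {"uplift": 0.12,  "collector": "readability",       "rules": [19]},
    "topic_relevance":  {"uplift": 0.10,  "collector": "passage_score",     "rules": [35]},
    "keyword_stuffing": {"uplift": -0.22, "collector": "hedge_density",     "rules": [91]},
}

def provenance(tactic):
    """Trace a finding back to its collector, SKILL.md rules, and the paper."""
    t = TACTIC_MAP[tactic]
    return f"{t['collector']} -> Rules {t['rules']} -> Aggarwal et al. 2024 ({t['uplift']:+.0%})"
```

Keeping the map in one place means a finding can always print its full chain of evidence, which is what makes the "no invented metrics" claim checkable.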

▎ Industry Empirical Data · 2026 figures we track

Beyond the Princeton paper, the field generates ongoing empirical data from industry sources. We track the most useful figures and update our scoring weights when reliable measurements appear:

  • 25.11% · Google searches triggering AI Overviews (Q1 2026)
  • 87% · of AI referral traffic arrives via ChatGPT
  • 10.13% · of audited domains publish /llms.txt
  • 3.2× · more citations for content under 30 days old

Sources we cite

  • SE Ranking — audited 300,000 domains for llms.txt presence (Q1 2026); reports 10.13% adoption.
  • Superlines — quarterly tracking of Google AI Overview trigger rates; up from 13.14% in March 2025 to 25.11% in Q1 2026.
  • Position.digital — analysis of AI referral traffic distribution across engines; ChatGPT dominates at 87%.
  • HubSpot — case studies showing 6× AI-referred trial uplift within 7 weeks of consistent optimization.
  • OpenAI usage reports — ChatGPT at 900M weekly active users and 5.72B monthly visits (2026).
  • SimilarWeb — zero-click search rate tracking; 43% in standard mode, 93% with AI Mode active.

None of these are peer-reviewed in the academic sense, but they are traceable empirical figures from organizations whose business depends on the data being accurate. We treat them as Tier-2 citations: useful, but explicitly marked as industry data, not peer-reviewed research.

▎ best-aeo-skill methodology · composite scoring + confidence labels

The 4-vector composite

The Princeton tactics cluster into four orthogonal vectors. We weight them based on what's most actionable for the typical site:

  • Technical Accessibility (20%) — robots.txt, AI bot allowance, JS rendering. If crawlers can't reach you, prose doesn't matter.
  • Content Citability (35%) — statistic density, expert quotes, citations, freshness. The single biggest weight, because Princeton's strongest tactics live here.
  • Structured Data (20%) — FAQPage, Article, Organization, HowTo, Speakable. Beyond Princeton, but empirically high-leverage for AI Overviews and Perplexity.
  • Entity & Brand Signals (25%) — author credentials, Knowledge Graph linking, NAP consistency. Sustained citation requires entity presence, not just one-off content quality.

Weights adapt to your business profile (SaaS, e-commerce, publisher, local, agency, devtools, academic, default). A SaaS landing isn't audited like a news article; the Schema vector matters more for SaaS, Citability matters more for publishers.

Confidence labels

Every finding output by the skill carries one of three labels:

  • Confirmed — directly observed by an evidence collector. Example: parse_html.py returned no <title> tag.
  • Likely — inferred from ≥2 collectors that agree. Example: schema_validate found no FAQPage AND quote_extractor detected Q&A patterns.
  • Hypothesis — LLM judgment or single weak signal. Always flagged for human review.

This is the anti-hallucination guarantee: no recommendation is ever presented without a label. If a tool tells you "fix this" without saying how confident it is — be skeptical.
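The three-label policy reduces to a small decision function. This is an illustration of the rules above, not the skill's implementation:

```python
def confidence_label(directly_observed, agreeing_collectors):
    """Assign Confirmed / Likely / Hypothesis per the labeling rules:
    direct observation beats inference, and inference needs >= 2
    independent collectors in agreement."""
    if directly_observed:
        return "Confirmed"   # directly observed by an evidence collector
    if agreeing_collectors >= 2:
        return "Likely"      # >= 2 collectors agree
    return "Hypothesis"      # LLM judgment or a single weak signal
```

The important design property is that the label can only be upgraded by evidence, never by the LLM's own confidence in its judgment.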

Score bands

  • 86-100 Excellent · cited frequently · maintain freshness
  • 68-85 Good · regular citation, gaps to fix · apply top-3 fixes
  • 36-67 Foundation · indexed but rarely cited · run full audit, fix everything
  • 0-35 Critical · effectively invisible · fix Technical and Schema first, then content

Below 36, a low score is almost always a technical or schema problem, not a content problem. The audit's recommended action ordering reflects this.
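The bands map mechanically from the composite score; a sketch:

```python
def score_band(score):
    """Map a 0-100 composite score to its band and recommended action."""
    if score >= 86:
        return ("Excellent", "maintain freshness")
    if score >= 68:
        return ("Good", "apply top-3 fixes")
    if score >= 36:
        return ("Foundation", "run full audit, fix everything")
    return ("Critical", "fix Technical and Schema first, then content")
```

Because the Critical band's action starts with Technical and Schema rather than content, the band boundaries double as an action-ordering policy, not just a grade.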

▎ Bibliography
[1] Aggarwal, P., Murahari, V., Rajpurohit, T., Kalyan, A., Narasimhan, K., & Deshpande, A. (2024). "GEO: Generative Engine Optimization." Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '24). arXiv:2311.09735 · doi:10.1145/3637528.3671900
[2] SE Ranking (2026). "llms.txt Adoption Audit: 300,000 Domains." SE Ranking Research Q1 2026.
[3] Superlines (2026). "AI Search Statistics 2026: Quarterly Tracking." Superlines Research Q1 2026.
[4] Position.digital (2026). "AI Referral Traffic Distribution: Engine-by-Engine." Position Industry Report 2026.
[5] HubSpot (2026). "Answer Engine Optimization Case Studies." HubSpot Marketing Blog 2026.
[6] OpenAI (2026). "ChatGPT Usage Disclosure 2026." OpenAI public reports.
[7] SimilarWeb (2026). "Zero-Click Search Rate Tracking." SimilarWeb Research 2026.
[8] Schema.org Community (2026). "Schema.org Vocabulary Specification." schema.org.
[9] llmstxt.org (2024). "llms.txt Specification." llmstxt.org.
[10] Google Search Central (2026). "AI Overviews Documentation." developers.google.com.
▎ Next steps