Your robots.txt file may be hiding your site from Gemini, and you would never know it from your search rankings. Google runs two distinct crawler families, and a single misplaced directive will quietly cut you off from one without affecting the other.
Two Crawler Families, One Domain
Google operates a family of crawlers[1]:
- Googlebot indexes pages for Google Search
- Google-Extended controls whether content is used for AI training and Gemini's real-time answers[2]
- AdsBot evaluates landing page quality for Google Ads
- APIs-Google serves Google APIs and internal products
Google-Extended was introduced in September 2023 as a standalone product token that controls AI training access independently of search indexing.
How They Differ
Googlebot crawls for search relevance, while Google-Extended governs comprehension: it controls whether your content may be used to train Gemini's models and to ground Gemini's real-time answers. The key distinction is that they respond to different robots.txt directives. A rule targeting Googlebot does not apply to Google-Extended, and vice versa.
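That independence is easy to check with Python's standard-library robots.txt parser. A minimal sketch, assuming a hypothetical robots.txt that restricts only Googlebot (`example.com` is a placeholder):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: only Googlebot is restricted.
rules = """
User-agent: Googlebot
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# The Googlebot group does not apply to Google-Extended,
# so Google-Extended may still fetch the restricted path.
print(parser.can_fetch("Googlebot", "https://example.com/private/page"))        # False
print(parser.can_fetch("Google-Extended", "https://example.com/private/page"))  # True
```

Because robots.txt groups are matched per user-agent token, the `Disallow` under `User-agent: Googlebot` is invisible to every other crawler.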
How to Control Access
Block only Google-Extended (keep search, opt out of AI):
```
User-agent: Google-Extended
Disallow: /
```
Block all (removes from both search and AI):
```
User-agent: *
Disallow: /
```
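The effect of the first configuration can be verified with Python's standard-library parser. A sketch, assuming `example.com` as a placeholder site:

```python
from urllib.robotparser import RobotFileParser

# The "block only Google-Extended" configuration from above.
robots_txt = "User-agent: Google-Extended\nDisallow: /"

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Search crawling is untouched; AI access is cut off.
print(parser.can_fetch("Googlebot", "https://example.com/"))        # True
print(parser.can_fetch("Google-Extended", "https://example.com/"))  # False
```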
Common Mistakes
- Accidentally blocking Google-Extended via a catch-all `User-agent: *` rule without realizing it
- Assuming that blocking Google-Extended only affects training, when it also affects Gemini's real-time answers
- Relying on `noindex` to opt out of AI: `noindex` controls search indexing, not crawling, and does not block Google-Extended
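The catch-all mistake is easy to reproduce: a `User-agent: *` group blocks Google-Extended even though it is never named. A minimal sketch using Python's standard-library parser; the `ai_access_report` helper, sample paths, and `example.com` domain are hypothetical:

```python
from urllib.robotparser import RobotFileParser

def ai_access_report(robots_txt: str, paths=("/", "/blog/")):
    """Report which sample paths Google-Extended may still fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {p: parser.can_fetch("Google-Extended", "https://example.com" + p)
            for p in paths}

# A catch-all rule: Google-Extended is blocked without ever being named.
report = ai_access_report("User-agent: *\nDisallow: /")
print(report)  # {'/': False, '/blog/': False}
```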
How Scanner Helps
Scanner audits your robots.txt for unintentional Google-Extended blocks. See also How Agent Crawlers Work.