When Googlebot can’t crawl: a technical, practical guide to internet disruptions (with a special warning for Iran-hosted sites)
If you’ve ever watched your rankings collapse right after an internet disruption, you already know the feeling: it’s what happens when Googlebot can’t crawl your site.
- Your site loads for you.
- Customers inside the country can sometimes access it.
- But Google traffic falls hard, impressions disappear, and keyword positions become chaotic.
Then someone says, “Don’t worry, it’ll come back when the internet is stable.”
Sometimes it does. But when it doesn’t, the reason is usually not “SEO quality” or a mysterious penalty.
It’s simple:
Google can’t reliably reach your site.
And when Google can’t crawl, it can’t maintain trust in what it has indexed.
This post explains what actually happens in those situations—especially for sites hosted in Iran—why the symptoms show up the way they do in Search Console, and what a real recovery plan looks like (not just “wait and publish more content”).
This is also directly connected to the core strategy we use across this site:
Indexing is not guaranteed. Eligibility is earned.
If your site is not consistently crawlable, it is not consistently rankable.
The core truth: when Googlebot can’t crawl, rankings don’t fall first — crawl reliability falls first
Most people think Google ranks pages and then “decides” to drop them.
In reality, ranking is downstream of much more basic signals:
- Can Googlebot reach the URL consistently?
- Can it fetch key files (robots.txt, sitemap, HTML) without error or timeouts?
- Can it crawl enough to refresh confidence in the content?
- Is the URL still considered “worth keeping” in the index compared to alternatives?
When access breaks (or becomes unstable), Google’s system reduces crawl frequency. Over time, Google becomes less confident about the page’s freshness and availability, and indexing stability degrades.
That’s why after disruptions you often see:
- Important pages remain technically “known” but stop appearing.
- Queries lose impressions before they visibly lose average position.
- More URLs shift into gray states like “Crawled – currently not indexed”.
This is not a punishment. It’s a consequence.
“But the site works for me”: why this is common in disruptions
A key misunderstanding comes from testing the site only from inside the affected geography.
You open your website and it loads, so you assume Google can crawl it too.
But Googlebot is not testing from your local network.
In practice, during disruptions, this is common:
- Users inside the country can access the site intermittently.
- Googlebot (or any international crawler) hits:
  - timeouts
  - 5xx server errors
  - DNS failures
  - handshake/route instability
  - partial blocks or traffic shaping
From your perspective the site “works”. From Google’s perspective the site is unreliable.
And reliability is an indexing signal.
Why it’s often worse for Iran-hosted sites when Googlebot can’t crawl
During international connectivity restrictions, the risk increases for sites hosted within Iran because:
- The inbound/outbound international routes may become unstable.
- Upstream network policies can impact accessibility from outside.
- DNS propagation, routing, or filtering can make global access inconsistent.
- CDN coverage is often weaker or misconfigured for global bots.
The result is that Googlebot visits less, fetches less, and updates confidence less frequently.
And when the crawl rate declines, the index begins to thin out.
This is why “hosting location” becomes an SEO risk factor in certain environments—not because of SEO myths, but because of crawl path reality.
The typical Search Console pattern after disruption
Here’s what you usually see, in order:
1) Impressions drop before rankings “look” broken
Your GSC graph often shows impressions dropping hard first.
This happens because fewer pages are being shown for fewer queries.
If Google loses confidence in a chunk of URLs, it may stop serving them broadly even before average position reflects it cleanly.
2) Average position becomes noisy
If your URL set shrinks and only a subset of pages still shows, “average position” can fluctuate wildly. It becomes less useful as a stable KPI.
3) Coverage shifts into gray states
You start seeing more of these:
- Discovered – currently not indexed
- Crawled – currently not indexed
These statuses are not always “quality problems.” In disruption contexts, they often indicate that Google is not confidently maintaining the page in the index.
4) Crawl stats show reduced crawling
If you track crawl stats (and ideally server logs), you’ll often see reduced frequency, fewer bytes downloaded, and more fetch errors.
This is the point where many teams make a wrong move:
They publish more pages.
Which usually makes it worse.
Why “publishing more content” can backfire after an outage
When crawl reliability drops, your crawl budget and crawl allocation become tighter.
If you add more URLs while Google is already struggling to fetch your existing important pages, you can trigger:
- index bloat
- more “discovered but not indexed”
- weaker internal link equity concentration
- more template duplication
- cannibalization
- slower refresh cycles on your most valuable pages
That’s why our approach on insight.ramfaseo.se leans heavily on indexing guardrails and architecture discipline:
- Decide what must be indexable vs noindex/canonical
- Keep sitemaps clean (only indexable URLs)
- Strengthen hubs and clusters so Google is guided toward your highest-value pages
- Reduce noise when crawling becomes expensive
If you have a hub page for Internal Linking & Site Architecture, this post belongs as a cluster under it.
The real root cause: Google needs stable access to 3 things
When teams say “Google can’t crawl”, they often mean “Google can’t crawl everything.”
But in recovery scenarios, you don’t need everything immediately.
You need stable access to the fundamentals:
1) robots.txt
If Googlebot can’t fetch robots.txt reliably, crawling becomes unpredictable.
Sometimes Google will pause crawling when it can’t determine crawl permissions.
2) sitemap.xml (and sitemap index files)
If sitemaps are unreachable, Google loses a major discovery and prioritization mechanism—especially important for large sites.
3) your most important templates/pages
Typically:
- home
- category hubs
- top commercial landing pages
- top informational pillar pages
- critical programmatic pages (if you use them)
- important blog posts and guides
If these become unreliable, the index becomes unreliable.
This is why the “fix” is not “a few SEO tasks.”
The fix is ensuring stable bot access.
The strategic solution: build “bot-access continuity” into your infrastructure
In markets with disruption risk, SEO needs an additional layer:
Your site must remain crawlable for global bots even when local connectivity is unstable.
This can be achieved through several architectures. The right one depends on your setup, scale, and risk tolerance, but common approaches include:
Option A: CDN / Reverse Proxy in front of origin
A properly configured CDN/reverse proxy can:
- absorb routing instability
- cache critical assets (and, in some setups, full HTML)
- provide consistent global access
- reduce origin server load
- reduce timeouts
This often becomes the most practical path for maintaining stable fetchability.
Option B: GeoDNS / smart routing
GeoDNS can route different regions to different endpoints or paths.
Used correctly, it can help preserve accessibility for international crawlers.
Used incorrectly, it can create duplicate versions and canonical chaos.
If you do GeoDNS, you must define:
- one canonical host
- consistent content and canonical signals
- stable sitemap and robots location
- strict index/noindex rules for alternates
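One way to audit the “one canonical host” rule is to fetch the same page from each regional endpoint and confirm they all declare the same canonical. Here is a minimal stdlib sketch; the regex-based HTML parsing is naive (it assumes `rel` appears before `href` in the tag) and the endpoint names are placeholders, not part of any real setup:

```python
"""Sketch: check that alternate endpoints behind GeoDNS agree on one canonical host."""
import re

# Naive canonical extractor; assumes rel= comes before href= in the <link> tag.
CANONICAL_RE = re.compile(
    r'<link[^>]+rel=["\']canonical["\'][^>]+href=["\']([^"\']+)["\']', re.I
)

def canonical_of(html: str):
    """Return the rel=canonical href from page HTML, or None if absent."""
    m = CANONICAL_RE.search(html)
    return m.group(1) if m else None

def consistent_canonicals(pages: dict) -> bool:
    """pages maps endpoint name -> fetched HTML; all must agree on one canonical."""
    canonicals = {canonical_of(html) for html in pages.values()}
    return len(canonicals) == 1 and None not in canonicals
```

In practice you would feed `consistent_canonicals` the HTML fetched from each GeoDNS endpoint; any mismatch (or missing canonical) is exactly the “canonical chaos” described above.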
Option C: Monitoring from outside the affected geography
Most teams only monitor uptime locally. That’s not enough.
You need:
- external fetch monitoring (multiple non-local regions)
- alerting for timeouts and 5xx
- checks for robots and sitemap availability
- checks for key pages
This is not an advanced luxury. In disruption-risk contexts, it’s baseline.
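A minimal sketch of such an external check, using only the Python standard library. The target URLs are placeholders, and the alert rule (any single failure triggers an alert) is an assumption — tune it, and run this from servers outside the affected geography:

```python
"""Sketch: external fetch monitor for robots.txt, sitemap, and key pages."""
import time
import urllib.request

GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

def check(url: str, timeout: float = 5.0) -> dict:
    """Fetch one URL and report success, status code, and elapsed seconds."""
    req = urllib.request.Request(url, headers={"User-Agent": GOOGLEBOT_UA})
    start = time.monotonic()
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return {"url": url, "ok": resp.status == 200,
                    "status": resp.status, "elapsed": time.monotonic() - start}
    except Exception as exc:  # timeouts, DNS failures, 5xx all count as failures
        return {"url": url, "ok": False, "status": None,
                "error": type(exc).__name__, "elapsed": time.monotonic() - start}

def summarize(results: list) -> dict:
    """Aggregate results into an alert decision (any failure -> alert)."""
    failures = [r for r in results if not r["ok"]]
    return {"checked": len(results), "failed": len(failures),
            "alert": len(failures) > 0}

if __name__ == "__main__":
    targets = [  # placeholder URLs — replace with your own critical set
        "https://example.com/robots.txt",
        "https://example.com/sitemap.xml",
        "https://example.com/",
    ]
    print(summarize([check(u) for u in targets]))
```

Run on a schedule from several non-local regions and wire `alert` into whatever paging system you already use; the value is the trend across regions, not any single failed fetch.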
The recovery checklist (what to do when the damage has started)
If your rankings are already unstable, do not panic-publish.
Use this sequence instead.
Step 1: Verify access the way Google sees it
Test from outside the local network/geography:
- robots.txt
- sitemap.xml
- key landing pages
- a sample of important content pages
If you can’t confirm stable access from outside, nothing else matters.
Step 2: Check for server-side evidence
If possible:
- review server logs
- identify timeouts, 5xx spikes, blocked user agents
- see whether Googlebot hits are declining
- validate whether responses are slow or failing under load
If your host/provider can’t give logs, at least check:
- error logs
- uptime monitoring
- performance metrics
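If you do have access logs, a short script can surface the evidence. This sketch assumes the common combined log format — adjust the regex to your server’s format — and matches on the UA string, which can be spoofed; for a strict audit, verify real Googlebot IPs via reverse DNS as Google documents:

```python
"""Sketch: count Googlebot response codes and the 5xx error rate from access logs."""
import re
from collections import Counter

# Combined-log-format request/status/UA fields, e.g.:
# 66.249.66.1 - - [10/Jan/2025:10:00:00 +0000] "GET /page HTTP/1.1" 503 512 "-" "...Googlebot/2.1..."
LOG_RE = re.compile(
    r'"[A-Z]+ (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

def googlebot_status_counts(lines) -> Counter:
    """Count response status codes for requests whose UA claims Googlebot."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.search(line)
        if m and "Googlebot" in m.group("ua"):
            counts[m.group("status")] += 1
    return counts

def error_rate(counts: Counter) -> float:
    """Share of Googlebot hits that returned a 5xx."""
    total = sum(counts.values())
    errors = sum(n for status, n in counts.items() if status.startswith("5"))
    return errors / total if total else 0.0
```

Run it over daily log slices: a falling total hit count plus a rising 5xx share is exactly the “Googlebot visits less, fetches less” pattern described earlier, now with numbers attached.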
Step 3: Inspect GSC Coverage + Crawl stats
Look for:
- rising “not indexed” states
- crawl requests dropping
- fetch failures
- sitemap read issues
Step 4: Reduce indexing noise temporarily
If crawl is constrained, help Google focus:
- ensure sitemaps contain only indexable URLs
- consider noindex/canonical for low-value variants
- avoid generating new URL sets during instability
This aligns with our general stance: indexing rules before scaling.
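Checking that a sitemap contains only indexable URLs is easy to script. A stdlib sketch that parses a standard `<urlset>` sitemap and diffs it against your own list of indexable URLs (how you build that list — CMS export, crawl data — is up to you):

```python
"""Sketch: flag sitemap entries that are not on the indexable list."""
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def sitemap_urls(xml_text: str) -> list:
    """Extract <loc> values from a urlset sitemap, in document order."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc")]

def sitemap_drift(urls, indexable_urls) -> list:
    """URLs present in the sitemap but not on the indexable list (candidates for removal)."""
    return sorted(set(urls) - set(indexable_urls))
```

Anything `sitemap_drift` returns is a URL you are asking Google to spend constrained crawl budget on while simultaneously signaling (via noindex/canonical) that it shouldn’t be indexed — remove those entries first.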
Step 5: Re-stabilize access, then re-trigger discovery
Once access is stable:
- resubmit sitemaps
- validate robots fetch
- request indexing for the highest-priority URLs (limited use)
- ensure internal linking points clearly to your money pages and pillars
Index recovery is not instant. But you can stop the bleeding quickly if access is fixed.
The most common mistakes I see (and why they prolong recovery)
Mistake 1: assuming “it will come back automatically”
Sometimes it does—but only if the crawl reliability returns fast enough and the URL set is clean.
If your site has many low-value URLs, duplicates, and weak architecture, recovery often lags or becomes incomplete.
Mistake 2: pushing more pages
More URLs when crawl is weak = more risk.
Mistake 3: changing too many variables at once
If you migrate, change templates, rewrite internal linking, and publish aggressively all at the same time during instability, you won’t know what helped or hurt.
Mistake 4: forgetting that Google indexes systems, not intentions
Your intention may be “I’m serious about SEO now.”
Google only sees:
- fetch success rate
- response time
- page discoverability
- duplication signals
- internal linking structure
- user satisfaction signals (where measurable)
Intent doesn’t rank.
Systems do.
How this connects to pillar/cluster SEO strategy
This topic is not “news”. It’s foundational.
In a pillar/cluster model, this post should sit under something like:
- Internal Linking & Site Architecture (pillar)
- Crawling & indexing stability (cluster)
- Crawl budget preservation (cluster)
- Index/noindex/canonical gating logic (cluster)
- Sitemaps as discovery control (cluster)
Because disruption-related ranking loss is usually a combination of:
- access stability
- indexing eligibility
- crawl budget allocation
- architecture clarity
The more disciplined your site structure, the faster recovery tends to be—because Google can re-evaluate your important pages efficiently.
Practical takeaway: if you operate in disruption-risk markets, “SEO” includes infrastructure
If you work in environments where connectivity can become unstable, your SEO strategy should include a dedicated layer called something like:
Bot Access Continuity Plan
Minimum requirements:
- stable robots and sitemaps
- stable access to key pages
- external monitoring
- clean indexing rules
- architecture that concentrates value
- controlled URL growth
Without this, every disruption becomes:
- a ranking crash
- a slow re-index cycle
- a re-building cost
And the business pays for it again and again.
Final summary
- Googlebot must be able to crawl your site from outside your local environment.
- If it can’t, crawl drops. When crawl drops, indexing stability degrades.
- When indexing degrades, rankings “fall” because pages stop being served.
- This is often worse for Iran-hosted sites during international restrictions.
- The fix is not more content. The fix is stable bot access + clean indexing rules + strong architecture.
- Build a continuity plan: CDN/reverse proxy, GeoDNS where appropriate, and external monitoring.
If you want SEO to be resilient, treat crawlability like uptime.
Because for Google, it is uptime.
