When Googlebot can’t crawl: a technical, practical guide to internet disruptions (with a special warning for Iran-hosted sites)
If you’ve ever watched your rankings collapse right after an internet disruption, you already know the feeling: it’s what happens when Googlebot can’t crawl your site.
- Your site loads for you.
- Customers inside the country can sometimes access it.
- But Google traffic falls hard, impressions disappear, and keyword positions become chaotic.
Then someone says, “Don’t worry, it’ll come back when the internet is stable.”
Sometimes it does. But when it doesn’t, the reason is usually not “SEO quality” or a mysterious penalty.
It’s simple:
Google can’t reliably reach your site.
And when Google can’t crawl, it can’t maintain trust in what it has indexed.
This post explains what actually happens in those situations—especially for sites hosted in Iran—why the symptoms show up the way they do in Search Console, and what a real recovery plan looks like (not just “wait and publish more content”).
This is also directly connected to the core strategy we use across this site:
Indexing is not guaranteed. Eligibility is earned.
If your site is not consistently crawlable, it is not consistently rankable.
The core truth: when Googlebot can’t crawl, rankings don’t fall first — crawl reliability falls first
Most people think Google ranks pages and then “decides” to drop them.
In reality, ranking is downstream of much more basic signals:
- Can Googlebot reach the URL consistently?
- Can it fetch key files (robots.txt, sitemap, HTML) without error or timeouts?
- Can it crawl enough to refresh confidence in the content?
- Is the URL still considered “worth keeping” in the index compared to alternatives?
When access breaks (or becomes unstable), Google’s system reduces crawl frequency. Over time, Google becomes less confident about the page’s freshness and availability, and indexing stability degrades.
That’s why after disruptions you often see:
- Important pages remain technically “known” but stop appearing.
- Queries lose impressions before they visibly lose average position.
- More URLs shift into gray states like “Crawled – currently not indexed”.
This is not a punishment. It’s a consequence.
“But the site works for me”: why this is common in disruptions
A key misunderstanding comes from testing the site only from inside the affected geography.
You open your website and it loads, so you assume Google can crawl it too.
But Googlebot is not testing from your local network.
In practice, during disruptions, this is common:
- Users inside the country can access the site intermittently.
- Googlebot (or any international crawler) hits:
  - timeouts
  - 5xx server errors
  - DNS failures
  - handshake/route instability
  - partial blocks or traffic shaping
From your perspective the site “works”. From Google’s perspective the site is unreliable.
And reliability is an indexing signal.
Why it’s often worse for Iran-hosted sites when Googlebot can’t crawl
During international connectivity restrictions, the risk increases for sites hosted within Iran because:
- The inbound/outbound international routes may become unstable.
- Upstream network policies can impact accessibility from outside.
- DNS propagation, routing, or filtering can make global access inconsistent.
- CDN coverage is often weaker or misconfigured for global bots.
The result is that Googlebot visits less, fetches less, and updates confidence less frequently.
And when the crawl rate declines, the index begins to thin out.
This is why “hosting location” becomes an SEO risk factor in certain environments—not because of SEO myths, but because of crawl path reality.
The typical Search Console pattern after disruption
Here’s what you usually see, in order:
1) Impressions drop before rankings “look” broken
Your GSC graph often shows impressions dropping hard first.
This happens because fewer pages are being shown for fewer queries.
If Google loses confidence in a chunk of URLs, it may stop serving them broadly even before average position reflects it cleanly.
2) Average position becomes noisy
If your URL set shrinks and only a subset of pages still shows, “average position” can fluctuate wildly. It becomes less useful as a stable KPI.
3) Coverage shifts into gray states
You start seeing more of these:
- Discovered – currently not indexed
- Crawled – currently not indexed
These statuses are not always “quality problems.” In disruption contexts, they often indicate that Google is not confidently maintaining the page in the index.
4) Crawl stats show reduced crawling
If you track crawl stats (and ideally server logs), you’ll often see reduced frequency, fewer bytes downloaded, and more fetch errors.
This is the point where many teams make a wrong move:
They publish more pages.
Which usually makes it worse.
Why “publishing more content” can backfire after an outage
When crawl reliability drops, your crawl budget and crawl allocation become tighter.
If you add more URLs while Google is already struggling to fetch your existing important pages, you can trigger:
- index bloat
- more “discovered but not indexed”
- weaker internal link equity concentration
- more template duplication
- cannibalization
- slower refresh cycles on your most valuable pages
That’s why our approach on insight.ramfaseo.se leans heavily on indexing guardrails and architecture discipline:
- Decide what must be indexable vs noindex/canonical
- Keep sitemaps clean (only indexable URLs)
- Strengthen hubs and clusters so Google is guided toward your highest-value pages
- Reduce noise when crawling becomes expensive
If you have a hub page for Internal Linking & Site Architecture, this post belongs as a cluster under it.
The real root cause: Google needs stable access to 3 things
When teams say “Google can’t crawl”, they often mean “Google can’t crawl everything.”
But in recovery scenarios, you don’t need everything immediately.
You need stable access to the fundamentals:
1) robots.txt
If Googlebot can’t fetch robots.txt reliably, crawling becomes unpredictable.
Sometimes Google will pause crawling when it can’t determine crawl permissions.
2) sitemap.xml (and sitemap index files)
If sitemaps are unreachable, Google loses a major discovery and prioritization mechanism—especially important for large sites.
3) your most important templates/pages
Typically:
- home
- category hubs
- top commercial landing pages
- top informational pillar pages
- critical programmatic pages (if you use them)
- important blog posts and guides
If these become unreliable, the index becomes unreliable.
This is why the “fix” is not “a few SEO tasks.”
The fix is ensuring stable bot access.
The strategic solution: build “bot-access continuity” into your infrastructure
In markets with disruption risk, SEO needs an additional layer:
Your site must remain crawlable for global bots even when local connectivity is unstable.
This can be achieved through several architectures. The right one depends on your setup, scale, and risk tolerance, but common approaches include:
Option A: CDN / Reverse Proxy in front of origin
A properly configured CDN/reverse proxy can:
- absorb routing instability
- cache critical assets (and, in some setups, full HTML)
- provide consistent global access
- reduce origin server load
- reduce timeouts
This often becomes the most practical path for maintaining stable fetchability.
Option B: GeoDNS / smart routing
GeoDNS can route different regions to different endpoints or paths.
Used correctly, it can help preserve accessibility for international crawlers.
Used incorrectly, it can create duplicate versions and canonical chaos.
If you do GeoDNS, you must define:
- one canonical host
- consistent content and canonical signals
- stable sitemap and robots location
- strict index/noindex rules for alternates
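One way to audit the “one canonical host” rule is to fetch the same page from each regional endpoint and confirm they all declare the same canonical. Here is a minimal stdlib sketch; the regex-based HTML parsing is naive (it assumes `rel` appears before `href` in the tag) and the endpoint names are placeholders, not part of any real setup:

```python
"""Sketch: check that alternate endpoints behind GeoDNS agree on one canonical host."""
import re

# Naive canonical extractor; assumes rel= comes before href= in the <link> tag.
CANONICAL_RE = re.compile(
    r'<link[^>]+rel=["\']canonical["\'][^>]+href=["\']([^"\']+)["\']', re.I
)

def canonical_of(html: str):
    """Return the rel=canonical href from page HTML, or None if absent."""
    m = CANONICAL_RE.search(html)
    return m.group(1) if m else None

def consistent_canonicals(pages: dict) -> bool:
    """pages maps endpoint name -> fetched HTML; all must agree on one canonical."""
    canonicals = {canonical_of(html) for html in pages.values()}
    return len(canonicals) == 1 and None not in canonicals
```

In practice you would feed `consistent_canonicals` the HTML fetched from each GeoDNS endpoint; any mismatch (or missing canonical) is exactly the “canonical chaos” described above.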
Option C: Monitoring from outside the affected geography
Most teams only monitor uptime locally. That’s not enough.
You need:
- external fetch monitoring (multiple non-local regions)
- alerting for timeouts and 5xx
- checks for robots and sitemap availability
- checks for key pages
This is not an advanced luxury. In disruption-risk contexts, it’s baseline.
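A minimal sketch of such an external check, using only the Python standard library. The target URLs are placeholders, and the alert rule (any single failure triggers an alert) is an assumption — tune it, and run this from servers outside the affected geography:

```python
"""Sketch: external fetch monitor for robots.txt, sitemap, and key pages."""
import time
import urllib.request

GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

def check(url: str, timeout: float = 5.0) -> dict:
    """Fetch one URL and report success, status code, and elapsed seconds."""
    req = urllib.request.Request(url, headers={"User-Agent": GOOGLEBOT_UA})
    start = time.monotonic()
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return {"url": url, "ok": resp.status == 200,
                    "status": resp.status, "elapsed": time.monotonic() - start}
    except Exception as exc:  # timeouts, DNS failures, 5xx all count as failures
        return {"url": url, "ok": False, "status": None,
                "error": type(exc).__name__, "elapsed": time.monotonic() - start}

def summarize(results: list) -> dict:
    """Aggregate results into an alert decision (any failure -> alert)."""
    failures = [r for r in results if not r["ok"]]
    return {"checked": len(results), "failed": len(failures),
            "alert": len(failures) > 0}

if __name__ == "__main__":
    targets = [  # placeholder URLs — replace with your own critical set
        "https://example.com/robots.txt",
        "https://example.com/sitemap.xml",
        "https://example.com/",
    ]
    print(summarize([check(u) for u in targets]))
```

Run on a schedule from several non-local regions and wire `alert` into whatever paging system you already use; the value is the trend across regions, not any single failed fetch.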
The recovery checklist (what to do when the damage has started)
If your rankings are already unstable, do not panic-publish.
Use this sequence instead.
Step 1: Verify access the way Google sees it
Test from outside the local network/geography:
- robots.txt
- sitemap.xml
- key landing pages
- a sample of important content pages
If you can’t confirm stable access from outside, nothing else matters.
Step 2: Check for server-side evidence
If possible:
- review server logs
- identify timeouts, 5xx spikes, blocked user agents
- see whether Googlebot hits are declining
- validate whether responses are slow or failing under load
If your host/provider can’t give logs, at least check:
- error logs
- uptime monitoring
- performance metrics
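If you do have access logs, a short script can surface the evidence. This sketch assumes the common combined log format — adjust the regex to your server’s format — and matches on the UA string, which can be spoofed; for a strict audit, verify real Googlebot IPs via reverse DNS as Google documents:

```python
"""Sketch: count Googlebot response codes and the 5xx error rate from access logs."""
import re
from collections import Counter

# Combined-log-format request/status/UA fields, e.g.:
# 66.249.66.1 - - [10/Jan/2025:10:00:00 +0000] "GET /page HTTP/1.1" 503 512 "-" "...Googlebot/2.1..."
LOG_RE = re.compile(
    r'"[A-Z]+ (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

def googlebot_status_counts(lines) -> Counter:
    """Count response status codes for requests whose UA claims Googlebot."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.search(line)
        if m and "Googlebot" in m.group("ua"):
            counts[m.group("status")] += 1
    return counts

def error_rate(counts: Counter) -> float:
    """Share of Googlebot hits that returned a 5xx."""
    total = sum(counts.values())
    errors = sum(n for status, n in counts.items() if status.startswith("5"))
    return errors / total if total else 0.0
```

Run it over daily log slices: a falling total hit count plus a rising 5xx share is exactly the “Googlebot visits less, fetches less” pattern described earlier, now with numbers attached.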
Step 3: Inspect GSC Coverage + Crawl stats
Look for:
- rising “not indexed” states
- crawl requests dropping
- fetch failures
- sitemap read issues
Step 4: Reduce indexing noise temporarily
If crawl is constrained, help Google focus:
- ensure sitemaps contain only indexable URLs
- consider noindex/canonical for low-value variants
- avoid generating new URL sets during instability
This aligns with our general stance: indexing rules before scaling.
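Checking that a sitemap contains only indexable URLs is easy to script. A stdlib sketch that parses a standard `<urlset>` sitemap and diffs it against your own list of indexable URLs (how you build that list — CMS export, crawl data — is up to you):

```python
"""Sketch: flag sitemap entries that are not on the indexable list."""
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def sitemap_urls(xml_text: str) -> list:
    """Extract <loc> values from a urlset sitemap, in document order."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc")]

def sitemap_drift(urls, indexable_urls) -> list:
    """URLs present in the sitemap but not on the indexable list (candidates for removal)."""
    return sorted(set(urls) - set(indexable_urls))
```

Anything `sitemap_drift` returns is a URL you are asking Google to spend constrained crawl budget on while simultaneously signaling (via noindex/canonical) that it shouldn’t be indexed — remove those entries first.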
Step 5: Re-stabilize access, then re-trigger discovery
Once access is stable:
- resubmit sitemaps
- validate robots fetch
- request indexing for the highest-priority URLs (limited use)
- ensure internal linking points clearly to your money pages and pillars
Index recovery is not instant. But you can stop the bleeding quickly if access is fixed.
The most common mistakes I see (and why they prolong recovery)
Mistake 1: assuming “it will come back automatically”
Sometimes it does—but only if the crawl reliability returns fast enough and the URL set is clean.
If your site has many low-value URLs, duplicates, and weak architecture, recovery often lags or becomes incomplete.
Mistake 2: pushing more pages
More URLs when crawl is weak = more risk.
Mistake 3: changing too many variables at once
If you migrate, change templates, rewrite internal linking, and publish aggressively all at the same time during instability, you won’t know what helped or hurt.
Mistake 4: forgetting that Google indexes systems, not intentions
Your intention may be “I’m serious about SEO now.”
Google only sees:
- fetch success rate
- response time
- page discoverability
- duplication signals
- internal linking structure
- user satisfaction signals (where measurable)
Intent doesn’t rank.
Systems do.
How this connects to pillar/cluster SEO strategy
This topic is not “news”. It’s foundational.
In a pillar/cluster model, this post should sit under something like:
- Internal Linking & Site Architecture (pillar)
- Crawling & indexing stability (cluster)
- Crawl budget preservation (cluster)
- Index/noindex/canonical gating logic (cluster)
- Sitemaps as discovery control (cluster)
Because disruption-related ranking loss is usually a combination of:
- access stability
- indexing eligibility
- crawl budget allocation
- architecture clarity
The more disciplined your site structure, the faster recovery tends to be—because Google can re-evaluate your important pages efficiently.
Practical takeaway: if you operate in disruption-risk markets, “SEO” includes infrastructure
If you work in environments where connectivity can become unstable, your SEO strategy should include a dedicated layer called something like:
Bot Access Continuity Plan
Minimum requirements:
- stable robots and sitemaps
- stable access to key pages
- external monitoring
- clean indexing rules
- architecture that concentrates value
- controlled URL growth
Without this, every disruption becomes:
- a ranking crash
- a slow re-index cycle
- a re-building cost
And the business pays for it again and again.
Final summary
- Googlebot must be able to crawl your site from outside your local environment.
- If it can’t, crawl drops. When crawl drops, indexing stability degrades.
- When indexing degrades, rankings “fall” because pages stop being served.
- This is often worse for Iran-hosted sites during international restrictions.
- The fix is not more content. The fix is stable bot access + clean indexing rules + strong architecture.
- Build a continuity plan: CDN/reverse proxy, GeoDNS where appropriate, and external monitoring.
If you want SEO to be resilient, treat crawlability like uptime.
Because for Google, it is uptime.
