Why Pages Get Crawled but Not Indexed: 12 Real Causes + Fixes (A Practical, Real-World Troubleshooting Guide)

You open Google Search Console, you check “Pages,” and you see the exact phrase that ruins your mood: Crawled – currently not indexed (or something close to it). The weird part is that it’s not “Blocked,” not “Disallowed,” not “Error.” Google is literally telling you: we came, we fetched, we looked… and we decided not to keep it.

If you’ve done SEO for long enough, you know the dangerous moment is right here—because this is where many people panic-rewrite content, change URLs, add random internal links, resubmit a hundred times, and accidentally create a bigger mess. Most of the time, the fix is not “more words.” It’s clearer signals and removing contradictions.

Crawling and indexing are not the same step. Crawling is access. Indexing is a decision. And Google’s indexing decision is not a single test—it’s the outcome of multiple signals: content value, duplication/canonical signals, renderability, internal importance, and overall site quality patterns.

This guide is how I troubleshoot this in real projects when I want a predictable outcome. Not theory. Not “maybe.” Actual causes, actual symptoms, and the fastest clean fixes that don’t create technical debt later.


First: What “Crawled but Not Indexed” actually means (without the fluff)

When Google says it crawled the page but didn’t index it, it usually means one of these stories is happening:

  1. Google fetched the URL and decided it’s not worth indexing right now (quality/value issue).
  2. Google fetched it and decided another URL is the canonical (duplication/canonical conflict).
  3. Google fetched it but effectively “saw” empty/weak content due to rendering, blocked resources, or a soft 404 pattern.
  4. Google fetched it but the page is low priority inside your own site structure (weak internal linking / orphan / not in the “important” cluster).
  5. Google’s crawling resources are being wasted on noise (parameters, filters, infinite URLs), so indexing becomes selective.

That’s it. The faster you identify which story you’re in, the faster you fix it.


The fix order that prevents wasted work (use this every time)

When I want a clean diagnosis without spiralling into an “SEO audit rabbit hole,” I check in this order:

Step A — Indexability & fetch health

  • Is it indexable? (noindex? headers? wrong status code?)
  • Can Google fetch it consistently? (200 OK, stable TTFB, no 5xx bursts?)

Step B — Canonical & duplication story

  • Is Google choosing another canonical?
  • Are variants fighting? (http/https, www/non-www, slash/no-slash, parameters, paginated versions)

Step C — Content value & intent fit

  • Does the page actually satisfy the query intent?
  • Does it have unique value beyond boilerplate?

Step D — Rendering visibility (especially mobile)

  • Does the main content exist in HTML?
  • Is it JS-dependent? Are resources blocked? Are there runtime errors?

Step E — Internal signals & architecture

  • Is the page important inside the site?
  • Is it linked from relevant hubs? Is it only in the sitemap?

This order matters because you don’t want to spend 3 hours rewriting content when the real issue is “Google picked a different canonical,” or “the page is a soft 404,” or “rendered content is blank on mobile.”


12 real causes + fixes (the ones that keep showing up in production sites)

1) Hidden noindex (meta robots or X-Robots-Tag)

This still happens more than you’d expect—especially after theme changes, staging migrations, or security plugins.

Symptoms

  • In page source: <meta name="robots" content="noindex">
  • Or in headers: X-Robots-Tag: noindex
  • Search Console URL Inspection shows “Excluded by ‘noindex’ tag”

Fix

  • Remove noindex from the page template or HTTP header.
  • Confirm it’s removed across all variants (http/https, www/non-www).
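If you'd rather check this programmatically than eyeball page source across every variant, a small helper can flag both signals at once. A minimal sketch — the function name is mine, and the regex assumes the `name` attribute comes before `content`:

```python
import re

def noindex_sources(headers: dict, html: str) -> list:
    """Report which signals, if any, mark a page as noindex."""
    found = []
    # The X-Robots-Tag response header can carry noindex (case-insensitive).
    xrt = next((v for k, v in headers.items()
                if k.lower() == "x-robots-tag"), "")
    if "noindex" in xrt.lower():
        found.append("X-Robots-Tag header")
    # A meta robots tag in the HTML can do the same.
    # (Sketch limitation: assumes name="robots" appears before content=...)
    meta = re.search(r'<meta[^>]+name=["\']robots["\'][^>]*'
                     r'content=["\']([^"\']*)["\']', html, re.IGNORECASE)
    if meta and "noindex" in meta.group(1).lower():
        found.append("meta robots tag")
    return found
```

Run it against all variants (http/https, www/non-www) — the header version is the one people miss.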

2) You allow crawling, but you block what Google needs to render

This is classic. Robots.txt allows the page itself, but blocks /assets/ or /wp-content/ or critical JS/CSS. The result? Google fetches the HTML but cannot render meaningful content, and indexing becomes unstable.

Symptoms

  • “View crawled page” (in GSC Inspection) shows missing layout or missing main content
  • Page appears fine for users but looks thin to Google

Fix

  • Don’t block critical render resources.
  • Keep robots blocks for truly non-indexable patterns (admin, cart steps, internal searches), not core CSS/JS.
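In robots.txt terms, the safe pattern looks roughly like this — the WordPress-style paths are examples only, not a prescription:

```text
User-agent: *
# Block genuinely non-indexable areas:
Disallow: /wp-admin/
Disallow: /cart/
Disallow: /?s=
Allow: /wp-admin/admin-ajax.php

# Do NOT block the assets Google needs to render the page.
# A line like "Disallow: /wp-content/" or "Disallow: /assets/"
# is exactly what creates the "crawlable but unrenderable" trap.
```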

3) Google chooses a different canonical (and your page loses)

Canonical is not a command. It’s a suggestion. If your site sends mixed signals, Google will choose the canonical that makes more sense to it.

Symptoms

  • GSC says: “Duplicate, Google chose different canonical than user”
  • Or page is “Alternate page with proper canonical tag”
  • Your sitemap lists URL A, but internal links point to URL B

Fix

  • Use self-referential canonical on the preferred URL.
  • Align these three:
    1. Internal links → preferred URL
    2. Sitemap → preferred URL
    3. Canonical tag → preferred URL
  • Remove redirect chains and variant duplication.
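In markup, the target state is one line, repeated consistently — the domain and slug below are placeholders:

```html
<!-- On the preferred URL: the canonical points at itself -->
<link rel="canonical" href="https://example.com/blue-widgets/" />

<!-- On a variant URL (e.g. /blue-widgets/?sort=price): same target -->
<link rel="canonical" href="https://example.com/blue-widgets/" />
```

The same exact URL then goes in the sitemap and in every internal link, so Google never has to arbitrate between conflicting hints.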

4) Soft 404: page returns 200 but behaves like “not found”

A soft 404 is one of the most common reasons for “crawled but not indexed.” Google fetched it, but the content looked like a placeholder, thin “no results,” or a fake page.

Symptoms

  • Empty category/tag pages
  • “No products found” pages
  • Out-of-stock pages that become almost blank
  • Generic “This content is unavailable” pages returning 200

Fix

  • If the page shouldn’t exist: return 404 or 410 (don’t fake it with 200).
  • If it should exist: add meaningful content and clear purpose.
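If you have many candidate pages, a rough heuristic can triage them before you decide between 404/410 and rebuilding. The phrase list and the 300-character threshold below are assumptions to tune against your own templates, not magic numbers:

```python
# Phrases and threshold are assumptions — adapt them to your own site.
SOFT_404_PHRASES = ("no products found", "no results", "nothing matched",
                    "content is unavailable", "page not found")

def looks_like_soft_404(status: int, main_text: str, min_chars: int = 300) -> bool:
    """Flag a 200 response whose main content reads like 'not found'."""
    if status != 200:
        return False  # a real 404/410 is the honest signal, not a soft 404
    text = " ".join(main_text.split()).lower()
    if len(text) < min_chars:
        return True   # near-empty template: the classic soft-404 pattern
    return any(phrase in text for phrase in SOFT_404_PHRASES)
```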

5) The page is thin—not short—thin

Thin content is not about word count. It’s about whether the page reduces uncertainty and helps the user complete a job.

Symptoms

  • Mostly generic lines, definitions, repeated intros
  • Page says “it depends” with no decision criteria
  • Looks okay to the writer, but doesn’t solve the user’s real question

Fix (real fix)

  • Add decision value: examples, edge cases, workflow, what to check first, what to ignore, what to measure.
  • Make the first 10 seconds undeniable: “you’re in the right place and here’s what to do.”

6) Boilerplate dominates the page (template-to-content ratio is bad)

This happens when your header/footer/sidebar and repeated blocks are bigger than your unique content. Google crawls it and says “this is not a distinct document.”

Symptoms

  • Every page has identical FAQ sections, identical paragraphs
  • Only a small variable changes (city name, product name, tag name)

Fix

  • Increase unique main content.
  • Remove repeated blocks that add zero value.
  • Stop mass-generating near-duplicate pages unless they represent real search intent.

7) Duplicate URLs generated by filters, parameters, and faceted navigation

Ecommerce sites and large blogs love generating infinite URL variants: sorting, filtering, tracking, pagination, and more.

Symptoms

  • Many crawled URLs carrying ?sort=, ?filter=, or ?utm= parameters

  • GSC shows indexing issues on parameter URLs
  • Log files show Googlebot spending time on junk URLs

Fix

  • Decide which facet pages deserve indexing (very few do).
  • Canonical duplicates to the main category/page.
  • Stop linking to parameter variants internally.
  • Consider robots/meta rules for non-value parameter patterns.
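For the "stop linking to parameter variants internally" part, it helps to normalize URLs in one place before they reach your templates. A sketch using only the standard library — the NOISE_PARAMS list is an assumption you must adapt per site:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Which parameters are "noise" is site-specific — this set is an assumption.
NOISE_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "utm_term",
                "utm_content", "gclid", "fbclid", "sort", "view", "sessionid"}

def clean_internal_url(url: str) -> str:
    """Strip tracking/sort noise so internal links point at one clean URL."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k.lower() not in NOISE_PARAMS]
    # Rebuild without the noise parameters (and without fragments).
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), ""))
```

Route every internally generated link through one function like this and the parameter-duplication problem stops growing.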

8) Orphan pages: Google can crawl them, but your site doesn’t “vouch” for them

Sitemap alone is not a strong endorsement. If a URL has no meaningful internal links, Google may crawl it but treat it as low importance.

Symptoms

  • URL is only in sitemap
  • Not linked from hub pages, navigation, or relevant posts

Fix

  • Add contextual internal links from relevant pages.
  • Put it in a cluster: pillar → supporting posts → related links.

9) Rendering issues: the real content is behind JavaScript (especially on mobile)

This is where Next.js, React, and heavy JS sites get hit. Users “see” content because their browser runs JS. Googlebot might not render it the same way (or it renders but sees content too late, too unstable, or too broken).

Symptoms

  • View-source has almost no main content
  • GSC “HTML” looks empty or minimal
  • Errors appear in console logs during rendering
  • Mobile experience is slower, and content appears late

Fix

  • Prefer SSR/SSG for indexable pages.
  • Ensure main content exists in initial HTML response.
  • Reduce render-blocking, fix runtime errors, and keep critical content above the fold.
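A quick sanity check: take a phrase you can see on the rendered page and confirm it exists in the raw HTML response, i.e. before any JavaScript runs. A minimal sketch — the helper name is mine, and the script/style stripping is deliberately crude:

```python
import re

def in_initial_html(html: str, phrases):
    """Check which key phrases already exist in the raw HTML response."""
    # Drop script/style bodies so we only match actually visible markup,
    # not strings that a client-side framework would inject later.
    visible = re.sub(r"<(script|style)\b.*?</\1>", " ", html,
                     flags=re.IGNORECASE | re.DOTALL)
    return {p: p.lower() in visible.lower() for p in phrases}
```

If your key content only shows up as strings inside script tags, you are depending on rendering — exactly the fragile situation this cause describes.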

10) The page is slow/unreliable when Google crawls (timeouts / 5xx bursts)

Google doesn’t love unstable pages. If fetch quality is inconsistent, indexing becomes inconsistent too.

Symptoms

  • Random 5xx spikes in logs
  • High TTFB
  • Crawl anomalies in Search Console

Fix

  • Stabilize caching (server + CDN).
  • Fix database and server load bottlenecks.
  • Remove heavy plugins/scripts that slow first byte.
  • Validate with a smartphone user agent and a throttled connection (Google crawls mobile-first, so desktop-only testing can mislead you).
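Log files make the 5xx story concrete. A small aggregator that flags hours where Googlebot's 5xx share spikes — the 5% threshold is an arbitrary starting point, not a number from Google:

```python
from collections import defaultdict

def five_xx_bursts(entries, threshold=0.05):
    """entries: (hour_bucket, status) pairs parsed from Googlebot log lines.
    Returns the hours where the 5xx share exceeds `threshold`."""
    totals, errors = defaultdict(int), defaultdict(int)
    for hour, status in entries:
        totals[hour] += 1
        if 500 <= status < 600:
            errors[hour] += 1
    return {hour: errors[hour] / totals[hour]
            for hour in totals if errors[hour] / totals[hour] > threshold}
```

Correlate the flagged hours with deploys, cron jobs, or traffic spikes, and the "random" instability usually stops being random.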

11) Wrong intent match: you wrote “about the topic,” not “for the query”

This is the silent killer. Page looks professional, but it doesn’t match the job behind the keyword.

Symptoms

  • Query intent is “how to fix,” but page is definitions
  • Query intent is “compare,” but page is a general overview
  • Query intent is “is this normal,” but page is generic SEO advice

Fix

  • Rewrite the structure, not just the sentences:
    • Start with the answer
    • Then diagnostics
    • Then step-by-step actions
    • Then edge cases
    • Then a short checklist

12) New site trust curve + weak topical authority pattern

On a fresh site, Google crawls widely but indexes selectively. If your site feels scattered or low depth, indexing becomes conservative.

Symptoms

  • “Discovered – currently not indexed” and “Crawled – currently not indexed” across many URLs
  • Low topical depth in clusters
  • Too many pages published quickly with limited uniqueness

Fix

  • Publish in tight clusters (pillar + support).
  • Strengthen internal linking between related posts.
  • Keep sitemap clean (canonical only).
  • Focus on fewer, stronger pages first—then expand.

Real-world “quick win” playbook (what I do on day one)

If I want a fast improvement without guessing:

  1. Pick 10 important URLs with “crawled not indexed”
  2. For each:
    • Confirm 200 OK
    • Confirm noindex is absent
    • Check canonical and whether Google chose another
    • Check “view crawled page” rendering
    • Check the internal link count (is the page orphaned?)
  3. Fix the first obvious root cause
  4. Only then: request indexing

This avoids random changes and makes results predictable.
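The whole playbook collapses into one triage function once you've collected the inputs. A sketch — every field name here is my own label for something you read out of URL Inspection or your crawler, not an API:

```python
def first_root_cause(check: dict) -> str:
    """Walk the fix order (A -> E) and return the first blocking issue."""
    if check["status"] != 200:
        return "fix status code / redirect chain"
    if check["noindex"]:
        return "remove noindex"
    if check["google_chose_other_canonical"]:
        return "align canonical, sitemap, and internal links"
    if not check["rendered_content_ok"]:
        return "fix rendering / soft-404 content"
    if check["internal_links"] == 0:
        return "add contextual internal links (orphan page)"
    return "request indexing"
```

The point of encoding the order is discipline: you fix exactly one root cause per URL, then re-test, instead of changing five things at once.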


Mobile-first angle (because Google isn’t judging your desktop)

Even when the page is “fine” on your desktop, indexing can still fail if:

  • Mobile content is truncated or hidden behind accordions incorrectly
  • Layout shifts cause unstable rendering
  • JS heavy interactions delay content visibility
  • Fonts and CSS block meaningful paint
  • Cookie banners or overlays cover the content

A page that feels annoying on mobile is often a page that Google becomes cautious about indexing, especially on newer sites where trust is still forming.


E-E-A-T and “indexability trust” (how it connects in practice)

E-E-A-T is not a checkbox, but it influences whether pages feel like legitimate documents worth indexing.

On technical SEO content, “experience” shows up as:

  • Real workflows (what to check first, what fails in real sites)
  • Real examples (parameter traps, soft 404 patterns, canonical conflicts)
  • Clear, confident guidance (not vague “it depends”)

On the trust side:

  • Clear author identity (bio, experience, consistency)
  • Clean site structure (clusters, navigation, internal linking discipline)
  • No spam patterns (auto-generated thin pages)

This isn’t YMYL like medical advice, but it still sits in “trust territory.” If your content looks templated or mass-produced, Google becomes conservative.


Practical checklist (copy-paste for your workflow)

For each URL stuck in “crawled but not indexed” confirm:

  • ✅ Returns 200 consistently (no weird redirects)
  • ✅ No meta/header noindex
  • ✅ Canonical is self and consistent
  • ✅ Sitemap lists only canonical URLs
  • ✅ Internal links point directly to canonical
  • ✅ No soft-404 behavior / empty templates
  • ✅ Main content is visible without JS dependency (or SSR/SSG works)
  • ✅ Mobile UX is stable (no overlays hiding content, no big CLS)
  • ✅ No parameter duplication multiplying crawl noise
  • ✅ Page has unique value (examples, process, edge cases)

If you fix the story, indexing usually follows.

Ramin AmirHaeri
https://insights.ramfaseo.se

As Search Engine Optimization Manager at Magic Trading Company LLC, I lead strategic SEO initiatives that have significantly enhanced brand visibility in the GCC market. My work focuses on technical SEO audits, keyword research, and content marketing, all aligned with Google’s E-E-A-T and Core Web Vitals standards. These efforts have resulted in improved domain authority and substantial growth in organic traffic.

Through my agency, Ramfa SEO, I specialize in high-impact SEO strategies for international clients, achieving millions of indexed keywords across multiple countries. My areas of expertise include e-commerce SEO, technical SEO, and comprehensive SEO audits, with a results-oriented approach to boosting online presence in competitive markets.

Over the years, I’ve worked across a wide range of industries and website stacks — from WordPress and Shopify to custom-built platforms — and I’m comfortable collaborating with product, design, and engineering teams regardless of the language or framework behind the site. For me, SEO isn’t “one CMS” or “one tactic”; it’s a system that connects technical performance, content, and business goals into measurable growth. I enjoy working with teams that value clarity, long-term thinking, and clean execution — and I’m always open to thoughtful conversations where strategy, structure, and search performance matter.
