Google’s John Mueller recently answered a question about phantom noindex errors reported in Google Search Console. Mueller said that these reports may in fact be real.
Noindex In Google Search Console
A noindex robots directive is one of the few instructions that Google must obey, one of the few ways in which a site owner can exercise control over Googlebot, Google’s indexer.
And yet it’s not entirely uncommon for Search Console to report being unable to index a page because of a noindex directive when the page seemingly doesn’t have one, at least none that’s visible in the HTML code.
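For context, a noindex can live in two places: a robots meta tag in the page’s HTML, or an X-Robots-Tag HTTP response header. Here is a minimal Python sketch that checks both locations for a page; the URL is a placeholder, and it relies on the third-party requests library:

```python
# Minimal sketch: check both places a noindex directive can live.
# The URL is a placeholder.
import re
import requests

url = "https://example.com/page"
resp = requests.get(url, timeout=10)

# 1. HTTP header form, e.g. "X-Robots-Tag: noindex"
print("X-Robots-Tag:", resp.headers.get("X-Robots-Tag", "(not set)"))

# 2. HTML form, e.g. <meta name="robots" content="noindex">
meta = re.findall(r'<meta[^>]+name=["\'](?:robots|googlebot)["\'][^>]*>',
                  resp.text, flags=re.IGNORECASE)
print("robots meta tags:", meta or "(none found)")
```

If both come back clean from your own machine, the directive may still be served conditionally, which is exactly what the troubleshooting steps below look for.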
When Google Search Console (GSC) reports “Submitted URL marked ‘noindex’,” it’s describing a seemingly contradictory situation:
- The site asked Google to index the page via an entry in a sitemap.
- The page sent Google a signal not to index it (via a noindex directive).
It’s a confusing message from Search Console that a page is preventing Google from indexing it when that’s not something the publisher or SEO can observe happening at the code level.
The person asking the question posted on Bluesky:
“For the past 4 months, the website has been experiencing a noindex error (in ‘robots’ meta tag) that refuses to disappear from Search Console. There is no noindex anywhere on the website nor robots.txt. We’ve already looked into this… What could be causing this error?”
Noindex Shows Only For Google
Google’s John Mueller answered the question, sharing that in the cases he has looked at where this kind of thing was happening, there was always a noindex showing to Google.
Mueller responded:
“The cases I’ve seen in the past were where there was actually a noindex, just sometimes only shown to Google (which can still be very hard to debug). That said, feel free to DM me some example URLs.”
While Mueller didn’t elaborate on what might be happening, there are ways to troubleshoot this issue and find out what’s going on.
How To Troubleshoot Phantom Noindex Errors
It’s possible that there’s code somewhere that’s causing a noindex to show only to Google. For example, it may have happened that a page at one time had a noindex on it, and a server-side cache (like a caching plugin) or a CDN (like Cloudflare) cached the HTTP headers from that time, which in turn would cause the stale noindex header to be shown to Googlebot (because it frequently visits the site) while a fresh version is served to the site owner.
Checking the HTTP header is easy; there are many HTTP header checkers, like this one at KeyCDN or this one at SecurityHeaders.com.
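If you prefer to script the check, the sketch below fetches the same page with a browser-like and a Googlebot-like user agent and compares the X-Robots-Tag header; a stale cache entry or a CDN rule keyed on user agent shows up as a mismatch. The URL is a placeholder:

```python
# Sketch: compare the X-Robots-Tag header across two user agents.
# A stale cache or a CDN rule that varies on User-Agent will show
# up as a mismatch between the two responses. Placeholder URL.
import requests

url = "https://example.com/page"
agents = {
    "browser":   "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; "
                 "+http://www.google.com/bot.html)",
}
for name, ua in agents.items():
    r = requests.get(url, timeout=10,
                     headers={"User-Agent": ua, "Cache-Control": "no-cache"})
    print(f"{name:>9}: status={r.status_code}, "
          f"X-Robots-Tag={r.headers.get('X-Robots-Tag', '(not set)')}")
```

Keep in mind that a CDN verifying Googlebot by reverse DNS may treat a spoofed user agent differently, which is why the Google-side tests described below are more reliable.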
A 520 server header response code is one that’s sent by Cloudflare when it’s blocking a user agent.
Screenshot: 520 Cloudflare Response Code
Below is a screenshot of a 200 server response code generated by Cloudflare:
Screenshot: 200 Server Response Code
I checked the same URL using two different header checkers, with one header checker returning a 520 (blocked) server response code and the other returning a 200 (OK) response code. That shows how differently Cloudflare can respond to something like a header checker. Ideally, check with multiple header checkers to see whether there’s a consistent 520 response from Cloudflare.
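You can approximate the same comparison with a short script that requests the URL a few times and logs the status codes, though every request comes from your own IP address, so several independent header checkers remain the more telling test. A rough sketch with a placeholder URL:

```python
# Rough sketch: request the same URL a few times and log the status
# codes to spot intermittent Cloudflare 520 responses. Placeholder URL.
import time
import requests

url = "https://example.com/page"
for attempt in range(1, 6):
    try:
        r = requests.head(url, timeout=10, allow_redirects=True)
        print(f"attempt {attempt}: {r.status_code}")
    except requests.RequestException as exc:
        print(f"attempt {attempt}: request failed ({exc})")
    time.sleep(2)  # brief pause so the loop itself doesn't trip rate limits
```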
In the scenario where a web page is showing something exclusively to Google that’s otherwise not visible to someone looking at the code, what you need to do is get Google to look at the page for you, using an actual Google crawler from a Google IP address. The easiest way to do that is by dropping the URL into Google’s Rich Results Test. Google will dispatch a crawler from a Google IP address, and if there’s something on the server (or a CDN) that’s showing a noindex, this will catch it. In addition to the structured data, the Rich Results Test will also show the HTTP response and a snapshot of the web page, showing exactly what the server serves to Google.
When you run a URL through the Google Rich Results Test, the request:
- Originates from Google’s data centers: The bot uses an actual Google IP address.
- Passes reverse DNS checks: If the server, security plugin, or CDN checks the IP, it will resolve back to googlebot.com or google.com.
If the page is blocked by noindex, the tool will be unable to produce any structured data results. It should show a status saying “Page not eligible” or “Crawl failed”. If you see that, click the link for “View Details” or expand the error section. It should show something like “Robots meta tag: noindex” or “‘noindex’ detected in ‘robots’ meta tag”.
This approach doesn’t send the Googlebot user agent; it uses the Google-InspectionTool/1.0 user agent string. That means if the server block is by IP address, then this method will catch it.
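A programmatic complement to the Rich Results Test, not mentioned by Mueller but reporting the same kind of Google-side data, is the Search Console URL Inspection API, which states exactly why Google considers a page blocked. Below is a minimal sketch assuming you have OAuth credentials for a verified Search Console property and the google-api-python-client library installed; the token file and URLs are placeholders:

```python
# Minimal sketch of the Search Console URL Inspection API. It reports
# why Google treats a page as blocked, e.g. BLOCKED_BY_META_TAG or
# BLOCKED_BY_HTTP_HEADER. "token.json" and both URLs are placeholders.
from google.oauth2.credentials import Credentials
from googleapiclient.discovery import build

creds = Credentials.from_authorized_user_file(
    "token.json",
    scopes=["https://www.googleapis.com/auth/webmasters.readonly"],
)
service = build("searchconsole", "v1", credentials=creds)
result = service.urlInspection().index().inspect(body={
    "siteUrl": "https://example.com/",            # the verified property
    "inspectionUrl": "https://example.com/page",  # the page to inspect
}).execute()

status = result["inspectionResult"]["indexStatusResult"]
print("verdict:       ", status.get("verdict"))
print("indexingState: ", status.get("indexingState"))
print("robotsTxtState:", status.get("robotsTxtState"))
```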
Another angle to check covers the scenario where a rogue noindex tag is specifically written to block Googlebot: you can still spoof (mimic) the Googlebot user agent string with Google’s own User Agent Switcher extension for Chrome, or configure an app like Screaming Frog to identify itself with the Googlebot user agent, and that should catch it.
Screenshot: Chrome User Agent Switcher
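The same user agent spoofing can also be scripted in a few lines, shown in the sketch below with a placeholder URL. Note that a server validating Googlebot by reverse DNS won’t be fooled by a spoofed user agent string, which is the gap the Rich Results Test covers.

```python
# Sketch: request the page while identifying as Googlebot and print any
# robots meta tags returned, similar to crawling with Screaming Frog set
# to the Googlebot user agent. Placeholder URL; a reverse DNS check on
# the server side will still see that this request is not from Google.
import re
import requests

GOOGLEBOT_UA = ("Mozilla/5.0 (compatible; Googlebot/2.1; "
                "+http://www.google.com/bot.html)")

resp = requests.get("https://example.com/page", timeout=10,
                    headers={"User-Agent": GOOGLEBOT_UA})
print("X-Robots-Tag:", resp.headers.get("X-Robots-Tag", "(not set)"))
tags = re.findall(r'<meta[^>]+name=["\'](?:robots|googlebot)["\'][^>]*>',
                  resp.text, flags=re.IGNORECASE)
for tag in tags or ["(no robots meta tag served to this user agent)"]:
    print(tag)
```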
Phantom Noindex Errors In Search Console
These kinds of errors can feel like a pain to diagnose, but before you throw your hands up in the air, take some time to see whether any of the steps outlined here will help identify the hidden reason responsible for this issue.
Featured Image by Shutterstock/AYO Production