Common oversights that can prevent Google from crawling your content
Getting deindexed by Google can be one of the worst experiences of your SEO journey. Often, the reason is that Google cannot crawl your website. Your website only appears in the Google search results after Googlebot has crawled it. If the crawlers cannot crawl your site for any reason, it won’t appear or rank in the search results.
How does your website come to rank on a search engine? First, search engine bots crawl it and add it to their index. Then your website starts appearing on the SERPs for terms relevant to your content. SEO is what helps the website reach the first page of results. However, as we said earlier, SEO won’t help if your website isn’t even indexed by Google.
Today, we are going to show you some of the common reasons that stop Google from crawling your content. First, let’s look at how to find out whether your webpages are indexed at all.
Checking if your site is indexed by Google
There are multiple ways to see whether Google has indexed your site. One common approach is to go to Google and search for your domain with “site:” in front of it, i.e. “site:yourwebsiteurl.com”. The pages shown in the results have been indexed by Google.
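If you check several domains regularly, a small helper can build the search link for you. This is only a sketch: the function name site_query_url is made up for illustration, and it simply produces a URL to open in your browser rather than querying Google itself.

```python
from urllib.parse import urlencode


def site_query_url(domain):
    """Build a Google search URL for the query "site:<domain>"."""
    # urlencode percent-encodes the colon, so the query stays valid in a URL.
    return "https://www.google.com/search?" + urlencode({"q": f"site:{domain}"})


# The domain below is the placeholder used in this article.
print(site_query_url("yourwebsiteurl.com"))
# prints https://www.google.com/search?q=site%3Ayourwebsiteurl.com
```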
Note that it can take some time for your site to get crawled and indexed by Google. The waiting time varies from website to website.
A good way to help Google find and index your pages is to submit a sitemap in Google Search Console. There are many free tools available online to create a sitemap for your website.
Often, even after you submit a sitemap, some pages aren’t indexed by Google. A common reason is that Google didn’t crawl those pages. Let’s have a look at the common reasons that block Google from crawling your website.
Accidentally adding a noindex meta tag to a page’s head section
Accidentally adding a noindex meta tag to your website can get it deindexed from Google. If your site is not being picked up by any search engine, the first thing to do is check the code of your website. There may be a noindex meta tag in the head section of your code.
Check whether a robots meta tag with the value noindex, for example <meta name="robots" content="noindex">, exists in the <head> section of your page.
If it does, then this tag is telling search engines not to index your page. Even if crawlers have already crawled and indexed your website, accidentally adding this tag will get the affected pages de-indexed from Google. This happened to a well-known blogger whose web developer accidentally added a “noindex” meta tag to their code. They fixed the mistake within a day, but it took them more than six months to get their usual traffic back. You can read that story here: What Happens When You Accidently De-Index Your Website.
You should always be careful with the “noindex” tag. A mistake can have significant consequences for your website traffic.
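If you would rather not read through the page source by hand, the check can be scripted. The sketch below uses only Python’s standard library and assumes you already have the page’s HTML as a string; the class and function names are made up for this example, and it only looks at meta tags, not at HTTP headers.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the content values of robots-related meta tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        # name="robots" addresses all crawlers; name="googlebot" addresses Google only.
        if attr_map.get("name", "").lower() in ("robots", "googlebot"):
            self.directives.append(attr_map.get("content", "").lower())


def has_noindex(html_text):
    """Return True if any robots meta tag in the HTML contains noindex."""
    finder = RobotsMetaFinder()
    finder.feed(html_text)
    return any("noindex" in directive for directive in finder.directives)


# Two hypothetical pages: one blocked by accident, one without the tag.
sample_blocked = '<html><head><meta name="robots" content="noindex, nofollow"></head><body></body></html>'
sample_open = '<html><head><meta name="description" content="My page"></head><body></body></html>'

print(has_noindex(sample_blocked))  # prints True
print(has_noindex(sample_open))     # prints False
```

Running a check like this against your key pages after every deployment is one way to catch an accidental noindex before Google does.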
Submitting the wrong sitemap
Submitting the wrong sitemap, or a sitemap that contains errors, can be a reason why Google is not crawling your website well. Always check your sitemap’s “status” and “discovered URLs” in your Google Search Console account.
If there is a sitemap error, Google will also state the reason why your sitemap could not be fetched or why some of the URLs on your site could not be crawled. Look into those reasons and then fix the sitemap error.
If you are using WordPress, you can use a plugin like Yoast SEO to create your sitemap. With this method, you don’t need to submit the sitemap again and again; it is updated automatically whenever something changes on your website. If you are not a WordPress user, there are many free, reliable services online where you can create a sitemap of your site.
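Before submitting a sitemap, it can help to confirm that the file is well-formed and lists the URLs you expect. Here is a minimal sketch using Python’s standard library. It assumes a plain URL sitemap (a urlset, not a sitemap index) in the standard sitemaps.org namespace, and the function name and sample URLs are placeholders.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def list_sitemap_urls(sitemap_xml):
    """Return the <loc> values from a sitemap, or raise ValueError if it is malformed."""
    try:
        root = ET.fromstring(sitemap_xml)
    except ET.ParseError as exc:
        raise ValueError(f"Sitemap is not well-formed XML: {exc}") from exc
    locs = root.findall("sm:url/sm:loc", SITEMAP_NS)
    return [loc.text.strip() for loc in locs if loc.text]


# A tiny hypothetical sitemap with two pages.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc>https://www.example.com/blog/</loc></url>
</urlset>"""

urls = list_sitemap_urls(sample)
print(len(urls), "URLs found")
for url in urls:
    print(url)
```

If the number of URLs printed is far from what you expect, or the function raises an error, that is worth fixing before you look at the report in Search Console.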
Mistakes in the robots.txt file
This is one of the most common reasons Google cannot crawl a website. You may be accidentally blocking Google or other search engine crawlers from crawling your site through your robots.txt file.
A robots.txt file allows or blocks search engine crawlers from crawling specific parts of your website. With robots.txt, you can tell search engine crawlers which files or pages they can or can’t crawl.
Sometimes people accidentally block crawlers from accessing part of their website, or the whole website, through their robots.txt file. You can see all the blocked pages in your Google Search Console account.
All the pages that are blocked from crawlers through your robots.txt can generally be found in the “excluded” section. Note that some of the pages in the “excluded” section may be there for other reasons as well.
Because robots.txt is such a common reason why websites don’t get crawled by Googlebot, we have put together a short tutorial on how to configure your robots.txt file. You will find it in the next section.
How to configure your robots.txt file
The first thing you need to do is find out whether you have a robots.txt file on your website. To do so, add “/robots.txt” after your site URL, i.e. yoursitedomain.com/robots.txt.
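A typical robots.txt file is a short plain-text file made up of a User-agent line followed by one or more Disallow (and sometimes Allow) lines.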
If you have already added a robots.txt file and can’t find it there, it may be sitting in a subfolder, like www.example.com/index/robots.txt. Always put your robots.txt file in the root directory, or else it won’t be discovered by Google or any other crawler.
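To see where crawlers will look for the file, you can derive the root-level robots.txt address from any page on your site. The helper below is a small sketch; its name is invented for this example, and it assumes you pass a full URL including the scheme (https://).

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the root-level robots.txt URL for the site that hosts page_url."""
    parts = urlsplit(page_url)
    # Keep only scheme and host; crawlers look for robots.txt at the root path.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://www.example.com/index/page.html?ref=home"))
# prints https://www.example.com/robots.txt
```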
Below are some of the common robots.txt directives that you should know:
User-agent: * – This line opens a group of rules that applies to all crawlers. The lines below it tell them what they may or may not crawl on your site.
User-agent: Googlebot – This line opens a group of rules that applies only to Google’s crawler.
Disallow: / – This rule blocks the crawlers named in the User-agent line from crawling any page of your site.
Disallow: – A Disallow rule with an empty value blocks nothing, so the crawlers named in the User-agent line can crawl your entire site.
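To see how these directives play out, you can feed small robots.txt snippets to Python’s built-in urllib.robotparser module. This is a sketch for experimenting, not a replica of Googlebot: Python’s parser may differ from Google’s in edge cases, and the helper name and sample rules are made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Three hypothetical robots.txt files.
block_all = "User-agent: *\nDisallow: /\n"
allow_all = "User-agent: *\nDisallow:\n"
google_only = "User-agent: Googlebot\nDisallow: /private/\n"

page = "https://www.example.com/page"
private_page = "https://www.example.com/private/report"

print(can_crawl(block_all, "Googlebot", page))            # prints False
print(can_crawl(allow_all, "Googlebot", page))            # prints True
print(can_crawl(google_only, "Googlebot", private_page))  # prints False
print(can_crawl(google_only, "Bingbot", private_page))    # prints True
```

The last two lines show that a group opened with User-agent: Googlebot affects only Google’s crawler, while other crawlers are left unrestricted.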
Now, open your robots.txt file and check whether it is blocking any of your important pages. If it is blocking an important page that you want crawled, update your robots.txt file and then submit the updated file in Search Console. Here’s a guide by Google on How to submit an updated robots.txt file to Google Search Console.
If you are using WordPress, then you can use the Yoast SEO plugin to edit your robots.txt file.
When do you need to block crawlers?
You may have wondered why we even need to block pages or use a robots.txt file on our website. There are many reasons: you may not want search engines to crawl and index specific pages of your site, or you may not want your whole site to appear in search results. But the most important reason is this:
Google has a limited crawl budget. Yes, you heard that right! Google has to crawl billions of pages on the internet, and it has limited crawling resources. By letting Google crawl your irrelevant pages, you use up crawl budget that could have gone to your important pages. If you have a big website with a lot of pages, you should use the robots.txt file to block irrelevant or useless pages from getting crawled by Google.
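As an illustration, the snippet below generates a robots.txt that keeps crawlers away from a few low-value sections. The paths /search/ and /cart/ are hypothetical examples, and the function name is made up; which sections count as irrelevant depends entirely on your own site.

```python
def build_robots_txt(disallowed_paths, user_agent="*"):
    """Build robots.txt text that blocks the given paths for one user agent."""
    lines = [f"User-agent: {user_agent}"]
    if disallowed_paths:
        lines.extend(f"Disallow: {path}" for path in disallowed_paths)
    else:
        # An empty Disallow value blocks nothing.
        lines.append("Disallow:")
    return "\n".join(lines) + "\n"


# Hypothetical low-value sections of a large site.
robots_txt = build_robots_txt(["/search/", "/cart/"])
print(robots_txt)
```

Save the output as robots.txt in the root directory of your site, and double-check that none of your important pages live under the blocked paths.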
Getting crawled and indexed by Google is the first step of your Search Engine Optimization. All your SEO efforts will be in vain if your pages are not getting indexed by Google. You should always be aware of which pages are not getting indexed. Many people keep creating content and doing SEO, only to find out later that Google has indexed none of their pages.
Always keep an eye on your Search Console account for any new errors related to indexing. Take care not to make a mistake that deindexes your whole site or any of its pages, as the consequences can be terrible.