
Disallowing Bots in robots.txt is ruining your SEO

Disallowing the Googlebot through the robots.txt file can ruin many years of SEO effort, but if you notice it in time, fixing it is ridiculously simple. On the other hand, the same robots.txt file can save an online store relaunch and help you do the relaunch the right way. It can also help you manage your crawling budget and index a new online store a bit faster.

Have a Question?

To ask a practical question:

To help you with obstacles that occur while you are working for an employer or a client, we need a lot more input. That data is sensitive, therefore SEONAUTs have the option to ask the mentor 1-on-1 questions in private and provide more details.

To ask a theoretical question:

We love the SEOLAXY community and we have provided free answers on YouTube for many years. Today it is no longer physically possible to answer all of them, but we are still committed to answering all theoretical questions and questions about the lesson in the YouTube comments.

Lesson Transcript:

The robots.txt File Can Both Ruin and Save Your Online Store

Disallowing the Googlebot through the robots.txt file can ruin many years of SEO effort, but if you notice it in time, fixing it is ridiculously simple. On the other hand, the same robots.txt file can save an online store relaunch and help you do the relaunch the right way. It can also help you manage your crawling budget and index a new online store a bit faster. Let's dig in.

What is a robots.txt File?

The robots.txt file is a standard that websites use to communicate with web crawlers like the Googlebot and other bots. It is part of the Robots Exclusion Protocol and is used primarily to manage and control the behavior of search engine bots that visit the website, in our case an online store. The robots.txt file tells the Googlebot which pages or sections of a website should not be crawled.

Why is robots.txt Important for SEO?

The robots.txt file plays a critical role in controlling and optimizing how search engine bots crawl and index an online store, which can significantly impact your online store's search engine performance. For example, a single rule disallowing the Googlebot from entering your online store can undo years of work. Mid-size and especially large online stores should carefully manage their indexing and crawling budget, and a part of those tasks is taking care of the robots.txt instructions.

Use cases of robots.txt for Ecommerce SEO

Almost all online stores deny the Googlebot access to URLs that are unimportant for SEO, such as the admin panel, special versions of parameter URLs, and often internal search results. But those tasks are not the most important ones from an Ecommerce SEO perspective. The only important case you should know about right now is that robots.txt can ruin all of your SEO efforts, or those of your predecessors, through one single mistake: having one fatal character wrong in the most important instruction in the robots.txt file. Removing that one character can save an online store from an SEO catastrophe. And no, this is not a breaking-news headline. Denying entry to the Googlebot through the robots.txt file is not a rare case, and it really does have a catastrophic impact on SEO. In this lesson we are going to see how you can detect that big mistake and remove it, and how you can use the robots.txt file to speed up indexing for new online stores.

The robots.txt Syntax

Before we can do that, we need to understand the robots.txt syntax. Syntax is a fancy technical word for a set of rules for putting words and symbols together in a language, whether it's for writing sentences, writing code, or writing instructions like in this case. Luckily, the syntax of robots.txt has only three rules and only a few words. It's not complicated, I promise.

Let's take a look at how an ordinary robots.txt should look if there are no restrictions (the first example file below). The first line says that we are writing a rule for all user agents. "User agent" is again a fancy name for a bot like the Googlebot. Writing a star equals writing all bots, so this is a rule for all bots. The second line clearly states that we are disallowing something. After the colon we should write what we are disallowing. Because there is nothing after the colon, we are not disallowing anything. Quite confusing, right? So this rule reads: all user agents are not disallowed anything, or in other words, all user agents are allowed everything. This is a perfectly normal and good robots.txt file.

Now look carefully. If we only add a slash after the second colon, we get the second example file below. The slash symbolizes the root of a domain, meaning we are disallowing bots from entering our domain at all. For example, if our online store uses the domain name myonlinestore.com, the slash is interpreted as https://myonlinestore.com/, meaning we are disallowing entry to our online store for all bots, including the Googlebot. One wrong slash and you have blocked the Googlebot from entering your online store.

Sadly, in most cases the Googlebot very strictly follows the robots.txt recommendations. We could call them restrictions or rules, but most of the time they are just recommendations, because sometimes the Googlebot ignores them, though not if you write this particular command. So if your robots.txt file looks like the second example, you shouldn't panic, but you should act immediately if you want to allow the Googlebot to enter your online store, because it won't if you leave it like this. For those using WordPress or its shop CMS WooCommerce: be aware that ticking the tiny checkbox that discourages search engines from indexing your site does exactly this. It writes that command in the robots.txt, so please never click it and never use this command if you are not 100% sure you don't want the Googlebot to enter your online store. So just by removing that slash and saving the robots.txt file, you have undone a possible catastrophe.
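Here are the two files just described, reconstructed from the lesson (myonlinestore.com stands in for your own domain). First, the harmless file that allows everything:

    # robots.txt that allows all bots to crawl everything
    # (nothing after the colon means nothing is disallowed)
    User-agent: *
    Disallow:

And the same file with the one fatal character added. The single slash stands for the domain root, so it locks all bots, including the Googlebot, out of the entire store:

    # robots.txt that blocks ALL bots from the ENTIRE domain
    User-agent: *
    Disallow: /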

When Should We Disallow Access for the Googlebot via robots.txt?

So is there a case where we should disallow access for the Googlebot via the robots.txt file? Yes, a very important one. If you are creating a new online store, you should disallow the Googlebot from accessing it while you are working on it and until you are ready for the launch. It is the same as going on a first date: you usually want to present yourself in your best version, so you should do the same with your online store's first encounter with the Googlebot. Sometimes that alone won't be enough to block the curious Googlebot. In addition, you want to password-protect your online store from all eyes, your competition and the Googlebot alike. You have many ways to do it, with a plugin or add-on depending on which shop CMS you are using, or simply google "htpasswd" and hand that to your developers (a sketch follows below).
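For reference, here is a minimal sketch of HTTP basic authentication on an Apache server, the kind of protection the htpasswd utility sets up. The file paths and user name are illustrative assumptions; your developers will adapt them to your server:

    # .htaccess in the web root of the staging store (illustrative paths)
    AuthType Basic
    AuthName "Staging"
    AuthUserFile /home/mystore/.htpasswd
    Require valid-user

    # The matching password file is created with the htpasswd utility:
    # htpasswd -c /home/mystore/.htpasswd myuser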

Using the Google Site Parameter to Check Indexing

If you have already made that mistake, meaning your online store is under construction but you didn't disallow Googlebot access to it via the robots.txt, you can do it right now, and you should check whether Google has already indexed your online store by using the site parameter. You simply open Google and enter "site" followed by a colon and then your domain name, without https or www, just the domain name, and hit enter or search (see the example query below). If you see no results, you're lucky. If you see a few URLs on that list, don't worry: add the disallow rule right now, block any access with the already mentioned methods, and in a few days those URLs will disappear from Google. In 99% of cases they don't rank anyway, so nobody can find them right now.

Please note that this applies only to new online stores. If you are doing an online store relaunch, this error could be fatal, so we should mention how to prevent it. This is actually the most important use of the robots.txt file. If your team is preparing an online store relaunch or redesign, you want to do that behind the curtains, right? Good online stores use test servers, also known as staging servers, where they test functionality, usability, and SEO requirements before making the relaunch public. Very often that version will be placed at something like test.myonlinestore.com or staging.myonlinestore.com, but sometimes they will use just another place. Wherever they put it, your job as an Ecommerce SEO Specialist is to tell them that they have to disallow entry for the Googlebot from the beginning and hide it behind restricted access, as mentioned.
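The query from the lesson, using our example domain (replace it with your own):

    site:myonlinestore.com

Note that there is no space after the colon and no https:// or www in front of the domain name.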

Subdomain Googlebot Access Management

So is it possible to manage Googlebot access for a subdomain through the domain's robots.txt file? No, it is not. Google treats every subdomain as a standalone domain from every perspective, and especially from the SEO perspective. Google has made big changes in the last updates and made this publicly clear: subdomains of high-authority websites profited earlier from that domain authority, but now that is not the case anymore. Regarding robots.txt, it was always the case that each subdomain needs its own robots.txt file (see the sketch below).
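A sketch of what that means in practice for the staging subdomain from the previous section:

    # Served at https://staging.myonlinestore.com/robots.txt
    # This file governs ONLY the staging subdomain, not myonlinestore.com
    User-agent: *
    Disallow: /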

Where Should the robots.txt Be Placed?

The robots.txt file always has to be located in the root folder; it can't be placed inside any folder. For our example domain that is https://myonlinestore.com/robots.txt, and for a subdomain it is the same: https://staging.myonlinestore.com/robots.txt. So now you know where to place the robots.txt file, but what if you don't have any robots.txt file at all?

robots.txt Not Found/Unreachable/Not Fetched/Not Accessible

Let's take a look at that. What do you do if you get a warning inside Google Search Console, or any other place, like "robots.txt not found", "robots.txt unreachable", "robots.txt not fetched", or "robots.txt not accessible"? Those warnings are telling you only one thing: either your robots.txt doesn't exist or it is not in the right place. If Google can't find a robots.txt file, it assumes that you are allowing access to every bot and behaves like that. So if you want to grant access for all bots to everything, not having a robots.txt file is not critical. Still, it would be nice to at least create one. Here is the content of that robots.txt (below). Pause the video, make a screenshot or write it down; you will need it later in your career.
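The minimal "allow everything" file referred to in the lesson:

    User-agent: *
    Disallow: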

How to Create a robots.txt File?

You can create a robots.txt file using the TextEdit app on macOS or the Notepad app on Windows by simply writing exactly the two lines shown above and saving it as a .txt file. But if you are working with developers, you should let them do the work. Just inform them about it, write down what it should look like and where they should place it, and check afterwards whether they have done it by verifying that the file exists in the right place and has the right content. In the SEOLAXY ACADEMY and MASTERCLASS you will learn how to write tickets for developers and a lot more about agile management, but this fix can be done by email. If you need the ticket, you can download it in the SEOLAXY resources.

You can open the robots.txt file of any domain, if it exists, just by requesting the file directly, for example https://myonlinestore.com/robots.txt. And voila, you can see it. This is always the case, so it is easy to check the robots.txt file of your online store and see if the developers have done it right.

If you open a few robots.txt files, you are going to notice that some of them are simple and some of them have more than two or three lines. Take a look at the default robots.txt files of a WooCommerce, Shopify, Magento, Shopware, or Wix online store. They might confuse you right now, but again, there is no need to worry. The default robots.txt files are almost always safe to use and allow the Googlebot entrance. Just look out for the two dangerous lines from earlier, "User-agent: *" followed by "Disallow: /", if you notice indexing or crawling problems. If you see those, you already know how to react.

In most cases some self-proclaimed SEO plugin makes that mess, and the misuse of a real SEO plugin by you can also be dangerous. So if your online store is using an SEO plugin, check whether it is a real SEO plugin or some wannabe SEO plugin. Beware: the popularity of an SEO plugin doesn't guarantee that it is safe to use. We will cover an overview of them when the time comes. Until then, make sure you know what you are clicking on. We have seen how big the damage can be if you tick the wrong checkbox inside the most popular CMS. WordPress and WooCommerce are great CMSes for many use cases, but nobody can protect you from clicking that checkbox.

Now let's take a look at the robots.txt file and how you can write many recommendations for the Googlebot together. The most important of them is which things you would like the Googlebot not to crawl, for example the admin panel. Remember, these are seen as recommendations by the Googlebot, not as rules.
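As a concrete reference point, this is the default robots.txt that a standard WordPress/WooCommerce installation typically generates (it can differ per store and plugin setup, so treat it as an illustration rather than a guarantee):

    User-agent: *
    Disallow: /wp-admin/
    Allow: /wp-admin/admin-ajax.php

It only keeps bots out of the admin area, while explicitly allowing one admin URL that front-end features rely on; the Googlebot can still crawl the whole store.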

Disallow Access to a Certain Folder in robots.txt

There is no need to block access for the Googlebot to the admin panel or the checkout of an online store. But if you want to do it, how do you do it? You write a new rule for every folder (see the sketches below). For example, for the admin panel: in the robots.txt of WordPress you can find exactly such an instruction, and in Shopify's default file a similar one. You can disallow not only certain folders, you can also disallow certain paths or certain parameters, but this is way too much for you right now. You have already learned the most important thing in this lesson for Ecommerce SEO Juniors.

Now let's take a look at how to block a certain bot like the Googlebot. Instead of the star, we write the official name of that bot. For the Googlebot it is just "Googlebot", so a rule addressed to that name blocks just the Googlebot, and combining two rule groups allows the Googlebot while blocking all other bots. You can also stack instructions, for example to deny all bots access to several folders, so you don't need to repeat the user-agent instruction every time. If you have multiple rules for one of them or all of them, you can group them.
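A few sketches of the patterns just described. The folder names /admin/ and /checkout/ are purely illustrative; WordPress uses /wp-admin/ and Shopify's default file disallows its own /admin path:

    # Disallow one folder for all bots (e.g. an admin panel)
    User-agent: *
    Disallow: /admin/

    # Block just the Googlebot from the whole store
    User-agent: Googlebot
    Disallow: /

    # Allow the Googlebot everything, block all other bots
    User-agent: Googlebot
    Disallow:

    User-agent: *
    Disallow: /

    # Stack (group) several rules under one user agent
    User-agent: *
    Disallow: /admin/
    Disallow: /checkout/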

Adding a Sitemap to the robots.txt

One useful and safe thing to do as a Junior Ecommerce SEO Specialist is to add the path to your sitemap inside the robots.txt file. That makes the Googlebot's job easier, because it will try to crawl the URLs in that sitemap with a higher priority. We will show you in one of the next SEOLAXY SCHOOL lessons how sitemaps should be created and structured, but for now you should just know about this possibility. Adding a sitemap to a regular robots.txt file means adding just one line, and it is way safer to write the whole URL instead of the relative URL (see the example below). The sitemap, unlike the robots.txt file, does not have to be in the domain root; it can be located anywhere, and it does not have to be an XML file, it could also be a TXT file, for example.

Congratulations! You have learned how to prevent the most critical SEO catastrophe possible and how to speed up the indexing of a new online store. If you got value out of this, please consider becoming a SEOLAXY member, whom we call SEONAUTs, and boost your SEO career by becoming a certified Ecommerce SEO Specialist. Inside the SEOLAXY community you will meet others who are on the same journey as you. In the next lesson we will cover the .htaccess file and the second biggest possible SEO mistake, and learn about redirections, which play a very important part in Ecommerce SEO. See you next time.
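Adding the line to the minimal file from earlier gives the complete "good" robots.txt. The file name sitemap.xml is the common default; the location on your store may differ:

    User-agent: *
    Disallow:

    Sitemap: https://myonlinestore.com/sitemap.xml

The relative form (Sitemap: /sitemap.xml) is the variant the lesson advises against; always write the full URL.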
