Introduction to robots.txt → https://goo.gle/4gbNmcl
Control what you share with Google → https://goo.gle/3VnyLBU
Open Source robotstxt library & tester → https://github.com/google/robotstxt
What is a robots.txt file, and how does it affect your website's indexability? Join Martin Splitt in this Search Lightning Talk as he covers how robots.txt interacts with robots meta tags and HTTP headers, and how to use them to control search engine crawlers' access to your website.
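By way of illustration, the mechanisms the talk covers look like this; the rules below are generic examples, not rules from the video. A robots.txt file at the site root controls crawling, while a robots meta tag in a page's HTML, or an equivalent X-Robots-Tag HTTP response header, controls indexing:

    # robots.txt at https://example.com/robots.txt: block crawling of a path
    User-agent: *
    Disallow: /private/

    <!-- robots meta tag in a page's <head>: allow crawling, block indexing -->
    <meta name="robots" content="noindex">

    # equivalent HTTP response header, useful for non-HTML files
    X-Robots-Tag: noindex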
Chapters:
0:00 – Intro
0:17 – Why do I need robots.txt?
0:40 – What is the robots meta tag?
2:47 – When to use noindex vs. disallow
3:48 – Why is Googlebot still crawling certain pages?
5:14 – Conclusion
Watch more Search Central Lightning Talks → https://goo.gle/lightning-talks
Subscribe to the Google Search Central YouTube Channel → https://goo.gle/SearchCentral
#GoogleSearch #SearchLightningTalks
Speaker: Martin Splitt
Products Mentioned: Search Console
Thanks
Has anyone tried using HasData while respecting robots.txt rules on their projects? Would love to hear how it handles that!
Hi, I have a custom URL for my site, which is on the Blogger platform. The site is connected to AdSense, but I keep getting an earnings error message whenever I log in to AdSense. I have pasted the code into the Blogger root directory in the Blogger dashboard several times, from mobile, tablet, and laptop, and I still can't clear the error. Does anyone know how to fix this? I've done everything I can think of and I'm not making progress. Any info would be appreciated. Thanks.
Does the robots.txt file take priority over a sitemap?
This video answered a question I had long been meaning to ask. robots.txt is probably the most obvious way for me to control traffic, but that could mean exposing endpoints that aren't supposed to be public. Using the X-Robots-Tag header is way better for me, since it's controlled within my app/site's layer and means I can selectively include or exclude content. So thanks for making this video!
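For anyone who wants to do something similar, here is a minimal sketch of setting the header selectively in the application layer, assuming a Python/Flask app; the route and content are hypothetical:

    from flask import Flask

    app = Flask(__name__)

    @app.route("/reports/<path:page>")
    def internal_report(page):
        # Hypothetical endpoint that should stay out of search results.
        resp = app.make_response("report: " + page)
        # Tell crawlers that fetch this URL not to index it.
        resp.headers["X-Robots-Tag"] = "noindex"
        return resp

Pages served by other routes carry no X-Robots-Tag header, so only the routes you mark are kept out of the index while remaining crawlable.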
I don't understand why Google can't just assume that if a website owner blocks something in robots.txt, it's not meant to be indexed. You create a conundrum for us here. If we open up a path with potentially infinite URLs, or at least hundreds of thousands, just so you can see the noindex tag, it still creates crawling issues. I have seen the messages in GSC and the log file entries. Google never seems to stop trying; YEARS go by and it's still checking these noindexed URLs. The ratio of these bunk URLs to real content being crawled is sometimes 99 to 1, even years after the noindex robots meta tag was implemented.
Making us choose between keeping a URL from being crawled and keeping it from being indexed is ridiculous. WHY?! What is the point? Please tell me, because I have been doing this for 17 years and this is something I simply cannot understand. Every SEO I talk to has the same problem. It would be so easy to fix: if we're blocking something from being crawled, just assume we don't want it indexed either. EASY! In what situation would someone block a URL that they want indexed? Perhaps by accident, but that's their problem.
Now who would want to block SteveBot from crawling?
Hey Martin, is an increasing number of "blocked by robots.txt" URLs in the Index Coverage report a negative signal for Google? Or is it fine as long as I know where it comes from?
Context:
I purposely blocked a lot of parameterized URLs from crawling on my site, because there are literally millions of them and I can't get rid of them. I don't want Googlebot to waste budget on those, and I don't want to allow crawling and noindex them, because then Google would still need to crawl them and waste the budget anyway. So I believe blocking them is just fine, but the growing report is frightening me a little bit :/
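As an illustration of the setup described above, such a robots.txt rule might look like this (the pattern is hypothetical; Google's robots.txt parser supports the * wildcard):

    User-agent: Googlebot
    # Keep any URL with a query string out of the crawl (hypothetical pattern).
    Disallow: /*?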
Ok, thank you! That is all well and good, but I was just wondering… Why did you destroy search?
Of course, Yes! Martin, please keep us posted! 🙌
thank you
It would be great to have something that combines a disallow in robots.txt with a noindex tag. Maybe a noindex rule inside robots.txt, for when I don't want a bot to crawl a page or index it.
Besides that, another interesting situation to explain would be a page that can be accessed from different URLs (because of parameters, etc.). It sounds like a canonical situation, but I don't want to rely on that, because I know which one is the main URL and I don't want to risk indexing the wrong one. Probably a noindex would be the way to go there, as in the sketch below.
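For the multi-URL situation described above, a sketch of the two options, using a hypothetical example.com URL: a canonical link on the parameterized variants points Google at the main URL, while a noindex robots meta tag keeps a variant out of the index outright:

    <!-- on /page?color=red, declare the preferred URL -->
    <link rel="canonical" href="https://example.com/page">

    <!-- or keep this variant out of the index entirely -->
    <meta name="robots" content="noindex">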
I know. ❤
If robots.txt says "don't crawl", why the need to add it to the index? So people take the rule out and leave the meta tag on the page, so Google can still crawl the page for its own uses but not index it (like you wanted in the first place)?
Hey, I'm in the video!
Does Google only let its own people be Bronze members and above, or can regular users also become Diamond-level Product Experts?
I have written 50 great answers. I don't know why I am not getting even a single recommended answer.