Introduction to robots.txt → https://goo.gle/4gbNmcl
Control what you share with Google → https://goo.gle/3VnyLBU
Open Source robotstxt library & tester → https://github.com/google/robotstxt
What is a robots.txt file, and how does it affect your website's indexability? Join Martin Splitt in this Search Lightning Talk as he covers how robots.txt interacts with robots meta tags and HTTP headers, and how to use them to control search engine crawlers' access to your website.
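By way of illustration, the mechanisms the talk covers look like this; the rules below are generic examples, not rules from the video. A robots.txt file at the site root controls crawling, while a robots meta tag in a page's HTML, or an equivalent X-Robots-Tag HTTP response header, controls indexing:

    # robots.txt at https://example.com/robots.txt: block crawling of a path
    User-agent: *
    Disallow: /private/

    <!-- robots meta tag in a page's <head>: allow crawling, block indexing -->
    <meta name="robots" content="noindex">

    # equivalent HTTP response header, useful for non-HTML files
    X-Robots-Tag: noindex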
Chapters:
0:00 – Intro
0:17 – Why do I need robots.txt?
0:40 – What is the robots meta tag?
2:47 – When to use noindex vs. disallow
3:48 – Why is Googlebot still crawling certain pages?
5:14 – Conclusion
Watch more Search Central Lightning Talks → https://goo.gle/lightning-talks
Subscribe to the Google Search Central YouTube Channel → https://goo.gle/SearchCentral
#GoogleSearch #SearchLightningTalks
Speaker: Martin Splitt
Products Mentioned: Search Console
Thanks
Has anyone tried using HasData while respecting robots.txt rules on their projects? Would love to hear how it handles that!
Hi, I have a custom URL for my site, which is on the Blogger platform. The site is connected to AdSense, but I keep getting an earnings error message whenever I log in to AdSense. I have pasted the code into the Blogger root directory in the Blogger dashboard several times, from mobile, tablet, and laptop, and I still can't clear the error. Does anyone know how to fix this? I've done everything I can think of and I'm not making progress. Any info would be appreciated. Thanks.
Does the robots.txt file take priority over a sitemap?
This video answered a question I had long been meaning to ask. robots.txt is probably the most obvious way for me to control traffic, but that could mean exposing endpoints that aren't supposed to be public. Using the X-Robots-Tag header is way better for me, since it's controlled within my app/site's layer and means I can selectively include or exclude content. So thanks for making this video!
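For anyone who wants to do something similar, here is a minimal sketch of setting the header selectively in the application layer, assuming a Python/Flask app; the route and content are hypothetical:

    from flask import Flask

    app = Flask(__name__)

    @app.route("/reports/<path:page>")
    def internal_report(page):
        # Hypothetical endpoint that should stay out of search results.
        resp = app.make_response("report: " + page)
        # Tell crawlers that fetch this URL not to index it.
        resp.headers["X-Robots-Tag"] = "noindex"
        return resp

Pages served by other routes carry no X-Robots-Tag header, so only the routes you mark are kept out of the index while remaining crawlable.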
I don't understand why Google can't just assume that if a website owner blocks something in robots.txt, it's not meant to be indexed. You create a conundrum for us here. If we open up a path with potentially infinite URLs, or at least hundreds of thousands, just so you can see the noindex tag, it still creates crawling issues. I have seen the messages in GSC and the log file entries. Google never seems to stop trying; YEARS go by and it's still checking these noindexed URLs. The ratio of these bunk URLs to real content being crawled is sometimes 99 to 1, even years after the noindex robots meta tag was implemented.
Making us choose between keeping a URL from being crawled and keeping it from being indexed is ridiculous. WHY?! What is the point? Please tell me, because I have been doing this for 17 years and this is something I simply cannot understand. Every SEO I talk to has the same problem. It would be so easy to fix: if we're blocking something from being crawled, just assume we don't want it indexed either. EASY! In what situation would someone block a URL that they want indexed? Perhaps by accident, but that's their problem.
Now who would want to block SteveBot from crawling?
Hey Martin, is an increasing number of "blocked by robots.txt" URLs in the Index Coverage report a negative signal for Google? Or is it fine as long as I know where it comes from?
Context:
I purposely blocked a lot of parameterized URLs from crawling on my site, because there are literally millions of them and I can't get rid of them. I don't want Googlebot to waste budget on those, and I don't want to allow crawling and noindex them, because then Google would still need to crawl them and waste the budget anyway. So I believe blocking them is just fine, but the growing report is frightening me a little bit :/
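As an illustration of the setup described above, such a robots.txt rule might look like this (the pattern is hypothetical; Google's robots.txt parser supports the * wildcard):

    User-agent: Googlebot
    # Keep any URL with a query string out of the crawl (hypothetical pattern).
    Disallow: /*?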
Ok, thank you! That is all well and good, but I was just wondering… Why did you destroy search?
Of course, Yes! Martin, please keep us posted! 🙌
thank you
It would be great to have something that combines a disallow in robots.txt with a noindex tag. Maybe a noindex rule inside robots.txt, for when I don't want a bot to crawl a page or index it.
Besides that, another interesting situation to explain would be a page that can be accessed from different URLs (because of parameters, etc.). It sounds like a canonical situation, but I don't want to rely on that, because I know which one is the main URL and I don't want to risk indexing the wrong one. Probably a noindex would be the way to go there, as in the sketch below.
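For the multi-URL situation described above, a sketch of the two options, using a hypothetical example.com URL: a canonical link on the parameterized variants points Google at the main URL, while a noindex robots meta tag keeps a variant out of the index outright:

    <!-- on /page?color=red, declare the preferred URL -->
    <link rel="canonical" href="https://example.com/page">

    <!-- or keep this variant out of the index entirely -->
    <meta name="robots" content="noindex">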
I know. ❤
If robots.txt says "don't crawl", why the need to add it to the index? So people take the rule out and leave the meta tag on the page, so Google can still crawl the page for its own uses but not index it (like you wanted in the first place)?
Hey, I'm in the video!
Does Google only let its own people be Bronze members and above, or can regular users also become Diamond-level Product Experts?
I have written 50 great answers. I don't know why I am not getting even a single recommended answer.