Robots TXT & Sitemap XML: The Infinity Stones of SEO's Infinity Gauntlet

Mar 29 2018


Written by Aj Aviado

Now that we’ve suckered you into this post with that title, we can talk about those elements of your website that, to this day, website owners still dismiss. Like Normies not knowing what the Infinity Stones are.



As search has become the front door to most businesses, many business owners have come to appreciate what SEO can do to boost the strength and reach of their business. Want to be easily discovered by searchers? Optimize your Google My Business listing. Want your business to rank better in search results? Use your keywords properly. Want a good reputation in the eyes of Google? Create and serve great content. These are just a few of the important elements that can give you the power to move quicker towards eliminating your competition.


But there are a few elements that, for some reason (and this is speaking from experience), keep getting forgotten: a properly set up robots.txt and sitemap.xml. People are just too excited and giddy to target keywords, create content, build links and so on, and they forget that the website can also be optimized from the ground up. Here’s why you need them.




First, let’s define what a robots.txt file is. It’s a plain text file, placed at the root of your domain, that tells search engine crawlers (web robots) which pages they may crawl and which they shouldn’t. Created properly, it directs the crawlers to the important parts of the website and keeps them out of the unnecessary ones.


What happens if this isn’t set up? Your website can still be crawled, but you give up control over what gets crawled. And if it’s set up incorrectly, the effects can be disastrous: a single stray Disallow: / can make your site vanish from search results entirely.


This small file at the root of your website can save you a lot of trouble and can actually give your business the boost it needs. Imagine how your business will fare if the website that’s meant to reach potential customers out on the web isn’t even indexed. Your competitors will just step on you like dirt. And in business, you shouldn’t let that happen.

How and where do you get such a file? Well, the standard setup or format, if you will, looks like this:


User-agent: [user-agent name]

Disallow: [URL string not to be crawled]
Allow: [URL string you want to be crawled]

Sitemap: [url of your sitemap.xml]

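If you want to sanity-check a robots.txt before pushing it live, Python’s standard library ships a parser for exactly this format. A minimal sketch (the example.com rules here are made up for illustration):

```python
from urllib import robotparser

# A hypothetical robots.txt following the format above.
# Note: Python's parser applies rules in the order they appear,
# so the Allow exception is listed before the broader Disallow.
RULES = """\
User-agent: *
Allow: /admin/public
Disallow: /admin/
Sitemap: https://www.example.com/sitemap.xml
"""

parser = robotparser.RobotFileParser()
parser.parse(RULES.splitlines())

print(parser.can_fetch("*", "https://www.example.com/blog/post"))     # True
print(parser.can_fetch("*", "https://www.example.com/admin/users"))   # False
print(parser.can_fetch("*", "https://www.example.com/admin/public"))  # True
```

Handy for catching an over-eager Disallow before the crawlers do.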

That’s right. Every time you need a Robots.TXT format, just come visit us on this post. *wink, wink*


In the User-agent line, you can specify the particular web crawler (msnbot, slurp, discobot etc.) you want to command, or use * to address all of them at once.


For example, say you don’t want discobot to crawl your site. This is the setup you need:


User-agent: discobot

Disallow: /


But wait, there’s more! Want a crawl delay, for when your site is a big website and you need to be careful not to blow up the server? Then you’ll need something like this:


User-agent: discobot

Crawl-delay: 20


This tells the robot to wait 20 seconds between successive requests, so it doesn’t hammer your server while crawling the site. A good example of a big site that uses this is Twitter. (Be aware that not every crawler honors Crawl-delay; Googlebot, for one, ignores it.)
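The same standard-library parser can read the delay back out (via crawl_delay(), available since Python 3.6), which is a quick way to confirm a rule is actually being picked up. The discobot agent and 20-second value just mirror the example above:

```python
from urllib import robotparser

parser = robotparser.RobotFileParser()
parser.parse([
    "User-agent: discobot",
    "Crawl-delay: 20",
])

# The delay applies only to discobot; agents with no rule get None.
print(parser.crawl_delay("discobot"))  # 20
print(parser.crawl_delay("msnbot"))    # None
```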


The Disallow lines of the robots.txt are where you specify which pages not to crawl. You don’t have to place the whole URL in them; a path relative to the root of the site is enough, like this:


Disallow: /search/realtime

Disallow: /search/users


Once again stealing examples off Twitter.


*You can also single out specific pages you want to allow robots to crawl (handy for making an exception inside a folder you’ve disallowed), simply by using Allow instead of Disallow, like:


Allow: /search/realtime

Allow: /search/users


Not stolen this time. I just replaced it.


Lastly, the Sitemap line. This is where you place the full URL of your sitemap.xml, so the crawlers can go straight to that file. Speaking of which…




This is a file that serves as a list of all the pages of your website. It basically helps robots discover and crawl those pages more efficiently.


One aspect of it that baffles me is how many website owners migrate from the HTTP to the HTTPS version of their domain and never update the sitemap.xml. This is risky: robots will keep crawling the outdated HTTP URLs, and the mismatch gives them a confusing, inconsistent picture of your site.


It should be clear to everyone that sitemap.xml files are very important for SEO. They help the pages of your site get seen quicker by Google and its little minions of robots. Remember that Google ranks pages, not just websites in general, so this directly helps your business get seen out there on the web. And when your pages change, those changes need to be reflected in your sitemap.xml too.


How and where do you get this? There are online tools that generate XML sitemaps, like XML-Sitemaps.com. But these days, if your site gets constant updates and you’re the edgy type of SEO like us, you’ll want an XML sitemap that auto-updates. Plugins for the different CMSs out there can do this for you.
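If you’d rather roll your own than rely on a plugin, a basic sitemap is simple enough to generate. A minimal sketch using Python’s standard library (the URLs are placeholders):

```python
import xml.etree.ElementTree as ET

def build_sitemap(page_urls):
    """Build a minimal sitemap.xml document from a list of page URLs."""
    urlset = ET.Element(
        "urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
    )
    for page in page_urls:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page
    return ET.tostring(urlset, encoding="unicode")

sitemap = build_sitemap([
    "https://www.example.com/",
    "https://www.example.com/blog/",
])
print(sitemap)
```

Hook something like this up to your CMS’s list of published pages and the “auto-update” part comes for free.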


Bonus: What are Crawlers / Web Crawlers / Robots / Spiders?



These aren’t literal multi-legged animals living inside your computer. They’re programs that automatically work through every existing page out on the web (as long as, of course, they’re permitted to crawl those pages). These are the ones that help index the pages of your website.
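To demystify them a little: at heart, a crawler fetches a page, pulls out the links, and queues those links up to visit next. The link-extraction half can be sketched with Python’s standard library (the HTML snippet is made up for illustration):

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collects href values from <a> tags, the way a crawler discovers pages."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

collector = LinkCollector()
collector.feed('<a href="/about">About</a> <a href="/blog">Blog</a>')
print(collector.links)  # ['/about', '/blog']
```

A real crawler wraps this in a fetch loop and, crucially, checks robots.txt before each request. Which brings us full circle.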


Much like the Infinity Gauntlet in the comics is just a gold glove without the Infinity Stones, a tool for marketing your business online won’t be powerful if it doesn’t have the necessary parts to power it. That’s not a spoiler, by the way; that’s the truth.
