In the virtual world, nearly every website contains a robots.txt file. It implements the Robots Exclusion Protocol (REP), which lets webmasters control how search engine robots (crawlers and indexers) interact with their site and decide which pages may be crawled or indexed.
So why is robots.txt important?
In a word, robots.txt matters a great deal for SEO. It lets you allow or exclude pages and resources from crawling, keep web crawlers away from specific paths, and point crawlers to your sitemap. If you want a page indexed in a search engine, you just need to make sure the page is accessible to search engine robots and not blocked by the robots.txt file.
How can I use robots.txt?
Using the robots.txt file is straightforward. Look in your website’s root directory for a file named robots.txt. If you do not find one, create it. Then make sure your robots.txt file is publicly accessible at the following URL: http://www.yourwebsite.com/robots.txt
What is the functionality of robots.txt?
You can use robots.txt to prevent search engine robots from accessing your pages, collecting data, or following the links on them. Here’s how to use this functionality.
If you want to prevent your entire domain from being crawled, use the following robots.txt rules:
User-agent: *
Disallow: /
Here “User-agent: *” matches robots from all search engines, and “Disallow: /” blocks every page and resource on the domain.
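You can check how a crawler interprets these rules with Python’s standard urllib.robotparser module. The sketch below parses the block-everything file directly (no network request); the domain is the placeholder used throughout this article:

```python
from urllib.robotparser import RobotFileParser

# Parse the block-everything rules without fetching anything over the network.
parser = RobotFileParser()
parser.parse([
    "User-agent: *",
    "Disallow: /",
])

# Every path is disallowed for every user agent.
print(parser.can_fetch("Googlebot", "http://www.yourwebsite.com/"))        # False
print(parser.can_fetch("AnyBot", "http://www.yourwebsite.com/some-page"))  # False
```

In practice you would call `parser.set_url(...)` and `parser.read()` to fetch a live robots.txt instead of passing the lines by hand.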
To allow all robots to access every page, use the following robots.txt (an empty Disallow line means nothing is blocked):
User-agent: *
Disallow:
If you want to exclude only a single page or directory, your robots.txt should look like this:
User-agent: *
Disallow: /wp-admin/
If you want to block a particular search engine’s robot from crawling your site, your robots.txt should be:
User-agent: BadBot
Disallow: /
Besides this, if you want to allow only a single robot to access your site and block all the others, you can define it like this:
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
This way only Googlebot is permitted to access your site; the second group blocks every other robot.
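You can verify this two-group configuration the same way with urllib.robotparser (Google’s crawler identifies itself with the token “Googlebot”; “BadBot” here is a stand-in for any other robot):

```python
from urllib.robotparser import RobotFileParser

# Two groups: Googlebot may fetch everything, all other bots nothing.
parser = RobotFileParser()
parser.parse([
    "User-agent: Googlebot",
    "Disallow:",
    "",
    "User-agent: *",
    "Disallow: /",
])

print(parser.can_fetch("Googlebot", "http://www.yourwebsite.com/page"))  # True
print(parser.can_fetch("BadBot", "http://www.yourwebsite.com/page"))     # False
```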
If you want to exclude all files except a few, the easiest approach is to move the files you want blocked into a subdirectory and disallow that directory, leaving the allowed files one level above it. For instance, to block everything under /~home/disallowedpage/ while the rest of /~home/ stays crawlable, your robots.txt should be:
User-agent: *
Disallow: /~home/disallowedpage/
Or, you can disallow pages one by one, listing each path you want to hide from search engines:
User-agent: *
Disallow: /~home/disallowedpage-1/
Disallow: /~home/disallowedpage-2/
Disallow: /~home/disallowedpage-3/
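A quick check of this multiple-Disallow file, again with urllib.robotparser and the example paths from above: only the listed directories are blocked, everything else stays crawlable.

```python
from urllib.robotparser import RobotFileParser

# Parse the example file with several Disallow lines in one group.
parser = RobotFileParser()
parser.parse([
    "User-agent: *",
    "Disallow: /~home/disallowedpage-1/",
    "Disallow: /~home/disallowedpage-2/",
    "Disallow: /~home/disallowedpage-3/",
])

# Listed directories are blocked; the homepage is still allowed.
print(parser.can_fetch("AnyBot", "http://www.yourwebsite.com/~home/disallowedpage-1/"))  # False
print(parser.can_fetch("AnyBot", "http://www.yourwebsite.com/"))                         # True
```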
Caution: whether you use the Disallow method or not, your files are still accessible on the internet. If you want to hide files from public access, protect them with a password; then your data will no longer be accessible without it.
Is it possible to restrict robots without robots.txt?
robots.txt is important for every website, but you can still tell search engine bots not to index a particular page or follow its links. You can do this with robots meta tags, which must be placed inside the head section (before </head>) to work correctly. A simple robots meta tag usually looks like this:
<meta name="robots" content="noindex"/>
This snippet tells search engine bots not to index the page whose head contains it.
<meta name="robots" content="index, nofollow"/>
This snippet tells search engine bots to index the page but not to follow any links on it.
<meta name="robots" content="noindex, nofollow"/>
This snippet tells search engine bots not to index the page and not to follow any links on it.
<meta name="robots" content="noindex, follow"/>
This snippet tells search engine bots not to index the page but to follow the links on it.
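To see how a crawler actually reads these directives, here is a minimal sketch using Python’s standard html.parser to pull the robots meta tag out of a (hypothetical) page head:

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect the directives from any <meta name="robots"> tag."""
    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name", "").lower() == "robots":
            self.directives.extend(
                d.strip().lower() for d in attrs.get("content", "").split(",")
            )

# Hypothetical page using the noindex, follow directive from above.
html = '<html><head><meta name="robots" content="noindex, follow"/></head><body></body></html>'
parser = RobotsMetaParser()
parser.feed(html)
print(parser.directives)  # ['noindex', 'follow']
```

A real crawler would then skip indexing this page while still queueing its outgoing links.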
Caution: if you are not an expert, don’t inject these tags through auto-generated scripts in PHP or ASP. You can safely use these meta tags in single, stand-alone HTML files. For WordPress, you can use a plugin such as All in One SEO or Yoast SEO (formerly WordPress SEO by Yoast).