How to set up robots.txt
What is robots.txt?
A robots.txt file on your website tells search engine robots which files or directories to ignore. You might want to create one to exclude certain content because it is private (such as an administrative interface), or because it is misleading or irrelevant to the categorization of your website as a whole.
You can create the robots.txt file manually, using any text editor. It should be an ASCII-encoded text file, and the filename should be all lowercase. Once you’ve created your robots.txt file, save it to the root of your domain with the name robots.txt. This is where robots will check for your file. If it’s saved anywhere else, they will not find it.
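The placement rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `robots_url` and `write_robots_txt` are hypothetical helper names, `www.example.com` is a placeholder domain, and a temporary directory stands in for your real web root.

```python
import tempfile
from pathlib import Path
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the URL where a robot would look for robots.txt."""
    parts = urlsplit(page_url)
    # Keep only the scheme and host; the path is always /robots.txt.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def write_robots_txt(doc_root: Path, rules: str) -> Path:
    """Save the rules as an ASCII file named robots.txt inside doc_root."""
    target = doc_root / "robots.txt"  # lowercase filename
    # encoding="ascii" raises UnicodeEncodeError if the rules contain non-ASCII text.
    target.write_text(rules, encoding="ascii")
    return target


# Demo: a temporary directory plays the role of the web root (an assumption).
with tempfile.TemporaryDirectory() as tmp:
    saved = write_robots_txt(Path(tmp), "User-agent: *\nDisallow: /admin/\n")
    saved_name = saved.name
    saved_text = saved.read_text(encoding="ascii")

print(saved_name)
print(robots_url("https://www.example.com/blog/post.html?id=3"))
```

Whatever page of the site a robot starts from, the lookup lands on the same file at the root, which is why a robots.txt saved in a subdirectory is never found.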
Setting it up
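Each entry in the file is made up of two kinds of lines:
- User-agent: the search engine robot the entry applies to
- Disallow: the URL path that will be blocked for that robot
You can include as many entries as you want. A User-agent value of * applies the entry to all robots.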
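To see how User-agent and Disallow lines combine into entries, here is a minimal sketch that parses a two-entry file with Python's standard `urllib.robotparser` module. The paths `/private/` and `/tmp/` and the robot name `OtherBot` are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Two entries: one for Googlebot only, one for every other robot.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /tmp/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content that is already in memory."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Googlebot follows its own entry, so only /private/ is blocked for it.
print(parser.can_fetch("Googlebot", "/private/report.html"))  # False
print(parser.can_fetch("Googlebot", "/tmp/cache.html"))       # True

# A robot without its own entry falls back to the * entry.
print(parser.can_fetch("OtherBot", "/tmp/cache.html"))        # False
```

Note that in this parser a robot with its own entry uses that entry instead of the * entry, not in addition to it.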
Blocking search engine bots
A comprehensive list of search engine bots can be found here
Blocking a page:
Disallow: /my_page.html
Blocking an entire directory (which includes all pages in it):
Disallow: /blocked_dir/
Blocking an entire site:
Disallow: /
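The three Disallow rules above can be checked with a short Python sketch built on the standard `urllib.robotparser` module. Two assumptions are made here: each rule is paired with a `User-agent: *` line (a Disallow line needs a User-agent line before it to take effect), and `is_allowed` and `ExampleBot` are hypothetical names used only for illustration.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(rules: str, path: str, agent: str = "ExampleBot") -> bool:
    """Return True if the given robot may fetch path under these rules."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, path)


# Each rule is prefixed with "User-agent: *" so it applies to every robot.
PAGE_RULE = "User-agent: *\nDisallow: /my_page.html"
DIR_RULE = "User-agent: *\nDisallow: /blocked_dir/"
SITE_RULE = "User-agent: *\nDisallow: /"

print(is_allowed(PAGE_RULE, "/my_page.html"))                # False
print(is_allowed(PAGE_RULE, "/other_page.html"))             # True
print(is_allowed(DIR_RULE, "/blocked_dir/inner/page.html"))  # False
print(is_allowed(DIR_RULE, "/open_dir/page.html"))           # True
print(is_allowed(SITE_RULE, "/anything.html"))               # False
```

Disallow values are matched as path prefixes, which is why blocking a directory also blocks every page beneath it, and why a bare / blocks the whole site.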
Blocking files of a specific file type (.exe) from Googlebot:
User-agent: Googlebot
Disallow: /*.exe$
More advanced settings (these only work with Googlebot)
Blocking access to all URLs that include a question mark:
User-agent: Googlebot
Disallow: /*?
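To make the wildcard patterns more concrete, the sketch below translates a Disallow path containing * and $ into a regular expression. It is an approximation of the matching behaviour for illustration, not Googlebot's actual implementation, and `rule_to_regex` and `is_blocked` are hypothetical names.

```python
import re


def rule_to_regex(rule_path: str) -> re.Pattern:
    """Translate a Disallow path using * and a trailing $ into a regex."""
    anchored = rule_path.endswith("$")
    if anchored:
        rule_path = rule_path[:-1]
    # Escape every literal piece, then join the pieces with "any characters".
    body = ".*".join(re.escape(piece) for piece in rule_path.split("*"))
    # A trailing $ means the URL must end right where the pattern ends.
    return re.compile(body + (r"\Z" if anchored else ""))


def is_blocked(rule_path: str, url_path: str) -> bool:
    """Rules are matched from the start of the URL path."""
    return rule_to_regex(rule_path).match(url_path) is not None


# /*.exe$ blocks only URLs that end in .exe
print(is_blocked("/*.exe$", "/downloads/setup.exe"))      # True
print(is_blocked("/*.exe$", "/downloads/setup.exe.txt"))  # False

# /*? blocks any URL that contains a question mark
print(is_blocked("/*?", "/search?q=robots"))  # True
print(is_blocked("/*?", "/search"))           # False
```

The * stands for any run of characters, and the trailing $ is what stops /*.exe$ from also matching a URL such as /downloads/setup.exe.txt.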
If you have created a robots.txt file for your site and would like to check its validity, there is a free service that will do this for you here