Creating a search-engine-friendly WordPress robots.txt file
Thursday, 22 November 2007
At the time of writing, this blog runs on WordPress version 2.3.1. Sadly, it does not come quite ready for search engines out of the box. One reason is the missing robots.txt file. Actually, if you open http://yoursite.com/robots.txt, WordPress will show some data from its internal configuration, but there is room for improvement.
Let’s do it right!
First, create an empty robots.txt file on your local computer and prepare to fill it with the content explained below.
The first line says that the rules below apply to every search engine spider visiting your site. Of course, it will not protect you from parasitic and cloaking crawlers, which can simply ignore robots.txt, but all the useful search engine bots read this line correctly.
User-agent: *
Assume that you already have a blog with a lot of content and do not want bots to eat up all of the web server's resources while spidering it. So, set the interval between requests to 10 seconds. Keep in mind that Crawl-delay is a non-standard directive, so not every crawler honors it.
Crawl-delay: 10
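To see how a parser reads these two lines, here is a minimal sketch using Python's standard urllib.robotparser module. The bot name ExampleBot is made up for illustration, and the crawl_delay() method needs Python 3.6 or newer.

```python
from urllib.robotparser import RobotFileParser

# The two lines written so far.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "ExampleBot" is a hypothetical crawler name; the "*" group covers it.
delay = parser.crawl_delay("ExampleBot")
print(f"ExampleBot should wait {delay} seconds between requests")
```

A polite crawler would sleep for that many seconds between two requests to your site. A crawler that does not support the directive will just skip the line.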
Now, the most important part. We will disallow crawling of some files and directories on your WordPress blog. The two files below won’t improve your search engine position anyway…
Disallow: /license.txt
Disallow: /readme.html
Is there anything interesting for spiders in the admin panel?
Disallow: /wp-admin.php
RSS feeds are usually available through different links, so disallow the other locations:
Disallow: /wp-atom.php
Disallow: /wp-commentsrss2.php
Disallow: /wp-feed.php
Disallow: /wp-rss.php
Disallow: /wp-rss2.php
Disallow: /wp-rdf.php
These files should be closed to spiders:
Disallow: /wp-blog-header.php
Disallow: /wp-comments-popup.php
Disallow: /wp-comments-post.php
Disallow: /wp-config-sample.php
Disallow: /wp-config.php
Disallow: /wp-cron.php
Disallow: /wp-links-opml.php
Disallow: /wp-login.php
Disallow: /wp-mail.php
Disallow: /wp-pass.php
Disallow: /wp-register.php
Disallow: /wp-settings.php
Disallow: /wp-trackback.php
Disallow: /xmlrpc.php
Now let’s disallow crawling of some directories. First, the admin directory. This is important!
Disallow: /wp-admin/
Then all the rest:
Disallow: /wp-content/
Disallow: /wp-includes/
Disallow: /trackback/
Your feed links back to your site, so do not waste the crawler's time and do not get penalized for duplicate content.
Disallow: /feed/
Of course, there may be something interesting for spiders in the default WordPress upload directory, so let’s allow them to crawl it.
Allow: /wp-content/uploads/
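One caveat about rule order: crawlers do not all resolve a conflict between Allow and Disallow in the same way. Google documents that the most specific (longest) matching rule wins, so the order in the file does not matter there, but simpler parsers may stop at the first rule that matches. The sketch below uses Python's standard urllib.robotparser, which to my knowledge checks rules in file order, to compare the order used in this post with an Allow-first variant. The sample paths and the helper name check are made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def check(rules: str, path: str) -> bool:
    """Return True if a generic crawler may fetch `path` under `rules`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch("*", path)


# Order used in this post: broad Disallow first, narrow Allow second.
POST_ORDER = """\
User-agent: *
Disallow: /wp-content/
Allow: /wp-content/uploads/
"""

# Variant: narrow Allow first, so a first-match parser sees it before the Disallow.
ALLOW_FIRST = """\
User-agent: *
Allow: /wp-content/uploads/
Disallow: /wp-content/
"""

UPLOAD = "/wp-content/uploads/photo.jpg"  # hypothetical uploaded file
THEME = "/wp-content/themes/style.css"    # hypothetical theme file

for name, rules in (("post order", POST_ORDER), ("allow first", ALLOW_FIRST)):
    print(name,
          "| upload allowed:", check(rules, UPLOAD),
          "| theme allowed:", check(rules, THEME))
```

If the upload is reported as blocked for the post order on your Python version, moving the Allow line above Disallow: /wp-content/ is a safe choice, because it gives the same result under both matching strategies.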
Point to your sitemap if you have one. A sitemap plug-in can generate it for you and may add a similar line itself.
Sitemap: http://yoursite.com/sitemap.xml
Save the file and upload it to the root directory of your site.
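If you would rather generate the file than type it, the directives from this post can be assembled with a short Python sketch. The function names, the site_url parameter, and the output path are my own choices for illustration, and the sitemap location assumes /sitemap.xml as in the example line above.

```python
from pathlib import Path

# Directives in the order they appear in this post.
DIRECTIVES = [
    "User-agent: *",
    "Crawl-delay: 10",
    "Disallow: /license.txt",
    "Disallow: /readme.html",
    "Disallow: /wp-admin.php",
    "Disallow: /wp-atom.php",
    "Disallow: /wp-commentsrss2.php",
    "Disallow: /wp-feed.php",
    "Disallow: /wp-rss.php",
    "Disallow: /wp-rss2.php",
    "Disallow: /wp-rdf.php",
    "Disallow: /wp-blog-header.php",
    "Disallow: /wp-comments-popup.php",
    "Disallow: /wp-comments-post.php",
    "Disallow: /wp-config-sample.php",
    "Disallow: /wp-config.php",
    "Disallow: /wp-cron.php",
    "Disallow: /wp-links-opml.php",
    "Disallow: /wp-login.php",
    "Disallow: /wp-mail.php",
    "Disallow: /wp-pass.php",
    "Disallow: /wp-register.php",
    "Disallow: /wp-settings.php",
    "Disallow: /wp-trackback.php",
    "Disallow: /xmlrpc.php",
    "Disallow: /wp-admin/",
    "Disallow: /wp-content/",
    "Disallow: /wp-includes/",
    "Disallow: /trackback/",
    "Disallow: /feed/",
    "Allow: /wp-content/uploads/",
]


def build_robots_txt(site_url: str) -> str:
    """Join the directives and append a Sitemap line for `site_url`."""
    lines = DIRECTIVES + [f"Sitemap: {site_url.rstrip('/')}/sitemap.xml"]
    return "\n".join(lines) + "\n"


def save_robots_txt(site_url: str, path: str = "robots.txt") -> None:
    """Write the generated file locally; upload it to the site root afterwards."""
    Path(path).write_text(build_robots_txt(site_url), encoding="utf-8")


if __name__ == "__main__":
    # Replace the placeholder address with your own blog location.
    save_robots_txt("http://yoursite.com")
```

Running the script writes robots.txt into the current directory. Uploading it is still a manual step, for example over FTP.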
What you can expect afterwards:
- Higher ranking for your main topics
- A more accurate index of your site content
- Fewer penalties, thanks to avoiding duplicate content in RSS feeds
- Less traffic from web crawlers (but not fewer visits!)
- Higher self-esteem
So, we are finished. Too lazy to Ctrl+C and Ctrl+V? Here is the compiled robots.txt. Replace “naslenas.com” with your actual WP blog location.
