It's been almost a year now since the search community introduced the rel="nofollow" attribute. It was a groundbreaking technique against spammers, and not only against them, but the question is: does it still work?
Google started the trend when they announced the new attribute. Yahoo and MSN hopped on board with them, along with a lot of blog-related sites. The original idea was to stop comment spam on blogs. Well, it had an impact, but not the desired one, because spam still exists even with nofollow-ed links.
Another use that much of the SEO world saw was controlling the outbound links on a page: you add rel="nofollow" to the links you don't want to hand a share of your PageRank to. Yet another use, one I have applied a few times myself, is to put nofollow on a link that performs an operation affecting the user's browsing session but is irrelevant to the search engines' bots. I know that, in theory, this shouldn't be done with GET requests: the HTTP RFC says GET should be safe, and actions should use the POST method. But sometimes, just sometimes, a link looks better than a button that submits a form.
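As a rough illustration of what honoring the attribute means, here is a minimal Python sketch of how a well-behaved crawler might separate the links it may follow from those marked rel="nofollow". The names `LinkExtractor` and `split_links` and the sample URLs are hypothetical, made up for this example; how real search engine crawlers handle this internally is not something I can show.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect hrefs, separating links marked rel="nofollow" from the rest."""

    def __init__(self):
        super().__init__()
        self.follow = []
        self.nofollow = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is None:
            return
        # rel holds space-separated tokens; compare them case-insensitively.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollow.append(href)
        else:
            self.follow.append(href)


def split_links(html):
    """Return (links to follow, links marked nofollow) for an HTML string."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.follow, parser.nofollow


# Hypothetical page fragment: one normal link, one session-changing link,
# and one commenter link of the kind blogs mark with nofollow.
sample = (
    '<a href="/about.html">About</a> '
    '<a href="/logout?session=1" rel="nofollow">Log out</a> '
    '<a href="http://example.com/" rel="external nofollow">A commenter</a>'
)
follow, nofollow = split_links(sample)
print("would follow:", follow)
print("would skip:  ", nofollow)
```

A crawler that respects the attribute would only queue the first list for fetching.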
A few months ago I wanted to see which search engines obey this rule and which don't: which ones follow links marked rel="nofollow" and which leave them alone. I put a special file on my site, for testing purposes only, and marked the link to it with nofollow. A couple of months later, here we are, and I have an interesting piece of data I want to share.
| Month | Crawler | Times accessed |
|---|---|---|
| Dec 2005 | Yahoo | 1 |
| Dec 2005 | Java 1.5.0 | 5 |
| Dec 2005 | Java 1.4.2 | 1 |
| Dec 2005 | Robot OmniExplorer_Bot/5.20 | 1 |
| Dec 2005 | Crawler | 2 |
| Dec 2005 | Java 1.4.1 | 2 |
| Dec 2005 | Larbin | 1 |
| Dec 2005 | Robot | 1 |
| Jan 2006 | Java 1.5.0 | 8 |
| Jan 2006 | Java 1.4.2 | 6 |
| Jan 2006 | Look | 1 |
| Jan 2006 | Crawler | 2 |
| Jan 2006 | Larbin | 2 |
| Jan 2006 | Yahoo | 1 |
| Jan 2006 | GoogleBot 2.1 | 9 |
| Jan 2006 | Java 1.4.1 | 6 |
| Jan 2006 | Robot OmniExplorer_Bot/5.85a | 1 |
| Jan 2006 | Robot geniebot | 1 |
| Jan 2006 | PsBot | 1 |
| Feb 2006 | Java 1.5.0 | 3 |
| Feb 2006 | Java 1.4.1 | 3 |
| Feb 2006 | Robot OmniExplorer_Bot/5.96 | 1 |
| Feb 2006 | Java 1.4.2 | 1 |
| Feb 2006 | Robot OmniExplorer_Bot/6.13c | 1 |
| Feb 2006 | GoogleBot 2.1 | 2 |
Adding these up over the three-month interval, the page was crawled as follows:
| Crawler | Times accessed |
|---|---|
| Java 1.5.0 | 16 |
| GoogleBot 2.1 | 11 |
| Java 1.4.1 | 11 |
| Java 1.4.2 | 8 |
| Crawler | 4 |
| Larbin | 3 |
| Yahoo | 2 |
| Robot OmniExplorer_Bot/5.20 | 1 |
| Robot | 1 |
| Look | 1 |
| Robot OmniExplorer_Bot/5.85a | 1 |
| Robot geniebot | 1 |
| PsBot | 1 |
| Robot OmniExplorer_Bot/5.96 | 1 |
| Robot OmniExplorer_Bot/6.13c | 1 |
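For anyone who wants to check the arithmetic, the totals above can be reproduced from the monthly table with a few lines of Python. This is only a sketch: the figures are copied by hand from the first table, and the variable names `monthly_hits` and `totals` are my own choice for the example.

```python
from collections import Counter

# (month, crawler, times accessed), copied from the monthly table.
monthly_hits = [
    ("Dec 2005", "Yahoo", 1),
    ("Dec 2005", "Java 1.5.0", 5),
    ("Dec 2005", "Java 1.4.2", 1),
    ("Dec 2005", "Robot OmniExplorer_Bot/5.20", 1),
    ("Dec 2005", "Crawler", 2),
    ("Dec 2005", "Java 1.4.1", 2),
    ("Dec 2005", "Larbin", 1),
    ("Dec 2005", "Robot", 1),
    ("Jan 2006", "Java 1.5.0", 8),
    ("Jan 2006", "Java 1.4.2", 6),
    ("Jan 2006", "Look", 1),
    ("Jan 2006", "Crawler", 2),
    ("Jan 2006", "Larbin", 2),
    ("Jan 2006", "Yahoo", 1),
    ("Jan 2006", "GoogleBot 2.1", 9),
    ("Jan 2006", "Java 1.4.1", 6),
    ("Jan 2006", "Robot OmniExplorer_Bot/5.85a", 1),
    ("Jan 2006", "Robot geniebot", 1),
    ("Jan 2006", "PsBot", 1),
    ("Feb 2006", "Java 1.5.0", 3),
    ("Feb 2006", "Java 1.4.1", 3),
    ("Feb 2006", "Robot OmniExplorer_Bot/5.96", 1),
    ("Feb 2006", "Java 1.4.2", 1),
    ("Feb 2006", "Robot OmniExplorer_Bot/6.13c", 1),
    ("Feb 2006", "GoogleBot 2.1", 2),
]

# Sum the hits per crawler across all three months.
totals = Counter()
for _month, crawler, hits in monthly_hits:
    totals[crawler] += hits

# Print crawlers from most to least active.
for crawler, hits in totals.most_common():
    print(f"{crawler}: {hits}")
```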
It seems that, among the visitors that appear to be search engine crawlers, Google did the most crawling. So what does this mean? Are the search engines not respecting the crawling filter they imposed themselves?