Nasty crawlers
Published December 11th, 2004 in General
There’s a discussion on the TextDrive forums about how the MSN spider bot behaves. And it’s quite rude.
Microsoft wanted to be able to boast of a large page index when their new MSN Search went into public beta. So they took the leash off the MSN crawler and let it index at full speed, saturating the bandwidth of the victim site if necessary.
That equals about $150 of bandwidth bills in two weeks for TextDrive, or roughly $4000 a year. So the crawler was banned for a while, until it behaved properly. Paying $4000 per year just to be in a search engine is madness.
MSN Search isn’t very smart either. Quite frankly, it’s stupid. It wasn’t quite banned from TextDrive servers; it actually got a redirect via mod_security to the MSN Bot info page. It then parsed the info on that page as if it were the content of the pages it had been denied access to, and added it as a search result for those pages.
Stupid, stupid, stupid.
I also had a visit from the Popdex crawler today.
My definition of rude and abusive bots is as follows: if it leaves a referrer without the referring page actually containing a link to my site, it is considered fraudulent behavior. If it gorges and gobbles pages at a rapid pace, it is considered abuse of my site.
The Popdex crawler did both. It crawled 350 pages in two minutes. Twice per page, for 700 requests in two minutes. And it filled my logs with fake referrals to popdex.com.
Bam. Banned.
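For the curious, here is a minimal sketch of how those two rules could be checked against an access log. It is Python, and nearly everything in it is an assumption made for illustration: the `Hit` record, the `find_abusive_clients` function, the threshold of 60 requests per 60 seconds, and the `page_links_here` callback, which stands in for whatever check decides whether a referring page really links to the site. It is not how I actually caught Popdex, just one way to express the definition.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Set

TOO_FAST = "too fast"
FAKE_REFERRER = "fake referrer"


@dataclass(frozen=True)
class Hit:
    """One request from the access log (hypothetical record layout)."""
    time: float     # seconds, on any consistent clock
    client: str     # user agent or IP address identifying the visitor
    path: str       # requested page
    referrer: str   # empty string when no referrer was sent


def find_abusive_clients(
    hits: Iterable[Hit],
    page_links_here: Callable[[str], bool],
    max_hits: int = 60,     # assumed limit, tune to taste
    window: float = 60.0,   # assumed window length in seconds
) -> Dict[str, Set[str]]:
    """Return a mapping of client to the set of rules it broke."""
    recent = defaultdict(deque)   # client -> request times inside the window
    verdicts = defaultdict(set)   # client -> reasons
    checked = {}                  # referrer URL -> does it really link here?

    for hit in sorted(hits, key=lambda h: h.time):
        # Rule 2: gorging. Count this client's requests in a sliding window.
        times = recent[hit.client]
        times.append(hit.time)
        while times and hit.time - times[0] > window:
            times.popleft()
        if len(times) > max_hits:
            verdicts[hit.client].add(TOO_FAST)

        # Rule 1: a referrer whose page does not actually link to the site.
        if hit.referrer:
            if hit.referrer not in checked:
                checked[hit.referrer] = page_links_here(hit.referrer)
            if not checked[hit.referrer]:
                verdicts[hit.client].add(FAKE_REFERRER)

    return dict(verdicts)
```

Each referrer is checked only once and the result cached, since a real `page_links_here` would have to fetch the referring page. What to do with the clients it reports (ban, throttle, redirect) is left out on purpose.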