Apr 25 2008

Today in Emails

Some email about making custom smileys arrives from MSN Live. I don’t even use the “official” MSN client since I refuse to use anything that shoves ads in my face in an intrusive manner.

I do use the MSN Messenger service though, so it might be prudent to just unsubscribe from their marketing spam rather than flag it as spam and miss potential emails about the actual service. So I click the unsubscribe link at the bottom of the mail, log in, select “I don’t want this stuff” at three different places, click submit, and… a red-colored text that says “Error 500” appears.

I try to submit two more times. More error 500.

This is where I sigh, go back to Gmail, and click “report spam”.

Here are some helpful hints from an email user: I can report your mail as spam with a single click. If you can’t add a one-click unsubscribe link (also, it helps if it actually works), then I can report your mail as spam with a single click. I don’t want to have to jump through hoops to do this. There’s no need for me to have to log in and navigate the site to find my account preferences.

It’s all a cost/benefit calculation. This cost me time. The benefit was that I might still want to get things like password reminders (in case I suffer from sudden brain trauma) and information about service changes.

Had this been a web shop I would most likely have clicked “report as spam” right away. I can accept having to provide a password, but after that it should opt me out instantly.

The worst offender I’ve seen here is CD-WOW. After I logged in, the user preferences had two places where you needed to deselect your spamming preferences, under separate tabs with nearly identical names. I only spotted one of them, and logically assumed that it would work. Then I got more marketing trash from them, got annoyed since I had already declared my preferences, and now their mails go straight to the spamcan.


May 18 2006

Spam of the Year

This spam to my Gmail account had me laughing out loud.

Subject: Ihre Domain www.gmail.com ist nicht bei Google gelistet!

It’s a German spammer trying to tell me that “my” domain, gmail.com, isn’t listed in Google!

So yeah, you stupid spammer dudes at Finke Marketing. Thanks for the chuckles.


Mar 20 2006

Snailmail Spam

I had a letter waiting for me when I got home after watching V For Vendetta, of which I might rant later. US air mail, eh? I rip it open.

Some silly domain registrar, Domain Registry of America (who in the fine print state they are not affiliated with or endorsed by the government of the United States) wants me to host this very domain, and what a fancy name said domain has, at them instead of Gandi, my current registrar.

I would consider it if their yearly fee wasn’t 216% of what I’m paying right now. €26 per year? I pay €12 now. That pricing has no attachment whatsoever to reality.

And why does your spam look so much like a bill? Hidden deep in a paragraph you say that it isn’t, yet you do your best to make it look like one. Add some FUD about “losing your online identity” and I put you on my shit list.

Well, thanks for sending me something to light the fireplace with.


Feb 22 2005

SNAFU

The front page was b0rked for a couple of hours. I still had Movable Type installed, and it looks like someone managed to send a trackback (I had deleted mt-comments.cgi to prevent spam, since I don’t even use MT any more), which caused the front page to be overwritten by MT.

I took this as a sign that it was time to upgrade to WordPress 1.5. So I did. All is well again.

Later: Surprise surprise, it was trackback spam.


Jan 30 2005

An introduction to mod_security

Inspired by this article I decided to make a similar article that shows the advantages of mod_security over stopping spam by using mod_rewrite.

I started using TextDrive in June 2004. When comment spam became a very large problem for Movable Type users due to poor programming in mt-comments.cgi, a mailing list was set up to figure out a way to fight back against spam. And mod_security was our weapon of choice.

Addition: I should mention that other TextDrive users usually won’t have to bother blocking the common spam; we spot attacks very quickly on the aforementioned mailing list and add a global rule to block it across all TextDrive servers.

This is what mod_security has to say about itself in a single paragraph:

ModSecurity is an open source intrusion detection and prevention engine for web applications. Operating as an Apache Web server module, the purpose of ModSecurity is to increase web application security, protecting web applications from known and unknown attacks.

While mod_rewrite is good at rewriting URLs, it’s a very poor choice for fighting spam. It requires quite a lot of obscure commands to block a single URL. mod_security, on the other hand, can block a URL with a single line in your .htaccess file.

I won’t explain how you install mod_security, so let’s pretend we already covered that part. Now for the good stuff.

Configuring mod_security

This is how you start mod_security, either in your global Apache configuration, or in a .htaccess file:

SecFilterEngine DynamicOnly
SecFilterScanPOST On
SecAuditLog logs/audit_log

The first line tells Apache to run mod_security, but only on dynamic pages (PHP, CGI scripts, whathaveyou). You can also set it to On instead of DynamicOnly, if you want to scan all requests for all pages.

The second line is where mod_security really starts to trounce mod_rewrite: it enables scanning of POST payloads. This is something that mod_rewrite is unable to do.

The POST data is the actual data that gets submitted to a web server, such as comment forms. This means that mod_security can filter based on content in the comments, and even in specific fields, if you only want to make a rule based on the author of a comment.

The third line tells Apache where to store the audit log from mod_security. This log file contains everything that mod_security catches, if you have configured it to log that particular rule.

Let’s add a fourth line before we begin the actual block rules: the default action.

SecFilterDefaultAction "deny,log,status:412"

This sets the default action for rules that have no action defined, so that you don’t have to re-type the action for every rule. This line sets the default mode to “block the request, log it, and give the client an Error 412.”

I prefer Error 412 (Precondition Failed) over Error 403 (Forbidden). 403 is “You’re not allowed to be here,” while 412 is “We don’t serve your kind here.” 403 is the “Staff only” sign; 412 is the bouncer at the door checking his list of misbehaving persons.

Let’s start blocking!

Now, let’s build some rules. The basic rule has one format, with an optional second argument:

SecFilter PATTERN [ACTION]

This scans the request for PATTERN, and applies the default action if it matches. It also accepts an optional ACTION argument, which uses the same format as the SecFilterDefaultAction above. If you have lots of spam to block, it’s easier to define a default action and leave ACTION out of the individual rules.

However, it doesn’t scan the POST payload unless we tell mod_security to do so. Which we did above. So you could create a rule to stop viagra spam like this:

SecFilter "viagra"

This will block referral spam containing “viagra” in the URL or in a comment (since we enabled POST scans). But since SecFilter scans the entire request, it also checks for it in the user agent field. While I don’t know about any browsers called “Viagra” we can never be sure that they don’t exist, and that’s why I prefer to be very specific about what part of the request should be scanned. We really don’t want to block legitimate users by accident, like comments containing “Hey, I get tons of Viagra spam too!”
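
To see how easily a whole-request scan over-blocks, here is a rough Python sketch. It is not mod_security itself: the two request strings and the would_block helper are made up for illustration, and Python's re module stands in for the real regex engine. Case-insensitive matching is assumed, since the example above treats “Viagra” and “viagra” alike.

```python
import re

# Hypothetical stand-in for the rule: SecFilter "viagra"
# Assumption: the pattern is case-insensitive and runs over the whole raw request.
PATTERN = re.compile(r"viagra", re.IGNORECASE)


def would_block(request_text: str) -> bool:
    """Return True if the pattern appears anywhere in the request."""
    return PATTERN.search(request_text) is not None


# A made-up spam request: the keyword sits in the Referer header.
spam = (
    "POST /wp-comments-post.php HTTP/1.1\r\n"
    "Referer: http://buyviagra.com/\r\n\r\n"
    "comment=cheap+pills"
)

# A made-up legitimate request: the keyword only appears in the comment text.
legit = (
    "POST /wp-comments-post.php HTTP/1.1\r\n"
    "Referer: http://example.org/blog/\r\n\r\n"
    "comment=Hey,+I+get+tons+of+Viagra+spam+too!"
)

print(would_block(spam))   # True
print(would_block(legit))  # True, the innocent commenter is blocked as well
```

Both requests trip the rule, which is the argument for scanning one named part of the request at a time.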

You can also use regular expressions in the rules:

SecFilter "(viagra|mortgage|herbal)"

If we want to use an action different than the default action, we can do it like this:

SecFilter "viagra" "allow,nolog"

This will allow anything containing “viagra” to pass the filter, and it won’t be logged in the audit log.

Selective blocking

To do a more specific scan, we can use SecFilterSelective instead. It takes the following arguments:

SecFilterSelective LOCATION PATTERN [ACTION]

Now we can define which part of the request we want to scan by supplying the LOCATION argument before the PATTERN argument. Let’s say we get tons of referral spam from someone pimping his “buyviagra.com” site. We can scan the Referer header only and block his entire domain from ever referring us:

SecFilterSelective "HTTP_REFERER" "buyviagra.com"

NOTE: As of mod_security 1.8, there is no need to escape dots in domain names. This is managed automatically by mod_security.

Presto! We never see referral spam from that domain again. Note that I did not supply the ACTION argument, since it saves me some typing to let the default action trickle down from the settings above. It also makes it easier to read the rules.

Note, however, that this only blocks referrals from that specific domain. There’s nothing stopping him from referral spamming with “buymyviagradamnit.com” instead. We can of course use regular expressions here as well:

SecFilterSelective "HTTP_REFERER" "(viagra|mortgage|texasholdem)"
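
As a rough illustration of what selective scanning buys us, the Python sketch below models a request as a dictionary of named parts and searches only one of them. The dictionaries and the selective_match helper are invented for this example and are not part of mod_security; Python's re module is only an approximation of the regex engine Apache uses.

```python
import re


def selective_match(req: dict, location: str, pattern: str) -> bool:
    """Search only the named location, ignoring every other part of the request."""
    return re.search(pattern, req.get(location, ""), re.IGNORECASE) is not None


# Hypothetical referral spam from the spammer's second domain.
spam_hit = {
    "HTTP_REFERER": "http://buymyviagradamnit.com/",
    "POST_PAYLOAD": "",
}

# Hypothetical genuine comment that happens to mention a spammy word.
legit_comment = {
    "HTTP_REFERER": "http://example.org/blog/",
    "POST_PAYLOAD": "comment=Hey,+I+get+tons+of+viagra+spam+too!",
}

KEYWORDS = r"(viagra|mortgage|texasholdem)"

# The literal domain rule misses the new domain...
print(selective_match(spam_hit, "HTTP_REFERER", r"buyviagra\.com"))  # False
# ...the keyword rule still catches it...
print(selective_match(spam_hit, "HTTP_REFERER", KEYWORDS))           # True
# ...and the genuine commenter is left alone, since only the referrer is scanned.
print(selective_match(legit_comment, "HTTP_REFERER", KEYWORDS))      # False
```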

There are many fields you can scan selectively, and you can also define several fields to scan on the same line. Just separate them with a pipe character (|) in the LOCATION argument. For a list of all fields you can scan selectively, please see the reference manual.

Blocking IP addresses

If there’s a specific IP address that hits you especially hard, you can block it by scanning the REMOTE_ADDR variable:

SecFilterSelective "REMOTE_ADDR" "^83.142.57.250$"

Note that I begin the pattern with ^ and end it with $. These are regular expression special characters that tell it to only match from the beginning of the line, as well as the end of the line. If I didn’t have the starting ^, I would not only block 83.142.57.250, but also 183.142.57.250 since it contains the same pattern. Using them both means “match the entire line.”
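
The effect of the two anchors is easy to check in any regex engine. Here is a small Python sketch; the ip_rule helper is made up for the demonstration, and re.escape is used so the dots are matched literally, which is an assumption standing in for however mod_security treats them.

```python
import re


def ip_rule(ip: str, start: bool = True, end: bool = True) -> re.Pattern:
    """Build a pattern for one IP address, with optional ^ and $ anchors."""
    body = re.escape(ip)  # treat the dots as literal dots
    return re.compile(("^" if start else "") + body + ("$" if end else ""))


exact = ip_rule("83.142.57.250")
print(bool(exact.search("83.142.57.250")))   # True
print(bool(exact.search("183.142.57.250")))  # False

# Without the leading ^, the address with an extra leading digit matches too.
print(bool(ip_rule("83.142.57.250", start=False).search("183.142.57.250")))  # True

# Without the trailing $, a shorter address also matches longer ones.
print(bool(ip_rule("83.142.57.25", end=False).search("83.142.57.250")))      # True
```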

Scanning POST payloads

So far we’ve done the same things that we can do with mod_rewrite, and the only advantage has been that it saved us some typing and resulted in more readable lines. Now for something that mod_rewrite cannot do: scanning POST content!

The POST payload contains the contents of forms that are submitted to the server from the browser. Scanning this means you can scan the contents of comments, and find attempted spam even there. Use the POST_PAYLOAD location to scan:

SecFilterSelective "POST_PAYLOAD" "(mortgage|viagra)"

And now nobody can post comments containing mortgage or viagra any more.

But it doesn’t stop there! You can also scan inside specific arguments in the POST payload. Let’s say we want to allow people to talk about viagra and other spammy words, but disallow those words in the URL field in Movable Type and WordPress. In both of these, the URL field is called url.

SecFilterSelective "ARG_url" "(mortgage|viagra)"
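
The difference between scanning the whole payload and scanning one argument can be sketched in Python. This is only an illustration under a few assumptions: the form bodies are invented, the helper names are hypothetical, and the standard library's parse_qs stands in for whatever argument parsing mod_security does internally.

```python
import re
from urllib.parse import parse_qs

SPAM_WORDS = re.compile(r"(mortgage|viagra)", re.IGNORECASE)


def blocked_by_payload_rule(body: str) -> bool:
    """Rough equivalent of scanning POST_PAYLOAD: look at the entire form body."""
    return SPAM_WORDS.search(body) is not None


def blocked_by_arg_rule(body: str, arg: str = "url") -> bool:
    """Rough equivalent of scanning ARG_url: look at one named form field only."""
    fields = parse_qs(body)
    return any(SPAM_WORDS.search(value) for value in fields.get(arg, []))


# Invented form submissions: one chatty commenter, one spammer.
chatty = "author=Anna&url=http%3A%2F%2Fexample.org%2F&comment=I+get+viagra+spam+too"
spammy = "author=Bob&url=http%3A%2F%2Fcheap-viagra.example%2F&comment=nice+post"

print(blocked_by_payload_rule(chatty))  # True, the whole-payload rule blocks Anna
print(blocked_by_arg_rule(chatty))      # False, her URL field is clean
print(blocked_by_arg_rule(spammy))      # True, the spammy URL is still caught
```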

Closing statements

That was a brief introduction to the most useful features of mod_security. Remember to always think about what it is you will really block with the rule you just wrote, and figure out a way to be specific enough without trapping legitimate users.

Mark Pilgrim once wrote an entry about the futility of blocking specific domains, and I agree completely.

Savor this moment, folks. You can tell your children stories of how, back in the early days of weblogging, you could print out the entire spam blacklist on a single sheet of paper. Maybe with two or three columns and a smallish font, but still. Boy, those were the days.

And they won’t last. They absolutely won’t last. They won’t last a month. The domain list will grow so unwieldy so quickly, you won’t know what hit you. It’ll get so big that it will take real bandwidth just to host it. Keeping it a free download will make you go broke. Code is free, but bandwidth never will be. Do you have a business plan? You’ll need one within 6 months.

Mark Pilgrim

This is why it will be very tiresome to block specific domains. Right now there is a spammer who has bought expired domains, and uses them for referral and comment spam. There’s nothing spammy about these domain names; no “viagra” or “mortgage” that you can scan for. As the master of your own domain, there’s not really much you can do about attacks like these except for blocking the individual domains.

The real battle here must be fought at a server-wide level. There are Apache modules in the works that can scan hits across entire web servers and all the domains hosted there, and find patterns in these hits. Unless it’s the Google bot doing a drive-by, 200 domains hosted by the same company are very unlikely to be hit by the same comment spam within 24 hours, and here you can find a pattern and block it.
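
To make the idea concrete, here is a minimal Python sketch of such server-wide pattern finding. It is not how any particular Apache module works; the function name, the tuple layout and the thresholds are all assumptions picked for the example. It flags any comment text that lands on several different hosted domains within 24 hours.

```python
from collections import defaultdict


def find_cross_domain_spam(hits, min_domains=3, window=24 * 3600):
    """Flag comment texts seen on at least min_domains hosted domains within window seconds.

    hits is an iterable of (timestamp_in_seconds, hosted_domain, comment_text).
    """
    by_text = defaultdict(list)
    for ts, domain, text in hits:
        # Normalise lightly so trivial case or whitespace changes don't hide a repeat.
        by_text[text.strip().lower()].append((ts, domain))

    flagged = set()
    for text, events in by_text.items():
        events.sort()
        start = 0
        for end in range(len(events)):
            # Slide the start forward until the span fits inside the time window.
            while events[end][0] - events[start][0] > window:
                start += 1
            domains = {d for _, d in events[start:end + 1]}
            if len(domains) >= min_domains:
                flagged.add(text)
                break
    return flagged


# Invented log entries from three hosted domains.
hits = [
    (0, "a.example", "Great site! http://pills.example"),
    (600, "b.example", "Great site! http://pills.example"),
    (1200, "c.example", "great site! http://pills.example "),
    (1300, "a.example", "I disagree with the second paragraph."),
]

print(find_cross_domain_spam(hits))  # {'great site! http://pills.example'}
```

A real implementation would need something fuzzier than exact text matching, since spammers vary their comments, but the shape of the check is the same.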

There is already a module originally designed to fight DDoS attacks. By modifying the thresholds on this module, it can be used to block IP addresses that try to flood with comments or referrals too fast.
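
The threshold idea can be sketched as a per-IP sliding window. The Python below is an illustration only, not the module's actual algorithm; the FloodGuard class and its max_hits and window parameters are hypothetical names chosen for this example.

```python
from collections import defaultdict, deque


class FloodGuard:
    """Refuse an IP once it makes more than max_hits requests within window seconds."""

    def __init__(self, max_hits: int, window: float):
        self.max_hits = max_hits
        self.window = window
        self.hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def allow(self, ip: str, now: float) -> bool:
        recent = self.hits[ip]
        # Forget requests that have fallen out of the time window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        # Every request is recorded, including refused ones, so hammering keeps an IP blocked.
        recent.append(now)
        return len(recent) <= self.max_hits


guard = FloodGuard(max_hits=3, window=10.0)
results = [guard.allow("203.0.113.7", t) for t in (0, 1, 2, 3, 20)]
print(results)  # [True, True, True, False, True]
```

Tightening max_hits or widening window is the kind of threshold change that turns a DDoS guard into a comment flood guard.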

But it will almost always be the spammer that chooses the battlefield. The spammers just have to open the floodgate on their spam tools; it is us normal users that have to bother about verifying the visitors and comments so we don’t block the genuine stuff by accident. There are good countermeasures against comment spam, but the only 100% certain method is to disable comments completely.

It wouldn’t be too hard to script a browser to make it a spam tool, and I have reason to suspect that spammers already do this. Imagine a worm that infects Windows computers around the world (not too taxing on the imagination), and then sits hidden and uses Internet Explorer to act, sound and smell like a genuine browser, including calculating hash cash and other popular spam/DDoS countermeasures. The spammers don’t care; they have all the time in the world and aren’t even using their own computers for the calculations.

Imagine 500,000 of these computers, all able to be remotely controlled by spammers, who pay for access to this network of distributed zombie machines, with real browsers doing the work so as to better look like a genuine commenter. Even if each zombie only sends one spam comment per minute to avoid detection by flood countermeasures, that’s still 500,000 comments per minute from the entire zombie net. 720,000,000 comments in a day.

The hash cash and other checksum systems will say that these are genuine comments. That’s why a good spam countermeasure uses several methods to scan the incoming comment. Again, the spammers put the burden of using resources on their victims.


Jan 10 2005

200:1

Since I switched from Movable Type to WordPress, I’ve had one (1) legit comment.

And 200 attempted spam comments that Spam Karma shot on sight.

Addendum: 11 days later, and I’ve passed 900.


Dec 12 2004

I should really stop doing this

The change should have been completely transparent and invisible to you, so here’s the info: I’m running WordPress now.

So that’s the fourth time I’ve switched blog CMS this year. Movable Type → WordPress → Textpattern → MT again → WordPress.

The main reason for switching is that the comment spam problem for Movable Type has become completely unbearable. It can quite literally kill a server. MT-Blacklist helps, but has a flaw that allows some comment spam to pass right through it.

And then there’s the whole “rebuild on every comment” aspect. If MT gets hit with 50 spams in 10 seconds, that’s 50 mt-comments.cgi processes that are all rebuilding a page. Sometimes the same page.

The TextDrive servers can easily push 20,000,000 hits per day. Yet mt-comments.cgi can effectively push the server load up into the 300s. This data speaks for itself — 94% of the hits are to mt-comments.cgi. 3-400 of them are proper comments, the rest is spam.

WordPress and Textpattern are dynamic. There’s no page to generate every time a comment hits.

I still get spam, though. Spammers monitor web services like Weblogs.com and go spam them as soon as they see an updated blog there. I get some spam every time I write a new entry.

Enter Spam Karma. So far it’s stopped spam dead in its tracks, and it is far more CPU efficient than MT-Blacklist. It works great, and the focus is to require as little interaction as possible from the blog owner.

In closing, WordPress has matured immensely since I last tried it. This will be my weapon of choice for quite some time now.


Dec 11 2004

Nasty crawlers

There’s a discussion on the TextDrive forums about how the MSN spider bot behaves. And it’s quite rude.

Microsoft wanted to be able to boast of a large page index when their new MSN Search went public beta. So they released the leash on the MSN crawler and let it index at full speed, saturating the bandwidth of the victim site if necessary.

That equals about $150 of bandwidth bills in two weeks for TextDrive, or $4000 yearly. So it was banned for a while until it behaved properly. Paying $4000 per year just to be in a search engine is madness.

MSN Search isn’t very smart either. Quite frankly, it’s stupid. It wasn’t quite banned from TextDrive servers; it actually got a redirect via mod_security to the MSN Bot info page. It then parsed the info on that page as if it was the result from the pages it was denied access to, and added it as a search result for those pages.

Stupid, stupid, stupid.

I also had a visit from the Popdex crawler today.

My definition of rude and abusive bots is as follows: if it leaves a referrer without the referring page actually containing a link to my site, it is considered fraudulent behavior. If it gorges and gobbles pages at a rapid pace, it is considered abuse of my site.

The Popdex crawler did both. It crawled 350 pages in two minutes. Twice per page, for 700 requests in two minutes. And it filled my logs with fake referrals to popdex.com.

Bam. Banned.


Aug 31 2003

Corporate retards

precisionintelligence.com. Referral spam. Killfile. I also noted that my old pal globoads.com made a visit. Same treatment.

It won’t make them stop, but at least I don’t have to see them in my log files.

Yes, I’m on a crusade.