Browsing the blog archives for January, 2005

An introduction to mod_security

Inspired by this article I decided to write a similar article that shows the advantages of mod_security over mod_rewrite for stopping spam.

I started using TextDrive in June 2004. When comment spam became a very large problem for Movable Type users due to poor programming in mt-comments.cgi, a mailing list was set up to figure out a way to fight back against spam. And mod_security was our weapon of choice.

Addition: I should mention that other TextDrive users usually won’t have to bother blocking the common spam; we spot attacks very quickly on the aforementioned mailing list and add a global rule to block it across all TextDrive servers.

This is what mod_security has to say about itself in a single paragraph:

ModSecurity is an open source intrusion detection and prevention engine for web applications. Operating as an Apache Web server module, the purpose of ModSecurity is to increase web application security, protecting web applications from known and unknown attacks.

While mod_rewrite is good at rewriting URLs, it’s a very poor choice for fighting spam. It takes several fairly obscure directives to block a single URL. mod_security, on the other hand, can block a URL with a single line in your .htaccess file.

I won’t explain how you install mod_security, so let’s pretend we already covered that part. Now for the good stuff.

Configuring mod_security

This is how you start mod_security, either in your global Apache configuration, or in a .htaccess file:

SecFilterEngine DynamicOnly
SecFilterScanPOST On
SecAuditLog logs/audit_log

The first line tells Apache to run mod_security, but only on dynamic pages (PHP, CGI scripts, whathaveyou). You can also set it to On instead of DynamicOnly, if you want to scan all requests for all pages.

The second line is where mod_security really starts to trounce mod_rewrite: it enables scanning of POST data, i.e. the request body. This is something that mod_rewrite is unable to do.

The POST data is the actual content that gets submitted to a web server, such as the contents of a comment form. This means that mod_security can filter based on the content of comments, and even on specific fields if you only want to make a rule based on, say, the author of a comment.

The third line tells Apache where to store the audit log from mod_security. This log file contains everything that mod_security catches, if you have configured it to log that particular rule.

Let’s add a fourth line before we begin the actual block rules: the default action.

SecFilterDefaultAction "deny,log,status:412"

This sets the default action for rules that have no action defined, so that you don’t have to re-type the action for every rule. This line sets the default mode to “block the request, log it, and give the client an Error 412.”

I prefer Error 412 (Precondition Failed) over Error 403 (Forbidden). 403 is “You’re not allowed to be here,” while 412 is “We don’t serve your kind here.” 403 is the “Staff only” sign; 412 is the bouncer at the door checking his list of misbehaving persons.

Let’s start blocking!

Now, let’s build some rules. The basic rules have two formats:

SecFilter PATTERN [ACTION]

This scans the request for PATTERN, and uses the default action if it matches PATTERN. It also accepts an optional ACTION argument, which uses the same format as the SecFilterDefaultAction above. If you have lots of spam to block, it’s easier to define a default action and leave out the ACTION argument in each rule.

However, SecFilter only scans POST data if we have told mod_security to do so, which we did above with SecFilterScanPOST. So you could create a rule to stop viagra spam like this:

SecFilter "viagra"

This will block referral spam containing “viagra” in the URL or in a comment (since we enabled POST scanning). But since SecFilter scans the entire request, it also checks the user agent field. While I don’t know of any browsers called “Viagra,” we can never be sure that they don’t exist, and that’s why I prefer to be very specific about which part of the request should be scanned. We really don’t want to block legitimate users by accident, like comments containing “Hey, I get tons of Viagra spam too!”

You can also use regular expressions in the rules:

SecFilter "(viagra|mortgage|herbal)"
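To see why broad patterns deserve caution, here’s a quick sketch that uses Python’s re module as a stand-in for mod_security’s regex engine (an assumption, though for a plain alternation like this the two should behave alike). The sample strings and the is_blocked helper are made up for illustration:

```python
import re

# Same alternation as the SecFilter rule: it matches if ANY of the words
# appears ANYWHERE in the text, including inside longer words.
SPAM_PATTERN = re.compile(r"(viagra|mortgage|herbal)")

def is_blocked(text: str) -> bool:
    """Hypothetical helper: True if the rule's pattern would match."""
    return SPAM_PATTERN.search(text) is not None

samples = [
    "cheap viagra here",              # the intended target
    "I'm a certified herbalist",      # 'herbal' inside a longer word
    "our mortgage calculator broke",  # a legitimate use of a listed word
    "nice post, thanks",
]

for text in samples:
    print(is_blocked(text), text)
```

Only the last comment gets through, so the more general the pattern, the more it pays to limit where it is applied.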

If we want to use an action different from the default action, we can do it like this:

SecFilter "viagra" "allow,nolog"

This will allow anything containing “viagra” to pass the filter, and it won’t be logged in the audit log.
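How the default action and a per-rule action interact can be modeled in a few lines of Python. This is a toy model, not mod_security’s code: the Rule class and evaluate function are hypothetical, and the model assumes the first matching rule decides the outcome (my understanding is that terminal actions such as deny and allow stop further rule processing).

```python
import re
from dataclasses import dataclass
from typing import Optional

# Stand-in for: SecFilterDefaultAction "deny,log,status:412"
DEFAULT_ACTION = "deny,log,status:412"

@dataclass
class Rule:
    pattern: str
    action: Optional[str] = None  # None: fall back to DEFAULT_ACTION

def evaluate(rules, request_text):
    """Return the action of the first matching rule, or None if none match."""
    for rule in rules:
        if re.search(rule.pattern, request_text):
            # A rule without its own action inherits the default.
            return rule.action or DEFAULT_ACTION
    return None

rules = [
    Rule("viagra", "allow,nolog"),  # like: SecFilter "viagra" "allow,nolog"
    Rule("mortgage"),               # like: SecFilter "mortgage"
]

print(evaluate(rules, "buy viagra now"))           # allow,nolog
print(evaluate(rules, "refinance your mortgage"))  # deny,log,status:412
print(evaluate(rules, "nice post"))                # None
```

In this model, rule order matters: a request containing both words hits the allow rule first and is let through.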

Selective blocking

To do a more specific scan, we can use SecFilterSelective instead. It takes the following arguments:

SecFilterSelective LOCATION PATTERN [ACTION]

Now we can define which part of the request to scan by supplying the LOCATION argument before the PATTERN argument. Let’s say we get tons of referral spam from someone pimping his “buyviagra.com” site. We can scan the referrer header only and block his entire domain from ever referring us:

SecFilterSelective "HTTP_REFERER" "buyviagra.com"

NOTE: As of mod_security 1.8, there is no need to escape dots in domain names. This is managed automatically by mod_security.

Presto! We never see referral spam from that domain again. Note that I did not supply the ACTION argument, since it saves me some typing to let the default action trickle down from the settings above. It also makes it easier to read the rules.

Note, however, that this only blocks referrals from that specific domain. There’s nothing stopping him from referral spamming with “buymyviagradamnit.com” instead. We can of course use regular expressions here as well:

SecFilterSelective "HTTP_REFERER" "(viagra|mortgage|texasholdem)"

There are many fields you can scan selectively, and you can also define several fields to scan on the same line. Just separate them by commas in the LOCATION argument. For a list of all fields you can scan selectively, please see the reference manual.

Blocking IP addresses

If there’s a specific IP address that hits you especially hard, you can block it by scanning REMOTE_ADDR, the client’s IP address:

SecFilterSelective "REMOTE_ADDR" "^83.142.57.250$"

Note that I begin the pattern with ^ and end it with $. These are regular expression special characters that tell it to only match from the beginning of the line, as well as the end of the line. If I didn’t have the starting ^, I would not only block 83.142.57.250, but also 183.142.57.250 since it contains the same pattern. Using them both means “match the entire line.”
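A quick check of the anchoring argument, again using Python’s re module as a stand-in for mod_security’s regex matching (an assumption; for simple anchors like these, both should agree). re.search looks for a match anywhere in the string, which is how an unanchored rule behaves; the matches helper is hypothetical:

```python
import re

# The rule's pattern, with and without the ^ and $ anchors.
ANCHORED = r"^83.142.57.250$"
UNANCHORED = r"83.142.57.250"

def matches(pattern: str, remote_addr: str) -> bool:
    """True if the pattern is found anywhere in the address string."""
    return re.search(pattern, remote_addr) is not None

for addr in ["83.142.57.250", "183.142.57.250"]:
    print(addr, matches(ANCHORED, addr), matches(UNANCHORED, addr))
# 83.142.57.250 True True
# 183.142.57.250 False True
```

The unanchored version also catches the innocent 183.142.57.250, exactly the accident described above.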

Scanning POST payloads

So far we’ve done the same things that we can do with mod_rewrite, and the only advantage has been that it saved us some typing and resulted in more readable lines. Now for something that mod_rewrite cannot do: scanning POST content!

The POST payload contains the contents of forms that are submitted from the browser to the server. Scanning it means you can scan the contents of comments and catch attempted spam there too. Use the POST_PAYLOAD location to scan it:

SecFilterSelective "POST_PAYLOAD" "(mortgage|viagra)"

And now nobody can post comments containing mortgage or viagra any more.

But it doesn’t stop there! You can also scan inside specific arguments in the POST payload. Let’s say we want to allow people to talk about viagra and other spammy words, but disallow those words in the URL field in Movable Type and Wordpress. In both of these, the URL field is called url.

SecFilterSelective "ARG_url" "(mortgage|viagra)"
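The difference between POST_PAYLOAD and ARG_url can be modeled in Python. This is a toy model of what each location looks at, not how mod_security implements it; the form body and the author and text field names are made up, and only url comes from the text above:

```python
import re
from urllib.parse import parse_qs

SPAM = re.compile(r"(mortgage|viagra)")

# A made-up comment form submission, URL-encoded the way browsers send it.
body = "author=Jane&url=http%3A%2F%2Fexample.com&text=I+get+viagra+spam+too"

def blocked_by_payload(raw_body: str) -> bool:
    # Like POST_PAYLOAD: scan the whole raw body at once.
    return SPAM.search(raw_body) is not None

def blocked_by_arg(raw_body: str, name: str) -> bool:
    # Like ARG_<name>: decode the form and scan one field only.
    values = parse_qs(raw_body).get(name, [])
    return any(SPAM.search(v) for v in values)

print(blocked_by_payload(body))     # True: the comment text mentions viagra
print(blocked_by_arg(body, "url"))  # False: the url field is clean
```

Jane’s comment about viagra spam survives the field-specific rule but not the payload-wide one.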

Closing statements

That was a brief introduction to the most useful features of mod_security. Remember to always think about what it is you will really block with the rule you just wrote, and figure out a way to be specific enough without trapping legitimate users.

Mark Pilgrim once wrote an entry about the futility of blocking specific domains, and I agree completely.

Savor this moment, folks. You can tell your children stories of how, back in the early days of weblogging, you could print out the entire spam blacklist on a single sheet of paper. Maybe with two or three columns and a smallish font, but still. Boy, those were the days.

And they won’t last. They absolutely won’t last. They won’t last a month. The domain list will grow so unwieldy so quickly, you won’t know what hit you. It’ll get so big that it will take real bandwidth just to host it. Keeping it a free download will make you go broke. Code is free, but bandwidth never will be. Do you have a business plan? You’ll need one within 6 months. Mark Pilgrim

This is why blocking specific domains will be very tiresome. Right now there is a spammer who has bought expired domains and uses them for referral and comment spam. There’s nothing spammy about these domain names; no “viagra” or “mortgage” that you can scan for. As the master of your own domain, there’s not really much you can do about attacks like these except block the individual domains.

The real battle here must be fought at a server-wide level. There are Apache modules in the works that can scan hits across entire web servers and all the domains hosted there, and find patterns in these hits. Unless it’s the Google bot doing a drive-by, 200 domains hosted by the same company are very unlikely to be hit by the same comment spam within 24 hours, and here you can find a pattern and block it.
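The server-wide correlation idea can be sketched in Python. This is not one of the modules mentioned above; it’s a hypothetical illustration that assumes the server can see every (time, domain, comment) hit across all hosted sites. fingerprint, find_campaigns and the threshold of 3 are made up; the scenario above would correspond to something like 200 domains within 24 hours.

```python
import hashlib
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(hours=24)
THRESHOLD = 3  # hypothetical; a real server might use something like 200

def fingerprint(comment):
    # Normalize whitespace and case so trivial variations hash the same.
    normalized = " ".join(comment.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def find_campaigns(hits, window=WINDOW, threshold=THRESHOLD):
    """hits: iterable of (timestamp, domain, comment) tuples.
    Returns fingerprints seen on >= threshold distinct domains
    within any single sliding window."""
    by_print = defaultdict(list)
    for ts, domain, comment in hits:
        by_print[fingerprint(comment)].append((ts, domain))
    flagged = set()
    for fp, entries in by_print.items():
        entries.sort()
        counts = defaultdict(int)  # domain -> hits inside the current window
        start = 0
        for ts, domain in entries:
            counts[domain] += 1
            # Drop hits that fell out of the window ending at ts.
            while ts - entries[start][0] > window:
                old = entries[start][1]
                counts[old] -= 1
                if counts[old] == 0:
                    del counts[old]
                start += 1
            if len(counts) >= threshold:
                flagged.add(fp)
                break
    return flagged

t0 = datetime(2005, 1, 20, 12, 0)
hits = [
    (t0, "a.example", "Great post! Visit my site"),
    (t0 + timedelta(hours=1), "b.example", "great post!  visit my site"),
    (t0 + timedelta(hours=2), "c.example", "Great post! Visit my site"),
    (t0 + timedelta(hours=3), "a.example", "Thanks, this helped me"),
]
print(len(find_campaigns(hits)))  # 1
```

Hashing a normalized copy of the comment keeps memory small, but a spammer who varies the text per site would slip past it; that is one reason such a module would need several signals.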

There is already a module originally designed to fight DDoS attacks. By modifying the thresholds on this module, it can be used to block IP addresses that try to flood with comments or referrals too fast.
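The module isn’t named here, so the following is not its code or configuration. It’s a hypothetical Python sketch of the general idea: a per-IP sliding-window limit where FloodGuard, max_hits and period are made-up names and values.

```python
import time
from collections import defaultdict, deque

class FloodGuard:
    """Hypothetical per-IP limiter: reject an address that sends more
    than max_hits requests within `period` seconds."""

    def __init__(self, max_hits=5, period=60.0):
        self.max_hits = max_hits
        self.period = period
        self.hits = defaultdict(deque)  # ip -> timestamps of recent hits

    def allow(self, ip, now=None):
        now = time.monotonic() if now is None else now
        recent = self.hits[ip]
        # Forget hits older than the period.
        while recent and now - recent[0] > self.period:
            recent.popleft()
        recent.append(now)  # rejected attempts still count against the IP
        return len(recent) <= self.max_hits

guard = FloodGuard(max_hits=3, period=60)
print([guard.allow("10.0.0.1", now=t) for t in (0, 1, 2, 3)])  # [True, True, True, False]
print(guard.allow("10.0.0.1", now=120))  # True: the old hits have expired
```

Counting rejected attempts means a client that keeps hammering stays blocked until it slows down; the catch, as the next paragraphs show, is that a patient botnet can stay under any per-IP threshold.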

But it will almost always be the spammer who chooses the battlefield. The spammers just have to open the floodgates on their spam tools; it is we ordinary users who have to worry about verifying visitors and comments so we don’t block the genuine ones by accident. There are good countermeasures against comment spam, but the only 100% certain method is to disable comments completely.

It wouldn’t be too hard to script a browser to turn it into a spam tool, and I have reason to suspect that spammers already do this. Imagine a worm that infects Windows computers around the world (not too taxing on the imagination), then sits hidden and uses Internet Explorer to act, sound and smell like a genuine browser, including calculating hashcash and other popular spam/DDoS countermeasures. The spammers don’t care; they have all the time in the world and aren’t even using their own computers for the calculations.

Imagine 500,000 of these computers, all remotely controllable by spammers who rent access to this network of distributed zombie machines, with real browsers doing the work so as to look more like genuine commenters. Even if each zombie sends only one spam comment per minute to avoid detection by flood countermeasures, that’s still 500,000 comments a minute from the entire zombie net: 720,000,000 comments a day.
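Spelling out the arithmetic for 500,000 machines at one comment per minute each:

```python
zombies = 500_000
comments_per_zombie_per_minute = 1

per_minute = zombies * comments_per_zombie_per_minute
per_day = per_minute * 60 * 24  # 1,440 minutes in a day

print(f"{per_minute:,} comments per minute")  # 500,000 comments per minute
print(f"{per_day:,} comments per day")        # 720,000,000 comments per day
```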

The hashcash and other checksum systems will say that these are genuine comments. That’s why a good spam countermeasure uses several methods to scan each incoming comment. Again, the spammers put the burden of spending resources on their victims.

Further reading

The One True iPod

So I finally ordered that iPod mini I’ve been drooling over. I was hoping that Steve would pull a new iPod mini model out of a body orifice during Macworld, but sadly he didn’t. So it appears that the rest of the world and I said “screw it” and ordered an iPod mini at the very same time.

Saruman with iPod

Today I got the email that said they’ve shipped it. 8-9 days delivery time.

8-9 days?

Do they walk across the Alps with it? Through the Mines of Moria, into the fiery lands of Mordor where shadows rule, to cast it not into the fires of Mount Doom, but into my eager hands?

I can understand that it takes time to produce it, especially when there’s very high demand right after Macworld, and I had custom laser etching (for free, no less). But 8-9 days to deliver? Do Apple not use the same postal services as the rest of us mortals?

I’ll instruct my Uruk-Hai hordes to keep an eye out for iPod mini-wearing hobbits. If they spot one, grab him, tie him up, toss him over the shoulder and run through the Darkmere straight to my apartment. Should be faster.

I just need an impressive beard, then I can do a mean Saruman impression. Already got the hair part done.

Gallimaufry

Since I haven’t posted anything substantial here for a long while, I’ll make up for it by repeating cute little memes. This one comes from Michael Hanscom, who surely got infected by it somewhere else. The rules are simple: 10 random songs from your music library, and some commentary to go with them.

  • Haujobb, Penetration — I am a big fan of Daniel Myer and all of his musical projects. Haujobb was his first project, first together with Dejan Samardzic and, or so I thought, later on his own after Dejan left Haujobb. Apparently, he never left. Where did I get that from? A few of his other projects, each with its own distinct style, are Clear Vision, Cleen (first known as Cleaner), Architect, Hexer, and S’Apex, just to name a few off the top of my head. While I prefer Haujobb’s earlier style from the first album (while Dejan was still a member), pretty much anything done under the Haujobb name is good stuff. Daniel Myer also has his fingers everywhere in the industrial/electronic scene.
  • The Sepia, Ascension — I don’t really know much about The Sepia, but they do have a bunch of good tracks on their first album, Splintered, from which this song comes. I have no idea what genre to put this in, so I’m simply calling it “electronic.” That’s my backup genre when I don’t know how to classify something.
  • Aslan Faction, Complication — I saw Aslan Faction live in April 2004, and they really got the floor moving. Lee, the drummer, was so enthusiastic that he almost flipped his drumset over. They call themselves industrial, I call them EBM.
  • This Morn’ Omina, Toltec — from the album Le Serpent Blanc/Le Serpent Rouge, which I bought completely on an impulse. Most of the songs are ambient with some occasional noise. This song would be great background music for some science fiction movie with horror elements, such as Event Horizon.
  • Eisbrecher, Mein Blut (Album-Schnitt) — yay, German growling! This song begins like something by Enya or Era. After that, they do their best to sound like Rammstein.
  • Noise Process, Dying World (Sr-Mx) — A slower mix of Dying World from the album Neural Code. I prefer the original. EBM with the standard hissing vocals.
  • The Parallel Project, Deleted Scenes — The Parallel Project is interesting. The first album, Fusion, has vocals from different artists on each track, except for track 6, which instead contains samples from Daniel Myer (I told you he was everywhere…). Deleted Scenes has vocals by Mark Jackson, also known as “that other guy in VNV Nation.” He really doesn’t get much attention standing behind Ronan Harris, but here he proves that he can sing as well as pound the drums in VNV Nation. The Parallel Project has a trance-like sound, but with pleasant vocals from both male and female singers from various other well-known bands.
  • Android Lust, Panic Wrought — Android Lust was a song of the week here once. Shikhee is something as rare as a woman in a genre usually dominated by men. She can sing with the best of them, though.
  • Cleen, This Autumn — if it isn’t Daniel Myer popping up again. I told you I liked him… This song sounds like something from the Matrix album by Haujobb. Same strange soundscape, beginning with some ambient noises before the bass and piano appear, together with Myer murmuring strange sentences.
  • Forma Tadre, Looking Glass Men — Forma Tadre is another favorite of mine. Too bad that Andreas Meyer (no relation to Daniel Myer, though they have a project together under the name Newt) only released two albums as Forma Tadre. This album, Navigator, is inspired by the Cthulhu Mythos and contains lots of samples from the movie Stargate. The lyrics have many references to ancient entities if you know where to listen.
Headhunter

Someone at Apple must like Front 242. Can you spot it?

Answer: The headlines. “Lock the Target,” “Spread the Net,” and “Catch the Man” are lyrics from Headhunter by Front 242.

200:1

Since I switched from Movable Type to Wordpress, I’ve had one (1) legit comment.

And 200 attempted spam comments that Spam Karma shot on sight.

Addendum: 11 days later, and I’ve passed 900.

Love

Have you ever been in love? Horrible isn’t it? It makes you so vulnerable. It opens your chest and it opens up your heart and it means that someone can get inside you and mess you up. You build up all these defenses, you build up a whole suit of armor, so that nothing can hurt you, then one stupid person, no different from any other stupid person, wanders into your stupid life… You give them a piece of you. They didn’t ask for it. They did something dumb one day, like kiss you or smile at you, and then your life isn’t your own anymore. Love takes hostages. It gets inside you. It eats you out and leaves you crying in the darkness, so simple a phrase like ‘maybe we should be just friends’ turns into a glass splinter working its way into your heart. It hurts. Not just in the imagination. Not just in the mind. It’s a soul-hurt, a real gets-inside-you-and-rips-you-apart pain. I hate love. Neil Gaiman
