Feb 23 2009

Spotify Bay

Welcome to Spotify Bay.

There’s an application called SpotSave making waves in the Spotify community. SpotSave lets you save music from Spotify straight to your computer, no strings attached, with the same quality you hear straight from Spotify itself.

I haven’t tried it myself, because to be quite frank, Spotify stinks and doesn’t have any music I enjoy after the Great Purge the record industry performed. (Probably because they don’t really want to see Spotify succeed, because then they’d have to move forward to a new business model.)

Now, consider the following statements:

  • SpotSave lets you connect to Spotify to download music to your computer.
  • µTorrent lets you connect to clients via The Pirate Bay to download music to your computer.

Is there any difference here?

Technically? Not really. Technology doesn’t care about concepts like “copyright” and “fair use.”

Spotify wasn’t designed to let you download music — the intended design is that you stream music to listen to it.

Pissing in the stream

Here’s another thing technology doesn’t care about — the intended design. Here we have another couple of statements to consider:

  • Receive a stream of data from the internet and write it to your hard drive.
  • Receive a stream of data from the internet and don’t write it to your hard drive.

When you download, you receive a data stream from the internet and write it to your hard drive. When you stream, you receive a data stream from the internet and let an application do something with it, and then throw the data away.

From an outside perspective, it looks identical — a data stream going from the internet to your computer. What happens inside your computer is what makes the difference between streaming and downloading.
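
To make that concrete, here is a minimal Python sketch. Everything in it is made up for illustration: fake_network_stream stands in for a real connection, and stream and download are hypothetical names, not anything Spotify or SpotSave uses. The only difference between the two consumers is a single write call.

```python
import io
from typing import BinaryIO, Iterable, Iterator


def fake_network_stream(payload: bytes, chunk_size: int = 4) -> Iterator[bytes]:
    """Stand-in for a network connection: yields the payload in small chunks."""
    for start in range(0, len(payload), chunk_size):
        yield payload[start:start + chunk_size]


def stream(chunks: Iterable[bytes]) -> int:
    """'Streaming': use each chunk (here we only count bytes), then drop it."""
    played = 0
    for chunk in chunks:
        played += len(chunk)  # a real player would decode and play the chunk
    return played


def download(chunks: Iterable[bytes], sink: BinaryIO) -> int:
    """'Downloading': the same loop, except each chunk is also written out."""
    saved = 0
    for chunk in chunks:
        saved += sink.write(chunk)
    return saved


song = b"pretend this is audio data"
kept = io.BytesIO()  # in-memory file; a file opened with mode "wb" works the same

print(stream(fake_network_stream(song)))          # bytes received and discarded
print(download(fake_network_stream(song), kept))  # bytes received and kept
print(kept.getvalue() == song)                    # the saved copy is identical
```

Seen from the network, both calls consume exactly the same chunks.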

Once the data stream reaches your computer, it’s a Wild West. Spotify intends for me to stream the data to the Spotify application and never save it, but who are they to tell me what to do with a data stream my computer receives from the internet? Sure, there’s probably some unreadable legalese in the Spotify EULA about this, but that’s not exactly enforceable without a Spotify representative watching over my shoulder, is it?

I haven’t tried SpotSave, but here’s a qualified guess at what happens: it looks at connections to/from your computer, identifies the ones going to Spotify, and then makes a copy of the streamed music and writes it to disk.

This is very basic stuff, and has been done before. It was a popular method to save web radio transmissions for later use, and probably the main reason the record industry got their panties in a bunch about web radio technology in the first place.

Floodgates

Since history tends to repeat itself, this will start an arms race between Spotify and SpotSave. Spotify will start by encrypting their data stream (and I’m surprised they didn’t do it in the first place). If the SpotSave authors pick up the thrown gauntlet, they’ll dig deeper into Spotify’s allocated memory and rip the decrypted stream out of that instead. Spotify might claim the Blizzard defense and state that they own the copyright of a part of memory in your computer and sue SpotSave for copyright infringement. And so on.

This is why DRM (Digital Restriction Management, though some people insist the R means “Rights”) keeps failing. In order to prevent the product from being copied, they lock it up with encryption. But the customers can’t play it if it’s encrypted, so the key to unlock the encrypted data is also included in the product the customer buys.

That’s right; the customer gets both the lock and the key. It’s always just a matter of time until someone discovers where the key is hidden, and then the floodgates are wide open again. All it takes is one person to discover it and then tell someone else. Security through obscurity isn’t.
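
A toy illustration of the lock-and-key problem, assuming a trivial XOR cipher in place of real encryption. ProtectedSong and its _hidden_key attribute are hypothetical, and no actual DRM scheme is this simple, but the structure is the same: the player can't play without the key, so the key ships with the product.

```python
from itertools import cycle


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Toy cipher: XOR every byte with a repeating key. Not real cryptography."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


class ProtectedSong:
    """Hypothetical DRM'd product: ciphertext plus a 'hidden' key to play it."""

    def __init__(self, audio: bytes, key: bytes) -> None:
        self.ciphertext = xor_bytes(audio, key)
        self._hidden_key = key  # obscured, but it has to ship with the product

    def play(self) -> bytes:
        # The player must decrypt in order to play, so it needs the key.
        return xor_bytes(self.ciphertext, self._hidden_key)


song = ProtectedSong(b"la la la", key=b"s3cret")

# Anyone who finds where the key is hidden can do exactly what the player does.
found_key = song._hidden_key
ripped = xor_bytes(song.ciphertext, found_key)
print(ripped == song.play())
```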

Here’s another cute little application of relevance: Mutify. Mutify is an app that also listens to the incoming data stream to Spotify. If it detects a song with a title that is in its database, it simply mutes Spotify until the next song starts. The list of “songs” is, of course, the ads Spotify plays for non-paying accounts. If there are new ads you can just click “This is an ad” in Mutify and enjoy the silence.
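
In code, the behaviour described above might look roughly like this. It's a guess at the logic only; AdMuter and its method names are invented here and have nothing to do with Mutify's actual source.

```python
from typing import Set


class AdMuter:
    """Hypothetical model of the behaviour described above, not Mutify's code."""

    def __init__(self) -> None:
        self.ad_titles: Set[str] = set()

    def report_ad(self, title: str) -> None:
        """The 'This is an ad' button: remember this title."""
        self.ad_titles.add(title.strip().lower())

    def should_mute(self, title: str) -> bool:
        """Mute while the current track's title is a known ad."""
        return title.strip().lower() in self.ad_titles


muter = AdMuter()
muter.report_ad("Upgrade to Premium")  # made-up ad title

for now_playing in ("Some Actual Song", "upgrade to premium"):
    print(now_playing, "->", "mute" if muter.should_mute(now_playing) else "play")
```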

The arms race has already started here, and Mutify currently doesn’t work as intended with Spotify: Spotify simply pauses the ad when Mutify mutes the sound. Until Mutify catches up, you can just lower the volume yourself. Let’s see Spotify try to work around that.

On a similar note, there was a faceless TV exec that expressed great horror at the concept of switching to a different channel during the commercial breaks, stating that you violated a social contract by doing so. What if I need to go pee? What if I mute the sound and read a book until the commercials are over?

Owning your own interpretation

I have random thoughts about this all the time — what kind of control do I actually have over the interpretation of data streams arriving at my computer?

Let’s take web pages. They’re written in HTML, which is basically a language that tells your web browser how to display a page.

You could argue that I’m violating a contract by having a program that auto-mutes Spotify whenever an ad plays. Am I violating a contract if I tell my browser to not show images even if the HTML tells it to?

I use GlimmerBlocker to strip out the image tags for ads and banners from the stream of HTML before it reaches my browser. Am I violating any contract here? I’m clearly not viewing the page as the designer intended.

It’s the Wild West again. Once HTML reaches my computer, it’s up to me to render it as I see fit. No one would argue with me if I surfed with images disabled in the browser due to being on a very slow connection. Stripping out useless banner ads not only preserves your sanity, it also makes the page load way faster due to all the needless crap you don’t have to download.
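
Here is a rough sketch of that kind of filtering, assuming a simple regular expression is good enough to find image tags. A real filter such as GlimmerBlocker is surely more careful, and this isn't its code. The host names in AD_HOSTS are placeholders.

```python
import re

# Hypothetical blocklist; a real filter would use a maintained list of ad hosts.
AD_HOSTS = ("ads.example.com", "banners.example.net")

IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)


def strip_ad_images(html: str) -> str:
    """Remove <img> tags whose markup mentions a blocked ad host."""
    def replace(match: re.Match) -> str:
        tag = match.group(0)
        return "" if any(host in tag for host in AD_HOSTS) else tag
    return IMG_TAG.sub(replace, html)


page = (
    '<p>Hello</p>'
    '<img src="http://ads.example.com/monkey.gif">'
    '<img src="/photos/cat.jpg">'
)
print(strip_ad_images(page))
```

The ad image disappears before the browser ever sees it, while the ordinary image is left alone.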

I’ve specifically configured my ad blocker to let text ads from Google through. These ads aren’t intrusive and don’t tell you to punch the monkey. This is the type of ads I want to encourage, so I let them display.

Once or twice a year I even click on one.


Feb 6 2009

Sharing is Caring

Public service announcement: I read my feeds in Google Reader, and I end up sharing tons of entries I find interesting and/or weird.

Here’s the shared page, or go straight to the feed for it.

There will be the occasional item in Swedish, but most of it is English.

Addendum: I should also mention that I have a habit of sharing things that I know interest people that follow my shared items. Breki wrote about Things recently, so I’ve shared a handful of Things-related entries that show up in my feed.

Want me to share stuff that interests you? Make sure I read your blog and tell me that you follow my shared items, and I’m almost certain to start sharing stuff you’ve blogged about recently.


Jan 23 2009

Bookmarks for January 23rd

I’m trying Postalicious to automagically post my Ma.gnolia bookmarks whenever I have enough of them to post. I’ll need to fiddle a bit with the settings, so for the moment I’m doing a bit of manual stuff. Let’s see how this works and if I actually manage to write stuff between the generated link dumps…

I’ve already found some bug in the default templates that try to stick paragraph tags in silly places.

These are my links for January 23rd from 04:38 to 04:41:

  • Dutch government study: net effect of P2P use is positive – The Dutch Ministry of Economic Affairs commissioned a study by research company TNO about how much Dutch Internet users download music, movies, and games, and what the social and economic effects of this downloading are.
  • Practika: A Free Icon Set – Practika: a free set of 11 practical and useful high-quality icons, designed by DryIcons, especially for Smashing Magazine and its readers. The icons are available in resolutions 64×64px, 128×128px

May 18 2006

Spam of the Year

This spam to my Gmail account had me laughing out loud.

Subject: Ihre Domain www.gmail.com ist nicht bei Google gelistet!

It’s a German spammer trying to tell me that “my” domain, gmail.com, isn’t listed in Google!

So yeah, you stupid spammer dudes at Finke Marketing. Thanks for the chuckles.


Mar 29 2006

Hey, Warren!

Stop doing that, Warren. It’s wide open for abuse, such as me making all your readers who read via your RSS feed see this.

What I’m talking about is publishing all your Technorati occurrences and Flickr comments in your article feed. Reading your own stuff? I love it. That’s what I want. Reading what every boring entity on the planet writes about you? In French? No thanks, not interested. I don’t speak French.

I suppose I could have e-mailed Warren about this instead, but where’s the fun in that? Feel free to call me a dick in the comments.


Mar 20 2006

Snailmail Spam

I had a letter waiting for me when I got home after watching V For Vendetta, of which I might rant later. US air mail, eh? I rip it open.

Some silly domain registrar, Domain Registry of America (who in the fine print state they are not affiliated with or endorsed by the government of the United States) wants me to host this very domain, and what a fancy name said domain has, at them instead of Gandi, my current registrar.

I would consider it if their yearly fee wasn’t 216% of what I’m paying right now. €26 per year? I pay €12 now. That pricing has no attachment whatsoever to reality.

And why does your spam look so much like a bill? Hidden deep in a paragraph you say that it isn’t, yet you do your best to make it look like one. Add some FUD about “losing your online identity” and I put you on my shit list.

Well, thanks for sending me something to light the fireplace with.


Jan 30 2005

An introduction to mod_security

Inspired by this article I decided to make a similar article that shows the advantages of mod_security over stopping spam by using mod_rewrite.

I started using TextDrive in June 2004. When comment spam became a very large problem for Movable Type users due to poor programming in mt-comments.cgi, a mailing list was set up to figure out a way to fight back against spam. And mod_security was our weapon of choice.

Addition: I should mention that other TextDrive users usually won’t have to bother blocking the common spam; we spot attacks very quickly on the aforementioned mailing list and add a global rule to block it across all TextDrive servers.

This is what mod_security has to say about itself in a single paragraph:

ModSecurity is an open source intrusion detection and prevention engine for web applications. Operating as an Apache Web server module, the purpose of ModSecurity is to increase web application security, protecting web applications from known and unknown attacks.

While mod_rewrite is good at rewriting URLs, it’s a very poor choice for fighting spam. It requires quite a lot of obscure commands to block a single URL. mod_security, on the other hand, can block a URL with a single line in your .htaccess file.

I won’t explain how you install mod_security, so let’s pretend we already covered that part. Now for the good stuff.

Configuring mod_security

This is how you start mod_security, either in your global Apache configuration, or in a .htaccess file:

SecFilterEngine DynamicOnly
SecFilterScanPOST On
SecAuditLog logs/audit_log

The first line tells Apache to run mod_security, but only on dynamic pages (PHP, CGI scripts, whathaveyou). You can also set it to On instead of DynamicOnly, if you want to scan all requests for all pages.

The second line is where mod_security really starts to trounce mod_rewrite: it enables scanning of the POST payload, the body of the request. This is something that mod_rewrite is unable to do.

The POST data is the actual data that gets submitted to a web server, such as comment forms. This means that mod_security can filter based on content in the comments, and even in specific fields, if you only want to make a rule based on the author of a comment.

The third line tells Apache where to store the audit log from mod_security. This log file contains everything that mod_security catches, if you have configured it to log that particular rule.

Let’s add a fourth line before we begin the actual block rules: the default action.

SecFilterDefaultAction "deny,log,status:412"

This sets the default action for rules that have no action defined, so that you don’t have to re-type the action for every rule. This line sets the default mode to “block the request, log it, and give the client an Error 412.”

I prefer Error 412 (Precondition Failed) over Error 403 (Forbidden). 403 is “You’re not allowed to be here,” while 412 is “We don’t serve your kind here.” 403 is the “Staff only” sign; 412 is the bouncer at the door checking his list of misbehaving persons.

Let’s start blocking!

Now, let’s build some rules. The basic rules have two formats:

SecFilter PATTERN [ACTION]

This scans the request for PATTERN, and uses the default action if it matches PATTERN. It also accepts an optional ACTION argument, which uses the same format as the SecFilterDefaultAction above. If you have lots of spam to block, it’s easier to define a default action and only use the first version to block spam.

However, it doesn’t scan the POST payload unless we tell mod_security to do so. Which we did above. So you could create a rule to stop viagra spam like this:

SecFilter "viagra"

This will block referral spam containing “viagra” in the URL or in a comment (since we enabled POST scans). But since SecFilter scans the entire request, it also checks for it in the user agent field. While I don’t know about any browsers called “Viagra” we can never be sure that they really do exist, and that’s why I prefer to be very specific about what part of the request should be scanned. We really don’t want to block legitimate users by accident, like comments containing “Hey, I get tons of Viagra spam too!”

You can also use regular expressions in the rules:

SecFilter "(viagra|mortgage|herbal)"

If we want to use an action different than the default action, we can do it like this:

SecFilter "viagra" "allow,nolog"

This will allow anything containing “viagra” to pass the filter, and it won’t be logged in the audit log.

Selective blocking

To do a more specific scan, we can use SecFilterSelective instead. It takes the following arguments:

SecFilterSelective LOCATION PATTERN [ACTION]

Now we can define what part of the request we want to scan, by supplying the LOCATION argument before the PATTERN argument. Let’s say we get tons of referral spam by someone pimping his “buyviagra.com” site. We can scan the Referer header only and block his entire domain from ever referring us:

SecFilterSelective "HTTP_REFERER" "buyviagra.com"

NOTE: As of mod_security 1.8, there is no need to escape dots in domain names. This is managed automatically by mod_security.

Presto! We never see referral spam from that domain again. Note that I did not supply the ACTION argument, since it saves me some typing to let the default action trickle down from the settings above. It also makes it easier to read the rules.

Note, however, that this only blocks referrals from that specific domain. There’s nothing stopping him from referral spamming with “buymyviagradamnit.com” instead. We can of course use regular expressions here as well:

SecFilterSelective "HTTP_REFERER" "(viagra|mortgage|texasholdem)"

There are many fields you can scan selectively, and you can also define several fields to scan on the same line. Just separate them with a pipe character (|) in the LOCATION argument. For a list of all fields you can scan selectively, please see the reference manual.

Blocking IP addresses

If there’s a specific IP address that hits you especially hard, you can block it by scanning the REMOTE_ADDR header:

SecFilterSelective "REMOTE_ADDR" "^83.142.57.250$"

Note that I begin the pattern with ^ and end it with $. These are regular expression special characters that tell it to only match from the beginning of the line, as well as the end of the line. If I didn’t have the starting ^, I would not only block 83.142.57.250, but also 183.142.57.250 since it contains the same pattern. Using them both means “match the entire line.”
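
The effect of the anchors is easy to check outside Apache. This Python sketch assumes the rule pattern is applied as an unanchored regular expression search, which is what re.search does. It doesn't model anything else mod_security may do to the pattern.

```python
import re

# Assumption: the rule behaves like an unanchored regex search (re.search).
anchored = re.compile(r"^83.142.57.250$")
unanchored = re.compile(r"83.142.57.250")

for address in ("83.142.57.250", "183.142.57.250"):
    print(address,
          "anchored:", bool(anchored.search(address)),
          "unanchored:", bool(unanchored.search(address)))

# In a plain regex an unescaped dot matches any character, so this odd
# string gets through the anchored pattern. Escaping the dots is stricter.
strict = re.compile(r"^83\.142\.57\.250$")
odd = "83.142x57.250"
print(odd, "anchored:", bool(anchored.search(odd)), "strict:", bool(strict.search(odd)))
```

Only the anchored pattern leaves 183.142.57.250 alone. The escaped version is what you'd write in a regex engine that doesn't handle the dots for you.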

Scanning POST payloads

So far we’ve done the same things that we can do with mod_rewrite, and the only advantage has been that it saved us some typing and resulted in more readable lines. Now for something that mod_rewrite cannot do: scanning POST content!

The POST payload contains the contents of forms that are submitted to the server from the browser. Scanning this means you can scan the contents of comments, and find attempted spam even there. Use the POST_PAYLOAD location to scan:

SecFilterSelective "POST_PAYLOAD" "(mortgage|viagra)"

And now nobody can post comments containing mortgage or viagra any more.

But it doesn’t stop there! You can also scan inside specific arguments in the POST payload. Let’s say we want to allow people to talk about viagra and other spammy words, but disallow those words in the URL field in Movable Type and WordPress. In both of these, the URL field is called url.

SecFilterSelective "ARG_url" "(mortgage|viagra)"
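
Here's a small Python simulation of the difference between the last two rules. It only mimics the idea: the function names are made up, case-insensitive matching is an assumption, and the request bodies are invented examples of form-encoded comment submissions.

```python
import re
from urllib.parse import parse_qs

# Same word list as the rules above; ignoring case is an assumption.
SPAM_WORDS = re.compile(r"(mortgage|viagra)", re.IGNORECASE)


def blocked_by_payload_scan(body: str) -> bool:
    """Like scanning POST_PAYLOAD: search the whole form body."""
    return bool(SPAM_WORDS.search(body))


def blocked_by_url_field_scan(body: str) -> bool:
    """Like scanning ARG_url: search only the values of the 'url' field."""
    fields = parse_qs(body)
    return any(SPAM_WORDS.search(value) for value in fields.get("url", []))


honest = "author=Anna&url=http%3A%2F%2Fexample.org&text=I+get+tons+of+Viagra+spam+too%21"
spammy = "author=Bob&url=http%3A%2F%2Fbuyviagra.example&text=Nice+post"

for name, body in (("honest", honest), ("spammy", spammy)):
    print(name,
          "payload scan blocks:", blocked_by_payload_scan(body),
          "url scan blocks:", blocked_by_url_field_scan(body))
```

The whole-payload scan stops both comments. The field-specific scan stops only the one with the spammy URL.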

Closing statements

That was a brief introduction to the most useful features of mod_security. Remember to always think about what it is you will really block with the rule you just wrote, and figure out a way to be specific enough without trapping legitimate users.

Mark Pilgrim once wrote an entry about the futility of blocking specific domains, and I agree completely.

Savor this moment, folks. You can tell your children stories of how, back in the early days of weblogging, you could print out the entire spam blacklist on a single sheet of paper. Maybe with two or three columns and a smallish font, but still. Boy, those were the days.

And they won’t last. They absolutely won’t last. They won’t last a month. The domain list will grow so unwieldy so quickly, you won’t know what hit you. It’ll get so big that it will take real bandwidth just to host it. Keeping it a free download will make you go broke. Code is free, but bandwidth never will be. Do you have a business plan? You’ll need one within 6 months. (Mark Pilgrim)

This is why it will be very tiresome to block specific domains. Right now there is a spammer who has bought expired domains, and uses them for referral and comment spam. There’s nothing spammy about these domain names; no “viagra” or “mortgage” that you can scan for. As the master of your own domain, there’s not really much you can do about attacks like these except for blocking the individual domains.

The real battle here must be fought at a server-wide level. There are Apache modules in the works that can scan hits across entire web servers and all the domains hosted there, and find patterns in these hits. Unless it’s the Google bot doing a drive-by, 200 domains hosted by the same company are very unlikely to be hit by the same comment spam within 24 hours, and here you can find a pattern and block it.
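
As a sketch of what such pattern-finding could look like: fingerprint each comment and count how many hosted domains have received the same one within 24 hours. This isn't any existing Apache module; the class name, the threshold and the fingerprinting are all assumptions for illustration.

```python
import hashlib
from collections import defaultdict
from typing import DefaultDict, Dict

DAY = 24 * 60 * 60  # seconds


class CrossDomainSpamDetector:
    """Flags a comment once the same text has hit too many hosted domains."""

    def __init__(self, max_domains: int = 3, window: int = DAY) -> None:
        self.max_domains = max_domains  # hypothetical threshold
        self.window = window
        # comment fingerprint -> {domain: time it was last seen there}
        self.seen: DefaultDict[str, Dict[str, float]] = defaultdict(dict)

    def observe(self, domain: str, comment: str, now: float) -> bool:
        """Record one comment; return True if it now looks like a spam run."""
        normalized = " ".join(comment.lower().split())
        fingerprint = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        domains = self.seen[fingerprint]
        domains[domain] = now
        # Forget sightings that have fallen out of the time window.
        for name in [d for d, t in domains.items() if now - t > self.window]:
            del domains[name]
        return len(domains) >= self.max_domains


detector = CrossDomainSpamDetector(max_domains=3)
spam = "Great post! Cheap pills at my site"
for second, domain in enumerate(("a.example", "b.example", "c.example")):
    print(domain, detector.observe(domain, spam, now=float(second)))
```

A single site owner only ever sees one of those hits. The pattern is visible only from the server's point of view.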

There is already a module originally designed to fight DDoS attacks. By modifying the thresholds on this module, it can be used to block IP addresses that try to flood with comments or referrals too fast.
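
The threshold idea can be sketched as a sliding-window counter per IP address. This is a hypothetical model, not that module's code or configuration. FloodGuard, limit and window are names invented for the example.

```python
from collections import defaultdict, deque
from typing import DefaultDict, Deque


class FloodGuard:
    """Refuses an IP that makes more than `limit` requests within `window` seconds."""

    def __init__(self, limit: int = 5, window: float = 60.0) -> None:
        self.limit = limit
        self.window = window
        self.hits: DefaultDict[str, Deque[float]] = defaultdict(deque)

    def allow(self, ip: str, now: float) -> bool:
        recent = self.hits[ip]
        # Discard timestamps that are older than the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        recent.append(now)  # refused attempts still count as hits
        return len(recent) <= self.limit


guard = FloodGuard(limit=3, window=60.0)
for second in (0, 1, 2, 3, 100):
    print(second, guard.allow("203.0.113.9", now=float(second)))
```

Lowering limit or raising window makes the guard stricter, which is the kind of tuning described above.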

But it will almost always be the spammer that chooses the battlefield. The spammers just have to open the floodgate on their spam tools; it is us normal users that have to bother about verifying the visitors and comments so we don’t block the genuine stuff by accident. There are good countermeasures against comment spam, but the only 100% certain method is to disable comments completely.

It wouldn’t be too hard to script a browser to make it a spam tool, and I have reason to suspect that spammers already do this. Imagine a worm that infects Windows computers around the world (not too taxing on the imagination), and then sits hidden and uses Internet Explorer to act, sound and smell like a genuine browser, including calculating hash cash and other popular spam/DDoS countermeasures. The spammers don’t care; they have all the time in the world and aren’t even using their own computers for the calculations.

Imagine 500,000 of these computers, all remotely controlled, with spammers paying for access to this network of distributed zombie machines, where real browsers do the work so as to look more like genuine commenters. Even if a zombie only sends one spam comment per minute to avoid detection by flood countermeasures, that’s still 500,000 comments a minute from the entire zombie net, or 720,000,000 comments in a day (500,000 × 60 × 24).

The hash cash and other checksum systems will say that these are genuine comments. That’s why a good spam countermeasure uses several methods to scan the incoming comment. Again, the spammers put the burden of using resources on their victims.


Dec 11 2004

Nasty crawlers

There’s a discussion on the TextDrive forums about how the MSN spider bot behaves. And it’s quite rude.

Microsoft wanted to be able to boast of a large page index when their new MSN Search went public beta. So they released the leash on the MSN crawler and let it index at full speed, saturating the bandwidth of the victim site if necessary.

That equals about $150 of bandwidth bills in two weeks for TextDrive, or $4000 yearly. So it was banned for a while until it behaved properly. Paying $4000 per year just to be in a search engine is madness.

MSN Search isn’t very smart either. Quite frankly, it’s stupid. It wasn’t quite banned from TextDrive servers; it actually got a redirect via mod_security to the MSN Bot info page. It then parsed the info on that page as if it was the result from the pages it was denied access to, and added it as a search result for those pages.

Stupid, stupid, stupid.

I also had a visit from the Popdex crawler today.

My definition of rude and abusive bots is as follows: if it leaves a referrer without the referring page actually containing a link to my site, it is considered fraudulent behavior. If it gorges and gobbles pages at a rapid pace, it is considered abuse of my site.

The Popdex crawler did both. It crawled 350 pages in two minutes. Twice per page, for 700 requests in two minutes. And it filled my logs with fake referrals to popdex.com.
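
Both tests are simple enough to sketch. The functions below are hypothetical helpers, not part of any log analyzer. links_to checks whether a referring page (already fetched and passed in as HTML) actually links to my site, and requests_per_minute works out the crawl rate from request timestamps. The limit of 60 per minute is an arbitrary assumption.

```python
import re
from typing import Sequence


def links_to(page_html: str, my_host: str) -> bool:
    """True if the referring page's HTML contains an href pointing at my_host."""
    hrefs = re.findall(r'href\s*=\s*["\']([^"\']+)["\']', page_html, re.IGNORECASE)
    return any(my_host in href for href in hrefs)


def requests_per_minute(timestamps: Sequence[float]) -> float:
    """Average request rate over the span covered by the timestamps (seconds)."""
    if len(timestamps) < 2:
        return float(len(timestamps))
    span_minutes = (max(timestamps) - min(timestamps)) / 60
    return len(timestamps) / span_minutes if span_minutes else float("inf")


def is_rude(page_html: str, my_host: str, timestamps: Sequence[float],
            max_per_minute: float = 60.0) -> bool:
    """A fake referrer or gobbling pages too fast: either one earns a ban."""
    return (not links_to(page_html, my_host)
            or requests_per_minute(timestamps) > max_per_minute)


# 700 requests spread evenly over two minutes, as in the Popdex visit.
crawl = [i * 120 / 700 for i in range(700)]
referrer_page = '<a href="http://popdex.com/">Popdex</a>'  # no link to my site

print(round(requests_per_minute(crawl)))
print(is_rude(referrer_page, "example.org", crawl))
```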

Bam. Banned.