Spam War
By Leon Erlanger
Besides being the year of war, terrorism, corporate fraud, and blackouts, 2003 was also the year of spam. As more users found their legitimate e-mail vastly outnumbered by spam, spammers and antispam vendors played a constant Tom-and-Jerry game, frantically coming up with ever-more-sophisticated techniques to outfox each other.
As recently as a year ago, many antispam solutions relied on keyword recognition to separate spam from legitimate e-mail. Spammers outwitted such strategies by interspersing commas, spaces, exclamation points, and deliberate misspellings (such as V!agra) in headers and message content. We’ve all seen such tricks, but you may not be aware of less obvious ploys that rely on HTML features to foil spam filters. For example, a spammer may intersperse white-on-white text or zero-font-size characters between visible text. You won’t see such characters unless you select them with your mouse, but filters take them into account. Other tricks include using the &nbsp; HTML entity to place a space between letters, adding phony HTML style tags, or representing each letter with an HTML entity. When a keyword filter sees HTML entities and style tags, it simply reads them as literal text. So if a spammer uses HTML entities for letters and spaces, the filter reads a run of entity codes, such as &#86;&#105;&#97;&#103;&#114;&#97;.
What a user sees is Viagra.
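One way a filter might counter these tricks is to normalize the markup before matching keywords. The following is a minimal sketch, not any vendor's actual method; the function names and the small look-alike table are assumptions made for illustration.

```python
import html
import re

# Hypothetical look-alike substitutions; a real filter would need a far larger map.
LOOKALIKES = str.maketrans({"!": "i", "1": "i", "@": "a", "0": "o", "$": "s"})

def normalize(raw: str) -> str:
    """Reduce an HTML fragment to the lowercase letters a reader would perceive."""
    text = re.sub(r"<[^>]*>", "", raw)         # drop tags, including phony style tags
    text = html.unescape(text)                 # decode entities such as &#86; and &nbsp;
    text = text.translate(LOOKALIKES).lower()  # undo simple character swaps
    return re.sub(r"[^a-z]", "", text)         # drop spaces, punctuation, and digits

def contains_keyword(raw: str, keyword: str) -> bool:
    """Return True if the keyword appears once the obfuscation is stripped."""
    return keyword in normalize(raw)

print(contains_keyword("&#86;&#105;&#97;&#103;&#114;&#97;", "viagra"))
print(contains_keyword("V&nbsp;i&nbsp;a&nbsp;g&nbsp;r&nbsp;a", "viagra"))
print(contains_keyword("V!<xyz>ag</xyz>ra", "viagra"))
```

Because this sketch removes every space, it can also join innocent neighboring words into a false match, and it does nothing about hidden white-on-white or zero-size text. Both limits hint at why keyword matching alone lost ground.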
Spammers also place columns of letters in each cell of an invisible HTML table, so that the filter reads cell by cell, but the recipient reads across the cells. And if that’s not enough, many spammers simply render text as an HTML image. According to Chris Belthoff of antispam vendor Sophos, more than 80 percent of current spam is HTML-based.
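To see why the table trick works, consider a small sketch that reads the same markup two ways: cell by cell, as a naive filter might, and across the cells, as a person does. The class and function names are hypothetical, and the sketch assumes a single-row table whose cells separate letters with <br> tags.

```python
from html.parser import HTMLParser
from itertools import zip_longest

class CellCollector(HTMLParser):
    """Collect the text of each <td>, split into lines at <br> tags."""

    def __init__(self):
        super().__init__()
        self.cells = []       # one list of text lines per table cell
        self.in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.cells.append([""])
            self.in_cell = True
        elif tag == "br" and self.in_cell:
            self.cells[-1].append("")   # start a new line inside the cell

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.cells[-1][-1] += data.strip()

def collect_cells(markup):
    parser = CellCollector()
    parser.feed(markup)
    parser.close()
    return parser.cells

def read_as_filter(markup):
    """Read cell by cell, in source order."""
    return " ".join("".join(lines) for lines in collect_cells(markup))

def read_as_person(markup):
    """Read across the cells, one visual line at a time."""
    rows = zip_longest(*collect_cells(markup), fillvalue="")
    return " ".join("".join(row) for row in rows)

markup = ("<table><tr><td>C<br>P</td><td>H<br>I</td><td>E<br>L</td>"
          "<td>A<br>L</td><td>P<br>S</td></tr></table>")
print(read_as_filter(markup))   # CP HI EL AL PS
print(read_as_person(markup))   # CHEAP PILLS
```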
Antispam vendors have countered with more sophisticated spam-fighting techniques. For example, Bayesian filtering rates each word and feature of a message for the likelihood that it indicates spam, based on careful analysis of past spam and nonspam e-mail. This is very clever, but spammers have responded by packing messages with lots of legitimate text and features, visible or invisible. Highlight a spam message, and you may find an entire hidden short story, enough to thwart such filtering. Another tactic is to put as little information in the actual message as possible, or to disguise the entire message as being about a topic that should interest the recipient, and then link to a URL about the real spam topic.
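The idea behind Bayesian filtering, and the reason padding defeats it, can be shown with a deliberately simplified scorer. This is a sketch under stated assumptions, not a production filter: it uses a tiny made-up corpus, treats words as independent, scores only the words present in a message, and all names are hypothetical.

```python
import math
from collections import Counter

def train(spam_msgs, ham_msgs):
    """Count how many spam and nonspam messages contain each word."""
    spam_counts = Counter(w for m in spam_msgs for w in set(m.lower().split()))
    ham_counts = Counter(w for m in ham_msgs for w in set(m.lower().split()))
    return spam_counts, ham_counts, len(spam_msgs), len(ham_msgs)

def spam_probability(message, model):
    """Combine per-word evidence into one spam probability (naive Bayes)."""
    spam_counts, ham_counts, n_spam, n_ham = model
    log_odds = math.log(n_spam / n_ham)   # prior odds from the corpus sizes
    for word in set(message.lower().split()):
        # Add-one smoothing so an unseen word never yields a zero probability.
        p_word_spam = (spam_counts[word] + 1) / (n_spam + 2)
        p_word_ham = (ham_counts[word] + 1) / (n_ham + 2)
        log_odds += math.log(p_word_spam / p_word_ham)
    return 1 / (1 + math.exp(-log_odds))

# Made-up training data, for illustration only.
spam = ["cheap pills online", "buy cheap pills now", "cheap offer now"]
ham = ["meeting notes attached", "lunch meeting tomorrow",
       "project notes and schedule"]
model = train(spam, ham)

plain = spam_probability("cheap pills", model)
padded = spam_probability(
    "cheap pills meeting notes schedule lunch project tomorrow", model)
print(round(plain, 3), round(padded, 3))
```

With this toy corpus, the bare message scores well above 0.9, while the same message padded with ordinary office vocabulary drops below 0.1. That is the effect the hidden short story is meant to achieve.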
Antispam vendors have added signatures, blacklists, and rule-based filtering to their arsenal. They set up spam honeypots to catch as much spam as possible, then create a signature to identify each message. Signatures work particularly well for HTML images, according to Ken Schneider of antispam vendor Brightmail. Vendors often combine this method with blacklists of the proxy sites that spammers use to hide their source IP addresses and of the URLs that spammers use as links. Or they may simply match a URL claiming to be a particular well-known site against that site’s known true URL. Rule-based techniques match messages against a list of vendor rules that identify suspect e-mail. All of these techniques require frequent updating.
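The three checks described above can be sketched in a few lines. The article does not say how vendors compute signatures, so the exact SHA-256 hash used here is an assumption, as are the function names, the sample payload, and the hosts under the reserved .example domain.

```python
import hashlib
from urllib.parse import urlparse

def signature(payload: bytes) -> str:
    """Fingerprint a message body or image (assumed here: an exact SHA-256 hash)."""
    return hashlib.sha256(payload).hexdigest()

# Hypothetical data that a vendor would update frequently.
KNOWN_SPAM_SIGNATURES = {signature(b"image-bytes-captured-by-honeypot")}
BLACKLISTED_HOSTS = {"spam-proxy.example"}
TRUE_HOSTS = {"example bank": "www.examplebank.example"}

def is_known_spam(payload: bytes) -> bool:
    """Match content against signatures collected from honeypots."""
    return signature(payload) in KNOWN_SPAM_SIGNATURES

def is_blacklisted(url: str) -> bool:
    """Check a link's host against the blacklist."""
    return urlparse(url).hostname in BLACKLISTED_HOSTS

def is_spoofed_link(claimed_name: str, url: str) -> bool:
    """Flag a link that claims a well-known name but points at another host."""
    expected = TRUE_HOSTS.get(claimed_name.lower())
    return expected is not None and urlparse(url).hostname != expected

print(is_known_spam(b"image-bytes-captured-by-honeypot"))
print(is_blacklisted("http://spam-proxy.example/buy"))
print(is_spoofed_link("Example Bank", "http://198.51.100.7/login"))
```

An exact hash matches only byte-identical content, so a spammer who changes a single byte of an image escapes it. The sets themselves also go stale quickly, which is one reason these techniques need frequent updating.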
The contest continues. Spammers test, retest, and fine-tune their e-mails against real antispam products. They use e-mail bugs, in which a one-pixel image links to a specific URL that tells the spammer which messages got through antispam defenses and which users opened them. They set up Web sites to test their spam against a variety of antispam solutions.
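An e-mail bug of this kind is simple to spot in principle. The sketch below flags images that are at most one pixel in each dimension and whose URL carries a query string. The heuristic, the class name, and the tracker URL are assumptions for illustration, and a real message could hide its tracking identifier in other ways.

```python
from html.parser import HTMLParser

class WebBugFinder(HTMLParser):
    """Collect the URLs of tiny images that carry a query string."""

    def __init__(self):
        super().__init__()
        self.suspects = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        # A bug is typically declared as 1x1 (or 0x0) so the reader never sees it.
        tiny = (attributes.get("width") in ("0", "1")
                and attributes.get("height") in ("0", "1"))
        # A query string can carry an identifier unique to the recipient.
        tagged = "?" in (attributes.get("src") or "")
        if tiny and tagged:
            self.suspects.append(attributes["src"])

def find_web_bugs(markup):
    finder = WebBugFinder()
    finder.feed(markup)
    finder.close()
    return finder.suspects

message = ('<p>Hello</p>'
           '<img src="http://tracker.example/o.gif?id=user42" width="1" height="1">'
           '<img src="logo.gif" width="120" height="40">')
print(find_web_bugs(message))
```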
The lesson in all this is to make sure that your antispam solution doesn’t rely on a single technique and that its vendor demonstrates a commitment to outwitting new spam tricks as they appear. For more information on spam, see “Can E-Mail Survive?”