9.16 Thwarting Email Spam

In these last days of the Usenet, commercial vultures are hanging about and grepping through news like crazy to find email addresses to foist their scams and products on. As a reaction to this, many people have started putting nonsense addresses into their From lines. I think this is counterproductive: it makes it difficult for people to send you legitimate mail in response to things you write, and it makes it difficult to see who wrote what. In the end, this rewriting may be a bigger menace than the unsolicited commercial email itself.

The biggest problem I have with email spam is that it comes in under false pretenses. I press g and Gnus merrily informs me that I have 10 new emails. I say “Golly gee! Happy is me!” and select the mail group, only to find two pyramid schemes, seven advertisements (“New! Miracle tonic for growing full, lustrous hair on your toes!”) and one mail asking me to repent and find some god.

This is annoying. Here’s what you can do about it.



9.16.1 The problem of spam

First, some background on spam.

If you have access to e-mail, you are familiar with spam (technically termed UCE, Unsolicited Commercial E-mail). Simply put, it exists because e-mail delivery is very cheap compared to paper mail, so only a very small percentage of people need to respond to a UCE to make it worthwhile to the advertiser. Ironically, one of the most common spams is the one offering a database of e-mail addresses for further spamming. Senders of spam are usually called spammers, but terms like vermin, scum, sociopaths, and morons are in common use as well.

Spam comes from a wide variety of sources. It is simply impossible to dispose of all spam without discarding useful messages. A good example is the TMDA system, which requires senders unknown to you to confirm themselves as legitimate senders before their e-mail can reach you. Without getting into the technical side of TMDA, a downside is clearly that e-mail from legitimate sources may be discarded if those sources can’t or won’t confirm themselves through the TMDA system. Another problem with TMDA is that it requires its users to have a basic understanding of e-mail delivery and processing.

The simplest approach to fighting spam is filtering, either at the mail server or when you sort through incoming mail. If you get 200 spam messages per day from ‘random-address@vmadmin.com’, you block ‘vmadmin.com’. If you get 200 messages about ‘VIAGRA’, you discard all messages with ‘VIAGRA’ in the message. If you get lots of spam from Bulgaria, for example, you try to filter all mail from Bulgarian IPs.

This, unfortunately, is a great way to discard legitimate e-mail. The risks of blocking a whole country (Bulgaria, Norway, Nigeria, China, etc.) or even a continent (Asia, Africa, Europe, etc.) from contacting you should be obvious, so don’t do it if you have the choice.

In another instance, the very informative and useful RISKS digest has been blocked by overzealous mail filters because it contained words that were common in spam messages. Nevertheless, in isolated cases, with great care, direct filtering of mail can be useful.
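
As an illustration of direct filtering, here is a minimal sketch using nnmail-split-methods (see section Splitting Mail). The group names ‘spam’ and ‘misc’ are assumptions, and the two patterns simply mirror the examples above; rules this broad carry exactly the risks just described, so treat it as a starting point rather than a recommendation.

```elisp
;; Sketch only: the group names "spam" and "misc" are assumptions.
(setq nnmail-split-methods
      '(;; Anything sent from the offending domain.
        ("spam" "^From:.*@vmadmin\\.com")
        ;; Anything advertising the usual product in its Subject.
        ("spam" "^Subject:.*VIAGRA")
        ;; Everything else: the empty regexp matches any message.
        ("misc" "")))
```

Rules are tried in order, so the catch-all rule has to come last.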

Another approach to filtering e-mail is distributed spam processing; DCC, for instance, implements such a system. In essence, N systems around the world agree that a machine X in Ghana, Estonia, or California is sending out spam e-mail, and these N systems enter X or the spam e-mail from X into a database. The criteria for spam detection vary: it may be the number of messages sent, the content of the messages, and so on. When a user of the distributed processing system wants to find out if a message is spam, he consults one of those N systems.

Distributed spam processing works very well against spammers that send a large number of messages at once, but it requires the user to set up fairly complicated checks. There are commercial and free distributed spam processing systems. Distributed spam processing has its risks as well. For instance, legitimate e-mail senders have been accused of sending spam, and their web sites and mailing lists have been shut down for some time because of the incident.

The statistical approach to spam filtering is also popular. It is based on a statistical analysis of previous spam messages. Usually the analysis is a simple word frequency count, with perhaps pairs of words or 3-word combinations thrown into the mix. Statistical analysis of spam works very well in most cases, but it can occasionally classify legitimate e-mail as spam. It takes time to run the analysis, the full message must be analyzed, and the user has to store the database of spam analysis. Statistical analysis on the server is gaining popularity. This has the advantage of letting the user Just Read Mail, but has the disadvantage that it’s harder to tell the server that it has misclassified mail.
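
To make the word frequency idea concrete, here is a toy sketch in Emacs Lisp. It is not how any real filter works: the function names (all prefixed ‘my-’) are made up for this illustration, and the score is just the fraction of words seen more often in spam than in legitimate mail, where real tools use proper probabilities.

```elisp
(defun my-spam-tokens (text)
  "Split TEXT into a list of lowercase word tokens."
  (split-string (downcase text) "[^[:alnum:]]+" t))

(defun my-spam-train (table text)
  "Add the word counts of TEXT to the hash table TABLE and return TABLE."
  (dolist (word (my-spam-tokens text))
    (puthash word (1+ (gethash word table 0)) table))
  table)

(defun my-spam-score (spam-table ham-table text)
  "Return the fraction of words in TEXT seen more often in spam than in ham.
SPAM-TABLE and HAM-TABLE are hash tables filled by `my-spam-train'."
  (let ((tokens (my-spam-tokens text))
        (spammy 0))
    (dolist (word tokens)
      (when (> (gethash word spam-table 0) (gethash word ham-table 0))
        (setq spammy (1+ spammy))))
    (if tokens
        (/ (float spammy) (length tokens))
      0.0)))

;; Usage: train one table per class, then score a new message.
(let ((spam (make-hash-table :test 'equal))
      (ham (make-hash-table :test 'equal)))
  (my-spam-train spam "buy cheap pills now")
  (my-spam-train ham "lunch tomorrow at noon")
  (list (my-spam-score spam ham "cheap pills")      ; 1.0
        (my-spam-score spam ham "lunch at noon")))  ; 0.0
```

Even this toy shows the costs mentioned above: the whole message has to be tokenized, and the two tables have to be stored somewhere between sessions.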

Fighting spam is not easy, no matter what anyone says. There is no magic switch that will distinguish Viagra ads from Mom’s e-mails. Even people are having a hard time telling spam apart from non-spam, because spammers are actively looking to fool us into thinking they are Mom, essentially. Spamming is irritating, irresponsible, and idiotic behavior from a bunch of people who think the world owes them a favor. We hope the following sections will help you in fighting the spam plague.



9.16.2 Anti-Spam Basics

One way of dealing with spam is having Gnus split out all spam into a ‘spam’ mail group (see section Splitting Mail).

First, pick one (1) valid mail address that you can be reached at, and put it in the From header of all your news articles. (I’ve chosen ‘larsi@trym.ifi.uio.no’, but for many people, an address of the form ‘larsi+usenet@ifi.uio.no’ will be a better choice. Ask your sysadmin whether your sendmail installation accepts keywords in the local part of the mail address.)

 
(setq message-default-news-headers
      "From: Lars Magne Ingebrigtsen <larsi@trym.ifi.uio.no>\n")

Then put the following split rule in nnmail-split-fancy (see section Fancy Mail Splitting):

 
(...
 (to "larsi@trym.ifi.uio.no"
     (| ("subject" "re:.*" "misc")
        ("references" ".*@.*" "misc")
        "spam"))
 ...)

This says that all mail to this address is suspect, but if it has a Subject that starts with a ‘Re:’ or has a References header, it’s probably ok. All the rest goes to the ‘spam’ group. (This idea probably comes from Tim Pierce.)

In addition, many mail spammers talk directly to your SMTP server and do not include your email address explicitly in the To header. Why they do this is unknown—perhaps it’s to thwart this thwarting scheme? In any case, this is trivial to deal with—you just put anything not addressed to you in the ‘spam’ group by ending your fancy split rule in this way:

 
(
 ...
 (to "larsi" "misc")
 "spam")

In my experience, this will sort virtually everything into the right group. You still have to check the ‘spam’ group from time to time to check for legitimate mail, though. If you feel like being a good net citizen, you can even send off complaints to the proper authorities on each unsolicited commercial email—at your leisure.

This works for me. It allows people an easy way to contact me (they can just press r in the usual way), and I’m not bothered at all with spam. It’s a win-win situation. Forging From headers to point to non-existent domains is yucky, in my opinion.

Be careful with this approach. Spammers are wise to it.



9.16.3 SpamAssassin, Vipul’s Razor, DCC, etc

The days when the hints in the previous section were sufficient to avoid spam are coming to an end. There are many tools out there that claim to reduce the amount of spam you get. This section could easily become outdated, as new products replace old, but fortunately most of these tools seem to have similar interfaces. Even though this section will use SpamAssassin as an example, it should be easy to adapt it to most other tools.

Note that this section does not involve the spam.el package, which is discussed in the next section. If you don’t care for all the features of spam.el, you can make do with these simple recipes.

If the tool you are using is not installed on the mail server, you need to invoke it yourself. Ideas on how to use the :prescript and :postscript mail source parameters (see section Mail Source Specifiers) follow.

 
(setq mail-sources
      '((file :prescript "formail -bs spamassassin < /var/mail/%u")
        (pop :user "jrl"
             :server "pophost"
             :postscript
             "mv %t /tmp/foo; formail -bs spamc < /tmp/foo > %t")))

Once you manage to process your incoming spool somehow, thus making the mail contain, e.g., a header indicating it is spam, you are ready to filter it out. Using normal split methods (see section Splitting Mail):

 
(setq nnmail-split-methods '(("spam"  "^X-Spam-Flag: YES")
                             ...))

Or using fancy split methods (see section Fancy Mail Splitting):

 
(setq nnmail-split-methods 'nnmail-split-fancy
      nnmail-split-fancy '(| ("X-Spam-Flag" "YES" "spam")
                             ...))

Some people might not like the idea of piping the mail through various programs using a :prescript (if some program is buggy, you might lose all mail). If you are one of them, another solution is to call the external tools during splitting. Example fancy split method:

 
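(setq nnmail-split-fancy '(| (: kevin-spamassassin)
                             ...))
(defun kevin-spamassassin ()
  "Return \"spam\" if spamc flags the current message, else nil."
  (save-excursion
    (save-restriction
      ;; Pass the whole message to spamc, in case the buffer is narrowed.
      (widen)
      ;; `spamc -c' only checks the message; it exits with status 1 for spam.
      (if (eq 1 (call-process-region (point-min) (point-max)
                                     "spamc" nil nil nil "-c"))
          "spam"))))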

Note that with the nnimap back end, message bodies will not be downloaded by default. You need to set nnimap-split-download-body to t to do that (see section Client-Side IMAP Splitting).
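
For nnimap users, that setting is a one-liner in your Gnus init file. Keep in mind that downloading every message body before splitting will presumably make splitting slower on large mailboxes.

```elisp
;; Let split functions such as the one above see the full message body.
(setq nnimap-split-download-body t)
```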

That is about it. As some spam is likely to get through anyway, you might want to have a nifty function to call when you happen to read spam. And here is the nifty function:

 
(defun my-gnus-raze-spam ()
  "Submit SPAM to Vipul's Razor, then mark it as expirable."
  (interactive)
  (gnus-summary-save-in-pipe "razor-report -f -d" t)
  (gnus-summary-mark-as-expirable 1))
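
To make the function convenient to call from the summary buffer, you could bind it to a key. The following is only a sketch: the key ‘C-c s’ is an arbitrary choice for this example, so pick whatever is free in your setup.

```elisp
;; Bind the reporting command once the summary mode keymap is loaded.
;; The key "C-c s" is an assumption; any unused key will do.
(with-eval-after-load 'gnus-sum
  (define-key gnus-summary-mode-map (kbd "C-c s") #'my-gnus-raze-spam))
```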


9.16.4 Hashcash

A novel technique to fight spam is to require senders to do something costly and demonstrably unique for each message they send. This has the obvious drawback that you cannot rely on everyone in the world using this technique, since it is not part of the Internet standards, but it may be useful in smaller communities.

While the tools in the previous section work well in practice, they work only because the tools are constantly maintained and updated as new forms of spam appear. This means that a small percentage of spam will always get through. It also means that somewhere, someone needs to read lots of spam to update these tools. Hashcash avoids that, but instead requires that everyone you contact through e-mail supports the scheme. You can view the two approaches as pragmatic vs dogmatic. The approaches have their own advantages and disadvantages, but as often in the real world, a combination of them is stronger than either one of them separately.

The “something costly” is to burn CPU time, more specifically to compute a hash collision up to a certain number of bits. The resulting hashcash cookie is inserted in an ‘X-Hashcash:’ header. For more details, and for the external application hashcash you need to install to use this feature, see http://www.hashcash.org/. Even more information can be found at http://www.camram.org/.
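
To give a feel for what “burning CPU time” means, here is a toy sketch in Emacs Lisp. It is not the real hashcash stamp format and is no substitute for the hashcash program: the function names are made up for this illustration, and the “stamp” is simply the resource name plus a counter. It does show the principle, though: keep trying counters until the SHA-1 digest of the stamp starts with the requested number of zero bits.

```elisp
(defun my-leading-zero-bits (bytes)
  "Return the number of leading zero bits in the unibyte string BYTES."
  (let ((count 0) (i 0) (done nil))
    (while (and (not done) (< i (length bytes)))
      (let ((byte (aref bytes i)))
        (if (= byte 0)
            (setq count (+ count 8))
          ;; Count the zero bits above the highest set bit of this byte.
          (let ((mask #x80))
            (while (= 0 (logand byte mask))
              (setq count (1+ count)
                    mask (ash mask -1))))
          (setq done t)))
      (setq i (1+ i)))
    count))

(defun my-toy-hashcash (resource bits)
  "Return a toy stamp for RESOURCE whose SHA-1 starts with BITS zero bits.
The stamp is RESOURCE, a colon, and a counter found by brute force."
  (let ((counter 0) stamp)
    (while (progn
             (setq stamp (format "%s:%d" resource counter))
             (< (my-leading-zero-bits
                 (secure-hash 'sha1 stamp nil nil t))
                bits))
      (setq counter (1+ counter)))
    stamp))

;; Usage: 12 bits takes a few thousand attempts on average.
(my-toy-hashcash "larsi@trym.ifi.uio.no" 12)
```

Each additional bit doubles the expected work for the sender, while the receiver needs only a single hash to check the stamp. At 20 bits the sender needs about a million attempts on average.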

If you wish to generate hashcash for each message you send, you can customize message-generate-hashcash (see (message)Mail Headers section ‘Mail Headers’ in The Message Manual), as in:

 
(setq message-generate-hashcash t)

You will need to set up some additional variables as well:

hashcash-default-payment

This variable indicates the default number of bits the hash collision should consist of. By default this is 20. Suggested useful values include 17 to 29.

hashcash-payment-alist

Some receivers may require you to burn more CPU time than the default. This variable contains a list of ‘(addr amount)’ cells, where addr is the receiver (email address or newsgroup) and amount is the number of bits in the collision that is needed. It can also contain ‘(addr string amount)’ cells, where the string is the string to use (normally the email address or newsgroup name is used).

hashcash-path

Where the hashcash binary is installed. This variable should be automatically set by executable-find, but if it’s nil (usually because the hashcash binary is not in your path) you’ll get a warning when you check hashcash payments and an error when you generate hashcash payments.
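
Put together, a setup using these variables might look like the following sketch. The addresses, the newsgroup name, the resource string, and the binary location are all placeholders invented for this example; only the variable names and the cell shapes come from the descriptions above.

```elisp
;; Sketch only: every address, name, and path below is a placeholder.
(setq hashcash-default-payment 20
      hashcash-payment-alist
      '(;; (addr amount): this receiver wants a 24-bit collision.
        ("friend@example.org" 24)
        ;; (addr string amount): use a custom string for this newsgroup.
        ("example.test.group" "example-resource" 22))
      ;; Only needed if the hashcash binary is not found in your path.
      hashcash-path "/usr/local/bin/hashcash")
```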

Gnus can verify hashcash cookies, although this can also be done by hand-customized mail filtering scripts. To verify a hashcash cookie in a message, use the mail-check-payment function in the hashcash.el library. You can also use the spam.el package with the spam-use-hashcash back end to validate hashcash cookies in incoming mail and filter mail accordingly (see section Anti-spam Hashcash Payments).



This document was generated on January 25, 2015 using texi2html 1.82.