
POPFile

POPFile is a Perl-based Bayesian filter written by John Graham-Cumming and licensed under the GPL. It does a very nice job of filtering your email messages at the client side, assuming you access your mail via POP3. (IMAP and other protocols are scheduled for coverage in upcoming versions.) POPFile is platform-independent, meaning that it can be run on any operating system that runs Perl. The platform used in this chapter is Windows 2000.

POPFile can easily be used in conjunction with other anti-spam measures. In fact, it can be used in place of (or in concert with) the filtering capability of an email client such as Mozilla Messenger. Letting POPFile do the classification usually simplifies the email processing stream and gives the end user a higher level of anti-spam filtering.

You should think carefully about using POPFile in a mail architecture where another Bayesian-style classifier is in place on the server side. Although it is possible to run both, it probably makes sense to forgo the Bayesian classifier on the server and simply run POPFile at the client side. However, any of the other non-Bayesian anti-spam tools on the server can be used successfully in conjunction with POPFile. A tool like SpamAssassin can be used as well because Bayesian analysis makes up only part of the score it assigns to a message, instead of routing messages directly to your spam filter. You may want to adjust the Bayesian analysis portion of the SpamAssassin score, however.

Installation

POPFile is provided as a self-installing executable, making it easy to install. The more difficult part is configuring your email client for use with its header or subject modification classification functionality. These topics are the subject of the rest of the chapter.

The most recent version of POPFile can be downloaded from SourceForge at this URL: http://sourceforge.net/project/showfiles.php?group_id=63137. While we use version 0.21.1 here, any recent version should work with these directions. Select the file called popfile v0.21.1 Windows under the POPFile for Windows heading. Save the download to your desktop and extract it using your favorite ZIP file extractor or, if you are running Windows XP, the native Windows XP ZIP utility.

After it is extracted, run the file called setup. This will guide you through the rest of the installation process. For the most part, the questions the installer asks are self-explanatory. The default settings should be acceptable for most installations. A copy of the release notes is provided in Figure 9.1.

Figure 9.1. POPFile v0.21.1 release notes.


On the POPFile Installation Options page, you may need to adjust the POP3 port (110) and User Interface port (8080) from their defaults if something else on your machine is listening on those ports. Also, you may or may not want to have POPFile start automatically at startup. If you do not check that box, you will need to start it manually. When you are ready to have it run all the time (the recommended mode), you will have to manually put POPFile into your Startup program group. Figure 9.2 shows the POPFile Installation Options screen, with recommended settings.
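Before you accept the default ports, it can help to find out whether something on your machine is already listening on them. The following is a minimal sketch in Python, not part of POPFile or its installer; the function name port_in_use is our own, and the check simply tries to connect to each port on the local machine.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=0.5):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return probe.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # 110 and 8080 are the installer defaults mentioned above.
    for name, port in (("POP3", 110), ("User Interface", 8080)):
        state = "already in use" if port_in_use(port) else "appears free"
        print(f"{name} port {port}: {state}")
```

If either port is reported as already in use, pick a different value on the Installation Options page and remember it when you configure your email client or browse to the User Interface.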

Figure 9.2. POPFile Installation Options.


The next screen, POPFile Classification Bucket Creation, is where you can tell POPFile about the buckets you want to use. The defaults should be fine; they can be adjusted later. However, if you want to use POPFile for classification of email besides spam, the buckets for doing that can be created now. Figure 9.3 shows this screen with the default settings.

Figure 9.3. POPFile Classification Bucket Creation.


The next set of screens, starting with the one titled POPFile Client Configuration, shows the mail clients POPFile can attempt to set up automatically. POPFile will attempt to configure the email clients that it knows about to work with it. Note that you should go into your client after setup to be sure that POPFile set it up correctly. POPFile can attempt to set up the following types of email clients:

  • Eudora

  • Microsoft Outlook

  • Microsoft Outlook Express

  • Mozilla

If you want POPFile to attempt the setup, check the box when the appropriate Reconfigure screen comes up. The default is to not have POPFile attempt to set up your client for you.
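When you check your client after setup, it helps to know roughly what a reconfigured account looks like. The sketch below rests on an assumption that is not spelled out in this chapter: that POPFile acts as a POP3 proxy on the local machine, so the client points at 127.0.0.1 and the user name carries the real server and the real user name joined by a colon. The function name, the example server pop.isp.net, and the user alice are all hypothetical; compare the result against what your own client shows rather than treating it as authoritative.

```python
def popfile_client_settings(real_server, real_user, proxy_port=110, separator=":"):
    """Build the account settings a mail client would use behind POPFile.

    Assumption: POPFile listens locally and expects the user name in the
    form "<real server><separator><real user>".
    """
    return {
        "server": "127.0.0.1",
        "port": proxy_port,
        "username": f"{real_server}{separator}{real_user}",
    }


if __name__ == "__main__":
    # Hypothetical account, for illustration only.
    print(popfile_client_settings("pop.isp.net", "alice"))
```

If you changed the POP3 port during installation, pass the new value as proxy_port.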

When the POPFile Can Now Be Started screen comes up, make sure the radio button marked Run POPFile in Background is set. Otherwise, you will have to manually start POPFile in order to configure or use it.

The final screen is titled Completing the POPFile Setup Wizard. Make sure the box marked POPFile User Interface is checked so that the configuration part can begin. Assuming you checked the box, the screen shown in Figure 9.4 appears.

Figure 9.4. Completing the POPFile Setup Wizard.


Configuration

The six areas of the POPFile User Interface are as follows:

  • History

  • Buckets

  • Magnets

  • Configuration

  • Security

  • Advanced

The configuration settings available are covered in the following sections (see Figures 9.5 through 9.10).

Figure 9.5. History.


Please note that the integration of POPFile with your email client is covered in the subsequent sections of this chapter. POPFile is no different from many of the other anti-spam applications covered in this book, except that it runs on the email client instead of the server. When you run POPFile, you must still set up the email client to filter the messages based upon the classifications POPFile makes. However, the filtering setup on the email client is often simplified by the use of POPFile.

History

The History screen is where you go to reclassify messages that POPFile has classified incorrectly. You will spend a lot of your time here when things are first configured and running.

You can also view the message itself, complete with the POPFile classification information, by clicking on the subject of the message. You can reclassify messages by clicking the drop-down box, selecting the proper bucket for the message, and then hitting the Reclassify button. Don't forget to click the Reclassify button before selecting another message page or part of the POPFile configuration page. Your changes will be lost and must be reentered if you exit the page without clicking the Reclassify button.

Buckets

The screen titled Buckets is where most user configuration takes place. The term "buckets" is used by POPFile to refer to a particular classification of messages.

Figure 9.6. Buckets.


You should have one bucket for each type of message you want to classify. In this way, POPFile can be used for much more than classifying spam. It could be used to classify messages from email lists you subscribe to, for example. This screen shows the buckets in use, as well as how they are configured. It also shows a number of statistics and allows you to create, rename, and delete buckets. In addition, you can search buckets for specific words.
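To make the idea of buckets more concrete, here is a toy Bayesian classifier in Python. It is a simplified sketch of the general technique (naive Bayes with add-one smoothing), not POPFile's actual implementation; the class name, the tokenizer, and the sample messages are all made up for illustration. Each bucket keeps its own word counts, and a message goes to the bucket whose words best match it.

```python
import math
import re
from collections import Counter, defaultdict


class BucketClassifier:
    """Toy naive Bayes classifier with one word table per bucket."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # bucket -> word -> count
        self.message_counts = Counter()          # bucket -> messages trained

    @staticmethod
    def tokenize(text):
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, bucket, text):
        """Record the words of one message as belonging to a bucket."""
        self.word_counts[bucket].update(self.tokenize(text))
        self.message_counts[bucket] += 1

    def classify(self, text):
        """Return the most likely bucket, or None if nothing is trained."""
        words = self.tokenize(text)
        vocabulary = {w for c in self.word_counts.values() for w in c}
        total_messages = sum(self.message_counts.values())
        best_bucket, best_score = None, -math.inf
        for bucket, counts in self.word_counts.items():
            # Start from the share of training messages in this bucket.
            score = math.log(self.message_counts[bucket] / total_messages)
            bucket_total = sum(counts.values())
            for word in words:
                # Add-one smoothing keeps unseen words from zeroing the score.
                score += math.log(
                    (counts[word] + 1) / (bucket_total + len(vocabulary))
                )
            if score > best_score:
                best_bucket, best_score = bucket, score
        return best_bucket


if __name__ == "__main__":
    classifier = BucketClassifier()
    classifier.train("spam", "cheap pills buy now")
    classifier.train("friends", "see you at lunch on friday")
    print(classifier.classify("buy cheap pills"))
```

In this sketch, correcting a mistake amounts to calling train again with the right bucket, which is a simplification of what reclassifying a message does.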

Under Bucket Configuration, the following options can be changed:

  • Subject Header Modification

  • X-Text-Classification Header

  • X-POPFile-Link Header

  • Quarantine Message

  • Bucket Color

Subject Header Modification controls whether you want to identify classified messages by changing the subject header. Some email clients (for example, Outlook Express) can't filter based on arbitrary header lines, so the subject line must be modified to identify how messages are classified. This is a bit of an annoyance because when you reply to a message, the name of the bucket will appear in the Subject line. The default for subject modification is on. If you are using the X-Text-Classification header, this can safely be turned off.

The X-Text-Classification Header option enables the use of the message header by that name. This is probably the easiest method to use when classifying your email, assuming that the email clients in question support filtering based upon arbitrary headers. The default for inserting this header is on.
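The two tagging methods can be compared with a short Python sketch that reads the classification back out of a message, much as a filter rule in an email client would. The function name bucket_for is our own. The header name comes from the option described above, but the exact form of the subject tag is an assumption: the sketch expects the bucket name in square brackets at the start of the Subject line, so check what your version of POPFile actually inserts.

```python
from email import message_from_string


def bucket_for(raw_message, known_buckets):
    """Return the bucket a message was tagged with, or None if untagged."""
    msg = message_from_string(raw_message)

    # Preferred: the dedicated header, which leaves the subject untouched.
    header = msg.get("X-Text-Classification")
    if header:
        return header.strip()

    # Fallback: a subject tag. The "[bucket]" prefix is an assumed format.
    subject = msg.get("Subject", "")
    for bucket in known_buckets:
        if subject.startswith(f"[{bucket}]"):
            return bucket
    return None


if __name__ == "__main__":
    raw = (
        "From: someone@example.com\n"
        "Subject: Special offer\n"
        "X-Text-Classification: spam\n"
        "\n"
        "Body text.\n"
    )
    print(bucket_for(raw, ["spam", "friends"]))
```

Filtering on the header avoids the reply annoyance described above because the Subject line is never changed.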

If the X-POPFile-Link Header selection is enabled, POPFile will insert a link to the message in question so that all you have to do is click on the link to reclassify the message if it's been incorrectly classified. The default for inserting this header of the same name is on.

If the Quarantine Message setting is on, then the message will be quarantined within POPFile. Because extra steps are required to get any misclassified email out of quarantine, this is not recommended. However, it can be activated if desired. The default is off.

The final setting, Bucket Color, enables you to set the color for that particular classification of words or related items. This is useful when looking at message detail under History to quickly determine how words in messages are being classified.

Example

If you want all of the email from your friends to end up in a folder called friends on your email client, you can do this in (at least) two different ways, both using a bucket. Each method is outlined here.

The first method involves setting up a bucket called friends in POPFile. After setting up the friends bucket, you set up your email client to filter the messages tagged with the POPFile classification friends into the folder called friends. After everything has been set up, you train POPFile by classifying every message that comes from your friends into the friends bucket. After at least one message has come in from each friend, POPFile should begin to classify these messages correctly.

An alternative is to set up a From magnet and a Cc magnet (see next section) for each friend's email address. The magnets point to the friends bucket, so POPFile automatically classifies every message with your friend's email address on the From or Cc line correctly, without manual training.

Magnets

Magnets are words that cause messages to be automatically classified as they are processed.

Figure 9.7. Magnets.


Magnets can appear in From, To, Cc, and/or Subject header lines and can be thought of as whitelists/blacklists. Using magnets can reduce the number of manual reclassifications that you might otherwise have to do.

Example

For example, let's say you want to be sure email from your spouse always ends up in your personal folder. You set a From magnet to be his or her email address, spouse@isp.net, which points at the personal bucket. Then you set up your email client to filter the messages that POPFile classifies as personal to be filed in your personal folder on your email client.
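The behavior of a magnet can be sketched in a few lines of Python. This is an illustration of the idea only, not POPFile's code: the table layout, the function name magnet_bucket, and the case-insensitive substring match are all assumptions, and the second magnet uses a hypothetical address. A message that matches a magnet goes straight to the magnet's bucket; anything else would fall through to normal classification.

```python
from email import message_from_string

# (header line, text to look for) -> bucket
MAGNETS = {
    ("From", "spouse@isp.net"): "personal",
    ("Cc", "friend@example.com"): "friends",  # hypothetical address
}


def magnet_bucket(raw_message, magnets=MAGNETS):
    """Return the bucket of the first matching magnet, or None."""
    msg = message_from_string(raw_message)
    for (header, text), bucket in magnets.items():
        value = msg.get(header, "")
        # Assumed matching rule: case-insensitive substring of the header.
        if text.lower() in value.lower():
            return bucket
    return None


if __name__ == "__main__":
    raw = "From: Pat <spouse@isp.net>\nSubject: Dinner tonight\n\nSee you at 7.\n"
    print(magnet_bucket(raw))
```

Because the match is made on the header alone, the message lands in the personal bucket no matter what words it contains.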

Configuration

The configuration page lists a number of items not covered anywhere else:

  • Skins

  • Language

  • Connection Timeout

  • History Settings

  • Logging

  • POPFile port number

  • POP3 settings

  • Platform settings (system tray and console window)

Most of these are self-explanatory and are based on personal preference (especially Skins, Language, and Platform settings). The defaults should be good for most people. You might want to change the other settings if you are having a problem and need to turn on logging or one of the other low-level features.

Figure 9.8. Configuration.


Security

This page manages the security-related settings. It covers the following areas:

  • Enabling remote POP3 and HTTP access

  • Remote POP3 auth server and port

  • User Interface Password

  • Automatic Update Checking

  • Reporting Statistics

Figure 9.9. Security.


The most dangerous of these settings is the remote POP3 and HTTP access. Be very careful when turning this option on because it could allow anyone to access your POPFile setup. Remote POP3 auth server is used for setups that require two POP3 servers: one for authentication and one from which to retrieve email. The User Interface Password setting enables you to set a password for accessing the POPFile UI. This is especially useful if you want to enable remote access to your setup. The Automatic Update Checking setting enables POPFile to check for program updates, and the Reporting Statistics setting reports statistics back to the POPFile web site for aggregate reporting.

Advanced

The Advanced screen has two areas:

  • Ignored Words

  • All POPFile Parameters

Figure 9.10. Advanced.


Ignored Words lists all of the words that POPFile will not use as part of its processing. Under most circumstances, there is no reason to change this list. The All POPFile Parameters screen lists all POPFile settings. This is for advanced users only, as there is no validity checking of these settings. Under most circumstances, the other screens should be used for making configuration changes.

Operation

Initially, you will want to set up your email client to not filter based upon POPFile classifications because an untrained POPFile will be wildly inaccurate. During an initial training period of 100 to 200 messages, you will need to check POPFile's classification of every message that passes through, and correct it where necessary, so that POPFile can establish a base level of filtering. After that, the filtering capability can be set up in the email client. You can enable filtering in the email client from the start, but you will then need to check your spam folder frequently for misclassified messages so that you do not miss any legitimate mail.

After POPFile is set up, you only need to go into the History screen and reclassify messages that are not classified properly. POPFile will take care of the other routine tasks for you, such as database maintenance, purging of its history, and so on. If interested, you can view the statistics periodically by going into the Buckets page and viewing the accuracy achieved.
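If you want to track accuracy yourself during the training period, the arithmetic is simple. The sketch below assumes accuracy means the percentage of classified messages that you did not have to reclassify; the function name and the sample numbers are made up, and the figure POPFile reports may be calculated differently.

```python
def accuracy(messages_classified, classification_errors):
    """Percentage of messages classified correctly, or None with no data."""
    if messages_classified == 0:
        return None
    correct = messages_classified - classification_errors
    return 100.0 * correct / messages_classified


if __name__ == "__main__":
    # Hypothetical numbers: 200 messages seen, 10 reclassified by hand.
    print(accuracy(200, 10))
```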
