Fri, 27 Jun 2008 00:00:00 GMT

Comcast is Making Email Unreliable

Among the many things that I look after, I manage the email for several hundred domain names.  A large portion of these domains are for individual artist websites and thus have only a couple of actual email addresses.  In most cases I just forward any inbound email to the artist's ISP or webmail account.  We don't filter that email in any way, so everything, including spam, gets forwarded to the owner's actual email address.  We have avoided spam filtering because most people already have spam filtering on their email account, so our filtering would not benefit the recipient.  I'm sure that the spam filtering done by GMail, Comcast, Yahoo, etc. is much better than anything I'm going to be able to implement.

Over the last month or so we've started running into issues with Comcast and AT&T blocking all email from our servers because they receive what they consider spam from our servers.  We have gotten our servers unblocked but today Comcast has blocked us again.  So, to be able to deliver email to Comcast, we have to "clean" all email that passes through our servers.  We have no idea what triggers Comcast to block a server.  The barrier is likely to be fairly low, as we don't have all that much email traffic in total.  So to keep our standing with Comcast, we will have to be brutal.  We will have to consider any email that might possibly be spam as spam and bounce it.  If only a tiny percentage of spam gets through our filters, we might get blocked again.  The net effect is that some legitimate email will be bounced.

While neither we nor our customers are doing anything wrong, Comcast is forcing us not just to tag potential spam as spam but to block it entirely.  Essentially they are pushing their problems onto us.

The net effect of all this is that Comcast will be forcing many smaller operations that process smaller amounts of email to find their own solutions to the "Comcast" email problem.  Each operator will find its own way, which in aggregate will cost thousands, maybe millions, of man-hours of effort and will, on net, reduce the percentage of legitimate email that is successfully delivered.  Spam has made email less useful but these efforts by Comcast will be adding some of the last few nails to the email coffin.  I'd love to see email disappear but it won't until something better takes its place.

Sat, 15 Mar 2008 20:08:00 GMT

I Hate Email

I don't use the word "hate" very often.  I reserve that word for things that I dislike with a real passion, but email is becoming one of those things.  If you attempted to follow my previous posting about Controlling SPAM you can guess why I have this passion.

I wish that I could give up email altogether.  I think that this will happen in the next few years but, at least at this point, there is not a better alternative for most of the people that I communicate with.  I have found that Twitter and IM have become integral parts of my communications infrastructure but they don't and will never come close to replacing the majority of my communications needs.  The long breaks in my blogging record suggest that blogging is not a good communications mechanism for me.  Most of the social networks out there just seem to add to the spam and privacy problems and don't really add much that is positive to my communications.  I'm just stuck with email for a while.

There are some good technologies out there to "fix" email.  DomainKeys and Sender Policy Framework (SPF) are two technologies that could do a lot to eliminate the problems with SPAM but there is just too much inertia in the installed base of technology and in administrator skill sets to actually get a critical mass of adoption.  If the weight of spam has not overcome this inertia by now, I don't think it ever will.

I think that the only thing that will fix the spam problem is something new that replaces email.  That new technology must have obvious benefits and have spam resistance built in from the beginning.  Early adopters will legitimize the technology and will eventually drag the rest of the world into using it.  We are seeing these kinds of shifts with the use of Facebook and Twitter but the closed, centralized nature of both these systems makes them inappropriate for mass adoption at the internet infrastructure level that is required to really replace email.  By the way, when I speak of "email" here, I'm referring to SMTP email.  I think that we will always have email as in electronic mail but it may be based on completely different underlying technology than the SMTP that we see today.

What will replace SMTP email?  That's a pretty tough question.  There doesn't seem to be anything with momentum on the horizon yet.  It is something that I've been thinking about and it does tie into the OpenPersona idea that I've been playing with.  Maybe it will come out of that effort.

Sat, 15 Mar 2008 18:10:00 GMT

Controlling SPAM

I've been noticing that the amount of spam that I get has been going up.  Up until about a month ago, I was receiving about 1000 spam messages a day but that has risen to about 3000 per day over the last week or so.  I have been using GMail for managing my email and it had been great at filtering out this spam.  Virtually no false positives (good messages going into the spam folder) and about 1-2% false negatives (spam not getting put into the spam folder).  That left me with about 10-20 spam messages a day to deal with.  Not too much overhead.  Sometime over the last couple of days, Google must have changed their spam filters in some way.  I suspect it was in response to increasing levels of spam.  The net effect was that the false positives went from practically none to about 70%.  In other words, about 70% of my legitimate email was going into the spam folder along with 3000 spam messages.

Well that made GMail's spam filter just about useless.  It was time to see if I could figure out some ways to filter out some of this spam before it got to GMail so that I could do occasional, manual false positive checks in the spam folder.  So the first question is "How is it possible to get 3000 spam messages a day?"  That's easy.  I have two domain names that send all email, regardless of address, to my GMail account.  I've had these for many years and use them to create ad hoc "BACN" email addresses for signing up for new services.  I'll call these domains my BACN domains and use BACN.com generically.  I embed a standard code and the website's domain name into the email address so that if I start to get spam, I know who to blame (and block).  For example, my email address might look like this: asdfa.newwebsite.com@BACN.com.  The "asdfa" code (not what I really use) has been a string that I've embedded with the thought that at some time I could use this to help in my spam filtering.  That time is now!
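
A minimal Python sketch of this tagging scheme might look like the following.  It is only an illustration: the function names are made up, and "asdfa" and BACN.com are the same stand-ins used above, not real values.

```python
from typing import Optional

# Stand-in values from the post; the real code and domains are not published.
SECRET_CODE = "asdfa"
CATCHALL_DOMAIN = "BACN.com"


def make_bacn_address(site_domain: str) -> str:
    """Build a sign-up address such as asdfa.newwebsite.com@BACN.com."""
    return f"{SECRET_CODE}.{site_domain.lower()}@{CATCHALL_DOMAIN}"


def site_from_address(address: str) -> Optional[str]:
    """Return the site an address was created for, or None if the code is missing."""
    local_part, _, _domain = address.rpartition("@")
    prefix = SECRET_CODE + "."
    if not local_part.lower().startswith(prefix):
        return None
    return local_part[len(prefix):].lower()


if __name__ == "__main__":
    address = make_bacn_address("newwebsite.com")
    print(address)                     # asdfa.newwebsite.com@BACN.com
    print(site_from_address(address))  # newwebsite.com
```

Parsing the site back out of the address is what makes it possible to see which website leaked an address once spam starts arriving for it.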

I've learned a few things about spam from using these catchall BACN email setups.  First, a number of websites have sold, given away or lost their email lists to spammers.  A few that come to mind are Napster, Bicycle.com, and my local gas and electricity company.  It is also very interesting to see just how much spam is sent to made-up email accounts.  I see a lot of random-looking strings as email accounts.  Others look like they might be an account name from some other domain with my BACN domain tacked on the end.  Others include HTML tags and attributes (like HREF or MAILTO) and are obviously due to HTML parsing errors when the spammers were trying to harvest email addresses from web pages.

Another factor in my large number of spam messages is that I manage several hundred domain names.  Some are for my own projects, others are for clients, friends and relatives.  A lot of these domains have legitimate email addresses that forward to me.  I've yet to find any way to keep any email address spam free short of never telling anyone about it and not using it.  Also, when registering these domains, they must have a legitimate contact email address and it's really important that I get any legitimate email that is sent to these accounts.  I have 3 email addresses that are used for this purpose and so they end up in the public whois registration database entries for those domains.  The whois database is a favorite place for spammers to harvest email addresses so these 3 addresses get spammed heavily.

So how to do some pretty brutal spam trimming?  My solution is not for everyone.  It involves Sendmail, Procmail and an extra GMail account.  I happen to have the luxury (and the associated maintenance overhead) of having a dedicated Debian Linux server that handles some of my clients' email and all of my email.  I could run SpamAssassin or other Linux server spam filtering software but I want to keep this simple to implement and manage.  I've used these server-based spam filters in the past but found them to be overkill for a relatively small number of people.  Spam filtering is not a service that I need to offer my clients.  Most of the email that comes to this server just gets forwarded off to some other email account via a Sendmail virtusertable configuration file.  Even my own email just gets forwarded to my GMail account.  So my first line of defense against the spam was to create a local email account to which I forward all of my BACN.  I then implemented a procmail filter that only forwards mail that has the special code "asdfa" in the To address field.  What gets forwarded is what I call potentially good BACN.  What gets left is pure spam and is discarded.  Here is an example of that filter with dummy data and email addresses inserted:


# Forward only mail whose To header contains the special code.
:0
* ^To: .*asdfa.*
! spamfilteraccount@gmail.com


spamfilteraccount@gmail.com is not a real GMail account (at least it's not mine) but just a placeholder for my real, spam-filtering-only GMail account.  I forward my potentially good BACN to this GMail account along with my whois database email addresses and a few other heavily spammed accounts.  I set up that GMail spam account to immediately forward all mail to my real GMail account.  This only forwards messages that don't get caught in its spam filter.  False positives in this stream of email are tolerable because this email is BACN plus some spam.

So now I have a four-level spam filtering strategy.

  1. A sendmail virtusertable file blocks some known spammed email addresses that I just don't need any more, like my bicycle.com website email address.  It also forwards my main contact email addresses directly to my main GMail account.  This short circuit of the process reduces the chances of false positives and, even if there are false positives, they will show up in my main GMail account.  This account won't get too diluted by spam so I can occasionally check for them.
  2. BACN+spam is sent to a local email account that has a procmail filter to strip out all email that doesn't have "asdfa" in the To field.
  3. Potentially good BACN is sent to a special spam GMail account that is used to filter out real spam sent to BACN email addresses.
  4. Finally, I use my main GMail account's spam filtering as a last line of defense, and I can still check it for false positives.
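
The first two levels amount to a routing decision based on the To address, which can be sketched in Python as below.  This is a simplified illustration and not the actual Sendmail and procmail configuration: the address sets and the returned labels are hypothetical, and the heavily spammed whois addresses that also go to the filtering account are left out.

```python
from email.utils import parseaddr

SECRET_CODE = "asdfa"                   # stand-in code from the post
BLOCKED = {"someone@bicycle.example"}   # hypothetical retired addresses
MAIN_CONTACTS = {"me@main.example"}     # hypothetical main contact addresses


def route(to_header: str) -> str:
    """Decide where a message goes based on its To header."""
    _name, address = parseaddr(to_header)
    address = address.lower()
    if address in BLOCKED:
        return "reject"         # level 1: address that is no longer needed
    if address in MAIN_CONTACTS:
        return "main-gmail"     # level 1: short circuit to the main account
    if SECRET_CODE in address:
        return "filter-gmail"   # level 2 passed: potentially good BACN
    return "discard"            # level 2: no code in the To field
```

Levels 3 and 4 are GMail's own spam filters, so there is nothing to sketch for them here.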

I implemented this strategy about 3 hours ago.  The procmail filter has caught about 200 messages since then.  All spam.  The GMail spam account has caught about 40 spam messages.  All real spam sent to my BACN and whois accounts.  My main GMail has caught 5 spam messages and missed one that I had to manually mark as spam.

That feels much better!

Fri, 10 Feb 2006 06:02:00 GMT

Personal Data Backup

I recently read an article by Tim Bray about personal data backup. While the article did not have a lot of specifics about software to use, he did provide some very good guidelines to keep in mind.

In that spirit, I thought that I would share my own approach to backing up my personal data in my home environment.

To begin with, I should describe my home setup. My wife and I each have our own home offices with desktop computers. My desktop is running SuSE Linux 9.3 and my wife runs Windows XP Pro. The living room contains another Windows XP Pro machine that is our home entertainment center and contains a considerable amount (400GB) of music and video and is attached to our projector and stereo system.

We also have a "server closet" containing a variable number of PCs connected to a KVM switch and a single keyboard/monitor/mouse setup. In that closet there is always an old Debian Linux 3.1 machine, our routers, switches and cable modem. There are generally a couple of other computers depending on current projects.

At the moment I also have 4 computers sitting next to my desktop machine that are involved in the process of testing unattended system installs on refurbished computers for use by the BC Digital Divide.

That adds up to 10 computers in the house but only 3 of them really have "useful" data on them that requires backup considerations. These are my desktop machine, my wife's desktop machine and the media machine.

We use a combination of strategies to safeguard the data on our machines. The first is that we make a distinction between media of various types and other personal data. Media is kept on the media server and personal data is kept on our primary desktop machines. The one Windows XP primary desktop machine keeps all data to be backed up under its "Documents and Settings" directory tree and that is the only part that is backed up. The rest of the system is considered to be easily replaceable.

On the SuSE machine the /etc, /home, and /root directories are backed up.

All personal data on the two primary desktop machines is backed up to two different locations every night using unattended scripts that are much too complex to talk about in this discussion. For both machines a full backup is made as compressed archives to a Windows share on the media machine. Secondarily, rsync is used to synchronize the personal data with a Debian Linux dedicated server located in California. Using rsync keeps the bandwidth usage to a few tens of megabytes per day.
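
For a rough idea of the rsync half of this, here is a minimal Python sketch that builds and runs an rsync command for the SuSE directories listed above. It is not the actual backup script: the destination host and path are hypothetical, and the option set (`-a`, `-z`, `--delete`, over SSH) is just one reasonable choice.

```python
import subprocess

# Directories named in the post for the SuSE machine.
SOURCES = ["/etc", "/home", "/root"]
# Hypothetical destination; the post does not name the remote server or path.
DESTINATION = "backup@backup.example.com:/srv/backups/desktop/"


def build_rsync_command(sources, destination):
    """Return an rsync argument list that syncs the sources over SSH."""
    # -a preserves permissions and timestamps, -z compresses data in transit,
    # --delete removes remote files that no longer exist locally.
    return ["rsync", "-az", "--delete", "-e", "ssh", *sources, destination]


def run_backup():
    """Run the sync; raises CalledProcessError if rsync exits non-zero."""
    subprocess.run(build_rsync_command(SOURCES, DESTINATION), check=True)
```

Because rsync only transfers the parts of files that have changed, a nightly run like this stays small even when the full archive is several gigabytes.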

As it happens, the backup archive on the media machine is about 4.2 GiB so it just fits on a single DVD-RW. Each night after the desktops have completed doing a full backup to the media machine, that backup archive is burnt to a DVD-RW.

The DVD-RWs are rotated through a group of 6 disks, one for each day of the week. There is another set of 5 DVD-RWs that are additionally burnt on Mondays so that we have weekly snapshots for the last 5 weeks. On top of that we do an extra DVD-RW burn on or about the 1st of each month. This gives us monthly backups for the last 12 months. So with 23 rotated DVD-RWs we can find just about any version of any document over the last year.

So that was the personal data. What do we do with the media data? That is just way too much data for a traditional DVD rotation. Instead we break the media down into three groups: photos, audio and video.

The photos are kept in DVD-sized trees on the media server. Photos are kept in our personal data area until there are enough to dump them into the photo tree. When that is done, two copies of the photo data are made to DVD-RWs that back up those photos. That way we have 3 copies of the photos.

Audio is treated the same way but using a separate DVD set. We also have the CDs for most of this audio but it is easier to burn the ripped audio than to re-rip it.

Video is a little more complex. Most of the video is TV shows that we capture with SnapStream's Beyond TV. A lot of this content is just erased after viewing. Most programs are just not worth keeping. Some content, movies and a few TV series, is offloaded to DVD-Rs. These files are not converted into the format that DVD players require but are just left in the Windows Media Player or DivX format that we have them in.
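
Going back to the DVD-RW rotation for the personal data, the schedule can be sketched in a few lines of Python. This is only an illustration under assumptions that the post does not spell out: the daily disks are cycled by day count, the weekly disks by week count, the monthly burn happens exactly on the 1st, and the monthly disk number is simply the month.

```python
from datetime import date

DAILY_DISCS, WEEKLY_DISCS, MONTHLY_DISCS = 6, 5, 12


def burns_for(day):
    """Return (set name, disc number) pairs to burn on the given date."""
    # Assumption: the daily set is cycled by a running day count.
    burns = [("daily", day.toordinal() % DAILY_DISCS + 1)]
    if day.weekday() == 0:  # Monday gets an additional weekly disc
        burns.append(("weekly", (day.toordinal() // 7) % WEEKLY_DISCS + 1))
    if day.day == 1:  # assumption: the monthly burn is exactly on the 1st
        burns.append(("monthly", day.month))
    return burns


if __name__ == "__main__":
    print(DAILY_DISCS + WEEKLY_DISCS + MONTHLY_DISCS)  # 23
    print(burns_for(date(2006, 2, 1)))
```

The three set sizes add up to the 23 disks mentioned above.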

Sat, 03 Sep 2005 20:02:00 GMT

Moving Email from Outlook to Thunderbird

In my quest to do all of my tasks under SUSE Linux, getting all of my 15 years of email history moved to Thunderbird from Outlook turned out to be one of the trickiest. At first I tried installing Thunderbird and Outlook 2002 on the same XP machine and then just having Thunderbird import my email. No luck. Thunderbird (or something) would crash about halfway through. I tried this many times without luck. I did some internet searches on the problem but did not find a lot to help. Eventually my wife found this story that addressed my issues very well. What seems to work for most people is to import the email into Outlook Express first and then into Thunderbird. This worked well without a crash. This seems to have preserved both my attachments and the formatting in any HTML emails.

So I imported my emails in two passes. Once for my recent email and once more for my old archive.pst file. I had to have Outlook switch to that old file as its main .pst file for this part to work.

I also used Dawn to convert my Outlook Contacts folder into an Outlook Express address book. I then imported this into Thunderbird.

Once I had all my data in Thunderbird under XP I then had to transfer parts of the directory structure from XP to SUSE. This was accomplished by moving the following files from the XP machine to the SUSE machine:

For the address book:
FROM: C:/Documents and Settings/xpuser/Application Data/Thunderbird/Profiles/RANDOM1.default/abook.mab
TO: /home/suseuser/.thunderbird/RANDOM2.default/

For email messages, the files "Outlook Express Mail.msf" and "Outlook Express Mail" and the directory "Outlook Express Mail.sbd" are moved:
FROM: C:/Documents and Settings/xpuser/Application Data/Thunderbird/Profiles/RANDOM1.default/Mail/Local Folders/
TO: /home/suseuser/.thunderbird/RANDOM2.default/Mail/Local Folders

I used xpuser to represent the login account under Windows XP and suseuser as the login account under SUSE. RANDOM1 and RANDOM2 are random strings that Thunderbird associates with the profile directory and are unique for each installation.
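
If the XP profile has been copied somewhere the SUSE machine can read it, the transfer above could be scripted roughly as in this Python sketch. The function name is made up for illustration, and both profile paths are arguments rather than real locations because the RANDOM1 and RANDOM2 parts differ on every installation.

```python
import shutil
from pathlib import Path

# The mail items named above: an index file, a mailbox file and a directory.
MAIL_ITEMS = [
    "Outlook Express Mail.msf",
    "Outlook Express Mail",
    "Outlook Express Mail.sbd",
]


def copy_profile_data(src_profile, dst_profile):
    """Copy the address book and imported mail from one profile to another."""
    src_profile, dst_profile = Path(src_profile), Path(dst_profile)
    src_mail = src_profile / "Mail" / "Local Folders"
    dst_mail = dst_profile / "Mail" / "Local Folders"
    dst_mail.mkdir(parents=True, exist_ok=True)

    # Address book
    shutil.copy2(src_profile / "abook.mab", dst_profile / "abook.mab")

    # Mail files and the folder tree
    for name in MAIL_ITEMS:
        source = src_mail / name
        if source.is_dir():
            shutil.copytree(source, dst_mail / name, dirs_exist_ok=True)
        else:
            shutil.copy2(source, dst_mail / name)
```

It is probably safest to run something like this while Thunderbird is closed so that it does not rewrite the files mid-copy.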

Once all of the new email messages are accessible in Thunderbird under SUSE, it is just a matter of cleaning up any directory structure.