Tuesday, September 28, 2010

How I handle my mail

If you mail me at jgc.org, your mail is routed to a GMail account and then some magic happens: it is read by my mail-classifying code and automatically labeled. Here's a shot of my GMail labels:


All those labels with a + as a prefix are automatic. Here's what happens: every five minutes a service I've created logs into my GMail account using IMAP and OAuth (the service doesn't know my password; it's authorized via OAuth to access my mail). The service searches for labels with a + prefix and synchronizes its own list of labels with them.
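
The label-discovery step can be sketched in Python. This is an illustration, not the service's actual code: it assumes the mailbox list comes from an already authenticated IMAP connection (for instance via `imaplib`'s `list()`), and the names `plus_labels` and `fetch_plus_labels` are hypothetical.

```python
import re

# One line of an IMAP LIST response looks like:
#   (\HasNoChildren) "/" "+Work"
# Simplification: escaped quotes inside mailbox names are not handled.
LIST_LINE = re.compile(r'\((?P<flags>[^)]*)\) (?P<delim>"[^"]*"|NIL) (?P<name>.*)')


def plus_labels(list_lines, prefix="+"):
    """Return the sorted mailbox names that start with `prefix`."""
    labels = []
    for raw in list_lines:
        if isinstance(raw, bytes):
            line = raw.decode("ascii", errors="replace")
        elif isinstance(raw, str):
            line = raw
        else:
            continue  # skip None and literal tuples in this sketch
        match = LIST_LINE.match(line)
        if not match:
            continue
        name = match.group("name").strip()
        # Strip the surrounding quotes IMAP puts around most names.
        if len(name) >= 2 and name[0] == '"' and name[-1] == '"':
            name = name[1:-1]
        if name.startswith(prefix):
            labels.append(name)
    return sorted(labels)


def fetch_plus_labels(conn):
    """`conn` is assumed to be a logged-in imaplib.IMAP4 (or IMAP4_SSL) object."""
    status, data = conn.list()
    if status != "OK":
        return []
    return plus_labels(data)
```

Calling `plus_labels` on a saved LIST response is enough to try it out without a live connection; label names containing non-ASCII characters would need extra decoding that this sketch skips.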

Then it looks for new messages in my Inbox (and All Mail) and uses machine learning to apply one of the + labels to the message. Next it takes a look through the labels themselves to see if new messages have been labeled manually. It does that so it can learn.
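
The check for manually labeled messages can be pictured as a comparison between what the service remembers and what GMail currently shows. The following is a minimal sketch under that assumption; the dictionaries, the message IDs, and the function name `find_manual_changes` are all hypothetical.

```python
def find_manual_changes(remembered, current):
    """Compare two snapshots mapping message ID -> '+' label.

    Returns (message_id, old_label, new_label) tuples for every message
    whose label differs from what was remembered. old_label is None when
    the message was not known before. Messages that vanished from
    `current` are ignored in this sketch.
    """
    changes = []
    for message_id in sorted(current):
        new_label = current[message_id]
        old_label = remembered.get(message_id)
        if old_label != new_label:
            changes.append((message_id, old_label, new_label))
    return changes
```

Each tuple returned is something to learn from: either a message labeled by hand for the first time, or a correction to a label that had been applied earlier.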

When first set up, the service looked through all the messages in the + labels and built a machine learning classifier from the message contents. Immediately after that it started classifying mail. But, naturally, it's not 100% perfect and sometimes it makes mistakes (puts the wrong label on a message). Happily, all I have to do is change the label in GMail and the service will spot the change next time it logs in and update its classifier.
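
To make the train, classify, and correct cycle concrete, here is a small sketch in Python. The post does not say which algorithm the service uses, so naive Bayes with add-one smoothing is swapped in purely as an assumption; the class `LabelClassifier`, its methods, and the sample messages are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class LabelClassifier:
    """Toy naive Bayes classifier (an assumed algorithm, for illustration)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.message_counts = Counter()          # label -> messages seen

    def train(self, label, text):
        self.word_counts[label].update(tokenize(text))
        self.message_counts[label] += 1

    def untrain(self, label, text):
        if self.message_counts[label] <= 0:
            return
        self.word_counts[label].subtract(tokenize(text))
        self.word_counts[label] += Counter()  # drops zero and negative counts
        self.message_counts[label] -= 1

    def relabel(self, old_label, new_label, text):
        """Apply a manual correction: forget the old label, learn the new one."""
        self.untrain(old_label, text)
        self.train(new_label, text)

    def classify(self, text):
        labels = [l for l, n in self.message_counts.items() if n > 0]
        if not labels:
            return None
        vocabulary = set()
        for label in labels:
            vocabulary.update(self.word_counts[label])
        total_messages = sum(self.message_counts[l] for l in labels)
        words = tokenize(text)
        best_label, best_score = None, -math.inf
        for label in labels:
            counts = self.word_counts[label]
            # The extra 1 reserves probability mass for unseen words.
            denominator = sum(counts.values()) + len(vocabulary) + 1
            score = math.log(self.message_counts[label] / total_messages)
            for word in words:
                score += math.log((counts[word] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


classifier = LabelClassifier()
classifier.train("+Work", "budget report")
classifier.train("+Family", "holiday photos")
classifier.train("+Work", "grandma birthday party")  # a mislabeled message
classifier.relabel("+Work", "+Family", "grandma birthday party")
print(classifier.classify("grandma birthday"))
```

The `relabel` call stands in for changing a label in GMail: the message's words are removed from the wrong label's counts and added to the right one, so later messages like it are scored differently.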

The entire interface to the automatic classifier is through GMail. In fact, there's no interface at all: it watches my actions and learns. All I have to do is set up labels with a + at the beginning and my mail is magically labeled. If a labeling error occurs I just relabel.

Anyone else OCD enough to want this service? (If you are serious about wanting a service like this please email me and I'll add you to a little mailing list for when I get round to implementing it for public consumption.)

12 comments:

gio said...

Is this popfile, or some new magic?

John Graham-Cumming said...

It's based on my commercial email classifier, polymail, which is written entirely in C. So the concept is similar to POPFile, but the actual implementation is completely different.

The C code, as you might imagine, is way faster than the Perl implementation and has a number of tricks to make it scream (for example, it uses hashes instead of words for fast access).
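
As a rough illustration of the hashes-instead-of-words idea (in Python rather than C, and not polymail's actual code), each word can be reduced to a fixed-width integer and the counts keyed on that integer. The choice of CRC-32 and the names `token_hash` and `hashed_counts` are assumptions made for this sketch.

```python
import zlib
from collections import Counter


def token_hash(word):
    """Map a word to an unsigned 32-bit integer (CRC-32 is an assumed choice)."""
    return zlib.crc32(word.lower().encode("utf-8"))


def hashed_counts(words):
    """Count words by their integer hash instead of by the string itself."""
    return Counter(token_hash(word) for word in words)


counts = hashed_counts(["Budget", "budget", "meeting"])
print(counts[token_hash("budget")])
```

In C the gain comes from comparing and storing fixed-size integers rather than variable-length strings; the trade-off is that two different words can occasionally share a hash and be counted together.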

Baishampayan said...

I believe if you share the code on Github or something then we all can benefit from it.

Thanks JGC :)

John Graham-Cumming said...

You can download POPFile (http://getpopfile.org/) and set up the IMAP interface to do something very similar. It is GPL licensed.

jjsquared said...

Interesting...I just simply use gmails built in filtering....any new site or service I sign up for, I cater my email address for that service...so if I signed up at Rdio I would use [email protected]..if I signed up for Mog I would use [email protected]..gmail delivers the mail to everything preceding the plus...so I set up filters for each custom name and apply appropriate labels based on where its coming from

MatStace said...

That's exactly the sort of thing that I've been wanting gmail to do for years (find a +blah in the to address and label it).

I'm definitely CDO (got to put the letters in the correct order ;-p ) to want that sort of thing.

grant said...

I need this.

pqs said...

Yes, please, create your service!

Russell said...

s/it's classifier/its classifier/

John Graham-Cumming said...

The shame, the shame of a misplaced apostrophe.

gio said...

Well, if it is so, I would like to have it as well. Do I have to send you an email, or is leaving a comment here enough?

John Graham-Cumming said...

Send me a mail because I can't contact you directly from here.