Monday, April 24, 2006

Bayesian Poisoning paper pointers

Over in the POPFile forums there's a lively discussion going on about a potential way of getting spam through POPFile and then corrupting the POPFile user's database to increase the false positive rate.

This particular attack uses common English words and relies on an implementation detail of POPFile: the fact that POPFile counts the number of times a word appears in an email. That detail is a little different from most spam filters that might consider a restricted range of hammy or spammy words. However, the attack described by POPFile user Olivier Guillion probably would work. On the other hand, I don't think we're likely to see it in the field primarily because it only affects POPFile and not other spam filters.

During the discussion I mentioned a number of papers that I thought Olivier should read concerning attacks on Bayesian filters. I then realized that they are not necessarily easily available. So, for the sake of everyone being able to read up quickly on this areas, here's a quick bibliography:

Any I've missed?

No comments: