Tuesday, July 14, 2009

How to despam Twitter

Here's how I would despam Twitter:

1. A network of honeypot Twitter accounts. I set up the simplest possible honeypot account on Twitter and it has 14 followers. A more sophisticated one would catch many more.

2. A Report Spam button. Let anyone report spam from the public timeline; sending a message to @spam is just too hard.

3. Integrated SURBL/URIBL/anti-phishing lookups. Expand links from URL shorteners and check the destinations against the blacklists. Because tweets stay online, the system can also go back and re-check them after they are posted (long after, if necessary) and remove any that turn out to be spam. Unlike email, Twitter spam can be cleaned up over time.

4. Look for tweets containing multiple terms from the trending topics. These are almost certainly spam.

5. IP address checks. Use Spamhaus to look for messages coming from known bad networks, and keep track of the IP addresses associated with Twitter spam.

6. Machine learning. The results of all of the above, plus the text of the tweet itself, could be fed to something like POPFile to make the final decision.

7. Quiet spam removal. Messages judged to be spam should not be deleted. Instead, the links they contain should be disabled (no href) until the person responsible for the tweet complains.
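A minimal Python sketch of how the honeypot network in item 1 might be used. It assumes we already have, for each honeypot account, the set of accounts following it (fetching those lists is left out), and it flags any account that follows at least `min_honeypots` of them. The function name, the data shape and the threshold are illustrative assumptions, not Twitter's API.

```python
from collections import Counter

def honeypot_suspects(followers_by_honeypot, min_honeypots=1):
    """Return accounts that follow at least `min_honeypots` honeypot accounts.

    followers_by_honeypot: hypothetical mapping of honeypot name -> set of follower names.
    A honeypot never posts anything worth following, so following one is suspicious
    and following several is much more so.
    """
    counts = Counter()
    for followers in followers_by_honeypot.values():
        counts.update(set(followers))  # count each honeypot at most once per account
    return {account for account, n in counts.items() if n >= min_honeypots}
```

Raising `min_honeypots` trades catching fewer spammers for fewer false positives, such as a curious person who followed a single honeypot.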
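The link check in item 3 could look roughly like this Python sketch. It assumes the usual DNS blacklist convention (a listed domain resolves when queried as `<domain>.<zone>`, an unlisted one does not) and uses `multi.surbl.org` as the zone; URIBL or an anti-phishing list would just be another zone. Shortened links are expanded by letting `urllib` follow the redirects. The helper names are hypothetical, the domain extraction is deliberately naive, and the resolver and expander are parameters so the logic can be exercised without network access.

```python
import socket
from urllib.parse import urlsplit
from urllib.request import urlopen

SURBL_ZONE = "multi.surbl.org"  # assumed: SURBL's combined list

def expand_url(url, opener=urlopen, timeout=5):
    """Follow redirects (e.g. from a URL shortener) and return the final URL."""
    with opener(url, timeout=timeout) as response:
        return response.geturl()

def base_domain(url):
    """Naive registered-domain guess: the last two labels of the host name.

    This is wrong for hosts like example.co.uk; real code should use the
    Public Suffix List.
    """
    host = (urlsplit(url).hostname or "").rstrip(".")
    return ".".join(host.split(".")[-2:])

def uri_blacklisted(domain, resolve=socket.gethostbyname, zone=SURBL_ZONE):
    """A listed name resolves; an unlisted one fails with NXDOMAIN."""
    try:
        answer = resolve(f"{domain}.{zone}")
    except socket.gaierror:
        return False
    # Simplification: any 127.0.0.x answer counts as listed; real code should
    # decode the list-specific return codes.
    return answer.startswith("127.0.0.")

def has_blacklisted_link(urls, expand=expand_url, resolve=socket.gethostbyname):
    """Check both the shortened domain and the domain it redirects to."""
    for url in urls:
        for candidate in (url, expand(url)):
            if uri_blacklisted(base_domain(candidate), resolve=resolve):
                return True
    return False
```

Because the check only needs the stored URLs, it can be re-run over old tweets as the blacklists are updated, which is what makes after-the-fact cleanup possible.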
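The trending-topics heuristic in item 4 is simple enough to sketch directly. This Python version assumes the current list of trending topics is passed in (how it is fetched is left out) and matches each topic as a whole word or phrase, case-insensitively. The threshold of two topics is an assumption reflecting "multiple terms".

```python
import re

def matched_trends(tweet, trends):
    """Return the trending topics that appear in the tweet as whole words or phrases."""
    text = tweet.lower()
    found = set()
    for topic in trends:
        # (?<!\w) and (?!\w) stop "potter" from matching inside "potteries"
        pattern = r"(?<!\w)" + re.escape(topic.lower()) + r"(?!\w)"
        if re.search(pattern, text):
            found.add(topic)
    return found

def looks_like_trend_spam(tweet, trends, min_topics=2):
    """A tweet that name-checks several unrelated trending topics is probably spam."""
    return len(matched_trends(tweet, trends)) >= min_topics
```

A legitimate tweet usually mentions at most one trending topic, so requiring two or more keeps ordinary conversation about a single trend out of the net.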
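For item 5, a Spamhaus lookup is a standard DNSBL query: reverse the octets of the sending IPv4 address, append the blocklist zone and resolve the name. The Python sketch below assumes the combined `zen.spamhaus.org` zone and treats any 127.0.0.x answer as a listing. It also keeps a local per-IP count of spam seen, as the item suggests; the class name and the threshold are hypothetical.

```python
import ipaddress
import socket
from collections import Counter

SPAMHAUS_ZONE = "zen.spamhaus.org"  # assumed: Spamhaus's combined blocklist zone

def dnsbl_query_name(ip, zone=SPAMHAUS_ZONE):
    """Reverse the IPv4 octets and append the zone: 192.0.2.1 -> 1.2.0.192.<zone>."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone

def dnsbl_listed(ip, resolve=socket.gethostbyname, zone=SPAMHAUS_ZONE):
    """Return True if the blocklist answers with a 127.0.0.x address."""
    try:
        answer = resolve(dnsbl_query_name(ip, zone))
    except socket.gaierror:  # NXDOMAIN: not listed
        return False
    return answer.startswith("127.0.0.")

class SpamIPTracker:
    """Hypothetical local record of how much spam each sending IP has produced."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.spam_counts = Counter()

    def record_spam(self, ip):
        self.spam_counts[ip] += 1

    def is_suspicious(self, ip, resolve=socket.gethostbyname):
        # Our own history is checked first, so known offenders skip the DNS query.
        return self.spam_counts[ip] >= self.threshold or dnsbl_listed(ip, resolve=resolve)
```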
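POPFile is a complete mail classifier built on naive Bayes. As a stand-in for item 6, here is a tiny multinomial naive Bayes classifier in Python with add-one smoothing. The tokenizer, the class and the `extra_tokens` parameter are illustrative assumptions, not POPFile's code.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lower-case words, #hashtags, @mentions and whole URLs."""
    return re.findall(r"https?://\S+|[#@]?\w+", text.lower())

class NaiveBayes:
    LABELS = ("spam", "ham")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = Counter()

    def train(self, text, label, extra_tokens=()):
        self.word_counts[label].update(tokenize(text) + list(extra_tokens))
        self.doc_counts[label] += 1

    def spam_probability(self, text, extra_tokens=()):
        tokens = tokenize(text) + list(extra_tokens)
        vocab_size = len(set(self.word_counts["spam"]) | set(self.word_counts["ham"]))
        total_docs = sum(self.doc_counts.values())
        log_score = {}
        for label in self.LABELS:
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Add-one (Laplace) smoothing on both the prior and the word likelihoods
            score = math.log((self.doc_counts[label] + 1) / (total_docs + len(self.LABELS)))
            for tok in tokens:
                score += math.log((counts[tok] + 1) / (total_words + vocab_size + 1))
            log_score[label] = score
        # Normalise the two log scores into P(spam | tokens) without underflow
        top = max(log_score.values())
        weights = {label: math.exp(s - top) for label, s in log_score.items()}
        return weights["spam"] / sum(weights.values())
```

The `extra_tokens` argument is one way to fold the earlier checks into the decision. A SURBL hit or a trending-topics match could be passed as a pseudo-token such as `feature:surbl_hit` (a made-up name), so the classifier learns how much weight each signal deserves.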
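Quiet removal (item 7) happens at display time: the tweet is kept, but its URLs are rendered as anchors with no `href`. In this Python sketch, the `Tweet` class, its `flagged_as_spam` field and the `author_complains` policy are hypothetical. The tweet text is HTML-escaped either way.

```python
import html
import re
from dataclasses import dataclass

URL_RE = re.compile(r"https?://\S+")

def render_tweet(text, links_disabled):
    """Render tweet text as HTML, linking URLs only when links are enabled."""
    parts, last = [], 0
    for m in URL_RE.finditer(text):
        parts.append(html.escape(text[last:m.start()]))
        url = html.escape(m.group(0))
        if links_disabled:
            parts.append(f"<a>{url}</a>")  # same text, but no href: nothing to click
        else:
            parts.append(f'<a href="{url}">{url}</a>')
        last = m.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)

@dataclass
class Tweet:
    text: str
    flagged_as_spam: bool = False  # set by the checks above; the tweet is never deleted

    def to_html(self):
        return render_tweet(self.text, links_disabled=self.flagged_as_spam)

    def author_complains(self):
        # Hypothetical policy: restore the links as soon as the author objects.
        self.flagged_as_spam = False
```

Because nothing is deleted, a false positive costs the author only a dead link until they complain, and a spammer gets no clear signal that they have been caught.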

1 comment:

Barry Kelly said...

I'd bump up the machine learning priority and get rid of the SpamHaus lookup - centralized bureaucratic authorities like that almost always act insensitively and evilly, even with the best intentions.