Skip to main content

Posts

Showing posts from January, 2006

Deconstructing Sundance with POPFile

The brave folks at Unspam have taken POPFile and years of data surrounding the Sundance Film Festival in attempt to predict the outcome of the 2006 edition. Specifically, they want to get invited to parties, and they've chosen to geek out on data, bend POPFile to their use (boy, that must have been hard work) and try to predict the hits from the festival.

They claim an 81% accuracy over the past festivals and have a nice web site giving their predictions for this year. The predictions also include a breakdown of how the decisions are made indicating the most important (and worst) words to appear in a review of a movie.

There's also movie metadata like the type of film it was shot on, or who reviewed it. That's a very interesting use of POPFile and if they get it right and are invited to all the cool parties I hope they fulfill my wish and put a good word in for me with Neve Campbell.

The WMF SetAbortProc problem is not a backdoor

Microsoft recently fixed a problem wherein a Windows MetaFile (WMF) could contain arbitrary code that would be executed when the WMF was "played". The problem was particularly bad because a WMF could be played automatically in Internet Explorer by referencing it in an IFRAME if the Windows Picture and Fax Viewer was installed and registered (which by default on recent versions of Windows it was), because that program would automatically handle WMFs and play them to display them.

Steve Gibson suggested in a podcast, which then became a big news story, that he was convinced that this functionality was in fact an intentional backdoor inserted by Microsoft for their own purposes (or at last for the purpose of a rogue engineer inside the company).

Microsoft responded to that accusation with a blog posting by Stephen Toulouse in which he gave some more details of the problem and essentially said that Gibson was wrong (without naming him).

I've looked carefully at this and am con…

Do spammers fear OCR?

Nick FitzGerald recently sent me two sample spams that seem to indicate that some spammers fear that using images in place of words isn't enough. They've started to obscure their messages to prevent optical character recognition.

The first spam appears to be a scan of a document that's been skewed slightly. Now this could be a simple and bad scan.



But the second is even more interesting. It appears to be perfectly normal:



Until you look at the fact that this was actually constructed using <DIV> tags for layout and the breaks between the lines are in the middle of words. Here Nick has kindly inserted borders showing that the words are broken horizontally and then put back in the right position:



But is anyone doing OCR, or are spam filters getting good enough that the spammers are being really paranoid about what they are sending?

The funny think about the second example is that the URL they include is not obfuscated, is clickable and appears in the SURBL :-) So desp…

Python is the new Tcl; Ruby is the new Perl

Given how bad I am at predictions I'd like to give you the following (which you'd better take with a grain of salt): Python is the new Tcl and Ruby is the new Perl. My prediction for 2006 is that most Perl 5 programmers will decide that Ruby looks cool and learn the language and that Python will be a niche language that a die hard core continues to love and embellish.

I think Ruby's the new Perl because of the confluence of three things: Perl 6, RubyGems and Ruby's own Perliness. Firstly, given how hideous Perl is if you're a Perl person (like me) why learn Perl 6 when you could learn Ruby? Ruby's core language is clean, regular expressions are strong and there are many Perl influences (which Ruby is slowly removing to ensure that language doesn't become Perl). Secondly, RubyGems will do for Ruby what CPAN did for Perl.

Python's lack of good package management (if you ignore ActiveState's PyPPM and the recently started Cheese Shop) and idiosync…