Wednesday, December 19, 2007

Is Digg really worth $300m?

There's a rumor going around that Digg is trying to sell itself to somebody (anybody?) for $300m. That seems like a lot of money to me so I went searching for comparative statistics. Prior to taking their series B financing, Digg was rumoured to be looking for (and failing to find) a buyer at $150m.

Back in 2005 News Corp. bought Intermix for $580m. Intermix spent $69m of that money to acquire the remaining 47% of MySpace that it didn't already own. The company had 27 million unique users per month at the time of acquisition. So they paid roughly $22 per user.

Now take a look at Digg today with about 17.8m unique visitors. Using that same metric Digg would be valued at $392m.

But now look at the attention span of those people on the site.

Visitors to MySpace look at an average of 37 pages per visit whereas Digg users look at 5. In terms of time, Digg users spend 0.04% of their time on Digg, whereas MySpace engages users for 7.3% of their time. So more than 7 times the pages viewed, plus 180x more time spent on MySpace than Digg.

Now, the average stay at Digg is 2m25s and MySpace is 24m26s. So people spend 12 times longer on MySpace than on Digg.

So MySpace is way more engaging than Digg, which isn't surprising given that Digg is a place you go to get someplace else. You're unlikely to spend long there.

So how do you discount the $392m based on engagement..? By % of time spent there Digg is worth $2m, by length of stay it's $32m, by pages viewed it's $56m.
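The discounting above can be sketched in a few lines of Python. This is only a rough check using the figures quoted in this post; the variable names are made up for illustration, and the engagement ratios are left unrounded, so the stay-length and page-view results come out somewhat different from the rounded ones quoted above.

```python
# Figures quoted in the post; names are illustrative, not from any data source.
PRICE_PER_USER = 22      # dollars per unique user, from the Intermix/MySpace deal
DIGG_USERS_M = 17.8      # Digg unique visitors, in millions

baseline = PRICE_PER_USER * DIGG_USERS_M   # valuation in $m before any discount

# Engagement of Digg relative to MySpace, one ratio per metric.
time_share = 0.04 / 7.3                        # share of users' online time
stay_length = (2 * 60 + 25) / (24 * 60 + 26)   # average stay, in seconds
pages = 5 / 37                                 # pages per visit


def discounted(ratio):
    """Scale the baseline valuation by an engagement ratio."""
    return baseline * ratio


valuations = {
    "time share": discounted(time_share),
    "length of stay": discounted(stay_length),
    "pages per visit": discounted(pages),
}

for metric, value in valuations.items():
    print(f"{metric}: ${value:.0f}m")
```

Whichever ratio you pick, the discounted figure is a small fraction of the $392m headline number.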

Now, Digg has taken about $11m in VC money so assuming participating preferred etc. the VCs aren't going to be happy below that $32m number.

My guess is that Digg is worth somewhere between $50m and $100m. Of course, I don't know what their revenues are and if they are making a lot on advertising then those numbers could change. The rumour has always been that Digg users don't click on ads so their revenue might not be spectacular.

Another telling statistic on Digg is its Alexa rank, which is dropping fast.

And if they are out there looking to be sold the price will be depressed.

$300m: too high IMHO.

Wednesday, December 05, 2007

The Seven Point Scale

For some time now I've noticed that scoring things on a scale of 1 to 7 seems to be a good way of evaluating some analogue or continuous phenomenon. I was first introduced to 7 point scales by John Ousterhout (who's best known as the Tcl instigator). John likes to use a 7 point scale to evaluate interviewees as follows:

1. Worst candidate imaginable. I will quit if you hire them.
2. Very negative. Will argue strongly against hiring.
4. Totally ambivalent about this candidate.
6. Enthusiastic. Will argue strongly to hire.
7. Best candidate imaginable. I will quit if you don't hire them.

And we used to look for candidates with votes of 5 and above from all interviewers and at least one 6. We did once have someone vote 7 on a candidate, but it's very rare to see 1 or 7.
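As a quick sketch, that hiring rule fits in one small Python function. The name should_hire is made up, and I'm assuming "at least one 6" means at least one vote of 6 or higher, so a 7 counts too.

```python
def should_hire(votes):
    """Return True if every vote is 5 or above and at least one is 6 or above.

    Assumption: a vote of 7 also satisfies the "at least one 6" condition.
    An empty list of votes is treated as "no hire".
    """
    return bool(votes) and min(votes) >= 5 and max(votes) >= 6


print(should_hire([5, 6, 5]))   # one enthusiastic vote, nobody below 5
print(should_hire([5, 5, 5]))   # nobody enthusiastic enough
print(should_hire([4, 6, 7]))   # one ambivalent vote blocks the hire
```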

Turns out that seven point scales are not that uncommon. Kinsey used one in defining types of sexuality:

1. Exclusively heterosexual.
2. Predominantly heterosexual, only incidentally homosexual.
3. Predominantly heterosexual, but more than incidentally homosexual.
4. Equally heterosexual and homosexual.
5. Predominantly homosexual, but more than incidentally heterosexual.
6. Predominantly homosexual, only incidentally heterosexual.
7. Exclusively homosexual.

And there's actually research into the accuracy of 7 point scales. See for example this report that indicates that 7 point scales give as much information as 10 point scales when rating happiness. And here's another paper recommending seven point scales for measurement. And then there's the Likert scale, which has been in use since the 1930s and has 5 and 7 point variants.

Seven point scales are neat because they have a clear middle point and between the middle and end points there are just two choices. That lets them capture variations in opinions without presenting too many choices (leading to vacillation) or too few (meaning that too much data is lost). I'm using them in lots of different places: most recently in the votes on books I've recently read.

Monday, December 03, 2007

Transitive decay in social networks

When I was doing research in computer security we often used to say "trust isn't transitive". What we meant was that if Alice trusts Bob and Bob trusts Charlie then we can't assume that Alice trusts Charlie. Another way to think of this is to look at your own friends and friends of friends; do you trust the friends of friends? It's likely that you do not (if you did then it's likely that they would actually already be a friend and not a FOAF).

Clearly, trust is not a constant value across all friends, so each of your N friends will have a trust value ("how much you trust them"), which I'll call Ti, assigned to them. A friend you'd trust with your life has Ti = 1 (perhaps they're a candidate for a BFF), and a friend you don't trust at all has Ti = 0. (I'll ignore the question of why you even have friends with Ti = 0, but in the context of computer social networks you probably do have some).

In social situations we are only exposed to this FOAF trust problem occasionally, but with 'social networking' a current web buzzword we see social networks, or social graphs, and can traverse them. Many web sites are trying to use this graph traversal to build services (e.g. LinkedIn allows you to send requests across the network, or ask questions; Facebook is hoping that graph traversal will be the new application distribution method).

But any graph traversal suffers from the FOAF trust problem. In a social network online this gets expressed by statements like "Just because Alice likes the Werewolf application and shares it with Bob and Charlie is friends with Bob, that doesn't mean that Charlie wants to be a Werewolf", or "A message crossing between more than one hop won't get passed all the time".

I dare say that LinkedIn, Facebook and others could actually characterize the rate at which the FOAF attenuates messages (be they actual messages, application installations, or any other meme) passing through the network.

I'm going to posit that the amount of trust a user would place in a FOAF (and a FOAFOAF, a FOAFOAFOAF, ... ) decays rapidly with the number of FOAF hops traversed.

Intuitively, if Alice trusts Bob with Talice,bob and Bob trusts Charlie with Tbob,charlie then how much does Alice trust Charlie? Less than she trusts Bob because Charlie is not her friend, and she can only evaluate Charlie based on Bob's recommendation. The more she trusts Bob the more she should trust Charlie. So some sort of estimate Talice,charlie is created from Talice,bob and Tbob,charlie taking into account these trust estimates.

A simple combination would be Talice,charlie = Talice,bob * Tbob,charlie (this assumes quite the opposite of the original declaration above: here trust is transitive to a certain degree).

The problem with this is that it treats all trust relationships as having equal weight, no matter how far they are from the original person (in this case, Alice). Imagine the case where Alice trusts Bob with Talice,bob = 1 and Bob trusts Charlie with Tbob,charlie = 1. This formula gives Talice,charlie = 1 which would probably not reflect most people's intuitive grasp of trust. If, in addition, Charlie trusts Dave with Tcharlie,dave = 1 then we get Talice,dave = 1 which seems even more unlikely.

What's needed is a way to decay trust the further apart people are.

One way to do this is for each person to have their own damping factor that encodes how much they trust another person's trust. So Alice might trust other people's recommendations with factor Dalice (in the range [0,1]). The formula would be updated to have

Talice,charlie = Dalice * Talice,bob * Tbob,charlie

Talice,dave = Dalice * Talice,bob * Tbob,dave = Dalice * Talice,bob * Dbob * Tbob,charlie * Tcharlie,dave

But that's still essentially linear. I think trust looks more like an inverse square law so that distance is explicitly encoded. With that

Talice,charlie = Dalice * Talice,bob * Tbob,charlie / 1^2

Talice,dave = Dalice * Talice,bob * Tbob,dave / 2^2 = Dalice * Talice,bob * Dbob * Tbob,charlie * Tcharlie,dave / 4
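One reading of the formulas above is a recursion: each person damps the onward estimate held by the friend they trust, and optionally divides by the square of the number of intermediaries. Here's a small Python sketch of that reading. The function name chain_trust, the dictionaries and the damping value of 0.5 are all made up for illustration; none of it comes from real network data.

```python
def chain_trust(chain, direct, damping, inverse_square=False):
    """Estimate how much chain[0] trusts chain[-1] along a chain of friends.

    direct[(a, b)] is a's trust in direct friend b, damping[a] is how much
    a trusts other people's trust. Both are values in the range [0, 1].
    """
    truster, friend = chain[0], chain[1]
    if len(chain) == 2:
        return direct[(truster, friend)]      # direct friends: nothing to decay
    # What the friend thinks of the person at the far end of the chain.
    onward = chain_trust(chain[1:], direct, damping, inverse_square)
    estimate = damping[truster] * direct[(truster, friend)] * onward
    if inverse_square:
        intermediaries = len(chain) - 2       # 1 for Alice->Charlie, 2 for Alice->Dave
        estimate /= intermediaries ** 2
    return estimate


people = ["alice", "bob", "charlie", "dave"]
# Illustrative values only: everyone fully trusts the next person in the chain.
direct = {(a, b): 1.0 for a, b in zip(people, people[1:])}
damping = {p: 0.5 for p in people}       # hypothetical damping factor
no_damping = {p: 1.0 for p in people}    # reduces to the simple product

simple = chain_trust(people, direct, no_damping)
damped = chain_trust(people, direct, damping)
squared = chain_trust(people, direct, damping, inverse_square=True)
print(simple, damped, squared)
```

With every direct trust set to 1, the simple product leaves Alice trusting Dave completely, the damped version cuts that down, and the inverse square version cuts it down further still.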

This seems to fit intuition better because trust of distant people drops away very rapidly. Now, since this is only a hypothesis it would be interesting to measure the reach of messages passing inside a social network to look at the actual 'pass it on' rates to see if they match intuition.

Anyone out there got lots of social network data I could analyze? Perhaps there's a Facebook application developer who's tracked enough invite/install data that this could be verified.

Saturday, December 01, 2007

Double-checking Dawkins

I've been reading through Richard Dawkins' books and am currently half way through The Blind Watchmaker (2006 paperback edition) and on page 119 he writes:

In my computer's ROM, location numbers 64489, 64490 and 64491, taken together, contain a particular pattern of contents---1s and 0s which---when interpreted as instructions, result in the computer's little loudspeaker uttering a blip sound. This bit pattern is 10101101 00110000 11000000.

Of course, this piqued my curiosity. Did Dawkins just make that up, or is this really the contents of a specific bit of memory on a specific computer?

The book was first published in 1986, so I just had to figure out what it was. Starting with the instructions and converting to hex we have AD 30 C0. Now, considering the main processors around at the time there are three possible interpretations of these three bytes:

Z-80/8080: XOR L ; JR NC C0

6502: LDA C030

6809: JSR 30C0

The first didn't look at all plausible, but both the other two do. The 6809 looks like it could be calling a sub-routine at location 30C0 and the 6502 would work if C030 were actually memory-mapped I/O and a read was necessary to cause the blip.
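The conversion from Dawkins' bit pattern to hex, and the two ways of reading the operand bytes, can be checked with a few lines of Python. This is just a sketch; the variable names are mine. The 6502 reading takes the low byte of the address first, and the 6809 guess above reads the same two bytes high byte first.

```python
bits = "10101101 00110000 11000000"   # the pattern quoted from the book

# Turn each group of eight bits into one byte.
code = bytes(int(group, 2) for group in bits.split())
hex_bytes = code.hex(" ").upper()     # bytes.hex(sep) needs Python 3.8 or later
print(hex_bytes)

opcode, operand = code[0], code[1:]
little = int.from_bytes(operand, "little")   # low byte first, as on the 6502
big = int.from_bytes(operand, "big")         # high byte first, as in the 6809 guess
print(f"opcode {opcode:02X}: operand {little:04X} (little-endian) or {big:04X} (big-endian)")
```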

I couldn't find any machine with a meaningful interpretation of the 30C0 location on a 6809 machine, but 6502 turned out to be a different story.

On an Apple ][ memory in the range C000-C0FF is mapped to various bits of I/O:

The memory space on the Apple II between $C000 and $CFFF was assigned to handle input and output. From $C000 to $C0FF the space was reserved for various soft-switches used to control the display, and various built-in I/O devices, such as the keyboard, paddles, annunciators, and the cassette port.

Poking around further I discovered that a read from C030 (known as SPKR) will blip the speaker on an Apple ][. So Dawkins is telling the truth and he's using an Apple ][.

So returning to the memory locations what program is he talking about? The three memory locations are FBE9, FBEA and FBEB. That turns out to be inside the Monitor ROM on an Apple ][ (the stuff that Woz wrote) and happily a complete ROM listing was provided with the Apple ][.
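The decimal-to-hex step is easy to verify. A minimal Python sketch follows, assuming the Monitor ROM occupies $F800 to $FFFF as in the usual Apple ][ memory map; the constant names are made up.

```python
addresses = [64489, 64490, 64491]    # location numbers quoted by Dawkins

hex_addresses = [f"{address:04X}" for address in addresses]
print(hex_addresses)

# Assumed address range of the Monitor ROM on an Apple ][.
MONITOR_START, MONITOR_END = 0xF800, 0xFFFF
in_monitor = all(MONITOR_START <= address <= MONITOR_END for address in addresses)
print(in_monitor)
```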

So looking in a scanned Apple ][ manual we find that those locations are inside the BELL2 routine. Specifically on page 163 of the manual we find the following:

FBE6: 20 A8 FC  598         JSR  WAIT    1 KHZ FOR .1 SEC
FBE9: AD 30 C0  599         LDA  SPKR
FBEC: 88        600         DEY
FBEF: 60        602  RTS2B  RTS

The second line, at FBE9, is exactly the code that Dawkins is referring to.

So, while writing this famous text on evolution, Dawkins had time to poke around inside the Apple ][ ROM and write about it in his book.