Monday, March 30, 2009

Measuring download speed by embedding a GIF inside a JavaScript comment

As part of my day job I've been spidering a large number of web sites. While debugging the spider to make sure that its results were consistent, I came across a curious web page that contains binary data inside a comment within <script> tags.

The spider was simply requesting a URL that then goes through a sequence of complicated redirects. Here's the output from the excellent HTTP debugging tool HTTPFox:

GET 301 Redirect:
GET 301 Redirect:
GET 301 Redirect:
GET 200
POST 200

The critical page to watch is the one returned by the final GET. When loaded with no cookies, it is served with JavaScript code that captures information about the browser and posts it back to the server:

var originalUrl = "";
var queryString = "?redirectjsp%3Dtrue";
function submitForm() {
    var obj = document.frmBrowserProfile;
    // Form target: originalUrl plus the decoded query string
    obj.action = originalUrl;
    obj.action = obj.action + decodeURIComponent(queryString);
    // Fill the form fields with details about the browser
    obj.bpsw.value = getScreenWidth();
    obj.bpsh.value = getScreenHeight();
    obj.bpsd.value = getColorDepth();
    obj.bpbw.value = getClientWidth();
    obj.bpbh.value = getClientHeight();
    obj.bpap.value = getAcrobatVersion();
    obj.bpfp.value = getFlashVersion();
    obj.bpcs.value = getConnectionSpeed();
    // ... (rest of submitForm not shown)
}
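The only non-obvious step in submitForm() is how the form's action URL is built. This small sketch replays it with the two values from the captured page. With originalUrl empty, the action is just a relative query string, which a browser resolves against the current page's URL.

```javascript
// Replays the action-URL construction from submitForm().
// The query string is stored percent-encoded (%3D is "=") and decoded
// just before the form is submitted.
const originalUrl = ""; // empty in the captured page
const queryString = "?redirectjsp%3Dtrue";
const action = originalUrl + decodeURIComponent(queryString);
console.log(action); // prints ?redirectjsp=true
```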

The interesting part is the function getConnectionSpeed() which is written as follows:

function getConnectionSpeed() {
    var connectionSpeed = "999999999";  // fallback if no timing is available
    var datasize = 10240;               // size of the embedded GIF in bytes
    try {
        var diffTimeMilliseconds = endTime - startTime;
        if (diffTimeMilliseconds > 0) {
            var diffTimeSeconds = diffTimeMilliseconds / 1000;
            var bits = (datasize * 8);
            var kbits = bits / 1024;
            connectionSpeed = kbits / (diffTimeSeconds);
        }
    } catch (e) {}  // e.g. startTime or endTime not defined
    return connectionSpeed;
}
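To make the arithmetic concrete, here is the same calculation as a pure function. The name estimateKbps is hypothetical, and passing the timestamps in as parameters (instead of reading the startTime and endTime globals) is a change made only so the sketch runs outside a browser. As in the original, 1 kbit is taken to be 1024 bits.

```javascript
// Same arithmetic as getConnectionSpeed(), as a pure function.
// estimateKbps is a hypothetical name; timestamps are in milliseconds.
function estimateKbps(startTime, endTime, datasize = 10240) {
  const diffTimeMilliseconds = endTime - startTime;
  if (!(diffTimeMilliseconds > 0)) {
    // Mirror the original's fallback (also covers NaN from missing times).
    return "999999999";
  }
  const kbits = (datasize * 8) / 1024; // 10240 bytes -> 80 kbits
  // kbits per second = kbits * 1000 / milliseconds
  return (kbits * 1000) / diffTimeMilliseconds;
}

console.log(estimateKbps(0, 4));    // prints 20000
console.log(estimateKbps(0, 1000)); // prints 80
console.log(estimateKbps(5, 5));    // prints 999999999
```

An interval of just 4 ms already reports 20,000 kbps (about 20 Mbps), so with a 10 KB payload the result depends heavily on how finely the browser's clock ticks.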

That relies on knowing startTime and endTime. These are obtained by embedding a 10 KB GIF inside the web page: not by loading it as an image, but by placing its bytes inside a JavaScript comment. The time is recorded just before and just after the GIF data is loaded by the browser, and the bandwidth is calculated from the difference.

Here's a shortened (and wrapped) version of the code:

<script language = "Javascript">
date = new Date();
</script>
<script language = "Javascript">
/* … ùÿ,?*ÿÿýÛ…?U2$4¸


Æ;…áÀÂSØ?SHX༯?Aq ?â…³-yRº`?ÔW85¾?ò*/
</script>
<script language = "Javascript">
date = new Date();
</script>
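The three-block pattern can be sketched as a small generator. This is a reconstruction under assumptions: the excerpt leaves out exactly how startTime and endTime are set, so the `new Date().getTime()` assignments and the helper name buildTimingSnippet are hypothetical, and the sketch uses the standard `type` attribute in place of the page's `language` attribute. The idea is that the browser runs each script as it is parsed, so the gap between the first and third blocks roughly covers the time taken to receive the filler between them.

```javascript
// Sketch: emit the three <script> blocks described above.
// buildTimingSnippet is hypothetical; `payload` is filler text
// (the page used a GIF's bytes).
function buildTimingSnippet(payload) {
  // A stray "*/" would end the comment early, and "</script" would end
  // the script element, so reject payloads containing either.
  if (payload.includes("*/") || /<\/script/i.test(payload)) {
    throw new Error("payload would terminate the comment or script early");
  }
  return [
    '<script type="text/javascript">',
    "var startTime = new Date().getTime();", // runs before the filler arrives
    "</script>",
    '<script type="text/javascript">',
    "/*" + payload + "*/", // downloaded and parsed, but never executed
    "</script>",
    '<script type="text/javascript">',
    "var endTime = new Date().getTime();", // runs after the filler arrives
    "</script>",
  ].join("\n");
}

console.log(buildTimingSnippet("filler bytes go here"));
```

The guard matters because the payload is arbitrary binary data: any `*/` inside it would turn the remaining bytes into script and almost certainly cause a syntax error.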

Since the GIF data looked valid (you can see the global color table at the start), I decided to try extracting it and viewing the GIF. I immediately ran into trouble: the extracted file was not a valid GIF. Looking at the data in a hex editor, I noticed that it had been UTF-8 encoded. After a quick pass through UTF-8 decoding, the GIF was valid, but truncated. Unfortunately, it's not very interesting when viewed in a web browser.
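The repair can be sketched in Node.js. This assumes the "UTF-8 encoding" was the common double-encoding mistake: each raw byte was treated as a Latin-1 character (code point U+0000 to U+00FF) and then written out as UTF-8, so every byte at or above 0x80 became two bytes. The helper names recoverBytes, looksLikeGif and isTruncated are hypothetical.

```javascript
// Sketch: undo a double encoding in which each raw GIF byte was treated
// as a Latin-1 character and then written out as UTF-8 (an assumption
// about what happened to the embedded data).
function recoverBytes(utf8Bytes) {
  // Decode the UTF-8 text back into characters...
  const text = utf8Bytes.toString("utf8");
  // ...then map each character's code point back to a single byte.
  return Buffer.from(text, "latin1");
}

function looksLikeGif(bytes) {
  // Every GIF starts with the signature "GIF87a" or "GIF89a".
  const sig = bytes.subarray(0, 6).toString("latin1");
  return sig === "GIF87a" || sig === "GIF89a";
}

function isTruncated(bytes) {
  // A complete GIF ends with the trailer byte 0x3B (";").
  return bytes.length === 0 || bytes[bytes.length - 1] !== 0x3b;
}

// Simulate the mangling on a few GIF-like bytes, then repair them.
const raw = Buffer.from([0x47, 0x49, 0x46, 0x38, 0x39, 0x61, 0xf9, 0xff, 0x2c]);
const mangled = Buffer.from(raw.toString("latin1"), "utf8");
console.log(mangled.length);                   // prints 11
console.log(recoverBytes(mangled).equals(raw)); // prints true
console.log(isTruncated(raw));                  // prints true
```

The six-byte signature survives the mangling unchanged because it is pure ASCII, which is why such a file can look like a GIF at first glance and still fail to decode.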

I wrote to the webmaster of the USAA web site, but have heard nothing back. Does anyone know if anyone else uses this technique?

PS This system measured my connection speed as 20 Mbps, which is about 5x too high.


Praveen said...

Interesting "dive" into the code!

Is this how Google Analytics measures the download speed?

itsthejay said...

Kind of intrusive, though, isn't it?

It's an interesting benchmark, if a little inaccurate. I imagine Google Analytics and CrazyEgg could get some use out of this.

orip said...

I wonder what other connection parameters are doing to skew the result, e.g. gzip compression.

Anonymous said...

If you're going to do this, you need a lot more data than just 10k. The resolution of the timers available in some browser/OS combinations is... 25 ms? Very big, anyway.

eduardorochabr said...

My JavaScript is gzipped; I'm afraid it won't work for me.

Anonymous said...

I guess the gzip compression is not really a problem since the data is already compressed in GIF.

Anonymous said...

Compression is an issue, since the HTML (with the JS embedded) is transferred first, unpacked, and then executed. Hence, startTime and endTime are both measured when the transfer is already done...