Thursday, October 22, 2009

Some real data about JavaScript tagging on web pages

Since March of this year I've been running a private web spider looking at the number of web tags on web pages belonging to the Fortune 1000 and the top 1,000 web sites by traffic. Using the spider I've been able to see which products are deployed where, and how those products are growing or shrinking.

The web tags being tracked are those used for ad serving, web analytics, A/B testing, audience measurement and similar.

The spider captures everything about the page, including screen shots, and I'm able to drill in to see the state of a page and all its includes at the time of spidering. Here's shot of Apple with all the detail that the spider keeps.

The first interesting thing is to look at the top 1,000 web sites by traffic and see how many different tags are deployed per page. The average is 2.21, but if you exclude those that have no tags at all then the average is 3.10. Here's the distribution of number of tags against percentage of sites.

And of course, it's possible to see the market share of various different products. Here are the top 10 that I am tracking. Google Analytics has an impressive 43% of the top 1,000 web sites by traffic.

Since I've been tracking over time it's also possible to watch the growth (and decline). Here's the growth in the average number of tags on a web page (excluding pages that have no tags) since March 2009.

Since I also keep all the JavaScript and HTML for a page it's a breeze to calculate page weights. Here's a chart showing the size of HTML and JavaScript for the top 1,000 web pages by traffic. The x-axis shows the size of the page (excluding images) in kilo- or megabytes. The y-axis is the percentage of sites in that band.

I was shocked when I saw that list and suspected a bug. How could there be web sites with megabytes of non-image content? It turned out that it wasn't a bug. For example, at the time of downloading the HTML and JavaScript for Gawker was over 1Mb.

In a previous post I showed in detail the tagging on a site and that 29% of the non-graphic content was JavaScript used for web tagging. Here's another chart showing what percentage of web page markup is included JavaScript (this can include stuff like jQuery and web tagging products).

The really surprising thing there is how much JavaScript there is on pages. For many pages it's the majority of non-graphic content. Take for example Subscene where the home page HTML is about 18k but then masses of JavaScript are included (including over 200k from Facebook, a similar amount from UPS and various other bits of code).

If you delve into the tags actually used by various products you'll see that the sizes of JavaScript used for them varies a lot. comScore's Beacon is tiny (just 866 bytes)!

Finally, you might be asking yourself which site had 16 different tags on it. The winner is the celebrity gossip site TMZ.

No comments: