
Some real data about JavaScript tagging on web pages

Since March of this year I've been running a private web spider looking at the number of web tags on web pages belonging to the Fortune 1000 and the top 1,000 web sites by traffic. Using the spider I've been able to see which products are deployed where, and how those products are growing or shrinking.

The web tags being tracked are those used for ad serving, web analytics, A/B testing, audience measurement and similar.
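One straightforward way to decide whether a page carries a given product is to look for that product's script URL in the captured HTML and includes. Here is a minimal sketch of that idea, assuming simple substring signatures. The two signatures below are illustrative examples, not the rules the spider actually uses, and the class and method names are hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class TagDetector {
    // Illustrative signatures only: a product counts as present when one of
    // its script URL fragments appears in the page's HTML or included JavaScript.
    static final Map<String, String[]> SIGNATURES = new LinkedHashMap<>();
    static {
        SIGNATURES.put("Google Analytics",
                new String[] {"google-analytics.com/ga.js", "google-analytics.com/urchin.js"});
        SIGNATURES.put("comScore Beacon",
                new String[] {"scorecardresearch.com/beacon.js"});
    }

    // Returns the names of the products whose signatures occur in the content.
    static Set<String> detect(String pageContent) {
        Set<String> found = new TreeSet<>();
        for (Map.Entry<String, String[]> e : SIGNATURES.entrySet()) {
            for (String fragment : e.getValue()) {
                if (pageContent.contains(fragment)) {
                    found.add(e.getKey());
                    break; // one matching fragment is enough for this product
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String html = "<script src=\"http://www.google-analytics.com/ga.js\"></script>"
                + "<script src=\"http://b.scorecardresearch.com/beacon.js\"></script>";
        System.out.println(detect(html)); // [Google Analytics, comScore Beacon]
    }
}
```

The number of distinct tags on a page is then simply the size of the returned set.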

The spider captures everything about the page, including screenshots, and I'm able to drill down to see the state of a page and all its includes at the time it was spidered. Here's a shot of Apple with all the detail that the spider keeps.

The first interesting thing is to look at the top 1,000 web sites by traffic and see how many different tags are deployed per page. The average is 2.21, but if you exclude those that have no tags at all then the average is 3.10. Here's the distribution of number of tags against percentage of sites.
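The two averages differ only in their denominators, so their ratio is the fraction of sites carrying at least one tag. Assuming both figures come from the same crawl, 2.21 / 3.10 ≈ 0.71, which suggests that roughly 29% of the top 1,000 sites had no tags at all. A minimal sketch of both calculations (the class name and the tag counts in `main` are made up for illustration):

```java
public class TagAverages {
    // Mean number of distinct tags per page over every site crawled.
    static double meanAll(int[] tagCounts) {
        long total = 0;
        for (int c : tagCounts) {
            total += c;
        }
        return tagCounts.length == 0 ? 0.0 : (double) total / tagCounts.length;
    }

    // Mean over only the sites that carry at least one tag.
    static double meanTagged(int[] tagCounts) {
        long total = 0;
        int tagged = 0;
        for (int c : tagCounts) {
            if (c > 0) {
                total += c;
                tagged++;
            }
        }
        return tagged == 0 ? 0.0 : (double) total / tagged;
    }

    public static void main(String[] args) {
        int[] counts = {0, 1, 3, 8}; // hypothetical tag counts for four sites
        System.out.println(meanAll(counts));    // 3.0
        System.out.println(meanTagged(counts)); // 4.0
        // Ratio of the two means = fraction of sites with any tag.
        System.out.println(meanAll(counts) / meanTagged(counts)); // 0.75
    }
}
```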

And of course, it's possible to see the market share of the various products. Here are the top 10 of the products I am tracking. Google Analytics has an impressive 43% of the top 1,000 web sites by traffic.
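The 43% figure reads as a share of all 1,000 sites, tagged or not. Assuming that definition, and assuming each site has already been reduced to the set of product names detected on it, market share is a simple count. A minimal sketch (Java 9 or later for `List.of`/`Set.of`; the class name and the four sites are hypothetical):

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class MarketShare {
    // For every product, the percentage of all sites (including untagged ones)
    // on which it was detected.
    static Map<String, Double> share(List<Set<String>> productsPerSite) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Set<String> products : productsPerSite) {
            for (String p : products) {
                counts.merge(p, 1, Integer::sum);
            }
        }
        Map<String, Double> percent = new TreeMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            percent.put(e.getKey(), 100.0 * e.getValue() / productsPerSite.size());
        }
        return percent;
    }

    public static void main(String[] args) {
        // Four hypothetical sites; the last one has no tags at all.
        List<Set<String>> sites = List.of(
                Set.of("Google Analytics", "comScore Beacon"),
                Set.of("Google Analytics"),
                Set.of("comScore Beacon"),
                Set.of());
        System.out.println(share(sites)); // {Google Analytics=50.0, comScore Beacon=50.0}
    }
}
```

Dividing by the total number of sites rather than by the number of tagged sites matters: the untagged site above pulls both shares down from 66.7% to 50%.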

Since I've been tracking over time, it's also possible to watch the growth (and decline). Here's the growth in the average number of tags on a web page (excluding pages that have no tags) since March 2009.

Since I also keep all the JavaScript and HTML for a page, it's a breeze to calculate page weights. Here's a chart showing the size of the HTML and JavaScript for the top 1,000 web sites by traffic. The x-axis shows the size of the page (excluding images) in kilobytes or megabytes; the y-axis shows the percentage of sites in that band.
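Producing a chart like that means assigning each page weight to a size band and then turning the counts into percentages. A minimal sketch, assuming decimal kilobytes and band boundaries chosen purely for illustration (the chart's actual bands aren't given in the text; class, method and label names are hypothetical):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PageWeightBands {
    // Exclusive upper bounds of each band in bytes; the last band is open-ended.
    // These boundaries are illustrative, not the ones used for the chart.
    static final long[] UPPER = {50_000, 100_000, 250_000, 500_000, 1_000_000};
    static final String[] LABELS = {"<50k", "50k-100k", "100k-250k", "250k-500k", "500k-1M", ">=1M"};

    // pageBytes[i] = size of the HTML plus included JavaScript (no images) for site i.
    static Map<String, Double> percentPerBand(long[] pageBytes) {
        int[] counts = new int[LABELS.length];
        for (long size : pageBytes) {
            int band = UPPER.length; // default to the open-ended last band
            for (int i = 0; i < UPPER.length; i++) {
                if (size < UPPER[i]) {
                    band = i;
                    break;
                }
            }
            counts[band]++;
        }
        Map<String, Double> result = new LinkedHashMap<>();
        for (int i = 0; i < LABELS.length; i++) {
            result.put(LABELS[i], pageBytes.length == 0 ? 0.0 : 100.0 * counts[i] / pageBytes.length);
        }
        return result;
    }

    public static void main(String[] args) {
        long[] sizes = {18_000, 75_000, 420_000, 1_100_000}; // hypothetical page weights
        System.out.println(percentPerBand(sizes));
        // {<50k=25.0, 50k-100k=25.0, 100k-250k=0.0, 250k-500k=25.0, 500k-1M=0.0, >=1M=25.0}
    }
}
```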

I was shocked when I saw that chart and suspected a bug. How could there be web sites with megabytes of non-image content? It turned out that it wasn't a bug. For example, at the time of downloading, the HTML and JavaScript for Gawker came to over 1MB.

In a previous post I showed the tagging on one site in detail, and that 29% of its non-graphic content was JavaScript used for web tagging. Here's another chart showing what percentage of web page markup is made up of included JavaScript (this can include things like jQuery as well as web tagging products).

The really surprising thing is how much JavaScript there is on pages. For many pages it's the majority of non-graphic content. Take, for example, Subscene, where the home page HTML is about 18k but masses of JavaScript are then included (over 200k from Facebook, a similar amount from UPS and various other bits of code).
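The percentage in that chart is the included JavaScript divided by the HTML plus included JavaScript. A minimal sketch, assuming inline script blocks are counted as part of the HTML (an assumption about how the split is made; the class and method names are hypothetical):

```java
import java.util.Locale;

public class ScriptShare {
    // Percentage of a page's non-image bytes that come from included (external)
    // JavaScript files. Inline <script> blocks are treated as part of the HTML.
    static double includedScriptPercent(long htmlBytes, long[] includedScriptBytes) {
        long scripts = 0;
        for (long b : includedScriptBytes) {
            scripts += b;
        }
        long total = htmlBytes + scripts;
        return total == 0 ? 0.0 : 100.0 * scripts / total;
    }

    public static void main(String[] args) {
        // Rough lower bounds from the Subscene example: ~18k of HTML plus two ~200k includes.
        double pct = includedScriptPercent(18_000, new long[] {200_000, 200_000});
        System.out.println(String.format(Locale.ROOT, "%.1f%% JavaScript", pct)); // 95.7% JavaScript
    }
}
```

Even counting only those two includes, a page like that is well over 90% JavaScript by weight.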

If you delve into the tags actually used by various products, you'll see that the size of the JavaScript they use varies a lot. comScore's Beacon is tiny (just 866 bytes)!

Finally, you might be asking yourself which site had 16 different tags on it. The winner is the celebrity gossip site TMZ.


Popular posts from this blog

Your last name contains invalid characters

My last name is "Graham-Cumming". But here's a typical form response when I enter it:

Does the web site have any idea how rude it is to claim that my last name contains invalid characters? Clearly not. What they actually meant is: our web site will not accept that hyphen in your last name. But do they say that? No, of course not. They decide to shove in my face the claim that there's something wrong with my name.

There's nothing wrong with my name, just as there's nothing wrong with someone whose first name is Jean-Marie, or someone whose last name is O'Reilly.

What is wrong is the way this is being handled. If the system can only cope with letters and spaces, it needs to say that. How about the following error message:

Our system is unable to process last names that contain non-letters. Please replace them with spaces.

Don't blame me for having a last name that your system doesn't like; whose fault is that? Saying "Your last name …
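If a form genuinely can't store some characters, the message can at least name them and put the limitation on the system rather than on the user. A minimal sketch, assuming a hypothetical back end that accepts letters in any script plus spaces, hyphens and apostrophes (so Graham-Cumming and O'Reilly pass); the class name, method name and message wording are illustrative:

```java
import java.util.Optional;

public class LastNameCheck {
    // Hypothetical back-end limitation: besides letters (in any script), only
    // these characters can be stored. Adjust to what the system really supports.
    static final String ALLOWED_PUNCTUATION = " -'";

    // Returns an error message naming the unsupported characters, or empty if
    // the name can be stored exactly as typed.
    static Optional<String> problem(String lastName) {
        StringBuilder unsupported = new StringBuilder();
        lastName.codePoints().forEach(cp -> {
            boolean ok = Character.isLetter(cp) || ALLOWED_PUNCTUATION.indexOf(cp) >= 0;
            String ch = new String(Character.toChars(cp));
            if (!ok && unsupported.indexOf(ch) < 0) {
                unsupported.append(ch); // record each unsupported character once
            }
        });
        if (unsupported.length() == 0) {
            return Optional.empty();
        }
        return Optional.of("Sorry, our system can't handle these characters in a last name: "
                + unsupported + ". Please replace them with spaces.");
    }

    public static void main(String[] args) {
        for (String name : new String[] {"Graham-Cumming", "O'Reilly", "Smith & Sons"}) {
            System.out.println(name + " -> " + problem(name).orElse("OK"));
        }
    }
}
```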

All the symmetrical watch faces (and code to generate them)

If you ever look at pictures of clocks and watches in advertising, you'll notice they are set to roughly 10:10, which is meant to be the most attractive (smiling!) position for the hands. They are actually set to 10:09:14 if the hands are truly symmetrical.
CC BY 2.0 image by Shinji
I wanted to know what all the possible symmetrical watch faces are, so I wrote some code using Processing. Here's the output (there's one watch face missing, 00:00 or 12:00, because it's very boring):

The key to writing this is to figure out the relationship between the hour and minute hands when the watch face is symmetrical. In an hour the minute hand moves through 360° and the hour hand moves through 30° (12 hours are shown on the watch face and 360/12 = 30).
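Putting numbers on that relationship: measure angles in degrees clockwise from 12. After h hours and m minutes the hour hand is at 30h + m/2 (30° per hour, spread over 60 minutes) and the minute hand is at 6m. The face is symmetrical about the 12 to 6 line when the hour hand's angle equals 360° minus the minute hand's angle:

$$30h + \frac{m}{2} = 360 - 6m \quad\Longrightarrow\quad \frac{13m}{2} = 360 - 30h \quad\Longrightarrow\quad m = \frac{2\,(360 - 30h)}{13}$$

For h = 10 this gives m = 120/13 ≈ 9.23 minutes, i.e. 9 minutes and about 13.8 seconds, which rounds to 10:09:14. This is the expression for `m` in the core loop.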
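The core loop inside the program is this:

    for (int h = 0; h <= 12; h++) {
        // minutes past the hour at which the hands are mirror images
        float m = (360-30*float(h))*2/13;
        // leftover fraction of a minute, converted to seconds
        int s = round(60*(m-floor(m)));
        // lay the faces out in a grid six columns wide
        int col = h%6;
        int row = floor(h/6);
        draw_clock((r+f)*(2*col+1), (r+f)*(row*2+1), r, h, floor(m…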
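The same formula can be checked outside Processing. Here is a minimal Java sketch (the class and method names are hypothetical) that prints every non-trivial symmetrical time to the nearest second. It skips h = 12, which is the boring 12:00 face, and shows h = 0 as 12:

```java
import java.util.Locale;

public class SymmetricalFaces {
    // Minutes past hour h (0-11) at which the hour and minute hands are
    // mirror images about the 12 to 6 line: m = 2(360 - 30h)/13.
    static double minutes(int h) {
        return (360.0 - 30.0 * h) * 2.0 / 13.0;
    }

    // Formats the symmetrical time for hour h as h:mm:ss, showing hour 0 as 12.
    static String symmetricalTime(int h) {
        // Round the total number of seconds once so the seconds field stays in 0-59.
        long totalSeconds = Math.round(minutes(h) * 60.0);
        long mm = totalSeconds / 60;
        long ss = totalSeconds % 60;
        int shown = (h == 0) ? 12 : h;
        return String.format(Locale.ROOT, "%d:%02d:%02d", shown, mm, ss);
    }

    public static void main(String[] args) {
        for (int h = 0; h < 12; h++) {
            System.out.println(symmetricalTime(h)); // e.g. h = 10 prints 10:09:14
        }
    }
}
```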

Importing an existing SSL key/certificate pair into a Java keystore

I'm writing this blog post in case anyone else has to Google that. In Java 6, keytool has been improved so that it is now possible to import an existing key and certificate (say, one you generated outside the Java world) into a keystore.

You need: Java 6 and openssl.

1. Suppose you have a certificate and key in PEM format. The key is named host.key and the certificate host.crt.

2. The first step is to convert them into a single PKCS12 file using the command: openssl pkcs12 -export -in host.crt -inkey host.key > host.p12. You will be asked for passwords: first the password protecting the key (if one is set), then the export password for the PKCS12 file being created.

3. Then import the PKCS12 file into a keystore using the command: keytool -importkeystore -srckeystore host.p12 -destkeystore host.jks -srcstoretype pkcs12. You now have a keystore named host.jks containing the certificate/key you need.
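To confirm that the import worked, `keytool -list -keystore host.jks` will show the entries; the check can also be done programmatically. A minimal sketch using the standard `java.security.KeyStore` API, assuming Java 7 or later (for try-with-resources) and that host.jks is a JKS keystore. The class name and command-line usage are hypothetical, and passing the password as an argument is only suitable for a quick local check:

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class KeystoreCheck {
    // Loads a keystore of the given type (e.g. "JKS") and returns the aliases
    // that hold a private key entry rather than just a trusted certificate.
    static List<String> keyEntryAliases(InputStream in, String type, char[] password)
            throws IOException, GeneralSecurityException {
        KeyStore ks = KeyStore.getInstance(type);
        ks.load(in, password);
        List<String> aliases = new ArrayList<>();
        for (String alias : Collections.list(ks.aliases())) {
            if (ks.isKeyEntry(alias)) {
                aliases.add(alias);
            }
        }
        return aliases;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: java KeystoreCheck <keystore file> <store password>");
            return;
        }
        try (InputStream in = new FileInputStream(args[0])) {
            for (String alias : keyEntryAliases(in, "JKS", args[1].toCharArray())) {
                System.out.println("key entry: " + alias);
            }
        }
    }
}
```

A successful import should show at least one key entry; an empty list means only certificates (or nothing) made it into the keystore.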

For the sake of completeness here's the output of a full session I performe…