### The bandwidth of a fully laden 747

Never underestimate the speed of shipping physical stuff when you want to move large amounts of data. The Internet is actually horribly slow even at 'high' speeds. That's why Amazon Web Services offers an Import/Export service that involves shipping physical disks around.

Back in 1999 I wrote an article for The Guardian explaining latency and bandwidth for modem users. In it I used a jet plane full of people flying across the Atlantic to illustrate the difference.

The analogy still works today, and brings me to the farcical question: what's the bandwidth of a fully laden 747?

Assume I fill it with DAT 320 cartridges, each of which holds 160GB of uncompressed data and weighs about 50g. A single 747 can lift 140 tonnes, so I can fit about 2.8m cartridges in the plane. That's about 427 PB of storage. (I'm not sure that many cartridges will actually fit inside the 747, but you get the idea.)

Now assume it flies from San Francisco to London in 10 hours. That's a bandwidth of about 12 TB/s.
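
The arithmetic above can be sketched in a few lines of Python. This is only a back-of-envelope check using the figures from the post; the variable names are made up for illustration, and it assumes binary prefixes (1 PB = 1024 × 1024 GB), which is what gives 427 PB rather than 448 PB.

```python
# Back-of-envelope check using the figures quoted in the post.
PAYLOAD_GRAMS = 140_000_000   # a 747 can lift about 140 tonnes
CARTRIDGE_GRAMS = 50          # each DAT 320 cartridge weighs about 50 g
CARTRIDGE_GB = 160            # uncompressed capacity per cartridge
FLIGHT_HOURS = 10             # San Francisco to London

cartridges = PAYLOAD_GRAMS // CARTRIDGE_GRAMS
total_gb = cartridges * CARTRIDGE_GB

# Assumption: binary prefixes, so 1 TB = 1024 GB and 1 PB = 1024 TB.
total_pb = total_gb / 1024**2
tb_per_second = (total_gb / 1024) / (FLIGHT_HOURS * 3600)

print(f"{cartridges:,} cartridges")        # 2,800,000 cartridges
print(f"{total_pb:.0f} PB on board")       # 427 PB on board
print(f"{tb_per_second:.1f} TB/s")         # 12.2 TB/s
```

Note that this only counts the flight itself; the time needed to write and read the tapes is ignored here.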

Which brings me to today's announcement from RackSpace about OpenStack. One of the goals of OpenStack is to remove lock-in to specific providers. That's a noble goal, but if you store a lot of data in the cloud you might find yourself needing a 747 (or at least FedEx) if you decide to change providers.

PS Many people have commented that it would take a while to fill up the DAT tapes. Clearly the solution, as one commenter suggests, is to use a 747 as your data centre and fly it where you need.
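
To get a rough feel for how much the copying matters, here is a sketch that folds tape write and read time into the estimate. The drive speed (24 MB/s) and the thousand drives working in parallel are borrowed from one of the comments below, not measured; the one day of handling at each end, the decimal units, and the function name are likewise assumptions made for illustration.

```python
def effective_gb_per_second(total_gb, drive_mb_per_s, drives, handling_days):
    """Average rate from the first byte written to the last byte read."""
    seconds_per_gb = 1000 / drive_mb_per_s              # assumes 1 GB = 1000 MB
    copy_seconds = total_gb * seconds_per_gb / drives   # one full pass, drives in parallel
    # Write every tape, load/fly/unload the plane, then read every tape back.
    total_seconds = 2 * copy_seconds + handling_days * 86_400
    return total_gb / total_seconds


# 2.8 million cartridges of 160 GB each, as in the post.
total_gb = 2_800_000 * 160

# Assumed: 24 MB/s per drive, 1,000 drives, two days of handling and flying.
rate = effective_gb_per_second(total_gb, drive_mb_per_s=24, drives=1000, handling_days=2)
print(f"{rate:.1f} GB/s")
```

Under these assumptions the result is roughly 12 GB/s, about half the combined 24 GB/s of the drives, because the data is copied twice and the flight is a small part of the total time.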

Joe said…
Assuming you had 2.8m DAT cartridges already. Otherwise you would have to factor in copying on and off of the cartridges.
Xavier said…
I disagree. You'd have to account for the time it would take to load those tapes into a tape drive and read them to your 427PB array.

How long does it take to read 160 gig off a tape? Multiply that by 2.8 million, and then add the travel time, unloading off the plane time, and the shipping time to your terminal.

Sounds like a toss up now?
MLeo said…
This worked for SETI@home: the people at SETI didn't have that much bandwidth to spare, so they shipped their data through "sneakerweb" to Berkeley, where they loaded the data into packets for use in SETI@home.
No, don't you all see? He's saying that you have to be using the 747 as an active data center. You fly it to wherever you have the highest data needs, to bump up local maximums.
Anonymous said…
Let's get out an envelope, and turn it over...

LTO-4 cartridges -- the largest presently available -- are 800GB native, and 1.6TB compressed, and are about 4x4x1", or 16 in³. (Yes, I know about Ampex DLT; this is denser.)

One ft³ is 1728 in³, so you can fit 12 layers of 9 tapes -- 108 tapes or 172.8TB -- in a cubic foot of space.

Assuming a spherical cow... no, wait. :-)

I have a source which suggests that the 747-400F is good for 65,155 ft³ of cargo, so assuming maximum packing density, and that the density of packed tapes doesn't take the airframe over gross or out of flyable CG, then the total number of terabytes you could load would be...

11,258,784, which is 11.3 exabytes, if I remember my prefix table properly.

Now, you're assuming the cost of the airframe, fuel, pilot salaries and airport fees, as well as the cost of 7.04M LTO-4 cartridges, so clearly this is not an inexpensive project...

And you have to divide that by the total overall number of seconds from the time you write the first byte to the time you read the last one, so -- in the long run -- you might be better served to pay someone to trench OC-192 fiber (about 10Gbps) from one point to the other -- especially if you're going to do it more than once.
Unknown said…
Hey, I wrote a calculator for just such a thing. Ryan, I was considering throwing in just what you were talking about so that you could get the "real" picture.

http://cgpsoftware.thepalls.com/van-tapes-calculator.html
Unknown said…
I wrote a calculator for just such a thing, but I didn't get a chance to write in the "transferring on or off of media" portion of the exercise. My assumption would be that they could just use the media in question. Which is pretty much like someone was saying -- it's like moving the entire data center.
Unknown said…
Hmmm...I read a similar article not too long ago that used Micro SD cards instead of the ubiquitous backup tapes, and a humble 1985 Volvo station wagon on the transport layer.

500 gig per second, if we don't get a flat

Again, the main argument against it was the time & effort involved, especially the amount of time needed to load data onto and read off of the cards. But, if you're willing to accept that kind of latency then it's certainly a robust way of moving huge amounts of data around.
Why don't you use 2TB hard disks as the example? Aren't they as cheap & dense as tapes, and don't they also have benefits like, I dunno, random access and stuff?
Anonymous said…
leenoox, i don't know where you got 65000 cu ft of space. a 747-8F (bigger than the 400 model) has only 30000 cu ft (854 m³) of space.

that's enough for 14.7 million DAT cartridges, or 3686.5 tons.

but since the 747-8f is limited to 160 tons, it'd take 23 planes to carry all that volume. the maximum you can take in a single jumbo would be 3.5 million tapes, assuming 45 grams (shipping weight i found on amazon for an HP branded DAT 160) and the capacity of a 747-8f from boeing's web site.

still, a lot of data. which would take a while to save on tapes. assuming maxell's datasheet as reliable, a drive can put 24MB/s on a tape, or 41s to save a gigabyte.

which means 474 days for a single drive to record a mere petabyte. with a thousand drives working in parallel, it'd take 202 days to save john's estimated 427PB, plus one day for loading the tapes and flying them, 203 days. same 202+1 day to unload and restore the tapes, we're talking 406 days to physically move all the data to the destination system.

that's about 1 petabyte a day. with 86400 seconds in a day, that's 12.2 GIGABYTES/SECOND.

it'd take nearly a hundred gigabit ethernet cards to match that.

in other words: never underestimate the bandwidth of a thousand DAT drives working together with a 747-8f
Unknown said…
does anyone here really believe there is any company in the world that would HAVE that much data, or even a small fraction of it?
