Thursday, February 19, 2009

Fostering (friendly) inter-team rivalry by build monitoring

The team I'm currently managing is split into three distinct groups working on things we refer to by the names Platform, Corgi and Javascript. Given that my startup is currently in stealth mode, that's about all I can tell you about what we are doing.

It'll come as no surprise to regular readers (since I started Electric Cloud) that we are using a continuous build system (currently it's Hudson), and to keep everyone informed there's a prominently positioned flat-screen display showing build status.

Here's a screenshot (each team has chosen a dog breed to use as a codename for their part of the code):

To build this page, a simple Perl script reads information directly from Hudson via its remote API, converting Hudson's JSON objects into Perl structures.

Initially the builds were simply presented in the order given by Hudson, and a simple color scheme was put in place: red means the build is broken, yellow means the tests are failing and green means everything is ok.
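The Perl script itself isn't shown, so here is a minimal Python sketch of the same idea. It assumes Hudson's remote API serves the job list as JSON at `<hudson-url>/api/json`, with each job carrying a `color` field (Hudson draws a successful build as `blue` rather than green, and appends `_anime` while a build is running). The mapping onto the monitor's red/yellow/green scheme, the function names and the sample job names are illustrative assumptions.

```python
import json
from urllib.request import urlopen

# Illustrative mapping from Hudson's job "color" to the monitor's states.
# Hudson shows a successful build as "blue" rather than green.
COLOR_TO_STATUS = {
    "blue": "green",    # everything OK
    "yellow": "yellow", # builds, but tests failing (unstable)
    "red": "red",       # build broken
}


def fetch_job_list(hudson_url):
    """Fetch the raw JSON job list from Hudson's remote API."""
    with urlopen(hudson_url.rstrip("/") + "/api/json") as response:
        return response.read().decode("utf-8")


def classify_jobs(json_text):
    """Return (job name, status) pairs in the order Hudson lists the jobs."""
    jobs = json.loads(json_text).get("jobs", [])
    statuses = []
    for job in jobs:
        color = job.get("color", "grey")
        # A "_anime" suffix only means a build is currently running.
        if color.endswith("_anime"):
            color = color[: -len("_anime")]
        statuses.append((job["name"], COLOR_TO_STATUS.get(color, "unknown")))
    return statuses


if __name__ == "__main__":
    # Made-up sample data standing in for a real /api/json response.
    sample = (
        '{"jobs": [{"name": "platform", "color": "blue"},'
        ' {"name": "corgi", "color": "red_anime"},'
        ' {"name": "javascript", "color": "yellow"}]}'
    )
    for name, status in classify_jobs(sample):
        print(name, status)
```

Against a live server this would be `classify_jobs(fetch_job_list("http://hudson.example/"))`, where the host name is a placeholder.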

When a build is broken, the Perl script pulls the list of culprits from the Hudson API and names and shames the people who broke the build by showing their photographs against the red background.
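Picking out the culprits can be sketched the same way. This assumes a build's JSON (from that build's own `api/json` URL) contains a `culprits` list whose entries have a `fullName` field; the `photos/<first_last>.jpg` layout and the function name are hypothetical, not something the real monitor is known to use.

```python
import json


def culprit_photos(build_json_text, photo_dir="photos"):
    """Return one photo path per culprit listed in a Hudson build's JSON.

    Assumes a "culprits" list whose entries have a "fullName" field, and a
    hypothetical photo layout of <photo_dir>/<first_last>.jpg.
    """
    build = json.loads(build_json_text)
    paths = []
    for person in build.get("culprits", []):
        slug = person["fullName"].strip().lower().replace(" ", "_")
        paths.append(f"{photo_dir}/{slug}.jpg")
    return paths


if __name__ == "__main__":
    sample = '{"result": "FAILURE", "culprits": [{"fullName": "Jane Doe"}]}'
    print(culprit_photos(sample))  # ['photos/jane_doe.jpg']
```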

But the team came up with the idea of also using the order (from top to bottom) to indicate just how good or bad build stability is. After much arguing about the best algorithm (and a number of prototypes, which themselves spurred one of the teams to unbreak their build faster), we settled on the following:

When a build is broken or unstable it appears at the bottom of the list, and the longer it's been in that state (i.e. the longer it is since the last green build) the further it is down the list. This is pretty easy to determine since Hudson has a lastStableBuild API for each build.
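The ordering of broken and unstable builds boils down to a sort key. This sketch assumes the timestamp of each job's last stable build has already been looked up (Hudson build records carry a `timestamp` in milliseconds since the epoch), and that a job which has never had a stable build sinks to the very bottom; the function name and sample data are illustrative.

```python
def order_broken_jobs(last_stable_ms):
    """Order red/yellow jobs for the bottom of the monitor.

    last_stable_ms maps job name -> timestamp (ms since the epoch) of its
    last stable build, or None if it has never had one. The most recently
    green job comes first; the one broken longest ends up at the bottom.
    """
    def key(name):
        ts = last_stable_ms[name]
        return ts if ts is not None else 0  # never-stable jobs sink last

    return sorted(last_stable_ms, key=key, reverse=True)


if __name__ == "__main__":
    broken = {
        "corgi": 1_234_000_000_000,
        "platform": None,
        "javascript": 1_234_500_000_000,
    }
    print(order_broken_jobs(broken))  # ['javascript', 'corgi', 'platform']
```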

The green builds are ordered by a computed value called 'Health'. The script gets the status of all the builds (for a particular component) within the last week and computes the percentage that were green. The higher the health, the higher the build appears on the build monitor screen.
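Health is just a percentage over a one-week window. In this sketch each build is a hypothetical `(timestamp_ms, result)` pair, a build counts as green when its result is `SUCCESS`, and builds still in progress (result `None`) are ignored; all of these are assumptions rather than details of the real script.

```python
WEEK_MS = 7 * 24 * 60 * 60 * 1000  # one week in milliseconds


def health(builds, now_ms):
    """Percentage of last week's finished builds whose result was SUCCESS.

    builds is a list of (timestamp_ms, result) pairs; result is a string
    such as "SUCCESS", "UNSTABLE" or "FAILURE", or None while running.
    """
    recent = [result for ts, result in builds
              if now_ms - ts <= WEEK_MS and result is not None]
    if not recent:
        return 0.0
    green = sum(1 for result in recent if result == "SUCCESS")
    return 100.0 * green / len(recent)


def order_green_jobs(history, now_ms):
    """Sort currently green jobs so the healthiest is at the top."""
    return sorted(history, key=lambda name: health(history[name], now_ms),
                  reverse=True)


if __name__ == "__main__":
    now = 1_235_000_000_000
    history = {
        # One success and one failure in the last week: 50% health.
        "platform": [(now - 3_600_000, "SUCCESS"), (now - 7_200_000, "FAILURE")],
        # The failure is older than a week, so it no longer counts: 100%.
        "javascript": [(now - 3_600_000, "SUCCESS"), (now - 2 * WEEK_MS, "FAILURE")],
    }
    print(order_green_jobs(history, now))  # ['javascript', 'platform']
```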

Doing so creates some friendly rivalry. No one wants a really broken or unstable build, but even within the Green Zone the teams are competing to keep their code in good shape all the time.

And because the build monitor is a web page, it can be viewed on the flat-panel display, from any web browser, or from a mobile device like the iPhone:

Now I can stay home and watch what the team is up to, and harass them about broken builds before I've had my breakfast.

Monday, February 16, 2009

Sounds like the perfect day to me

The blog Daily Routines has information about the, well, daily routines of well known people. I'm particularly struck by how Winston Churchill's day seems to be ideal:

He awoke about 7:30 a.m. and remained in bed for a substantial breakfast and reading of mail and all the national newspapers. For the next couple of hours, still in bed, he worked, dictating to his secretaries.

At 11:00 a.m., he arose, bathed, and perhaps took a walk around the garden, and took a weak whisky and soda to his study.

At 1:00 p.m. he joined guests and family for a three-course lunch. Clementine drank claret, Winston champagne, preferably Pol Roger served at a specific temperature, port, brandy and cigars. When lunch ended, about 3:30 p.m. he returned to his study to work, or supervised work on his estate, or played cards or backgammon with Clementine.

At 5:00 p.m., after another weak whisky and soda, he went to bed for an hour and a half. He said this siesta, a habit gained in Cuba, allowed him to work 1 1/2 days in every 24 hours. At 6:30 p.m. he awoke, bathed again, and dressed for dinner at 8:00 p.m.

Dinner was the focal-point and highlight of Churchill’s day. Table talk, dominated by Churchill, was as important as the meal. Sometimes, depending on the company, drinks and cigars extended the event well past midnight. The guests retired, Churchill returned to his study for another hour or so of work.

Just need to replace 'mail and papers' with the Internet equivalent.

Friday, February 13, 2009

How to fail at technical support

Today I need support on an HP ProCurve Wireless Access Point 420 which seems to be having trouble forwarding multicast DNS packets. Multicast DNS matters if you have lots of Apple machines on your network because it's the technology behind Bonjour.

Luckily, the access point is capable of doing multicast IP over a WPA/WPA2 encrypted network and so it should have been fine. But... it would work for a few hours and then stop working.
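To see why multicast IP support ought to be enough, here is a minimal Python sketch of a Bonjour-style query. Multicast DNS (RFC 6762) uses the ordinary DNS message format, sent as a UDP datagram to the group 224.0.0.251 on port 5353; the service-enumeration name `_services._dns-sd._udp.local` comes from DNS-SD (RFC 6763). The function names are illustrative.

```python
import socket
import struct

MDNS_GROUP = "224.0.0.251"  # IPv4 multicast group for mDNS (RFC 6762)
MDNS_PORT = 5353
PTR = 12  # DNS record type for pointer records
IN = 1    # DNS class "Internet"


def build_mdns_query(name, qtype=PTR):
    """Build a one-question DNS query in the ordinary DNS wire format."""
    # Header: ID 0 (what RFC 6762 asks for in multicast queries),
    # flags 0, one question, no answer/authority/additional records.
    header = struct.pack("!HHHHHH", 0, 0, 1, 0, 0, 0)
    # Name: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in name.split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, IN)


def send_mdns_query(name):
    """Send the query as a plain UDP datagram to the mDNS multicast group."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(build_mdns_query(name), (MDNS_GROUP, MDNS_PORT))


if __name__ == "__main__":
    print(len(build_mdns_query("_services._dns-sd._udp.local")))  # 46
```

Calling `send_mdns_query("_services._dns-sd._udp.local")` on the local network asks Bonjour responders to list their service types. Nothing in the packet is specific to the access point: if it forwards multicast UDP, it should forward this.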

The current workaround is to reboot the access point, which I do using an expect script executed by a cron job:
#!/usr/bin/expect -f
# This script is called via a cron job to
# reboot the HP Wireless 420 access
# point. The current firmware in the access
# point has a bug (reported to HP) which causes
# it to stop forwarding multicast DNS packets.
# These packets are used by Apple Bonjour for
# service discovery. The upshot is that
# wireless connected users can no longer see
# shared servers.
# This is a horrible hack.
# Written by John Graham-Cumming

# This works by TELNETing into the access point
# using the admin user account 'admin' and issuing
# the obscurely named 'reset board' command which
# performs the reboot.

set wireless_ip_address a.b.c.d
set username "admin"
set password ""

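# Give up waiting on any prompt after 30 seconds
set timeout 30

# Send "\r" (Return) to end each line, as expect expects
spawn telnet $wireless_ip_address
expect "Username:"
send "$username\r"
expect "Password:"
send "$password\r"
expect "HP ProCurve Access Point 420>"
send "reset board\r"
expect "Reboot system now? :"
send "y\r"
# The access point drops the TELNET session as it reboots
expect eof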

But that's truly an ugly, ugly hack.

So, I emailed HP ProCurve technical support and within a couple of hours I had this response:

Dear John Graham-Cumming,

You recently submitted an email to ProCurve with the following description:

We have a HP ProCurve Wireless AP 420 which is used to provide access to our internal IP network. This works well except that I have to reboot the AP once per day to restore access to Apple Bonjour services (which are provided by multicast DNS). Is this a known issue? We have the latest firmware. John.

Your email Case Number is: CAS-11111-4X6563
Please always provide your case-id when you contact ProCurve Support Direct.

It is not a known issue. Which software revision do you have ?, Did you try reset to factory default ?, if you didn't. I advised you to do but previously make a copy of the configuration.

If you still have issues, you have to call to our Technical Support of Procurve Networking in UK 0870 0130 778

Thank you for contacting ProCurve Networking Support. Please let us know if we may be of any further assistance to you.

Now this starts badly because, even though I told them that I have the latest firmware, their first question is what software revision I have. Great. These standard scripts drive me insane. I've already told you I have the latest.

They also ask me to do a factory reset. Yep, I already tried that.

So I follow their advice and call the number in the UK. I tell the operator that I have a case number (which is in the email) and I then go through a bizarre conversation where he tells me that the case number isn't a case number. All case numbers must begin with 160 and mine starts CAS. According to him the C stands for China and he insists that I am not giving him a case number, I'm giving him the serial number of my product.

The email from HP reads: "Your email Case Number is: CAS-11111-4X6563. Please always provide your case-id when you contact ProCurve Support Direct."

I read these exact words to him. He insists that this is a serial number. He then claims it is not a serial number, but is a case number for something other than ProCurve support. So I read him the Subject line of the email: "Procurve Support EMEA"

Finally, I give up. I ask to start a new case and get on with it.

I describe the problem and he disappears to talk to second-level support. I don't have the energy to get into the whole 'we might not support Apple' and 'multicast DNS isn't the same thing as multicast IP' routine (even though multicast DNS is carried over UDP, which is carried over IP).

Eventually, we get to setting up the case number. But I'm not in the system because I've never called before. So I have to set up a case.

And the only way to do that is via email...

Wednesday, February 04, 2009

What the heck does this mean?

From the web site of Chelsea and Westminster NHS Foundation Trust:

Broadly based on an integrated new-born and adolescent axis which features a family centred multidisciplinary approach to care supported by a dedicated paediatric team aligned solely to the children. The core service consultant and training grade support include:

Paediatric Emergency Department with its own staff of children’s nurses and paediatric SHOs supported by an assigned paediatric medical registrar and consultant. Direct contact with either the registrar or consultant is welcomed for any member of the primary healthcare team.

OK, what the hell is an 'integrated new-born and adolescent axis'? And how could a 'dedicated paediatric team' not be 'aligned solely to the children'?

And, when did a hospital become a Foundation Trust?

Monday, February 02, 2009

"The Geek Atlas" gets a cover

And here it is:

Can you name all the people and objects on it? A free copy of the book to the first person who correctly identifies them all and posts them as a comment here.

Bonus points if you can figure out which place is associated with each person or object.