Tuesday, December 08, 2009

Google Earth view of Met Office Land Surface Temperature Data

The UK Met Office has released a subset of the HadCRUT data that contains temperature observations from 1729 stations around the world. To get started with this data you need to go here.

I downloaded the data and wrote a program that verifies its integrity (it looks OK so far) and then extracts information about the stations and plots them. To do this I created a KML file from the observation files. Important note: if you are repeating what I am doing, you will need to flip the sign on the longitude (the Met Office and KML disagree about what the sign on longitude means).
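A minimal sketch of that sign flip (in Python for brevity; the function name is mine, and the sample values come from the example station further down):

```python
def met_to_kml_coords(lat, lon, alt=0):
    """Build a KML <coordinates> value from a Met Office lat/long pair.

    KML wants longitude,latitude[,altitude] with longitude positive
    east of Greenwich; the Met Office files appear to use the opposite
    sign convention, so the longitude is negated here.
    """
    return "%s,%s,%s" % (-lon, lat, alt)

# A station header reading Lat= 6.1, Long= 125.2 becomes:
print(met_to_kml_coords(6.1, 125.2))   # prints -125.2,6.1,0
```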

The first step was to import this into Google Maps. You can view the map of all the observation stations here. Here's a snapshot of the UK.

You can grab the actual KML file I generated here.
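Inside, the file is just a list of KML Placemarks, one per station. A representative entry looks roughly like this (the name and coordinates are the example station that appears later in the post):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
<Document>
<Placemark>
<name>REJAH BUAYAN</name>
<Point>
<coordinates>-125.2,6.1,0</coordinates>
</Point>
</Placemark>
</Document>
</kml>
```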

This can also be imported to Google Earth where you can see all the points.

Here's the code I used to generate it:

use strict;
use warnings;

# Program to read data released by the UK Met Office on 08/12/2009 and
# produce a KML file showing the location of all of the temperature
# stations in the released data. This KML file can be imported to
# Google Earth or Google Maps for display.
# In reading the Met Office data the program also looks for missing
# information (such as missing fields in the files and missing data in
# the observations). It also checks for any oddities such as
# non-numeric values in numeric fields.

# A directory called MetOffice08122009 is assumed to exist and contain
# a collection of two-digit subdirectories which contain the actual
# observation files.

my $top = 'MetOffice08122009/';
my @subdirs = glob "$top*";

# Verify that all the subdirectories have two digits and that there's
# no extraneous data. In doing so build up a list of all the
# observation files. These should also be in a specific form: each
# has a six digit name where the first two digits correspond to the
# two digits in the directory name.
# If this code doesn't force an error the @obs array will contain the
# names of all the observation files in the released data. Each entry
# will be the path to that file.

my @obs;

foreach my $s (@subdirs) {
    if ( $s !~ /^$top(\d\d)$/ ) {
        die "Unexpected subdirectory $s\n";
    }

    # Captures the two-digit portion of the directory name

    my $two = $1;

    my @o = glob "$top$two/*";
    foreach my $f (@o) {
        if ( $f !~ /^$s\/$two\d{4}$/ ) {
            die "Unexpected file $f\n";
        } else {
            push @obs, ($f);
        }
    }
}

print "Found ", $#obs+1, " observation files\n";

# This hash contains the fields that are expected in the released data
# and maps to a regular expression for recognizing the data. These
# expressions are used to look for anomalies (such as an alpha
# character in a numeric field).

my %fields = (
    'Number' => qr/^\d{6}$/,

    # The Name and Country fields contain all sorts of extra characters
    # that you might not expect e.g. SPAIN is followed by a line of -
    # characters

    'Name' => qr/^[A-Za-z \.\-\/\(\)\d\*',]+$/,
    'Country' => qr/^[A-Z \.\-\(\)']+$/,
    'Lat' => qr/^-?\d{1,2}\.\d$/,
    'Long' => qr/^-?\d{1,3}\.\d$/,
    'Height' => qr/^-?\d+$/,
    'Start year' => qr/^[12]\d{3}$/,
    'End year' => qr/^[12]\d{3}$/,
    'First Good year' => qr/^[12]\d{3}$/,
    'Source ID' => qr/^\d+$/,
    'Source file' => qr/^.+$/,
    'Jones data to' => qr/^[12]\d{3}$/,
    'Normals source' => qr/^.+$/,
    'Normals source start year' => qr/^[12]\d{3}$/,
    'Normals source end year' => qr/^[12]\d{3}$/,
    'Normals' => qr/^(\s*-?\d+\.\d\s*){12}$/,
    'Normals source percent availability' => qr/^\d+$/,
    'Normals source variable code' => qr/^\d+$/,
    'Standard deviations source' => qr/^.+$/,
    'Standard deviations source start year' => qr/^[12]\d{3}$/,
    'Standard deviations source end year' => qr/^[12]\d{3}$/,
    'Standard deviations' => qr/^(\s*-?\d+\.\d\s*){12}$/
);

# Now read through each of the observation files and verify that they
# have all the fields that are required. Extract the latitude and
# longitude and use these to fill in hash elements named 'latitude'
# and 'longitude' in the %observations hash. This hash is keyed by
# the observation number.
# For example an entry in this hash might look like
# $observations{988510}
# $observations{988510}{latitude} = 6.1
# $observations{988510}{longitude} = -125.2
# $observations{988510}{name} = 'REJAH BUAYAN'

my %observations;

foreach my $f (@obs) {
    if ( open F, "<$f" ) {

        # Parse out the name of the file to extract the expected Number
        # which should match the Number field inside the file

        $f =~ /$top\d{2}\/(\d{6})/;
        my $number = $1;

        # This will get set to 1 as soon as we find the Obs: marker in the
        # file and hence are done with the header portion and are on to
        # the actual observations

        my $doing_obs = 0;

        # This is used to make sure that the observations are all there,
        # i.e. that there are no gaps in the years in a single file

        my $year_obs = 0;

        while (<F>) {

            my $line = $_;

            if ( !$doing_obs ) {

                # The line should have an = sign in it, or it should be the
                # start of the actual observations which begins with Obs:

                if ( $line =~ /^([^=]+)\s*=\s*(.+)/ ) {
                    my ( $field_name, $field_value ) = ( $1, $2 );

                    if ( exists($fields{$field_name}) ) {
                        if ( $field_value =~ /$fields{$field_name}/ ) {

                            # Check that the Number field is the same as the
                            # number in the file name

                            if ( ( $field_name eq 'Number' ) &&
                                 ( $field_value != $number ) ) {
                                die "Number field $field_value doesn't match file name $f\n";
                            }

                            # Capture latitude, longitude and name

                            if ( $field_name eq 'Lat' ) {
                                $observations{$number}{latitude} = $field_value;
                            }

                            if ( $field_name eq 'Long' ) {

                                # Note that I am flipping the sign on the
                                # longitude here because it appears that the
                                # Met Office is swapping east and west
                                # compared to how KML thinks about longitude

                                $observations{$number}{longitude} = -$field_value;
                            }

                            if ( $field_name eq 'Name' ) {
                                $observations{$number}{name} = $field_value;
                            }
                        } else {
                            die "Value for field $field_name in $f is incorrect ($field_value)\n";
                        }
                    } else {

                        # Since I am not using any extraneous data I just
                        # report it and carry on.

                        print STDERR "Unexpected header value $field_name in file $f\n";
                    }
                } elsif ( $line =~ /^Obs:$/ ) {
                    $doing_obs = 1;
                } else {
                    die "Unexpected line [$line] in $f\n";
                }
            } else {

                # Observation lines should start with a four figure year
                # followed by 12 observations. The years should be contiguous.

                if ( $line =~ /([12]\d{3})\s+(\s*-?\d+\.\d\s*){12}/ ) {
                    if ( $year_obs == 0 ) {
                        $year_obs = $1;
                    } else {
                        $year_obs += 1;

                        if ( $year_obs != $1 ) {
                            die "Gap in observation record at year $1 in file $f\n";
                        }
                    }
                } else {
                    die "Unexpected observation line [$line] in $f\n";
                }
            }
        }

        close F;

        # Verify that we have captured latitude, longitude and name

        if ( !exists( $observations{$number}{latitude} ) ||
             !exists( $observations{$number}{longitude} ) ||
             !exists( $observations{$number}{name} ) ) {
            die "Incomplete information for $f\n";
        }
    } else {
        die "Failed to open observation file $f\n";
    }
}

# Now use the %observations hash to build the output KML file, which
# will be stored in a file called MetOffice08122009.kml.
# IMPORTANT NOTE: this is currently outputting an altitude of 0 for
# each observation.

my $kml = 'MetOffice08122009.kml';

if ( open F, ">$kml" ) {
    print F <<EOF;
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
<Document>
EOF

    foreach my $o (sort keys %observations) {
        print F <<EOF;
<Placemark>
<name>$observations{$o}{name}</name>
<Point>
<coordinates>$observations{$o}{longitude},$observations{$o}{latitude},0</coordinates>
</Point>
</Placemark>
EOF
    }

    print F <<EOF;
</Document>
</kml>
EOF

    close F;
} else {
    die "Failed to create output KML file $kml\n";
}

Take it under the General Public License.

Next step: map the temperature data onto Google Earth.



