Tuesday, December 08, 2009

Google Earth view of Met Office Land Surface Temperature Data

The UK Met Office has released a subset of the HadCRUT data that contains temperature observations from 1729 stations around the world. To get started with this data you need to go here.

I downloaded the data and wrote a program to verify its integrity (it looks OK so far) and then extract information about the stations and plot them. To do this I created a KML file from the observation files. Important note: if you are repeating what I am doing you will need to flip the sign on the longitude (the Met Office and KML have different ideas about what the sign of a longitude means).
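As a quick illustration of the sign flip (a Python sketch rather than the Perl below; the station values are made up for the example): KML wants coordinates written as longitude,latitude,altitude with east-positive longitude, so a west-positive value from the station header must be negated.

```python
# Illustration: turning a Met Office header latitude/longitude pair into
# a KML <coordinates> value. KML is "lon,lat,alt" with east-positive
# longitude; the released files appear to use the opposite sign
# convention, so the sign on the longitude is flipped.

def metoffice_to_kml_coords(lat, long_west_positive, alt=0.0):
    """Return a KML <coordinates> string for a station header value."""
    lon = -long_west_positive  # flip sign: west-positive -> east-positive
    return f"{lon},{lat},{alt}"

print(metoffice_to_kml_coords(6.1, 125.2))  # -125.2,6.1,0.0
```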

The first step was to import this into Google Maps. You can view the map of all the observation stations here. Here's a snapshot of the UK.

You can grab the actual KML file I generated here.

This can also be imported to Google Earth where you can see all the points.
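For reference, each station ends up as a plain KML Placemark along these lines (a minimal sketch; the name and coordinate values are taken from the example station used in the code comments below):

```xml
<Placemark>
<name>REJAH BUAYAN</name>
<Point>
<coordinates>-125.2,6.1,0</coordinates>
</Point>
</Placemark>
```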

Here's the code I used to generate it:

use strict;
use warnings;

# Program to read data released by the UK Met Office on 08/12/2009 and
# produce a KML file showing the location of all of the temperature
# stations in the released data. This KML file can be imported to
# Google Earth or Google Maps for display.
# In reading the Met Office data the program also looks for missing
# information (such as missing fields in the files and missing data in
# the observations). It also checks for any oddities such as
# non-numeric values in numeric fields.

# A directory called MetOffice08122009 is assumed to exist and contain
# a collection of two-digit subdirectories which contain the actual
# observation files.

my $top = 'MetOffice08122009/';
my @subdirs = glob "$top*";

# Verify that all the subdirectories have two digits and that there's
# no extraneous data. In doing so build up a list of all the
# observation files. These should also be in a specific form: each
# has a six digit name where the first two digits correspond to the
# two digits in the directory name.
# If this code doesn't force an error the @obs array will contain the
# names of all the observation files in the released data. Each entry
# will be the path to that file.

my @obs;

foreach my $s (@subdirs) {
    if ( $s !~ /^$top(\d\d)$/ ) {
        die "Unexpected subdirectory $s\n";
    }

    # Captures the two-digit portion of the directory name

    my $two = $1;

    my @o = glob "$top$two/*";
    foreach my $f (@o) {
        if ( $f !~ /^$s\/$two\d{4}$/ ) {
            die "Unexpected file $f\n";
        } else {
            push @obs, ($f);
        }
    }
}

print "Found ", $#obs+1, " observation files\n";

# This hash contains the fields that are expected in the released data
# and maps to a regular expression for recognizing the data. These
# expressions are used to look for anomalies (such as an alpha
# character in a numeric field).

my %fields = (
'Number' => qr/^\d{6}$/,

# The Name and Country fields contain all sorts of extra characters
# that you might not expect e.g. SPAIN is followed by a line of -
# characters

'Name' => qr/^[A-Za-z \.\-\/\(\)\d\*',]+$/,
'Country' => qr/^[A-Z \.\-\(\)']+$/,
'Lat' => qr/^-?\d{1,2}\.\d$/,
'Long' => qr/^-?\d{1,3}\.\d$/,
'Height' => qr/^-?\d+$/,
'Start year' => qr/^[12]\d{3}$/,
'End year' => qr/^[12]\d{3}$/,
'First Good year' => qr/^[12]\d{3}$/,
'Source ID' => qr/^\d+$/,
'Source file' => qr/^.+$/,
'Jones data to' => qr/^[12]\d{3}$/,
'Normals source' => qr/^.+$/,
'Normals source start year' => qr/^[12]\d{3}$/,
'Normals source end year' => qr/^[12]\d{3}$/,
'Normals' => qr/^(\s*-?\d+\.\d\s*){12}$/,
'Normals source percent availability' => qr/^\d+$/,
'Normals source variable code' => qr/^\d+$/,
'Standard deviations source' => qr/^.+$/,
'Standard deviations source start year' => qr/^[12]\d{3}$/,
'Standard deviations source end year' => qr/^[12]\d{3}$/,
'Standard deviations' => qr/^(\s*-?\d+\.\d\s*){12}$/
);

# Now read through each of the observation files and verify that they
# have all the fields that are required. Extract the latitude and
# longitude and use these to fill in hash elements named 'latitude'
# and 'longitude' in the %observations hash. This hash is keyed by
# the observation number.
# For example an entry in this hash might look like
# $observations{988510}
# $observations{988510}{latitude} = 6.1
# $observations{988510}{longitude} = -125.2
# $observations{988510}{name} = 'REJAH BUAYAN'

my %observations;

foreach my $f (@obs) {
    if ( open F, "<$f" ) {

        # Parse out the name of the file to extract the expected Number
        # which should match the Number field inside the file

        $f =~ /$top\d{2}\/(\d{6})/;
        my $number = $1;

        # This will get set to 1 as soon as we find the Obs: marker in the
        # file and hence are done with the header portion and are on to
        # the actual observations

        my $doing_obs = 0;

        # This is used to make sure that the observations are all there,
        # i.e. that there are no gaps in the years in a single file

        my $year_obs = 0;

        while (<F>) {

            my $line = $_;

            if ( !$doing_obs ) {

                # The line should have an = sign in it, or it should be the
                # start of the actual observations which begins with Obs:

                if ( $line =~ /^([^=]+)\s*=\s*(.+)/ ) {
                    my ( $field_name, $field_value ) = ( $1, $2 );

                    if ( exists($fields{$field_name}) ) {
                        if ( $field_value =~ /$fields{$field_name}/ ) {

                            # Check that the Number field is the same as the number
                            # in the file name

                            if ( ( $field_name eq 'Number' ) &&
                                 ( $field_value != $number ) ) {
                                die "Number field $field_value doesn't match file name $f\n";
                            }

                            # Capture latitude and longitude and name

                            if ( $field_name eq 'Lat' ) {
                                $observations{$number}{latitude} = $field_value;
                            }

                            if ( $field_name eq 'Long' ) {

                                # Note that I am flipping the sign on the longitude here
                                # because it appears that the Met Office is swapping east
                                # and west compared to how KML thinks about longitude

                                $observations{$number}{longitude} = -$field_value;
                            }

                            if ( $field_name eq 'Name' ) {
                                $observations{$number}{name} = $field_value;
                            }
                        } else {
                            die "Value for field $field_name in $f is incorrect ($field_value)\n";
                        }
                    } else {

                        # Since I am not using any extraneous data I just report
                        # it and carry on.

                        print STDERR "Unexpected header value $field_name in file $f\n";
                    }
                } elsif ( $line =~ /^Obs:$/ ) {
                    $doing_obs = 1;
                } else {
                    die "Unexpected line [$line] in $f\n";
                }
            } else {

                # Observation lines should start with a four figure year
                # followed by 12 observations. The years should be contiguous.

                if ( $line =~ /([12]\d{3})\s+(\s*-?\d+\.\d\s*){12}/ ) {
                    if ( $year_obs == 0 ) {
                        $year_obs = $1;
                    } else {
                        $year_obs += 1;

                        if ( $year_obs != $1 ) {
                            die "Gap in observation record at year $1 in file $f\n";
                        }
                    }
                } else {
                    die "Unexpected observation line [$line] in $f\n";
                }
            }
        }

        close F;

        # Verify that we have captured latitude, longitude and name

        if ( !exists( $observations{$number}{latitude} ) ||
             !exists( $observations{$number}{longitude} ) ||
             !exists( $observations{$number}{name} ) ) {
            die "Incomplete information for $f\n";
        }
    } else {
        die "Failed to open observation file $f\n";
    }
}

# Now use the %observations hash to build the output KML file, this
# will be stored in a file called MetOffice08122009.kml.
# IMPORTANT NOTE: this is currently outputting an altitude of 0 for
# each observation.

my $kml = 'MetOffice08122009.kml';

if ( open F, ">$kml" ) {
    print F<<EOF;
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
<Document>
EOF

    # One Placemark per station: the station name plus its
    # longitude/latitude (and the fixed altitude of 0)

    foreach my $o (sort keys %observations) {
        print F<<EOF;
<Placemark>
<name>$observations{$o}{name}</name>
<Point>
<coordinates>$observations{$o}{longitude},$observations{$o}{latitude},0</coordinates>
</Point>
</Placemark>
EOF
    }

    print F<<EOF;
</Document>
</kml>
EOF

    close F;
} else {
    die "Failed to create output KML file $kml\n";
}

Feel free to take this code under the General Public License.

Next step: map the temperature data onto Google Earth.


Plato said...

That's excellent - wow you're quick off the mark :)

John Graham-Cumming said...

Just couldn't resist. My next plan is to use this plus the paper on the creation of CRUTEM3 to produce a static KML file for Google Earth that shows the most recent temperature anomaly.

Then I'll move onto an animated KML that shows the warming of the Earth in Google Earth from 1850.

dirty dingus said...

And already a little plugging around in the google maps code shows that the lat/long numbers are rounded so that we have some observation points apparently in the sea or in other improbable locations.

And we discover that google earth has a problem displaying the south pole :)

But thanks for the parser. You just saved me a bunch of time. I'll be extending it shortly and will post when I have done so.

Francis Turner said...

PS your href to the google map has an extraneous
in it...

mtrigoboff said...

Is this Perl? If not, what is it? Thanks...

Mapperz said...

Looking at the kml - locations are rounded - seems only to be the locations in the kml.
Is there a direct link to the kml? Via google maps it is only displaying locations and names, not temperature data.

Looking at the raw data in ArcGIS there are a few anomalies - the -99.9 "no data" values need changing - then time animation can be introduced.

Looking at your code seems to be perl?

good quick process though.

Stuart said...

For a Google Maps-based rendering of this data set, take a look at http://geo.me/climate

There are two different visualisations of the released data set. The first one is essentially a map-based location directory that allows the weather stations to be searched and viewed, with an annotated timeline of that station's data.
The second is a tool to generate a coloured graphical overlay of temperature for any given year/month, with the opportunity to contrast and compare years visually. Hope you find it useful!