Met Office Hadley Centre observations datasets

Frequently Asked Questions

There are a number of questions from the users of hadobs that crop up time and time again. A list of these common questions and some answers is given below. If you cannot find your question here, or the question is not answered to your satisfaction, do not hesitate to contact us.

General questions

Q. What format are the data in?

A. There are files that describe the data format. These are typically found in the same place you got the data from. Most of the data sets are available in plain text and NetCDF formats. The data are usually compressed to reduce the sizes of the files. This helps to lower storage costs and download times. The tool used to compress the text files, gzip, is widely used and on Windows machines the files can be uncompressed using WinZip.

Some files are provided in NetCDF format, a machine-independent binary format. The NetCDF web site provides links to many tools for reading and manipulating NetCDF data.

We do not provide the files in Excel format because of the difficulties involved in automatically generating Excel files on our system. However, most people find it relatively straightforward to import the text data into Excel once they have uncompressed it. Unfortunately, some of our data sets (e.g. HadISST) cannot be imported easily into Excel. Some of these datasets are now also provided via the BADC Data Explorer, which allows small portions of data to be extracted. If you do have difficulties please contact us.

Occasionally files become corrupted during the download process. It often helps to download the data afresh and try again.

Q. What periods do you use for your baseline when computing anomalies?

A. We use a variety of baselines. The 1961-90 period is most often used because it is the period recommended by the WMO (World Meteorological Organization). This period is also used for UK data so that information for the UK is directly comparable to data from other parts of the world.

1971-2000 is preferred in some instances because it is thought to be more representative of the current average conditions and therefore more relevant to the reader's personal experience of recent weather and climate.

In some cases other periods are used. For example, satellite data are typically only available from 1979 onwards so it would not be possible to construct either a 1961-1990 or a 1971-2000 climatology. For global average temperatures, an 1861-1890 period is sometimes used to show the warming since the "pre-industrial" period. For other data sets, the period is chosen to maximise the coverage of data in the climatology period.

We appreciate that the use of two (or more) different climatology periods can lead to confusion but every attempt is made to state clearly which period is being used at any time.
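To illustrate why the choice of climatology period matters, here is a minimal sketch of computing anomalies against two different baselines. The temperatures are invented for illustration and the function names are hypothetical; this is not the code used to produce the hadobs data sets.

```python
def baseline_mean(series, start, end):
    """Mean of the values whose year falls within [start, end] inclusive."""
    values = [value for year, value in series.items() if start <= year <= end]
    if not values:
        raise ValueError("no data in the baseline period")
    return sum(values) / len(values)


def to_anomalies(series, start, end):
    """Express every value as a difference from the baseline-period mean."""
    base = baseline_mean(series, start, end)
    return {year: value - base for year, value in series.items()}


# Hypothetical annual mean temperatures (deg C), invented for illustration.
temps = {1961: 9.0, 1975: 9.2, 1990: 9.4, 2000: 9.8, 2010: 10.0}

anom_6190 = to_anomalies(temps, 1961, 1990)
anom_7100 = to_anomalies(temps, 1971, 2000)

# The two sets of anomalies differ by the same constant in every year.
offsets = [anom_6190[year] - anom_7100[year] for year in temps]
print(offsets)
```

Changing the baseline shifts every anomaly by the same amount, so trends and year-to-year differences are unaffected. Only the zero line moves, which is why it matters to check which period a given data set uses before comparing values.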

Q. I want to use one of your diagrams. How should I acknowledge the Met Office Hadley Centre?

A. Diagrams are Crown Copyright. Source should be acknowledged as Met Office Hadley Centre. Diagrams showing global mean surface temperature should additionally acknowledge the Climatic Research Unit at the University of East Anglia with whom we collaborate.

Q. I can't use the data, what's going on?

A. The data are compressed to reduce the sizes of the files. This helps to lower storage costs and download times. The tool used to compress the text files, gzip, is widely used and on Windows machines the files can be uncompressed using Winzip.

We do not provide the files in Excel format because of the difficulties involved in automatically generating Excel files on our system. However, most people find it relatively straightforward to import the text data into Excel once they have uncompressed it. Some of our data sets (e.g. HadISST) cannot be imported easily into Excel. Some of these datasets are now also provided via the BADC Data Explorer, which allows small portions of data to be extracted. If you do have difficulties please contact us.

Occasionally files become corrupted during the download process. It often helps to download the data afresh and try again.

Q. The file extension suggests the file is compressed, but it doesn't seem to be compressed

A. When certain applications are used to download data from the server, the files are automatically uncompressed before they are sent. Many users have experienced this difficulty when using a unix command called wget. It is possible to force wget to download compressed files by adding --header="accept-encoding: gzip" to the command.
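If you read the files from a script, one way to cope with either case is to check for the gzip signature before decompressing. The sketch below assumes the file is plain ASCII text once uncompressed; the function name and the sample content are hypothetical.

```python
import gzip

# Every gzip stream begins with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def read_text_maybe_gzipped(raw, encoding="ascii"):
    """Decode file contents, decompressing first only if they are really gzipped."""
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    return raw.decode(encoding)


# Made-up file content, used here only to exercise both code paths.
sample = "1850 -0.4\n1851 -0.2\n"
compressed = gzip.compress(sample.encode("ascii"))

print(read_text_maybe_gzipped(compressed) == sample)
print(read_text_maybe_gzipped(sample.encode("ascii")) == sample)
```

In practice you would open the downloaded file in binary mode and pass the result of `read()` to the function, whatever the file extension says.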

Q. I can't find what I'm looking for?

A. I am sorry that you have had difficulty using the web site. The hadobs web site was created to improve access to our data and, although aimed chiefly at other scientific researchers, we are always open to suggested improvements from anyone who uses the site. If you can tell us a little more about the nature of your problem, we will do our best to help.

Surface temperature data sets

Q. Which were the 10 warmest years on record?

A. Please bear in mind that the uncertainties on the annual values are around ±0.1°C. 1998 could have been as cool as 2006 or warmer than 2010, which would alter the rankings. Although this uncertainty range is an attempt to comprehensively assess the uncertainty in the global average, the method used contains many tacit assumptions about how a global average should be calculated. NCDC and NASA GISS produce their own estimates of the global average temperature using more or less independent methods. Because there is no correct method of estimating the global average temperature from the sometimes sparse observations, the differences between the analyses can be thought of as an additional uncertainty that would be impossible to assess if only a single global temperature analysis existed. Vose et al. 2005 (An intercomparison of trends in surface air temperature analyses at the global, hemispheric, and grid-box scale, GRL) showed that the largest differences between the land surface air temperature data sets arose from the way that the gridded data were averaged to obtain a global value. For data sets which combine land and ocean data there are additional uncertainties, for example how to estimate temperature anomalies over the polar regions. Despite the differences in approach, the average correlation between the three major global temperature data sets is greater than 0.98.

Q. Where can I get the underlying observations?

A. The sea-surface temperature observations used to create HadSST2 are taken from ICOADS (International Comprehensive Ocean-Atmosphere Data Set) and can be obtained from ICOADS. The land station data are available from the CRUTEM4 data download page.

Q. How are the daily mean temperatures calculated? And in case of there being several ways, how can you be sure that those ways are equivalent?

A. Because we use temperature anomalies from a station climatology, it doesn't matter how the average temperature is calculated as long as it is always done in the same way (the differences will cancel in the climatology and the monthly values). For UK data we still use the average of the daily maximum and minimum temperatures. This gives us homogeneous long-term series from a station.

Some other countries calculate the average in other ways. So long as they don't change the method of calculation, the results will be consistent. If the calculation method is changed we apply corrections to the reported values. In some instances it is possible that the method was changed, but no record was made. The uncertainties associated with such inhomogeneities are discussed in Brohan et al. 2006.
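The cancellation described above can be shown with a small sketch. It assumes, for simplicity, that the two averaging methods differ by a constant offset; the values and the offset are invented, and the function name is hypothetical.

```python
def anomalies(values, n_base):
    """Anomalies relative to the mean of the first n_base values (the climatology)."""
    climatology = sum(values[:n_base]) / n_base
    return [value - climatology for value in values]


# Hypothetical monthly means for one calendar month in successive years,
# computed as the average of the daily maximum and minimum (deg C).
method_a = [10.0, 10.5, 9.8, 11.0, 11.4]

# The same station using a different averaging method, assumed here to read
# a constant 0.3 deg C warmer.
method_b = [value + 0.3 for value in method_a]

anoms_a = anomalies(method_a, n_base=3)
anoms_b = anomalies(method_b, n_base=3)

# The offset appears in both the climatology and the monthly values, so it cancels.
print(max(abs(a - b) for a, b in zip(anoms_a, anoms_b)))
```

If the method changes part-way through the record, the offset no longer cancels, which is why corrections are needed in that case.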

Q. Why do you use anomalies?

A. Anomalies vary slowly from one place to another - if it is warmer than average in London, it is likely to be warmer than average in Paris too - but actual temperatures can vary greatly from one weather station to its nearest neighbour (or from one side of a house to another!). The average anomaly for, say, Europe is likely to be representative of a large area, whereas the average absolute temperature will be representative of only a very limited one.

Q. Do you use model data to estimate the global average temperature?

A. The short answer is, no. In order to maintain independence between the observed record and model reconstructions of past climate, model data are not used in the estimation of the global average temperature.

Q. Do you use satellite data to estimate the global average surface temperature? If not, why not?

A. No. Although satellites can provide a quasi-global view of Earth's surface, there are a number of difficulties involved in estimating near-surface temperatures from these observations. Over land, the satellites measure the temperature of the surface, which can be very different from the air temperature just above the surface. The difference depends, amongst other things, on the wind speed and the nature of the surface. Because of the way that the satellites orbit the earth, many take measurements at a given point only a few times a day, making it harder to estimate the mean temperature.

Over the oceans the problem is somewhat simpler. The satellites measure the temperature of the sea-surface, which is what we are interested in. The daily range of sea-surface temperature is much smaller than over land, so the time at which the observations are made is less important, although it is still significant. However, the observations from satellites are influenced by atmospheric conditions, particularly aerosols (small particles in the air), which can mean that the measurements are often in error by several tenths of a degree. Some sea-surface temperature products (for example HadISST) use satellite data, but because of the difficulties of forming a homogeneous climate record from satellite data, they are not yet used in our estimates of global average temperature.

Q. How many ships are taking measurements?

A. A peak in the total number of Voluntary Observing Ships (VOS) was reached in 1984/85 when about 7700 ships were on the WMO's list. Since then there has been a general fall in numbers and in 1994, the size of the fleet had dropped to about 7200. The fleet has continued to decline and currently comprises around 4000 ships. Most meteorological reports from VOS ships come from the major shipping routes in the North Atlantic and North Pacific.

Q. Why do you combine land-surface air and sea-surface temperatures into one data set? What is the intent of doing so?

A. The most plentiful measurements of temperature over the oceans are sea-surface temperature measurements. Air temperature measurements are also made over the oceans, but these measurements are prone to a number of problems. During the day the sun heats the ship's hull, causing temperature measurements to be artificially high. This can be avoided by only using measurements made at night, at the cost of reducing the number of available observations by half. Air temperature measurements from buoys are unreliable so those cannot be used either. In using sea-surface temperature anomalies we assume that the anomalies of sea-surface temperature are in agreement with those of marine air temperature. Tests show that night marine air temperature anomalies agree well with sea-surface temperature anomalies on seasonal and longer time scales in most open ocean areas. Globally the agreement is very good (Rayner et al., 2003).

Q. The observations you use are of low quality or have large uncertainties

A. Brohan et al. 2006 answer the concern that observations are too unreliable to estimate a global average. Brohan et al. explicitly calculate uncertainties on the global average surface temperature anomaly, which, although larger in the early period than the modern, are far smaller than is often supposed. The fact that the uncertainties are small arises, in part, from the fact that temperature anomalies are correlated over large distances. This means that the global average temperature anomaly can be assessed with a relatively small number of widely separated stations.

Q. The global temperature last month was very warm/very cold. What is going on?

A. There is significant month-to-month variability in temperatures. During periods when the globe warms there will be shorter periods when temperatures are lower than the recent average, just as there will be periods that are warmer.

Q. What are the differences between HadCRUT, GISS and NCDC global temperature analyses?

A. The datasets are largely based on the same raw data, but each analysis treats those data differently.

HadCRUT4 is perhaps the simplest. The available data are averaged onto a regular grid. No attempt is made to fill grid boxes where there are no data; instead, the empty boxes are treated as an additional source of uncertainty when area averages, such as the global average, are calculated.

The GISS analysis uses an interpolated sea-surface temperature analysis, which fills in some of the gaps in the sea-surface temperature data. The land station data are also interpolated over data-free regions (including over the oceans) to a maximum distance of 1200 km. This has a particularly large effect over the Arctic and Antarctic where there are few data points and temperature variability is large.

One important thing to note is that the differences between the GISS and HadCRUT4 analyses are smaller than the calculated uncertainties on the HadCRUT4 data set - the data sets are not inconsistent. The largest component of the uncertainty arises from the fact that temperatures over large areas of the Earth's surface remain unobserved. There are very few observations in the Arctic and Antarctic. GISS attempts to estimate temperatures in these areas; HadCRUT4 does not. This is the major source of difference between the analyses, which can be seen if, instead of a global average, one takes the average temperature anomaly between 60S and 60N. Over this slightly smaller area, the GISS and HadCRUT4 analyses give very similar results.

There is a third global analysis produced by NCDC that also uses interpolation to fill in some of the gaps. Their method typically fills fewer gaps than the GISS analysis and the global average generally lies somewhere between GISS and HadCRUT4.

Q. Have global temperatures been falling since 1998?

Questions about some of our older data sets

Q. In HadCRUT3 the mean of the 12 monthly global averages does not equal the annual average. Why not?

A. For HadCRUT3 we calculate the global annual average from a gridded data set. The gridded data set breaks the earth's surface into 'squares' whose sides are 5 degrees in longitude and 5 degrees in latitude. We have observations of temperature in some of the grid squares, but in others we have no observations.

To calculate a monthly average we take a weighted mean of all those grid-squares that contain data that month. In other words we multiply the temperature anomaly in each grid square by the area of the grid square and add them all together, then we divide this by the total area of grid squares that contain data. We do this separately for the northern and southern hemispheres, then take the mean of the two. This stops the global average from being dominated by the better observed northern hemisphere.
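The weighted mean described above can be sketched as follows. This is an illustration rather than the code used for HadCRUT3: it assumes the area of a 5 degree grid square is proportional to the cosine of the latitude of its centre, marks squares without data as None, and uses invented anomalies and hypothetical function names.

```python
import math


def hemisphere_mean(grid, northern):
    """Area-weighted mean anomaly over the squares of one hemisphere that contain data."""
    weighted_sum = 0.0
    total_weight = 0.0
    for (lat, _lon), anomaly in grid.items():
        if anomaly is None or (lat > 0) != northern:
            continue
        # Assumption: square area is proportional to cos(latitude of the centre).
        weight = math.cos(math.radians(lat))
        weighted_sum += weight * anomaly
        total_weight += weight
    if total_weight == 0.0:
        raise ValueError("no data in this hemisphere")
    return weighted_sum / total_weight


def global_mean(grid):
    """Mean of the two hemispheric averages, so neither hemisphere dominates."""
    return 0.5 * (hemisphere_mean(grid, True) + hemisphere_mean(grid, False))


# Keys are (latitude, longitude) of square centres; anomalies are invented.
grid = {
    (47.5, 2.5): 0.6,
    (52.5, 2.5): 0.4,
    (62.5, 27.5): 0.8,
    (-32.5, 152.5): 0.1,
    (-77.5, 2.5): None,  # no observations in this square
}

nh = hemisphere_mean(grid, northern=True)
sh = hemisphere_mean(grid, northern=False)
print(nh, sh, global_mean(grid))
```

In this example three of the four squares with data are in the northern hemisphere, yet the single southern square still contributes half of the global value.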

To calculate an annual average we first calculate the annual average for each grid square. We then take a weighted mean of all those grid-squares that contained data that year (the method is the same as for the monthly data).

Alternatively, one can calculate the annual average by taking the mean of the 12 monthly global averages as CRU do.

Because some grid squares do not contain data these different methods do not necessarily give the same answer. We try to account for this variation by calculating the uncertainty in the annual average. This is a measure of how different we expect our estimate to be from the 'true' (albeit unmeasurable in practice) global average temperature. For the global annual average this difference is around 0.1°C. If the differences between the methods are smaller than this then the differences are not, statistically speaking, significant.

Q. In HadCRUT3 how do you obtain a global annual average temperature from the monthly data?

A. First the monthly anomalies in each grid box are averaged together to give an annual average anomaly for that grid box. The area-weighted averages of these annual average grid-box anomalies are then calculated for the northern hemisphere and for the southern hemisphere. The global average temperature is the arithmetic mean of the northern hemisphere average and the southern hemisphere average. The last step avoids biasing the global average to the more densely observed northern hemisphere. There are, of course, other ways to calculate the global average and each will give a slightly different answer.

Q. The HadCRUT3 data are expressed as anomalies, but I want actual temperatures.

A. HadCRUT3 is an anomalies dataset, and all the uncertainties apply to the anomalies. If you are interested in year-to-year changes it's best to use the anomalies if you can. So before you start using the actuals, think hard to check you can't use the anomalies instead.

We can make actuals - we merge the SST climatology from the HadSST2 dataset and the land climatology from the CRU high resolution dataset (New et al. 2002) to make a climatology for HadCRUT3, and add this to the anomalies dataset. But this has two problems. First, it adds an additional source of uncertainty which we don't allow for, so our uncertainty analysis is no longer valid. Second, land surface actuals vary over short distances because of large changes in altitude, so the actual range of temperatures in a 5 degree grid box can be large, and the mean value is not always useful.

The absolute global-average annual temperature and the absolute hemisphere-average annual temperatures for 1961-1990 were calculated by Jones et al. (1999). They are:

Globe 61-90 average = 14.0°C

Northern Hemisphere 61-90 average = 14.6°C

Southern Hemisphere 61-90 average = 13.4°C

Q. Can you let me know, please, how you calculate global average temperatures in HadCRUT3?

A. The answers to all these questions can be found in the paper:

P. Brohan, J.J. Kennedy, I. Harris, S.F.B. Tett and P.D. Jones (2006), Uncertainty estimates in regional and global observed temperature changes: a new dataset from 1850. J. Geophys. Res., 111, D12106, doi:10.1029/2005JD006548.

or the references therein. The basic method is:

Air temperatures over the land are measured at around 2200 land stations every month. Most report the average of the maximum and minimum temperature recorded each day and the 30 or so daily values are averaged to give a monthly value for each station. At the end of the month the data are sent to the Met Office where they are quality controlled.

Over the oceans, observations of sea surface temperature (SST) are used - around 100,000 each month. These observations are made by volunteer observing ships - ships which take meteorological measurements while going about their regular business - by research vessels and by moored and drifting buoys.

The available data for each month are turned into anomalies (difference from the average temperature between 1961 and 1990 for that station or location) and averaged onto two regular grids: one for the land and one for the ocean. The anomaly in a grid box is equal to the mean of the anomalies from all stations or SST observations in that grid box.
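The gridding step can be illustrated with a short sketch. It assumes 5 degree grid boxes and takes each observation as a (latitude, longitude, temperature, 1961-1990 average) tuple; the stations, values and function names are hypothetical.

```python
import math
from collections import defaultdict


def box_index(lat, lon, size=5.0):
    """Integer (row, column) index of the grid box containing a location."""
    return (math.floor(lat / size), math.floor(lon / size))


def grid_anomalies(observations, size=5.0):
    """Mean anomaly of all observations falling in each grid box."""
    totals = defaultdict(lambda: [0.0, 0])
    for lat, lon, temperature, climatology in observations:
        key = box_index(lat, lon, size)
        totals[key][0] += temperature - climatology  # anomaly for this observation
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Invented monthly values: (lat, lon, temperature, 1961-1990 average), deg C.
observations = [
    (51.5, -0.1, 11.0, 10.5),   # anomaly +0.5
    (51.75, -1.26, 10.3, 10.0), # anomaly +0.3, same box as the first
    (48.9, 2.3, 12.0, 11.2),    # anomaly +0.8, a different box
]

gridded = grid_anomalies(observations)
print(gridded)
```

Boxes with no observations simply do not appear in the result, which corresponds to the empty grid boxes discussed elsewhere on this page.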

We use anomalies because stations on land are at different elevations and the actual temperatures can vary widely over short distances. Anomalies vary more gradually (if it is warmer than average in London, it is more likely to be warmer than average in Paris) and are therefore representative of a far wider area. Using anomalies also means that the global average temperature is not strongly affected by the addition or removal of data from a location where average temperatures are very high, or low.

The 'gridded' anomalies for land and ocean are then combined and the global average temperature anomaly is calculated from the combined gridded values. This isn't a complex process. First we take an area-weighted average for the northern hemisphere and the southern hemisphere separately. The global average is simply the mean of the northern and southern hemisphere values. We do this so as not to give too much weight to the northern hemisphere, where there are more observations.

Q. Why aren't your estimates of the global annual average from HadCRUT3 exactly the same as those that CRU publish?

A. We use the same gridded data but the way that CRU calculate the global annual average from HadCRUT3 and the way that we do it are different. We average in time (monthly maps to annual maps) then in space (area average annual map to get a single number). CRU average in space (monthly map to single monthly average number) then in time (12 single monthly global averages average to get one single global annual average number).
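A toy example shows why the order of averaging matters when some grid boxes have gaps. The sketch below is deliberately simplified: it uses two boxes of equal area, two months instead of twelve, skips the hemispheric step, and averages whatever months are available in each box. The values and function names are invented.

```python
def mean(values):
    """Mean of the values that are present; None if every value is missing."""
    present = [value for value in values if value is not None]
    return sum(present) / len(present) if present else None


def time_then_space(boxes):
    """Average each box over time first, then average across boxes."""
    annual_per_box = [mean(months) for months in boxes.values()]
    return mean(annual_per_box)


def space_then_time(boxes):
    """Average across boxes for each month first, then average over time."""
    n_months = len(next(iter(boxes.values())))
    monthly = [
        mean([months[m] for months in boxes.values()]) for m in range(n_months)
    ]
    return mean(monthly)


# Monthly anomalies per grid box; None marks a month with no observations.
boxes = {
    "box_a": [0.2, 0.4],
    "box_b": [None, 1.0],
}

print(time_then_space(boxes))  # about 0.65
print(space_then_time(boxes))  # about 0.45
```

With complete data the two orders give the same answer; the difference arises only because the missing month in the second box is handled differently by each method.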

Q. I have been looking at your smoothed curves [of HadCRUT3 global annual average temperature]. How do you do the smoothing and how do you deal with the end points?

A. See here:

Q. I am using the HadCRUT3, CRUTEM3 or HadSST2 data, and I have discovered that not all gridboxes that have an anomaly in them have an associated uncertainty. What is going on?

A. In some regions there were too few observations in the historical record to estimate the variables needed to estimate the uncertainties. This occurs in sparsely sampled regions such as the high latitude oceans. We are working on filling these gaps.

Q. My question isn't answered here!

A. Please contact us. We are always happy to help.

Commercial and media enquiries

You can access the Met Office Customer Centre, any time of the day or night by phone, fax or e-mail. Trained staff will help you find the information or products that are right for you.
Contact the Met Office Customer Centre