Anomalies and outliers
Gregor Macara and Bonnie Farrant work with climate datasets. They explain two key terms, anomalies and outliers, and discuss why these are of interest and how they inform reports such as Our atmosphere and climate 2020.
Questions for discussion:
How do anomalies make it possible to compare data between different sites?
Why do data scientists need to investigate outliers?
Transcript
GREGOR MACARA
Anomaly is basically a kind of fancy or scientific term for a difference from normal. So with temperature, when we think about what’s normal, we typically look at a 30-year period of time. That gives us a long enough record to determine what might be expected at a given location at a given time of year. So that’s our normal. We can compare the temperature that’s being measured with what is normal for that location for the time of year, so what we have there is a difference from normal or expected.
BONNIE FARRANT
So anomaly is a variation from a baseline – a long-term mean. Anomaly is the correct term for that.
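The following minimal Python sketch shows the calculation described above. It takes the long-term mean for a site and time of year as the normal, then subtracts that normal from a new measurement. The function names and the January temperatures are hypothetical and invented for illustration. They are not NIWA data or NIWA's code.

```python
from statistics import mean

def climate_normal(values: list[float]) -> float:
    """Baseline ("normal"): the long-term mean for one site and time of year.

    Assumes `values` holds one value per year, typically 30 years of record.
    """
    return mean(values)

def anomaly(observed: float, normal: float) -> float:
    """Difference from normal; positive means warmer than expected."""
    return observed - normal

# Hypothetical January mean temperatures (degrees C) for one site, 30 years.
januaries = [17.0] * 15 + [18.0] * 15
normal = climate_normal(januaries)
print(anomaly(19.0, normal))  # 1.5
```

A measurement below the normal gives a negative anomaly. A value of zero means the measurement was exactly as expected.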
GREGOR MACARA
We use these anomalies in the report to enable comparisons between different sites. In New Zealand, we’ve got a huge diversity of climate and weather. For example, Milford Sound – we have huge rainfall, it’s a very wet location – but then only a couple of hundred kilometres inland in Central Otago, it’s very dry.
So if you compare the rainfall that occurred in Central Otago in January with what occurred in Milford Sound, there might be a difference of about 400–500 mm or more, which is a huge difference. So when you're trying to tell whether it's been a relatively dry month in Central Otago or at Milford Sound, it's really hard to make that comparison from the raw measurements alone. So what we do is use the anomalies, the difference from normal, to make the figures relative, and that allows you to compare different sites. It might have rained only 50 mm in Alexandra in Central Otago, but that could be, for example, 10% more than normal for January. In Milford Sound, it might have rained 500 mm, and that might also be 10% more than normal. So despite having 450 mm more rainfall, Milford Sound had, like Alexandra, a slightly wetter than normal month. The difference from normal at both sites was the same, yet the actual rainfall observed was very different.
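The rainfall comparison can be written as a percentage difference from normal: (observed - normal) / normal × 100. In this sketch, the function name is hypothetical. Each normal is back-calculated from the illustrative figures in the transcript (50 mm and 500 mm, each 10% above normal) instead of coming from real station records.

```python
def percent_of_normal_anomaly(observed_mm: float, normal_mm: float) -> float:
    """Rainfall anomaly as a percentage difference from the normal."""
    if normal_mm <= 0:
        raise ValueError("normal must be positive")
    return (observed_mm - normal_mm) / normal_mm * 100.0

# Illustrative figures only: both months were 10% above normal,
# so each normal is observed / 1.10 (not real station normals).
observed = {"Alexandra": 50.0, "Milford Sound": 500.0}
normals = {site: mm / 1.10 for site, mm in observed.items()}

for site, mm in observed.items():
    pct = percent_of_normal_anomaly(mm, normals[site])
    print(f"{site}: {mm:.0f} mm, {pct:+.0f}% vs normal")

# The raw totals differ by 450 mm even though the anomalies match.
print(observed["Milford Sound"] - observed["Alexandra"], "mm apart")
```

A percentage works better than a raw difference for rainfall here. A 45 mm surplus would be huge in Alexandra but would hardly register at Milford Sound.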
Anomalies also let us discuss the sites as a whole rather than treating each one case by case. Overall, we can say, for example, that 15 sites might have had an anomalously warm, or warmer than normal, period of time, or that they might be trending warmer than normal.
BONNIE FARRANT
An outlier is often a very extreme value within a dataset. It's often a single observation, in a single row, that is quite unusual.
So in environmental data, it might be a really high temperature or a huge rise in greenhouse gases. You want to investigate that value and figure out: is that value real? Does it correspond with a real-life event, or has something gone wrong? Has someone made an error, or has something gone wrong with our computer?
So an example of an outlier is that, 30 years ago in our temperature dataset, we saw temperatures dip quite significantly. And when we looked into the data, we found that the dip was associated with the 1991 eruption of Mount Pinatubo in the Philippines and an El Niño event.
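The transcript does not say which method is used to find outliers. The sketch below assumes a common generic rule, Tukey's interquartile-range fences. It flags values for investigation but does not delete them, since a flagged value may turn out to be real. The function name and temperatures are hypothetical.

```python
from statistics import quantiles

def flag_outliers(values: list[float], k: float = 1.5) -> list[int]:
    """Return indices of values outside Tukey's fences (Q1 - k*IQR, Q3 + k*IQR).

    Flagged rows are candidates for investigation, not automatic deletion:
    an extreme value may reflect a real event rather than an error.
    """
    q1, _median, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]

# Hypothetical annual mean temperatures (degrees C) with one sharp dip.
temps = [12.5, 12.6, 12.4, 12.7, 12.5, 11.2, 12.6, 12.5]
for i in flag_outliers(temps):
    print(f"row {i}: {temps[i]} flagged for investigation")
```

The next step after flagging is the manual check described above. That means matching the value to a real event, such as a volcanic eruption, or tracing it to an error.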
Acknowledgements
Gregor Macara, NIWA
Bonnie Farrant, Stats NZ
Footage of Bonnie Farrant coding at desk, Stats NZ
Mount Pinatubo eruption 1991, courtesy of USGS
Acknowledgement
This resource has been produced with the support of the Ministry for the Environment and Stats NZ. (c) Crown Copyright.