
Data, Risk and Decision Making

What We Do


Data Analysis

We use data, statistical methods and machine learning to create opportunities, quantify risks and guide decision making.

Tools

We design and develop modern tools and applications to create clarity from data and optimize decision making.


Training

We provide state-of-the-art training courses in statistics, machine learning and risk quantification.

How We Do It

Select

We select and implement proven methods from peer-reviewed journals to solve specific problems.

Formulate

We formulate new algorithms and models to solve specific challenges faced by decision makers.

Build

We build easy-to-use tools that facilitate the use and exploration of these models.

Latest Article


Capture data patterns using running mean / moving average – Part two

In a previous blog post, we explored how to summarise data patterns using the method of running mean in two dimensions.

This post looks at the method of running mean in three dimensions.

For illustration, we consider the dataset displayed in Figure 1.
These data show death rates among young males in the UK by Age and by Calendar year.
The age range is from 1 to 18, and the calendar year ranges from 1990 to 2010.

We want to summarise the mortality rates in terms of ages (\(x\)) and calendar years (\(y\)).

Figure 1: UK mortality rates from age 1 to 18 and for calendar years 1990 to 2010.

 

Let \(Z_{x,y}\) denote the observed death rate at age \(x\) and
calendar year \(y\), where \(x\in \{1,\,\ldots,\,18\}\) and \(y\in\{1990,\,\ldots,\,2010\}\).
The relation between death rate, age and time can be summarised as

\[ Z_{x,y} = \mathcal{S}(x, y) + \varepsilon_{x,y} \]
where \(\mathcal{S}\) is some function to be determined and \(\varepsilon\) represents the noise.

There are many ways to estimate \(\mathcal{S}\).
In this post, we use the simple method of running mean, also known as moving average.
We shall look at alternative methods in subsequent posts.

In the method of running mean, \(\mathcal{S}\) is estimated at each point by the average of the nearest neighbours.
That is, the estimate \(\hat{\mathcal{S}}(x, y)\) of \(\mathcal{S}(x,y)\) can be obtained by averaging the observed death rates
at ages and calendar years in the vicinity of \((x,y)\).

The neighbourhood can take various shapes. One simple choice consists of
taking the points that fall inside the circle of radius \(R\) centred at the target point.

We refer to \(R\) as the neighbourhood radius.
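
The circular neighbourhood can be written out explicitly. The following is a sketch only: it assumes that distance is Euclidean, with one year of age and one calendar year treated as equal units, and the notation \(N_R\) is introduced here purely for illustration.

\[ N_R(x,y) = \{(u,v) : (u-x)^2 + (v-y)^2 \le R^2\}, \qquad \hat{\mathcal{S}}(x,y) = \frac{1}{|N_R(x,y)|} \sum_{(u,v)\in N_R(x,y)} Z_{u,v} \]
where \((u,v)\) ranges over the observed ages and calendar years, and \(|N_R(x,y)|\) is the number of observed points in the neighbourhood.

Under these assumptions, with \(R=1\) the neighbourhood of an interior point consists of the point itself and its four immediate neighbours \((x\pm 1,\,y)\) and \((x,\,y\pm 1)\), so the estimate is the average of five observed death rates.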

Figure 2 illustrates how the fitted running mean surface behaves with respect to the neighbourhood radius.

Figure 2: Two-dimensional running mean fitted to UK mortality rates by Age and by Calendar Year. \(R\) represents the neighbourhood radius.

This figure shows that the fitted running mean surface converges toward the underlying data as the neighbourhood radius decreases. In particular, the running mean becomes identical to the data when \(R=0\), because each neighbourhood then contains only the target point.
At the other extreme, the running mean surface converges to the overall mean of the data as the neighbourhood radius increases.
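
These two limiting cases can be checked numerically. The following is a minimal Python sketch, not the code used to produce Figure 2. The function name `running_mean_2d` is hypothetical, the grid is filled with synthetic random numbers rather than the UK mortality data, distance is measured in grid units, and neighbourhoods are assumed to be truncated at the edges of the grid.

```python
import random


def running_mean_2d(z, radius):
    """Running mean on a regular grid with a circular neighbourhood.

    z is a list of rows (e.g. one row per age, one column per calendar year).
    Assumptions: unit grid spacing in both directions, and neighbourhoods
    that are simply truncated where the circle extends beyond the grid.
    """
    n_rows, n_cols = len(z), len(z[0])
    reach = int(radius)  # largest row/column offset that can lie inside the circle
    fitted = []
    for i in range(n_rows):
        fitted_row = []
        for j in range(n_cols):
            total, count = 0.0, 0
            for u in range(max(0, i - reach), min(n_rows, i + reach + 1)):
                for v in range(max(0, j - reach), min(n_cols, j + reach + 1)):
                    # Keep only the grid points inside the circle of the given radius.
                    if (u - i) ** 2 + (v - j) ** 2 <= radius ** 2:
                        total += z[u][v]
                        count += 1
            fitted_row.append(total / count)
        fitted.append(fitted_row)
    return fitted


# Synthetic stand-in for an 18 ages x 21 calendar years grid (not real data).
random.seed(0)
z = [[random.random() for _ in range(21)] for _ in range(18)]

fit_zero = running_mean_2d(z, 0)     # each neighbourhood is the target point only
fit_large = running_mean_2d(z, 100)  # each neighbourhood covers the whole grid
```

With a radius of 0, `fit_zero` reproduces the input grid. With a radius larger than the diagonal of the grid, every entry of `fit_large` equals the overall mean of the grid. Intermediate radii give surfaces between these two extremes.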

A great advantage of the running mean is its simplicity. However, it has many limitations.
We shall look at better smoothing methods in subsequent posts.

Get in Touch

Do you want to know more about what we do? Do you have a project to discuss? We would be glad to hear from you.