TRACE User Guide  TRACE Version 9.6.1
Interpolation Methods

Bi-linear Interpolation

Bi-linear interpolation is an extension of linear interpolation for interpolating functions of two variables (e.g., \(x\) and \(y\)) on a rectilinear 2D grid. The key idea is to perform linear interpolation first in one direction, and then again in the other direction. Although each step is linear in the sampled values and in the position, the interpolation as a whole is not linear but rather quadratic in the sample location. Suppose that we want to find the value of the unknown function \(f\) at the point \((x, y)\). It is assumed that we know the value of \(f\) at the four points \(Q_{11} = (x_1, y_1), Q_{12} = (x_1, y_2), Q_{21} = (x_2, y_1), \mbox{and}\; Q_{22} = (x_2, y_2)\).

We first perform linear interpolation in the \(x\)-direction. This yields

\begin{equation*} f(x,y_1) \approx \frac{x_2-x}{x_2-x_1} f(Q_{11}) + \frac{x-x_1}{x_2-x_1} f(Q_{21}) \end{equation*}

and

\begin{equation*} f(x,y_2) \approx \frac{x_2-x}{x_2-x_1} f(Q_{12}) + \frac{x-x_1}{x_2-x_1} f(Q_{22}). \end{equation*}

We proceed by interpolating in the \(y\)-direction to obtain the desired estimate:

\begin{eqnarray*} f ( x , y ) & \approx & \frac{y_2 - y}{y_2 - y_1}f (x , y_1) + \frac{y - y_1}{y_2 - y_1} f(x,y_2) \\ & = & \frac{y_2 - y}{y_2 - y_1} \left( \frac{x_2-x}{x_2-x_1}f ( Q_{11} ) + \frac{x-x_1}{x_2-x_1}f ( Q_{21} ) \right) + \frac{y - y_1}{y_2 - y_1} \left( \frac{x_2-x}{x_2-x_1}f ( Q_{12} ) + \frac{x-x_1}{x_2-x_1}f ( Q_{22} ) \right) \\ & = & \frac{1}{(x_2-x_1)(y_2-y_1)} \begin{bmatrix}x_{2}-x & x-x_{1} \end{bmatrix} \begin{bmatrix}f(Q_{11})&f(Q_{12})\\f(Q_{21})&f(Q_{22})\end{bmatrix} \begin{bmatrix}y_{2}-y\\y-y_{1}\end{bmatrix} \end{eqnarray*}

Note that one would arrive at the same result if the interpolation is done first along the \(y\) direction and then along the \(x\) direction.
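
The two-step procedure above translates directly into code. The following is a minimal Python sketch, assuming the four corner values are already known; the function name and argument list are illustrative only and do not correspond to any TRACE routine.

    def bilinear(x, y, x1, x2, y1, y2, f11, f12, f21, f22):
        """Interpolate f at (x, y) from the corner values
        f11 = f(Q11), f12 = f(Q12), f21 = f(Q21), f22 = f(Q22)."""
        # Linear interpolation in the x-direction at y = y1 and y = y2
        f_xy1 = (x2 - x) / (x2 - x1) * f11 + (x - x1) / (x2 - x1) * f21
        f_xy2 = (x2 - x) / (x2 - x1) * f12 + (x - x1) / (x2 - x1) * f22
        # Linear interpolation in the y-direction between the two intermediate values
        return (y2 - y) / (y2 - y1) * f_xy1 + (y - y1) / (y2 - y1) * f_xy2

    # Example: corner values 1, 3, 2, 4 on the unit square, evaluated at the center
    print(bilinear(0.5, 0.5, 0.0, 1.0, 0.0, 1.0, 1.0, 3.0, 2.0, 4.0))  # prints 2.5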

Inverse Distance Weighting

Inverse distance weighting (IDW) is a type of deterministic method for multivariate interpolation with a known scattered set of points. The values assigned to unknown points are calculated as a weighted average of the values available at the known points.

A general form of the IDW interpolating function, which gives the interpolated value \(y\) at a given point \(x\) based on samples \(y^{(i)} = f(x^{(i)})\) for \(i = 1, 2, \ldots, N\), is:

\begin{equation*} y(x) = \frac{\sum_{i=1}^N w_i(x) y^{(i)} }{\sum_{i=1}^N w_i(x)}, \end{equation*}

where

\begin{equation*} w_i(x) = \frac{1}{d(x,x^{(i)})^p} \end{equation*}

is a simple weighting function. Here \(x\) denotes an interpolated (arbitrary) point, \(x^{(i)}\) is an interpolating (known) point, \(d\) is a given distance (metric operator) from the known point \(x^{(i)}\) to the unknown point \(x\), \(N\) is the total number of known points used in the interpolation, and \(p\) is a positive real number, called the power parameter. In the current implementation, \(N\) is taken as \(4\) and the power parameter \(p\) is taken as \(2\). The four data points used to evaluate the interpolation weights are found using the k-nearest neighbors algorithm.
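
As an illustration, the weighted average with the four nearest neighbours and \(p = 2\) can be sketched as follows in Python; the function name and the use of NumPy are assumptions made for this example and do not reflect the actual TRACE implementation.

    import numpy as np

    def idw(x, points, values, k=4, p=2):
        """Interpolate at x from known points (N x d array) and values (length N)."""
        d = np.linalg.norm(points - x, axis=1)    # distances to all known points
        nearest = np.argsort(d)[:k]               # indices of the k nearest neighbours
        if np.isclose(d[nearest[0]], 0.0):        # x coincides with a known point
            return values[nearest[0]]
        w = 1.0 / d[nearest] ** p                 # weights w_i = 1 / d(x, x_i)^p
        return np.sum(w * values[nearest]) / np.sum(w)

    # Example: four scattered samples in the plane
    pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    vals = np.array([1.0, 2.0, 3.0, 4.0])
    print(idw(np.array([0.5, 0.5]), pts, vals))   # prints 2.5 by symmetry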

K-means Clustering

Given a set of observations \((x^{(1)}, x^{(2)},\ldots, x^{(N)})\), where each observation is a \(d\)-dimensional real vector, k-means clustering aims to partition the \(N\) observations into \(k\; (\leq N)\) sets \(S = \{S_1, S_2,\ldots, S_k\}\) so as to minimize the within-cluster sum of squares (i.e. variance). Formally, the objective is to find:

\begin{equation*} \arg \min_{S} \sum_{i=1}^k \sum_{x \in S_i} \| x-\mu^{(i)} \|^2, \end{equation*}

where \(\mu^{(i)}\) is the mean of points in cluster \(S_i\). To find the cluster means, the most common algorithm uses an iterative refinement technique. First, given an initial set of \(k\) means \(\mu^{(1)}, \mu^{(2)},\ldots, \mu^{(k)}\), the algorithm proceeds by alternating between two steps:

  • Assignment step: Assign each observation to the cluster whose mean has the least squared Euclidean distance; this is intuitively the "nearest" mean.
  • Update step: Calculate the new means to be the centroids of the observations in the new clusters.

The algorithm has converged when the assignments no longer change.
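
A minimal Python sketch of this iterative refinement (Lloyd's algorithm) is given below; the names and the use of NumPy are illustrative assumptions rather than TRACE code, and the sketch assumes that no cluster becomes empty during the iteration.

    import numpy as np

    def kmeans(X, k, n_iter=100, seed=0):
        """Cluster the rows of X (N x d array) into k sets; returns the means and labels."""
        rng = np.random.default_rng(seed)
        means = X[rng.choice(len(X), size=k, replace=False)]  # initial set of k means
        for _ in range(n_iter):
            # Assignment step: each observation goes to the nearest mean
            dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
            labels = np.argmin(dists, axis=1)
            # Update step: new means are the centroids of the new clusters
            new_means = np.array([X[labels == j].mean(axis=0) for j in range(k)])
            if np.allclose(new_means, means):                 # assignments no longer change
                break
            means = new_means
        return means, labels

    # Example: two well-separated groups of points in the plane
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(1.0, 0.1, (20, 2))])
    means, labels = kmeans(X, k=2)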