Orthogonal Distance Regression

[PTB90] Paul T. Boggs and Janet E. Rogers. Orthogonal Distance Regression. In P.J. Brown and Wayne A. Fuller, editors, Contemporary Mathematics. American Mathematical Society, Providence, Rhode Island, 1990.

Consider a model $f$ such that

\[ y_i = f(x_i; \boldsymbol{\beta} ) \]

where $i = 0, \ldots, n-1$ indexes $n$ measurements with errors $(\delta_i, \epsilon_i)$ in $x$ and $y$ respectively, and $\boldsymbol{\beta}$ is a vector of $p$ parameters. The measurements are related to the true values by

\[ X_i = x_i - \delta_i, \qquad Y_i = y_i - \epsilon_i. \]

The model can also be written as

\[Y_i = f(X_i + \delta_i; \boldsymbol{\beta}) - \epsilon_i.\]

The function $f$ can be either linear or non-linear in its arguments, but must be a smooth function of them.

For each point ${(X_i, Y_i)}$, we would like to minimise the distance to the curve $f(x; \boldsymbol{\beta})$, i.e.

\[ \text{argmin}_{\epsilon_i, \delta_i} \left\{\epsilon_i^2 + \delta_i^2\right\} \quad \text{subject to} \quad Y_i = f(X_i + \delta_i; \boldsymbol{\beta}) - \epsilon_i. \]
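For a straight line $f(x) = \beta_0 + \beta_1 x$ this per-point problem has a closed form: eliminating $\epsilon_i$ via the constraint and minimising over $\delta_i$ gives the familiar squared point-to-line distance $r^2 / (1 + \beta_1^2)$, with $r = \beta_0 + \beta_1 X_i - Y_i$. A small numerical check (line parameters and observed point chosen arbitrarily for illustration):

```python
import numpy as np

# Illustrative check: for f(x) = b0 + b1*x, minimising eps^2 + delta^2
# subject to Y = f(X + delta) - eps recovers the squared orthogonal
# (point-to-line) distance r^2 / (1 + b1^2).
b0, b1 = 1.0, 2.0          # assumed line parameters
X, Y = 0.5, 3.0            # an observed point

r = b0 + b1 * X - Y        # vertical residual at the observed X
# Eliminate eps via the constraint (eps = r + b1*delta), then scan delta.
delta = np.linspace(-5, 5, 2_000_001)
obj = (r + b1 * delta) ** 2 + delta ** 2

closed_form = r ** 2 / (1 + b1 ** 2)   # squared orthogonal distance
assert abs(obj.min() - closed_form) < 1e-9
```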

It is assumed that $\epsilon_i \sim \mathcal{N}(0, \sigma_{\epsilon_i}^2)$ and $\delta_i \sim \mathcal{N}(0, \sigma_{\delta_i}^2)$ (zero mean), from which it follows that, overall, we should minimise the sum of the squares:

\[ \text{argmin}_{\boldsymbol{\beta}, \boldsymbol{\delta}, \boldsymbol{\epsilon}} \sum_{i=0}^{n-1}{\epsilon_i^2 + \delta_i^2} \quad \text{subject to}\quad Y_i = f(X_i + \delta_i; \boldsymbol{\beta}) - \epsilon_i \]

which can be rewritten without $\boldsymbol{\epsilon}$ as

\[ \text{argmin}_{\boldsymbol{\beta}, \boldsymbol{\delta}} \sum_{i=0}^{n-1} [f(X_i + \delta_i; \boldsymbol{\beta}) - Y_i]^2 + \delta_i^2. \]

By associating weights with the above equation, the weighted Orthogonal Distance Regression problem is defined as

\[ \text{argmin}_{\boldsymbol{\beta}, \boldsymbol{\delta}} \sum_{i=0}^{n-1} w_i^2 \left\{ [f(X_i + \delta_i; \boldsymbol{\beta}) - Y_i]^2 + d_i^2 \delta_i^2 \right\}, \]

where common choices for $w_i$ and $d_i$ are

\[ w_i = \frac{1}{\sigma_{\epsilon_i}} \ \text{and}\ d_i = \frac{\sigma_{\epsilon_i}}{\sigma_{\delta_i}} \]

which reduces the minimisation problem to

\[ \text{argmin}_{\boldsymbol{\beta}, \boldsymbol{\delta}} \sum_{i=0}^{n-1} \frac{1}{\sigma_{\epsilon_i}^2} [f(X_i + \delta_i; \boldsymbol{\beta}) - Y_i]^2 + \frac{1}{\sigma_{\delta_i}^2} \delta_i^2. \]
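This reduced problem is an ordinary non-linear least-squares problem in the $p + n$ unknowns $(\boldsymbol{\beta}, \boldsymbol{\delta})$. A minimal sketch (not the Boggs-Rogers algorithm itself) that stacks the two weighted residual terms and hands them to a generic solver; the model, noise levels, and starting values are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(0)
n = 50
beta_true = np.array([1.0, -0.5])
x_true = np.linspace(0, 4, n)
f = lambda x, b: b[0] * np.exp(b[1] * x)      # assumed example model

sigma_eps, sigma_delta = 0.05, 0.05           # assumed known error scales
X = x_true - rng.normal(0, sigma_delta, n)    # X_i = x_i - delta_i
Y = f(x_true, beta_true) - rng.normal(0, sigma_eps, n)

def residuals(z):
    # Sum of squared residuals = the reduced objective above.
    beta, delta = z[:2], z[2:]
    return np.concatenate([
        (f(X + delta, beta) - Y) / sigma_eps,  # weighted vertical term
        delta / sigma_delta,                   # weighted horizontal term
    ])

z0 = np.concatenate([[1.0, -1.0], np.zeros(n)])  # assumed starting point
sol = least_squares(residuals, z0)
beta_hat = sol.x[:2]
```

A generic solver treats the $(p + n)$-dimensional Jacobian as dense, which is what the structured algorithm below avoids.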

The article claims that the algorithm performs better than OLS even when the ratios $d_i$ are only known to within a factor of 10. The algorithm uses a Levenberg-Marquardt-like step with a trust region. The implementation exploits special structure in the relevant matrices to yield a solution in $O(np^2)$ time.
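This algorithm is implemented in the Fortran library ODRPACK, which SciPy exposes as `scipy.odr`. A usage sketch on synthetic data (the model and noise levels are illustrative; `sx` and `sy` are the standard deviations of the $x$ and $y$ errors):

```python
import numpy as np
from scipy.odr import ODR, Model, RealData

def f(beta, x):                    # scipy.odr models take (beta, x)
    return beta[0] * np.exp(beta[1] * x)

rng = np.random.default_rng(1)
x = np.linspace(0, 4, 50)
y = f([1.0, -0.5], x)              # assumed true parameters
X = x + rng.normal(0, 0.05, x.size)   # errors in both variables
Y = y + rng.normal(0, 0.05, x.size)

data = RealData(X, Y, sx=0.05, sy=0.05)
out = ODR(data, Model(f), beta0=[1.0, -1.0]).run()
beta_hat = out.beta                # estimated parameters
```

`out` also carries standard errors (`out.sd_beta`) and the estimated offsets, corresponding to the $\boldsymbol{\delta}$ above.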
