## Sunday, January 26, 2014

### adding weights to nonlinear least squares algorithm

Note: this uses MathML code produced by OpenOffice. It doesn't seem to work on Internet Explorer. It's also been rejected by Chrome. But it works on Safari, Chromium, and Firefox.

Nonlinear least-square fits are done with a Jacobian Matrix, which describes the linearized dependencies of the model, evaluated at each of the points, on each of the model parameters.  In this case I show two parameters, τ1 and τ2.  I like writing these things out, rather than using index notation, because index notation is a bit abstract.  In the following, f is a function of time t representing the model to be fit.

$A=\left[\begin{array}{cc}{\frac{\partial f}{\partial {\tau }_{1}}\mid }_{{t}_{1}}& {\frac{\partial f}{\partial {\tau }_{2}}\mid }_{{t}_{1}}\\ {\frac{\partial f}{\partial {\tau }_{1}}\mid }_{{t}_{2}}& {\frac{\partial f}{\partial {\tau }_{2}}\mid }_{{t}_{2}}\\ ⋮& ⋮\\ {\frac{\partial f}{\partial {\tau }_{1}}\mid }_{{t}_{N-1}}& {\frac{\partial f}{\partial {\tau }_{2}}\mid }_{{t}_{N-1}}\\ {\frac{\partial f}{\partial {\tau }_{1}}\mid }_{{t}_{N}}& {\frac{\partial f}{\partial {\tau }_{2}}\mid }_{{t}_{N}}\end{array}\right]$

With the unweighted nonlinear least-squares fit, the transpose of the Jocobian matrix is then taken:

${A}^{T}=\left[\begin{array}{ccccc}{\frac{\partial f}{\partial {\tau }_{1}}\mid }_{{t}_{1}}& {\frac{\partial f}{\partial {\tau }_{1}}\mid }_{{t}_{2}}& \cdots & {\frac{\partial f}{\partial {\tau }_{1}}\mid }_{{t}_{N-1}}& {\frac{\partial f}{\partial {\tau }_{1}}\mid }_{{t}_{N}}\\ {\frac{\partial f}{\partial {\tau }_{2}}\mid }_{{t}_{1}}& {\frac{\partial f}{\partial {\tau }_{2}}\mid }_{{t}_{2}}& \cdots & {\frac{\partial f}{\partial {\tau }_{2}}\mid }_{{t}_{N-1}}& {\frac{\partial f}{\partial {\tau }_{2}}\mid }_{{t}_{N}}\end{array}\right]$

This then creates a linear equation describing the iteration of the solution.  In the following, Pi are the points to be fit, of which there are N.  Point Pi is sampled at time ti.

${A}^{T}A\left[\begin{array}{c}\Delta {\tau }_{1}\\ \Delta {\tau }_{2}\end{array}\right]={A}^{T}\left[\begin{array}{c}{P}_{1}-{f\mid }_{{t}_{1}}\\ {P}_{1}-{f\mid }_{{t}_{1}}\\ ⋮\\ {P}_{N-1}-{f\mid }_{{t}_{N-1}}\\ {P}_{N}-{f\mid }_{{t}_{N}}\end{array}\right]$

To weight this, I replace the AT matrix with the following:

${A}_{W}^{T}=\left[\begin{array}{ccccc}{{w}_{1}\frac{\partial f}{\partial {\tau }_{1}}\mid }_{{t}_{1}}& {{w}_{2}\frac{\partial f}{\partial {\tau }_{1}}\mid }_{{t}_{2}}& \cdots & {{w}_{N-1}\frac{\partial f}{\partial {\tau }_{1}}\mid }_{{t}_{N-1}}& {{w}_{N}\frac{\partial f}{\partial {\tau }_{1}}\mid }_{{t}_{N}}\\ {{w}_{1}\frac{\partial f}{\partial {\tau }_{2}}\mid }_{{t}_{1}}& {{w}_{2}\frac{\partial f}{\partial {\tau }_{2}}\mid }_{{t}_{2}}& \cdots & {{w}_{N-1}\frac{\partial f}{\partial {\tau }_{2}}\mid }_{{t}_{N-1}}& {{w}_{N}\frac{\partial f}{\partial {\tau }_{2}}\mid }_{{t}_{N}}\end{array}\right]$

Then I solve the following modified equation for Δτ1 and Δτ2:

${A}_{w}^{T}A\left[\begin{array}{c}\Delta {\tau }_{1}\\ \Delta {\tau }_{2}\end{array}\right]={A}_{w}^{T}\left[\begin{array}{c}{P}_{1}-{f\mid }_{{t}_{1}}\\ {P}_{1}-{f\mid }_{{t}_{1}}\\ ⋮\\ {P}_{N-1}-{f\mid }_{{t}_{N-1}}\\ {P}_{N}-{f\mid }_{{t}_{N}}\end{array}\right]$

On the right side of this equation, each Pi − f|ti are multiplied by a weighting factor wi, consistent with points being duplicated.  Similarly, on the left side of the equation, the normalization term is weighted, as it must be since all that matters is the relative weights, not the absolute weights.

It's good to check this with simple cases.  One is to reduce the number of parameters to one: τ.  Then additionally I'll reduce the number of data points to a single point: t1.

Then the above equation becomes:

${w}_{1}{\left(\frac{\partial f}{\partial \tau }\right)}^{2}\Delta \tau ={w}_{1}\frac{\partial f}{\partial \tau }\left({P}_{1}-{f\mid }_{{t}_{1}}\right)$

Note the weights cancel, as they must, since there's only one point. It's easy to see from the equation that this is the correct result: it's Newton's method in one dimension.

If I change back to two parameters and two points, I get:

$\begin{array}{c}\left[{w}_{1}{{\left(\frac{\partial f}{\partial {\tau }_{1}}\right)}^{2}\mid }_{{t}_{1}}+{w}_{2}{{\left(\frac{\partial f}{\partial {\tau }_{1}}\right)}^{2}\mid }_{{t}_{2}}\right]\Delta {\tau }_{1}+\left[{w}_{1}{\left(\frac{\partial f}{\partial {\tau }_{1}}\frac{\partial f}{\partial {\tau }_{2}}\right)\mid }_{{t}_{1}}+{w}_{2}{\left(\frac{\partial f}{\partial {\tau }_{1}}\frac{\partial f}{\partial {\tau }_{2}}\right)\mid }_{{t}_{2}}\right]\Delta {\tau }_{2}=\\ {w}_{1}{\frac{\partial f}{\partial {\tau }_{1}}\mid }_{{t}_{1}}\left({P}_{1}-{f\mid }_{{t}_{1}}\right)+{w}_{2}{\frac{\partial f}{\partial {\tau }_{1}}\mid }_{{t}_{2}}\left({P}_{2}-{f\mid }_{{t}_{2}}\right)\end{array}$
and:
$\begin{array}{c}\left[{w}_{1}{{\left(\frac{\partial f}{\partial {\tau }_{2}}\right)}^{2}\mid }_{{t}_{1}}+{w}_{2}{{\left(\frac{\partial f}{\partial {\tau }_{2}}\right)}^{2}\mid }_{{t}_{2}}\right]\Delta {\tau }_{2}+\left[{w}_{1}{\left(\frac{\partial f}{\partial {\tau }_{1}}\frac{\partial f}{\partial {\tau }_{2}}\right)\mid }_{{t}_{1}}+{w}_{2}{\left(\frac{\partial f}{\partial {\tau }_{1}}\frac{\partial f}{\partial {\tau }_{2}}\right)\mid }_{{t}_{2}}\right]\Delta {\tau }_{1}=\\ {w}_{1}{\frac{\partial f}{\partial {\tau }_{2}}\mid }_{{t}_{1}}\left({P}_{1}-{f\mid }_{{t}_{1}}\right)+{w}_{2}{\frac{\partial f}{\partial {\tau }_{2}}\mid }_{{t}_{2}}\left({P}_{2}-{f\mid }_{{t}_{2}}\right)\end{array}$

Every term is multiplied by a single weight and so only the ratios of weights matters, as expected.

If I set w2 to zero while w1 remains positive, the above two equations collapse into the following simplified equation:

${\frac{\partial f}{\partial {\tau }_{1}}\mid }_{{t}_{1}}\Delta {\tau }_{1}+{\frac{\partial f}{\partial {\tau }_{2}}\mid }_{{t}_{1}}\Delta {\tau }_{2}=\left({P}_{1}-{f\mid }_{{t}_{1}}\right)$

This is clearly correct yet it is underconstrained: two unknowns for a single condition. But it shows the weights are working as expected: putting the emphasis on the points with the higher weighting coefficients.

Appendix: the weights can be represented as a diagonal square matrix, the weights on the diagonal. Then I can map ATA to AT W A, where W is the weight matrix, A is the Jacobian matrix described by Wolfram, and AT is the transpose. Then I simply use W AT where in the unweighted case I'd used AT.