adding weights to nonlinear least squares algorithm

Note: this uses MathML code produced by OpenOffice. It doesn't seem to work on Internet Explorer. It's also been rejected by Chrome. But it works on Safari, Chromium, and Firefox.

Nonlinear least-squares fits are done with a Jacobian matrix, which contains the partial derivatives of the model with respect to each of the model parameters, evaluated at each of the data points.  In this case I show two parameters, τ1 and τ2.  I like writing these things out, rather than using index notation, because index notation is a bit abstract.  In the following, f is a function of time t representing the model to be fit.

$$
A = \begin{pmatrix}
\left.\frac{\partial f}{\partial \tau_1}\right|_{t_1} & \left.\frac{\partial f}{\partial \tau_2}\right|_{t_1} \\
\left.\frac{\partial f}{\partial \tau_1}\right|_{t_2} & \left.\frac{\partial f}{\partial \tau_2}\right|_{t_2} \\
\vdots & \vdots \\
\left.\frac{\partial f}{\partial \tau_1}\right|_{t_{N-1}} & \left.\frac{\partial f}{\partial \tau_2}\right|_{t_{N-1}} \\
\left.\frac{\partial f}{\partial \tau_1}\right|_{t_N} & \left.\frac{\partial f}{\partial \tau_2}\right|_{t_N}
\end{pmatrix}
$$

With the unweighted nonlinear least-squares fit, the transpose of the Jacobian matrix is then taken:

$$
A^T = \begin{pmatrix}
\left.\frac{\partial f}{\partial \tau_1}\right|_{t_1} & \left.\frac{\partial f}{\partial \tau_1}\right|_{t_2} & \cdots & \left.\frac{\partial f}{\partial \tau_1}\right|_{t_{N-1}} & \left.\frac{\partial f}{\partial \tau_1}\right|_{t_N} \\
\left.\frac{\partial f}{\partial \tau_2}\right|_{t_1} & \left.\frac{\partial f}{\partial \tau_2}\right|_{t_2} & \cdots & \left.\frac{\partial f}{\partial \tau_2}\right|_{t_{N-1}} & \left.\frac{\partial f}{\partial \tau_2}\right|_{t_N}
\end{pmatrix}
$$

This yields a linear system describing one iteration of the solution.  In the following, Pi are the points to be fit, of which there are N; point Pi is sampled at time ti.

$$
A^T A \begin{pmatrix} \Delta\tau_1 \\ \Delta\tau_2 \end{pmatrix}
= A^T \begin{pmatrix}
P_1 - f|_{t_1} \\
P_2 - f|_{t_2} \\
\vdots \\
P_{N-1} - f|_{t_{N-1}} \\
P_N - f|_{t_N}
\end{pmatrix}
$$

To weight this, I replace the A^T matrix with the following weighted version, A_w^T:

$$
A_w^T = \begin{pmatrix}
w_1 \left.\frac{\partial f}{\partial \tau_1}\right|_{t_1} & w_2 \left.\frac{\partial f}{\partial \tau_1}\right|_{t_2} & \cdots & w_{N-1} \left.\frac{\partial f}{\partial \tau_1}\right|_{t_{N-1}} & w_N \left.\frac{\partial f}{\partial \tau_1}\right|_{t_N} \\
w_1 \left.\frac{\partial f}{\partial \tau_2}\right|_{t_1} & w_2 \left.\frac{\partial f}{\partial \tau_2}\right|_{t_2} & \cdots & w_{N-1} \left.\frac{\partial f}{\partial \tau_2}\right|_{t_{N-1}} & w_N \left.\frac{\partial f}{\partial \tau_2}\right|_{t_N}
\end{pmatrix}
$$

Then I solve the following modified equation for Δτ1 and Δτ2:

$$
A_w^T A \begin{pmatrix} \Delta\tau_1 \\ \Delta\tau_2 \end{pmatrix}
= A_w^T \begin{pmatrix}
P_1 - f|_{t_1} \\
P_2 - f|_{t_2} \\
\vdots \\
P_{N-1} - f|_{t_{N-1}} \\
P_N - f|_{t_N}
\end{pmatrix}
$$

On the right side of this equation, each Pi − f|ti is multiplied by a weighting factor wi, consistent with a weight of wi acting like the point being duplicated wi times.  Similarly, on the left side of the equation, the normal-equations matrix is weighted, as it must be, since all that matters is the relative weights, not the absolute weights.
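As a concrete sketch, here's one weighted Gauss-Newton step in plain Python for a two-parameter model.  The model f(t) = exp(−t/τ1) + exp(−t/τ2), the sample times, and the weights are all made-up assumptions for illustration, not anything from the discussion above; the step just accumulates the 2×2 system A_w^T A Δτ = A_w^T r and solves it by Cramer's rule.

```python
import math

# Hypothetical model for illustration: f(t) = exp(-t/tau1) + exp(-t/tau2).
def f(t, tau1, tau2):
    return math.exp(-t / tau1) + math.exp(-t / tau2)

def df_dtau1(t, tau1, tau2):
    # d/dtau1 of exp(-t/tau1) is (t / tau1**2) * exp(-t/tau1)
    return (t / tau1**2) * math.exp(-t / tau1)

def df_dtau2(t, tau1, tau2):
    return (t / tau2**2) * math.exp(-t / tau2)

def weighted_step(ts, ps, ws, tau1, tau2):
    """One weighted Gauss-Newton step: solve (A_w^T A) dtau = A_w^T r."""
    # Accumulate the 2x2 normal matrix A_w^T A and the 2-vector A_w^T r.
    m11 = m12 = m22 = b1 = b2 = 0.0
    for t, p, w in zip(ts, ps, ws):
        j1 = df_dtau1(t, tau1, tau2)
        j2 = df_dtau2(t, tau1, tau2)
        r = p - f(t, tau1, tau2)        # residual P_i - f|t_i
        m11 += w * j1 * j1
        m12 += w * j1 * j2
        m22 += w * j2 * j2
        b1 += w * j1 * r
        b2 += w * j2 * r
    det = m11 * m22 - m12 * m12         # Cramer's rule on the 2x2 system
    dtau1 = (b1 * m22 - m12 * b2) / det
    dtau2 = (m11 * b2 - m12 * b1) / det
    return tau1 + dtau1, tau2 + dtau2
```

Iterating `weighted_step` from a reasonable starting guess converges on the fitted time constants; scaling all the weights by a common factor leaves the step unchanged, since the factor cancels between the two sides.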

It's good to check this with simple cases.  One is to reduce the number of parameters to one: τ.  Then additionally I'll reduce the number of data points to a single point: t1.

Then the above equation becomes:

$$
w_1 \left(\frac{\partial f}{\partial \tau}\right)^2\Bigg|_{t_1} \Delta\tau
= w_1 \left.\frac{\partial f}{\partial \tau}\right|_{t_1} \left( P_1 - f|_{t_1} \right)
$$

Note the weights cancel, as they must, since there's only one point. Dividing both sides by w1 ∂f/∂τ|t1 leaves Δτ = (P1 − f|t1) / (∂f/∂τ|t1), which is easy to see is the correct result: it's Newton's method in one dimension.
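This cancellation can be checked directly.  The model f(t) = exp(−t/τ) below is an assumed example, not from the discussion above; the point is only that the step is the same for any positive w1.

```python
import math

def one_point_step(t1, p1, tau, w1):
    """One-parameter, one-point update; reduces to Newton's method."""
    fval = math.exp(-t1 / tau)              # assumed model f(t) = exp(-t/tau)
    dfdtau = (t1 / tau**2) * fval           # df/dtau evaluated at t1
    # w1 * dfdtau^2 * delta = w1 * dfdtau * (p1 - fval); w1 cancels:
    delta = (w1 * dfdtau * (p1 - fval)) / (w1 * dfdtau**2)
    return tau + delta
```

Iterating this from any reasonable starting τ converges on the τ that makes f(t1) equal P1, and the trajectory is the same whether w1 is 1 or 100.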

If I change back to two parameters and two points, I get:

$$
\left( w_1 \left(\frac{\partial f}{\partial \tau_1}\right)^2\Bigg|_{t_1} + w_2 \left(\frac{\partial f}{\partial \tau_1}\right)^2\Bigg|_{t_2} \right) \Delta\tau_1
+ \left( w_1 \left.\frac{\partial f}{\partial \tau_1}\frac{\partial f}{\partial \tau_2}\right|_{t_1} + w_2 \left.\frac{\partial f}{\partial \tau_1}\frac{\partial f}{\partial \tau_2}\right|_{t_2} \right) \Delta\tau_2
= w_1 \left.\frac{\partial f}{\partial \tau_1}\right|_{t_1} \left( P_1 - f|_{t_1} \right) + w_2 \left.\frac{\partial f}{\partial \tau_1}\right|_{t_2} \left( P_2 - f|_{t_2} \right)
$$
and:
$$
\left( w_1 \left(\frac{\partial f}{\partial \tau_2}\right)^2\Bigg|_{t_1} + w_2 \left(\frac{\partial f}{\partial \tau_2}\right)^2\Bigg|_{t_2} \right) \Delta\tau_2
+ \left( w_1 \left.\frac{\partial f}{\partial \tau_1}\frac{\partial f}{\partial \tau_2}\right|_{t_1} + w_2 \left.\frac{\partial f}{\partial \tau_1}\frac{\partial f}{\partial \tau_2}\right|_{t_2} \right) \Delta\tau_1
= w_1 \left.\frac{\partial f}{\partial \tau_2}\right|_{t_1} \left( P_1 - f|_{t_1} \right) + w_2 \left.\frac{\partial f}{\partial \tau_2}\right|_{t_2} \left( P_2 - f|_{t_2} \right)
$$

Every term is multiplied by a single weight, and so only the ratios of the weights matter, as expected.

If I set w2 to zero while w1 remains positive, the above two equations collapse into the following simplified equation:

$$
\left.\frac{\partial f}{\partial \tau_1}\right|_{t_1} \Delta\tau_1 + \left.\frac{\partial f}{\partial \tau_2}\right|_{t_1} \Delta\tau_2 = P_1 - f|_{t_1}
$$

This is clearly correct yet it is underconstrained: two unknowns for a single condition. But it shows the weights are working as expected: putting the emphasis on the points with the higher weighting coefficients.
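The underconstrained collapse shows up numerically as a singular normal matrix: with w2 = 0, A_w^T A is the outer product of a single gradient row with itself, so its determinant is exactly zero.  The gradient values below are made-up placeholders, not from the text.

```python
# Hypothetical derivative values at the two sample times:
j1, j2 = 0.5, -0.25     # df/dtau1, df/dtau2 at t1
k1, k2 = 0.125, 2.0     # df/dtau1, df/dtau2 at t2
w1, w2 = 2.0, 0.0       # second point weighted out entirely

# Entries of the 2x2 normal matrix A_w^T A:
m11 = w1 * j1 * j1 + w2 * k1 * k1
m12 = w1 * j1 * j2 + w2 * k1 * k2
m22 = w1 * j2 * j2 + w2 * k2 * k2
det = m11 * m22 - m12 * m12   # rank-one matrix: determinant is zero
```

With any nonzero w2 the second gradient row contributes and the determinant becomes positive again, so the 2×2 system is solvable.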

Appendix: the weights can be represented as a diagonal square matrix W, with the weights on the diagonal. Then A^T A maps to A^T W A, where A is the Jacobian matrix described by Wolfram and A^T is its transpose. Then I simply use A^T W (note the order: the N×N matrix W must multiply A^T from the right for the dimensions to conform) where in the unweighted case I'd used A^T.
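This identity is easy to check numerically: scaling column i of A^T by wi gives the same matrix as the product A^T W with W = diag(w).  The 3×2 Jacobian entries and weights below are made-up numbers, chosen only for the check.

```python
def transpose(X):
    return [list(row) for row in zip(*X)]

def matmul(X, Y):
    # Plain-Python matrix product of X (m x k) and Y (k x n).
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[0.5, 1.0],
     [0.25, 2.0],
     [0.125, 4.0]]          # N = 3 points, 2 parameters (hypothetical values)
w = [2.0, 0.5, 4.0]         # hypothetical weights
W = [[w[i] if i == j else 0.0 for j in range(3)] for i in range(3)]

At = transpose(A)
# A_w^T as defined in the post: column i of A^T scaled by w_i.
Aw_t = [[w[j] * At[i][j] for j in range(3)] for i in range(2)]
```

`Aw_t` comes out identical to `matmul(At, W)`, confirming that the per-column scaling and the diagonal-matrix product are the same operation.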
