Parameter choice for optimal model fitting
I implemented the nonlinear least-square fitting scheme I described, and it didn't work. That is, it didn't work until I fixed it.
The initial problem was the parameter choice. This is a classic problem with iterative fitting schemes: parameters which may span a broad range of values, yet which are restricted to being strictly positive (or equivalently, strictly negative) are often best not fit directly. Instead it may be smarter to fit their logarithm, then take the exponential of the result to restore the original parameter. Exponentials are strictly positive, so the logarithm maps the positive numbers to the full range of real numbers, so it's more robust. Additionally, the logarithm increases the significance of differences of small values, decreasing the significance of the same absolute difference of large values. This is typically what is wanted: the difference between τ = 1 and τ = 2 is probably more significant than the difference between τ = 24000 and τ = 24001.
Fortunately, I don't need to do any complicated calculus to change my parameters to their logarithm. I instead use the following simple transformation:
∂ f / ln x = x ∂ f / x
I thus in one easy step change my Jacobian matrix to use the following derivatives, which also happen to be simpler than the originals, which is almost always a sign that I'm on the correct path:
∂ f / ∂ ln P1 = P1 [τ1 / t] [1 − exp(−t / τ1)]
∂ f / ∂ ln P2 = P2 [τ2 / t]1/2 [1 − exp(−[t / τ2]1/2)]
∂ f / ∂ ln τ1 = P1 [ (τ1 / t) (1 - exp[-t / τ1]) - exp(-t / τ1) ]
∂ f / ∂ ln τ2 = 1/2 P2 [ (τ2 / t)1/2 (1 - exp[-(t / τ2)1/2]) - exp(-[t / τ2]1/2) ]
Then, when I get the Δ values from the solution of the linear algebra problem, instead of adding these to the previous values, I do the following:
P1 → P1 exp( Δ ln P1 )
τ1 → τ1 exp( Δ ln τ1 )
P2 → P2 exp( Δ ln P2 )
τ2 → τ2 exp( Δ ln τ2 )
To improve numerical stability further, I can add a damping factor δ :
P1 → P1 exp( δ Δ ln P1 )
τ1 → τ1 exp( δ Δ ln τ1 )
P2 → P2 exp( δ Δ ln P2 )
τ2 → τ2 exp( δ Δ ln τ2 )
Here δ is between 0 (exclusive) and 1 (inclusive), where 1 signifies no damping. The smaller the value, the slower the optimal convergence rate, but the more robust the process. I chose δ = 0.5 initially, but more on that later.
With these changes, I have had good success with the method. I'll show that next.
Comments