Friday, May 2, 2014

numerical testing of maximal power curve algorithm with variable activity expiration time

Two posts back I proposed an algorithm for calculating maximal power data with a variable expiration date. For each duration, instead of retaining a single power point, I retain a list of power points where each retained point is the highest power for points of that age or newer. So going back in time for a given duration, the points are increasing power order as well as increasing age order.

If my most recent effort was my highest power effort, then I have only one point in the list. In the opposite extreme, if my power decreases every activity going forward in time, I would have every activity in the list. But obviously that would be exceptional. Far more commonly powers at a given duration are sometimes higher, sometimes lower, so the number of retained points will be between the two extremes. Uploading activities to Strava on a near daily basis, it is not unusual for a user to have 1000 activities or more. So ideally this list needs to retain only a small fraction of the total activities. If this is true, then it will be relatively quick to produce a maximum power curve for any given period up to the present.

I tested this by assuming power is random and uncorrelated. So the athlete is neither trending up or down or trending in any other way: each activity has power which is ranked randomly among all other activities, with no ties.

Then I ran my algorithm: I retained a list of the highest power activities in increasing age order.

I did this for a broad range of activity counts. For each activity count, I did it for 1000 independently randomized data sets.

image

I derive four statistics from this. One is the maximum length of the list from the 1000 repetitions: the red circle. That maxes out at 21 for approximately 7 thousand activities, with 20 for 10 thousand activities (it's just luck 7 thousand yielded more). So the longest a list is reasonably going to get is arond 20 activities, barring persistantly decreasing power. That's much, much less than the 7 thousand or 10 thousand activities assumed at the right side of the plot. And that's on order a 1 in 1000 occurance, since that's how many repitions I used.

Another statistic is the maximum length of the list at the end of the activity sequence. The peak my have occurred before the end of the activity sequence. This tends to be within one of the peak list length, so it's essentially the same.

Then there's the average length of the activity list. I take two averages over the 1000 iterations per activity count. One is the average of the maximum length encountere during the given number of activities. The other is the average of the length at the end of the given number of activities. This represents a typical number of activities in the list after a given number of total activities. The two averages also tend to be within one of each other.

After 10 activities, the average peak length is 3. It goes up to 6 after 100 activities. Then it's 8 after 1000 activities. Then after 10 thousand activities, it's 10. So it's increasing by two every time the number activities increases by a factor 10: a logarithmic progression. For each of the averages, I show logarithmic fits to the numerical results. The fit to the average maximum list length is quite good. The fit to the average final list length is less good (the final 11 points fall above the fitted curve). Note both coefficients for the fit to the average peak list length are close to one, suggesting an analytic approximation of 1 + ln activities. In this case, when the activity count is increased by 10, you'd expect the list length to increase by 2.30258.... The crude trend I observed before was 2, close to this number.

I can attempt to come up with just such an approximation. Consider a large number of activities. The activity of the highest power excludes all activities which come after. The activity can come with equal probability at any point in the list. So if I assume a list of length N, I get the expectation value of the length of list N is 1 + the average of the expectation value of all lists of length from 1 to N. I can test this in the large N limit using integrals. If I assume the value is ln N, then the integral from 1 to N - 1 of ln N = (N - 1) ln (N - 1) - N. The average is this divided by N - 1, or ln (N - 1) - N / (N - 1). I add the one and I get ln( N - 1 ) - 1 / (N - 1). In the large N limit this is approximately ln N. So the analytic approximation that the final list length goes to ln N applies. The maximum list is observed to be approximately one more than this, so ln N + 1.

While lists of this typical length will increase computational time relative to retaining only a single point, versus processing all activities from scratch each time a new time cut-off is established the savings are huge.

No comments: