Tuesday, August 20, 2013

Power meter cadence comparison (analysis of DCRainmaker data)

The cleverest way I've seen to check cadence data is due to Robert Chung. He uses cadence and speed in conjunction with an assumed wheel rolling circumference to calculate the gear the bike is in at each data point. If cadence and speed were measured perfectly, I'd be able to see exactly what gear the rider was in at every point where he was pedaling (no coasting). On the other hand, if the cadence or speed are measured sloppily, then the gear calculation would also be sloppy. The key insight is that gears are discrete: there's a countable number of choices. So if I can extract the gear, I should see only a discrete set of results: plotting gear over time should show steps, with transitions between the steps corresponding to shifting, with deviations from the steps only when the rider is coasting with the cranks stationary, or, hopefully not often, spinning the cranks while coasting.
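As a rough illustration of the gear calculation, here's a minimal Python sketch. The function name, the 2.1 m wheel circumference, and the sample values are all assumptions for illustration, not anything taken from Robert's own code or from the DC Rainmaker files.

```python
def gear_ratio(speed_m_per_s, cadence_rpm, wheel_circumference_m=2.1):
    """Chainring/cog ratio implied by one sample, or None when the cranks are stopped.

    wheel_circumference_m is an assumed rolling circumference.
    """
    if cadence_rpm <= 0:
        return None  # coasting with stationary cranks: no gear can be inferred
    wheel_rev_per_s = speed_m_per_s / wheel_circumference_m
    crank_rev_per_s = cadence_rpm / 60.0
    return wheel_rev_per_s / crank_rev_per_s


# Hypothetical samples: (speed in m/s, cadence in rpm)
samples = [(8.75, 90), (8.75, 90), (10.5, 90), (10.5, 0)]
ratios = [gear_ratio(v, c) for v, c in samples]
print(ratios)
```

With clean data the ratios cluster on a handful of values (one per chainring/cog combination). Noisy cadence or speed smears those steps out.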

The issue with this approach is that it depends on both speed and cadence being of equal quality. In the case of the DCRainmaker test, he had different speed sensors associated with different power meters. So to judge cadence values alone, you need to establish a uniform standard for speed data. He has this available, since he has synchronized data captured by his WASP ANT+ Sport hub. But working with that would require a bit of effort.

I wanted to look at cadence alone. To do that, I will make the assumption that cadence has two sources of variability: one is actual cadence, the other is error. I will assume these are completely uncorrelated for moderate cadence values. This would not be the case if a unit was using post-processing on cadence, for example data smoothing. Data smoothing applies a cadence error which tends to cancel changes in actual cadence. But this smoothing would also yield smoothing of the extracted gear, something Robert doesn't observe. So I will assert that data smoothing isn't used, and changes in cadence due to cadence measurement error are uncorrelated with changes in actual cadence. I further assume, in addition to a lack of explicit smoothing, the unit's measurement of cadence for one data point is unaffected by the result from the preceding or subsequent data point.

I will then assume that the sample-to-sample cadence variance consists of a component from actual cadence variance, and a component due to error. Since the data for each unit were taken from the same ride, I conclude the variance of actual cadence was the same. That leaves only the variance from the error. So if I rank the various units by the total sample-to-sample cadence variance, I get a ranking of their cadence error.

One sort of cadence error this would not catch would be a systematic error. For example, suppose the cadence error is a 1:1 function of actual cadence, for example that reported cadence is 99% of actual. Or, perhaps the error varies slowly during the ride, drifting from -1 rpm at the beginning to +1 rpm at the end. I assume errors in individual samples are uncorrelated with each other. If it makes a -2 rpm error this second, next second the dice are tossed freshly and without bias: the error will have the same probabilities as if the error this second had been +2 rpm.

So enough... here's the ranking of the RMS change in cadence from the different units used by DC Rainmaker. I limited the analysis to cadence values of at least 30, since cadences less than this are of trivial interest and likely encountered only when coasting.

Edge 800 + Quarq: 3.804 rpm
Edge 800 + Stages: 4.563 rpm
Edge 810 + Powertap: 5.535 rpm
Edge 810 + Vector: 5.904 rpm
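The idea behind the ranking is that if actual cadence changes and measurement errors are uncorrelated, the sample-to-sample variance is the sum of the two, and the actual-cadence part is common to every unit. Here's a minimal Python sketch of the RMS calculation. The function name is hypothetical, and requiring both samples of a pair to meet the 30 rpm threshold is an assumption about how the cutoff gets applied.

```python
import math


def rms_cadence_change(cadence, min_cadence=30):
    """RMS of consecutive-sample cadence differences.

    Only pairs where both samples are at least min_cadence are used
    (an assumed way of applying the low-cadence cutoff).
    """
    diffs = [b - a for a, b in zip(cadence, cadence[1:])
             if a >= min_cadence and b >= min_cadence]
    if not diffs:
        return None
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Hypothetical one-second cadence samples, including a coasting point
print(rms_cadence_change([90, 93, 89, 0, 90, 90]))
```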

But I learned early on that it's a mistake to look at just derived numbers without looking at a plot showing more detail. Here's a histogram of the rpm changes, comparing Vector (the most total variability) to the Quarq (the least).


It's fairly unambiguous: the Quarq has consistently fewer counts in almost every bin for which the magnitude of the cadence change is more than 1 rpm.

So why is the Quarq producing tighter cadence numbers? I'm not sure. But if the reason is that Vector is more prone to cadence error, the positive thing is that cadence is handled by its pods, so upgrading cadence accuracy in the future will be relatively cheap and easy, even assuming it requires a hardware change. More likely, perhaps, it could be fixed with a firmware update.

As an aside, it is curious the Edge 800 units each rank ahead of the Edge 810 units. I'll need to check the WASP data for verification of these results.

added: I did check the WASP data and the results are very different, with the Vector cadence doing quite well. Curious.

Friday, August 16, 2013

Applying pedal smoothness algorithm to Metrigear Vector data

Last time I proposed an algorithm for pedal smoothness. I can hardly take credit for it: it was basically the reciprocal of Coggan's variability index without the 30-second smoothing. It's fairly obvious to apply it to the pedal stroke, as well.

Here's some data left over from the old Metrigear Vector blog, showing measurements taken with the Metrigear-era Speedplay Vectors. These data are at a much higher sampling rate than would be recorded by an Edge computer: they show the detailed power and cadence during just a few seconds of a longer "ride" (on the trainer):

I used Plot Digitizer to pull points off the plot (off-topic: I really like Plot Digitizer; it's replacing g3data, which I previously used). Here's a view of a subset of those data. Curiously, the left leg is going negative power, while the right leg does not.

The plot also shows total power, the sum of the L and R legs. This shows a strong oscillatory character: it goes from a maximum of near 800 watts to a low of near zero each half-pedal-stroke.

So then I did a running calculation of smoothness for each leg, smoothness for net power, and of course L-R power balance:

You can see the values oscillating with each pedal stroke. This isn't really a problem, since Vector uses "event-driven" data recording, generating numbers with each pedal stroke. Even only 3 seconds of smoothing reduces the amplitude of the oscillations nicely. But as I noted, as long as Vector averages over complete pedal strokes, the oscillations should be essentially eliminated.

Not surprisingly, single-leg smoothness numbers are lower than smoothness for the two legs combined. This is because the power from the two legs is anticorrelated: when one drops off, the other tends to pick up the slack. The sum is smoother than either of its parts.

Also note the left leg is scoring a lower smoothness than the right. Since the left leg is dipping into negative power much more prominently than the right leg, this seems like a reasonable result.

Another factor is whether the numbers are of the correct order of magnitude. Without any definition, if you asked someone how smooth their pedal stroke was considering both feet together, then considering each foot separately, I think 70% and 35% are plausible responses. Really, most users of a number aren't going to delve into the detailed derivation: they just look at a number and judge it at face value. 70% says to people "mostly smooth, but not perfectly smooth", while 35% says "not very smooth". These seem like the correct messages.

Metrigear Waterbottle data recorder

Ideally, it would be interesting to be able to access each of these numbers. Why not? I think left leg, right leg, and total power smoothness all provide an interesting story. But what would be most interesting is if they finally support the detailed pedal stroke curves, like these data published on that Metrigear blog. There's no reason that can't happen. Maybe it would require a pod upgrade, but with pods selling at $69 each, less than 5% the cost of the full system, that's the sort of upgrade a lot of people would be happy to make down the road. The issue is how to transfer the data to the user. The old Metrigear water bottle isn't necessary any more, with low-energy bluetooth almost universally available on mobile devices. Then we'll see Vector truly differentiate itself.

This, after all, is the brilliant aspect of the Vector design: with so many of the "brains" in the cheap, external pods, the investment made in the more expensive pedal spindles should be a lasting one.

Wednesday, August 14, 2013

Proposed Pedal Stroke Smoothness Algorithm for Garmin Vector

In anticipation of the release of the Garmin Vector, or perhaps the Rotor Flow, Garmin added a pedalstroke quality field to its Edge-series head units in the recent firmware update. But consistent with the Vector-team's approach to not release anything which they won't stand behind, on the first public release of the Vector power meter, that field remains unpopulated.

It's an interesting question about how important pedal technique actually is. I think most cyclists think if they can pedal in a smoother, more uniform fashion, their cycling will improve. This has been difficult to demonstrate in the laboratory, however. For example, a recent work by Arkesteijn, et al, used force-feedback to encourage riders to pull up more on the upstroke. This worked, improving the uniformity of force application around the pedal stroke, but gross energy efficiency of their cycling failed to improve. On the other hand, it appeared the smoother pedaling increased the ability of the cyclists to resist fatigue, preserving a greater fraction of peak power during extended pedaling. Gross efficiency is easily overestimated: it's the ability to produce near-maximal power which is more important in bike racing.

But there has been interest in the subject for decades. By at least the 1980's (Burke, Cycling Science) there was proposed the concept of a "force efficiency" metric for pedal stroke. The idea was you wanted to push in the direction of pedal motion, and only that direction. The metric is calculated as the average propulsive force divided by the average total force.

But this is silly. If I go from seated to standing, I transfer a force close to M×g from the saddle to the pedals. This will tank my pedal efficiency, but there's no indication it should reduce metabolic efficiency. Standing is a relatively energy-efficient process. And indeed, the literature has continually failed to show a direct correlation between force efficiency and metabolic efficiency.

Another metric has been proposed: the "Dead Center size" (DC). This is computed as follows:

DC = (power @ top of pedal stroke + power @ bottom of pedal stroke) / (2 × average power over pedalstroke).

This is also silly. What's so special about the top and bottom of the pedal stroke? It's possible a pedal stroke could be horribly non-uniform, but the average "dead center" power just happens to match the average power.

An alternative, instead of trying to pick the top and bottom of the pedal stroke, is to use the minimum power in the pedal stroke. This metric is:

smoothness = minimum power during pedal stroke / average power during pedal stroke.

But this is also silly. Imagine I am pedaling at 90 rpm, which is one pedal stroke per 667 msec. I pedal at constant power for 666 msec, then for 1 msec I apply zero power. Is this a zero uniformity score? Hardly: my pedal stroke is close to perfect, closer to perfect than any human can achieve.
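The thought experiment is easy to reproduce. Here's a minimal Python sketch; the 250 W level and the one-sample-per-millisecond resolution are assumptions made just to put numbers on it.

```python
def min_over_mean(power_samples):
    """Minimum power in the stroke divided by average power over the stroke."""
    return min(power_samples) / (sum(power_samples) / len(power_samples))


# One pedal stroke at 90 rpm sampled every millisecond (667 samples):
# constant power for 666 ms, then zero power for 1 ms.
stroke = [250.0] * 666 + [0.0]

score = min_over_mean(stroke)
mean_fraction = (sum(stroke) / len(stroke)) / 250.0

print(score)          # the min/average metric
print(mean_fraction)  # average power as a fraction of the constant level
```

The metric collapses to zero because of a single sample, even though the average power is within a fraction of a percent of the constant level.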

So I propose an alternate metric. It's not new: it's a simple derivative of the one Andrew Coggan has been applying to ride data for more than 10 years.

The idea is to assign a "cost function" to a pedal stroke at a given point in time, then to compare the cost function of an optimal pedal stroke at that power to the cost function of an actual pedal stroke.

For example, Coggan likes to use wattage to the 4th power as a "cost function" in calculating his "normalized power". That seems like it would work perfectly fine. The difference is in normalized power, the data are smoothed with a time constant on order 30 seconds. For calculating a pedal stroke uniformity metric, obviously that sort of smoothing would essentially kill any nonuniformity. So if smoothing is explicitly applied, it would need to be to a much shorter time constant. Typical measurement systems already have smoothing in place, implicit or explicit, so additional smoothing may not be needed.

So given this, a metric which represents the smoothness of power would be the following:

smoothness = fourth power of the average watts / average of the fourth power of watts (warning: I modify this later).

Note a nice feature of this is that if I apply negative watts, there's a positive cost for this. On a typical bicycle this represents eccentric work. Eccentric work is generally considered to be fatiguing, so it makes sense that it be assigned a "cost".

The obvious question if you're Vector is if this sort of metric should be applied to the combined L+R power data, or should be applied to each pedal separately. Obviously even a perfectly uniform pedalstroke from a total power standpoint may have regions where the L leg dominates and regions where the R leg dominates, yielding nonuniformities. From the normalized power perspective, total power is used, because the cardiovascular system is shared by the two legs. But from a pedal uniformity perspective, perhaps the emphasis is on individual muscle loading, which would imply doing the legs separately makes more sense. With this reasoning, you'd have two scores: one for the left foot, one for the right foot. If you wanted a single result, then, you could average the two, weighting by each leg's power production. The weighting is important so if you pedal one-legged you get a meaningful result: the efficiency will be 100% from the pedaling leg.
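Here's a minimal Python sketch of the power-weighted average of the two single-leg scores. The function and argument names are hypothetical, and it assumes each leg's average power is non-negative so the weights make sense.

```python
def combined_smoothness(left_smoothness, left_power, right_smoothness, right_power):
    """Average of the two single-leg scores, weighted by each leg's average power.

    Assumes both average powers are non-negative. Returns None if there is
    no power at all.
    """
    total_power = left_power + right_power
    if total_power == 0:
        return None
    return (left_smoothness * left_power + right_smoothness * right_power) / total_power


# Balanced legs: the result sits between the two scores.
print(combined_smoothness(0.30, 100.0, 0.40, 100.0))

# One-legged pedaling: the idle leg gets zero weight, so its score is ignored.
print(combined_smoothness(0.0, 0.0, 0.40, 200.0))
```

Without the weighting, the idle leg's meaningless score would drag a one-legged result down by half.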

The downside of the individual-leg approach: suppose I'm pedaling seated with a given force-versus-angle relationship. When a given pedal is on the upstroke, I'm applying close to zero force, and the power associated with that side is close to zero. Now I stand, and continue pedaling in a similar fashion, except I've added roughly half my body weight to each pedal, downward. Now during the upstroke of each foot there is a considerable downward force, so that foot is doing negative power. This is exactly canceled by an increase in power of the foot on the downstroke. It is obvious that each single-leg smoothness value will thus fall considerably. Yet my assertion was that the smoothness shouldn't be destroyed by going from a seated to a standing position, since the simple act of supporting weight on legs isn't metabolically expensive. However, if total power is used, that need not be affected by whether the rider is standing or sitting. So perhaps using total power is better after all.

But I'm not sure. The reason is there's a fundamental difference between supporting body weight with your bones and pushing body weight from a seated position with your muscles. So until we start riding with EMG probes attached to our legs, crude compromises need to be made. But if it was me, I'd try both approaches, and see which seems to work better.

Power to the 4th power is just one cost function. I could imagine others. For example, suppose I asserted that for small powers, the uniformity was less important. After all, if a leg is alternating between 0 and 20% FTP that variability may be less important than if it is alternating between 0 and 200% FTP. So I could make my "cost function" a function of FTP. Consistent with my assertion negative powers should incur a significant cost, a cost function meeting these needs would be:

cost function = cosh(K P / FTP).
for some K.

But I don't like this, since it's always a question what "FTP" or K should be. Keep it simple. So I'll stick with P^4.

I wrote a little Perl script to test this. I assumed that power varied sinusoidally about a mean value, the variation with an amplitude varying between 0% and 200% of the mean. With the amplitude greater than the mean, there were portions of negative power during the pedal stroke. An amplitude of at least the mean power is realistic for total power, since power delivered with one foot at the top and the other at the bottom of the pedal stroke is generally much less than the peak power when the feet are at 3 and 9 o'clock.

Based on initial results, pedal uniformity metrics were very low. Depressing. So consistent with the DC metric, I did a reverse transform to convert the numerator and denominator to units of power. This is what Coggan does, anyway, with the variability index, which is the reciprocal of my smoothness metric.


smoothness = abs(average power) / (average of 4th power of power)^(1/4).

For the sinusoidal variation with relative amplitude a, the result can be calculated analytically:

smoothness = 1 / (1 + 3 a^2 + (3/8) a^4)^(1/4)

which for a = 1 (pedal stroke force varies between 0 and twice the average) yields 69%, as shown in the following plot:


Here's some old data from the Metrigear days (courtesy Cozy Beehive) showing pedal force versus angle:

Great stuff. Anyway, you see the force varies similar to a=1. So 69% would be a fairly good number.
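The sinusoidal result is easy to check numerically. Here's a minimal Python sketch (not the original Perl script) that applies the metric to a sampled sinusoid and compares it with the closed form. The 200 W mean, the 3600 samples per stroke, and the function names are assumptions for illustration.

```python
import math


def smoothness(power_samples):
    """abs(mean power) / (mean of power^4)^(1/4)."""
    n = len(power_samples)
    mean_p = sum(power_samples) / n
    mean_p4 = sum(p ** 4 for p in power_samples) / n
    if mean_p4 == 0:
        return None
    return abs(mean_p) / mean_p4 ** 0.25


def sinusoid_smoothness(a):
    """Closed form for power = mean * (1 + a * sin(theta))."""
    return (1 + 3 * a ** 2 + 0.375 * a ** 4) ** -0.25


def sampled_sinusoid(a, mean_power=200.0, n=3600):
    """One full cycle of sinusoidal power, uniformly sampled."""
    return [mean_power * (1 + a * math.sin(2 * math.pi * k / n)) for k in range(n)]


for a in (0.0, 0.5, 1.0, 2.0):
    numeric = smoothness(sampled_sinusoid(a))
    print(a, round(numeric, 4), round(sinusoid_smoothness(a), 4))
```

The closed form follows from the averages of sin^2 and sin^4 over a full cycle, which are 1/2 and 3/8: the mean of (1 + a sin θ)^4 is 1 + 3 a^2 + (3/8) a^4, since the odd powers of sine average to zero.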

Of course Vector is best-positioned of all the power meters currently sold to deliver this sort of metric to an Edge head unit (Polar Look not using ANT+), since it measures the force "at the source", as they say.

Some References:

  1. Arkesteijn M, Hopker J, Jobson SA, Passfield L. The Effect of Turbo Trainer Cycling on Pedalling Technique and Cycling Efficiency. Int J Sports Med. 2012.
  2. Leirdal S, Ettema G. Pedaling technique and energy cost in cycling. Med Sci Sports Exerc. 2011 Apr;43(4):701-5. doi:10.1249/MSS.0b013e3181f6b7ea. PubMed PMID: 20798659.
  3. Theurel J, Crepin M, Foissac M, Temprado JJ. Effects of different pedalling techniques on muscle fatigue and mechanical efficiency during prolonged cycling. Scand J Med Sci Sports. 2012 Dec;22(6):714-21. doi:10.1111/j.1600-0838.2011.01313.x. Epub 2011 Apr 21. PubMed PMID: 21507064.
  4. Leirdal S, Ettema G. The relationship between cadence, pedalling technique and gross efficiency in cycling. Eur J Appl Physiol. 2011 Dec;111(12):2885-2893.

Tuesday, August 13, 2013

On Stages cadence, and comparison w/ Powertap from DC Rainmaker data

In the previous post I compared power from the Stages power meter to power from the Vector power meter (or meters, since L and R are separate) measured by DC Rainmaker on a ride he did in DC after the Vector release in Boulder, Colorado. Since Stages is measuring power only on one side of the bike, it is natural to compare the results with Vector, which measures power on each side of the bike separately. Before that, I showed that the Vector total power agreed well with Powertap and Quarq. If I assume that validates total power for the Vector, then it validates L and R power separately, since total power is derived from L and R power (the validation would be invalid if there were errors which naturally canceled between the L and R side, but I can't identify any).

But Stages does one thing I really like: it measures cadence multiple times per second, instead of relying on an average cadence associated with the time for a full pedal rotation. The constant cadence approximation leads to an error if the pedal velocity is changing during the pedal stroke, as it would if the bike is in a low-inertial condition such as a steep climb or strong headwind or resistive surface like mud or sand, or if the bike had eccentric chainrings. I wrote about this in 2010. Quarq makes the constant cadence approximation, since it uses a magnet to measure pedal rotation time.

Vector, on the other hand, may or may not make this approximation. I don't think they've said how they measure cadence. It's not trivially obvious, since there are advantages and disadvantages. The disadvantage I've described. The advantage is robustness: measuring instantaneous cadence is relatively easy on a trainer, where the crank is spinning isolated in inertial space, but when riding on rough roads at low cadence it's not easy to do accurately. I described this in 2010, as well.

Stages would need to measure cadence differently than the way I described Vector might do it, since they operate on only one side. What Stages does, I suspect, is to put accelerometers at two separate radii of the crank arm, two different positions in their package. The difference in acceleration between these points is easily calculated: 𝞈² Δr, where 𝞈 is the angular velocity of the crank (radians/sec) and Δr is the separation of the accelerometers. So if I can measure that acceleration difference, I can extract the rotation rate: 𝞈 = sqrt[Δa / Δr], where Δa is the acceleration difference.

The nice thing about this is it doesn't depend on the position of the Stages along the crank arm, it just depends on the accelerometer separation, and that's essentially fixed by the rigid package. The challenge with this, especially when 𝞈 is relatively small, is that for Stages, Δr is just a fraction of the crank arm length, so the magnitude of this acceleration difference is limited. They would do better if they had a greater package length.

We know vibrations are on the order of 1 g, where g is the acceleration of gravity. So it's useful to compare the expected acceleration to this reference. If the cadence is 40 rpm (2/3 of a revolution per second) and the accelerometer separation is 2 cm, the acceleration difference would be 36 milligravities. So to estimate that to within 1% accuracy (you likely need at least this good for 2% power accuracy) would require 360 microgravity sensitivity in the presence of noise on the order of 1 g. So it's nontrivial as your bike is rattling over the cobbles.
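Here's the arithmetic as a minimal Python sketch. Remember the two-accelerometer scheme is only my guess at what Stages does; the function names are hypothetical and the value of g is the standard 9.80665 m/s².

```python
import math

G = 9.80665  # standard gravity, m/s^2


def accel_difference(cadence_rpm, separation_m):
    """Centripetal acceleration difference between two points on the crank arm."""
    omega = 2 * math.pi * cadence_rpm / 60.0  # rad/s
    return omega ** 2 * separation_m


def cadence_from_accel(delta_a, separation_m):
    """Invert the relation: omega = sqrt(delta_a / delta_r), returned in rpm."""
    omega = math.sqrt(delta_a / separation_m)
    return omega * 60.0 / (2 * math.pi)


delta_a = accel_difference(40, 0.02)   # 40 rpm, 2 cm separation
print(delta_a / G * 1e3)               # acceleration difference in milli-g
print(0.01 * delta_a / G * 1e6)        # 1% of that, in micro-g
print(cadence_from_accel(delta_a, 0.02))
```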

So the question becomes: how well can they cancel other sources of vibration this way? Vibrational modes of the crank arm, for example, will result in differences in acceleration at different point along the crank arm and will yield an error in cadence extraction.

Anyway, if they can get the cadence sufficiently rapidly, for example 3 times per half-pedal-stroke, which at 180 rpm is 18 times per second, they should do a better job than other power meters at reproducing what Powertap measures, which is power without relying on imprecise approximations for cadence as a function of time. So I compared Stages "total power" to Powertap power. Here's a histogram of the power difference:

Stages vs Powertap

First, don't panic: it's not as bad as it looks. First of all the axis is logarithmic, which enhances small counts. Second, recall power differences can come from timing differences as well as actual differences in power measurement, and no two power meters (and head units) are going to be perfectly synchronized. That said, there's still a lot of spread there compared to the other comparisons I've done. Recall Quarq and Vector both tracked Powertap rather nicely. If all you looked at was average power, you'd be extremely pleased with this comparison, however (unless you factored in drivetrain loss of on order 7 watts, in which case you'd be slightly less happy). In any case I've never heard anyone complain about Stages accuracy or precision. The reality is most people aren't very sensitive to either.

Monday, August 12, 2013

Vector and Stages power comparison

Last time I compared DC Rainmaker's total power numbers from Vector, Powertap, and Quarq on a typical ride near DC (Zip file here). The result was excellent agreement by my standards, considering the meters are measuring different points in the power transmission path.

That leaves the Stages, which he also used. The Stages is not a total power meter; it's a left-leg power meter. It produces a derived number for total power by doubling left-leg power, but if you even glance at any Vector data, you realize that left leg and right leg power differ. Vector is also not a total power meter: it's two power meters. It's a left-leg meter and it's a right leg meter. These are distinct and are calibrated separately. You can derive total power by adding the two together, and the assumption here is total power is the sum of the left leg power and the right leg power. This is an excellent assumption as long as I'm not pushing with my hand on the crank arm.

So comparing Stages total power to the total power from another power meter is rather indirect. A better comparison is to compare Stages left leg power to the left leg power from another power meter. The obvious candidate is the Garmin Vector.

Additionally, I compared Stages left leg power to the Garmin's right leg power. This accomplishes two things. One is it checks whether there's any advantage to measuring power on both sides: if left leg power predicts right leg power (perhaps with a small offset, as long as it still tracks), then we should see a narrow spread in this latter comparison, and there's little advantage to be gained in using two sides, other than balance. But it also checks to make sure I'm labeling the Garmin powers correctly: if the Stages power were to be a better predictor of the Garmin R power than the Garmin L power, that would lend suspicion I had them swapped.

Here's the results, remembering I divided the Stages total power by two to get a Stages L-power (which is what it actually measured):

Stages vs Garmin L

Stages vs Garmin R

This is interesting: what is observed is that the average power from the Stages is a better match to the Garmin right power than the Garmin left power. This may in itself yield suspicion the Garmin powers were swapped. But looking at the spread of values makes it clear that Stages tracks the Garmin L power better than it tracks the Garmin R power. It suggests the agreement of the average value with the Garmin R power is fortuitous. Note in considering these σ values that they are from an average power roughly half that considered when looking at total power: the percentages are thus roughly double.

So the conclusion here is the comparison of left leg power from Stages to Garmin left leg power yields a decent amount of spread: 3.5 watts as calculated here. But comparing the Stages L power with the Garmin R power yields considerably more spread: 5.0 W with considerably more outliers.

The question is why is there such a big difference between the Stages L power and the Garmin L power: 5.5 watts. First I need to assess which is more trustworthy. The Garmin total power has already been compared with the total power from Quarq and Powertap (last post) and they tracked quite well. Since total power is derived from L and R power, the errors in L and R power must be even less than the errors in total power (there's no mechanism for errors to cancel between L and R). So this lends confidence to the Garmin numbers for both L power and R power.

I don't know, but I think what Stages is doing, measuring crank flex along the longer of two cross-sectional axes using an externally applied sensor, is challenging. Garmin is measuring the deflection of a hollow cylinder from the inside. That seems much more robust. Think about an eccentric beam with a major axis and a minor axis. It bends much more easily on its minor axis. Then I apply force from off-axis, attempting to bend it along the direction of the major axis (propulsive force). But I'm additionally providing a moment which is trying to bend it outward, since my foot is applying force outside the rotational plane of the crank arm. It's thus going to bend outward, along the minor axis, as well as along the major axis. Stages needs to measure only the bending along the major axis without being affected by bending along the minor axis. And, since I'm pushing forward on the axle, there's additionally a twisting moment. It just seems like a hard problem to me, and when I saw how well it seemed to work, I was impressed. But if it didn't work as well as a Quarq or SRM, which is measuring the wind-up stress in a spider, or as a Garmin, measuring the simple bending of a hollow cylinder, that wouldn't surprise me. But I'm not a mechanical engineer, so am no expert on these things. Still, it doesn't surprise me too much the left power doesn't agree closer than it does.

Despite my claim all that matters is single-pedal power when comparing to Stages, I at the last minute decided to run the "full power" numbers as well. Here's the result. The comparison isn't good, and this is with DC Rainmaker's relatively balanced L-R pedal stroke:

Stages vs Garmin R

I just got out for my first outdoor ride since my crash 8 weeks ago, so it would be fun to give Vector a spin to see if my right-leg-injury translates to left-leg-dominance. These things are selling like hotcakes, though, so I may need to wait.

Sunday, August 11, 2013

Power comparison: Vector vs Powertap vs Quarq

DC Rainmaker has posted another dataset to his blog (Zip file here). His early installation issues behind him, these data serve as a valuable comparison between the power meters he's using: the spanking new Garmin Vector, the Quarq Elsa, Powertap, and Stages.

The point of this post is to quantitatively compare the powers. But first some discussion...

Each of these power meters is measuring power at a different point in the transmission path. Vector gets first shot at it, picking up power transmitted through the pedal axle. Stages is next, measuring power in the left crank arm. Next, Quarq measures it in the crank spider. Finally Powertap picks up the power which manages to make it to the rear hub. The largest losses are expected between the Quarq and hub, as mechanical losses in the chain and in the rear derailleur pulleys convert mechanical power into heat before the Powertap sees it.

Between the Vector and the Quarq, it takes more imagination. Brim Brothers, not yet available, will measure power in the cleat, and between the cleat and the pedal axle are the pedal bearings, so you'd expect some loss there. But between the pedal spindle and the spider is a hard mechanical linkage. Sure, there's some flex, and any flex-unflex cycles will yield some energy loss (but also some energy storage and release), but cranks are fairly beefy things these days so you'd expect these losses to be in the sub-watt range. In contrast, losses in the drivetrain are on order 3% of total power. FrictionFacts has generated most excellent data on these losses (I recommend their reports).

So it's a mistake to just expect each of these power meters to agree with each other, at least ideally. Of course if I make a pedal-based power meter and calibrate it to a Powertap, it will tend to agree with the Powertap, even though there's power loss between the two detection points. But this would be because I'm making an error in my calibration. Vector, I strongly suspect, doesn't calibrate to a Powertap. On the other hand, it would be easy for them to calibrate to a spider-based system, for example a Quarq or SRM (for example, the Science version). Whatever they choose, it's almost certainly downstream of the axle, so I wouldn't expect losses at the axle-crankarm interface to be included in the Vector power, even if in theory it should capture those. So I'd lump the power meters into two groups: upstream of the chain, and downstream of the chain, Powertap the only one in the latter group.

If I accept that there is some small difference between the power available to each meter, then it would be a mistake to assume they should measure the same value. However, if I assume the transmission losses are roughly proportional to the transmitted power (not a good assumption, actually, since fractional losses depend on gear selection, cadence, and chain tension), I'd expect the meters to track each other if they're all "correct".

To prepare the data, I convolved them with my biexponential smoothing function where each exponential had a characteristic time decay of 10 seconds. This means that variation quicker than 10 seconds wasn't considered. This is to smooth out some of the "noise". This is especially important because power naturally oscillates with each half-pedal-stroke: it's not constant. But we're more interested in the trend, not these half-stroke oscillations. Smoothing the data out to 10 seconds gets rid of some of the artifacts associated with a non-integer number of half-pedal-strokes fitting into each one-second time sample. This is especially important for the Powertap.
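For anyone wanting to try something similar, here's a minimal Python sketch of a two-sided exponential smoother. It's an assumed form, not necessarily my exact function: the kernel is exp(-|t|/τ) with τ = 10 seconds, truncated at 5τ, and renormalized near the ends of the series. The function name and the truncation are choices made just for this illustration.

```python
import math


def biexponential_smooth(values, tau_s=10.0, dt_s=1.0, cutoff_taus=5.0):
    """Smooth with a symmetric kernel exp(-|t| / tau), one exponential on each side.

    The kernel is truncated at cutoff_taus * tau and renormalized over the
    samples actually available, so the ends of the series are not biased
    toward zero.
    """
    half_width = int(cutoff_taus * tau_s / dt_s)
    offsets = range(-half_width, half_width + 1)
    weights = [math.exp(-abs(k) * dt_s / tau_s) for k in offsets]
    smoothed = []
    for i in range(len(values)):
        num = 0.0
        den = 0.0
        for k, w in zip(offsets, weights):
            j = i + k
            if 0 <= j < len(values):
                num += w * values[j]
                den += w
        smoothed.append(num / den)
    return smoothed


# A hypothetical 1 Hz power series with a single 100 W spike
series = [0.0] * 50 + [100.0] + [0.0] * 50
smooth = biexponential_smooth(series)
print(max(smooth))
```

Because the kernel looks both forward and backward in time, it doesn't shift features later the way a one-sided (causal) exponential filter would.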

I then made sure the powers were synchronized. Actually, cadence was convenient for this. The only change I made was to delay the Quarq data by 2 seconds relative to the others. Then I eliminated missing data points. Each data stream had a few missing points. I can't take a difference if either of the two data streams being compared is missing a point.
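The alignment step can be sketched as follows. This is a minimal Python example which assumes one-second samples stored in lists, with missing points represented as None; the function name and the data layout are hypothetical.

```python
def align_streams(reference, other, delay_s=0):
    """Pair up one-second samples after delaying `other` by delay_s seconds.

    Assumption: missing samples are stored as None. A pair is dropped if
    either sample is missing or if the delayed stream has no sample there.
    """
    pairs = []
    for t, ref_value in enumerate(reference):
        src = t - delay_s  # delayed stream shows its value from delay_s earlier
        if src < 0 or src >= len(other):
            continue
        other_value = other[src]
        if ref_value is None or other_value is None:
            continue
        pairs.append((ref_value, other_value))
    return pairs
```

For the comparison described here, the call would look like `align_streams(powertap, quarq, delay_s=2)`, with `powertap` and `quarq` standing in for the two recorded series.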

Then I subtracted the power reported by one meter from the power reported by the other. This is an absolute power difference. I could have done a fractional power difference, in which case I'd take the natural logarithm of the powers first, but since I care less about small powers than large powers, I used an absolute difference.
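To make the distinction concrete, here is a small Python sketch of the two options. The function names are mine. The log difference treats a 5% disagreement the same at any power, while the absolute difference weights high-power samples more heavily.

```python
import math

def absolute_difference(p_a, p_b):
    """Difference in watts."""
    return p_a - p_b

def fractional_difference(p_a, p_b):
    """Difference of natural logs, roughly (p_a - p_b) / p_b when small."""
    return math.log(p_a) - math.log(p_b)

# A 5% disagreement at 100 W and at 300 W:
low = (absolute_difference(105.0, 100.0), fractional_difference(105.0, 100.0))
high = (absolute_difference(315.0, 300.0), fractional_difference(315.0, 300.0))
```

Here `low` and `high` have the same fractional difference, but the absolute differences are 5 watts and 15 watts.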

Then I stripped out the first and last 60 seconds of data. This was to give the smoothing, which uses data to both sides of a given point, something to work with.

Note two things can yield a difference in power. One is an actual difference in the power measurement for a given applied power. But the other is a difference in the time delay, if power is time-variable, as it always is with a human pedaling the bike. So even if a human pedals a bike in exactly the same way, with the same power meter, on two different occasions, there will be power differences. This is because the one-second samples will have different registrations within the data set. For example, if the power is increasing 100 watts/second, one attempt may result in samples spanning 250 to 350, then 350 to 450 watts, while the next iteration might have samples spanning 300 to 400, then 400 to 500 watts. When I do my comparison, I'll get a 50 watt difference, even though these were equivalent trials with equivalent measurement systems.
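The ramp example works out as follows in a short Python sketch. It assumes each recorded value is the mean power over its one-second window, which is my simplification of what a head unit records.

```python
def ramp_sample_means(start_watts, rate_w_per_s=100.0, n_samples=2):
    """Mean power of consecutive one-second samples on a linear ramp.

    Assumption: each recorded sample is the average over its one-second
    window, so a window spanning 250 to 350 watts records 300 watts.
    """
    means = []
    for k in range(n_samples):
        low = start_watts + k * rate_w_per_s
        high = low + rate_w_per_s
        means.append((low + high) / 2.0)
    return means

trial_1 = ramp_sample_means(250.0)  # windows 250-350 and 350-450
trial_2 = ramp_sample_means(300.0)  # windows 300-400 and 400-500
differences = [b - a for a, b in zip(trial_1, trial_2)]
```

Both entries of `differences` come out to 50 watts, purely from the half-second shift in where the sample windows fall.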

So here are the results. In each case I plotted the histogram on a logarithmic axis, so the spread is actually a lot tighter than the plot might appear at first. I also fitted a normal distribution curve, which is characterized by a mean and a standard deviation (σ). The σ characterizes the spread in the data:

Vector vs Powertap

Vector vs Quarq

Quarq vs Powertap

The "winner" for the best agreement is between Quarq and Powertap. However, the difference between Vector and Powertap is a lot closer to the 3% or so you'd expect from drivetrain losses. So the difference between Vector and Powertap is more in line with theory than the difference between Quarq and Powertap. The spread of the difference between Quarq and Powertap is tighter than the spread in the comparisons involving Vector.

However, the three results are quite close, around 3 watts, which is on order 1.5% of the mean power. Given some of this is from transients in power, as I described earlier, as opposed to power measurement error, these comparisons are in my view spectacularly good. I'd be more than happy using any of these three power meters based exclusively on these results. I feel slightly compelled toward the Vector or Powertap versus the Quarq since the power difference between those two is what I'd expect from theory, but it's very close.

That leaves the Stages, which I'll look at next.

Correction: Because of the way XmGrace encodes histogram data, I had a 0.5 watt bias in my Gaussian fits to the histograms (they're shifted to the right 0.5 watts). I fixed the numbers in the legends.

Friday, August 9, 2013

Garmin Vector released: L-R power balance comparison with Quarq Elsa

Garmin Vector vs Quarq Elsa: L-R balance comparison

The Garmin Vector is clearly the most anticipated power meter to come onto the market. There's a few reasons for this. One is the freedom of choice it affords in selecting components. With a Powertap, you need essentially a new Powertap for every wheel. With crank-based systems, restrictions are more limited, but swapping pedals is generally considered easier than swapping cranks (this is debatable, however: my Lightning crank is super-easy to take off and on).

But perhaps more than this is the ability to measure independently the left and right pedal. This is power measurement at essentially the point of contact. It's directly measuring the forces applied by the rider. That was the inspiration for the name: "Vector". It's measuring the force vector applied by the rider's feet.

This is somewhat of an academic point in comparison to the crank spider, since it's generally considered to be the case that there is negligible power loss between the pedals and the spider. There's some loss associated with the inelastic component of flex, but the crank-pedal system is designed to be relatively stiff, so these losses are almost certainly well into the sub-watt range.

But where the Vector has an advantage versus spider-based systems is its access to left and right forces separately. Additionally it has direct access to the direction of force application, something systems measuring power further downstream miss. For example, the Rotor Flow and the Pioneer system have independent L-R measurement. But the Pioneer, for example, is relying on measurements of the L and R crank arm flexing. In most modern cranks, the L and R crankarms have substantially different mechanical properties, due to the presence of the drivetrain on one side (typically, on a solo bike, the right side). It thus requires independent calibration of somewhat dissimilar systems. In contrast the Vector has the advantage that the L and R pedal spindle are essentially antisymmetric, and have nominally equivalent mechanical properties. Thus even in the presence of errors in power extraction, averaged over several pedal strokes, L-R balance should be relatively less affected, since if the rider has a symmetric pedal stroke then the errors on both sides should be the same.

SRAM's Quarq Elsa, a spider-based system, takes a bold approach to left-right balance. They assume that a full rotation of the spider can be divided into two segments: one dominated by the left foot, one dominated by the right foot. A super-naive approach would be to assume all work is done by the foot on the downstroke. I doubt they do this. Another approach would be to assume that during the application of peak torque on each half-stroke, the downward foot is dominant. Perhaps this is what they do. In any case, since a perfectly L-R symmetric pedal stroke would result in equivalent half-strokes, they view deviation from symmetry to be an indication of L-R asymmetry.

Now this reverse assertion obviously isn't the case. I could construct a machine which would apply force to a single pedal and yet deliver a nearly equivalent pair of half-pedal strokes. This would fool the Quarq into thinking I had perfect L-R balance, while in reality L-R balance would be 100:0. Nevertheless, the human body is not such a machine, and it seems conceivable the Quarq approach is good enough in practice with real human riders.

Similarly, just because the pattern of force application is asymmetric, the total power applied need not be asymmetric. For example, suppose my left foot delivers a smoother force pattern than my right. In this case L-R power may be perfectly matched, yet peak torque, or work done during the L-foot downstroke, may differ. In this case, unlike the previous, Quarq may claim an asymmetry while Vector reports perfect power balance.
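Both counterexamples can be checked numerically. The Python sketch below uses the super-naive rule (all work in a half rotation is credited to the foot on its downstroke) as a stand-in, since what Quarq actually does isn't known to me. The torque profiles, the function name, and the choice of 0 to pi as the right foot's downstroke are all assumptions for illustration.

```python
import math

def balances(left_torque, right_torque, steps=3600):
    """Return (true_right_share, half_stroke_right_share).

    left_torque and right_torque map crank angle in radians (0 to 2*pi)
    to torque. Assumption: the right foot's downstroke is the half
    rotation 0..pi, and the half-stroke estimate credits all work done
    in that half to the right foot. This is a guess for illustration,
    not Quarq's algorithm.
    """
    d_theta = 2.0 * math.pi / steps
    work_left = work_right = 0.0
    work_first_half = 0.0
    for i in range(steps):
        theta = (i + 0.5) * d_theta  # midpoint rule
        w_left = left_torque(theta) * d_theta
        w_right = right_torque(theta) * d_theta
        work_left += w_left
        work_right += w_right
        if theta < math.pi:
            work_first_half += w_left + w_right
    total = work_left + work_right
    return work_right / total, work_first_half / total

# Machine driving only the right pedal with constant torque.
one_legged = balances(lambda th: 0.0, lambda th: 10.0)

# Smooth left foot (constant torque) vs. right foot pushing twice as
# hard on its downstroke only: equal work per revolution.
smooth_left = balances(lambda th: 10.0,
                       lambda th: 20.0 if th < math.pi else 0.0)
```

With these profiles, `one_legged` has a true right share of 100% but a half-stroke estimate of 50%, while `smooth_left` has a true right share of 50% but a half-stroke estimate of 75%.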

So what Quarq describes is an asymmetry of some sort. Vector describes a different asymmetry. Of course, Vector could report a metric associated with peak torque, as well. It has access to more detail than Quarq. By the time Quarq observes torque (and therefore power) the L and R pedals have been combined.

Now, for the first time, the public has access to real data from the Vector. By far the best widely available source is DC Rainmaker's. He rode a bike outfitted with an impressive 4 independent power meters: Quarq Elsa, Stages, Powertap, and Garmin Vector. He did multiple rides, carefully correcting installation errors from his first, and combining the data streams synchronously with a WASP ANT+ wireless bridge. Then he posted his data online.

There's a lot to go through, and with my injury I don't have as much time or energy as I used to for this sort of thing. But here's an example of a piece of low-hanging fruit from that data set. I plotted the L-R balance of the Garmin Vector versus the Quarq Elsa. This is from Rainmaker's climbing ride he did towards the end of his test day, after he'd gotten installation dialed in:


The agreement is far from impressive.

So I think the conclusion here is, if you believe the Vector numbers, that the Quarq method fails to reproduce L-R power asymmetry.

This begs the question, however, as to whether the Vector numbers should be believed. Garmin claims 2% accuracy for the Vector. This is for total power. Errors in total power consist of correlated and uncorrelated errors from the individual pedals. If errors from the pedals are uncorrelated, this corresponds to a 2.8% accuracy for each pedal, with the accuracy improved by adding the two together by a factor square root of 2. If the errors from the pedals are correlated, then each pedal has 2% accuracy, and the errors on the two pedals are the same.
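The arithmetic behind the 2.8% figure can be sketched in Python. The sketch assumes each pedal carries half of the total power and that both pedals have the same fractional error; the function names are mine.

```python
import math

def total_error(pedal_error, correlated):
    """Fractional error in total power from a per-pedal fractional error.

    Assumption: each pedal carries half of the total power and both
    pedals have the same fractional error.
    """
    half = 0.5 * pedal_error  # each pedal's error as a fraction of total power
    if correlated:
        return half + half           # errors add directly
    return math.hypot(half, half)    # errors add in quadrature

def per_pedal_error(total, correlated):
    """Invert total_error: per-pedal error implied by a total error."""
    if correlated:
        return total
    return total * math.sqrt(2.0)

uncorrelated_case = per_pedal_error(0.02, correlated=False)  # about 0.028
correlated_case = per_pedal_error(0.02, correlated=True)     # 0.02
```

Feeding `uncorrelated_case` back through `total_error` recovers the 2% total, which is the factor of square root of 2 improvement from adding the two pedals together.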

Correlated errors might include effects of temperature (since the two pedals are exposed to the same temperature), errors in cadence (assuming each pedal uses the same cadence numbers), and errors associated with symmetric aspects of the way the rider is applying force to the pedals (for example, pushing more on the outside of the pedal than the middle of the pedal), and deviations in pedal characteristics from assumptions in the modeling (for example, assuming some effect is linear when it is not perfectly linear, or assuming a force is parallel to a direction of motion when flex causes it to be not perfectly parallel, etc). Since riders tend to apply fairly symmetric power to the pedals it can perhaps be assumed that most of the error associated with specific ways in which the rider applies power are correlated. These correlated errors will affect the L and R pedals the same, and therefore will not affect L-R balance appreciably. Of course if each pedal were to determine its own cadence, cadence errors would be uncorrelated, which would be unfortunate unless you're riding Power Cranks where the two cranks can turn independently.

Uncorrelated errors include errors in calibrating the pedal, errors associated with uncorrelated aspects of the rider's pedal style, and variation in residual strain from the value extracted during the zeroing process (which involves keeping the pedals stationary without applied force, then backpedaling when the shoes are clipped in). It is the uncorrelated error sources which lead to errors in L-R balance.

So I need to partition the error budget into a correlated and uncorrelated component. Looking at these components, I think it's safe to say a significant fraction of the 2% error budget is correlated. I'll arbitrarily assume half of it. That leaves 1% uncorrelated error. This corresponds to each pedal yielding a 1.4% power error, considering only the uncorrelated component of the power error.

Pedal balance is, for example, right pedal power divided by the sum of the left and right pedal power. It's close to 50%. Consider the case where the right pedal measures 1% high. Then instead of 1.00 / (1.00 + 1.00) = 50%, I get 1.01 / (1.00 + 1.01) = 50.25%.

So the error in L-R pedal balance is approximately 1/4 of the error in the difference of L and R power, to first order. The error in the difference of L and R power is the square root of 2 times the individual pedal errors, each of which is in turn the square root of 2 times the total uncorrelated error, which I am assuming is 1%. So the error in the difference is on order 2% of each pedal's individual power. I therefore conclude the net error in L-R balance is around 0.5%. So if it says 50.0%, the reality could be between 49.5% and 50.5%.
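The chain of factors can be written out as a short Python sketch. It assumes, as above, a 1% uncorrelated error in total power and a rider close to 50:50; the function names are mine.

```python
import math

def right_balance(left_power, right_power):
    """Right pedal power as a fraction of total power."""
    return right_power / (left_power + right_power)

def balance_error(uncorrelated_total_error):
    """First-order error in L-R balance near 50:50.

    Assumption: both pedals carry equal power with equal, independent
    fractional errors.
    """
    pedal_error = math.sqrt(2.0) * uncorrelated_total_error  # about 1.4%
    difference_error = math.sqrt(2.0) * pedal_error          # about 2%
    return difference_error / 4.0                            # about 0.5%

one_percent_high = right_balance(1.00, 1.01)  # about 0.5025
net_error = balance_error(0.01)               # about 0.005
```

The first value reproduces the 50.25% example, where a 1% error on one pedal moves the balance by a quarter of that. The second gives the 0.5% error bar on the reported balance.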

Given these error bars, it seems the difference between the Vector and the Quarq is quite significant. It also suggests that DC Rainmaker's asymmetry is real: he's pedaling slightly left-dominant. But it's fairly close. I can't have enormous confidence that DCR's not perfectly symmetric. I can conclude the asymmetry reported by Quarq is not a power asymmetry.

Thursday, August 1, 2013

Attractivity Classification: Tour of Poland Beta-Fail

It's a general principle in cycling that riders, or teams, shouldn't be punished for riding faster. Cycling's a fairly simple sport: faster = better. Or at least it has been until the 2013 Tour of Poland.

Today's stage was the first one where it really rose to people's attention. Here's the CyclingNews results. The "attractivity" ranking is the sum of points for intermediate sprints and KOM points. Riders are ranked on these points, with first place getting a 30 second deduction, second place a 20 second deduction, and third a 10 second deduction from overall time. Ties are resolved optimistically: if two riders tie for first, they each get 30 seconds, they don't share the points for 1st and 2nd (which would be 25 seconds each), which would make more sense.

But that aside, the use of the daily ranking to assign points makes for some non-obvious strategies. Consider the case where rider A is ahead in GC. Rider B is 29 seconds down on GC. Rider C, teammate of rider B, is out of GC contention. There's two sprints in the stage, worth 3, 2, and 1 point for the top 3 places, and no KOM's. Rider Z is also out of GC contention.

First sprint goes as follows:

rider B: 3 points
rider C: 2 points
rider X: 1 point

Then comes the second and final sprint. Rider C gets in a break with two other riders, Y and Z, both teammates of rider A and neither in GC contention, while rider B, C's teammate, is in the main pack with rider A. Rider B needs to win the attractivity ranking to take the GC lead (assuming he isn't able to finish with a time gap on rider A). But his teammate, rider C, is in the breakaway, and if rider C finishes first or second in this sprint, he will relegate rider B to at best 2nd place. So rider C needs to finish third in this sprint. However, riders Y and Z want him to finish 1st or 2nd to preserve their teammate's GC lead. So the result is the three riders in the break go into a track stand on the sprint line until they are overtaken by the pack.
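The scenario can be checked with a short Python sketch of the bonus rule. The tie handling follows the description above: tied riders each get the full bonus for the place they tie for. How places after a tie are counted isn't stated, so the sketch assumes they skip ahead (two riders tied for 1st are followed by 3rd). The function name and that skipping rule are my assumptions.

```python
BONUS_SECONDS = {1: 30, 2: 20, 3: 10}

def attractivity_bonuses(points):
    """Map rider -> time bonus in seconds, given rider -> daily points.

    Tied riders all receive the bonus of the place they tie for.
    Assumption: the place after a tie skips ahead, and riders with no
    points receive no bonus.
    """
    bonuses = {}
    for rider, pts in points.items():
        if pts <= 0:
            continue
        rank = 1 + sum(1 for other in points.values() if other > pts)
        bonus = BONUS_SECONDS.get(rank, 0)
        if bonus:
            bonuses[rider] = bonus
    return bonuses

# After sprint one: B 3, C 2, X 1. Second sprint contested by C, Y, Z.
c_finishes_third = attractivity_bonuses(
    {"B": 3, "C": 2 + 1, "X": 1, "Y": 3, "Z": 2})
c_finishes_first = attractivity_bonuses(
    {"B": 3, "C": 2 + 3, "X": 1, "Y": 2, "Z": 1})
```

If C is third in the second sprint, B ties for first on 3 points and collects 30 seconds, enough to erase a 29 second deficit. If C wins the sprint, B drops to second and collects only 20 seconds, which is why everyone in the break wants to be the one who crosses the line last.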

This is an extreme example, but it highlights how the new rule, by virtue of its complexity, can result in bizarre tactical scenarios. Really, the UCI has better things to do than to turn stage racing into a mutant points race, and if the problem I described wasn't as obvious to them as it was to me, then the UCI needs new management. But at the least, this attractivity ranking needs to go.