Of course, power meter output for the same input power should not vary between rides. But how would anyone check without having access to a calibrated pedaling robot?
One-on-one comparisons between different power meters are always difficult to interpret. They might perfectly match (which they usually won’t) and you’d still not know whether they are right. More often, they will differ and you won’t know which one is off by how much, just that at least one of them must be off.
Using scatter plots as described in an earlier post, we can use a linear fit to get a single value describing the difference between two power meters. We can do the same for the left and right pedals separately, but keep in mind that spider-based meters like the P2M (or hub-based meters) can only provide an estimate of the left/right split.
The slope of the linear fit is not a statistically rigorous basis for comparison here: there are open questions such as whether zero readings should be left out, whether the fit should be forced through the origin of the axes, and whether it should be linear at all. Still, it provides a way to look at the overall trend.
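To make the fitting choices above concrete, here is a minimal sketch in Python with NumPy. It is not the exact procedure used for the plots in this post; the function names and the sample values are made up for illustration. It shows a fit forced through the origin next to an ordinary fit with an intercept, each with the option to drop zero readings.

```python
import numpy as np


def _clean(x, y, drop_zeros):
    """Convert to float arrays and optionally drop samples where either meter reads zero."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if drop_zeros:
        keep = (x > 0) & (y > 0)
        x, y = x[keep], y[keep]
    return x, y


def slope_through_origin(x, y, drop_zeros=True):
    """Least-squares slope k of y = k * x (fit forced through the origin)."""
    x, y = _clean(x, y, drop_zeros)
    return float(np.sum(x * y) / np.sum(x * x))


def slope_with_intercept(x, y, drop_zeros=True):
    """Least-squares slope and intercept of y = k * x + b."""
    x, y = _clean(x, y, drop_zeros)
    slope, intercept = np.polyfit(x, y, 1)
    return float(slope), float(intercept)


# Made-up samples in watts, NOT data from the rides described here:
# meter B reads exactly 5% higher than meter A.
meter_a = np.array([0.0, 100.0, 150.0, 200.0, 250.0, 300.0])
meter_b = 1.05 * meter_a

k = slope_through_origin(meter_a, meter_b)
print(f"meter B reads {(k - 1) * 100:.1f}% higher than meter A")
```

With real ride data the two fits will generally give different slopes, and comparing them is a quick way to see how much the choices above matter.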
I collected data over several rides, some of them on the same day, with each ride starting with a zero-offset calibration of both meters, and got this:
The total instant power of the two meters differs by 3.5% on average, which could be within the manufacturers’ claims. That it may differ by up to 6.4% is harder to accept, especially considering that this data was taken in a climate-controlled indoor environment without large temperature changes. Both meters measure left power and right power independently. The left meters match almost perfectly, differing by only 0.4% on average, although the s.d. is a bit larger; the right meters agree less well, differing by 6.3% on average and 14.1% at most. (Note, though, that in both cases where a ride differed by more than 10%, the second ride agreed much better, which might indicate some kind of convergence, although the left side shows the opposite trend…) Although both meters report different values for instant and average power, the comparison result is similar for either.
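The average, s.d. and maximum quoted above are summaries over per-ride values. As a rough sketch of how such a summary could be computed from one fitted slope per ride (the helper name and the example slopes are hypothetical, and using the sample standard deviation is my assumption, not necessarily what was used for the numbers in this post):

```python
import numpy as np


def summarize_ride_differences(slopes):
    """Summarize per-ride slopes (meter B / meter A) as percent differences."""
    diffs = (np.asarray(slopes, dtype=float) - 1.0) * 100.0
    return {
        "mean_pct": float(np.mean(diffs)),
        # Sample standard deviation (ddof=1); an assumption, see the text above.
        "sd_pct": float(np.std(diffs, ddof=1)),
        "max_abs_pct": float(np.max(np.abs(diffs))),
    }


# Hypothetical per-ride slopes, NOT the values measured in this post.
example_slopes = [1.02, 1.04, 1.06]
summary = summarize_ride_differences(example_slopes)
print(summary)
```

With only a handful of rides, the s.d. is a rough indication at best, which is one more reason to read these numbers as a trend rather than a specification.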
Comparing the Vector2 with the P2M shows that, for total power, the variance between rides is slightly smaller than in the previous comparison, although the difference itself is slightly larger: 7.3% on average, with the Vector2 always reading at least 4.5% higher, which is not nice. The estimated left/right balance is pretty stable across rides, although that doesn’t have to mean anything. And while the left Vector pedal matches almost perfectly, the right Vector pedal again reads higher than the power meter it is compared with.
Conclusion:
Even calibrating immediately before a ride will not eliminate bias between power meters. A difference of up to about 5% doesn’t seem unusual, at least for the compared pairs.
The right Vector pedal used in this comparison reads too high compared with both the Pioneer and the P2M. Note that the Vector had been calibrated using a known weight just days before, and the right Vector was scaled to about 98% of its original output value. I may need to talk with their support about this issue.