Thoughts on choosing pedals, shoes and insoles

A few thoughts from my experience using SPD, SPD-SL, Look and Speedplay pedals and shoes.

Clipless pedals

From a biomechanical viewpoint, a low stack height is likely more efficient for force transfer from the legs to the pedals, one reason being that it requires less force to stabilize the foot while pedaling. From a bike fitting perspective, a cleat that is easier and more systematic to adjust makes it easier to find the best cleat position. Both conditions are best met by Speedplay pedals, especially with 4-hole shoes (for low stack height without using 3-hole to 4-hole adapters), thanks to their independent lateral and longitudinal adjustments and independent inside and outside float limiter screws.

An unexpected merit of the Speedplays was their smooth float, which feels almost frictionless. Though this would seem to require additional muscle force to stabilize the ankles against unwanted rotation, what I noticed was that I was actually applying more unneeded force with other pedals, and using Speedplays helped me find a different, more relaxed and probably more efficient pedaling style. The conventional Speedplay cleats are bare metal and extremely slippery, but there are also some newer “aero” cleats that claim to be walkable. Also note that Speedplay cleats can collect dirt and clog; covers are recommended when walking.

SPD cleats are least slippery as they fit in the center surrounded by sole rubber. In order to really be comfortably walkable, the soles should – unless you are fine with the hard soles of trekking boots – flex at least upwards (like a “U”) but not in the other direction (which would reduce downward energy transfer). Pearl Izumi’s X-Project MTB shoes were the only ones I could find so far that met this condition.

The other pedals – SPD-SL, Look – I found similar in ergonomics. Also note that in some situations and for some kinds of pedaling practice, platform pedals can be more useful than clipless pedals.

Cleat holes

I have just slightly longer toes, which requires choosing slightly larger off-the-shelf shoes than otherwise needed. If the cleat holes don’t reach far enough back, the cleats will be too forward, putting too much strain on the calves just to stabilize the foot angle during pedaling. Some manufacturers (like Shimano) have holes that reach far enough backwards, some (like Northwave and Pearl Izumi) don’t. (Some people argue that for constant, lower cadence pedaling, mounting the cleats backwards in the center of the feet is more efficient.)

At some time in the future I will probably get completely custom made-to-order shoes. Until then I will have to continue drilling my own cleat holes. Shimano SPD-SL and Look have long holes in the cleats and only require round holes in the soles, which are easy enough to drill. SPD cleats adjust only sideways and require long holes for longitudinal adjustment; at least they require only two holes, which are close together. Speedplay has some extenders that allow more adjustment than their regular 3-hole to 4-hole adapters, but using adapters increases stack height. If you don’t want to use adapters, Speedplay cleats require a lot of holes to be drilled that are also farther apart and therefore more difficult to place exactly right, and, in order to be adjustable, all those holes need to be long holes. In this respect, Speedplay can be a nightmare.


The Northwave S.L.W wire dial system is nice as it can also be loosened in single steps. The more common B.O.A wire dial system can only be unlocked. Ratchets can be loosened while riding but often not in single steps. All of them are easier to adjust exactly than velcros though it may be debatable whether they are worth their cost.


I tried a lot of shoes and the heat-molding Shimano insoles (and shoe upper parts) have been unbeatable. If my feet were closer to what shoe makers think is average, I’d probably not need these. If I had knee issues or similar, this would probably not be enough and I’d need to ask a professional for customized insoles.

Pedal reflectors

It seems most roadies would rather be killed than add pedal reflectors to their bikes. Shimano has optional reflectors for most of their pedals. Northwave has some shoes which are really reflective.

Pedal-based power meters

Garmin’s Vector 2 can be fitted into Shimano Ultegra pedals. They have a pod hanging on one side of the cranks that could be damaged during transport or off-road, less so on-road. Some of their advanced metrics and the way they visualize the data on their Garmin Edge computers are not very useful, but I find at least torque efficiency to be useful when adjusting my position. The Powertap P1, while convenient to swap between bikes, has so much stack height that I have to adjust saddle height and change my pedaling style. It is also not user serviceable, meaning it has to be sent back for maintenance, which is costly and leaves you without your power meter. Their refreshed iOS app visualizes some of their pedaling analysis but is not very legible. The P1 might be the only current pedal meter that could visualize or record foot angle – at some time in the future.

Combinations I currently use

  • Shimano RP-9 for use with Garmin Vector 2 power meter fitted to Ultegra pedals

Somewhat boring combination but fits well and reflectors make me more visible at night. Downsides: Pods. Not walkable.

  • Shimano SH-R320 with Look cleats for use with Powertap P1 power meter pedals

Easy to attach to rental bikes to get power data. Downside of this combination: a lot of stack height, no reflectors, power meter transmission unreliable when riding on trainers in hotel rooms that have a lot of wifi interference.

  • Pearl Izumi X-Project 2.0 shoes with Shimano SPD cleats and Shimano MTB pedals

Good: relatively comfortable to walk in. SPD pedals can be fitted with reflectors, although that makes them single sided. Best for touring when you also need to walk. Downsides: Cleats available only with 6 degree float – would love zero float for these. Can be difficult to release when climbing at extremely low cadence, though this might just be me. Pearl Izumi cleat holes don’t reach back enough.

  • Speedplay zero with Northwave 4 hole-compatible shoes

Good: shoes are really reflective. Downsides: Drilling that many cleat holes can be a pain (if the original ones don’t fit).

Future plans

A few years ago I tried Bont shoes at a shop, but without molding them, and they didn’t convince me then. I might try them again, as their form might better fit longer toes and they have heat-moldable soles and uppers. Beyond that, semi or full custom shoes made using molds of your feet can be had from Bont, Rocket7 or Riivo and cost somewhere between USD1000 and USD1500. Anyone with experience in those? Other than that, with all the 3D scanning and printing technologies being developed now, fully custom might get a lot cheaper in the near future.

A few thoughts on choosing power meters

When I bought my first power meter – a Power2max, later renamed “classic” to distinguish from subsequent models – some four years ago, I mainly depended on DC Rainmaker’s blog to guide my choice. In the days since, I have tried five more power meters – a Pioneer, two Power2max (S and NG), Vector 2 and P1 – out of engineering interest. The first one now has been retired because of unexplainable spikes that appear only outside when going downhill, but on the way, I learned a few things that are not often mentioned.

Accuracy and Precision

I’ve written about this in some other posts already so I will spare the details here and just summarize: While most manufacturers make some accuracy claims about their products, there is currently no standard about how accuracy is measured. This means that a statement like “within 2%” doesn’t say more than that there was at least one testing condition under which it showed less than 2% error in comparison to some reference about which we don’t know much, and it isn’t comparable between manufacturers. No manufacturer I know of makes statements about the delay of its power meters, and crit racers seem to complain when they accidentally switch to a power meter with more delay than their previous one. If the “delay” is the same for power increases and decreases, it wouldn’t change the overall data, but often those characteristics differ, and together with some built-in averaging, short bursts can get swallowed.
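To make the last point concrete, here is a minimal Python sketch of how averaging can swallow a short burst. The 3 second window and the power values are assumptions chosen for illustration; they do not describe the filtering of any particular power meter.

```python
def moving_average(samples, window):
    """Trailing moving average; early samples average over what is available."""
    smoothed = []
    for i in range(len(samples)):
        start = max(0, i - window + 1)
        chunk = samples[start:i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

# Hypothetical 1 Hz power trace: steady 200 W with a single 1 second burst of 800 W.
true_power = [200, 200, 200, 800, 200, 200, 200]
smoothed = moving_average(true_power, window=3)

print("raw peak:", max(true_power))
print("peak after 3 s averaging:", max(smoothed))
```

The total work stays roughly the same, but the reported peak is much lower than the actual burst, which is exactly what matters for sprint analysis.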

When comparing power meters, graphing them separately on a power vs time graph of averaged power only allows for rough visual checking, meaning that you’d notice only if it’s really far off – and it can be difficult to spot trends. Using a peak power graph doesn’t make a scientific comparison either – it will show how much a peak power graph will be off but not much more.

A scatter plot graphing the output of one power meter against that of another is slightly more scientific as it makes spotting trends easier, although you need to be aware of the fact that the sampling isn’t synchronized so the dots won’t line up perfectly – still, how they distribute can give some hints about the behavior of each meter. If you draw a trend line for all dots and that line has an angle of 45 degrees, you know that both meters are accurate relative to each other. If the line is more like a curve, you know that at least one of them is not really responding linearly to your power. If all dots are very close to the trend line, both meters have high precision (i.e. small random noise) and a similar delay. If really a lot of the data points are far from the trend line, it’s probably not just a sync issue or a delay issue but really noise in the data of at least one of the compared meters. A (cumulative) histogram of the power difference between two meters can also tell more about the meters than a time sequence comparison.
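The trend line idea can be sketched in a few lines of Python. This is only an illustration using an ordinary least-squares fit; the paired readings are made-up numbers, and the function name `compare_meters` is hypothetical. A slope of 1 corresponds to the 45 degree line, and the scatter of the residuals is a rough indicator of combined noise and delay differences.

```python
from statistics import mean, pstdev

def compare_meters(a, b):
    """Least-squares line b = slope * a + intercept, plus the scatter around it."""
    mean_a, mean_b = mean(a), mean(b)
    sxx = sum((x - mean_a) ** 2 for x in a)
    sxy = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    slope = sxy / sxx
    intercept = mean_b - slope * mean_a
    residuals = [y - (slope * x + intercept) for x, y in zip(a, b)]
    return slope, intercept, pstdev(residuals)

# Hypothetical paired readings in watts; meter B reads exactly 2% higher than meter A.
meter_a = [100, 150, 200, 250, 300]
meter_b = [102, 153, 204, 255, 306]

slope, intercept, scatter = compare_meters(meter_a, meter_b)
print(f"slope={slope:.3f} intercept={intercept:.2f} scatter={scatter:.2f} W")
```

With real data the samples of both meters would first have to be aligned in time as well as possible, since the sampling is not synchronized.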

A potential problem with left/right separate meters is that their left and right gauges cannot be calibrated against each other, so there may be a bias in the reported balance that goes unnoticed. A combined meter that reports pseudo balance based on down- and upstroke cannot report the true balance but does not suffer from separate calibration; on the other hand, a biased sensor for crank position could be, though less likely, a problem. Ideally one would combine separate and combined strain gauges and calibrate the three sensors continuously against each other.

The ANT+ power meter protocols – there are many depending on power meter type (i.e. mainly location), complicating things – allow for sending data multiple times per second. SRM used to utilize this in both their power meters and head units, but recently they seem to have dialed back on this feature. More frequent data points may help in getting a more accurate picture of short bursts.

You will probably use several different power meters over your lifetime and sometimes you will want to look at long term trends, so, while some argue that precision (meaning little random noise) is more important than accuracy (the average having no bias from true value), I think that’s only half the truth. A power meter really should be both precise and accurate. But by how much?

In my experience, over the long term, the difference between my power meters falls within 2%, but for single rides the difference could be 5%, or in rare cases even 10%. That’s a lot. So, don’t believe in or depend too much on precision, accuracy or (lack of) latency.


In rotational systems, power is calculated as the product of torque τ and angular velocity ω. Both vary over a crank rotation and need to be measured multiple times per rotation. Angular velocity is commonly sensed using acceleration sensors and an optional magnet to improve accuracy. Torque is sensed by strain gauges that measure the length of the underlying material. That length changes due to thermal expansion and applied forces.
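As a worked example of the relation between torque, angular velocity and power, here is a small Python sketch. The torque and cadence values are hypothetical, and averaging a handful of samples is a simplification of what a real power meter does over a crank rotation.

```python
import math

def crank_power(torque_nm, cadence_rpm):
    """Power in watts from torque (N*m) and cadence (rpm): P = torque * omega."""
    omega = 2 * math.pi * cadence_rpm / 60  # angular velocity in rad/s
    return torque_nm * omega

def mean_power(samples):
    """Average power over several (torque, cadence) samples of one rotation."""
    return sum(crank_power(t, c) for t, c in samples) / len(samples)

# Hypothetical values: 30 N*m at 90 rpm.
print(round(crank_power(30, 90), 1), "W")

# Torque varies over the rotation, so several samples are averaged.
print(round(mean_power([(20, 90), (40, 90)]), 1), "W")
```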

The first calibration step is a zero reset, i.e. determining the readout of the strain gauges without any applied weight at the current temperature, which deals with thermal expansion. This is usually enough for day-to-day calibration.

To be complete, zero reset should be followed once in a while by checking how the material reacts to applied forces. Due to structural changes in the material – for example metal fatigue in crank arms – and chemical changes in the glue that attaches the strain gauges or some errors during production that affect long term durability, this might change from what was measured and stored in the power meter during factory calibration.

A few power meters (like SRM, Garmin and Pioneer) can display on their respective head units either the currently measured torque or force. This means that you can hang your own static, calibrated weights to check accuracy and then store a linear correction parameter if necessary. This is generally good, but depending on the characteristics of the inaccuracy, a single linear factor might for example undercorrect for some power ranges while overcorrecting for others and not be sufficient to correct completely. If the power meter/head unit does not support this, the best you can do is compare with another simultaneously mounted power meter and, on suspicion, send it in to the manufacturer, but that means you won’t have your power meter for a few weeks every few years.
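A static weight check boils down to comparing the displayed torque with the torque expected from mass, gravity and crank length. The Python sketch below assumes a horizontal crank arm, a 172.5 mm crank and made-up displayed values; the function names are hypothetical and not part of any head unit's interface.

```python
G = 9.80665  # standard gravity in m/s^2

def expected_torque(mass_kg, crank_length_m):
    """Torque in N*m from a mass hanging on a horizontal crank arm."""
    return mass_kg * G * crank_length_m

def correction_factor(mass_kg, crank_length_m, displayed_torque_nm):
    """Single linear factor that maps the displayed torque to the expected one."""
    return expected_torque(mass_kg, crank_length_m) / displayed_torque_nm

# Hypothetical check with two calibrated weights on a 172.5 mm crank.
light = correction_factor(10, 0.1725, 16.75)
heavy = correction_factor(20, 0.1725, 33.20)
print(round(light, 4), round(heavy, 4))
```

If the two factors differ noticeably, as in this made-up example, a single linear correction cannot fix both ends of the range at once.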

Temperatures may change during a ride. One approach is to trigger automatic zero resets in moments where there is no force applied. Obviously this can go wrong if the power meter misjudges whether or not there is force. The other approach is to have a built-in temperature sensor and a look-up table so that it learns over time what the zero reset values for any temperature should be so that it can choose values from the look-up table.
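The look-up table approach could look roughly like the following Python sketch. The 1 °C buckets, the nearest-bucket fallback and the class name `ZeroOffsetTable` are assumptions for illustration; how actual power meters store and interpolate these values is not documented here.

```python
class ZeroOffsetTable:
    """Learns zero offsets per temperature bucket (hypothetical 1 degree C buckets)."""

    def __init__(self):
        self._table = {}  # bucket in whole degrees C -> learned zero offset

    def learn(self, temperature_c, offset):
        """Store the offset measured during a zero reset at this temperature."""
        self._table[round(temperature_c)] = offset

    def lookup(self, temperature_c):
        """Return the offset of the nearest known bucket, or None if nothing is learned."""
        if not self._table:
            return None
        nearest = min(self._table, key=lambda t: abs(t - temperature_c))
        return self._table[nearest]

table = ZeroOffsetTable()
table.learn(20.4, 512)  # hypothetical raw strain gauge readouts
table.learn(5.0, 498)
print(table.lookup(18), table.lookup(7))
```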

Advanced metrics

While some won’t stop noting that advanced metrics haven’t been proven scientifically to benefit cycling performance or only in rehabilitation after some injuries, it’s mainly a question of which metric, what purpose and the capabilities of the user in utilizing the data. If you love trusting and doing what your coach or some book tells you and trust your LBS or fitter with finding the best position for you, then you might not need any of this.

But if you are scientifically minded and think that just using your body and not your brain defeats the purpose of being alive, don’t let yourself be discouraged from choosing a power meter that will give you more data, though of course you have to decide which data you want/need.

The Pioneer pedaling monitors can record and visualize not only tangential force but also radial force though only on their own head units and support some interesting visualizations on their cyclosphere portal. This is somewhat similar to what some bike fitters use and can be really useful when optimizing your position on your bike. Unluckily exporting that data into other formats for your own analyses is not supported.

The torque efficiency supported by left/right separate power meters like the Vector and P1 pedals and the Pioneer in ANT+ compatibility mode has been useful for me as well, while the power phases of the Vector and pedaling smoothness less so.

The P1 supports some potentially useful graphical visualizations in their iOS app, but the readability and post-ride analysis support leave a lot to be desired at this moment, although it seems they are still working on this.

Torque efficiency can usefully be measured on a left/right separate system. The spider-based Power2max NG reports a combined torque efficiency which I haven’t figured out yet how to use.

Verve and Rotor could be worth looking into but require PC’s for real time analysis and, I believe, recording.

Single leg drills

The P1 didn’t work when pedaling right only as their left pedal which sends out the combined ANT data used to power down. In my most recent ride I noticed it seems to report power to a Garmin Edge during right only drills though cadence disappeared.

Sitting vs standing

The Vector and the Pioneer can differentiate between sitting and standing pedaling when combined with their respective head units. (I still need to check whether that works on a turbo trainer as well.)


There are some conceptual and usability differences depending on the location of power meters. A spider, crank or pedal-based system can be less accurate at low cadences, while a hub-based system can be less accurate – I think, but I might be wrong about this – at slow wheel rotations. If you are only interested in tangential forces, the Rotor in-power left side crank with strain gauges in the BB axis is possibly the best location for the left side, followed by spider and hubs for combined measurement. Pedals are a difficult location to engineer well, and cranks can be difficult if not designed specifically for power metering as with Verve cycling’s Infocranks. More obvious is the difference in measuring before power train loss (using spider, crank or pedal-based meters) or after (using hubs) – and the power train loss can be significant.


A crank or spider is naturally maintenance free except for checking and if necessary calibrating long term changes over several years. The Vector pedals can be serviced by the user to the point of swapping the internals into Ultegra pedals using their optional kit.

The Powertap pedals and hubs need servicing to be done at the manufacturer – especially the pedals are a really complicated construction. Pedals are never maintenance free, so you should think twice before getting some that need to be sent back for maintenance, or at least include fees (the pedals have a two year warranty except for the bearings, which are covered only for six months) and downtime in your cost projection. The issue then is that they don’t have a clear maintenance cost table (yet).


Waterproofing can be an acute issue in heavy rain and a corrosive issue over the long term. Ideally you’d have a hermetically sealed system with inductive charging and wireless firmware updates. Unluckily, that’s not available (yet) so we have to live with compromises.

Some users seem to have problems with the P1 acutely in heavy rain and with humidity over the long run; I myself have occasionally noticed condensation in the battery compartment. (Note to P1 fanboys: No, I have never done a battery change in the rain.) The contact between the Vector pods and pedals could potentially be an acute issue in heavy rain. Same with the design of the rubber band sealing on the battery compartment of the classic and type S Power2max meters though I believe their electronics is sealed. Their newest “NG” charges over USB and the rubber cap is flimsy but I believe the USB port itself is water proof – although USB ports are known to corrode over time if kept wet.


The Vector pods are a potential liability, not just off-road but also when transporting the bike and I really wish for some design changes. The Pioneer magnets (which are not simple magnets but a less common design and required in pedaling monitor mode) can be stripped from the frame when the chain drops – and stick to the chain as I experienced. If you don’t notice that when fixing the chain, I imagine it might play badly with your rear derailleur when you continue riding.


Connect IQ app description: Live Peak Power Graph data field for Edge

I have been using a few self-made Connect IQ apps on my Garmin Edge cycling computers for over a year now as it doesn’t take much to write a few lines and side-load them onto one’s Edge over USB to try out. Debugging is less convenient than, say, when developing a PC application in Visual Studio, but compared with the early days of Connect IQ it has become much more stable and enjoyable now.

One of my apps – the one above – has become my go-to data field for indoor rides. It helps me judge how hard I am pushing myself during a training session compared with my past performance, providing motivation to go hard where meaningful – but also to go slow when recovery is needed. I do believe it helped me reach my goal for 2016 – an FTP of 4W/kg.

In short, it shows the Peak Power of the current ride from start until the current moment, together with all-time Peak Power and a graph of live average power values. To my knowledge, this is the first time this has been implemented in real time. A world’s first!

As this takes up all of the screen of the Edge 520/820, I have also added some of the other basic information onto the same screen.

(Don’t ask me why in this screenshot speed is zero and cadence is an aggressive 186 rpm while I am putting out 96W: These screenshots were captured in the simulator software which allows convenient debugging on a PC, so no one was harmed by that cadence. Still, of course, you could ask Garmin why they think these values make sense – but maybe they thought it’s important to test especially those rare freak cases that should not occur?)

I’ve now put that data field on Garmin’s Connect IQ app store for anyone owning a Garmin Edge 520/820/1000 cycling computer to play with. As with most Connect IQ apps, it’s free for everyone to download and use. The concept behind it, namely displaying and using Peak Power live while riding isn’t common – yet – and possibly slightly more mathematical than the general public would appreciate, so, here are some simple explanations of the thoughts behind it and a few hints about how to use it.

Peak Power

Peak Power (sometimes also called Maximum Average Power or Maximum Mean Power – note that average and mean have slightly different definitions in mathematics and if I had to choose now I’d vote for the former) is the largest average power for some specified interval duration. For example, if you take all possible 20 second intervals of a ride, i.e. from 1s to 20s, 2s to 21s, 3s to 22s, etc, calculate the average power of each, and then take the largest of those, that number is the 20 second Peak Power of that ride. If you are interested in your sprint capabilities, you’ll look at short duration peak power; if you are interested in your endurance, Peak Power over longer durations will give you some guidance.
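The definition translates almost directly into code. The following Python sketch assumes one power sample per second and uses a made-up six second ride; it slides a window of the given duration over the ride and keeps the largest average.

```python
def peak_power(samples, duration):
    """Largest average over any window of `duration` consecutive 1 s samples."""
    if duration <= 0 or duration > len(samples):
        return None
    window_sum = sum(samples[:duration])
    best = window_sum
    for i in range(duration, len(samples)):
        # Slide the window by one second: add the new sample, drop the oldest.
        window_sum += samples[i] - samples[i - duration]
        best = max(best, window_sum)
    return best / duration

# Hypothetical ride of six seconds, power in watts.
ride = [100, 300, 500, 400, 200, 100]
print(peak_power(ride, 1), peak_power(ride, 2), peak_power(ride, 3))
```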

One well known example actually is Functional Threshold Power (FTP), which is not an invention but simply a name for 60 minute Peak Power. It is often approximated by multiplying your 8 minute or 20 minute Peak Power by some constant to correct for the fact that you would be stronger over such short intervals, on the assumption that that constant is the same for all cyclists.
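As a small arithmetic example, the Python sketch below scales a 20 minute Peak Power by a constant. The factor 0.95 is a commonly quoted value for 20 minute efforts and is used here as an assumption, not as a recommendation; the power and body weight are made up.

```python
def estimate_ftp(peak_power_w, factor):
    """Scale a shorter-duration Peak Power down to an estimated 60 minute value."""
    return peak_power_w * factor

def watts_per_kg(power_w, body_weight_kg):
    """Power-to-weight ratio in W/kg."""
    return power_w / body_weight_kg

twenty_min_peak = 280  # hypothetical 20 minute Peak Power in W
ftp = estimate_ftp(twenty_min_peak, 0.95)  # 0.95 is an assumed, commonly quoted factor
print(round(ftp), "W,", round(watts_per_kg(ftp, 66.5), 2), "W/kg")
```

Two riders with the same 20 minute Peak Power get the same estimate here, even if their real 60 minute power differs, which is the weakness of using one constant for everyone.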

It is sometimes useful to look at the long term changes of some specific Peak Power to capture a hint of long term trends.

The Peak Power Graph

If you calculate Peak Power for all past rides for all possible intervals from 1s up to, say, 1 hour, and plot them on a graph where the horizontal axis starts from 1s on the left and goes up to the longest interval on the right, the Peak Power will form a decreasing curve.
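A Peak Power curve over several rides can be sketched in Python as follows. It assumes 1 second samples, uses prefix sums so that each window average is cheap to compute, and works on two tiny made-up rides; real tools will use far more durations and data.

```python
from itertools import accumulate

def peak_power_curve(samples, durations):
    """Peak Power of one ride for each duration that fits into the ride."""
    prefix = [0] + list(accumulate(samples))  # prefix[i] = sum of first i samples
    curve = {}
    for d in durations:
        if d > len(samples):
            continue
        best = max(prefix[i + d] - prefix[i] for i in range(len(samples) - d + 1))
        curve[d] = best / d
    return curve

def all_time_curve(rides, durations):
    """For each duration, the best Peak Power found in any ride."""
    best = {}
    for ride in rides:
        for d, power in peak_power_curve(ride, durations).items():
            best[d] = max(power, best.get(d, 0))
    return best

# Two hypothetical rides, power in watts at 1 s intervals.
rides = [[100, 300, 500, 400, 200, 100], [600, 100, 100, 100]]
all_time = all_time_curve(rides, [1, 2, 3, 5])
print(all_time)
```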

Peak Power graph in Today’s Plan
Peak Power graph in Trainer Road

The Peak Power graph shows a lot more information than a single Peak Power value as it allows to judge, for example, what your specific strengths are, whether you are more a sprinter or an endurance athlete, whether you should add strength training or longer rides, whether you are making progress in what you are currently targeting, etc.

Of course it does not show everything. For example, this graph does not show how fast you recover from an effort and how often you can complete an interval before exhaustion. There is also not one absolute truth about what a Peak Power curve of, say, a sprinter should look like, but Velo magazine from August 2015, for example, suggests these examples.

from Velo Magazine Aug 2015

Another way to look at the Peak Power graph could be to divide the interval lengths into zones depending on the dominant metabolism, i.e. the main physiological systems contributing (or limiting) the effort, of the three metabolic energy pathways:

  • The phosphagen system that regenerates ATP (adenosine triphosphate), the energy source of all physical activity, from stored CP (creatine phosphate) without using either carbohydrates or fat.
  • Glycolysis, which converts carbohydrates from either blood glucose (sugar) or muscle glycogen (stored in muscles). In the presence of enough oxygen (aerobic), an intermediate product called acetyl coenzyme A will be further processed into more ATP, but if oxygen supply is insufficient (anaerobic), it will be converted into lactate.
  • The Aerobic System uses everything, blood glucose, glycogen and fat as fuels and is most efficient (producing 18 times more ATP from each glucose molecule than anaerobic glycolysis) but is the slowest.

These three – or four if we divide Glycolysis into anaerobic and aerobic – roughly map to interval duration as follows:

  • 5 to 10 seconds: Neuromuscular – Phosphagen system.
  • 30 seconds to 2 minutes: Anaerobic Capacity – Anaerobic glycolysis.
  • 3 to 8 minutes: VO2 max – Aerobic and anaerobic glycolysis.
  • above 10 minutes: Endurance – Aerobic system.
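The mapping in the list above can be written as a small lookup, sketched here in Python. The boundaries are taken from the list and treated as inclusive, which is an arbitrary choice; durations that fall between the listed ranges are reported as transitions rather than forced into a zone.

```python
# (low seconds, high seconds, label) taken from the list above.
ZONES = [
    (5, 10, "Neuromuscular (phosphagen system)"),
    (30, 120, "Anaerobic capacity (anaerobic glycolysis)"),
    (180, 480, "VO2 max (aerobic and anaerobic glycolysis)"),
    (600, float("inf"), "Endurance (aerobic system)"),
]

def dominant_system(duration_s):
    """Return the zone label for an interval duration given in seconds."""
    for low, high, label in ZONES:
        if low <= duration_s <= high:
            return label
    return "transition (no single dominant system listed)"

for seconds in (8, 20, 60, 300, 3600):
    print(seconds, "s:", dominant_system(seconds))
```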

These mappings are obviously just rough guidelines and neither clear cut nor exactly the same across individuals. Still, when I look at my peak power data in comparison with other cyclists – more about those comparisons later on – anaerobic glycolysis seems to be a weak point of mine while the other systems are average. And anaerobic capacity is known to decrease faster than aerobic with age, so, while this could be simply due to age, I also should do something about this.

The results of my last test of the respiratory exchange ratio (or RER), in which one cycles on a trainer in a step protocol with a mask connected to an apparatus that measures the amount of oxygen consumed and carbon dioxide generated, paint a slightly different picture. I found out that I already have a high RER of about 0.9 at rest which increases to about 1.0 between 50% and 120% FTP and then further to about 1.1 above 130% FTP, coinciding with a generally high level of lactate. From the chemical equations corresponding to the chemical reactions we learn that RER would be close to 0.7 for purely burning fat and 1.0 for purely burning carbohydrates, so, in my personal case, the Aerobic system is rarely really dominant.
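To get a feeling for what these RER values mean, one can interpolate linearly between the two end points mentioned above. The Python sketch below does exactly that; the linear interpolation and the clamping of values above 1.0 are simplifying assumptions, not a validated physiological model.

```python
def carbohydrate_fraction(rer):
    """Rough share of carbohydrate burning, linear between RER 0.7 and 1.0."""
    fraction = (rer - 0.7) / (1.0 - 0.7)
    # Values outside 0.7 to 1.0 are clamped; an RER above 1.0 is not explained
    # by fuel mix alone in this simple picture.
    return min(1.0, max(0.0, fraction))

for rer in (0.7, 0.9, 1.0, 1.1):
    print(rer, "->", round(carbohydrate_fraction(rer) * 100), "% carbohydrates")
```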

The beauty of the Peak Power curve I think lies in the fact that it shows the pure data without all those assumptions and interpretations that are based on generalization that might or might not apply to you. It’s also free of rules of thumb that might have just accidentally survived. Of course it’s not omnipotent – it does not show everything. For example, this graph does not show how fast you recover from an effort and how often you can complete an interval before exhaustion.

Peak Power of a training ride – for post-ride analysis

Some tools visualize the Peak Power of a training ride – after completing and uploading the ride data – in comparison to all-time Peak Power. This can be useful to judge how well you have performed in the last training session, whether you have indeed pushed yourself to your limit or even bested a personal record – or whether you sucked. Or maybe it was just a tame regeneration ride.

Peak Power of a training ride compared with all time Peak Power in Cyclo-Sphere
Peak Power of a training ride compared with all-time Peak Power in Today’s Plan
Peak Power of a training ride compared with all-time (and monthly) Peak Power in Cycling Analytics

Live Peak Power of current ride

If the Peak Power of the current ride could be visualized while riding and in comparison to all-time Peak Power, it could be used not only for post-ride analysis but to adjust the riding immediately.

How did I perform in my last sprint? Did I achieve a new personal record? Or do I need to try harder? Maybe even give up and re-aim this ride as a regeneration ride? How was the new warm-up protocol I tried today? Did it allow me to perform better already, or should I add a few minutes of warming up at low power?

After a while of thinking, I found an algorithm to implement the calculations in a way that would fit even into the somewhat arbitrary 28KB limit that Garmin has for Edge 520 data fields and that would run in realtime even with the limited computing power of that device. It does sometimes skip a refresh cycle and not update every second, may consume more battery power than other data fields and sporadically fail to start at all, but overall it seems to work:
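The Python sketch below is not the algorithm used in the data field; it only illustrates why a live update is feasible at all. Each new sample creates exactly one new window per duration (the one ending now), so only that window has to be checked. The class name `LivePeakTracker` is hypothetical, and because the prefix-sum list grows with ride length, this naive version would not fit the memory limit mentioned above.

```python
class LivePeakTracker:
    """Tracks current-ride Peak Power for a fixed set of durations, sample by sample."""

    def __init__(self, durations):
        self.durations = durations
        self.prefix = [0]  # prefix[i] = sum of the first i samples (grows with the ride)
        self.peaks = {}    # duration in seconds -> best average so far

    def add_sample(self, watts):
        """Add one 1 s power sample and update the peaks it can affect."""
        self.prefix.append(self.prefix[-1] + watts)
        n = len(self.prefix) - 1
        for d in self.durations:
            if d > n:
                continue
            # Only the window that ends at the newest sample is new.
            average = (self.prefix[n] - self.prefix[n - d]) / d
            if average > self.peaks.get(d, float("-inf")):
                self.peaks[d] = average
        return self.peaks

tracker = LivePeakTracker([1, 2, 3])
for watts in [100, 300, 500, 400, 200]:  # hypothetical samples
    tracker.add_sample(watts)
print(tracker.peaks)
```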

The horizontal axis is the length of the interval from 1 second up to 3 hours. It’s a logarithmic scale as is common with Peak Power graphs, meaning that the resolution is higher for short durations – e.g. 1 second increments below 10 seconds – than for long durations – e.g. 1 hour increments above 1 hour, so that you can see details where needed but also have an overview of the long tail.
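A logarithmic axis of this kind can be computed as in the following Python sketch. The 200 pixel width and the function name `x_position` are assumptions for illustration; the actual drawing code of the data field is not shown here.

```python
import math

def x_position(duration_s, width_px, min_s=1, max_s=3 * 3600):
    """Map a duration to a horizontal pixel position on a logarithmic axis."""
    span = math.log(max_s) - math.log(min_s)
    return (math.log(duration_s) - math.log(min_s)) / span * width_px

WIDTH = 200  # hypothetical graph width in pixels
for seconds in (1, 10, 60, 600, 3600, 10800):
    print(seconds, "s ->", round(x_position(seconds, WIDTH)), "px")
```

On such an axis the step from 1 to 10 seconds takes up as much space as the step from 1000 to 10000 seconds, which is what gives short durations their higher resolution.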

The vertical axis is a normal linear scale, again as usual with Peak Power graphs. On the left, power is given in W, on the right, power-to-weight ratio is given in W/kg. For this to work, your body weight has to be set correctly in your rider profile.

I also noticed that I should make it clear when I improved a personal record, so I decided to color the area between the previous all-time record and the new record in red, as shown here:

That area can be rather small and difficult to notice if the improvement is, say, 10W out of 1000W or 1%, so I have now changed the app (V0.3.30 from Feb. 1st, 2017) and added coloring of the area below the all-time record (in light red) as well, as shown in this screenshot. This coloring works in two steps to indicate if you are close to a new record by changing the coloring from beige to light orange at 90% of the all-time peak before changing to light red at 100%.
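The two-step coloring rule can be expressed as a simple threshold function. This Python sketch follows the thresholds described above; the function name and the handling of a missing all-time value are assumptions.

```python
def fill_color(current_peak_w, all_time_peak_w):
    """Pick the fill color from the ratio of current-ride to all-time Peak Power."""
    if all_time_peak_w <= 0:
        return "beige"  # assumption: no all-time record yet, use the neutral color
    ratio = current_peak_w / all_time_peak_w
    if ratio >= 1.0:
        return "light red"
    if ratio >= 0.9:
        return "light orange"
    return "beige"

for current in (850, 900, 950, 1010):  # hypothetical values against a 1000 W record
    print(current, "W ->", fill_color(current, 1000))
```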

Graph of Power Averages

As soon as I started using an earlier version of this data field showing my current as well as all-time Peak Power, I noticed that there was something missing. What is my current performance compared with the current Peak Power? Am I on my way to improve on a personal record or should I bail out? How long have I been maintaining the current power level – both when pushing hard but also to judge when to end a refresh break between hard intervals?

I found that many of these questions could be answered by adding a curve showing average power values for all interval durations. On the left, it would start with the current instant power, which is the average for 1 second, and continue rightwards to the 2 second average, 3 second average, and so on, until the 3 hour average on the right end of the graph. Whenever this curve overlaps with the Peak Power of the current ride, it would mean that you are currently riding at the Peak Power of the current ride and about to improve that section of the Current Ride Peak Power if you push a little bit more. If you are far below the Peak Power of the current ride, you will need to build some momentum first before having a shot at improving the Current Ride Peak Power.
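The curve of power averages and its comparison with the current-ride Peak Power can be sketched in Python as below. The samples are made up, the helper names are hypothetical, and the brute-force peak calculation is only meant to keep the example short.

```python
def trailing_average(samples, d):
    """Average of the most recent d one-second samples."""
    return sum(samples[-d:]) / d

def ride_peak(samples, d):
    """Peak Power of the ride so far for duration d (brute force)."""
    return max(sum(samples[i:i + d]) for i in range(len(samples) - d + 1)) / d

def durations_at_peak(samples, durations):
    """Durations for which the rider is currently matching the ride's Peak Power."""
    return [d for d in durations
            if d <= len(samples)
            and trailing_average(samples, d) >= ride_peak(samples, d)]

# Hypothetical ride so far: a hard start, a lull, and a second effort building up.
so_far = [500, 100, 200, 400]
print(durations_at_peak(so_far, [1, 2, 3]))
```

In this example only the 2 second average touches the current-ride Peak Power; for 1 and 3 seconds the rider would have to push harder or longer to improve it.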

Statistical reference curves for Peak Power

Looking just at your own Peak Power might be useful, but wouldn’t it be even nicer if one could also compare with others? Cyclinganalytics provides a nice rainbow colored graph to judge where one stands in comparison to all other Cyclinganalytics users.

The gray line shows my peak power while the rainbow shows the zone of all other users from bottom (in blue) to top (in red). So, this graph is showing that I am pretty average for short durations … and suck for any interval longer than a minute … if we compare by raw, absolute power. While absolute power is significant for speed when riding on flat roads, it also is a measure at which heavier riders fare much better than lightweight riders like myself and even better than they’d do on a real flat road. When going uphill, the power-to-weight ratio makes more sense as a performance measure as that is the physical metric that determines climbing speed – even though one could argue that that disproportionately favours lighter riders. Still I believe that power-to-weight ratio is the somewhat “fairer” way to compare athletes of different body size. The power-to-weight statistics are significantly more favourable for me, as I now seem to be pretty average for all durations above a minute and slightly – or even significantly – better between 10 and 30 seconds.

Instead of using actual statistical data of a large population, I turned to a famous table that’s also often quoted on the internet and looked up reference data about typical relative Peak Power for cyclists racing in Categories 1 to 5 and created some approximation curves to fill in the gaps. Obviously, these are very rough reference guides as there will be a lot of differences between riders in the same category and there really is no typical Category X cyclist. And of course these curves do not change the fact that peak power does not capture how quickly one recovers from an effort and becomes ready for the next breakaway. Still, I found these reference lines very useful, both to provide motivation to improve my relative weaknesses (from around 1 to 3 minutes) and to further enhance my relative strengths (like between 10 and 30 seconds). The app will read from your profile settings whether you are male or female in order to adjust the reference curves. (I admit the approximation does not adequately model the flattening of the curve at very short durations – this could be improved later.)

App configuration

It would be really nice if this app didn’t require any configuration. In some way it doesn’t: if you do all rides with it, it will update the historical Peak Power curve after every personal record and require no configuration except for setting your body weight in your user profile.

But if you have just started using this app, it will take some time until the historical Peak Power curve becomes useful, and in the long run, you’ll probably do some rides without it even if you became a big fan of it. It would be nice if you could just connect it to the data accumulation site you are using and let the data field download all data via a connected smartphone – unluckily Garmin does not allow data fields to do that, and I am still divided whether to turn this from a data field which is easier to use within your training screens into an app.

The current solution is to use the settings screen in either your Garmin Connect App on your smartphone or Garmin Express app on your Mac/PC, to write or read the currently set values of your historical Peak Power. It requires 25 numbers in total, one each for 1-2-3-5-10-15-20-30-40-60-90 seconds and 2-3-5-10-15-20-30-40-60-80-100-120-150-180 minutes. You should be able to get most of those numbers from your TrainingPeaks, TrainerRoad, Cyclo-Sphere, Today’s Plan or Strava Premium account pages or Golden Cheetah – or whatever else you are using. Please approximate if your source does not supply data for some of the intervals this app is using. Note: Depending on the sleep settings of your Edge, it might enter sleep mode before you have finished typing and saving all 25 numbers, in which case your effort might be lost. So I recommend saving once halfway through.
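
If your source is missing some of the 25 durations, one possible way to approximate them is to interpolate between the values you do have. The sketch below is just one option, not something the app does for you: it interpolates linearly in the logarithm of the duration and simply reuses the nearest known value outside the known range (which will overestimate the longest durations). The function name `fill_missing` and the power values are hypothetical.

```python
import math

# The 25 interval durations used by the settings screen, in seconds.
DURATIONS = [1, 2, 3, 5, 10, 15, 20, 30, 40, 60, 90] + [
    m * 60 for m in (2, 3, 5, 10, 15, 20, 30, 40, 60, 80, 100, 120, 150, 180)
]


def fill_missing(known):
    """Estimate watts for every duration from a partial {seconds: watts} dict."""
    xs = sorted(known)
    filled = {}
    for d in DURATIONS:
        if d in known:
            filled[d] = known[d]
        elif d < xs[0]:
            filled[d] = known[xs[0]]          # no shorter value known: reuse
        elif d > xs[-1]:
            filled[d] = known[xs[-1]]         # no longer value known: reuse
        else:
            lo = max(x for x in xs if x < d)
            hi = min(x for x in xs if x > d)
            # position of d between lo and hi on a logarithmic time axis
            t = (math.log(d) - math.log(lo)) / (math.log(hi) - math.log(lo))
            filled[d] = round(known[lo] + t * (known[hi] - known[lo]))
    return filled


known = {5: 800, 60: 400, 300: 300, 1200: 250, 3600: 220}   # made-up values
filled = fill_missing(known)
for d in DURATIONS:
    print(d, filled[d])
```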

Peak power graph of a time period

Many sites also allow comparing all time Peak Power with Peak Power of some time period, like the most recent month or year, in order to judge how you are trending. I don’t think I have seen any site offering comparison with the same month of the previous year, which is a standard comparison in economics to look at yearly differences for something with a seasonal bias and could also be useful to cyclists who structure their training with on- and off-seasons over the duration of a year.

I have not yet found a way to implement this in a data field, due to memory limitations for the Edge 520 and Garmin not allowing internet communication over connected smartphones for data fields. It might be possible if I turned this data field into an app – maybe sometime in the future.

Training Peaks
Cycling Analytics


Peak Power curves usually monotonically decrease – that’s the mathematical terminology for a curve that is either horizontal or sloping downwards to the right – with increasing interval duration but, a bit surprisingly, they don’t have to. Consider the weird but not impossible sequence 0W, 100W, 0W, 100W, 0W. Peak Power for 1s, 2s and 3s are 100W, 50W (average of 0W, 100W or 100W, 0W) and 66W (average of 100W, 0W, 100W) respectively with an increase from 2s to 3s.
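
The example can be checked with a few lines of Python. `peak_power` is a name chosen for this sketch; it uses a brute-force scan over all windows, which is fine for five samples but not how you would do it on a head unit.

```python
def peak_power(power, duration):
    """Highest average over any window of `duration` consecutive 1 Hz samples."""
    windows = (power[i:i + duration] for i in range(len(power) - duration + 1))
    return max(sum(w) / duration for w in windows)


sequence = [0, 100, 0, 100, 0]
peaks = {d: peak_power(sequence, d) for d in (1, 2, 3)}
print(peaks)   # the 3 s value comes out higher than the 2 s value
```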

The discontinued Pioneer SGX-CA900 cycling computer had a CP curve display but without the curve of averages which, I believe, really transforms these curves from a more post-ride analysis visualization to a tangible, actionable, real time display that reacts instantly to the way you ride and enables you to adjust.

I am not a big fan of using FTP. Being just one number it is about as simple as it may get and I myself use it when I set rough annual goals – because annual goals should be rough and leave room for correction. The FTP value alone does not say much – two cyclists with the same FTP might perform very differently at 80% FTP or 120% or even at 105% FTP. Many cyclists will not be able to complete an hour at the FTP that was calculated from an 8 or 20 minute interval, while others might totally exceed it. When FTP was proposed, which was pretty much still in an analog world, it may have struck a good balance between complexity and modeling accuracy. But now in the age of big data analysis I think it’s about time to replace FTP by something more complex but also more scientific. Some might even go as far as saying that all the zones calculated in percentages from FTP and structured training programs that are scaled by a single FTP multiplier are nonsense.

I have also tried W’ which aims to model fatigue. As useful as it would be to be able to calculate time to exhaustion or number of matches burned, this again is overly simplified and, in my experience, pretty much useless as the model mostly does not fit, whatever the choice of the – just – two parameters, and lacks the ability to incorporate daily condition.

One might think that using live physiological data to generate and continuously correct a more complex model might be a solution. Unluckily it seems the human body is very complicated and the data we can measure is still very limited. Of what value is the easily measurable heart rate (HR) for an exact mathematical model of the cardiac output (CO) – which is what we are really interested in – without the much more difficult to measure stroke volume (SV)? I have been told that as a rule of thumb, SV is constant over a large part of the heart rate range and the linear HR-CO relationship only deteriorates at very low and very high heart rate – still, how low, how high and how constant are those limits for me? As informative as it is to use physiological measures and learn about the body, it seems we are still years away from being able to capture a sufficiently complete picture.

This app is my attempt to visualize what we know – minus power meter accuracy – instead of hopeful and often practical but in the end illusory ideas of what we hope we know but in reality do not. Of course this means that you have to do the interpretation yourself and you will need some experience with that. On the other hand, grasping relations and interpreting is something that the human brain might actually be good at. In the past year I think this data field has served me well to get a better understanding of average and Peak Power, both intuitively and consciously, which helped me to reach the training goal I was going after.

I should note that I do not recommend pushing yourself to your limit too frequently, even if you now have a tool that makes it easy to do so. Running marathons in your daily training would wear you down pretty quickly, like, in one day – chasing down personal records every day can be fun but is likely more stressful than is useful. In the extreme case, there have even been reports of athletes who had to give up sports due to scar tissue in their heart muscles likely caused by repeated overreaching and there is some speculation that the never seen before prevalence of cardio training could lead to a severely damaged, overstrained, unhealthy elderly population in the near future. My wish is that this app helps with making better decisions, and that should not always mean going to your limit.

But of course, it can be a huge motivator to best some of your records – of which there are plenty if you have the whole Peak Power curve from one second to three hours at your disposal – every few weeks, and one possible use of this data field is to judge whether today is the day to go for it.

Thanks for reading – enjoy the app!

Using comparative statistics of cycling power to guide training directions

Races are not my thing. I wouldn’t exclude the possibility that some time in the future I might change my mind and start participating in races – not that I am a believer in the stupid “never say never” movement, just open to changing my mind if given new evidence – but at this moment I’d categorize even a more deterministic competition like time trialling to be affected too much by random circumstances to make it worthwhile spending any time on it. Not that I’d be so naive as to think that hard work should always be rewarded. It’s more that I’ve never been happy winning something by luck.

How then to compete, not just with yesterday’s self, but with others, beyond reaching for the occasional KOM or Sprint jersey in Zwift while training? How would I judge my strengths and weaknesses compared to other cyclists?

Answer: By power data statistics.


  • Cycling analytics

Cycling analytics lets you overlay the peak power distribution of all cycling analytics users (of the same gender) on your peak power graph, which looks like this:

CyclingAnalytics

The rainbow colors represent the distribution of other riders and the gray line one’s own performance level, where higher on the vertical axis means better. What we see here is that, if we look at power weight ratio, I am generally pretty much average or slightly above, with a strength in short intervals from 10 seconds up to 2 minutes. The “Rank” is the rank within the distribution where 0% is bottom, 50% median and 100% is top; my rank ranges from exactly 50% (i.e. pretty average) for 8 minute intervals to 87% (i.e. almost-not-able-to-improve-much good) for 20 and 30 seconds. So, from this data it seems clear that there is plenty of potential to improve my endurance whereas trying to improve on short intervals might not lead to much further improvement. Or, if I’d start racing (and wanted to be successful), I might want to find a competition where short sprints are relevant. The colored, bottom-most “Rating” is a score for the rider’s relative strength/weakness when comparing different interval lengths not across the population but within the rider.

Cyclinganalytics is free for one time analysis but requires a small monthly fee if you want to store and accumulate data.



  • Cyclo-Sphere

Cyclo-Sphere recently added a “Ranking” function where you can compare 20 minute max average power (or MMP in short, for I believe Mean Max Power in their terminology) and – if you are using the Pioneer power meter – pedaling efficiency. In its simplest form, it looks like this, which is essentially a vertical cut through the CyclingAnalytics graph at 20 min on the horizontal axis showing the actual sample size on the vertical axis, with the bar that includes me colored in yellow. Note that, here, the percentage ranking counts down from top (i.e. smaller = better), opposite to what CyclingAnalytics does. And in the bar graph, more right = better.

CycloSphere PowerWeightRatio Ranking


Further, we can select a scatter plot, keeping power weight ratio on the horizontal axis but replacing the vertical axis by Pedaling Efficiency and the size of the blue circles now representing the number of samples.

CycloSphere PowerWeightRatio vs Efficiency


We can further narrow down by gender and age group. Obviously, there is a shift towards less power with increasing age, which seems natural. I am not sure whether the small change in ranking number at the bottom left of the graph really means something. I am actually surprised that the large visual difference in distribution only changes the ranking by 1 percent. If this is true, it would mean that performance of super powerful riders decreases with age but everyone else is not affected by age – which seems weird.

CycloSphere PowerWeightRatio vs Efficiency agegroup


In summary, the Cyclo-Sphere data again shows that I am slightly better than average, with the 41% ranking (counted from top, i.e. 59% counted from bottom) in Cyclo-Sphere closely matching the 61% ranking (counted from bottom) for 20 min intervals in Cyclinganalytics – which is super surprising and maybe a random coincidence, considering that the user base could be pretty different (or do large populations of cyclists converge to some representative distributions?). There is actually also a slight difference in data processing as the latter uses overall historical best whereas the former looks at recent performance, but at this moment I am at my historical best (at least for the period I have been collecting data), so that doesn’t matter here.

The Pioneer power meter is a rare one: Whereas most power meters measure only tangential force, which is the factor contributing to power (or, more precisely, process the data to extract only the tangential force and throw everything else out), the Pioneer isolates both tangential and radial forces and calculates efficiency using both. This difference also continues with the efficiency metrics: The more common efficiency metric in Garmin Vectors and others represents efficiency of tangential forces only, i.e. it measures mainly the tendency to not pull up your legs enough between the 6 o’clock and 12 o’clock positions, leaving a force opposite to the rotation on your pedals. So, one could score 100% efficiency with a Garmin Vector even when stomping on the pedals with a non-contributing vertical force at the top or bottom dead point, whereas a 100% score with a Pioneer means ideal pedaling with zero such radial forces. So, my 56.5% efficiency measured by Pioneer seems low if compared to tangential efficiency, which is often much higher, but is actually pretty good and within the top 15% according to Cyclo-Sphere.

If we look in more detail, shifting from statistical population data to personal ride history shown in the graph below, we see a small but continuous improvement of overall efficiency over the last two years, and, separating left and right efficiency, we see that I am not astonishingly but still noticeably better on the right than the left. This could mean potential for efficiency improvement on the left leg even if I cannot eliminate a possibly natural tendency to be stronger on the right. Although of course, a natural dexterity advantage of the right leg could mean that I will always be less efficient on my left leg, one thought is that my right leg is more flexible with a larger motion range, so improving flexibility of my left leg could be worth attempting. An extremely rough calculation of 8% efficiency improvement of the left leg or 4% overall would mean potential for an increase in 20 min Power Weight Ratio from 3.78W/kg to 3.93W/kg. Not huge but significant and probably worth attempting.

CycloSphere Efficiency over time
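
The rough estimate of 3.78 to 3.93 W/kg can be reproduced in a few lines. The sketch assumes, as a simplification, that the left leg contributes about half of total power and that power scales in proportion to overall efficiency; neither is guaranteed to hold in practice.

```python
current_ratio = 3.78            # W/kg, 20 min power-to-weight ratio from the text
left_gain = 0.08                # assumed relative improvement of the left leg
overall_gain = left_gain / 2    # assumption: left leg delivers about half the power
projected = current_ratio * (1 + overall_gain)
print(round(projected, 2))      # W/kg after the hypothetical improvement
```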

Cyclo-Sphere is free to use for anyone. I haven’t checked whether it will use the efficiency data of other, non-Pioneer power meters.




And the future?

The statistics I am really interested in and what may become available in the near future, given the broader use of power meters which allow for data driven, objective comparison over time and between cyclists are:

  • We are starting to learn that cyclists differ a lot in their response to training stimuli, and that standardized training, even if adapted by (single) measures like FTP, is somewhat of a lottery. It would therefore be extremely worthwhile to learn what “types” of training responses there are, to which type one belongs, and what kind of training is most effective over the long run or given time constraints.
  • We’d start selecting between services like TrainerRoad and TrainingPeaks depending on objective data about how much their users are really improving.
  • Training plans would not just be selected by the type of competition we are targeting and scaled by a single metric like FTP, but we’d have personalized custom training plans that combine the most effective training stimuli depending on an automated analysis of one’s own and others’ performance data.
  • This could then lead to statistics showing where someone stands among cyclists spending the same amount of time in the saddle. It’s good to be fast and just spending more time on the bike is no guarantee of being faster, but still, you’d somewhat expect someone with significantly more time spent cycling to be faster. A “handicap” for less training time would feel like going too far, but a competition for most effective training could be interesting.

Comparison of EMS devices for Recovery: Compex vs Marc Pro

Recovery from training takes longer as one gets older, thereby posing yet another limit to the amount of sustainable training besides work and family commitments. Pro teams may have massage therapists traveling with them for daily aftercare (even though research seems split about the effect on performance beyond relaxation and softening hardened muscles). What options does one have if one is neither pro nor married to a massage therapist?

Electrical Muscle Stimulation (or EMS) is advertised as being an answer to this. These are the devices you attach to your body to make the muscles under them twitch independently of your will, and which you see in ads as the holy grail for getting a six pack while lying on your couch and watching stupid TV series. For some weird fairness ideal I somewhat resist using them to gain an edge in performance, but that ideal isn’t so high as to rule them out as a recovery tool. And some of the manufacturers claim to have research on their side as evidence and pro users as customers. So what’s the truth behind them?

The two most dominant brands are Compex and Marc Pro, so I grabbed one model from each that seemed to be characteristic of their technology. So, here is a summary of my experience, subjective and totally not scientific, but at least based on real usage over several months and not just comparing their spec sheets or citing press releases.

The common science behind them: Small electrical pulses move your muscles without tiring you (!) and without requiring your conscious involvement, helping with flushing them out and softening them while you are doing something else. The trade off: You are moving your muscles with electronic shocks, so the amount your muscles move is a trade off with pain from the shocks.

Compex SP 6.0


  • 4 channels of 2 electrodes each, giving 8 electrodes in total, meaning you can use them for example on both calves and both thighs simultaneously.
  • Wireless, which means you could in theory use them under your pants at work or while commuting (though they measure more than 15mm in thickness and won’t fit under your jeans) – or actually even during workouts for some additional stimulation. I haven’t tried the latter yet as it is not the aim I bought them for and it also sounds and looks kinda perverse, and I have no idea how that would synchronize with my body movements – but that’s just my prejudice.
  • Rechargeable.

My experience:

  • It lets the user choose from many specific programs that include Recovery, Relaxation, Capillarisation etc, but I felt unsure about how scientifically well engineered and proven those programs are.
  • It has an automatic protocol running through different intensities at the beginning of a session to determine the recommended strength, which could be totally useful I imagine if someone was using this on me to stimulate my muscles in order to avoid degeneration while I’m in a long coma, but maybe not really necessary as long as I am awake. From a technological viewpoint, it would of course be interesting to learn how it determines the appropriate strength from the electric pulses and resistance alone, without other means to measure the muscular response.
  • Automatic switch off after the chosen program has finished, which is potentially useful if I’d happen to fall asleep while using it.
  • A visibly similar amount of muscle motion – assuming that is an indicator of similar effectiveness – feels more painful than with the Marc Pro.
  • Slight changes in body pose can affect the amount of motion more than with the Marc Pro. It happened to me that if I moved around while using it, muscle motions sometimes almost completely stopped (which doesn’t kill you) or that they became painfully and funnily large, which still doesn’t kill you but is less nice.
  • Ability to dial intensity in from a lot of steps, but the range of steps seems too large and the steps too far apart for small intensities – like 7 being way too weak but 8 being way too strong – so sometimes I end up with either not much motion or something way too painful. On the other hand, for other muscles I sometimes dial up to over 30, so I am not sure how much engineering effort was spent there to get that right.


Marc Pro


  • Only 2 channels, meaning I have to have two sessions, say one for both thighs, before doing both calves, or even three if I do front and back of the thighs separately.
  • Wired to a rather large main unit.
  • Rechargeable.

My experience:

  • Just one program, but seems to work: less painful than Compex for the same amount of muscle motion. Therefore seems more effective in the same amount of time, somewhat canceling out the disadvantage of having just two channels.
  • Steps are small enough to find the appropriate strength, and is easily operated by dials for either quick or fine adjustments.
  • The maximum intensity is, depending on where you place the electrodes, often below pain threshold. A stronger intensity might be more efficient to reduce time needed.
  • No automatic switch off, so it’s not recommended to fall asleep while using it.



  • In my opinion, they both seem to do what they are promising with respect to helping with regeneration and loosening my muscles. I don’t have any measured results to back that impression though.
  • They are quite different and if possible I’d recommend trying them both before choosing one. I am having fun with both of them, although, in my subjective opinion, the Marc Pro seems more effective and the Compex more geeky. It’d be nice if Marc Pro came up with a more compact, 4 (or more) channel wireless version, with thinner wireless electrodes.
  • To be honest, both seem totally overpriced compared with less sports-oriented EMS devices from manufacturers like Omron or Panasonic, which might not have the sports-related research evidence but whose electronics seem trustworthy enough to not kill me. Unluckily I didn’t have any to compare at this moment.
  • On the other hand, both seem at least to be way more effective and more durable than other massaging devices I tried so far like some that work with electric air pumps and airbags built into sleeves. Those designed for medical usage might be better but are prohibitively expensive.



Marc Pro (right) with its only recovery program in intensity 6.4 out of 9.0 max versus Compex SP 6.0 (left) in one of its typical recovery settings. Both were dialed in to a subjectively similar pain level. The muscle motion of the Marc Pro is less frequent but seems more intense and felt to be more effective to me.

Recording power data from multiple power meters

In order to analyze data from multiple power meters you first have to record the data, which isn’t straightforward because most cycling computers will allow you to connect to only one sensor of each kind.

Possible methods I tried so far:

– Use multiple headunits. Uncomfortable because you need to charge and operate all those headunits, and inefficient because you need to export and synchronize the data by hand. Including left-right balance and other advanced metrics can be cumbersome: the only workflow I found so far is to first import .fit into Golden Cheetah, then export to Golden Cheetah’s JSON format, then convert from JSON to CSV.

Not that I’d put too much importance on left-right balance: some are quick to emphasize that there is no scientific evidence for the benefit of using balance in training. But as soon as you start comparing left-right independent power meters like pedal-based meters or the Pioneer pedaling monitor and see some deviation, you need to dig into left and right power data.

– Use North Pole Engineering’s WASP unit and their iOS app. Good if it works and if there are also many other sensors you’d like to record (like multiple Moxys): you just get one large file with synchronized data. Unluckily, the iOS app sometimes swaps data columns (which you then need to restore by hand, figuring out how they swapped), sometimes crashes, and seems to get all those power spikes reported from the P1-Edge combination, just much more often. If these are all just issues with their iOS app, writing your own app might be the way to go; I just didn’t have enough time so far to do so.

– Use one of the multi-user PC apps like PeriPedal (or PerfPro) that are targeted to spinning classes/gyms. It’s not a portable solution and you get separate files, but the data is at least synchronized. PeriPedal does not support advanced metrics yet.

– Use IpWatts on an Android phone. The app is free, so if you already own an Android phone with built-in ANT+ support, it won’t cost you anything, and there is only one item you need to have charged and ready. Records power and balance data from up to 4 power meters. It’s actually made of two separate apps: IpWatts and IpSensorMan, which captures the ANT+ data and can relay that data to compatible apps, making it possible to run multiple apps using ANT+ in parallel. I tried it a few times so far and it seems reliable.

Interestingly, it lets you distinguish between no data and 0 W by having a separate data column for a “data/no data” flag. Percentage data is I believe received separately from power and may therefore drop separately; the app’s description says it simply uses the last value in case of bad reception but my data shows that there are also cases where it records as 127 (for “no percentage data available”, the value used in case of a downstream power sensor like KICKR or a power hub) or 0. So, this app seems to (partially) address the issue of most head units of mixing up no data and 0 W data. On the negative side, in order to benefit from this, one needs to implement data analysis differently from the usual workflow.
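
A rough sketch of what such a different analysis could look like: missing samples become `None` instead of 0 and are left out of averages. The row layout `(power, has_data, balance)` is an assumption made for this example and not IpWatts’ actual file format. Treating a balance of 0 as missing is also an assumption; it would hide genuine one-legged pedaling.

```python
NO_BALANCE = 127   # value the recordings show for "no percentage data available"


def clean_row(power, has_data, balance):
    """Turn one recorded sample into (power, balance), with None for missing values."""
    power_value = power if has_data else None
    balance_value = None if balance in (NO_BALANCE, 0) else balance
    return power_value, balance_value


def mean_power(rows):
    """Average power over samples that actually carried data."""
    valid = [p for p, _ in rows if p is not None]
    return sum(valid) / len(valid) if valid else None


# Hypothetical samples: (power in W, data flag, left/right percentage)
raw = [(200, 1, 52), (0, 1, 50), (0, 0, 127), (210, 1, 0)]
cleaned = [clean_row(*row) for row in raw]
naive = sum(p for p, _, _ in raw) / len(raw)   # treats the dropout as 0 W
print(cleaned, mean_power(cleaned), naive)
```

In this toy data the genuine 0 W sample (coasting) still counts towards the average, while the dropout does not, so the flag-aware mean comes out higher than the naive one.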

Other minor inconveniences I found so far are that operation isn’t totally smooth because of the need to run those two apps separately, connecting to all sensors can take some time (or I just haven’t found out yet how to do it correctly), visual appearance isn’t sophisticated (but heck it’s a free app), and you need to fish the recorded data from the Android file system and copy/send by hand.

Two possible future improvements I’d actually be willing to pay for:
– Cadence is currently recorded from only one sensor. Accuracy of upstream power sensors (btw that’s I believe a term Alex Simmons created and includes pedal-based, crank-based and spider-based power meters) depends heavily on correct cadence, so, for comparing such power meters, you’d want to record their cadence data separately.
– I might be wrong, but the app display seems to refresh more often than at 1Hz, so, it might be that IpSensorMan captures and relays all the power meter data that usually arrives at about 4Hz. So, it might not be too difficult to extend the app to actually record all that data instead of averaging and reducing to 1Hz, which would really make it a stand-out feature.

Ipwatts sample data:

Ipwatts sample data with corresponding Garmin Edge data:

IpWatts power

IpWatts percentage

Power meter output variation between rides

Of course, power meter output for the same input power should not vary between rides. But how would anyone check without having access to a calibrated pedaling robot?

One-on-one comparisons between different power meters are always difficult to interpret. They might perfectly match (which they usually won’t) and you’d still not know whether they are right. More often, they will differ and you won’t know which one is off by how much, just that at least one of them must be off.

Using scatter plots as described in an earlier post

variation between rides - scatter plot sample - total

we can use linear approximation to get a single value describing the difference between two power meters. We can also do the same for left and right pedals each, although, keep in mind that spider-based meters like the P2M (or hub-based meters) will only provide an estimate.

variation between rides - scatter plot sample - left variation between rides - scatter plot sample - right

The slope of the linear approximation is not a 100% statistically correct way of comparison here; there are issues like whether zero-points should be left out, whether the approximation should go through the origin of the axes, whether it should be linear etc. BUT it provides a way to look at the overall trend.
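
For reference, a least-squares line forced through the origin has the slope sum(x·y) / sum(x·x). The sketch below is one way to compute it, with the zero-point question exposed as a switch; the function name and the sample values are invented, and it is not necessarily how the plots here were fitted.

```python
def slope_through_origin(x, y, drop_zeros=True):
    """Least-squares slope of y = k * x for paired 1 Hz power samples."""
    pairs = list(zip(x, y))
    if drop_zeros:
        # leave out samples where either meter reported 0 W
        pairs = [(a, b) for a, b in pairs if a > 0 and b > 0]
    sxy = sum(a * b for a, b in pairs)
    sxx = sum(a * a for a, b in pairs)
    return sxy / sxx


meter_a = [100, 200, 300, 250]      # hypothetical readings in watts
meter_b = [103, 206, 309, 0]        # reads 3% higher, with one dropout
print(slope_through_origin(meter_a, meter_b))
print(slope_through_origin(meter_a, meter_b, drop_zeros=False))
```

With the dropout removed the slope reflects the 3% difference; with it included the single zero drags the slope well below 1, which is why the zero-point choice matters.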

I collected the data over several rides, some of which were taken on the same day, each ride starting with a zero-offset calibration of both meters and got this:

variation between rides - vector vs pioneer

The total instant power of both meters differs on average by 3.5%, which could be within the manufacturers’ claims. The fact that it may differ by up to 6.4% is less so, especially considering that this data was taken in a climate-controlled indoor environment without large temperature changes. Both meters measure left power and right power independently: The left meters match almost perfectly with a difference of only 0.4% on average, although s.d. is a bit larger; the right meters agree less and differ by 6.3% on average and 14.1% maximum (note though that after both rides differing by more than 10% the second ride agreed much more, which might show some kind of convergence, though the left side is showing the opposite trend… ). Although both meters report different values for instant and average power, their comparison result is similar.

variation between rides - vector vs p2m

Comparison of Vector2 and P2M shows that, for total power, the variance between rides is slightly less than in the previous comparison, although the difference is slightly larger, 7.3% on average, the Vector2 always reading 4.5 or more percent higher, which is not nice. We also see that the estimated left/right balance is pretty stable over rides, although that doesn’t have to mean anything. We also see that, while the left vector pedal matches almost perfectly, the right vector pedal is again reading a higher power value than the power meter it is compared with.


Even calibrating immediately before a ride will not eliminate bias between power meters. A difference of up to about 5% doesn’t seem unusual, at least for the compared pairs.

The right Vector pedal used in this comparison reads too high compared with both the Pioneer and the P2M. Note that the Vector had been calibrated using a known weight just days before, and the right Vector was scaled to about 98% of its original output value. I may need to talk with their support about this issue.

How precise are Power meters?

In statistics, precision refers to repeatability, meaning that something is precise if all measured values are close to each other, while accuracy is exactness, meaning that the average of all measured values is close to the correct value. So, a power meter can be accurate and precise (the ideal case), precise but not accurate (useful for day to day training, but not comparable with other meters), accurate but not precise (average values are correct, but a single sampled value can be far off), or neither accurate nor precise.
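
In numbers: accuracy can be expressed as the bias of the mean against the true value, and precision as the spread of the readings. The sketch assumes a known true input power, which in practice is exactly what is missing without a calibrated test rig; all values are made up.

```python
from statistics import mean, stdev


def accuracy_and_precision(readings, true_value):
    """Return (bias, spread): mean error against the truth and sample standard deviation."""
    bias = mean(readings) - true_value
    spread = stdev(readings)
    return bias, spread


true_power = 200                                   # hypothetical true input in watts
precise_not_accurate = accuracy_and_precision([209, 210, 211], true_power)
accurate_not_precise = accuracy_and_precision([170, 200, 230], true_power)
print(precise_not_accurate, accurate_not_precise)
```

The first meter is consistently 10 W off but barely scatters; the second averages out correctly but any single sample can be 30 W off.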

Obviously, an accurate and precise power meter would be nice to have, but it’s also pretty well known by now that power meter values can dance a lot, and many riders display 3 or 5 or even 10 second averages on their cycling computers.

So, how precise are power meters?

Let’s look at this ride, comparing Pioneer and Garmin power meters. As usual, this is just a single one-to-one comparison ride, so I am not saying this data is representative of power meters in general or of these two power meter models in particular; it’s just what I got. And if you asked me which one is correct, I’d of course say: neither one.

long ride comparison - average - complete ride

If we zoom in, we get something like this:

long ride comparison - instant - two intervals

Large changes in power obviously correspond, but small changes look pretty chaotic. Averaging over, say, 30 seconds eliminates all small changes and we get this:

long ride comparison - average - two intervals

So, both meters follow each other pretty well; the Pioneer is a bit slower to respond to large changes, but then tries to catch up with a steeper slope. (Overshoot is pretty well controlled, although we see one at the last decline.)
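The 30-second averaging used for this plot can be sketched as a trailing moving average. This assumes one sample per second (so 30 samples equal 30 seconds), which may not match every head unit or export format; the function name is my own.

```python
from collections import deque

def trailing_average(samples, window=30):
    """Trailing moving average over the last `window` samples.

    For the first samples, where fewer than `window` values exist,
    the average is taken over what is available so far.
    """
    buf = deque(maxlen=window)
    total = 0.0
    out = []
    for s in samples:
        if len(buf) == window:
            total -= buf[0]      # value that is about to drop out of the window
        buf.append(s)
        total += s
        out.append(total / len(buf))
    return out

# Tiny illustration with a 2-sample window instead of 30.
print(trailing_average([1, 2, 3, 4], window=2))
```

A trailing window lags behind the raw signal by roughly half the window length, which is worth remembering when comparing the response speed of two averaged traces.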

But this isn’t really satisfying: if I buy a kitchen scale for which the manufacturer states an accuracy of plus minus 1g for any weight below 100g, I expect 95% of all measurements to fall within the stated error range. Aren’t we making it too easy for power meter manufacturers when we let them get away with a simple accuracy statement (that’s also difficult to check) without any promises about precision?

So, what can be done with data that looks like this:

long ride comparison - instant - 1500 to 1600

One thing to try is a scatter plot where every data point is visualized by a single plotted dot, here with Vector on the x-axis and Pioneer on the y-axis.

long ride comparison - instant - scatter plot 2

The dots on the left, along the y-axis, show that the Pioneer has more zero values, which might be caused by measurement or transmission errors. The linear fit y=1.0223x shows that on average the Vector output is about 2% higher than the Pioneer. Of course, one has to be careful with these plots, because different time delays and a non-symmetric ride profile could bias this data. On the other hand, if the relative delay between the two meters is constant, one could simply shift one data set and try out several delay values to find the most likely one.
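Both ideas, the fit through the origin and the search for the most likely delay, can be sketched in a few lines. This is a minimal sketch under my own assumptions: both series are sampled at the same rate and already roughly aligned, "most likely" is taken to mean the smallest mean squared difference, and the function names are hypothetical.

```python
def slope_through_origin(x, y):
    """Least-squares slope a of the line y = a*x (no intercept)."""
    sxx = sum(a * a for a in x)
    sxy = sum(a * b for a, b in zip(x, y))
    return sxy / sxx

def best_delay(x, y, max_shift=5):
    """Shift k (in samples) that minimises the mean squared difference
    between x[i] and y[i + k]. Positive k means y lags behind x."""
    best_k, best_mse = None, None
    for k in range(-max_shift, max_shift + 1):
        pairs = list(zip(x, y[k:])) if k >= 0 else list(zip(x[-k:], y))
        if not pairs:
            continue
        mse = sum((a - b) ** 2 for a, b in pairs) / len(pairs)
        if best_mse is None or mse < best_mse:
            best_k, best_mse = k, mse
    return best_k

# Made-up example: y is x delayed by one sample.
x = [0, 0, 100, 200, 100, 0, 0, 0]
y = [0, 0, 0, 100, 200, 100, 0, 0]
print(best_delay(x, y, max_shift=3))
print(slope_through_origin([100, 200, 300], [102, 204, 306]))
```

One would run the delay search first, shift the data accordingly and only then fit the slope, since a residual delay smears the scatter plot and can bias the fit.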

We also see that the blue dots don’t line up neatly on the fitted line, but form a blue belt about 20W in width … that’s a lot. So, let’s look at the width of this distribution in more detail; just keep in mind that this is not the error histogram of a single power meter compared with a correct value, but the relative difference of two power meters, containing the error of both (including the possibility that they sometimes cancel each other out).

long ride comparison - instant - histogram of error between meters in W

I am actually surprised that the peak comes at zero difference between the meters and a lot of values fall between plus minus 5W from zero. If we re-format this into a cumulative histogram, we get:

long ride comparison - instant - cumulative histogram of absolute error between meters in W

About 50% of the values fall within a plus minus 5W range, the standard deviation is about 7W, 95% of the values (what would be 2 s.d. for a normal distribution) fall within about 18W and 99.7% (or 3 s.d.) within about 40W. So, from a statistical point of view, it’s pretty much nonsense to display instant power on your power meter.
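Reading such numbers off a cumulative histogram of absolute differences amounts to two small calculations, sketched here. The list `diffs` stands for the sample-by-sample difference between the two meters in W; the values below are made up for illustration and are not my ride data.

```python
import math

def fraction_within(diffs, limit):
    """Fraction of samples whose absolute difference is at most `limit`."""
    return sum(1 for d in diffs if abs(d) <= limit) / len(diffs)

def abs_percentile(diffs, fraction):
    """Smallest absolute difference that covers at least `fraction` of all samples."""
    ordered = sorted(abs(d) for d in diffs)
    idx = max(0, math.ceil(fraction * len(ordered)) - 1)
    return ordered[idx]

# Hypothetical differences in W between two meters.
diffs = [-1, 2, -3, 4, -5, 6, -7, 8, -9, 10]
print(fraction_within(diffs, 5))     # share of samples within plus minus 5 W
print(abs_percentile(diffs, 0.95))   # bound that contains 95% of the samples
```

Working with percentiles of the measured differences avoids assuming a normal distribution, which matters when the tails are heavier than the standard deviation alone suggests.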

I really think we need to have power meter manufacturers state how precise their meters are or have some independent organization check them with a calibrated pedaling robot.

Calibrating Power Meters with known weights

It’s often said that for day-to-day training, the important property of a power meter is precision (i.e. repeatability of measured values) and not absolute accuracy (i.e. correct value). I do agree but … what if you had several bikes with power meters fitted to each (okay, that’s actually another reason to swap a hub- or pedal-based power meter!) or wanted to review your long term performance changes 10 years from now?

Some power meters allow you to check their absolute measurements after doing a zero-reset (example: Pioneer displays force in [N]), and some even allow you to specify a scaling parameter after checking (example: Vector displays torque in [Nm] and lets you store a scaling factor in the pedal to correct its output). Some, like the older P2Ms, unfortunately don’t do any of this.

Garmin has a manual on the internet for the recommended procedure. Although they mention the difficulty of measuring a heavy weight of over 10 kgs to the required precision, in their example they are using a large weight, and hanging that from a pedal requires hanging the bicycle high up in the air while attaching the weight … nothing I’d be keen to try.

One alternative could be to just use a calibrated weight that’s used for checking scales, which looks like this:

power meter calibration weight 2

This one here is a 10 kg weight (which I admit is a bit on the light side, even for a lightweight cyclist with a not too high maximum power number; 10 kg is equivalent to between 150 and 200 W at cadence 100 for ideal, completely round pedaling, or probably about 50 W at cadence 70 for typical not-round pedaling), accurate to plus minus 1.6g (guaranteed for one year by the manufacturer), which is far better than the accuracy needed for this procedure. A 20 kg weight would only measure about 25% more in each of height, width and depth and would still be compact enough to measure both tangential force (as seen in the picture, with a horizontal crank) and radial force (with the crank pointing upward) with the wheels on the floor.
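The "150 to 200 W at cadence 100" figure for ideal round pedaling can be checked with a short calculation. This is a sketch under simplifying assumptions of my own: the weight's force acts tangentially on one crank through the whole revolution, g is taken as 9.81 m/s², and the two crank lengths are only examples.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (assumed value)

def constant_force_power(mass_kg, crank_m, cadence_rpm):
    """Power in W if a force equal to the weight of `mass_kg` acted
    tangentially on a crank of length `crank_m` at `cadence_rpm`."""
    torque = mass_kg * G * crank_m             # Nm
    omega = 2 * math.pi * cadence_rpm / 60     # rad/s
    return torque * omega

for crank_mm in (165, 175):
    watts = constant_force_power(10, crank_mm / 1000, 100)
    print(f"{crank_mm} mm crank: {watts:.1f} W")
```

Real pedaling is far from a constant tangential force, which is why the equivalent power for typical pedaling is much lower.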

Together with the metal hardware, like the shackles used to mount the weight to the pedal, which I measured on an extra-precise kitchen scale, the total weight was 10184.5g plus minus 2.2g, or 0.02% accuracy. With power being linear in force and torque, that’s more than accurate enough. (Sorry for the blurry smartphone picture.)

power meter calibration weight - small hardware on kitchen scale

My results for Vector2: Expected 16.485Nm (for crank length 165mm), measured 16.81Nm on right and 16.44Nm on left.

My results for Pioneer: Expected 99.91N, measured tangential 102N / radial -102N on right and 98N / -102N on left.

In both cases, the largest deviation is about 2%, which means that, without any other source of error, the final power values could be within 2% of the true value.
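The expected values and the deviations of the measured ones can be reproduced as follows. This sketch assumes g = 9.81 m/s² (local gravity differs slightly) and uses function names of my own; the measured values are the ones quoted above.

```python
G = 9.81  # gravitational acceleration in m/s^2 (assumed value)

def expected_force_n(mass_g):
    """Force in N exerted by a hanging mass given in grams."""
    return mass_g / 1000 * G

def expected_torque_nm(mass_g, crank_mm):
    """Torque in Nm at a horizontal crank of length `crank_mm`."""
    return expected_force_n(mass_g) * crank_mm / 1000

def deviation_percent(measured, expected):
    """Signed deviation of a measured value from the expected one, in percent."""
    return (measured - expected) / expected * 100

torque = expected_torque_nm(10184.5, 165)
force = expected_force_n(10184.5)
print(f"expected torque {torque:.3f} Nm, expected force {force:.2f} N")
print(f"Vector2 right {deviation_percent(16.81, torque):+.2f}%, "
      f"left {deviation_percent(16.44, torque):+.2f}%")
print(f"Pioneer right {deviation_percent(102, force):+.2f}%, "
      f"left {deviation_percent(98, force):+.2f}%")
```

Note that the Pioneer displays whole newtons, so its resolution alone is about 1% at this load; a heavier weight would reduce that share of the error.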


The big caveat here is of course that even a slight cadence error of 1 rpm will throw this off considerably, so a final power error as small as 2% is actually unlikely.

It might be better to use not just a heavier weight, but several different weights.

DCRainmaker reported that the upcoming Watteam Powerbeat will use a plastic bag that fills with an exact amount of water to act as an accurate weight. If that works, that’d be nice, although hanging something like 10 kgs = 10 liters doesn’t seem very practical.

Somewhat related: Wahoo used to sell, and probably now rents, a weight for calibrating the power meter inside their KICKR trainer, whereas Tacx claims their new, truly direct-drive trainer is calibration-free. (The KICKR was only half direct: direct between chain and trainer, but still with a belt driving a flywheel, while the new Tacx doesn’t really have a flywheel and is completely electronic, producing a virtual feeling of inertia by electronic control.) The principle behind the Tacx, I guess, is that calibration is not necessary if you can control or measure electric current very accurately. A more simplistic view could be: for a rotation sensor you’d either have a magnet switch or some self-calibration using accelerometers and gravitational force, so why should one have to calibrate a power meter? Isn’t that just poor engineering?

Power meter accuracy specifications

As far as I know, Verve Cycling is the only power meter manufacturer that publishes somewhat trustworthy accuracy specifications:

  • Power range: 0–3000 Watts
  • Cadence range: 10–200 rpm
  • Accuracy of cadence: ±1 rpm
  • Accuracy torque: ±0.2 Nm accuracy for measurements below 20 Nm, and ±1% of actual readings for measurements above 20 Nm (ask for our Accuracy Certification)
  • Power: Can be calculated from any cadence value within the range at any torque
  • Power update rate: Every rotation

Surprisingly, even SRM only gives one single number, although accompanied by a blunt statement:

Accuracy  ±1% (Scientifically Proven)

No proper scientist would state an error number like ±1% without specifying the range of conditions for which that number is valid. So much for science.

Assuming cadence as a function of power:

accuracy - cadence

the specifications of Verve Cycling give this error:

accuracy - power error in percent with cadence error


which translates into W like this:

accuracy - power error in Watt

In other words: Verve Cycling’s InfoCrank has an accuracy of about 2% above 110W (like most other power meters) and about 5% or 2W at 50W. Well, that doesn’t seem really significant to me, but still: hiding (or just forgetting to mention) such a fact doesn’t seem right either, and I strongly feel that, since most power meters are engineered using somewhat similar principles, a lot of manufacturers have some clarifications to make.

(added new section from here on)

The interesting point here might actually be: How did I get from a 1% torque error to a 2% power error, if power and torque are related linearly?

accuracy - power error in percent without cadence error

Now we see: without cadence error, a 1% torque error of course results in a 1% power error, but a cadence error of just 1 rpm (or 1% at cadence 100!) will add another 1% to the power error. So, if you have a cadence sensor with an absolute 1 rpm error range, you’d want to pedal quickly to get more accurate power figures … well, that’s just a joke, but measuring rotation accurately is a really important factor here, and since a wheel makes more rotations than the crank (and pedals), a hub-based power meter should be easier to engineer for high accuracy.
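How the torque and cadence specifications combine can be sketched like this. It is a worst-case, first-order estimate under my own assumptions: relative torque and cadence errors simply add, the torque specification is applied to the total torque, and cadence is a free input rather than the cadence-over-power curve used for the plots, so the numbers will not match those curves exactly.

```python
import math

def torque_error_nm(torque_nm):
    """Quoted spec: plus minus 0.2 Nm below 20 Nm, plus minus 1% of reading above."""
    return 0.2 if torque_nm < 20 else 0.01 * torque_nm

def power_error_percent(power_w, cadence_rpm, cadence_err_rpm=1.0):
    """Worst-case relative power error in percent, to first order.

    Since power = torque * angular velocity, small relative errors of
    torque and cadence add up.
    """
    omega = 2 * math.pi * cadence_rpm / 60      # rad/s
    torque = power_w / omega                    # Nm
    rel_torque = torque_error_nm(torque) / torque
    rel_cadence = cadence_err_rpm / cadence_rpm
    return (rel_torque + rel_cadence) * 100

print(power_error_percent(300, 100))                       # torque 1% + cadence 1%
print(power_error_percent(300, 100, cadence_err_rpm=0.0))  # torque error only
print(power_error_percent(50, 60))                         # low power, low cadence
```

At low power the fixed 0.2 Nm term dominates, and at low cadence the fixed 1 rpm term grows as well, which is why the relative error rises so quickly toward the bottom of the range.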

(I hope I didn’t make any calculation errors and would be happy to be corrected.)