GPS accuracy comparison using Portola Valley Low-Key Hillclimb data

As I noted, when dealing with GPS problem cases in the Portola Valley Short-Hills version of the 2013 Low-Key Hillclimbs, I couldn't help but notice every one of the cases I grappled with was an Edge 500. This is anecdotal, so I wanted to take a closer look at the problem.

The initial plan was to scrape the HTML from the Strava pages with a Perl app, since the API doesn't provide computer type. That didn't work out, since Strava requires user authentication to see this bit of info. I started thinking about PHP options, but finally, when I couldn't sleep last night, I just went to the pages sequentially and transcribed the computer identifier from the browser. Brute force. Not elegant. I feel so dirty.

There were 69 riders at Portola Valley who each reported the URLs of their Strava records. I compared these using the root-mean-square average, over the checkpoint lines, of the distance from the center of each line at which the rider's track crossed it (units: meters). The ideal number isn't zero, because my lines are generally centered on the road and riders are to the right, and in any case there are other sources of error than rider GPS, but "perfect" would probably be 2 meters or so.
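For anyone who wants to try the same kind of measurement, here is a minimal sketch of the idea, not the actual script used for Low-Key. It assumes the track and the checkpoint lines have already been converted to a local planar frame in meters, and the names `crossing_offset` and `rider_rms` are made up for illustration. Each line is given by its center and its right-hand edge point, with the left edge taken as the mirror image.

```python
import math

def crossing_offset(p, q, center, right_edge):
    """Signed offset (m) from the line center where the step p->q crosses
    the checkpoint line, or None if the step misses the line.
    All points are (x, y) in a local planar frame in meters (assumption)."""
    hx, hy = right_edge[0] - center[0], right_edge[1] - center[1]
    half_width = math.hypot(hx, hy)
    ux, uy = hx / half_width, hy / half_width    # unit vector along the line
    dx, dy = q[0] - p[0], q[1] - p[1]            # rider's step
    wx, wy = center[0] - p[0], center[1] - p[1]
    denom = dx * uy - dy * ux
    if denom == 0:
        return None                              # moving parallel to the line
    t = (wx * uy - wy * ux) / denom              # fraction along the step
    s = (wx * dy - wy * dx) / denom              # meters along the line
    if 0 <= t <= 1 and abs(s) <= half_width:
        return s
    return None

def rider_rms(track, lines):
    """RMS of the crossing offsets over the lines the track crosses."""
    offsets = []
    for center, right_edge in lines:
        for p, q in zip(track, track[1:]):
            s = crossing_offset(p, q, center, right_edge)
            if s is not None:
                offsets.append(s)
                break                            # use the first crossing only
    if not offsets:
        return None
    return math.sqrt(sum(s * s for s in offsets) / len(offsets))

# Toy example: two lines across a "road", one track crossing both
lines = [((0, 0), (10, 0)), ((0, 10), (10, 10))]
track = [(3, -5), (3, 5), (-11, 15)]
print(rider_rms(track, lines))
```

In the toy example the track crosses the first line 3 m to one side of center and the second 4 m to the other side, so the result is the RMS of 3 and 4. Only the offset along the line is measured, so an error along the direction of travel would not show up.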

Here's the average by computer type:

| computer | n | rms of rider rms dl (m) |
|---|---|---|
| Osync Nav2Coach | 1 | 2.90553 |
| Forerunner 405 | 1 | 3.17032 |
| Edge 305 | 2 | 3.78266 |
| Strava iPhone app | 5 | 6.5897 |
| Edge 800 | 9 | 7.98625 |
| Edge 705 | 4 | 8.95446 |
| Mobile | 1 | 9.02079 |
| Edge 200 | 3 | 10.6593 |
| Strava Android App | 4 | 11.3408 |
| Edge 510 | 8 | 18.5831 |
| Edge 500 | 31 | 65.3486 |
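The per-computer numbers are consistent with taking the root-mean-square of the individual rider values within each group. A small sketch of that aggregation, using the Edge 305 and Strava iPhone app values from the detailed results (the function names are just for illustration):

```python
import math
from collections import defaultdict

def rms(values):
    """Root-mean-square of a list of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))

def rms_by_computer(riders):
    """Group (computer, rms_dl) pairs and return {computer: (n, rms of rms_dl)}."""
    groups = defaultdict(list)
    for computer, rms_dl in riders:
        groups[computer].append(rms_dl)
    return {c: (len(v), rms(v)) for c, v in groups.items()}

# A subset of the rider results (meters)
riders = [
    ("Edge 305", 2.74391),
    ("Edge 305", 4.59216),
    ("Strava iPhone app", 2.85231),
    ("Strava iPhone app", 3.22356),
    ("Strava iPhone app", 3.75249),
    ("Strava iPhone app", 4.12421),
    ("Strava iPhone app", 12.9423),
]

for computer, (n, value) in rms_by_computer(riders).items():
    print(f"{computer}: n={n}, rms={value:.4f}")
```

Because values are squared before averaging, a single bad result dominates a group's number, which is why one poor iPhone run pulls that group's value well above its typical result.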

I am told Scott Byers was using an Osync Nav2Coach, which Strava failed to identify. That unit clearly works extremely well: his score was close to ideal. Indeed, it's a testament to the superb registration of Google's satellite maps, based on which I placed the lines.

It's not as bad as it looks for Edge 500, though. There are plenty of decent results from Edge 500. It's just that it has a virtual monopoly on the really bad results... curiously along with one screwed-up Joaquin from an Edge 510.

Detailed results (hopefully this renders properly) with numbers linked to Strava activities:

| rank | num | rms_dl (m) | name | computer |
|---|---|---|---|---|
| 1 | 326 | 2.74391 | Jeff Shute | Edge 305 |
| 2 | 49 | 2.85231 | David Collet | Strava iPhone app |
| 3 | 37 | 2.90553 | Scott Byer | Osync Nav2Coach |
| 4 | 58 | 3.11199 | Andy Crews | Edge 705 |
| 5 | 132 | 3.17032 | Stefano Profumo | Forerunner 405 |
| 6 | 156 | 3.22356 | Todd Studenicka | Strava iPhone app |
| 7 | 23 | 3.24592 | Daniel Aminzade | Edge 800 |
| 8 | 204 | 3.34927 | Bryn Dole | Edge 510 |
| 9 | 407 | 3.40899 | Brandon Iles | Edge 500 |
| 10 | 65 | 3.41488 | Giles Douglas | Edge 800 |
| 11 | 212 | 3.75249 | Peter C Ingram | Strava iPhone app |
| 12 | 125 | 3.90218 | Frank Paysen | Strava Android App |
| 13 | 83 | 3.9244 | Rich Hill | Edge 500 |
| 14 | 53 | 4.12421 | Tracy Colwell | Strava iPhone app |
| 15 | 203 | 4.24395 | Kevin Colagiovanni | Edge 500 |
| 16 | 12 | 4.29885 | Will von Kaenel | Edge 800 |
| 17 | 411 | 4.32206 | Lucas Pereira | Edge 705 |
| 18 | 412 | 4.32909 | Kieran Sherlock | Edge 500 |
| 19 | 105 | 4.59216 | Doug MacPherson | Edge 305 |
| 20 | 404 | 4.67041 | Heidi Fraser | Edge 500 |
| 21 | 408 | 5.10658 | Tom K. | Edge 510 |
| 22 | 152 | 5.14333 | Daryl Spano | Edge 800 |
| 23 | 327 | 5.26018 | Brandon Smith | Edge 510 |
| 24 | 150 | 5.27501 | Gregory P. Smith | Strava Android App |
| 25 | 413 | 5.36954 | Liam Sherlock | Edge 705 |
| 26 | 414 | 5.59722 | Tim Sullivan | Edge 500 |
| 27 | 415 | 5.68787 | Jeff Weitzman | Edge 800 |
| 28 | 230 | 5.8452 | Kris McQueen | Edge 500 |
| 29 | 171 | 5.94689 | Phil Lovaglio | Edge 500 |
| 30 | 402 | 6.17513 | Chris Evans | Edge 510 |
| 31 | 48 | 6.20068 | John Clarke | Edge 510 |
| 32 | 147 | 6.41201 | Marty Scott | Edge 200 |
| 33 | 401 | 6.42501 | Gino Cetani | Edge 200 |
| 34 | 316 | 6.89468 | Bogdan Marian | Edge 510 |
| 35 | 71 | 6.96707 | Stephen Fong | Edge 500 |
| 36 | 166 | 7.1123 | William Yee | Edge 510 |
| 37 | 135 | 7.35438 | Mihai R. | Edge 500 |
| 38 | 207 | 7.46039 | Robert Easley | Edge 800 |
| 39 | 95 | 8.01097 | Mark King | Edge 500 |
| 40 | 27 | 8.66326 | Kate Bergeron | Edge 800 |
| 41 | 223 | 8.72471 | Eva Silverstein | Edge 500 |
| 42 | 410 | 8.8267 | Paul McKenzie | Edge 800 |
| 43 | 403 | 8.96959 | Scott Frake | Edge 500 |
| 44 | 35 | 8.97069 | Sugar Brown | Edge 500 |
| 45 | 62 | 9.02079 | Mike Davis | Mobile |
| 46 | 133 | 11.6204 | Alec Proudfoot | Strava Android App |
| 47 | 14 | 11.7634 | Rich McLovin Brown | Edge 500 |
| 48 | 406 | 12.9423 | Martin Hyland | Strava iPhone app |
| 49 | 328 | 13.2219 | Ray Smith | Edge 500 |
| 50 | 160 | 13.2689 | Luis Valente | Edge 500 |
| 51 | 151 | 13.9149 | Kevin M. Smith | Edge 500 |
| 52 | 304 | 14.4409 | Paul Cothenet | Edge 500 |
| 53 | 31 | 16.0769 | Blue Brown | Edge 200 |
| 54 | 126 | 16.2337 | Lisa Penzel | Edge 705 |
| 55 | 161 | 16.307 | Greg Watson | Edge 800 |
| 56 | 114 | 17.2566 | Shahram Moatazedi | Edge 500 |
| 57 | 79 | 18.3404 | Bill Harkola | Strava Android App |
| 58 | 73 | 19.1305 | Chris Furgiuele | Edge 500 |
| 59 | 301 | 19.4436 | Amy Bruski | Edge 500 |
| 60 | 300 | 20.7875 | Billy Bob Brown | Edge 500 |
| 61 | 32 | 21.6406 | Habañero Brown | Edge 500 |
| 62 | 400 | 30.3386 | Michael Andalora | Edge 500 |
| 63 | 405 | 49.0466 | Bruce Gardner | Edge 500 |
| 64 | 98 | 50.2294 | Michael Kowalchuk | Edge 510 |
| 65 | 409 | 54.9151 | Bill Laddish | Edge 500 |
| 66 | 122 | 109.659 | Bart Niechwiej | Edge 500 |
| 67 | 209 | 146.149 | Janet Gardner | Edge 500 |
| 68 | 318 | 186.928 | Trish Pacheco | Edge 500 |
| 69 | 130 | 233.005 | Mark Powers | Edge 500 |

So 11 of the worst 12 are Edge 500's. In contrast, only 1 of the best 12 is an Edge 500.

31 of 69 are Edge 500's, so the probability of N out of 12 being Edge 500's, by luck alone, is (using the binomial distribution with p = 31/69 and 12 trials; Poisson statistics aren't good enough for Low-Key):

| N | probability |
|---|---|
| 0 | 0.0778% |
| 1 | 0.762% |
| 2 | 3.42% |
| 3 | 9.30% |
| 4 | 17.1% |
| 5 | 22.3% |
| 6 | 21.2% |
| 7 | 14.8% |
| 8 | 7.56% |
| 9 | 2.74% |
| 10 | 0.671% |
| 11 | 0.0995% |
| 12 | 0.00676% |

So the probability, with luck alone, of no more than 1 in the first 12 being Edge 500 would be 0.84%. The probability of at least 11 of the final 12 being Edge 500 is 0.11%. The combined probability of both of these occurring, treating the two as independent, is about 0.0009%.
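For anyone who wants to check binomial tail probabilities like these, here is a minimal sketch using only the Python standard library. The number of trials is a parameter, so the same function covers other subset sizes. Note the binomial treats each of the 12 slots as an independent draw, which is only an approximation when picking from 69 riders without replacement.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n = 12          # size of the "best" or "worst" group
p = 31 / 69     # fraction of all riders using an Edge 500

# Chance that at most 1 of the 12 is an Edge 500
p_at_most_1 = sum(binomial_pmf(k, n, p) for k in range(0, 2))

# Chance that at least 11 of the 12 are Edge 500's
p_at_least_11 = sum(binomial_pmf(k, n, p) for k in range(11, n + 1))

print(f"P(N <= 1)  = {100 * p_at_most_1:.3g}%")
print(f"P(N >= 11) = {100 * p_at_least_11:.3g}%")
print(f"product    = {100 * p_at_most_1 * p_at_least_11:.2g}%")
```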

My pick of the number 12 was a biased pick so this isn't really a fair comparison. But it's fairly clear the Edge 500 is particularly prone to position error. This is perhaps not representative of new Edge 500's.

The Edge 500 was the most popular computer with 31. The Edge 800 was second, with 9. The third most popular was the Edge 510, with 8. If I do a ranking of all of the results, considering only Edge 500 and Edge 800, there are 40 total. In that ranking the Edge 800's rank 1, 3, 6, 9, 11, 16, 18, 20, and 28. So in that ranking, of the top 20 computers, 8 are Edge 800 and 12 are Edge 500. Of the bottom 20 computers 19 are Edge 500 and 1 is Edge 800.

Suppose I distribute 9 Edge 800's at random among 40 ranked slots. What's the probability at most 1 would be in the 20 lowest ranking slots (and at least 8 in the highest 20 ranking slots)? The number of ways to distribute 0 in 20 and 9 in 20 is 167960. The number of ways to distribute 1 in 20 and 8 in 20 is 2519400. So the number of ways to do either of these is the sum: 2687360. The number of ways to distribute 9 in 40 is 273438880 . The ratio is 0.983%. So the chance of this happening at random is 0.983%. This strongly suggests the Edge 800 is more accurate on average than the Edge 500. However, you can find plenty of good Edge 500 results.
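The counting argument above can be reproduced in a few lines. This is just a sketch of the same arithmetic with `math.comb`; the function name and its parameters are made up for illustration.

```python
from math import comb

def prob_at_most_in_bottom(k_max, n_units, n_bottom, n_slots):
    """Probability that at most k_max of n_units randomly placed units land
    in the n_bottom lowest slots out of n_slots ranked slots."""
    n_top = n_slots - n_bottom
    ways = sum(
        comb(n_bottom, k) * comb(n_top, n_units - k)
        for k in range(k_max + 1)
    )
    return ways / comb(n_slots, n_units)

# 9 Edge 800's among 40 slots, at most 1 in the bottom 20
p = prob_at_most_in_bottom(k_max=1, n_units=9, n_bottom=20, n_slots=40)
print(f"{100 * p:.3f}%")
```

Unlike the binomial used earlier, this counts placements without replacement, which is the right model when a fixed set of units is shuffled among a fixed set of ranks.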

So I've established the Edge 800 is likely better than the Edge 500. Is it better or worse than the Edge 510? 17 of the computers were either Edge 800 or Edge 510. Of those, the Edge 510's ranked 2, 5, 7, 9, 10, 11, 12, and 17. The Edge 800's ranked 1, 3, 4, 6, 8, 13, 14, 15, and 16. The Edge 800 did slightly better but it's too close to conclude anything from this.
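One way to put a number on "too close" is a rank-sum comparison. This is a swapped-in technique (the normal approximation to the Wilcoxon rank-sum test), not something used in the analysis above, and with only 17 units the approximation is rough. It is meant as a sanity check on the ranks listed here.

```python
import math

edge_510 = [2, 5, 7, 9, 10, 11, 12, 17]
edge_800 = [1, 3, 4, 6, 8, 13, 14, 15, 16]

n1, n2 = len(edge_510), len(edge_800)
total = n1 + n2

# Observed rank sum for the Edge 510 group
w = sum(edge_510)

# Under "no difference", ranks are a random subset of 1..total
expected = n1 * (total + 1) / 2
sd = math.sqrt(n1 * n2 * (total + 1) / 12)

z = (w - expected) / sd
print(f"rank sum = {w}, expected = {expected:.0f}, sd = {sd:.1f}, z = {z:.2f}")
```

A z value this close to zero says the Edge 510 ranks are about where random shuffling would put them, which agrees with not drawing a conclusion between the two.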

There were 6 different Edge units at the Low-Key. There were 5 iPhones. The iPhones did better than 5 of the 6 Edge units; the 2 Edge 305's did better than the iPhone. Between the Edge units and the iPhones, there were 62 activities. The 5 iPhone activities ranked 2, 4, 9, 11, and 42 of 62. So of the top 11, there were 7 Edges versus 4 iPhone apps. Of the bottom 51 there were 50 Edges and 1 iPhone app. I won't calculate the probability of this occurring by chance: it's small.
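For the curious, a rough sketch of that calculation: if 5 iPhones were dropped at random into 62 ranked slots, how often would at least 4 land in the top 11? This assumes all placements are equally likely, and like the choice of 12 earlier, the cutoff of 11 was picked after looking at the data, so treat the number as illustrative.

```python
from math import comb

n_slots, n_top, n_iphones = 62, 11, 5
n_bottom = n_slots - n_top

# Ways to put exactly k iPhones in the top 11 and the rest in the bottom 51
ways = sum(
    comb(n_top, k) * comb(n_bottom, n_iphones - k)
    for k in range(4, n_iphones + 1)
)
p_chance = ways / comb(n_slots, n_iphones)
print(f"{100 * p_chance:.2f}%")
```

That comes out to a fraction of a percent, so "small" is a fair description.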

It's interesting, because you'd expect a phone carelessly shoved into a pocket would be inferior to a specifically designed head unit mounted lovingly on the handlebars. On the other hand, the phones have two advantages. One is they are large. Larger = more room for an antenna. The first iPhone was infamous for its poor GPS antenna. I am told the antenna placement in the iPhone was late in the design process, so it was made to fit in available space, rather than being placed early in the process for better optimization. But iPhone users tend to upgrade their hardware, and I doubt there were any early-generation iPhones represented here. The other advantage phones have is they can access the cell towers and use those to help with position determination. Even if no GPS satellites are available, if the phone has access to at least 3 cell towers it can get a position fix. I don't know how much power the phones versus the Edge units are willing to devote to the GPS circuits.

Comparing the phones, the iPhone did a bit better than the Android, but I am reluctant to draw too many conclusions since Android runs on so many different hardware designs.

So lots of interesting stuff here. The conclusion is among the Edge units, the 500 has the most trouble. The 800 is clearly better than the 500 with high probability, and the 800 and 510 are close. By inference the 510 is better than the 500. The iPhone app does well, even in comparison to the Edge units. And older Edge models (the 305 and 705) seem to do about as well as the newer ones. There were no Edge 810's in the mix.

Comments

U. Block said…
Hmmmm... I had a 1st-generation iPhone. There was no GPS. None.

So yes... it probably was infamous for poor GPS reception. :)

I also think it's possible the Edge 500 has MORE room inside of it for an antenna than a recent iPhone. Space is extremely limited in there.

Thanks for the analysis.
djconnel said…
Thanks for the correction! iPhones in common circulation circa 2010 seemed to produce poor results. These were perhaps the iPhone 3G or 3GS (Wikipedia).
Unknown said…
I wonder if the rate of sampling is relevant on the Garmins (ie. every second vs "smart" sampling). I'm guessing this isn't exposed in the data from Strava as they've already processed it.

Re. logging in to scrape pages, I've had luck with CasperJS. It's handy for most sites being webkit, although I don't think it copes with HTML5 stuff such as local storage.
Tom Anhalt said…
Can you follow up to ask how many of those Edge units are set on (not so) "Smart Recording"? I'm thinking that might have an effect.
djconnel said…
Tom: good idea. DC Rainmaker suggested the same thing. I of course could determine that fairly easily.
Robert said…
Very nice.

Do the 800's have a different default setting for "smart recording" than the 500's?
djconnel said…
Great comment emailed to me from Patrick, who doesn't have a Google account:

Fascinating analysis, Dan (as always)! Thanks so much. I have owned two Garmin 500s. The first was absolutely horrendous, dropping segments constantly. I ended up exchanging it at REI eventually and the replacement is also poor ("poor" compared to the 500s of some frequent ride companions) but barely serviceable.

It so happens that my regular ride partner tracks her rides with a Garmin Forerunner 310XT. And the GPS tracking is consistently better and more accurate than any Garmin 500 in the group. When I dealt with Garmin customer service a number of times, they tried to convince me that all of the dropped and lost segments must be a product of "trees and cloud cover" which might be plausible except that the rider next to me using a 310XT never had any such problems...

Garmin then informed me that the Forerunner 310XT uses a "totally different technology" to lock into satellites. I gather this is the "HotFix technology" which Garmin integrates into many of their automotive products. Per DC Rainmaker, "Yes, the FR310XT has a newer chip than the FR305, including hotfix technology for quicker pickups."

Why the Edge 500 does not incorporate this technology too, I could only speculate. But I am guessing that the Forerunner is primarily intended as a trail running and lake swimming watch and needed a higher grade of GPS signal detection than Garmin believed the Edge series needed for cycling applications. Of course, the wide range of 500 results shown in your test (from horrendous to good) probably points most directly to quality control problems in their production more than anything else.

Discussing these issues a bit with Paul Mach who developed the SNAP tool and now works for Strava, I also noted that a huge underlying problem with Strava segments is that many segments are originally drawn or created with a lot of inherent GPS drift.

On a segment I cover almost every week (~4 miles), we tested this hypothesis a bit by creating as exactly parallel a new segment as possible to an old segment using the Forerunner 310XT data. The new segment timing pretty much exactly corresponds to stop watch timing, whereas the original one consistently gives times about 10-12 seconds longer than the new Strava segment or the wrist watch. I assume this is a function of GPS drift. Even though the times are longer for the same segment, Strava calculates the avg. speeds significantly higher for the original segment, presumably because it believes that more distance was covered in the longer amount of time.
djconnel said…
Robert: both the 500 and 800 default to smart recording unless power is being recorded. Power analysis software which calculates normalized power can be confused if data are not provided every second. Same deal with maximal power curves. So Garmin decided that power analysis required uniform sampling. But they didn't anticipate Strava, where position detection would also benefit from high resolution. They figured position was just to draw maps of where you'd been.
Michael Barnes said…
Dan, Interesting, thanks. A few thoughts:

On 1-2 week Santa Rosa Cycling Club tours, I started bringing a deep-cycle 12 v. marine battery and an inverter so that people could recharge all their devices (phones, cameras, Garmins, etc.).

I always encouraged people to leave their phones turned off, because in remote areas, the phone cranks up its power to attempt to ping non-existent towers. iPhones that were left on needed to be recharged daily, while my Garmin 500 would last almost a week of 70-mile daily rides.

On the last tour, some people used their iPhones for GPS bike apps, and had to leave their phones on, leading to a huge crowd of people wanting to recharge every night.

I suspect part of the relatively good quality of iPhone GPS is that it might have more battery power available, in addition to bigger antenna. I know GPS is a power hog, because when I use my Garmin 500 with my powertap wheel on a trainer, and turn off GPS function, the battery lasts forever.

Personally, I think the Garmin 500 is terrible from an ergonomic standpoint. I'm not sure mine is a great GPS unit, either. For a while I used it while I ran laps on the local HS track (a tough test of a GPS unit, I admit) and in the tree-lined streets of the Berkeley Hills. The Garmin was all over the place--literally.

I always use an old-fashioned wired bike computer alongside the Garmin. Leonard Zinn says he just puts his Garmin in his jersey pocket. They are very useful as "black boxes" in case of an accident, which is a main reason I continue to use it.

It also really annoys me how bad the Garmin is at providing ride information, especially splits, on the road without having to close out the current ride file. I'll take my old Cateye ATC 3000s any day. I have three, two have been working for almost 20 years, the third one finally died.

So maybe Garmins, like low-end digital cameras, will be replaced by smart phones and their apps. Smart phones are becoming the digital equivalent of the Swiss Army knife.

Michael Barnes
former (yet still appreciative) LKHC'er
djconnel said…
Great stuff, Michael!

I agree with everything.

When I use my Android-based Droid Incredible to run Strava for nontrivial rides, I always run in Airplane mode, for the reason you cite and because my battery is fatigued. I'm surprised how well it does.

On the Edge 500: I agree. I like to believe it was my suggestion of "Last lap power" that was responsible for getting even that into the unit. You used to have to go through history, which was really terrible. As it is you can see only last lap power: no scrolling through laps. But it's still very useful: when doing an interval I want to see distance, time, power, lap-average power, and last lap power. For last-lap time or distance I need to wait. I do, however, like the form factor: it's light and small, unlike the clunkier Edge 510. So "black box" is close to true.

On black boxes: I wish cars all had them, not just me.

jonah said…
Dan this is great stuff. I wanted to ask for more detail on how you computed the error in each person's GPS data. In your post you said, "I then compared these using the root-mean-square average of the distance from the center of the lines the riders triggered the lines (units: meters)."

I've read this a few times and it's not clear to me what exactly you did. I'm also interested because I'm going to have my hands on both an Edge 500 and 510 next week and I wanted to quantitatively compare their GPS accuracy.
djconnel said…
Based on a Google map, I defined "lines" riders needed to cross to complete the course. These lines had a center point, a right-hand-edge point, and a left-hand edge the same distance from the center as the right but in the opposite direction. So I expected the rider trajectories to intersect these lines somewhere across the road. The lines were much wider than the road, however, to accommodate GPS error.

So after determining the intercept of the rider trajectories with these "lines", I determined how far from the center the rider crossed each line. Of course the "perfect" answer isn't zero. But it is some small number of meters.

So I calculated the root-mean-square such distance over the multiple checkpoint lines for each rider, then compared results based on the GPS unit the rider was using. The Edge 500's tended to be the largest crossing distances from the line-centers.

If there's an error along the direction of travel, I wouldn't detect it: just lateral, perpendicular to the road direction, since my lines crossed the roads, versus running along them.
Diablo Scott said…
I realize this post is a couple months old, but it would be really interesting to do a similar comparison with the altitude numbers.

djconnel said…
That's a really good idea. Unfortunately I didn't set the upload script to transfer altitude on this one. But I later fixed that. I have altitude in the Montara Mountain dataset. I'd need to collect the computer types for those, however.
