Sunday, February 19, 2012

StravaToCSV : It's Ruby's turn

StravaToCSV has become my "test app" for various programming languages (Perl, Java, and now Ruby). And for that it works fairly well: I need to process command line arguments, open an HTTP connection to Strava, download JSON data, convert it, then output it as CSV. So there's a decent amount there.

This project went much more smoothly than my Java implementation. It was fairly quick; the only part that took a bit longer was keeping the program from imploding when it was fed a bad activity specification.

This version takes as its only command line arguments activity numbers (for full activities) or activity-segment pairs, where the activity number is separated from the matched segment by a "#". It will sequentially load each of these, outputting a CSV stream with the header determined by what fields it finds in the first non-empty activity. It adds the activity number as the first column of the CSV stream.
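The argument format described above can be sketched as a small parser. This is only an illustration: the helper name `parse_activity_spec` is hypothetical, and the pattern is a bit stricter than the one in the script below (it rejects a trailing "#" with no segment number).

```ruby
# Hypothetical helper: split "activity" or "activity#segment" into its parts.
# Returns nil when the argument does not match the expected format.
def parse_activity_spec(arg)
  m = /\A(\d+)(?:#(\d+))?\z/.match(arg)
  return nil if m.nil?
  { activity: m[1], segment: m[2] }   # segment is nil for a full activity
end

%w[12345 12345#678 12a45].each do |arg|
  spec = parse_activity_spec(arg)
  if spec.nil?
    warn "poorly formatted arg #{arg}"
  else
    puts "activity=#{spec[:activity]} segment=#{spec[:segment].inspect}"
  end
end
```

Returning nil instead of exiting leaves it to the caller to decide whether a bad argument should stop the whole run or just be skipped.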

At first I expected the JSON library to have some sort of validity checking method for the stream: does a stream represent valid data? But I didn't find any. Instead when attempting to convert data it will throw a "JSON::JSONError" exception if it can't figure things out. So I use the Ruby "begin... rescue... end" construct to check for this. This is about as simple as exception handling gets, in my experience.
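Here is a minimal sketch of that "begin... rescue... end" pattern in isolation. The helper name `parse_or_nil` and the sample strings are made up for illustration; the only library behavior relied on is that `JSON.parse` raises `JSON::ParserError`, a subclass of `JSON::JSONError`, on bad input.

```ruby
require 'json'

# Hypothetical helper: return the parsed data, or nil if the text is not valid JSON.
def parse_or_nil(text)
  begin
    JSON.parse(text)
  rescue JSON::JSONError => e
    warn "could not parse: #{e.class}"
    nil
  end
end

good = parse_or_nil('{"distance": [0.0, 5.2]}')
bad  = parse_or_nil('<html>not json</html>')
puts good.inspect
puts bad.inspect
```

Rescuing the parent class `JSON::JSONError` rather than `JSON::ParserError` also catches the library's other error types, which seems like the safer choice when the input comes from a server I don't control.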

Anyway, it seems to work. It's my first experience with Ruby, so I'm probably not doing things in the best fashion, but it's nice to see it do what I wanted, anyway.

I'm sure the Ruby coders are hurling at the mere sight of this... Feedback welcome!

#! /usr/bin/ruby
require 'net/http'
require 'json'

# strip a path from the program name: for error messages
progname = __FILE__.gsub(/.*\//,"")

# loop through command line options
# each should be a valid strava activity
activities = []
ARGV.each do |arg|
  # check that arg is a valid format
  if arg.match(/^\d+#?\d*$/).nil?
    warn "ERROR: poorly formatted arg #{arg}"
    exit 1
  end

  activities << arg
end

headers = []

activities.each do |activity|
  warn "#{progname}: processing activity #{activity}"

  url = "{activity}"

  # get the data structure from the file
  # get does not raise exceptions in Ruby 1.8, according to class documentation
  http = Net::HTTP.get(URI.parse(url))

  # convert to data; on a parse error, go on to next activity
  begin
    data = JSON.parse(http)
  rescue JSON::JSONError
    warn "#{progname}: Error parsing Strava activity #{activity}!"
    next
  end

  # if we don't have data, then there was an issue: go on to next activity
  if data.nil? || data.empty?
    warn "#{progname}: No data found for activity #{activity}!"
    next
  end

  # list the returned fields, using the first non-empty activity only
  if headers.empty?
    headers = data.keys

    # print the headers, but special case for latlng, which must be split
    puts "activity," + headers.join(",").sub("latlng","lat,lng")
  end

  # iterate over the indices of the first field
  (0 ... data.values[0].length).each do |i|
    output = [activity]
    headers.each do |k|
      # a later activity may lack a field that the first one had
      z = data[k].nil? ? nil : data[k][i]
      output << (z.nil? ? "" : z)
    end
    # join flattens nested arrays, so each latlng pair fills two columns
    puts output.join(",")
  end
end
Anyway, I'm getting a really positive feeling about Ruby. I don't feel like I'm fighting it: it's a fairly coherent design. With other languages I often feel they've been pushed beyond their original scope, that they've become a house of cards of layer upon layer forced in place to accommodate unforeseen paradigms or needs. Java, with its HTTP module, feels very much like that, as does all of C++. And Perl with its ad hoc object handling and endless selection of CPAN libraries is just a big bucket of chaos. Maybe Ruby is moving in that direction now that the number of cooks has grown. But at least for this example I feel as if it all works together.

I ran some quick benchmarks, converting a recent long ride. Here are the execution times for the three versions, where I avoided other internet activity while the code was running:

Perl: 32 seconds, then 42 seconds (two iterations)
Ruby: 38 seconds, then 44 seconds (two iterations)
Java: 30 seconds, then GSON threw an exception

Java barfed on me, I'm not sure why. I'd need to debug that... maybe the activity was too large, because it runs on small activities. But it doesn't surprise me: the syntax is all fairly opaque and that makes it prone to coding errors. Perl might have been faster than Ruby, maybe not... but in the Perl I cheated in downloading the URL with wget, which is optimized compiled code, while in the Ruby I'm using Ruby to download. I guess the main thing this proves is my AT&T/Yahoo DSL is slow, slow, slow.
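A fairer comparison would time the download and the conversion separately, so that a slow connection doesn't swamp the language differences. The following is only a sketch of how that could look: the helper `timed` is hypothetical, the download is simulated with a short sleep, and the payload is generated locally rather than fetched from Strava.

```ruby
require 'json'

# Hypothetical helper: run a block and return [result, elapsed_seconds].
def timed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  [result, Process.clock_gettime(Process::CLOCK_MONOTONIC) - start]
end

# Stand-in for the downloaded body; a real run would time the HTTP fetch here.
payload, fetch_secs = timed do
  sleep 0.05
  JSON.generate("time" => (0...50_000).to_a)
end

parsed, parse_secs = timed { JSON.parse(payload) }

warn format("fetch (simulated): %.3f s", fetch_secs)
warn format("parse:             %.3f s", parse_secs)
warn "samples parsed: #{parsed['time'].length}"
```

The timings go to stderr with `warn`, so they wouldn't pollute a CSV stream written to stdout.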


thveem said...

I'm part of an online exercise logging project, written in python/django. Maybe it would be interesting to you?
We have a strava-parser too: (not written by me)

If you have any questions please contact me!

djconnel said...

Thanks! It looks quite interesting. Presently I'm tied up in GUI matters for my project. Still not 100% on Javascript/DOM. I'm making good progress, though.

Step 1 is an app which lets users map their rides on Google Maps. Of course Strava already provides this, so it's just a building block to what I want to do. But just creating a dialog to let riders scroll through their ride list and select one takes days' worth of available time.

thveem said...

I feel your pain, both with respect to learning javascript and not having enough time. I too balance job/cycling/coding/etc :)

Please have a look at one of my exercises on my site:

If you feel like you can achieve your goal working with us I'd be happy to receive patches. Or even if you want to use parts of our code to do what you want, so we could share hard stuff, like signal smoothing, etc. I'd be very interested.