Caveat Lector

I really did double-check this time and I won’t be making any wild claims here. Sorry to disappoint.

We’re going to be running Antonio’s Ruby Benchmark Suite daily to track our progress on performance in Rubinius. The current RBS is a bit of a beast so I imported the files into the Rubinius repository and did some refactoring. You can read the details and up-vote that if you’d like to see this merged back.

Now, for some baseline RBS results. If you want to follow along at home, here’s what I did. I generated these by running the rake bench task using the VM option (see the benchmark/utils/README in the Rubinius repository) for Rubinius on the stackfull and master branches and for MRI using the version installed on Debian lenny, 1.8.7p22. The system is a dual Intel® Xeon™ CPU 2.40GHz. Then I ran the rake bench:to_csv task, imported the CSV file into Google Docs, added the comparison columns and colors, and exported to PDF.

Here’s what I got. The green is faster, the red is slower. The reported time is the minimum time recorded in five “iterations” of each benchmark per input. The maximum time allowed to run five iterations is 300 seconds, or an average of 60 seconds per iteration.
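To make the "minimum of five iterations" rule concrete, here is a minimal sketch of that kind of measurement. It is not the RBS code: min_time is a hypothetical helper, and it only checks the time budget between iterations, whereas a real harness would also need to interrupt an iteration that runs too long.

```ruby
require "benchmark"

# Hypothetical helper, not the RBS implementation: run the block up to
# `iterations` times, stop starting new runs once `budget` seconds have
# been used, and report the fastest run (nil if no run was started).
def min_time(iterations: 5, budget: 300.0)
  times = []
  elapsed = 0.0
  iterations.times do
    break if elapsed >= budget
    t = Benchmark.realtime { yield }
    times << t
    elapsed += t
  end
  times.min
end

best = min_time { (1..10_000).reduce(:+) }
puts format("minimum time: %.6f s", best)
```

Reporting the minimum rather than the mean keeps one slow iteration (a GC pause, a busy machine) from skewing the number.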

A few notes about these numbers:

  • We’re still fixing the breakage on the stackfull branch, so it is not surprising, for instance, that all the thread benchmarks errored out. The new native thread support is not 100% done.
  • There are a couple of speed regressions on the stackfull branch, most of them minor. We’ll fix those.
  • Most of the benches do run on the stackfull branch.
  • On most of the benches that run slower on stackfull than on MRI, we’re at most 2x slower than MRI.
  • We are a lot faster than MRI on quite a few benchmarks.
  • Rubinius on either branch does quite well relative to MRI on benches that MRI times out on for certain inputs.

Perhaps the biggest point about the stackfull branch is that we haven’t done much optimization at all. Evan’s been coding in the basic new interpreter architecture, fixing the GC interaction, and adding the native threading. We’re fixing breakage now so we can get this merged into the master branch. The JIT is not hooked up. The new GC work is not done. There is no inlining. In other words, there is lots of headroom. And that is the key point. You can’t just “make it faster”. Architecture is crucial. Since RailsConf 2008, we’ve been working hard to lay the architectural foundations. With those (and the switch away from stackless), we can start focusing on the real dynamic language optimizations.

While the benchmarks tell part of the story, there’s another part that is even more interesting IMO. And this is the part that got me so excited I, um, well I just got excited...

The two biggest pieces of Ruby software that we most often run are the Rubinius compiler and the RubySpecs. The RubySpecs are much more “real-world” than these benchmarks. Here are the results of two complete CI runs on master and stackfull. Note that we are not quite running all the basic CI specs on stackfull, but we’ll figure in that difference in our calculations below.

First, on master:

  $ bin/mspec ci --gc-stats
  rubinius 0.10.0 (ruby 1.8.6) (f4c5576c4 12/31/2009) [i686-apple-darwin9.6.0]

  Finished in 131.248169 seconds

  1430 files, 6927 examples, 23006 expectations, 0 failures, 0 errors

  Time spent in GC: 51.6s (39.3%)

And then on stackfull:

  $ bin/mspec ci --gc-stats
  rubinius 0.11.0-dev (ruby 1.8.6) (e7b6a2d56 12/31/2009) [i686-apple-darwin9.6.0]

  Finished in 66.357996 seconds

  1349 files, 6298 examples, 21344 expectations, 0 failures, 0 errors

  Time spent in GC: 12.7s (19.1%)

Let’s calculate how we do in expectations per second:

  $ irb
  >> master = 23006 / 131.248169
  => 175.286254850534
  >> stackfull = 21344 / 66.357996
  => 321.649255351232
  >> stackfull / master
  => 1.83499416782851

So, compiling and running the specs is about 1.8 times faster on stackfull. This is upside down from the normal results. Normally, we do better on the micro benchmarks and see that invert on “macro” benchmarks. On the RBS benches, stackfull is not 1.8 times faster than master. If I average the “x Master” column, I get 1.39.

There was something else in those spec run numbers I wanted to talk about… oh yeah, GC stats. We have a very simple GC timer stat right now. I’m going to be adding a few more stats. But what we see here is that the overall percentage of time spent in GC drops by half in stackfull. Even so, 19% is too much time to spend in GC. We expect to drop that by half again. Basically, leaning more on structures alloca'd on the C stack reduces a lot of pressure on the GC.
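As a quick check on those GC numbers, the percentages can be recomputed from the totals printed by the two mspec runs above. This is only arithmetic on the reported figures (gc_share is a made-up helper name), and the stackfull run covers slightly fewer specs, so the two rows are not strictly like for like.

```ruby
# Figures copied from the two mspec runs above (seconds).
runs = {
  master:    { total: 131.248169, gc: 51.6 },
  stackfull: { total: 66.357996,  gc: 12.7 }
}

# Hypothetical helper: share of the run spent in GC, as a percentage.
def gc_share(run)
  run[:gc] / run[:total] * 100.0
end

runs.each do |name, run|
  outside_gc = run[:total] - run[:gc]
  puts format("%-9s GC %.1f%% of run, %.1f s outside GC",
              name, gc_share(run), outside_gc)
end
# master    GC 39.3% of run, 79.6 s outside GC
# stackfull GC 19.1% of run, 53.7 s outside GC
```

So the time outside GC drops as well, but the larger part of the saving comes from the collector.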

Some would toss out that it’s not hard to be faster than MRI. Perhaps. But it is an accomplishment to write a reasonably good VM, garbage collector, compiler, and Ruby standard library without importing anyone else’s code. And, lest we forget, that is two VMs in about 27 months of a public project.

Some would also question the sanity of writing a VM and garbage collector when crazy smart people do things like that. Well, crazy smart people write papers that reasonably smart people can read and understand. From the benchmark results above, that is working pretty well.

Here’s the point: Don’t ever let anyone tell you that something is a bad idea. Make your own decisions. We probably wouldn’t have Ruby itself if Matz had fretted over whether Larry Wall or Adele Goldberg was smarter than he. My most recent favorites in this space: Factor, Clojure, and yes, tinyrb.

We’re working frantically to get the stackfull branch breakages fixed and the branch merged back into master. Feel free to poke around and ask questions.

6 Responses to “Caveat Lector”

  1. macournoyer Says:

    “Don’t ever let anyone tell you that something is a bad idea”

    no, I wont! great advice Brian! thx

  2. Brian Says:

    @macournoyer :D great inspiration can come in small packages, tiny ones even.

  3. Charles L Says:

    Good results. Out of interest, do those benchmark numbers include full parsing time, or is rubinius using pre-compiled rbc files on subsequent executions?

  4. Arthur Schreiber Says:

    Wow, seeing rubinius’ performance increasing that much is pure awesomeness. :)

    Brian, I hope that you’ll continue to post more often about performance improvements in rubinius, as stuff like this makes people get interested in the work being done.

  5. Radarek Says:

    Nice write up. I’m very impressed by your work guys. For me it’s a magic :).

    Btw, I have small suggestion. Gray text on black background is very hard to read… :/.

  6. Brian Says:

    @Charles, it saves the compiled methods to the .rbc files and loads those on subsequent runs.

    @Arthur, thanks! I will be posting more about the work we’re doing.

    @Radarek, thanks, too! I know, my blog theme so needs changing. Every time I consider doing it, there’s something more interesting in Rubinius to work on. :) But, I’ll get it changed.
