A VM by any other name
With the recent switch away from the spaghetti stack execution model, Rubinius has also acquired native threads. A big part of understanding something is syncing up our mental model with reality. If you’ve ever tried to explain what an OS is to your mom, you know that can be a challenge. So let’s peel back a few layers and see where these native thread critters fit into Rubinius.
Rubinius is an implementation of the Ruby programming language. One of the bigger components is the virtual machine. But what is that? Unfortunately, virtual machine is a label for a category of software (and maybe hardware) that, well, does a bunch of different things. Virtual machines are often thought of as virtual computers or virtual CPUs. The problem with trying to equate two things is that you look at the one you know about and try to understand the one you do not by looking for analogous structures. Therein lie the seeds of misunderstanding. Since Rubinius has this vm/ directory, we’ve got to try to nail some of this gelatin to the wall.
Thinking about threads running inside a physical computer, you might visualize the relationship something like the following Ruby-ish pseudo code. As the program starts up, it creates a virtual machine.
1 2 3 4 5 |
def main vm = VM.new # do some setup vm.run end |
Sometime later in the program, you add a new thread, which might be implemented something like this.
1 2 3 4 5 |
def Thread.new thr = VM.threads.create # ... return thr end |
You might imagine the VM instance having a Scheduler component that would supervise the threads and arrange for running them on one or more processors or cores. In this model, the VM is really like a virtual computer in which all execution is occurring. In other words, the VM is composed of multiple threads of execution.
The point of this mental exercise is to expose the tacit assumptions we might have about our mapping between a real computer and the virtual machine. Now let’s delve into Rubinius.
The main function is located in vm/drivers/cli.cpp. The first thing it does is create an instance of Environment, which is composed of an instance of VMManager, SharedState, and VM. In the Environment constructor, the command line is parsed for configuration options. Then the manager creates a new shared state. The shared state creates a vm. And finally the vm is initialized. During initialization, the ObjectMemory is created. The object memory in turn is composed of garbage collected heaps for the young and mature generations.
Back in main, a platform-specific configuration file is loaded, the “vm” is booted, the command line is loaded into ARGV, the kernel is loaded (i.e. the compiled versions of the files located in the kernel/ directory), the preemption and signal threads are started, and finally the compiled version of kernel/loader.rb is run, which will process the command line arguments, run scripts, start IRB, etc. When your script, IRB, -e command, etc. finish running, loader.rb finishes, main finishes, resources are cleaned up, and finally the process exits.
Whew. The point of this whirlwind tour is to illustrate that VM is a rather fuzzy concept, even though we have a class named VM. Now let’s take a look at how threads fit in.
Rubinius has a 1:1 native thread model. In other words, each time you do Thread.new in your Ruby code, the instance returned maps to a single native thread. In fact, let’s look at the code for Thread.new in kernel/common/thread.rb.
1 2 3 4 5 6 7 8 9 |
class Thread def self.new(*args, &block) thr = allocate thr.initialize *args, &block thr.fork return thr end end |
The calls to allocate and fork are implemented as primitives in C++ code. They are short, so we’ll take a look at them, too.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
Thread* Thread::allocate(STATE) {
VM* vm = state->shared.new_vm();
Thread* thread = Thread::create(state, vm);
return thread;
}
Object* Thread::fork(STATE) {
state->interrupts.enable_preempt = true;
native_thread_ = new NativeThread(vm);
// Let it run.
native_thread_->run();
return Qnil;
} |
The call to allocate creates a new instance of VM as thread local data. The call to fork creates the new native thread. The call to native_thread_->run() will eventually call the __run__ method in kernel/common/thread.rb. Something to note about this snippet of C++ code is the nice consistency between the primitives and the Ruby code that calls them.
We’ve encountered the VM class in two contexts: 1) when starting up the Rubinius process, and 2) when creating a new Thread. We can consider the VM instance to be an abstraction of the state of a single thread of execution, and in fact, state is the name most often given to an instance of VM in the Rubinius source.
As we’ve seen, Rubinius as a running process is composed of various abstractions, including the Environment, SharedState, NativeThread, and VM to name a few. While it is accurate to call Rubinius a virtual machine, it is apparent that concept can cover a fair bit of complexity. But breaking it into parts makes it fairly easy to understand. Let us know what things you’d like to understand better. We have the doc/ directory in the source that we’re (slowly) building out. If you’re interested in contributing, docs would be a great way to help everyone.
Come to the Open Source Bridge conference
If you live in or would like to visit beautiful Portland, OR, consider signing up for the Open Source Bridge conference. I will (probably) be giving a talk on RubySpec and Rubinius 1.0. There’s lots of interesting folks giving great talks. This is an opportunity to hear how people are developing the open-source community.
Hope to see you there!
What is RubySpec?
According to the folks over at http://rubyspec.org, “RubySpec is a project to write a complete, executable specification for the Ruby programming language.” As with any sufficiently concise summary, there’s some opportunity for misunderstanding here, so let’s explore a few aspects of this definition.
Since this post is a bit long, here’s a summary:
- There is one standard definition of Ruby.
- RubySpec benefits the whole Ruby ecosystem.
- RubySpec does not guarantee that program A will run on implementation B.
- RubySpec includes only specs for the correct behavior when bugs are discovered in MRI.
- Help your favorite Ruby implementation by contributing to RubySpec.
First, what does it mean to be “the Ruby programming language”? The answer to this question used to be a little simpler before Ruby 1.9. Originally, there was the single Ruby implementation that Matz and friends built, referred to as MRI (Matz’s Ruby Interpreter). Now there is also RubyVM or KRI (Koichi’s Ruby Implementation) or simply Ruby 1.9. Either way, MRI 1.8.x and [KM]RI 1.9.x are the standard or reference implementations for Ruby. Everyone else is making an alternative implementation that either complies with the standard or deviates from it.
This is the way RubySpec is written. I realize that it’s possible to consider “the Ruby programming language” to be an abstract thing and all the Ruby projects as merely more or less equal attempts to implement the language. I won’t try to convince you that either way of viewing this is more or less correct. I just want to be clear about the way RubySpec views it.
At the time I began writing what eventually became RubySpec, the only Ruby implementation in wide-spread use was MRI. My biggest concern when getting involved with Rubinius was that the project would be consistent with Ruby the way Matz defines it and not cause fragmentation of the language. My reason was quite selfish. I loved programming in Ruby and I wanted to see the language thrive.
So, RubySpec’s over-arching value proposition is a single, comprehensive definition of the Ruby programming language. To see if the value proposition is actually universal, let’s examine the three categories of people involved with Ruby: consumers, programmers, and implementers.
I define consumers as anyone who depends on a product or service written in Ruby. Consumers may use these products and services directly, or they may own or work for companies that provide them, or they may use products and services that are themselves supported by software written in Ruby. Consumers are the biggest part of the Ruby ecosystem.
Interacting closely with the consumers are programmers, the men and women who write software or frameworks in Ruby. In fact, the same folks may be both consumers and programmers.
The implementers are the people writing Ruby implementations for the programmers and consumers. They want to experiment with ways to better support programmers without worrying that they are breaking their programs in unknown ways. Again, the implementers may be programmers or consumers themselves.
If you’ve ever used a proprietary implementation of a programming language or know what vendor lock-in means, then I am preaching to the choir. If not, then consider that system requirements change, hardware changes, services change, customers change, development teams change, everything changes.
Consumers want assurance that the products and services on which they depend will remain available and will grow with their needs. Programmers want assurance that they will be able to meet their customers’ needs. Implementers want to provide programmers the ability to do so. A single, consistent Ruby language really is a win-win-win situation.
With everything seeming so sunny, one may be tempted to think: “If implementation A and B perform the same on RubySpec, then a program running on A now should run just fine on B.” Unfortunately, it is not quite that simple. Just as a program may not run on both OS X and Linux simply because MRI runs on both. Once in a while, RubySpec finds bugs in MRI, though not that often considering the tens of thousands of expectations. Since a lot of the code in MRI has been around for years, this illustrates that even running thousands of programs (RubySpec is just another program) is no guarantee that there are not bugs lurking around the corner.
What can tell you whether a program will (likely) run on both A and B is the program’s own test suite. In this regard, the program’s test suite and RubySpec are complementary. If the tests discover something that RubySpec did not, it is an opportunity to enhance RubySpec. If running on another implementation expose issues not uncovered by the program’s tests, it is an opportunity to enhance the tests.
Ruby versioning is complex affair and another area where possible misunderstandings exist about RubySpec. It seems pretty simple that Ruby 1.8 and Ruby 1.9 are different. But then there is 1.8.7, which is like 1.8.6 and 1.9. And there may be 1.8.8, which will likely be different than 1.8.6 and 1.8.7 and 1.9. It is dizzying. RubySpec handles the different versions by using the ruby_version_is guard. (Read the guard documentation for full details.)
Generally, the version guards work fine. Each implementation provides the RUBY_VERSION constant and based on its value, the guards permit the correct specs to run. Some have assumed this means RubySpec will tell you that alternate implementation A is just like MRI version X.Y.Z “bugs and all” because A says it is version X.Y.Z.
The principles RubySpec strives for are correctness, clarity, and consistency. There is no way to provide clear and consistent results if RubySpec included specs for the wrong behavior as well as specs for the correct behavior. Either clarity or consistency suffer, badly. Matz is the one who ultimately determines whether a behavior is a bug or not. RubySpec simply includes specs for only correct behaviors.
These are some of the factors that complicate this issue:
- There is no definition of what is a bug. It may be a segfault, an incorrect value computed, a frozen state not set or respected, the wrong exception class raised. The list is endless and impossible to consistently categorize.
- All implementations, including MRI, have their own release processes and schedules.
- RubySpec is a social as well as technical project. An aspect of the value-added proposition for any given implementation is the quality that they provide. There is no way the alternative implementations will consistently agree to defer fixing bugs until MRI releases a fix. Rather than supporting cherry-picking which bugs to fix, RubySpec only includes correct specs.
- Bugs discovered by RubySpec in MRI are quite rare.
Finally, contributing to RubySpec is one of the lowest barriers-to-entry means of supporting your favorite Ruby implementation and the Ruby ecosystem as a whole.
Consider what the view looks like from the outside: Ruby has a vibrant community of implementations meeting consumers’ and programmers’ needs on virtually every significant platform, including on Java, .NET, Mac (Obj-C), and semi-platform-agnostic implementations like MRI and Rubinius. Internally, it means that Ruby programmers can focus more on writing their programs using the best tools for the job, confident that if requirements change they can move to a different platform with ease and confidence.
Check out the RubySpec docs if you are interested in helping out.
When describe'ing it ain't enough
One of the things I like about the RSpec syntax is that it packs a lot of information into a few concise, consistent constructs. It’s relatively easy to read through a spec file and pick out what I am looking for. The use of blocks both enable flexible execution strategies and provide simple containment boundaries.
Perhaps the most valuable aspect, though, is the ability to extend the RSpec syntax constructs easily and consistently. No need to grow a third arm here. In Rubinius, we recently encountered a situation needing some extra sauce when fixing our compiler specs.
A compiler can be thought of as something that chews up data in one form and spits it out in another, equivalent, form. Typically, these transformations from one form to another happen in a particular order. And there may be several of them from the very beginning to the very end of the compilation process.
To write specs for such a process, it would be nice to focus just on the forms of the data (that’s what we care about) with as little noise as possible about how they got there. Here’s what we have in Rubinius:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 |
describe "An And node" do relates "(a and b)" do parse do [:and, [:call, nil, :a, [:arglist]], [:call, nil, :b, [:arglist]]] end compile do |g| g.push :self g.send :a, 0, true g.dup lhs_true = g.new_label g.gif lhs_true g.pop g.push :self g.send :b, 0, true lhs_true.set! end end end |
The relates block introduces the Ruby source code and contains the blocks that show various intermediate forms. A single word like parse and compile encapsulates the process of generating that particular form, as well as concisely documenting the specs.
The format is sufficiently flexible to allow for other forms. For instance, ast for generating an AST directly from the parse tree rather than using the sexp as an intermediate form. Or llvm to emit LLVM IR directly from our compiler.
Another interesting aspect of this, it was possible with only a few custom extensions to MSpec. Recently, I had added custom options to the MSpec runner scripts to enable such things as our --gc-stats. I didn’t know how easy it would be to add something more extensive. Turns out it was pretty easy. You can check out the source in our spec/custom directory.
Boxers or Briefs? -- Neither?!
The emperor is wearing clothes and everything looks hunky-dory and sane on the outside. Usually, that’s a good thing. For instance, when running RubySpec on a released version of MRI, it’s good to know that things are behaving as expected and all known issues are accounted for. In other words, you won’t see any failures unless a spec is broken or a new spec has uncovered a bug.
$ mspec library/stringio/reopen_spec.rb ruby 1.8.6 (2008-03-03 patchlevel 114) [universal-darwin9.0] ........................ Finished in 0.021797 seconds 1 file, 24 examples, 59 expectations, 0 failures, 0 errors
While the above can be reassuring, it may not tell the whole story. RubySpec uses guards to control which specs are run. This enables the specs to accommodate differences in behavior due to varying platforms, versions, implementations, and bugs.
I’ve added a couple features to MSpec to enable discrete and not-so-discrete peeks under the robes, as it were. The first of these is akin to just yanking down the trousers. By passing the --no-ruby_bug option, all ruby_bug guards are disabled and the guarded specs are run.
$ mspec --no-ruby_bug library/stringio/reopen_spec.rb ruby 1.8.6 (2008-03-03 patchlevel 114) [universal-darwin9.0] ............FF.....FF.....FF...... 1) StringIO#reopen when passed [Object, Object] resets self's position to 0 FAILED Expected 5 to have same value and type as 0 ./library/stringio/reopen_spec.rb:117 ./library/stringio/reopen_spec.rb:110:in `all?' ./library/stringio/reopen_spec.rb:61 ---- snip ---- Finished in 0.022210 seconds 1 file, 34 examples, 76 expectations, 6 failures, 0 errors
If you cringe a little when blasted by a bunch of failures, don’t worry, So do I. For a more subtle examination, there is also the ability to run the specs and note which specs would have run but did not due to guards.
$ mspec --report library/stringio/reopen_spec.rb ruby 1.8.6 (2008-03-03 patchlevel 114) [universal-darwin9.0] ........................ Finished in 0.009809 seconds 1 file, 24 examples, 59 expectations, 0 failures, 0 errors, 10 guards 4 specs omitted by guard: ruby_bug #, 1.8.6.114: StringIO#reopen reopens a stream when given a String argument StringIO#reopen reopens a stream in append mode when flagged as such StringIO#reopen reopens and truncate when reopened in write mode StringIO#reopen truncates the given string, not a copy 6 specs omitted by guard: ruby_bug #, 1.8.7: StringIO#reopen when passed [Object, Object] resets self's position to 0 StringIO#reopen when passed [Object, Object] resets self's line number to 0 StringIO#reopen when passed [String] resets self's position to 0 StringIO#reopen when passed [String] resets self's line number to 0 StringIO#reopen when passed no arguments resets self's position to 0 StringIO#reopen when passed no arguments resets self's line number to 0
The guards are reported only if they have altered how the specs were run. Since the ruby_bug guard can only prevent specs from running on the standard implementation, MRI, those guards are not reported when running under JRuby, for instance.
$ mspec -t jruby --report library/stringio/reopen_spec.rb jruby 1.2.0RC1 (ruby 1.8.6 patchlevel 287) (2009-02-26 rev 9326) [i386-java] .................................. Finished in 0.257000 seconds 1 file, 34 examples, 76 expectations, 0 failures, 0 errors, 0 guards
So, if you are wondering what is going on with some specs for a particular library, you can get a quick peek using the --report option before digging into the spec files. There is also a --report-on GUARD option that allows you to narrow the focus of your peeking.