The Inferno virtual machine, called Dis, has several unusual aspects to its design: the instruction set, the module system, and the garbage collector.
The Dis instruction set provides a close match to the architecture of existing processors. Instructions are of the form
OP src1, src2, dst
The src1 and dst operands specify general addresses or arbitrary-sized constants; to reduce code space, the src2 operand is restricted to smaller constants and stack offsets. Each operand specifies an address either in the stack frame of the executing procedure or in the global data of its module.
The types of operands are set by the instructions. Basic types are word (32-bit signed), big (64-bit signed), byte (8-bit unsigned), real (64-bit IEEE floating point), and pointer (implementation-dependent). The instruction set follows the example of CISC processors, providing three-operand memory-to-memory operations for arithmetic, data motion, and so on. It also has instructions to allocate memory, to load modules, and to create, synchronize, and communicate between processes.
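To illustrate, here is a minimal C sketch of how an interpreter might dispatch such an instruction; the names (Inst, exec, ADDW, ADDF) are invented for the example and are not the Dis source. Operand addresses are resolved before dispatch, and the opcode alone fixes the operand types:

    #include <stdint.h>

    typedef int32_t Word;               /* Dis word: 32-bit signed */
    typedef double  Real;               /* Dis real: 64-bit IEEE */

    enum { ADDW, ADDF /* , ... */ };

    typedef struct Inst Inst;
    struct Inst {
        int   op;                       /* opcode; fixes the operand types */
        void *src1, *src2, *dst;        /* resolved operand addresses */
    };

    /* Execute one three-operand, memory-to-memory instruction. */
    void
    exec(Inst *i)
    {
        switch(i->op){
        case ADDW:                      /* word add */
            *(Word*)i->dst = *(Word*)i->src1 + *(Word*)i->src2;
            break;
        case ADDF:                      /* real add */
            *(Real*)i->dst = *(Real*)i->src1 + *(Real*)i->src2;
            break;
        }
    }

Because the type lives in the opcode, no per-operand type tags need be stored or checked at run time.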
A module is the unit of dynamically loaded code and data. Modules are loaded by a VM instruction that returns a pointer to a method table for the module. That pointer is managed by the VM's garbage collector, so code and data for the module are garbage collected like any other memory. Type safety is preserved by checking method types at module load time using an MD5 signature of the type.
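The load-time check itself can be simple. In the following C sketch (invented names, not the Inferno loader), the compiler on each side has already computed the MD5 digest of the canonical type string for each method, so binding an import reduces to comparing digests:

    #include <string.h>

    enum { SigLen = 16 };               /* MD5 digest length in bytes */

    typedef struct Method Method;
    struct Method {
        char          *name;
        unsigned char  sig[SigLen];     /* MD5 of the canonical type string */
        void          *entry;
    };

    /* Bind an imported method: return its entry point only if the
       importer's expected type signature matches the one recorded
       in the module's method table. */
    void*
    bind(Method *tab, int n, char *name, unsigned char want[SigLen])
    {
        int i;

        for(i = 0; i < n; i++)
            if(strcmp(tab[i].name, name) == 0)
                return memcmp(tab[i].sig, want, SigLen) == 0 ? tab[i].entry : 0;
        return 0;
    }

A module whose interface has changed incompatibly thus fails to load, rather than failing later at call time.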
Memory management is intimately tied to the instruction set of the VM. Dis uses a hybrid garbage collection scheme: most garbage is collected by simple reference counting, while a real-time coloring collector gathers cyclic data. Because reference counting is an exact rather than conservative form of garbage collection, the type of all data items must be known to the VM run-time system. For this reason, the language-to-VM compiler generates a type descriptor for all compound types. This descriptor reports the location of all pointers within the type, allowing the VM to track references as items are copied.
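Such a descriptor can be as simple as a table of pointer offsets. The following C sketch uses an invented layout, not the Dis run time, to show how the descriptor lets the collector adjust reference counts exactly when an item is copied:

    #include <stddef.h>

    /* Hypothetical type descriptor: the compiler records where the
       pointers live inside each compound type. */
    typedef struct Type Type;
    struct Type {
        size_t  size;     /* size of the compound type in bytes */
        int     np;       /* number of pointer fields it contains */
        size_t *poff;     /* byte offset of each pointer field */
    };

    /* Hypothetical heap header holding the reference count. */
    typedef struct Heap Heap;
    struct Heap {
        int ref;
    };

    #define H(p) ((Heap*)(p) - 1)     /* header precedes each item */

    /* When an item of type t is copied, every pointer inside it
       gains a reference; the descriptor says exactly where they are. */
    void
    addrefs(void *item, Type *t)
    {
        int i;
        void **p;

        for(i = 0; i < t->np; i++){
            p = (void**)((char*)item + t->poff[i]);
            if(*p != 0)
                H(*p)->ref++;
        }
    }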
Compare this to the Java VM, whose instruction set makes it difficult to track references as objects are copied. This biases against reference counting, so JVM implementations choose lazier techniques such as mark-and-sweep, which require a larger arena and delay collection; both effects increase memory use and therefore the cost of the overall system.
To compare the two architectural styles, consider the statement

c = a + b;

An SM, a stack machine like the JVM, would execute this by a code burst such as the following, which we have annotated with its memory traffic using L for load and S for store:
push a   # LS
push b   # LS
add      # LLS
store c  # LS
The corresponding code burst for an MM, a memory-to-memory machine like Dis, is the single three-operand instruction
add a,b,c # LLS
When interpreting, the extra memory traffic of the SM is masked by the time saved by not decoding any operand fields. The operand fields are implicit in the SM instructions, while in the MM they are explicit: three operand fields must be decoded in every instruction, even those that use no operands.
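To make the comparison concrete, the following toy C program (invented for illustration; it is neither JVM nor Dis code) executes c = a + b both ways, with the L/S traffic repeated in the comments:

    #include <stdint.h>
    #include <stdio.h>

    enum { PUSH, ADD, STORE, HALT };    /* toy SM opcodes */

    int32_t mem[3] = { 7, 35, 0 };      /* a, b, c */

    /* Stack machine: operands implicit; only PUSH/STORE carry an address. */
    void
    sm(void)
    {
        int prog[] = { PUSH, 0, PUSH, 1, ADD, STORE, 2, HALT };
        int32_t stack[8], *sp = stack;
        int *pc = prog;

        for(;;){
            switch(*pc++){
            case PUSH:  *sp++ = mem[*pc++]; break;      /* LS */
            case ADD:   sp--; sp[-1] += sp[0]; break;   /* LLS */
            case STORE: mem[*pc++] = *--sp; break;      /* LS */
            case HALT:  return;
            }
        }
    }

    /* Memory machine: one three-operand instruction does it all. */
    void
    mm(void)
    {
        mem[2] = mem[0] + mem[1];       /* add a,b,c: LLS */
    }

    int
    main(void)
    {
        sm();
        printf("sm: c = %d\n", (int)mem[2]);
        mem[2] = 0;
        mm();
        printf("mm: c = %d\n", (int)mem[2]);
        return 0;
    }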
When compiling, the tradeoffs are different. Clearly, either design can produce the same native instructions from its just-in-time compiler (JIT), but in the SM case most of the work must be done in the JIT, whereas in the MM design the front end has done most of the work and the JIT can be substantially simpler and faster.
A JIT for an SM is forced to do most of the work of register allocation itself. Because the types of stack cells change as the program executes, the JIT must track their types as it compiles. In an MM, however, the architecture maps well onto native instructions, which admits a continuum of register allocation strategies: from none, to simple mapping of known cells to registers, to full flow-based allocation. Most of the work of any of these strategies can be done in the language-to-VM compiler: it can generate code for an infinite-register machine, and the JIT can then map those registers onto however many the native architecture provides. Again, this distribution of work keeps the JIT simple.
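The "simple mapping" end of that continuum might look like the following C sketch, with invented names and a deliberately naive policy: the first few virtual registers get native registers, and the remainder are spilled to the stack frame:

    enum { NPHYS = 8 };        /* registers available on the native machine */

    typedef struct VReg VReg;
    struct VReg {
        int phys;              /* native register number, or -1 if spilled */
        int spill;             /* stack-frame offset when spilled, else -1 */
    };

    /* Map virtual registers 0..nv-1 onto the native set: the first
       NPHYS get real registers, the rest live in the stack frame. */
    void
    alloc(VReg *v, int nv)
    {
        int i;

        for(i = 0; i < nv; i++){
            if(i < NPHYS){
                v[i].phys = i;
                v[i].spill = -1;
            }else{
                v[i].phys = -1;
                v[i].spill = 4*i;   /* word-sized slots, for example */
            }
        }
    }

Flow-based allocation over the same virtual registers costs more work in the JIT but follows the same division of labor.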
In other words, it is a better idea to match the design of the VM to the processor than the other way around.
Dis fits this criterion better, but we do not plan to implement Dis in silicon. The idea of a VM is to be architecture-independent; offering a special processor to run it negates the original goal by favoring one instruction set. Ignoring that for the moment, though, there could still be two reasons to consider designing silicon for Dis: performance and cost.
On performance, history shows that language-specific CPUs are not competitive. The investment in the special design takes energy away from the systems issues that ultimately dominate performance. Performance gains realized through language-specific support tend to be offset by parallel improvements in general-purpose processors during the life cycle of the CPU.
Dis compiles quickly into native code that runs only 30-50% slower than native C. At the current rate of processor improvement, that is only a few months of processor design time. It is wiser to focus on improving execution on commodity, general-purpose processors than on inventing a new architecture.
The issue of cost is more subtle. Dis is close enough to familiar architectures that a special chip with high integration of systems facilities could be cost-effective on small platforms. The real reason for that, though, is that the memory management design of the virtual machine makes it easy to implement Dis in small memory. By contrast, whatever cost gains an integrated Java processor might realize will likely be lost in the extra memory required by its conservative garbage collection scheme [6].