Welcome to CoCo, a virtual machine based on the Python virtual machine. This virtual machine was implemented in C++. Its was designed for educational purposes for learning C++, assembly language, and higher level language implementation. The entire source for CoCo is available on GitHub free of charge. You may download it by cloning the github repository or by downloading a zip file of the source. A README accompanies the source code repository to tell you how to compile the program. Precompiled binaries for the three major platforms are provided below if you do not have a compiler installed on your computer.
You can download CoCo's complete source code from
http://github.com/kentdlee/CoCo.git
Once you have downloaded, you can compile the code by running
./configure make
If for some reason, that does not work for you, you can rebuild the configure and make files with the rebuild script. However, you will have to have make utilities installed to rebuild. To see if you have the right utilities try this.
./rebuild ./configure make
If that does not work, you can brute force the compile, assuming you have a C++ compiler installed, with this command.
g++ -g -o coco *.cpp
If you are going to be working with the source code at all, the above command is not desirable since it will recompile everything every time you make a change and recompile. This can take a little time and could slow down development. It would be better to use make if at all possible.
If you don't have a C++ compiler and just want to try CoCo, you can download a precompiled binary of coco by clicking the links below. You are still encouraged to get the source code as a resource for learning C++.
CoCo includes a disassembler that will disassemble certain Python programs into a CoCo assembly language program. To use the disassembler you must have installed Python 3.2 and run the disassembler with Python 3.2. Usually, if you have version 3.2 installed, you can run it by typing python3.2 at the command-line. Your Python program must have a main function with no arguments where execution will begin. The Python program to disassemble may have as many functions as desired, but to be in CoCo format, classes are not supported.
You can download the disassembler from here. It is also in the tests subdirectory of the CoCo source code distribution.
CoCo implementation detail: Bytecode is an implementation detail of the Python interpreter. No guarantees are made that bytecode will not be added, removed, or changed between versions of Python. For this reason, CoCo is specific to Python 3.2. To use the disassembler that is distributed with this software package, you must have Python 3.2 installed on your system. Use of the disassembler.py module should not be considered to work across Python VMs or Python releases.
You can typically install Python 3.2.5 from Linux distributions or MacPorts. If you don't find it under your platform's software management, it can be downloaded from python.org at
http://www.python.org/download/releases/3.2.5/
There are Windows and Mac installers available on this page. For Linux you can download the source and compile and install it.
CoCo itself is not dependent on Python 3.2. Only the dissassembler is. However, it is highly recommended that you have Python 3.2 installed so you can use the disassembler to learn CoCo assembly language.
CoCo, like Python, is a virtual machine (i.e. an interpreter) for bytecode instructions. CoCo is written in C++ using object-oriented principles and does not store its instructions in actual bytecode format. Instead, it reads an assembly language file and builds an internal representation of the program as a sequence of functions each with their own sequence of ByteCode objects.
Each function of a CoCo program, when called, results in a new stack frame object being created. A stack frame contains all local variables that are defined in the function. Each stack frame also contains two stacks, an operand stack and a block stack.
The operand stack is used by the virtual machine to store all intermediate results of instruction execution. Operands are popped from the operand stack when an instruction is executed possibly leaving its result on the operand stack for use by a subsequent instruction. This style of computation has been in use a long time, from Hewlett Packard mainframe computers of the 1960's through the 1980's to calculators made by Hewlett Packard still today. The Java virtual machine is a stack machine as is the Python virtual machine.
The other stack is a block stack. The block stack keeps track of exit points for blocks of code within a CoCo program. When a loop is entered, the exit point of the loop is pushed onto the block stack. In this way, if a break instruction is executed inside a loop, the exit point of the loop can be found and the execution of the break instruction will jump to that address. Exception handlers also push the address of the handler onto the block stack. If an exception occurs, execution jumps to the exception handler by popping the address from the block stack. When a loop or try block is exited, the corresponding block stack address is popped from the block stack.
A program counter (abbreviated PC) is responsible for holding the address of the next instruction to be executed. The machine proceeds by fetching an instruction from the code, incrementing the PC, and executing the fetched instruction. Execution proceeds this way until a RETURN_VALUE instruction is executed or an exception occurs.
When an exception occurs, if no matching exception handler is found, execution of the function terminates and control is returned to the previously called function where the exception continues to propagate back until a matching exception handler is found. If no matching handler is found, the complete traceback of the exception is printed.
If no exception occurs during the running of a program, execution terminates when the main function executes the RETURN_VALUE instruction.
CoCo is a full-fledged virtual machine. It reads a source file in CoCo assembly format and interprets the instructions. CoCo is a interpreter for Python assembly language instructions. In addition, included with CoCo is a Python disassembler that diassembles Python programs into CoCo format. With these two tools together you can learn a lot about not only the syntax of casm files (the extension understood as CoCo assembly language) but also the Python language itself. Two examples below describe the usage of CoCo and the associated disassembler.
Consider the following Python program. The disassembler is a module found in the tests subdirectory of the source code distribution or you can download it here. To disassemble correctly, Python programs must contain a series of functions including a main function. The disassembler is imported at the top. Instead of calling the main function, the disassemble function of the same module is called on each function.
import disassembler
def main():
x = 5
y = x + 5
print(y+5)
disassembler.disassemble(main)
Calling the disassembler on the main function in the example above produces this output file.
Function: main/0
Constants: None, 5
Locals: x, y
Globals: print
BEGIN
LOAD_CONST 1
STORE_FAST 0
LOAD_FAST 0
LOAD_CONST 1
BINARY_ADD
STORE_FAST 1
LOAD_GLOBAL 0
LOAD_FAST 1
LOAD_CONST 1
BINARY_ADD
CALL_FUNCTION 1
POP_TOP
LOAD_CONST 0
RETURN_VALUE
END
In the code above there is one function called main. The assembly indicates main has 0 formal parameters. Constants that are used in the code include None and 5. There are two local variables in the function: x and y. The global print function is called and so is in the list of globals. Every function in CoCo has at least these categories of identifiers and values within each defined function. Sometimes one or more of these categories may be empty and can be omitted in that case.
The instructions of the code follow the begin keyword and preceed the end keyword. LOAD_CONST 1 means to load the constant value at index 1 (zero based) of the constants onto the operand stack. CoCo is a stack machine and therefore all operations are performed with operands pushed and popped from the operand stack.
The STORE_FAST instruction stores a value in the locals list, in this case at offset 0, the location of x. LOAD_FAST does the opposite, pushing a value on the operand stack from the locals list of variables. BINARY_ADD pops two elements from the stack and adds them together, pushing the result. CALL_FUNCTION pops the number of arguments specified in the instruction (1 in this case) and then pops the function from the stack. Finally, it calls the popped function with the popped arguments. The result of the function call is left on the top of the operand stack. In the case of the print function, None is returned and left on the stack. The POP_TOP instruction pops the None from the stack and discards it only to have this function push a None on the stack just before returning. RETURN_VALUE pops the top argument from the operand stack and returns that value to the calling function.
To run this code, make sure that you have the coco execuatable in your path some place. Then you can execute the following code to try this example.
cd tests python3.2 test1.py > test1.casm coco test1.casm
CoCo is capable of handling complex functions that may be nested inside one another. It can handle functions that return functions and functions that take other functions as parameters. Consider the following Python program.
# If 1 2 3 4 is entered, and 5 for the second prompt # then the answer should be [6, 13, 20, 27] from disassembler import * def main(): def g(aVal): def f(x): return aVal + lstInts[0] + x return f x = input("Please enter a list of integers: ") lst = x.split() lstInts = [] for y in lst: lstInts.append(int(y)) myFun = g(6) print(myFun(lstInts[2])) #main() disassemble(main)
Calling disassemble on main will disassemble any nested functions as well, so it is only necessary to call disassemble on all top-level functions. You can read the code above and the comments to understand how this program works. The disassembled code appears below.
Function: main/0 Function: g/1 Function: f/1 Constants: None, 0 Locals: x FreeVars: aVal, lstInts BEGIN LOAD_DEREF 0 LOAD_DEREF 1 LOAD_CONST 1 BINARY_SUBSCR BINARY_ADD LOAD_FAST 0 BINARY_ADD RETURN_VALUE END Constants: None, code(f) Locals: aVal, f FreeVars: lstInts CellVars: aVal BEGIN LOAD_CLOSURE 0 LOAD_CLOSURE 1 BUILD_TUPLE 2 LOAD_CONST 1 MAKE_CLOSURE 0 STORE_FAST 1 LOAD_FAST 1 RETURN_VALUE END Constants: None, code(g), "Please enter a list of integers: ", 6, 2 Locals: g, x, lst, y, myFun CellVars: lstInts Globals: input, split, append, int, print BEGIN LOAD_CLOSURE 0 BUILD_TUPLE 1 LOAD_CONST 1 MAKE_CLOSURE 0 STORE_FAST 0 LOAD_GLOBAL 0 LOAD_CONST 2 CALL_FUNCTION 1 STORE_FAST 1 LOAD_FAST 1 LOAD_ATTR 1 CALL_FUNCTION 0 STORE_FAST 2 BUILD_LIST 0 STORE_DEREF 0 SETUP_LOOP label02 LOAD_FAST 2 GET_ITER label00: FOR_ITER label01 STORE_FAST 3 LOAD_DEREF 0 LOAD_ATTR 2 LOAD_GLOBAL 3 LOAD_FAST 3 CALL_FUNCTION 1 CALL_FUNCTION 1 POP_TOP JUMP_ABSOLUTE label00 label01: POP_BLOCK label02: LOAD_FAST 0 LOAD_CONST 3 CALL_FUNCTION 1 STORE_FAST 4 LOAD_GLOBAL 4 LOAD_FAST 4 LOAD_DEREF 0 LOAD_CONST 4 BINARY_SUBSCR CALL_FUNCTION 1 CALL_FUNCTION 1 POP_TOP LOAD_CONST 0 RETURN_VALUE END
From the code above you can observe several things that are worth a little more explanation.
The syntax of the CoCo assembly language is pretty well illustrated by the two examples above. The complete syntax of the language is given here as a BNF. There are just a few things to note in the BNF below that weren't illustrated by either example above.
CoCoAssemblyProg ::= FunctionListPart EOF FunctionListPart ::= FunDef FunctionList FunctionList ::= FunDef FunctionList | NULL FunDef ::= Function colon Identifier slash Integer FunctionList ConstPart LocalsPart FreeVarsPart CellVarsPart GlobalsPart BodyPart ConstPart ::= NULL | Constants colon ValueList ValueList ::= Value ValueRest ValueRest ::= comma ValueList | NULL Value ::= None | Integer | Float | String (* the Scanner recognizes None as an Identifier *) LocalsPart ::= NULL | Locals colon IdList FeeVarsPart ::= NULL | FreeVars colon IdList CellVarsPart ::= NULL | CellVars colon IdList IdList ::= Identifier IdRest IdRest ::= comma IdList | NULL GlobalsPart ::= NULL | Globals colon IdList BodyPart ::= BEGIN InstructionList END InstructionList ::= NULL | LabeledInstruction InstructionList LabeledInstruction ::= Identifier colon LabeledInstruction | Instruction | OpInstruction Instruction ::= STOP_CODE | NOP | POP_TOP | ROT_TWO | ROT_THREE | ... OpInstruction ::= OpMnemonic Integer | OpMnemonic Identifier OpMnemonic ::= LOAD_CONST | STORE_FAST | SETUP_LOOP | COMPARE_OP | POP_JUMP_IF_FALSE | ...
CoCo supports the following types within the language.
One of the powerful features of the Python language results from methods being looked up on objects at run-time. This means that new types of objects can easily be added to the language because the virtual machine instructions presented below will polymorphically call the proper methods since lookup happens at run-time. In support of this, CoCo, like Python, has what have traditionally been called magic methods. These methods typically begin and end with two underscores. Magic methods are used by instructions as needed. For instance, the __add__ magic method is used by the BINARY_ADD instruction.
CoCo includes support for all the magic methods that are defined by Python. While support is there for the whole list, not all magic methods are implemented on each type of object. The magic methods that are supported by a type of object are controlled by its type or class definition. When a magic method is called, the magic method is first looked up on the type and if it is supported, the call is made. Otherwise, an IllegalOperationException is raised. The use of magic methods is illustrated below in the descriptions of the CoCo instructions.
The possible magic methods include the following: __cmp__, __eq__, __ne__, __lt__, __gt__, __le__, __ge__, __pos__, __neg__, __abs__, __invert__, __round__, __floor__, __ceil__, __trunc__, __add__, __sub__, __mul__, __floordiv__, __div__, __truediv__, __mod__, __divmod__, __pow__, __lshift__, __rshift__, __and__, __or__, __xor__, __radd__, __rsub__, __rmul__, __rfloordiv__, __rdiv__, __rtruediv__, __rmod__, __rdivmod__, __rpow__, __rlshift__, __rand__, __ror__, __rxor__, __iadd__, __isub__, __imul__, __ifloordiv__, __idiv__, __itruediv__, __imod__, __ipow__, __ilshift__, __iand__, __ior__, __ixor__, __int__, __long__, __float__, __bool__, __cmplex__, __oct__, __hex__, __index__, __coerce__, __str__, __list__, funlist__, __repr__, __unicode__, __format__, __hash__, __nonzero__, __dir__, __sizeof__, __getattr__, __setattr__, __delattr__, __getattribute__, __len__, __getitem__, __setitem__, __delitem__, __reversed__, __contains__, __missing__, __instancecheck__, __subclasscheck__, __call__, __copy__, __deepcopy__, __iter__, __next__, __type__, __excmatch__. The last two magic methods are specific to CoCo. The __type__ magic method is called when the type function is called on an object. The __excmatch__ magic method is called when matching an exception in an exception handler.
In addition, some objects have additional methods defined on them that are accessed like traditional method calls on objects. For instance, str objects have a split method that can be called to split a string on separator characters. Here is the list of attr methods defined in CoCo: split, append, head, tail, concat. The head and tail methods are not found in Python but are defined in CoCo to support funlist objects which are defined to have a head and a tail.
CoCo supports the following globally available built-in functions.
CoCo implements the following instructions. These instructions are a subset of the instructions supported by the Python Virtual Machine plus a few extra instructions. There are two minor differences from the Python Virtual Machine.
Other than these two minor differences, the implementation of the instructions is pretty faithful to a subset of the Python Virtual Machine implementation. The full Python 3.2 instruction descriptions are available here.
Full Python 3.2 Instruction Definition
And here are the CoCo instruction descriptions. Again, this is a subset of the full Python 3.2 instruction set with the addition of a few extra instructions and a couple of minor differences.
In the instructions below TOS refers to the top element on the operand stack. TOS1 refers to the element on the operand stack that is second from the top. TOS2, and so on are similarly defined.
Implements TOS = TOS1 + TOS by making the call TOS.__add__(TOS1).
Implements TOS = TOS1 - TOS by making the call TOS.__sub__(TOS1).
Implements TOS = TOS1 // TOS by making the call TOS.__floordiv__(TOS1).
Implements TOS = TOS1 / TOS by making the call TOS.__truediv__(TOS1).
Loads the global named co_names[namei] onto the stack.
Pushes a reference to the local co_varnames[var_num] onto the stack.
Stores TOS into the local co_varnames[var_num].
This instruction does nothing in CoCo which varies from the Python implementation. The purpose of this instruction seems to be implementation dependent. In the Python Virtual Machine it performs cleanup after an exception has occurred. The handling of exceptions is different in CoCo so this instruction exists to make it work with the disassembler, but it is ignored.
Loads the cell contained in slot i of the cell and free variable storage. Pushes a reference to the object the cell contains on the stack.
Stores TOS into the cell contained in slot i of the cell and free variable storage.
Pushes the contents of the tuple with count elements onto the operand stack. The count must match the tuple's size or an illegal operation exception will be thrown. The elments of the tuple are pushed so the left-most element is left on the top of the stack. This instruction is not part of the Python Virtual Machine. It is CoCo specific.
Works as BUILD_TUPLE, but creates a list.
Works as BUILD_TUPLE, but creates a list.
This instruction pushes the head and the tail (which is a funlist) onto the operand stack. The head of the list is left on the top of the operand stack. The tail is below it on the stack. This instruction is CoCo specific.
Pops two elements from the operand stack. TOS should be a funlist and TOS-1 should be an element. The instruction create a new funlist from the two pieces with TOS-1 the head and TOS the tail of the new list. It pushes this new list onto the operand stack. This instruction is CoCo specific.
Removes the top-of-stack (TOS) item.
Swaps the two top-most stack items.
Duplicates the reference on top of the stack.
Implements TOS=iter(TOS).
Terminates a loop due to a break statement.
Removes one block from the block stack. Per frame, there is a stack of blocks, denoting nested loops, try statements, and such.
Removes one block from the block stack. The popped block must be an exception handler block, as implicitly created when entering an except handler. In addition to popping extraneous values from the frame stack, the last three popped values are used to restore the exception state.
Terminates a finally clause. The interpreter recalls whether the exception has to be re-raised, or whether the function returns, and continues with the outer-next block.
Sets the Program Counter (PC) to target.
If TOS is true, sets the bytecode counter to target. TOS is popped.
If TOS is false, sets the bytecode counter to target. TOS is popped.
Set bytecode counter to target.
TOS is an iterator. Call its __next__() method. If this yields a new value, push it on the stack (leaving the iterator below it). If the iterator indicates it is exhausted TOS is popped, and the PC is set to target.
Pushes a try block from a try-except clause onto the block stack. target points to the first except block.
Pushes a try block from a try-except clause onto the block stack. target points to the finally block.
Pushes a new function object on the stack. TOS is the code associated with the function. The function object is defined to have argc default parameters, which are found below TOS.
Creates a new function object, sets its __closure__ slot, and pushes it on the stack. TOS is the code associated with the function, TOS1 the tuple containing cells for the closure’s free variables. The function also has argc default parameters, which are found below the cells.