Using the Compiler
First, you don't need to use the compiler at all.
You can do everything by writing programs for your emulator
in its assembly language. The compiler is just provided
to make experiments easier.
Secondly, you certainly don't have to understand how the compiler
works. It is just there as a tool.
The compiler consists of three files: compiler.cpp,
defs.cpp, and defs.h, all in the directory
/home/demo which is accessible to everyone on rabbit,
just copy (cp) those files into your own directory,
compile them, and you're ready to go. That directory also contains
a file called emu, which is a "reference" emulator,
executable only. If you're unsure how I intended certain
instructions to behave, you can use that emulator to find out.
Just run it, and enter the command "help" to see what it can do.
Your emulators don't have to behave or look exactly like this reference one.
Compile the compiler like this:CC -c defs.cpp
CC -c compiler.cpp
CC compiler.o defs.o -o cmpl
Then you will be able to use the command cmpl to
run the compiler.
The compiler does not support all of C, but a reasonable subset
useful for the kinds of experiments you are most likely to be doing.
I called the language C-- ("C minus minus") because it had to
be called something.
cmpl filename | compiles the C-- program
in the named file. If the filename ends with .cmm,
you don't have to type the .cmm, it is assumed.
|
cmpl -p file | Print, with proper indentation
and extra parentheses, the program while compiling it. This
is useful if you think the compiler isn't seeing the program the way you intended
it.
|
cmpl -l file | Shows the assembly code instructions being
produced.
|
cmpl -n file | This option prevents the use of labels.
The compiler works out the exact addresses, and puts them in numeric form
in the assembly code. |
Multiple options may be combined, e.g. cmpl -lp filename.
The compiler produces two output files, both with names based on the name of the input file.
If the input file is called file.cmm, the output files will be
called file.ass and file.exe. The .exe file is not very useful
to you. It contains the fully assembled version of the program (i.e. binary
executable codes), with a few numbers describing where in memory it should
be loaded. Your emulators don't have to understand .exe files. The reference
emulator is happy with them, the load command is used to load
a .exe file for execution.
The .ass file contains an assembly language program ready for
processing by your emulator. In it, semicolons ';' are used to start
comments: your emulators should ignore everything on any input line after a semicolon.
The .ass file contains each C-- statement from the original program as a comment
immediately before the assembly code that it produced. This is to make
it easier for you to follow the program.
The .ass file also makes use of two Pseudo-Ops, things that look like
assembly code instructions, but are really commands to the assembler. Both of them
have names beginning with a dot '.' to make them stand out.
.LOC N tells the assembler that the next instruction should be
stored in memory location N. Programs usually begin with .LOC 512
to ensure that page 0 is not used.
.DATA N tells the assembler to store the number N in the next memory
location, instead of an assembled instruction. This is used to initialise
variables. If a program uses memory location 1025 for a variable, at the end
of the assebly code there will be the sequence .LOC 1024;
.DATA 0.
The C-- Language
The language that the compiler understands is a subset of C, with a few simple
additions to support shared memory and multiple processes. The basic statements
are: Assignment, If-Else, While, For, Print, Read.
Assignments, If-Elses, Whiles, and Fors are exactly as they are in normal C.
Expressions used may consist of variables, integers, character constants,
parentheses, the following binary operators: (+ - * / == != < > <=
>= && ||), and the following unary operators: (+ - ! &). The
following assignment operators are also allowed: (= += -= *= /=). The &
operator is only provided so that you can print the address of a variable;
pointers are not supported.
Print has been made into a simple statement, to avoid the complexities
of printf formats. After the keyword "print" there may be a list of values
separated by commas and terminated by a semicolon. The values may be variables,
constants, expressions, or strings. Strings are surrounded by quotes (") in the
normal way; The print statement is the only place where strings are used
in C--. Inside strings ("x") or character constants ('x'), the following
escape characters are understood: (\n \r \t).
Read has also been made into a simple statement to avoid the complexities
of scanf. After the keyword "read" there may be a list of one or more variable
names, separated by commas and terminated by a semicolon.
A group of statements may be surrounded by curly brackets { } to make a single
compound statement exactly as in C.
As in C, all variables must be declared before they are used. As C-- does not
have functions, there is no such thing as a global variable. A declaration statement
consists of "int" or "char" followed by a list of variable names, separated by commas
and terminated by a semicolon. Just like in C, except that initialisation is not allowed.
There are no arrays.
Generally, a C-- program just consists of a sequence of statements. They do not need to be
enclosed in { and }, but can be.
Sample Programs
These sample programs do not use shared memory or multiple processes, which are described next.
A program to print out an un-aligned, ugly 12x12 multiplication table:
int a, b;
for (a=1; a<=12; a+=1)
{ for (b=1; b<=12; b+=1)
print ' ', a*b;
print '\n'; }
This is the output it produces: 1 2 3 4 5 6 7 8 9 10 11 12
2 4 6 8 10 12 14 16 18 20 22 24
3 6 9 12 15 18 21 24 27 30 33 36
4 8 12 16 20 24 28 32 36 40 44 48
5 10 15 20 25 30 35 40 45 50 55 60
6 12 18 24 30 36 42 48 54 60 66 72
7 14 21 28 35 42 49 56 63 70 77 84
8 16 24 32 40 48 56 64 72 80 88 96
9 18 27 36 45 54 63 72 81 90 99 108
10 20 30 40 50 60 70 80 90 100 110 120
11 22 33 44 55 66 77 88 99 110 121 132
12 24 36 48 60 72 84 96 108 120 132 144
A program to calculate a factorial:int n, x, f;
f=1;
print "enter a number: ";
read n;
x=n;
while (x>0)
{ f=f*x;
x=x-1; }
print "the factorial of ", n, " is ", f, "\n";
and the output it producesenter a number: 7
the factorial of 7 is 5040
This is the contents of the .ass file produced
when the first sample (multiplication table) is compiled: .LOC 512
; int a, b;
; int a_0 at 1025
; int b_0 at 1026
; for (a=1; a<=12; a+=1) ...
LOADC 1
STORE 1025
L1:
; a<=12
LOAD 1025
CMPC 12
JPOS L2 ; = 537
; for (b=1; b<=12; b+=1) ...
LOADC 1
STORE 1026
L3:
; b<=12
LOAD 1026
CMPC 12
JPOS L4 ; = 531
; print ' ', a*b;
LOADC 32
OUT
LOAD 1025
MUL 1026
OUTD
; b+=1
LOAD 1026
ADDC 1
STORE 1026
; end of for
JUMP L3 ; = 519
L4:
; print '\n';
LOADC 10
OUT
; a+=1
LOAD 1025
ADDC 1
STORE 1025
; end of for
JUMP L1 ; = 514
L2:
EXIT 0
.LOC 1025
.DATA 0
.DATA 0
Multiple Processes
A program may define a number of processes. These are a bit like
normal C functions in appearance, and should come before the
statements of the main program. A process definition consists
of the keyword process followed by a name, followed by
a list of statements inside curly brackets { }. Defining a process
does not automatically execute the process, just as defining a function
doesn't automatically call that function.
A process may be started by using the fork function. Unlike
in unix, the fork function takes one parameter, which must be the name
of a previously defined process. The idea is that fork simply starts
up new time-shared process, exactly identical to the original one,
(with a private copy of all of its memory pages), but running the
named process. The original process continues executing the next
statement in the normal way.
Here is a sample of a program that uses multiple processes, but not
in a very useful way:process one
{ int i;
for (i=1; i<8; i+=1)
print i, '\n';
print "one done\n"; }
process two
{ int i;
for (i=100; i<700; i+=100)
print i, '\n';
print "two done\n"; }
int i;
fork(one);
fork(two);
for (i=20; i<28; i+=1)
print i, '\n';
And here is the output it produces when run:
1
20
100
212
3200
22
300
234
400
24
5
5006
25
7
600
26
one dont27
e
wo done
If you look carefully, you can see that the numbers 1, 2, 3, 4, 5, 6, 7
all appear in the right order, as do the numbers 100, 200, 300, 400, 500, 600,
and the numbers 20, 21, 22, 23, 24, 25, 26, 27. However, the numbers from
the three sequences are all interleaved in a random way, as are the
characters from the two strings.
Shared Memory
A variable declaration may be preceded by the sequence shared "name",
where name is a string of up to four characters used to identify a page of shared
memory. If two processes declare variables using the same share name, those variables
will be allocated in the same physical page, and can therefore be used to communicate
between the two processes.
This sample program is a misguided attempt at communications between two
processes:
process one
{ shared "pig" int square;
int i;
for (i=0; i<10; i+=1)
{ square=i*i;
sleep(); }
square=99; }
process two
{ shared "pig" int value;
sleep();
while (value!=99)
print value, '\n'; }
fork(one);
fork(two);
The variable 'square' in the first process and the variable 'value' in the second
process occupy the same locations in physical memory, and so are really just two
names for the same shared variable. The first process generates the squares
0, 1, 4, 9, 16, 25, 36, 49, 64, 81 one-by-one and puts them in the shared variable.
The second process prints the values it finds in that shared variable. The
problem is that there is no proper synchronisation between the processes. The
sleep() function simply generates a SLEEP emulator instruction,
which makes the current process surrender the rest of its turn on the CPU.
This output was produced by the program:0
1
4
9
9
16
25
25
36
36
49
49
49
64
81
81
99
As you can see, it contains the right numbers, but with repetitions, because there
is no way for the second process to know when a new number has been produced.
Semaphores
A semaphore is simply a shared variable that is used in a very special way, to
protect a critical section. The two operations that may be performed on a
semaphore are P() and V(). The compiler implements these functions as
SemaphoreP(s) and SemaphoreV(s), where s is the semaphore
variable itself.
The compiler provided to you compiles these two operations
deliberately incorrectly, because part of your assignment
is to work out the correct implementation for your emulator.
The following program is a corrected version of the previous program.
It uses a shared variable called ready to say whether or not there
is a new square waiting to be printed, and a semaphore to properly protect
accesses to that shared variable.process one
{ shared "pig" int s, square, ready;
int i;
for (i=0; i<10; i+=1)
{ while (ready) sleep();
semaphoreP(s);
square=i*i;
ready=1;
SemaphoreV(s); } }
process two
{ shared "pig" int s, value, ready;
int a;
while (1)
{ while (!ready) sleep();
semaphoreP(s);
a=value;
ready=0;
SemaphoreV(s);
print a, '\n'; } }
fork(one);
fork(two);
Note that read-only accesses to a shared variable (such as looking at ready
in the while loop conditions) do not necessarily need to be protected. This is the output
it would produce if the compiler produced the correct code for semaphoreP and
semaphoreV:0
1
4
9
16
25
36
49
64
81
But note that in this version process two would never terminate.
Correcting the Semaphore Code
For the sake of easy experimentation, I have made the part of the
compiler that is responsible for translating semaphoreP and semaphoreV
more-or-less understandable. To correct it, edit compiler.cpp, and
search for the function called compile_semaphore_p, and you will
see the following definitions. The function new_label is used
to prepare a label that can be used in jump instructions. The function
generate actually produces the assembly code: its first
parameter is the instruction's op-code, and the second is its operand
(zero if none), the third operand is the number of operands (0 or 1)
and the fourth is a letter indicating how the optional operand is
to be treated, 'N'=normally, 'L'=as a label.
Beside the functions, I have included a sample of the assembly language
produced, so you can see what does what, and work out how to correct
the functions:void compile_semaphore_p(int semaphore_address)
{ int again=new_label(); // ; set aside L1 as label 'again'
int cont=new_label(); // ; set aside L2 as label 'cont'
generate(I_TAS, semaphore_address, 1, 'N'); // TAS s ; s is the sempahore
define_label(again); //L1:
generate(I_ADDC, 1, 1, 'N'); // ADDC 1
generate(I_JNEG, cont, 1, 'L'); // JNEG L2
generate(I_JUMP, again, 1, 'L'); // JUMP L1
define_label(cont); } //L2:
void compile_semaphore_v(int semaphore_address)
{ generate(I_LOADC, 3, 1, 'N'); // LOADC 3
generate(I_ADD, semaphore_address, 1, 'N'); // ADD s
generate(I_HALT, 0, 0, 'N'); } // HALT
For debugging purposes, the following little program
allows us to see exactly the code produced for these functions.
Don't try to run this code, just look at the .ass
file.int s;
print 97;
semaphoreP(s);
print 98;
semaphoreV(s);
print 99;
The print statements are just there to clearly
separate the program into the P and V parts. This is the assembly
code produced, you can clearly see how it matches the two compiler
functions shown above: .LOC 512
; int s;
; int s_0 at 1025
; print 97;
LOADC 97
OUTD
; semaphoreP(s);
TAS 1025
L1:
ADDC 1
JNEG L2 ; = 518
JUMP L1 ; = 515
L2:
; print 98;
LOADC 98
OUTD
; semaphoreV(s);
LOADC 3
ADD 1025
HALT
; print 99;
LOADC 99
OUTD
EXIT 0
.LOC 1025
.DATA 0