Virtual machine 4 separate areas of memory memP: program code memD: data/variables regs: (registers) immediate access short term data, 16 of them memS: strings instructions have three parts op: the action to be performed reg: register to use n: numeric value user sees human readable versions, but they are translated to ints before execution instructions: ldc r n regs[r] set to n - load constant ldm r n regs[r] set to memD[n] - load from memory ldi r n regs[r] set to memD[regs[n]] - load indirect (follow pointer) stm r n memD[n] set to regs[n] - store in memory sti r n memD[regs[n]] set to regs[r] - store indirect add r n regs[r] set to regs[r] + regs[n] sub r n regs[r] set to regs[r] - regs[n] mul r n regs[r] set to regs[r] * regs[n] div r n regs[r] set to regs[r] / regs[n] mod r n regs[r] set to regs[r] % regs[n] skeq r n skip over next instruction if regs[r] == regs[n] skne r n skip over next instruction if regs[r] != regs[n] sklt r n skip over next instruction if regs[r] < regs[n] skle r n skip over next instruction if regs[r] <= regs[n] skgt r n skip over next instruction if regs[r] > regs[n] skge r n skip over next instruction if regs[r] >= regs[n] jump r n jump so that evecution continues with memP[n] r is ignored but must be there to avoid ambiguity inn r wait for user to type an int, regs[r] set to it outn r print regs[r] outs r n print the string from memS[n] r is ignored but must be there to avoid ambiguity line r print \n r is ignored but left there for simplicity in processing halt r n stop running, display n as exit code some instructions don't use the n value, so it is OK for it not to be there. some instructions don't use r or n, so both may be absent. in an instruction that uses n but not r, the unused r must still be typed otherwise it becomes ambiguous: in "outs 3" is the 3 r or n? each line of program code can begin with: a space, in which case the whole line is taken to be an instruction # for a comment, just ignore the whole line blank lines are also totally ignored the word label, in which case the next thing on the line is a name that will be used by a skip or jump instruction the word data, in which case the next thing on the line is a name that will be used to refer to some memD[i], and the name will be followed by an initial value (int only) the word string, in which case it is followed by a number to give the string's position in memS. The rest of the line is the string to be stored in memS. first pass: read file, counting instructions and doing the actions required for label, data, and string. second pass: re-read the file, this time ignoring label, data, and string, just putting translated instructions into memP. The reason for two passes is the this kind of programming requires referring things before they have been defined. The first pass through the file just works out where all named things will be, then the second pass can convert names into numbers because it worked out where they are in the first pass Sample programs first. If you want to use these, you'll need to edit out the little comments that say what is happening at the ends of the lines. Add numbers between 1 and 10 # reg 1 for total, reg 2 for current position ldc 1 0 reg 1 = 0 ldc 2 1 reg 2 = 1 label a ldc 3 10 skle 2 3 skip if reg 2 <= 10 jump 0 end add 1 2 reg 1 += reg 1 ldc 3 1 reg 3 = 1 add 2 3 reg1 += 1 jump 0 a label end outs 0 s1 outn 1 print reg 1 the total line 0 halt 0 0 string s1 the total is more likely, use named variables: data total 0 data i 1 label a ldm 1 i reg 1 = i ldc 2 10 reg 2 = 10 skle 1 2 skip if i <= 10 jump 0 end therefore jump to end if i > 10 ldm 2 total reg 2 = total ldm 1 i reg 1 = i add 2 1 reg 2 = total + i stm 2 total total = total + i ldc 3 1 reg 3 = 1 add 1 3 reg 1 (still i) += 1 stm 1 i i == 1 jump 0 a label end outs s1 ldm 1 total outn 1 print the total line 0 halt 0 0 The translation process (passes 1 and 2) struct instr { int op, reg, n; }; const int max_memP = 1000 instr memP[max_memP]; const int max_memD = 1000 int memD[1000]; int regs[16]; const int max_memS = 100 string memS[max_memS] int pos_memP = 0, pos_memD = 0, pos_memS = 0; struct valpair { string name; int val } vector<valpair> labels, strings, varnames; first pass read each line instruction pos_memP += 1 but otherwise ignore it label sss check "sss" not already used add valpair("sss", pos_memP) to labels string abc this is it check "abc" not already used add ("abc", pos_memS) to strings memS[pos_memS] = "this is it" pos_memS += 1 data name 99 check "name" not already used add ("name", pos_memD) to varnames memD[pos_memD] = 99 pos_memD += 1 second pass to start the input file anew, no need to close and re-open it, just fi.clear(); then fi.tellg(0); to go back to the beginning of file fi. pos_memP = 0 label or string or data ignore instruction get 3 parts op string, reg num, optional N as string p = position of op in array ops if N is numeric, convert to int intN if X is string find it in one of the valpair arrays set intN to its associated int value create instr struct with (p, reg, intN) memP[pos_memP] = that struct pos_memP += 1 string ops[] = { "ldc", "ldm", ...., halt" } const int i_ldc = 0, i_ldm = 1, i_ldi = 2, .... ADDED --------- I wnated to reduce typeing, butthose two lines above are too likely to cause difficult errors, this way is safer: struct valpair // as seen in class { int v; string s; }; const int i_ldc = 0, i_ldm = 1, i_ldi = 2, i_halt = 3; // only 4 instructions just to reduce typing // this way it will be blindingly obvious if a string has the wrong value associated with it. valpair ops[] = { { i_ldc, "ldc" }, { i_ldm, "ldm" }, { i_ldi, "ldi" }, { i_halt, "halt" } }; const int num_ops = sizeof(vps) / sizeof(vps[0]); to convert instruction name (string q) to int representation loop with i from 0 so long as i < num_ops if ops[i].s == q // ops[i].v is the numeric version if not found by end of loop, error END OF ADDITION --------- to run program pos_memP = 0 while (true) instr i = memP[pos_memP] pos_memP += 1 if i.op == i_ldc regs[i.reg] = i.n else if i.op == i_ldm regs[i.reg] = memD[i.n] else if i.op == i_skgt { if regs[i.reg] > regs[i.n] pos_memP += 1 } ... etc etc etc for the other instructions ... else if i.op == i_halt { cout << "\nexitted with code " << i.n << "\n"; break } how to process program lines easily: #include <sstream> ifstream fi(.... string line; while (...) { getline(fi, line); if fi.fail() break // its the end of the file if line.length() == 0 || line[0] == '#' continue // its a comment bool its_an_instruction = line[0] == ' '; istringstream si(line); in loop string s; si >> s; if (si.fail()) at end of line for string abd hello there second si >> s sets s to "abc" getline(si, s) sets s to rest of line Another sample program, printing a 12 x 12 multiplication table neatly I am using // to make comments fit on the same line as an instruction This is not a requirement, but fairly easy to do data row 1 data column 1 label row_loop ldm 1 row ldc 2 12 skle 1 2 jump 0 end // row > 12 ldc 1 1 stm 1 column // column = 1 label column_loop ldm 1 column ldc 2 12 skle 1 2 jump 0 end_of_row // column > 12 ldm 1 row ldm 2 column mul 1 2 // reg 2 = row x column outs 0 space // keep the numbers separate ldc 2 100 skge 1 2 // less than 100 needs extra space to line up outs 0 space ldc 2 10 skge 1 2 // less than 10 needs another extra space outs 0 space outn 1 // print the row x column value ldm 1 column ldc 2 1 add 1 2 stm 1 column // column += 1 jump 0 column_loop label end_of_row line 0 ldm 1 row ldc 2 1 add 1 2 stm 1 row // row += 1 jump 0 row_loop label end halt string space _ // just to make the space visible ADDED --------- You do not have to do this, but it is quite easy and will produce a more satisfying product nothing so far allows anything like a function to be called. All we need is four more instructions and another memory area. memF is an array of ints that will act as a stack pos_memF an int, initially 0, that will act as the stack pointer instructions: call r n memF[pos_memF] = pos_memP; pos_memF += 1; pos_memP = n; ret r pos_memF -= 1; pos_memP = memF[pos_memF]; remember that pos_memP says which instruction is to be executed next so the call r n instruction (it ignores r) pushes pos_memP onto the stack then behaves like jump n. n would be the label for a function at the end of a function (or anywhere a return is wanted) just put a ret r instruction (it also ignores r). That pops pos_memP from the stack, making ther next instruction to be executed be the one after the call instruction. We have jumped back to where we came from. all variables are global variables here, so we can't be very sophisticated, but it can still work. here's an example function that sets the variable m to the smallest out of a and b, it is preceeded by a demonstration call to it' data a 0 data b 0 data m 0 ldc 1 99 stm 1 a ldc 1 37 stm 1 b call 0 min ldm 1 m outn 1 line halt that little bit just gives a and b initial values of 99 and 37 then calls the min function, then it just prints m so we can see that it is indeed 37 the smallest. label min ldm 1 a ldm 2 b stm 2 m skle 1 2 stm 1 m ret 0 it gets a and b into registers 1 and 2 ready to be compared, but simplifies things by setting m equal to b. If b does turn out to be the smallest, the job is done so we skip the next instruction to reach the ret. If a is the smallest, the skip won't happen and m is overwritten with a it is fairly easy to get the effect of local variables and parameters too. That is why there are four new instructions not just two: push r memF[pos_memF] = regs[r]; pos_memF += 1; pop r pos_memF -= 1; regs[r] = memF[pos_memF]; these also push and pop things on and off the new stack, but this time it is just a register, not a program code position. They allow us to use the stack to save values for later. Every time there is about to be a function call with parameters, push the current values of the parameter variables. Then those can safely be overwritten with the new values the parameters should have. Immediately after a function call just pop the same variables (but in reverse order) and everything is back the way it was. While the function was running it could use the parameter variables freely, knowing that the later pop will undo all changes, just like local variables. Here is an example, the well known recursive function for printing the digits of an int separately, normally written as void digits(int a) { if (a > 9) digits(a / 10); cout << " "; cout << a % 10; } first the "main program", it calls digits(3579864) be first saving the one parameter a on the stack, then setting a to 3579864 and calling digits. When the call is complete it recovers the original value of a from the stack so that everything else can continue to work with the unchanged original value. data a 0 ldm a push 1 ldc 1 3579864 stm 1 a call 0 digits pop 1 stm 1 a line halt now the digits function. It knows it can do whatever it wants with a because the caller will have saved it. First see if a > 9, and if it isn't, jump over the recursive call. The recursive call is just like any other, it has to save the current value of a on the stack before computing the new value, so ldm and push. Then it can safely do the division by 10, update the parameter a, and make the recursive call. When the call returns restore a as before. A space is printed just so that we can see the number really has been split into its individual digits. The next part, done regardless of whether a recursive call happend or not, is just calculating a % 10 and printing it. label digits ldc 2 9 skgt 1 2 jump 0 recn_done ldm 1 a push 1 ldc 2 10 div 1 2 stm 1 a call 0 digits pop 1 stm 1 a outs 0 space label recn_done ldm 1 a ldc 2 10 mod 1 2 outn 1 ret 0 string space _ remember that you are not required to do this additional part, but the results will almost certainly be more personally satisfying if you do. END OF ADDITION ---------