Virtual machine

4 separate areas of memory
    memP: program code
    memD: data/variables
    regs: (registers) immediate access short term data, 16 of them
    memS: strings

instructions have three parts
  op:  the action to be performed
  reg: register to use
  n:   numeric value

user sees human readable versions, but they are translated to ints
before execution

instructions:

    ldc  r n     regs[r] set to n              - load constant
    ldm  r n     regs[r] set to memD[n]        - load from memory
    ldi  r n     regs[r] set to memD[regs[n]]  - load indirect (follow pointer)
    stm  r n     memD[n] set to regs[n]        - store in memory
    sti  r n     memD[regs[n]] set to regs[r]  - store indirect

    add  r n     regs[r] set to regs[r] + regs[n]
    sub  r n     regs[r] set to regs[r] - regs[n]
    mul  r n     regs[r] set to regs[r] * regs[n]
    div  r n     regs[r] set to regs[r] / regs[n]
    mod  r n     regs[r] set to regs[r] % regs[n]

    skeq r n     skip over next instruction if regs[r] == regs[n]
    skne r n     skip over next instruction if regs[r] != regs[n]
    sklt r n     skip over next instruction if regs[r] <  regs[n]
    skle r n     skip over next instruction if regs[r] <= regs[n]
    skgt r n     skip over next instruction if regs[r] >  regs[n]
    skge r n     skip over next instruction if regs[r] >= regs[n]

    jump r n     jump so that evecution continues with memP[n]
                 r is ignored but must be there to avoid ambiguity

    inn  r       wait for user to type an int, regs[r] set to it
    outn r       print regs[r]
    outs r n     print the string from memS[n]
                 r is ignored but must be there to avoid ambiguity
    line r       print \n
                 r is ignored but left there for simplicity in processing

    halt r n     stop running, display n as exit code

some instructions don't use the n value, so it is OK for it not to be there.
some instructions don't use r or n, so both may be absent.
in an instruction that uses n but not r, the unused r must still be typed
  otherwise it becomes ambiguous: in "outs 3" is the 3 r or n?

each line of program code can begin with:

  a space, in which case the whole line is taken to be an instruction

  # for a comment, just ignore the whole line
    blank lines are also totally ignored

  the word label, in which case the next thing on the line is a name
    that will be used by a skip or jump instruction

  the word data, in which case the next thing on the line is a name
    that will be used to refer to some memD[i], and the name will be
    followed by an initial value (int only)

  the word string, in which case it is followed by a number to give
    the string's position in memS. The rest of the line is the string
    to be stored in memS.

first pass:

  read file, counting instructions and doing the actions required for
  label, data, and string.

second pass:

  re-read the file, this time ignoring label, data, and string, just
  putting translated instructions into memP. The reason for two passes
  is the this kind of programming requires referring things before they
  have been defined. The first pass through the file just works out where
  all named things will be, then the second pass can convert names into
  numbers because it worked out where they are in the first pass
  
Sample programs first. If you want to use these, you'll need to edit out
the little comments that say what is happening at the ends of the lines.

  Add numbers between 1 and 10

    # reg 1 for total, reg 2 for current position

      ldc  1 0                  reg 1 = 0
      ldc  2 1                  reg 2 = 1
    label a
      ldc  3 10
      skle 2 3                  skip if reg 2 <= 10 
      jump 0 end
      add  1 2                  reg 1 += reg 1
      ldc  3 1                  reg 3 = 1
      add  2 3                  reg1 += 1
      jump 0 a
    label end
      outs 0 s1
      outn 1                    print reg 1 the total
      line 0
      halt 0 0
     
    string s1 the total is

  more likely, use named variables:

    data total 0
    data i 1

    label a
      ldm  1 i                  reg 1 = i
      ldc  2 10                 reg 2 = 10
      skle 1 2                  skip if i <= 10 
      jump 0 end                   therefore jump to end if i > 10
      ldm  2 total              reg 2 = total
      ldm  1 i                  reg 1 = i
      add  2 1                  reg 2 = total + i
      stm  2 total              total = total + i
      ldc  3 1                  reg 3 = 1
      add  1 3                  reg 1 (still i) += 1
      stm  1 i                  i == 1
      jump 0 a
    label end
      outs s1
      ldm  1 total
      outn 1                     print the total
      line 0
      halt 0 0

The translation process (passes 1 and 2)

struct instr { int op, reg, n; };

const int max_memP = 1000
instr memP[max_memP];
const int max_memD = 1000
int memD[1000];
int regs[16];
const int max_memS = 100
string memS[max_memS]

int pos_memP = 0, pos_memD = 0, pos_memS = 0;

struct valpair { string name; int val }
vector<valpair> labels, strings, varnames;

first pass

  read each line
    instruction
      pos_memP += 1
      but otherwise ignore it

    label sss
      check "sss" not already used
      add valpair("sss", pos_memP) to labels

    string abc this is it
      check "abc" not already used
      add ("abc", pos_memS) to strings
      memS[pos_memS] = "this is it"
      pos_memS += 1

    data name 99
      check "name" not already used
      add ("name", pos_memD) to varnames
      memD[pos_memD] = 99
      pos_memD += 1

second pass

  to start the input file anew, no need to close and re-open it, 
  just fi.clear(); then fi.tellg(0); to go back to the beginning of file fi.

  pos_memP = 0

  label or string or data
    ignore

  instruction
    get 3 parts op string, reg num, optional N as string
    p = position of op in array ops
    if N is numeric, convert to int intN
    if X is string
      find it in one of the valpair arrays
      set intN to its associated int value
    create instr struct with (p, reg, intN)
    memP[pos_memP] = that struct
    pos_memP += 1

string ops[] = { "ldc", "ldm", ...., halt" }
const int i_ldc = 0, i_ldm = 1, i_ldi = 2, ....

ADDED ---------
  I wnated to reduce typeing, butthose two lines above are too likely to cause difficult 
  errors, this way is safer:

  struct valpair      // as seen in class
  { int v;
    string s; };

  const int i_ldc = 0, i_ldm = 1, i_ldi = 2, i_halt = 3;   // only 4 instructions just to reduce typing

  // this way it will be blindingly obvious if a string has the wrong value associated with it.

  valpair ops[] = { { i_ldc, "ldc" }, { i_ldm, "ldm" }, { i_ldi, "ldi" }, { i_halt, "halt" } };
  const int num_ops = sizeof(vps) / sizeof(vps[0]);

  to convert instruction name (string q) to int representation
    loop with i from 0 so long as i < num_ops
      if ops[i].s == q
        // ops[i].v is the numeric version
    if not found by end of loop, error
END OF ADDITION ---------

to run program

  pos_memP = 0
  while (true)
    instr i = memP[pos_memP]
    pos_memP += 1

    if i.op == i_ldc
      regs[i.reg] = i.n
    else if i.op == i_ldm
      regs[i.reg] = memD[i.n]
    else if i.op == i_skgt
    { if regs[i.reg] > regs[i.n]
        pos_memP += 1 }
    ...
       etc etc etc for the other instructions
    ...
    else if i.op == i_halt
    { cout << "\nexitted with code " << i.n << "\n";
      break }


how to process program lines easily:    
        
  #include <sstream>

  ifstream fi(....
  string line;
  while (...)
  { 
    getline(fi, line);
    if fi.fail()
      break                                     // its the end of the file
    if line.length() == 0 || line[0] == '#'
      continue                                  // its a comment
    bool its_an_instruction = line[0] == ' ';

    istringstream si(line);

    in loop
      string s;
      si >> s;
      if (si.fail()) at end of line

    for string abd hello there
      second si >> s sets s to "abc"
      getline(si, s) sets s to rest of line


Another sample program, printing a 12 x 12 multiplication table neatly
   I am using // to make comments fit on the same line as an instruction
   This is not a requirement, but fairly easy to do

  data row 1
  data column 1

  label row_loop
    ldm  1 row
    ldc  2 12
    skle 1 2
    jump 0 end          // row > 12
    ldc  1 1
    stm  1 column       // column = 1
  label column_loop
    ldm  1 column
    ldc  2 12
    skle 1 2
    jump 0 end_of_row   // column > 12
    ldm  1 row
    ldm  2 column
    mul  1 2            // reg 2 = row x column
    outs 0 space        // keep the numbers separate
    ldc  2 100
    skge 1 2            // less than 100 needs extra space to line up
    outs 0 space
    ldc  2 10
    skge 1 2            // less than 10 needs another extra space
    outs 0 space
    outn 1              // print the row x column value
    ldm  1 column
    ldc  2 1
    add  1 2
    stm  1 column       // column += 1
    jump 0 column_loop
  label end_of_row
    line 0
    ldm  1 row
    ldc  2 1
    add  1 2
    stm  1 row          // row += 1
    jump 0 row_loop
  label end
    halt

  string space _        // just to make the space visible

ADDED ---------
  You do not have to do this, but it is quite easy and will produce a more
  satisfying product

  nothing so far allows anything like a function to be called. All we need
  is four more instructions and another memory area.

  memF      is an array of ints that will act as a stack
  pos_memF  an int, initially 0, that will act as the stack pointer

  instructions:
    call  r n     memF[pos_memF] = pos_memP; pos_memF += 1; pos_memP = n;
    ret   r       pos_memF -= 1; pos_memP = memF[pos_memF];

  remember that pos_memP says which instruction is to be executed next
  so the call r n instruction (it ignores r) pushes pos_memP onto the stack
  then behaves like jump n. n would be the label for a function

  at the end of a function (or anywhere a return is wanted) just put a ret r
  instruction (it also ignores r). That pops pos_memP from the stack, making
  ther next instruction to be executed be the one after the call instruction.
  We have jumped back to where we came from.

  all variables are global variables here, so we can't be very sophisticated,
  but it can still work. here's an example function that sets the variable
  m to the smallest out of a and b, it is preceeded by a demonstration call
  to it'

    data a 0
    data b 0
    data m 0

      ldc  1 99
      stm  1 a
      ldc  1 37
      stm  1 b
      call 0 min
      ldm  1 m
      outn 1
      line
      halt

                 that little bit just gives a and b initial values of 99 and 37
                 then calls the min function, then it just prints m so we can
                 see that it is indeed 37 the smallest.  

    label min
      ldm  1 a
      ldm  2 b
      stm  2 m
      skle 1 2
      stm  1 m
      ret  0
                 it gets a and b into registers 1 and 2 ready to be compared,
                 but simplifies things by setting m equal to b. If b does turn
                 out to be the smallest, the job is done so we skip the next
                 instruction to reach the ret. If a is the smallest, the skip
                 won't happen and m is overwritten with a

  it is fairly easy to get the effect of local variables and parameters too. That
  is why there are four new instructions not just two:
    push  r       memF[pos_memF] = regs[r]; pos_memF += 1;
    pop   r       pos_memF -= 1; regs[r] = memF[pos_memF];

  these also push and pop things on and off the new stack, but this time it is just
  a register, not a program code position. They allow us to use the stack to save
  values for later.

  Every time there is about to be a function call with parameters, push the current
  values of the parameter variables. Then those can safely be overwritten with the
  new values the parameters should have. Immediately after a function call just pop
  the same variables (but in reverse order) and everything is back the way it was.
  While the function was running it could use the parameter variables freely, knowing
  that the later pop will undo all changes, just like local variables.

  Here is an example, the well known recursive function for printing the digits of
  an int separately, normally written as

          void digits(int a)
          { if (a > 9)
              digits(a / 10);
            cout << " ";
            cout << a % 10; }

  first the "main program", it calls digits(3579864) be first saving the one parameter
  a on the stack, then setting a to 3579864 and calling digits. When the call is complete
  it recovers the original value of a from the stack so that everything else can continue
  to work with the unchanged original value.

    data a 0

      ldm  a
      push 1
      ldc  1 3579864
      stm  1 a
      call 0 digits
      pop  1
      stm  1 a
      line
      halt

  now the digits function. It knows it can do whatever it wants with a because the caller
  will have saved it. First see if a > 9, and if it isn't, jump over the recursive call.
  The recursive call is just like any other, it has to save the current value of a on the
  stack before computing the new value, so ldm and push. Then it can safely do the division
  by 10, update the parameter a, and make the recursive call. When the call returns restore
  a as before. A space is printed just so that we can see the number really has been split
  into its individual digits. The next part, done regardless of whether a recursive call
  happend or not, is just calculating a % 10 and printing it.

    label digits
      ldc  2 9
      skgt 1 2
      jump 0 recn_done
      ldm  1 a
      push 1
      ldc  2 10
      div  1 2
      stm  1 a
      call 0 digits
      pop  1
      stm  1 a
      outs 0 space
    label recn_done
      ldm  1 a
      ldc  2 10
      mod  1 2
      outn 1
      ret  0

    string space _

  remember that you are not required to do this additional part, but the results will almost
  certainly be more personally satisfying if you do.
END OF ADDITION ---------