| Term | Definition |
|---|---|
| Identifier | A programmer-defined name, not limited to variable names, but also including function names, data-structure names, and other things. |
| Address | Memory is considered to be a huge array of identical storage locations. The address is just a number used to identify a particular one. |
| Attribute | (of an identifier) one of the pieces of information that may be known about an identifier, including what kind of identifier it is (variable, function name, etc.), what its type is (int, float, void->int, etc.), its value, its address in memory, etc. |
| Static | (referring to an attribute) one that does not change, and can be known by the compiler. In most languages, this describes type and address. |
| Dynamic | (referring to an attribute) one that can change, and can only be known when the program is running. In most cases, this only covers value. |
1- int iii,jjj;
2-
3- void fff(void)
4- { int xxx,yyy;
5- ...
6- iii=jjj*72;
7- xxx=yyy+1;
8- ... }
9-
10- int mmm;
11-
12- int ggg(void)
13- { float iii;
14- ...
15- iii=jjj+1;
16- ... }
Initially, the symbol table is empty. On reaching line 1, it is noted that iii and jjj
are now known, and that they are both ints. A primitive compiler might also decide what addresses
in memory they will occupy. A modern compiler recognises that this file might
be just part of a larger program that will be linked together later. Other parts of the program
(that will be, or have been, separately compiled) will also create an unknown number of their own
global variables, so it is not possible to know what memory addresses will still be available.
So, in summary, the compiler produces an object file, which contains executable code for the program,
but with any references to global things (variables or functions) appearing by name instead of in their
executable binary form. The linker later works out the correct addresses and substitutes them in. This
results in a lot of extra work, but solves the problem of not being able to know addresses of global
variables, and also makes globals truly global, so that they can be accessed even from other files
that form part of the same program.
Getting back to the example program, and making up a simple assembly language, we might expect the function
fff to be translated to assembly code thus:
1- iii: bytes 4
jjj: bytes 4
3- fff:
4- ...
5- ...
6- MUL jjj, #72, iii
7- ADD yyy, #1, xxx
8- ...
RET
If we imagine that the machine code translations of the assembly code mnemonics are (in hexadecimal)
MUL=3A, ADD=3B, RET=44, one byte constant operand=F7, memory address operand=FF, and note that
72 in hexadecimal is 48, the object file
produced would contain the following mix of executable code and special notations:
1- <"iii" is here>, 00, 00, 00, 00,
<"jjj" is here>, 00, 00, 00, 00
3- <"fff" is here>,
4- ...
5- ...
6- 3A, FF, <address of "jjj">, F7, 48, FF, <address of "iii">
7- 3B, ......., F7, 01, .......
8- ...
44
The text enclosed in pointy brackets <...> represents notations left in the object file
for processing by the linker. The linker can work out what all the addresses really will be,
and substitute in exact numbers as requested. The large numbers of dots on line 7 represent
the fact that we haven't yet seen how to deal with local variables.
After the compiler has processed lines 1 to 3, its object file so far contains:
<"iii" is here>, 00, 00, 00, 00,
<"jjj" is here>, 00, 00, 00, 00
<"fff" is here>
and its symbol table will contain three pieces of information:
iii int globalvar
jjj int globalvar
fff void->void function
which is all it needs to know about those three identifiers, given that it will be referring to
them by name.
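At the beginning of the function, on line 4, the compiler generates three instructions: one to enlarge the stack, one to save the old FP, and one to make FP point to the new stack frame:
4- SUB #12, SP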
MOV FP, 0(SP)
MOV SP, FP
The second instruction is saving the frame pointer's value on the stack. The notation 0(SP) [the 0
may be any number, the SP may be any register name] indicates a special form of addressing understood
by virtually all CPUs. It means: take the contents of the register SP (this will be the address of the
lowest byte of the stack), add 0 to it (no effect), and use the result as an address (or pointer). So
the frame pointer is stored wherever the stack pointer points. The third instruction copies the contents of
the stack pointer into the frame pointer (so now FP points to the place where the old FP was saved).
At the end of any function, there should also be three instructions: one to reduce the stack to its original size, one to recover the old FP, and one to jump back to the caller:
8- ADD #12, SP
MOV 0(FP), FP
RET
Also at line 4, the compiler adds to the symbol table information about the two new variables, so the
symbol table now contains:
iii int globalvar
jjj int globalvar
fff void->void function
xxx int localvar offset=4
yyy int localvar offset=8
The new entries contain nothing unusual, except for the offset=N part. The stack has been enlarged by
12 bytes, to make room for the saved FP and these two new variables. Each of those three items probably
requires 4 bytes. As the saved FP occupies the very beginning of the new stack portion (or Stack Frame),
the first new variable will be offset by 4 bytes from the beginning of the frame, and the second by 8.
"Offset=4" means that xxx is stored 4 bytes from the beginning of the current stack frame.
In summary, the function:
3- void fff(void)
4- { int xxx,yyy;
5- ...
6- iii=jjj*72;
7- xxx=yyy+1;
8- ... }
produces the assembly/object code:
3- <"fff" is here>
4- SUB #12, SP
MOV FP, 0(SP)
MOV SP, FP
5- ...
6- MUL <address of "jjj">, #72, <address of "iii">
7- ADD 8(FP), #1, 4(FP)
8- ...
ADD #12, SP
MOV 0(FP), FP
RET
After processing the end of the function, the compiler removes from the symbol table any entries
describing its local variables. It then continues to process the rest of the program in the normal way.
By the end of the file, the symbol table contains:
iii int globalvar
jjj int globalvar
fff void->void function
mmm int globalvar
ggg void->int function
1- int a;
2-
3- void e(void)
4- { ... }
5-
6- void f(void)
7- { int b;
8- static int c;
9- ...
10- ... }
11-
12- int d;
13- ...
The variable a has as its scope the entire program from line 1 onwards:
its declaration is not inside any block, so its scope stretches from its point
of declaration to the end of the program. Its extent is the whole run of the program;
it is allocated memory when the program starts, and keeps it until it ends.
This brings us to an important and deceptively non-trivial question: What is a variable?
Obviously a variable is not just a name that may be used in an assignment.
Nor can it be a memory location: variables can cease to exist, but memory locations
are permanent. The only workable answer is that a variable is a two-part object, consisting
of the association between a declaration and a memory location.
If a memory location that was used for a dynamic variable gets recycled after the end
of that variable's extent, and is reused as the location of another variable, it is still
very definitely a different variable. Similarly, if a function is called recursively,
many versions of its dynamic variables are brought into existence. Each was created from
the same declaration, but occupies a different memory location. Again, they are all clearly
different variables.
We may also ask "how global is a global variable?". If a program is split into a number
of files, each compiled separately, how do the global variables behave?
Fortran, Pascal, and Algol have no standardised support for separate compilation,
so the question is left up to the designers of individual compilers (if it is answered
at all). In C, the answer hinges on a second, unrelated meaning of the keyword static.
As in other languages, all global
(i.e. top-level) variables in C have static extent.
Programmers do not put the keyword static in front of a top-level declaration
to indicate static extent; it is automatic. However, it is possible to put the keyword
static in front of a global variable declaration, in which case it means
something else altogether. A normal top-level variable declaration, such as
int i;, not inside any block, produces a global static variable whose
declaration is "exported to the linker". This means that it can be accessed from
other separately compiled units of the same program when they are linked together.
Adding the word static, producing static int i; still produces a
global static variable, but prevents its name from being exported to the linker,
so it will be a different variable from any other global integer called i
in other separately compiled program parts.
Language Specifics
C: Declarations that appear inside a block (between { and })
have as their scope the region between their declaration and the }
that terminates that block. Declarations
that are not inside any block (i.e. Top-Level declarations) have as their scope the
whole program from their declaration point onwards. All globals have static extent.
Locals have dynamic extent (limited to the lifetime of the block they were declared in)
unless the declaration begins with static.
Pascal: Declarations that appear inside a block (between begin and
end) have as their scope the region between their
declaration and the end that terminates that block. Top-level declarations
(not inside any block) have as their scope the
whole program from their declaration point onwards. All globals have static extent.
All locals have dynamic extent (limited to the lifetime of the block they were declared in).
Algol 60: All declarations appear inside a block. An Algol program is just one big
block. All declarations have as their scope the entire block that they are declared inside
(not just from their declaration point on). The extent of any variable is the lifetime
(period of time between entry and exit) of its block; if a variable's block is the
whole program, then it is effectively static, although technically, all variables
are dynamic in extent. The one exception is that if a declaration is preceded by the keyword
own, the variables it creates have true static extent.
Fortran: All variables have as their scope the entire program unit in which they
appear. Program units are whole subroutines, whole functions, or the main program. There
is no such thing as a global variable (although they may be simulated with "common blocks").
All variables have static extent.