Using Unix

(first edition, 31-1-99)
(updated 1-11-05)

This note describes the most important commands, options, and little tricks involved in using a unix system. It only scratches the surface, and is only aimed at beginning users. It is no substitute for a proper guide book, and some quite good books are not appallingly expensive.
       For information on various aspects of programming under unix, look here.


Logging In and Logging Out

The most important things to know are how to log in, how to get help, and how to log out. Fortunately, two of those are easy.
        As far as logging in is concerned, there is nothing to know: you don't get any options. You must enter your username and password, and then you are logged in.
        Today, you are very unlikely to come across a terminal directly connected to the computer you need to use; you will probably use a PC to connect over the internet. "Telnet" programs provide this functionality (how to get a telnet program). Make sure you set your telnet program to use SSL or SSH (secure sockets layer or secure shell) before logging in.
        BEWARE that people occasionally try little tricks to find out your password. A popular trick is to write a program that produces output that looks just like the normal system login procedure, and leave that program running on a computer in a shared lab. The victim walks up, sees a free terminal, and tries to log in. The program asks for the username and password in the usual way and records them secretly, then pretends that something has gone wrong, so that the victim won't be suspicious when he, she, or it finds they are not really logged in at all. The best defense against this trick is not to log in through an already running terminal program. Restart it for yourself. If you are using a "dumb terminal" or something else that can't be restarted, type control-C or control-Z first (making sure it reacts as it normally does), or deliberately give an incorrect password before logging in for real.
        ALSO be really careful to make sure you log out properly before leaving. Password protection isn't much use if you just leave a logged in session for someone to play with. Some people seem to feel obliged to do some mischief if they find an abandoned session, just to "teach you a lesson". When you log out, make sure the terminal program has exited, or is at least demanding a new username. Hanging up a modem should also be enough.

        Logging out is also very easy. In fact, it is too easy. Users often log themselves out accidentally at inopportune moments by mistyping something. The official logout command is "logout". People often use "exit" to log out, because it is a little easier to type. Unfortunately exit has other important uses too. In particular, exit is the command for terminating shells (more about shells later). It is often useful to create new temporary shells, and when they are no longer needed, type exit to get rid of them. Sadly, if you create three shells, but type exit four times, the extra exit will log you out. Additionally, in many circumstances a single keypress, control-D, is enough to log you out, and that is really irritating because control-D is also something that you have to type for some other purposes. Get into the habit of always typing logout to log out, and only using exit to terminate a subshell. There are in fact many other ways to log out, but all of them are really just tricks.


Getting Help

One of the most important commands in any unix system is man (short for MANual). It is the interface to an extensive on-line documentation system, which should have an entry for every command that you can type, and every function that you can use in a program. To use it, just type man followed by the name of the command or function. The trouble is, you have to know the name of the command before you can ask for help on it.
        This is the kind of output that man provides. Pretending that I have forgotten how to use the sed command, I entered man sed. The complete output is quite long; this is just the beginning:
SED(1)             FreeBSD General Commands Manual             SED(1)

NAME
     sed - stream editor

SYNOPSIS
     sed [-an] command [file ...]
     sed [-an] [-e command] [-f command_file] [file ...]

DESCRIPTION
     The sed utility reads the specified files, or the standard input if no
     files are specified, modifying the input as specified by a list of com-
     mands.  The input is then written to the standard output.
Although man often says things in a way that defies comprehension, the essential parts can usually be decrypted with a little effort. The first section is a nice clear one line description: "sed - stream editor". It doesn't say what a stream is, or why you might want to edit one, but if you didn't already know that, you wouldn't be looking up sed, whose whole raison d'être is stream editing.
        The next section is also fairly clear once you know what to expect. Each line describes a different way of using the sed command. With the first, after the command "sed", you can specify a combination of options: "-a", "-n", or "-an". After that, you must provide a command, and finally you may provide a list of file names. Just what the options mean, and what the command looks like, will be described later in the text.
        The general pattern is that anything in square [brackets] is allowed but not required. Command options, which slightly modify the behaviour of a command, are represented by single letters and preceded by a dash (specifying "-an" usually means you want both option a and option n, not just a single option called "an"). The word "file" refers to any file name, and "..." means that the previous thing may be repeated any number of times.
        So the second version of the command consists of the word "sed" followed by the same "-a", "-n", or "-an" options as before. After that if you want to provide a command, you put "-e" in front of it. Then, if you want a command-file (whatever that may be), you put "-f" in front of it. Then, as before, you may provide a list of file names.
        
        Right at the end of the very extensive output that "man sed" provides, which really is hard to read (not just for beginners) but does tell you almost everything if you can read it, we see:
SEE ALSO
     awk(1),  ed(1),  grep(1),  regex(3),  re_format(7)

HISTORY
     A sed command appeared in Version 7 AT&T UNIX.

STANDARDS
     The sed function is expected to be a superset of the IEEE Std1003.2
     (``POSIX.2'') specification.
History and Standards are not very useful, but "SEE ALSO" tells you some other man entries that might be worth looking at. They will be for other commands that have somewhat similar functions. If you decide that "regex" sounds interesting, the way to follow the reference is by typing the command man 3 regex. The 3 in parentheses gives the section of the on-line manual, which has to be given before the command name when you use man.


Passwords and Usernames

Whatever password you are given when your account is set up, change it the first time you log in. People do occasionally find out other people's passwords. The only defense is to change your password reasonably frequently, and definitely every time you think someone was watching a little too closely when you logged in. Unix provides absolutely no protection against accidentally or maliciously deleted files, so anyone who knows your password can do real and irreversible damage. Only you can protect your password. Trust nobody with your password. Anybody who has any business "doing official system things" will not need your password to do them. Even best friends, just as a joke, have been known to do things that don't seem very funny when you're approaching a deadline.
        Changing a password is easy. Just type the command passwd (note that it is "passwd", not "password". Unix is like that). The system will first ask for your old password, just to ensure that it really is you making the change. It will then ask you twice to type the new password. Whenever you type your password, your terminal's echo is automatically turned off (that means that as you type the characters, they do not appear on the screen). If ever your password does appear as you type it, stop immediately because that means you are not using the real password facility. Most systems do not accept new passwords that they consider to be easily guessable; in general, you will not be able to select a password that consists of all capital or all lower case letters, or all digits.
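        A typical password change looks something like this (here "rabbit>" stands for whatever prompt your shell gives you; the exact wording of the messages varies from system to system, and nothing you type at the password prompts is echoed to the screen):
rabbit> passwd
Changing password for jsmith.
Old password:
New password:
Retype new password:
rabbit>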
        If you forget your password, nobody can tell you what it is unless they have been spying on you. Even the system manager can't tell what your password is. They are stored in an encrypted form, and there is no known decryption method. In this case, the only thing you can do is ask the system manager to reset your password to some new value.

        
        Your username (the thing that you type in response to login: when you are logging in) is usually something made from your real name in a reasonable way, so people very rarely forget their usernames. If you want to find out somebody else's username, or if you have forgotten your own, but have access to someone who is able to log in, ask them to finger you. (OK, snigger now, and get it out of your system). To do that, they type the command finger followed by your real name (first, last, or both). The system will print out a brief description of every known user who has any of the specified names. One of them will probably be you. Example:
finger jellybean smith
Login: jsmith                           Name: Jellybean Smith
Directory: /home/spring99/jsmith        Shell: /usr/local/bin/tcsh
Last login Thu Jan 28 15:36 (EST) on ttyp4 from mars
Plan:
  To have a better plan than this.
This says that there is a user named Jellybean Smith, and her username is "jsmith". It even tells you her home directory, when she last logged in, and her "plan". A plan is a simple piece of text provided by the user herself. If you want a plan, just create a file called .plan (note the dot at the beginning) and write whatever you want in it. Everyone who ever fingers you will see it.
        Unfortunately, some users create very long .plan files, which make the output of finger impossible to read. If this happens, use the -p option: finger -p jellybean smith, which suppresses the printing of .plan files.
        You can find out who is logged in to the system at any time by typing one of the commands w or who. They are very similar but slightly different; try them both and see, they are harmless. Both w and who list the usernames of the logged-in people; to find their real names, you can use finger again. Finger works both ways: give it a real name and it will provide the matching username; give it a username and it will provide the matching real name.
        It is generally believed that your username is what the system uses to identify you. w, who and finger list usernames, the command ls -l (which lists the owners of files) uses usernames, and so does just about everything else. The truth is different. The system actually identifies you by a single number, called your UID ("User IDentification"). If two people have the same UID, then as far as the system is concerned, they are the same person, even if they seem to have different usernames. There should be a simple one-to-one correspondence between UIDs and usernames, but sometimes it goes wrong (unix is like that). It is possible to have a username with two UIDs, or two usernames with the same UID, or even a UID without any username at all. Always it is the UID that really indicates a person. A file stores the UID of its owner, not their name. Usernames are just irrelevant decorations added by the system only when something is expected to be read by real humans.
         There is one master file that contains the correspondences between usernames and UIDs. It is traditionally called the "password file" (because in many systems encrypted passwords are also stored in it), and its name is /etc/passwd. Anyone can look at it. If you look at the password file (use an editor, or say more /etc/passwd) you will see that it has one line for each user. If you scan through it, you should be able to find your own entry:
jsmith:*:2013:1001:Jellybean Smith:/home/spring99/jsmith:/usr/local/bin/tcsh
This entry is for the user with UID 2013. The username is "jsmith" and the real name is "Jellybean Smith". The "*" means that the encrypted password has been hidden elsewhere for extra security. 1001 is her GID (Group IDentifier). The other two entries are her home directory and shell. This is exactly the information listed by finger.
        Group identifiers (GIDs) are like UIDs but much less important. A group is just a collection of users with similar needs (so although only one user has UID=2013, hundreds may have GID=1001). Groups are not used very frequently, but (for example) you can set a file's protection to allow all people from the same group as you to use it, but nobody else.


The Simplest Commands

Always remember that unix is case-sensitive. All commands are lower-case only. Typing cat will run the cat command; typing CAT won't. The same applies to options (ls -r is not the same as ls -R), and even to files. You can have a file called mine, another file called Mine, and another file called MINE, and they are all completely different and unrelated.
        Who am I? Where am I? The command whoami tells you who you are (i.e. the username you logged in with). If you need to use it you should probably go home for a while and rest. The command "hostname" tells you which computer you are logged in to, in its full internet form (e.g. rabbit.eng.miami.edu). The command "pwd" (Print Working Directory) tells you which directory you are in at the moment (showing its full path). This is more useful than it sounds: many users keep their work filed away in a lot of subdirectories, and it is easy to get lost after a lot of "cd" commands.
        Moving About in Directories is achieved by the "cd" command, which is essentially the same as on a PC (except that PCs use \ to separate directories, and unix uses /). If the current directory (the one named by "pwd") has a subdirectory called "fred", then the command "cd fred" will make that be your current directory. Subdirectories may be nested to (just about) any depth. ".." refers to the parent directory, so "cd fred" followed by "cd .." will usually have no net effect.
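        For example, a short session of moving around might look like this (the directory names are just an illustration, and "rabbit>" stands for whatever prompt your shell gives you):
rabbit> pwd
/home/spring99/jsmith
rabbit> cd fred
rabbit> pwd
/home/spring99/jsmith/fred
rabbit> cd ..
rabbit> pwd
/home/spring99/jsmith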
        Creating and Destroying Subdirectories: mkdir is the same as in dos or windows. "mkdir wawawa" makes a new subdirectory of the current directory, calls it wawawa, and leaves it there, empty and ready for use. "rmdir wawawa" would then delete that subdirectory. Try to use "rmdir" instead of "rm -d" to delete directories if you get the choice; it is a little bit safer. rmdir will not delete a directory until it is empty. Subdirectories are a GOOD THING: whenever you have a number of projects or assignments or just plain things to keep, create a subdirectory with a sensible, suggestive name for each one, and keep all the files for each project in the appropriate directory. That makes it much easier to find things, and gives you protection against accidental deletion. You would think that nobody would ever type "rm *" by accident, but I have seen it, more than once. "rm *" only pummels the current directory; to eviscerate neighbouring subdirectories as well, you would need to use the command "rm -R ../*", which is much harder to type by accident.
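        To see why rmdir is the safer choice, here is a small made-up session (the file name is just an illustration, and the exact wording of rmdir's complaint varies from system to system):
rabbit> mkdir wawawa
rabbit> cp one.c wawawa
rabbit> rmdir wawawa
rmdir: wawawa: Directory not empty
rabbit> rm wawawa/one.c
rabbit> rmdir wawawa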
        Listing a Directory The command "ls" will list the names of all the files in a directory. It has a large number of options; the important ones are: "ls -l" (l for long) lists the files one per line, with other information including owner, size, date last changed, and protection. "ls -a" (a for all) lists absolutely all the files in the directory, even the ones that are normally hidden. "ls -R" (R for recursive) lists all files in this directory, and all sub-directories. "ls -F" (F for file type) adds an extra character to the end of each file's name, to indicate what kind of file it is: "*" indicates executable, "/" indicates a directory, and "@" indicates a link. As always, multiple options can be combined: "ls -lR" enables both long and recursive options. You may also provide a list of filenames to restrict the listing. "ls -l p1.c p2.c" gives the detailed form of the listing for just the two files p1.c and p2.c.
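        The long listing looks something like this (the exact column layout varies a little from one unix to another, and the file names, sizes, and dates here are invented):
rabbit> ls -lF
drwxr-xr-x  2 jsmith  users    512 Jan 28 15:40 fred/
-rw-r--r--  1 jsmith  users   1204 Jan 28 15:36 p1.c
-rw-r--r--  1 jsmith  users    980 Jan 28 15:36 p2.c
-rwxr-xr-x  1 jsmith  users   8132 Jan 28 15:37 prog*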
        Seeing a File The command "cat filename" (peculiarly short for conCATenate) prints the entire contents of a file on your terminal. If it is too big to fit on the screen you will miss the beginning. The command "more filename" solves that problem: it lists the file in small sections. After filling the screen it waits for you to react. Press the space bar to see the next screen-full, or press Enter to move on by a single line. You can also tell more to jump to other places in the file, see "man more" for details. The command "tail filename" just shows the last few lines of a file.
        Deleting Files The "rm" command (short for ReMove) is used to delete files. It is very dangerous: it does not ask "are you sure?", and can not be reversed. If you delete a file you've lost it for ever. "rm *" deletes all the files in the current directory without a second thought. It is possible to set the protection of a file to prevent accidental deletion (described later). Options: "rm -i" (i for inquire) makes it ask for confirmation (y/n) before deleting each file. Normally rm refuses to delete directories, but "rm -d" will delete anything it can. "rm -f" (f for force) even deletes files that are protected against deletion (so long as the files are owned by you); you probably don't want to use this option. "rm -R" (R for recursive) deletes not only from the current directory but also from all subdirectories; if you use the -R option, the -d option is automatically turned on too. So, "rm -Rf *" will delete absolutely everything that can be reached from the current directory. BEWARE: The "wild card" * is not the same as it is on dos or windows: "rm -R *.tmp" does not mean delete all .tmp files in all subdirectories.
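        If you are nervous, rm -i lets you change your mind file by file; a small made-up example (the exact form of the question varies; answer y to delete, n to keep):
rabbit> rm -i *.c
remove p1.c? n
remove p2.c? y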
        Copying Files is done with the cp (cp for CoPy) command. "cp one.c two.c" makes a copy of one.c, and calls it two.c; if a file called two.c already existed it is overwritten and permanently lost. If "fred" is a subdirectory then "cp one.c fred" makes a copy of one.c, also called one.c, in the directory fred. BEWARE: Do NOT use the wild-card * with the cp command until you fully understand how unix handles wild-cards. The result is not what you would expect, and can be very destructive.
        Renaming Files is done with the mv (mv for MoVe) command. mv is primarily used to move a file from one place to another without making a new copy (making a copy creates a new file with a new date. Moving just moves the file, leaving its modification date alone). A mv is almost the same as a cp followed by a rm, but not quite. Renaming a file is simply a matter of moving it to the directory it is already in, but with a different name. The parameters to mv are the same as the parameters to cp. "mv one.c two.c" changes the name of one.c to two.c; if a file called two.c already existed it is overwritten and permanently lost. If "fred" is a subdirectory then "mv one.c fred" moves the file one.c into it. BEWARE: Do NOT use the wild-card * with the mv command until you fully understand how unix handles wild-cards. The result is not what you would expect, and can be very very destructive.
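        A small made-up session showing copying, renaming, and moving (fred is a subdirectory, as above):
rabbit> ls -F
fred/    one.c
rabbit> cp one.c two.c
rabbit> mv two.c backup.c
rabbit> mv backup.c fred
rabbit> ls fred
backup.c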
        Protecting Files is done with the chmod (chmod for CHange MODe) command. You must first work out a numeric representation for the level of protection you wish a file (or some files) to have, then enter the command "chmod NNN FFF", where NNN is the protection, and FFF is a file name, or a space-separated list of file names. It is not strictly necessary to express the protection in numeric form, but the alternative is too complicated to explain here.
        There are nine different kinds of access that one might want to allow or forbid, and each has its own numeric code:
                  400: Owner Read - The owner of the file (you) may see the contents of the file.
                  200: Owner Write - The owner may modify or overwrite the contents of the file.
                  100: Owner Execute - The file is executable code, and the owner may run it.
                  40: Group Read - Others in the same user-group as you may see the contents of the file.
                  20: Group Write - Others in the group may modify or overwrite the contents of the file.
                  10: Group Execute - The file is executable code, and others in the group may run it.
                  4: World Read - Everyone may see the contents of the file.
                  2: World Write - Everyone may modify or overwrite the contents of the file.
                  1: World Execute - The file is executable code, and everyone may run it.
Simply add together the numbers for the accesses you wish to allow (anywhere from none of them to all nine), and you've got the numeric version of the file's protection.
To make xxx.cpp usable only by you: "chmod 600 xxx.cpp".
To make xxx.cpp modifiable only by you and readable by everyone: "chmod 644 xxx.cpp".
The owner of a file is always allowed to change its protection, so if you have a file with very important contents, you may protect it against accidents with a mode of 400.
        Of course, a file can only be accessed at all if the directory containing it can be accessed. A simple way to protect everything is to put it in an unreadable subdirectory; if you want to make a file available to everyone, you have to make sure that the directory containing it is readable by everyone.
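        You can watch the effect of chmod with ls -l, which shows the protection as three groups of rwx letters (owner, group, world). A sketch, with an invented file name and details:
rabbit> ls -l secret.txt
-rw-r--r--  1 jsmith  users  220 Jan 28 15:50 secret.txt
rabbit> chmod 600 secret.txt
rabbit> ls -l secret.txt
-rw-------  1 jsmith  users  220 Jan 28 15:50 secret.txt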


Processes

Most users of computers never have to think about processes, but if you are using unix they can't really be avoided. A process is really just the collection of software bits and pieces required to support a running program. It is the memory your program occupies plus the files it is using plus the connection to your terminal plus a record of who you are and what share of the computer's resources you are allowed to use, and a few other things.
        With a serious, heavy-duty operating system, a process is created as soon as you log in, and stays around until you log out. Every time you run a program it is run under that same process. A program can leave information with the controlling process for the next program to make use of. Occasionally a user may create an extra process to carry out some background tasks, but usually one process is enough for anyone.
        Unix is different. Unix spends processes like they are going out of fashion. A unix process can only run one program. As soon as it is created, it must have a program to start running, and as soon as the program terminates, the process is destroyed ("deleted" is the normal term). When you log in, a new process is created, and made to run a "shell" program. It is the shell program that primarily interacts with you: it prints out the "rabbit>" prompt and waits for you to type a command; it analyses the command and decides what to do with it. If ever this shell program terminates, your process dies, and you are automatically logged out. logout and exit are simply commands that tell a shell to terminate.
        Unfortunately, the unix shells don't know how to do very much. They can obey simple commands like exit, and echo (which just echoes back its own parameters), but they don't know how to list files or compile or edit programs. Whenever you type anything but the simplest of commands, your shell just delegates responsibility. It creates a new process, tells it to run the program which will obey the command you entered, and just sits back to wait for the results. Even when you type a simple little command like "ls", a totally new process is created, and made to run a program called ls from one of the system directories. Every command (except for the trivial ones that the shell can handle) is actually executed by a program (with the same name as the command) from one of the system directories.
You can see direct evidence of this if you type the ps command (ps for Process Status). Typically, if you type ps, you see something like this:
  PID  TT  STAT      TIME COMMAND
 2443  p3  Ss     0:00.05 -tcsh (tcsh)
 2501  p3  R+     0:00.00 ps
Each line of output describes one independent process belonging to you. The first column gives the process' identification number or PID; this is a unique identifier: no two processes alive at the same time can have the same PID. The second column gives the name of the terminal that the user controlling that process is logged in at; it is not very useful, but if they are not all the same, that could mean that you have accidentally left yourself logged in somewhere. The third column gives the status of the process, and the important information comes from the first letter: R means running, S means Sleeping (probably waiting for something to happen), I usually means Idle (sleeping for a long time, often just waiting for input from the user), and D means waiting for a disc read or write. The fourth column gives the amount of CPU time used by the process (a total of 0.05 seconds, so I obviously haven't been very busy), and the last column says what program the process is running. "tcsh" is the name of the shell that I (and most other users) run, and "ps" is of course the program that produced the output we are reading.
        So, we have a shell process that is just sitting there waiting for something to happen, and a second process running the ps program. The event that the shell process is waiting for is the termination of the ps process. When that happens it will wake up and print the prompt again.
        If you are the owner of a process, and you know its PID (the number printed out by ps) you have the power of life and death over it. This power is wielded by sending messages ("signals" in unix terminology) to the process. There are only a few signals that can be sent to a process, so you don't get very fine control, but at least you can kill them. The command for sending a signal to a process is very confusingly called "kill". Most users believe quite reasonably that kill kills processes. It doesn't, but it can send a signal that orders a process to kill itself. Signals are given short names; the useful ones are INT, KILL, TSTP, STOP, and CONT. INT (short for INTerrupt) is the signal caused when you type control-C, so sending INT to a process is exactly the same as typing control-C to it, which is normally enough to make it stop. KILL is an extra-strength INT; it is quite easy to write a program that ignores control-Cs, but it is impossible to write a program that ignores KILLs. TSTP (Terminal SToP) is the signal caused when you type control-Z, which will be described very soon. STOP is identical to TSTP, except that programs are incapable of ignoring it. CONT reverses the effect of a TSTP or STOP.
        Sending a signal is very easy: the command "kill -KILL 2443" sends the KILL signal to process 2443 (but only if you own that process). Killing your own shell process is one of the trick ways of logging out. If sending a kill signal to one of your own processes does not stop it cold, there is something seriously wrong with the whole system.
        kill is useful for two main reasons. One is that if you realise that you forgot to log out, you can log yourself out from anywhere. The other is that sometimes programs go wrong and can't be stopped by normal means. If you ever find that a program you are using goes mad and won't stop, type control-Z. It is possible for a program to turn off control-Zs, but most don't. If the program has not turned off control-Zs, it will be put to sleep. Not killed, just stunned. What really happens is that it has swapped places with your shell process. Your shell process is awake, gives you the prompt, and waits for a new command. The program that you stopped is asleep in the background waiting to be dealt with. If you type ps after using control-Z to stop a mad program you see something like this:
  PID  TT  STAT      TIME COMMAND
 2443  p3  Ss     0:00.06 -tcsh (tcsh)
 2508  p3  T      0:00.00 loony
 2509  p3  R+     0:00.00 ps
The status T indicates sTopped (S was already taken for Sleeping). If you don't do anything about it, the sleeping mad program will stay there for ever, consuming valuable memory and things.
        There are three sensible things you can do with a mad program once it has been stopped in this way. The most common is to kill it properly, by sending the KILL signal. Alternatively, if you realise that it hadn't gone mad at all, you can reinstate it, by typing the "fg" command (fg for ForeGround) which will put your shell back to sleep, and bring the most recently stopped process back to life. The third alternative would be to let it continue to run, but detached from your terminal (this might happen if the program will take a very long time to complete its job; you want it to run, but you also want to be able to use your terminal for other things). If you send the CONT signal to a stopped process, for example "kill -CONT 2508", the program will start running again, but completely in the background. You will still be able to type other commands to your shell, and even have other programs running alongside it in the background. The output from a background process still appears on your screen, but a background process is detached from your keyboard. If it tries to do input, it will be out of luck. When a background process tries to do input it is simply stopped again; you can use the "fg" command to bring it back to the foreground, talk to it and make it happy, then send it into the background again. There is even a quick way of doing that: first type control-Z to put it to sleep, then type the "bg" (BackGround) command, which simply sends the CONT signal to your stopped process.
        Running programs in the background can be very useful. Any time you have a program that will take a long time, but doesn't require constant attention from you, backgrounding it is a sensible plan. Starting a program, then sending the STOP signal, then sending the CONT signal is a very long-winded way to do a very useful thing. Fortunately there is a shortcut. Any time you enter a unix command, simply type an ampersand: & at the very end of the line, before pressing return. That single character tells the shell to run the program completely in the background; it is exactly as though you had started the program normally, then immediately typed control-Z, then sent it a CONT signal.
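        For example, suppose you have a slow program called bigjob (an invented name); a session might look something like this (the "[1] 2512" line is the shell reporting the job number and PID of the new background process; the exact format varies):
rabbit> bigjob &
[1] 2512
rabbit> ps
  PID  TT  STAT      TIME COMMAND
 2443  p3  Ss     0:00.08 -tcsh (tcsh)
 2512  p3  R      0:00.03 bigjob
 2513  p3  R+     0:00.00 ps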
        
        One consequence of all this process creation is that it is very hard for a program to leave little messages with the process for later programs to read. Every program is run in a totally new process. Programs are perfectly free to leave little messages all over the place, but as soon as the program terminates, its process is deleted, and all the little messages go with it.
        Occasionally people create shell "scripts" that are supposed to set up things so that their shell process behaves in some particular way. A shell script is just a normal text file that contains unix commands (like a .bat file under DOS), so for instance, you might create a file that contains these two commands "setenv TERM vt100" and "set term=vt100" (the belt-and-braces way to make sure the system knows your terminal is of the VT100 kind), and then tell the shell to run it. Unfortunately, it would be run in a new process. That new process would know that you've got a VT100, but it would be instantly deleted, and your main shell would be none the wiser.
        This particular instance of the problem can be evaded. If your script is in a file called "dothis", then the command "source dothis" tells your shell to obey that script itself, instead of with a new process, so the information is properly remembered. When a new process is created it picks up all kinds of things (including values of variables) from its parent process, so all programs you run from now on will know you've got a VT100. Unfortunately, the source trick only works for shell scripts, not for normal programs, so there is still no simple way for programs to leave little messages.
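        A sketch of the whole thing, using the file name from the text (the echo at the end is just to confirm that the setting stuck):
rabbit> cat dothis
setenv TERM vt100
set term=vt100
rabbit> source dothis
rabbit> echo $TERM
vt100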


Variables and the Command Line

Unix and the shells will keep track of the values of some variables for you. They behave more-or-less like the variables you use in a program, except that they don't have to be declared, and the only values they can contain are strings or lists of strings. These variables are not connected in any way with the variables inside a program. A shell variable called xyz has absolutely nothing to do with a variable called xyz inside a program.
        Using Variables: There are two different kinds of variables, and they are easily confused. There is no good reason for keeping the two kinds of variables separate, but that's just the way it is. Shell Variables are maintained by the shell. Only the shell can see them; programs can't make any use of them at all. Environment Variables are maintained by the operating system at a deeper level than the shell, and can be seen by programs. Very often you will have both a shell variable and an environment variable with the same name, and have to make sure that both versions always have the same value to be sure that everything will work properly. It is not always clear whether or not you have to do this.
        Environment Variables: You can create a new environment variable, or change the value of one that already exists with the setenv command. Example: setenv TERM vt100 sets the value of the environment variable called TERM to be the string "vt100". The command printenv TERM will tell you what the value of that variable is. The command unsetenv TERM will remove the variable from the environment entirely. Every time a program is run, the environment is passed to it as a parameter in the form of a list of strings, so environment variables are a good way to set up pieces of information that programs will need to see (such as your terminal type, as in the examples above).
        Shell Variables: You can create a new shell variable, or change the value of one that already exists with the set command. Example: set TERM=vt100 sets the value of the shell variable called TERM to be the string "vt100". Notice that the set command has an equals sign, but the setenv command just has a space. The command echo $TERM will tell you what the value of that variable is; this is a use of the shell's substitution scheme discussed below, not a special command. The command unset TERM will remove the variable from the shell's memory entirely. Shell variables are never passed to running programs, but using the "$" substitution you can get around this to a certain extent.
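        A short made-up session showing the two kinds side by side (colour is just an invented variable name for illustration):
rabbit> setenv TERM vt100
rabbit> printenv TERM
vt100
rabbit> set colour=blue
rabbit> echo $colour
blue
rabbit> unset colour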
        Important Variables: There are certain important variables that must always be set correctly. Some only need to be shell variables, some only need to be environment variables. In case of doubt, it is best to do things both ways, just in case. The most important variable is PATH; if it is wrong, you won't be able to run any programs. When you tell the system to run a program, it looks through all the directories that appear in your path until it finds a program with the right name. It won't look anywhere else, so you'd better get it right. The second most important variable is probably TERM; all editors use it to decide what kind of control codes to send to your terminal program to move the cursor around, so if its value is wrong, editors (and a lot of other things) either won't work at all or won't work properly. There is a special file called .login (note the dot) that is run every time you log in. If there are any important variables that need to be set, you should put the commands in there. You will already have a .login file; if you look at it now, you'll see a few mysterious commands, but you'll also see PATH being set.
        Most variables just contain a single string, but some (especially PATH) contain a list of strings, so the set command works differently: set name=(one two three) sets the variable called name to be the list of three strings "one", "two", "three". If you want to add an extra string to the end of the list, use set name=($name extra), and similarly if you want to put an extra string at the beginning, use set name=(extra $name). For anything more complex you'll need to read the man entry for the shell you are using (note that different shells have different commands, so make sure you read the right one). Man tcsh is likely to be it, but it has a lot to say.
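        A sketch with a made-up list variable called places:
rabbit> set places=(home work)
rabbit> echo $places
home work
rabbit> set places=($places beach)
rabbit> echo $places
home work beach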

        Substitutions. Usually, when you type a command, the shell first splits it into a number of strings separated by spaces: the command ls -l p1.c p2.c is split into four strings: "ls", "-l", "p1.c", and "p2.c". The first string is the command name. Depending on what that command name is, one of five things will happen:
(1) If it is one of the few commands that the shell can handle itself, it does. For example, if the first string is "echo", the shell just prints out the other strings.
(2) If it is a long filename (long meaning it has at least one "/" in it, so "./xx" is considered long, whereas "verylongfilename" is short), then that file is executed (assuming it is an executable program), and given the remaining strings as parameters.
(3) If the command name matches one of the aliases that the shell knows about, it is replaced by the alias value, and the process starts again. You can create your own aliases easily. For example, the command alias dir ls -l creates a new alias called dir; if you later type the command dir p1.c the results will be exactly as though you had typed ls -l p1.c. To remove an alias, type for example unalias dir.
(4) If none of the above conditions apply, the shell will search through all the directories in your path until it finds one that contains an executable file with the same name as the command name. It runs that program, giving it the remaining strings as parameters. In your path, the name "." refers to the "current directory" whatever it happens to be whenever your path is searched. If you forget to put "." in your path, you can still run your programs by using the second option. For example, if you have a compiled program called a.out, but typing the command a.out fails, type the command ./a.out instead.
(5) If the search of the path fails, there is nothing left to do, and you get an error message.
Warning: If you create an executable program with the same name as a system program, strange and dangerous things can happen. For example, there is a system program called "test" which most beginning users don't know about. It normally lives in the directory /bin. "test" is a very popular name for programs, so there is a good chance you might also create an executable file called test in your own directory ".". What happens when you try to run it by typing the command test? Well, if /bin appears before . in your path, /bin/test will be run instead; you will get no output or a very strange error message from it, and wonder how your program could possibly have behaved that way. If . appears before /bin in your path, then the command test will work, but other things mysteriously will not. Some other system programs that you occasionally run might make use of test internally themselves; those programs will now use your test instead of the system one, and will of course go wrong. This is a very bad state of affairs, and will occasionally catch you; it's almost inevitable. Your one protection is the which command; it tells you which program file it would execute if you were to type a particular command. So if you are thinking of writing a program called test, and miraculously remember to do this, you would first type which test and the system would respond with /bin/test, telling you that there is already a program called test in your path, so you should really pick a different name.
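        Putting alias and which together in one small made-up session (the file details shown are invented):
rabbit> which test
/bin/test
rabbit> alias dir ls -l
rabbit> dir p1.c
-rw-r--r--  1 jsmith  users  1204 Jan 28 15:36 p1.c
rabbit> unalias dir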

        Another thing that happens after the shell has split the command line into separate strings, but before it decides what to do with them, is that it runs through the strings, replacing any special forms with different strings. The most common special form is the symbol $ immediately followed by the name of a variable. It is replaced by the value of that variable. For example, if you type the command echo TERM all you get is the string TERM printed out. However, if you type echo $TERM, the special form $TERM is replaced by the value of the variable TERM before the command is executed, so what you see printed is the string vt100 or something like that. The word before is significant. Substitution is performed before commands are executed, so programmers do not have to build the ability to recognise special forms into their programs. This also makes it possible to pass shell variables into programs, even though programs can't directly access shell variables.
        The exact details of how $ works vary from shell to shell. The popular shell "tcsh" will replace $xyz by the value of the shell variable xyz. Only if no such shell variable exists does it use the value of an environment variable.

        Another important substitution is the Wild Card substitution. Whenever a string containing the character * is found, it is replaced by the names of all the files in the current directory that match the string. For example, if the current directory contains files called "ant.c", "bat.c", and "apple.c", the pattern a*.c is replaced by "ant.c apple.c", so the command rm a*.c deletes both of those files, and the commands echo a*.c and ls a*.c both provide a list of those two file names.
        This is why you have to be really careful using * and why it can't be used with cp or mv. Suppose you are in the directory with the three .c files described above, and decide that you want to change them into .txt files. If you type the command mv *.c *.txt the first pattern, *.c, is replaced by the three names ant.c, bat.c, and apple.c, and the second pattern, *.txt, is replaced by nothing at all, because there are no files matching that pattern. So the command actually executed is mv ant.c bat.c apple.c, which does not make any sense at all. If you were unfortunate enough to type mv a*.c a*.txt, the command executed would be mv ant.c apple.c, which would change the name of ant.c to apple.c, destroying the file apple.c.
        In wild card substitutions, * can match any number of characters, and ? can match any single character. You can create very complicated matching patterns if you have the patience. For details read man tcsh.
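        With the same three files as above, a harmless way to see what the shell will do is to echo the pattern first:
rabbit> ls
ant.c      apple.c    bat.c
rabbit> echo a*.c
ant.c apple.c
rabbit> echo ?at.c
bat.c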


Redirecting Input and Output

Redirection of input and output is also done by the shell; your programs don't even need to know it is happening.
        If you have a program that requires a lot of input, and you are going to run it many times with the same input (perhaps while debugging), you can put all the input, exactly as you would type it to the running program, into a file, and tell the program to take input from there instead of your keyboard. For example, if your program is called prog and you have put all the user input in a file called input, then the command prog <input will run the program "on automatic pilot". You can't have it both ways: input comes either from a file or from the keyboard, never both.
        If you want to capture the output of a program, a very similar trick will do the job. The command prog >output creates a new file called output (if one already exists, it is destroyed). prog is run in the normal way, but all output that would normally have appeared on your screen goes into the file instead. You can't have it both ways: if output is redirected, nothing appears on the screen. This makes it very difficult to use with interactive programs. You can use < and > together, as in prog <input >output.
        If you want the captured output to go to the end of a file instead of completely replacing the contents of that file, use the slightly different notation prog >>output instead.
        Some programs (particularly compilers) print errors in a special way. Although they do appear on your screen, they are not printed the same way as normal output, and the >output trick will not catch them. Normal output is captured, error messages are not. If you want to capture error messages too, the notation is prog >&output.
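        A small made-up example, capturing two listings in one file (the file names, sizes, and dates are invented):
rabbit> ls -l p1.c >listing.txt
rabbit> ls -l p2.c >>listing.txt
rabbit> cat listing.txt
-rw-r--r--  1 jsmith  users  1204 Jan 28 15:36 p1.c
-rw-r--r--  1 jsmith  users   980 Jan 28 15:36 p2.c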

        If you want to keep a transcript of some interactive work, and still see what you're doing, use the script command. It actually starts up a new shell process for you to type commands to, and stores a record of everything that passes between computer and user as long as that shell is running in a file. The new shell will probably have a prompt of % instead of your usual one, so you won't forget that you are still making a transcript. When you have done everything that you want recorded, type the exit command. This will terminate the temporary shell that you were using, and stop the recording. You will return to your original shell, and find a file containing an exact copy of everything you did. Example: script record.txt starts the process, storing the transcript in a new file called record.txt.


The Bigger Commands

It would take a whole book to describe the bigger commands. I'll just outline them here. If you want more details, get a book, or use man.

Compiling The setup may be slightly different from one computer to another. Described here is the setup for Rabbit. For anything else, it will be very similar. The name of the command to compile may be slightly different; things to try are CC, cc, cxx, c++, g++, gcc, gpp, you can't go far wrong trying them.
        C programs should be in files whose names end with ".c", and C++ programs should be in files whose names end with ".cpp".
        "CC prog.cpp" (for C++) and "cc prog.c" (for C) compiles a program, which must be complete, with a main() function, and puts the executable in a file absurdly called a.out
        CC prog.cpp -o prog is the same, but the executable is called prog instead.
        CC -c prog1.cpp does nearly all of the job of compiling prog1.cpp, except that it recognises that it may not be a complete program, but might have to be linked with other parts before an executable can be made. It creates an almost-executable file called prog1.o. Compile all the separate parts of a program in this way.
        CC prog1.o prog2.o -o prog takes two (or more) almost-executable .o files created as above, and combines them into a single really executable file called prog. If you keep all the parts of a program in the same subdirectory, with nothing else cluttering it up, you can use CC *.o -o prog to the same effect.
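        So a complete build of a two-part C++ program might go like this (assuming the compiler really is called CC on your system, as it is on Rabbit, and that your source files are called prog1.cpp and prog2.cpp):
rabbit> CC -c prog1.cpp
rabbit> CC -c prog2.cpp
rabbit> CC prog1.o prog2.o -o prog
rabbit> ./prog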

Editing
        pico is the most popular editor we've got; it is very simple to use, and even gives you a menu of available commands always visible on the bottom two lines of the screen so you can't forget them. It is very easy and convenient, but can't do anything remotely complex (it can search, but that's about it). Type pico test.c to try it.
        vi is the unix standard editor. It can do just about anything that you'd want an editor to do, but is really horrible to use. The commands are so obscure that even people who claim to like using it don't remember them very well.
        awk and sed are special unix utilities that perform repetitive editing operations quite well. You have to program them; they are not interactive. At some time I'll add a short description of them; in the meantime, you could say man awk or man sed.

Searching
        grep is used to search through a bunch of files to find one that contains a particular string. If you have a lot of .c files, and one of them contains a clever function you wrote, called "PredictTheFuture", but you can't remember which file it's in, you could type grep PredictTheFuture *.c
        find answers a similar question. If you have a lot of subdirectories, and you can't remember which one you put the file "thisone.c" in, you could type this: find . -name thisone.c -print.
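        The output of both is just a list of matches; a made-up example (the file names and directories shown here are invented):
rabbit> grep PredictTheFuture *.c
magic.c:int PredictTheFuture(int year)
rabbit> find . -name thisone.c -print
./projects/assignment2/thisone.c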

Electronic Mail
        pine is the most used email utility on our system. It is pretty much self-explanatory, and in fact is based on pico, so there is really only one new thing to get used to. If you know what email is, you'll work out how to use pine.


I'm sure there are a lot of important things that I forgot to mention. Tell me what's missing, and I'll add it if I reasonably can.