Using Getchar

The getchar() function is another part of the old C library. It is the most basic input function there is, and it is very useful for building your own more complex input operations when the ones cin provides aren't up to the job. To use getchar(), your program must #include <stdio.h>.

getchar() has no parameters. Every time you call it, it reads the next character of input and returns it to you. The function returns an int, being the ASCII code of the relevant character, but you can assign the result to a char variable if you want.


Press ENTER to continue

Here's a simple example: The "press ENTER to continue" operation that is used when a program produces a lot of output, and the reader needs to be given a chance to read it. The function should print "press ENTER to continue", then wait until ENTER actually is pressed, ignoring all other keypresses. Very simple.
         void PressEnterToContinue(void)
         { printf("Press ENTER to continue: ");
           char c=getchar();
           while (c != '\n')
             c=getchar(); }
Except of course, things are never quite that simple, even in this case. What if there is no more input? The user might type control-D (or on Windows control-Z) to signify End Of File, or the user might have redirected the program's input channel so that it is receiving from a file, and that file could just end.

If either of those things happens, this program will just continue to run forever. Getchar really does strictly obey its rules. It always gives you the next input character. If it hasn't been typed yet, then of course it waits, but if the input file has ended, there is nothing to wait for. It gives you the special value EOF, over and over again. Any program that enters a loop waiting for some particular input should be designed to survive an unexpected end of file.

Fortunately, that is very easy. Just stop the loop if c is EOF:
         void PressEnterToContinue(void)
         { printf("Press ENTER to continue: ");
           char c=getchar();
           while (c != '\n' && c != EOF)
             c=getchar(); }
Personally, I don't like even that little bit of repeated code, where a character has to be read once before the loop is entered, and then again inside the loop. It makes it too easy to make mistakes when the program is modified in the future to meet slightly different requirements. I would probably prefer something like this:
         void PressEnterToContinue(void)
         { printf("Press ENTER to continue: ");
           while (true)
           { char c=getchar();
             if (c=='\n' || c==EOF) break; } }

A word about EOF

EOF is defined to be a negative int, and on practically every system it is -1. That works well because getchar() returns every real character as a non-negative value, so EOF can't possibly clash with any real character's representation. Unfortunately, C and C++ have a very strange feature that can cause trouble. It is not defined what the range of possible values for a char variable must be. On some systems it is -128 to +127, which can hold -1 and is fine for ASCII text; but on other systems it is 0 to +255, which is fine for normal ASCII values, but not so hot for EOF's -1.

Generally it is best to store getchar()'s result in an int, which guarantees that EOF is properly handled.

So the final version might be this:
         void PressEnterToContinue(void)
         { printf("Press ENTER to continue: ");
           while (true)
           { int c=getchar();
             if (c=='\n' || c==EOF) break; } }
You might reasonably think that using a signed char to receive the result of getchar() would be even better. It should be. But things are not always as they should be in C++. The ReadLine function (next) explains what is wrong.



Reading a Whole Line

C++'s cin functions like to skip white space before they do anything (although that can be changed with an input stream manipulator), and generally stop reading as soon as they find a second patch of white space. That is quite aggravating when you want to read a whole line. With getchar it is extremely easy.

Start with an empty string, and enter a loop. Each time round the loop, read a single character. If it is the newline character (or EOF), then exit from the loop. Otherwise just add the new character to the end of the string. When the loop is finished, you've got the whole line of input, totally unmolested, in the string.
         string ReadLine(void)
         { string s = "";
           while (true)
           { char c = getchar();
             if (c=='\n' || c==EOF) break;
             s=s+c; }
           return s; }
And that really is it this time.

Except for one thing. We have the problem of not knowing whether EOF (which is -1) is going to be properly representable on any given system. The majority of C++ systems these days seem to allow negative char values, so it will normally work out fine. But platform-independence is a good thing. It would be nice if we could safely run the same program on all computers. What if we come across a computer that makes char variables unsigned by default?

The natural thing to try is to make the variable c be explicitly declared as signed, so we would have signed char c = getchar(); That should be the perfect solution. But it doesn't work. Absurdly, C++ knows how to add a normal char to the end of a string, but not how to add a signed char to the end of a string. You'd get a compilation error.

Another thing to try would be declaring c as an int, but then s=s+c would also get an error. The only things that C++ knows how to add to the end of a string with + are plain chars and other strings (including quoted string constants).

Usually the best solution is to save the result of getchar() in an int variable, and use a Type Cast to convert it to a char just as it is being added to the end of the string. The fully correct version would look like this:
         string ReadLine(void)
         { string s = "";
           while (true)
           { int c = getchar();
             if (c=='\n' || c==EOF) break;
             s=s+(char)c; }
           return s; }
And really, every use of getchar() should be handled in the same way. Always keep its result in an int.
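
To try the same loop on input that does not come from the keyboard, something along these lines might help. It is only a sketch: it relies on the fact that getchar() behaves like getc(stdin), and the name ReadLineFrom and its FILE* parameter are assumptions made for illustration, not part of the function above.

```cpp
#include <cstdio>
#include <string>

// Hypothetical variant of ReadLine that reads from any open C stream.
// Calling ReadLineFrom(stdin) behaves like the ReadLine shown above.
std::string ReadLineFrom(std::FILE* in)
{ std::string s = "";
  while (true)
  { int c = std::getc(in);            // keep the result in an int
    if (c=='\n' || c==EOF) break;     // so the EOF test is reliable
    s=s+(char)c; }                    // cast only when storing the character
  return s; }
```

Once the input has ended, this version returns an empty string every time it is called, so a caller that needs to tell a blank line from the end of the file can check feof(in) afterwards.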

What would go wrong?

On a computer that has unsigned char variables by default, what could the problem be? It is the result of C's and C++'s rules for mixed-type operations. When we read past the end of the input file, and the value -1 (EOF) is stored in an unsigned char variable, it takes on the value of 255 (that's the way negatives work in binary). The value of EOF is an int, so when we test c==EOF, the unsigned char is first converted to an int, and the test actually performed is 255==-1, which is of course false. So end-of-file conditions would never be detected.
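
The effect can be seen without reading any input at all. The following is a small sketch, not part of the original functions: the two function names are invented here, and the unsigned char variable simply imitates a system where plain char is unsigned.

```cpp
#include <cstdio>

// EOF kept in an int compares equal to EOF, as expected.
bool EofSurvivesInInt()
{ int c = EOF;
  return c == EOF; }

// EOF squeezed into an unsigned char becomes a non-negative value
// (255 when chars are 8 bits wide). In the comparison c is converted
// back to an int, so the test is 255 == EOF, which is false.
bool EofSurvivesInUnsignedChar()
{ unsigned char c = (unsigned char)EOF;
  return c == EOF; }
```

The first function returns true and the second returns false, which is exactly the difference between the int version and the char version of the reading loops.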



Reading a Comma-Delimited List

An almost identical function can be used to read strings that don't necessarily occupy whole lines, but are not delimited (begun and ended) with spaces. A common alternative is to use commas or colons as the delimiter, especially when dealing with real names that could have spaces in them.

The need is for a function that will read a string, stopping when it reaches a colon (or any other character you designate) OR the end of the line. It makes sense to have the delimiter character as a parameter:
         int terminator = 0;

         string ReadDelimitedString(char marker)
         { string s = "";
           int c;
           while (true)
           { c = getchar();
             if (c==marker || c=='\n' || c==EOF) break;
             s=s+(char)c; }
           terminator=c;
           return s; }
What is the global variable terminator for? Well, after reading a string that may have been terminated by a colon or may have been terminated by the end of the line, perhaps you want to know what the terminator actually was. Perhaps you care how many strings are on each line. Perhaps each line contains colon-separated strings describing one object or person. After calling ReadDelimitedString, just look at that global variable: if terminator=='\n' then you know that you have finished a whole line of input.
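
As one possible way of using the terminator, here is a sketch that collects all the strings on one line. It is an illustration under assumptions, not the function above: ReadDelimitedFrom and ReadRecordFrom are invented names, the input comes from a FILE* (getchar() behaves like getc(stdin)), and the terminator is handed back through a reference parameter instead of a global variable.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical variant of ReadDelimitedString: reads from any C stream and
// reports what ended the string through the terminator parameter.
std::string ReadDelimitedFrom(std::FILE* in, char marker, int& terminator)
{ std::string s = "";
  int c;
  while (true)
  { c = std::getc(in);
    if (c==marker || c=='\n' || c==EOF) break;
    s=s+(char)c; }
  terminator=c;
  return s; }

// Collects every string on one line, stopping at the newline or end of file.
std::vector<std::string> ReadRecordFrom(std::FILE* in, char marker)
{ std::vector<std::string> fields;
  int terminator = marker;
  while (terminator == marker)   // a marker means more strings follow
  { fields.push_back(ReadDelimitedFrom(in, marker, terminator)); }
  return fields; }
```

Calling ReadRecordFrom(stdin, ':') on the line Smith, John:42:Austin would give three strings, the first of which keeps its comma and space.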



Reading a Squashed Number

C++'s cin is fine for reading properly delimited numbers. If we have a date written as three separate integers (year, month, day), for example 2004 2 12, the basic cin >> operations will read all three perfectly.

Very often, we like to leave out unnecessary characters in files to reduce their size and make other processing easier. Dates are very often written still as three numbers, but with no spaces, so the above example date would be 20040212. Even though we know that it consists of a four-digit number followed immediately by two two-digit numbers, there is nothing that can be done with it using cin's normal operations. You would have to read the whole thing as a string, perform substring operations to extract the three parts, then convert each of those three substrings to ints separately (another thing C++ isn't too good at!).

Wouldn't it be nice if we had a function that would read an int of a particular known number of digits, and not require spaces around it? That's very easy using getchar:
         int ReadInt(int size)
         { int value=0;
           for (int i=0; i<size; i+=1)
           { char c=getchar();
             if (c==' ') c='0';
             value=value*10+c-'0'; }
           return value; }
A nice simple little function, but it may need to have a few things explained.

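Why don't we bother with reading c as an int? We could, but there is no need. Since we are never going to compare c with EOF, the signed-or-unsigned problem cannot arise. The function simply trusts that the digits it was asked for are really there.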

What is that "if (c==' ') c='0';" line? It isn't necessarily necessary, but it may be that when the month or day number only requires one digit, it could be typed with a space instead of the (logically unnecessary) zero, so just treating spaces as zeros keeps it working properly.

What is that "value=value*10+c-'0';" line? That is the one that does the conversion from just a bunch of characters into a single int. If value contains what we've read so far, that kind of operation is necessary to keep it updated. Consider reading the number 1234: at some point you will already have read the 12 and understood it as twelve. When you read the next character 3, the whole thing read so far is 123, so value should be one hundred and twenty three. Given twelve and three, how do you get to one hundred and twenty three? Multiply by ten and add. That's what is happening.

Except for the -'0': That is to handle the ASCII coding. The character '0' is not the same thing as the integer value zero. Characters are represented by their ASCII codes. The code for '0' is 48, the code for '1' is 49, the code for '2' is 50, the code for '3' is 51, the code for '9' is 57, but you don't need to remember that. Just knowing that they follow a logical pattern starting with '0' is enough. To convert an ASCII code to the correct int, just subtract the ASCII code for zero. Take three as an example: '3'-'0' = 51-48 = 3. It's always the same.
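
The arithmetic can be tried on its own, without any input. This is only a sketch: DigitsToInt is a name invented here, and it applies the same steps as ReadInt to characters that are already sitting in a string. For "1234" the variable value goes 0, 1, 12, 123, 1234.

```cpp
#include <string>

// The same accumulation that ReadInt performs, applied to the characters
// of a string instead of characters read with getchar().
int DigitsToInt(const std::string& digits)
{ int value=0;
  for (int i=0; i<(int)digits.length(); i+=1)
  { char c=digits[i];
    if (c==' ') c='0';           // a space counts as a zero
    value=value*10+c-'0'; }      // shift one decimal place, add the new digit
  return value; }
```

Like ReadInt, it does no checking: it assumes every character is a digit or a space.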

So, using that nice little function, we can read compressed dates very easily:

        { int year=ReadInt(4);
          int month=ReadInt(2);
          int day=ReadInt(2); }
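
To experiment with squashed dates stored in a file, a variation like the following could be used. It is a sketch under assumptions: ReadIntFrom, Date and ReadDateFrom are names invented for this example, and reading goes through getc on a FILE* (getchar() behaves like getc(stdin)).

```cpp
#include <cstdio>

// Hypothetical variant of ReadInt that reads from any open C stream.
int ReadIntFrom(std::FILE* in, int size)
{ int value=0;
  for (int i=0; i<size; i+=1)
  { int c=std::getc(in);
    if (c==' ') c='0';           // a space counts as a zero
    value=value*10+c-'0'; }
  return value; }

struct Date { int year, month, day; };

// Reads a date written as eight squashed characters, such as 20040212.
// Like ReadInt, it trusts the input to contain the characters it expects.
Date ReadDateFrom(std::FILE* in)
{ Date d;
  d.year=ReadIntFrom(in, 4);
  d.month=ReadIntFrom(in, 2);
  d.day=ReadIntFrom(in, 2);
  return d; }
```

The three parts are read in separate statements so that they are certain to be taken from the input in the order year, month, day.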




Whenever an input situation is too much for cin to handle, getchar is there to rescue us. There's nothing it can't do.

As with printf, it is best not to use getchar and cin in the same program: two different systems both processing what you type can lead to confusing errors, for instance when a cin >> operation leaves the rest of a line behind for getchar to find.