EEN218 Class Notes Tuesday 2004-02-10

1: Defining Objects

A fraction, we found, is best represented by three values: a number for the top, a number for the bottom, and a true/false for whether or not it is valid. It is possible to write something like 2/0 as a fraction, which is invalid, and anything created by adding, subtracting, multiplying, or dividing invalid fractions is also invalid, so a record needs to be kept. This is not what we are used to for simpler kinds of numbers: there is no such thing as an invalid int; any sequence of digits is acceptable.
        So to create fraction as a new data-type that can be used in programs, this declaration is used:
struct fraction
{ int top, bot;
  bool ok; };
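As a rough sketch of how the ok field might be used, the functions below build a fraction and multiply two of them, passing invalidity along to the result. The names make_fraction and multiply are made up for this illustration, and nothing is done here about reducing the result to lowest terms.

```cpp
// Sketch only: make_fraction() and multiply() are hypothetical helpers,
// not something defined elsewhere in these notes.
struct fraction
{ int top, bot;
  bool ok; };

// Build a fraction, marking it invalid if the bottom is zero.
fraction make_fraction(int top, int bot)
{ fraction f;
  f.top = top;
  f.bot = bot;
  f.ok = (bot != 0);
  return f; }

// The product is valid only if both of the inputs were valid.
fraction multiply(fraction a, fraction b)
{ fraction r;
  r.top = a.top * b.top;
  r.bot = a.bot * b.bot;
  r.ok = a.ok && b.ok;
  return r; }
```

Multiplying make_fraction(1, 2) by make_fraction(2, 0) gives a result whose ok field is false, so the record of invalidity is kept.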
*
A vector in three-dimensional space is represented by three real numbers. It is not normally possible to have an invalid vector, so we might create vector as a new data-type with this declaration:
struct vector
{ double x, y, z; };
*
Chemical elements have a huge amount of information associated with them. If we were writing a program that is supposed to work on them, we would have to decide which pieces of information are going to be useful. (Perhaps if we were just creating a giant chemistry database we would want to record everything, but in most cases we have to be practical: any information we choose to record will occupy memory in a running program, and somebody is going to have to type it all.) So, making an assumption because this is just an example, perhaps for each element we need its name (e.g. Carbon or Gold), its symbol (e.g. C or Au), its atomic number (e.g. 6 or 79), its melting point (e.g. 3900.0 K or 1336.1 K), and its boiling point (e.g. 5100.0 K or 3239.0 K). The first two are clearly strings, and the third is an int. Normally all floating point numbers should be coded as doubles, but in this case it is not necessary: the melting and boiling points of chemical elements are not normally known to any great accuracy, so a simple float would do the job just as well, and use a little less memory too.
        To create element as a new data-type for a chemistry program, we would use this declaration:
struct element
{ string name, symbol;
  int at_number;
  float m_point, b_point; };
It is worth putting a little effort into thinking of good names for the contents (fields) of a struct. Should we type out "atomic_number", or could we reduce it even further to something like "an"? There is no one correct answer; it depends on your intentions for the program. Anyone using a chemistry program could be expected to realise that at_number stands for Atomic Number, but something like "an" might be too small to be unambiguous. On the other hand, the less typing that is required, the happier your users are going to be.
        Also, we should ask ourselves whether enough information is given to make the program reliable in use. When we see the data listed in a book, units are always given. Nobody could reasonably say that the melting point of gold is "1336.1". Is it 1336.1 degrees Fahrenheit, Centigrade, or Kelvin? Once something is reduced to just a number in a program, there is no way that units can remain associated with it; a number is a number, and nothing else. What are we going to do about it? A recent space probe (NASA's Mars Climate Orbiter, lost in 1999) failed expensively because one contractor's software reported its results in pound-force seconds while the software receiving them assumed newton-seconds. We don't want to be responsible for any mix-ups.
        One good solution to the problem of units is to remember that nobody can access any of the data inside a struct without typing its name. If we make the name include a reminder of the units, perhaps "melting_point_in_Kelvin", nobody who uses the data provided by this program can ever claim they thought it was centigrade. Of course, that name is too long, so again we have to use some judgement. Chemists are used to working with melting and boiling points, and know what is meant by mp and bp, so perhaps a better definition of the struct could be this:
struct element
{ string name, symbol;
  int at_number;
  float mp_K, bp_K; };
There is an interesting alternative solution to the problem of knowing the units of melting points and boiling points. We could create a new type to represent temperatures as a normal number plus a letter (either F, C, or K) to indicate the units:
struct temperature
{ float value;
  char scale; };
Then we could redefine the element data-type so that it has two of those values instead of two floats:
struct element
{ string name, symbol;
  int at_number;
  temperature mp, bp; };
We could even define special functions for doing arithmetic operations on temperatures, so that the units are automatically checked, and converted if they aren't already the same. That would be an interesting and useful thing, but it will have to wait a few weeks.
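As a small preview only, and not necessarily the version that will be developed later, a conversion function for the temperature type might look something like this. The name to_kelvin is made up for the sketch, and treating any unrecognised scale letter as if it were already Kelvin is an assumption made to keep it short.

```cpp
// Sketch only: to_kelvin() is a hypothetical helper. Any scale letter
// other than 'C' or 'F' is assumed to mean the value is already in Kelvin.
struct temperature
{ float value;
  char scale; };    // 'F', 'C', or 'K'

// Return the same temperature expressed in Kelvin.
temperature to_kelvin(temperature t)
{ temperature r;
  r.scale = 'K';
  if (t.scale == 'C')
    r.value = t.value + 273.15f;
  else if (t.scale == 'F')
    r.value = (t.value - 32.0f) * 5.0f / 9.0f + 273.15f;
  else
    r.value = t.value;
  return r; }
```

With something like this available, two temperatures could both be converted to Kelvin before being compared or subtracted, so the scale letters never need to be checked by hand.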

*
If we are writing a program that has to deal with real people (perhaps a customer database or an address book application), again we are faced with an enormous amount of data, and the need to decide which items are important. Without knowing the intended use of the program, we can only guess. Normally you will either be told what needs to be recorded, or you'll be able to work it out by knowing what the program is meant for. Let's assume that for each person we need to know their title (Mr., Mrs., Ms., etc), their first name (Sally, Bub, etc), their middle initial (J., P., etc), their last name (Jones, Slugge, Smith, etc), their street address (123 N.W. Cat St., etc), their city (Miami, Fort Lauderdale, Timbuktu, etc), their state (FL, GA, etc), their zip code (33333, 12542, etc), their account number (6327627, 0234639, etc), and their phone number (305-111-2222, 212-694-2347, etc). That's a lot, but only a fraction of what might be expected in a real commercial application.
        How should all of those be represented? It is clear that many should be strings. Middle initial could perhaps be a single char, but is that a good decision? What about people with two middle initials, or with none? Maybe a string would be best there too. Zip codes can be ints: they have a small size and a well-defined format (although an int does not remember leading zeros, so a zip code such as 02134 would have to be padded back out when printed). Phone numbers can be thought of as ints, but give a range problem. On common computers, the maximum int value is just over 2000000000; that is perfect for nine-digit social security numbers, and probably for account numbers too (but it depends on the company), but isn't quite enough for ten-digit phone numbers. As long long ints aren't really standard C++ (they were only added to the standard later, in C++11), phone numbers should probably be strings. Unless we are going to be concerned about international customers, whose addresses might have a completely different format, we can probably leave it like this.
struct person
{ string title, fname, midinit, lname, streetaddr, city, state, phone;
  int zipcode, account; };
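The range argument for phone numbers can be checked with a short sketch. The function fits_in_int is made up for this illustration, and it leans on long long (standard only from C++11 onwards) to hold the number being tested. The behaviour described assumes a machine with 32-bit ints.

```cpp
#include <limits>
using namespace std;

// Sketch only: fits_in_int() is a hypothetical helper. It reports whether
// the value n could be stored in an int on this machine without loss.
bool fits_in_int(long long n)
{ return n >= numeric_limits<int>::min()
      && n <= numeric_limits<int>::max(); }
```

On such a machine fits_in_int(999999999) is true, so any nine-digit number is safe, but fits_in_int(3051112222) is false, which is why the phone field above is a string.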
Of course, other decisions are possible. We could decide not to record state at all, knowing that the first few digits of a zip code are generally enough to tell you the state. We could decide that the whole name (title, fname, midinit, lname) could be stored as one big string instead of four little ones. The same could be done for the address, giving:
struct person
{ string name, address, phone;
  int account; };
This would remove all the problems associated with foreign addresses, but would make common operations difficult and inefficient to perform. For example, head office might want to know how many customers live in a particular state, and that would take a lot of work if the state had to be extracted from the address string for every single customer.
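To show why the separate state field makes that head-office question easy, here is a sketch that counts customers by state using the longer person definition. The function name count_in_state is made up for this illustration.

```cpp
#include <string>
using namespace std;

struct person
{ string title, fname, midinit, lname, streetaddr, city, state, phone;
  int zipcode, account; };

// Sketch only: count_in_state() is a hypothetical helper. It counts how
// many of the first n entries of people[] have the given state.
int count_in_state(const person people[], int n, const string & state)
{ int total = 0;
  for (int i = 0; i < n; i += 1)
    if (people[i].state == state)
      total += 1;
  return total; }
```

With the one-big-string version of person, the same job would need code to pick the state out of every address string first, and that code would have to cope with every way an address might be typed.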

2: Using Objects

Once a new struct has been declared, the data-type that it defines can be used anywhere in a program, just like any standard type (int, double, string, etc). You can have variables of type person, arrays of persons, functions that take person parameters, and functions that return person results. Just about anything. Of course, the system will not know how to perform operations like +, *, < on a person, and it won't know how to print or read them, but that is reasonable: how could it know how to add two persons together? The idea just doesn't make sense. Even for temperatures, where the idea does make sense, we can't expect C++ to have built-in knowledge of how temperature scales work. It is our job to teach the system how to perform the required operations on our new data-types by defining functions that do the job.

Already-Known Operations

There are a few things that C++ does already know how to do, even if we don't define a function to do the job. As soon as you define a new struct, C++ can already initialise objects of that type, access and modify their individual fields, and copy whole objects. Here are some examples of things that can be done, using a slightly reduced version of the person definition:
#include <iostream>
#include <string>
using namespace std;

struct person
{ string fname, lname, street, city, state;
  int zip;
  string phone; };

int main()
{ person a = { "Jilly", "Jones", "1234 Cat St.", "Monkeyville", "PA", 21432, "414-555-2323" };
  person b = { "Arthur", "O'Pod", "72 N.W. 14th Ave., #22A", "Hellzapoppin", "PA", 21427, "414-555-7264" };
  person c = { "Jane", "Grit", "221B Baker St.", "Frog City", "FL", 33314, "305-555-1234" };

    // this will print Jilly Jones's phone number is 414-555-2323

  cout << a.fname << " " << a.lname << "'s phone number is " << a.phone << endl;

    // this will allow someone to correct b's zip code if it was wrong

  cout << "Enter new zip code for " << b.fname << " " << b.lname << ": ";
  cin >> b.zip;

    // this creates a new person object with exactly the same data as c

  person d = c;

    // Perhaps a and b get married, and b moves in with a:

  b.street = a.street;
  b.city = a.city;
  b.state = a.state;
  b.zip = a.zip;

    // This looks to see if b and d have the same phone number:

  if (b.phone == d.phone)
    cout << "same phone number\n"; }

Good Design

It is often a good rule of thumb (almost a rule, but not quite) that the objects in your program should correspond fairly exactly with the objects in the real world that your program is trying to model. In most of the examples above, that rule of thumb was followed perfectly, but in the case of the person, it wasn't. A person's address represents a home, a real object. It might be more sensible if we realised that addresses are meant to represent real things, and created a special address data-type for that job. It would simplify the definition of person too:
struct address
{ string streetaddr, city, state;
  int zip; };

struct person
{ string fname, lname;
  address home;
  string phone; };
The examples given above would remain almost the same, but with a few notable exceptions:
int main()
{ person a = { "Jilly", "Jones", { "1234 Cat St.", "Monkeyville", "PA", 21432 }, "414-555-2323" };
  person b = { "Arthur", "O'Pod", { "72 N.W. 14th Ave., #22A", "Hellzapoppin", "PA", 21427 }, "414-555-7264" };
  person c = { "Jane", "Grit", { "221B Baker St.", "Frog City", "FL", 33314 }, "305-555-1234" };

    // this will print Jilly Jones's phone number is 414-555-2323

  cout << a.fname << " " << a.lname << "'s phone number is " << a.phone << endl;

    // this will allow someone to correct b's zip code if it was wrong

  cout << "Enter new zip code for " << b.fname << " " << b.lname << ": ";
  cin >> b.home.zip;

    // this creates a new person object with exactly the same data as c

  person d = c;

    // Perhaps a and b get married, and b moves in with a:

  b.home = a.home;

    // This looks to see if b and d have the same phone number:

  if (b.phone == d.phone)
    cout << "same phone number\n"; }
We normally expect a better design for the structure of our data-types to result in a smaller, simpler, and clearer program.
*
Now of course, we have to create functions that perform all the common operations on our new data-types. For the examples above, the only operations we can be fairly sure will be needed are those for reading and writing the objects; anything else will be totally application-dependent.
struct address
{ string streetaddr, city, state;
  int zip; };

struct person
{ string fname, lname;
  address home;
  string phone; };

address read_address()
{ address temp;
  cout << "Street addr: ";
  cin >> temp.streetaddr;
  cout << "City: ";
  cin >> temp.city;
  cout << "State: ";
  cin >> temp.state;
  cout << "Zip: ";
  cin >> temp.zip;
  return temp; }

person read_person()
{ person temp;
  cout << "First Name: ";
  cin >> temp.fname;
  cout << "Last Name: ";
  cin >> temp.lname;
  temp.home = read_address();
  cout << "Phone: ";
  cin >> temp.phone;
  return temp; }

void print(address a)
{ cout << a.streetaddr << ", " << a.city << ", " << a.state << " " << a.zip; }

void print(person p)
{ cout << p.fname << " " << p.lname << ", of ";
  print(p.home);
  cout << ", tel: " << p.phone; }
There is a difference between the styles of definition of the print functions and the read functions. We can have two functions called print because they have differently typed parameters, so the system can always tell which one to use just by looking at the parameter. The read functions return a new object as their result, so they do not need parameters. That means there is no information the system can use to tell which is needed; they can't both be called read, the name needs to provide the necessary information.
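The rule can be seen in a small sketch. The two functions called kind are made up for this illustration; they share a name, and the system picks one by looking at the type of the parameter it is given. The commented-out pair at the end shows the combination that is not allowed.

```cpp
#include <string>
using namespace std;

struct address
{ string streetaddr, city, state;
  int zip; };

struct person
{ string fname, lname;
  address home;
  string phone; };

// Sketch only: kind() is a hypothetical name. Two functions may share a
// name when their parameter types differ.
string kind(address a)
{ return "address in " + a.city; }

string kind(person p)
{ return "person named " + p.fname; }

// Two functions that differ only in the type of their result may not
// share a name. Uncommenting both of these would be a compile error:
//   address read();
//   person  read();
```

Given a person p, kind(p) uses the second function and kind(p.home) uses the first, with no extra information needed from the programmer.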
        In many cases, possibly even this one, it may be preferable to have nice simple unforgettable names for all functions. This could be done if we were to redefine the read functions so that they are given a parameter (the empty object to read data into) instead of returning a result. Remember that only changes made to reference parameters (those with an & in their declaration) are seen by the caller; any other parameter is just a copy.
        The alternative version of the two read functions would be:
void read(address & a)
{ cout << "Street addr: ";
  cin >> a.streetaddr;
  cout << "City: ";
  cin >> a.city;
  cout << "State: ";
  cin >> a.state;
  cout << "Zip: ";
  cin >> a.zip; }

void read(person & p)
{ cout << "First Name: ";
  cin >> p.fname;
  cout << "Last Name: ";
  cin >> p.lname;
  read(p.home);
  cout << "Phone: ";
  cin >> p.phone; }
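The importance of the & in those declarations can be seen in a tiny sketch. Both function names are made up for this illustration; the only difference between them is the &.

```cpp
// Sketch only: both function names are hypothetical.

// Receives a copy of the caller's value, so the caller's variable is
// left unchanged.
void set_zip_by_value(int zip)
{ zip = 33333; }

// Receives a reference to the caller's variable, so the caller's
// variable really is changed.
void set_zip_by_reference(int & zip)
{ zip = 33333; }
```

If z starts out as 12542, it is still 12542 after set_zip_by_value(z), but it is 33333 after set_zip_by_reference(z). A read function written without the & would fill in a copy and then throw it away.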
So now, if we want to read the information on a whole load of people into a database, we could simply create a large array of person objects to be the database, and have a simple loop:
const int max=1000;

person database[max];
int num=0;

.
.
.

int main()
{ .
  .
  .
  while (num<max)
  { cout << "More data to enter? (Y or N) ";
    string s;
    cin >> s;
    if (s=="N") break;
    if (s!="Y") continue;
    read(database[num]);
    num+=1; }
  .
  .
  .
  cout << "The database contains " << num << " people:\n";
  for (int i=0; i<num; i+=1)
  { cout << i << ": ";
    print(database[i]);
    cout << "\n"; }
  .
  .
  . }
Everything becomes much simpler, doesn't it?

You should note that there is a slight flaw in this implementation. It is not a problem with the objects themselves; everything in that respect is perfectly correct. The problem is with using cin for input. When cin is told to read a string, it skips over any white-space (that is spaces, tabs, and end-of-lines) that appears before any solid characters, then it reads as many solid (visible) characters as are available, but it stops reading as soon as it meets another white-space character. This means that if someone dares to have a space in their address (for example "123 N.W. Cat St."), then cin >> a.streetaddr; will just read the "123", then stop, leaving the "N.W." to be read as the city.
        If you really want to use cin and cout for input and output, you will have to learn how to take control of them, and it isn't always easy. There is a function called getline, whose job is to read a whole line of input, regardless of whether or not it contains spaces. As an illustration of it, here is a handy little function:
string read_line()
{ string temp;
  getline(cin, temp);
  return temp; }
As usual, cin.fail() becomes true after using getline if it failed to read anything, and cin.eof() becomes true if it ran into the end of the input.
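The difference between >> and getline can be tried out without typing anything, by reading from a fixed piece of text instead of the keyboard. In this sketch read_line is given the input stream as a parameter (an assumption made here so that it works on any stream, not just cin), and read_int_line and the two demo functions are made-up names. One extra wrinkle is shown too: after >> reads a number, the end-of-line is still waiting in the input, so it should be discarded before getline is used, otherwise getline just reads an empty line.

```cpp
#include <iostream>
#include <sstream>
#include <string>
using namespace std;

// Sketch only: a variant of read_line that reads from any input stream.
string read_line(istream & in)
{ string temp;
  getline(in, temp);
  return temp; }

// Hypothetical helper: reads an int with >>, then discards the rest of
// that line (up to 1000 characters) so that a following getline starts
// on the next line.
int read_int_line(istream & in)
{ int n = 0;
  in >> n;
  in.ignore(1000, '\n');
  return n; }

// Reading the address with >> stops at the first space.
string demo_word()
{ istringstream in("123 N.W. Cat St.\nMiami\n");
  string s;
  in >> s;
  return s; }

// Reading the address with getline takes the whole line, spaces included.
string demo_street()
{ istringstream in("123 N.W. Cat St.\nMiami\n");
  return read_line(in); }
```

Here demo_word() gives back just "123", while demo_street() gives back "123 N.W. Cat St.", which is what the read function for addresses really wants.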