18th September 2001

18^th September 2001

Dealing with multi-field data records

Thinking back to the problem of recording events: when the event happened, the name of the person to whom it happened, and a description of the event itself. A program that reads a file-full of such events would be quite easy to write, provided that we have some idea of the maximum possible number of such events:

#include <iostream>

#include <fstream>

#include <string>

const int MaxEvents = 2000;

int date[MaxEvents], time[MaxEvents];

string person[MaxEvents], event[MaxEvents];

int NumEvents=0;

void ReadTheData()

{ ifstream file("data.txt");

NumEvents=0;

if (file.bad())

{ cout << "Error, can't open data.txt\n";

return; }

while (1)

{ int d, t;

string p, e;

file >> d >> t >> p >> e;

if (file.eof()) break;

date[NumEvents]=d; time[NumEvents]=t; person[NumEvents]=p; event[NumEvents]=e;

NumEvents+=1; } }

As we discussed in class, a useful operation might be finding the record describing the (chronologically) first event. Conceptually very simple, and not too hard to code.

int FindEarliestInRange(int first, int last)

{ int EarliestSoFar = first;

for (int i=first+1; i<=last; i+=1)

{ if (date[i]<date[EarliestSoFar] ||

date[i]==date[EarliestSoFar] && time[i]<time[EarliestSoFar])

EarliestSoFar = i; }

return EarliestSoFar; }

This isn't too bad, but if there were more than just date and time involved in the comparison, the if statement would get horribly complicated. After that, we designed a sorting function:

void sort()

{ for (int pos=0; pos<NumEvents; pos+=1)

{ int earliest = FindEarliestInRange( pos, NumEvents-1 );

int ti;

ti=date[earliest]; date[earliest]=date[pos]; date[pos]=ti;

ti=time[earliest]; time[earliest]=time[pos]; time[pos]=ti;

string ts;

ts=person[earliest]; person[earliest]=person[pos]; person[pos]=ts;

ts=event[earliest]; event[earliest]=event[pos]; event[pos]=ts; } }

Again, simple in concept (I hope you remember from class how it works, because I'm not going to type the explanation here), but the very basic operation of swapping the earliest^th item with the pos^th item has become painfully overblown.

All the unpleasantness of this program is caused by the fact that we have to use four separate arrays to store the data. If only C++ had a built-in type to represent one of our events, the whole thing would be so much simpler:

#include <iostream>

#include <fstream>

const int MaxEvents = 2000;

EventRecord event[MaxEvents];

int NumEvents=0;

void ReadTheData()

{ ifstream file("data.txt");

NumEvents=0;

if (file.bad())

{ cout << "Error, can't open data.txt\n";

return; }

while (1)

{ EventRecord e;

file >> e;

if (file.eof()) break;

event[NumEvents]=e;

NumEvents+=1; } }

int FindEarliestInRange(int first, int last)

{ int EarliestSoFar = first;

for (int i=first+1; i<=last; i+=1)

{ if (event[i]<event[EarliestSoFar])

EarliestSoFar = i; }

return EarliestSoFar; }

void sort()

{ for (int pos=0; pos<NumEvents; pos+=1)

{ int earliest = FindEarliestInRange( pos, NumEvents-1 );

EventRecord temp;

temp=event[earliest]; event[earliest]=event[pos]; event[pos]=temp; } }

Of course, it would be totally unrealistic to expect a general purpose programming language like C++ to already have a built-in understanding of the kinds of events that we are dealing with. There is an infinite variety to the possible ways that events (or anything else) could be defined, and there is no way they could all have been predicted by the designers of C++.

So, we won't find anything called EventRecord already existing in C++, ready to make our program as simple as the one above, but something almost as good does exist.

C++ gives programmers the ability to define their own data types. Even though EventRecord is unknown to C++, we can write some extra stuff in our program that teaches C++ what we want an EventRecord to be like. It takes a bit of extra work (because we have to write a very exact description of exactly how these new EventRecords are supposed to behave), but when that extra bit of work has been done just once, we really will be able to write simplified programs just like the one above. For anything beyond really trivially small programs, the extra effort at the beginning really does pay off.

This is what Object Oriented Programming is all about.

When we define the behaviour of an EventRecord, we are creating a new kind of Object, and the way we write programs is supposed to be oriented towards manipulating and defining these objects.

Defining New Kinds of Objects

First, we think about what pieces of data are required to make up one of our new objects. Fortunately, we have already worked that out: we need an int for the date, an int for the time, a string for the person, and a string for the event-description. The first step is to define a new Object Class ("Class" means "kind of" in C++ terminology) containing those four data items:

struct EventRecord

{ int date, time;

string person, description; };

For really simple needs, just that would be good enough. Putting those three lines in a C++ program is enough to create a new Object Class. We have not said anything about how this new kind of object should behave, so it won't do very much for us, but it is already usable.

This declaration says that from now on in the program, the name "EventRecord" represents a type, just like the familiar "int", "float", and "string" already do. We can create a variable whose type is EventRecord in exactly the way we can create a variable whose type is "int" or "string":

void main()

{ int w;

EventRecord x, y;

string z;

...

An EventRecord is a container for other smaller pieces of information. Every EventRecord will contain within it two ints and two strings. It is possible to access and play around with the EventRecord as a whole, or with its component parts, without any restrictions. For example, all of the following are allowed, and mean exactly what you would expect them to mean:

x.date = y.date+1;

if (y.person == "Mrs. Smiggins") cout << "found her!\n";

y.decription = z;

x = y;

The first three example lines show how to access the individual components of a EventRecord object: if x is an EventRecord object, then x.date, x.time, x.person, x.description all behave like perfectly normal variables. In fact they are perfectly normal variables.

The fourth line is the important one, it illustrates that an EventRecord can be treated as a single item just like an int or a string. The assignment x=y copies each of the four components of y into the corresponding components of x. It is like getting four separate assignment statements for the price of typing just one.

We could also write functions that take EventRecord parameters:

void print(EventRecord e)

{ cout << e.date << ":" << e.time << ":"

<< e.person << ":" << e.description << "\n"; }

which can be called in exactly the way you would expect:

print(x);

Even this simple beginning can make a lot of our example program much smaller and easier to understand:

struct EventRecord

{ int date, time;

string person, description; };

const int MaxEvents = 2000;

EventRecord event[MaxEvents];

int NumEvents=0;

void ReadTheData()

{ ifstream file("data.txt");

NumEvents=0;

if (file.bad())

{ cout << "Error, can't open data.txt\n";

return; }

while (1)

{ EventRecord e;

file >> e.date >> e.time >> e.person >> e.description;

if (file.eof()) break;

event[NumEvents] = e;

NumEvents+=1; } }

int FindEarliestInRange(int first, int last)

{ int EarliestSoFar = first;

for (int i=first+1; i<=last; i+=1)

{ if (event[i].date < event[EarliestSoFar].date ||

(event[i].date == event[EarliestSoFar].date &&

event[i].time < event[EarliestSoFar].time) )

EarliestSoFar = i; }

return EarliestSoFar; }

void sort()

{ for (int pos=0; pos<NumEvents; pos+=1)

{ int earliest = FindEarliestInRange( pos, NumEvents-1 );

EventRecord temp;

temp=event[earliest]; event[earliest]=event[pos]; event[pos]=temp; } }

The sort function is already as simple as I suggested it might become; ReadTheData hasn't changed much, but is a little simpler; FindEarliestInRange has got a little bit worse. But we have only just started making improvements. You only need to learn how to make constructors and methods, then everything will fall into place.

Another big improvement can be made right away: It would be easy to write a function that takes two EventRecord objects, and works out which comes first chronologically. To make it useful, we would probably make it return true if the first object comes first, and false if it doesn't:

struct EventRecord

{ int date, time;

string person, description; };

bool ObjectsAreInOrder (EventRecord a, EventRecord b)

{ if (a.date < b.date) return true;

if (a.date == b.date && a.time < b.time) return true;

return false; }

We could then use that function in the FindEarliestInRange function, to tidy it up. We could replace the ugly statement

if (event[i].date < event[EarliestSoFar].date ||

(event[i].date == event[EarliestSoFar].date &&

event[i].time < event[EarliestSoFar].time) )

with the pretty statement

if (ObjectsAreInOrder(event[i], event[EarliestSoFar]))

And that would actually work. But wouldn't it be nice if we could somehow tell it that the < operator between two EventRecords should behave just like that? We can. In C++ you can define how operators behave in almost exactly the same way as you can define functions.

bool operator < (EventRecord a, EventRecord b)

{ if (a.date < b.date) return true;

if (a.date == b.date && a.time < b.time) return true;

return false; }

That's all it takes, now look at how much nicer FindEarliestInRange becomes:

int FindEarliestInRange(int first, int last)

{ int EarliestSoFar = first;

for (int i=first+1; i<=last; i+=1)

{ if (event[i] < event[EarliestSoFar])

EarliestSoFar = i; }

return EarliestSoFar; }

Again, a little bit of extra work at the beginning of a program has really paid off and made the whole thing smaller, simpler, easier to understand, and therefore more likely to be correct.