Arrays that can Grow

When we declare an array like this:
Person people[1000];
there are many significant disadvantages. We have created 1000 person objects all at once. They will occupy a large amount of space, even if we aren't ready to use all of them yet. We had no choice though, because arrays can't really grow; if we are going to need 1000 at some time in the future, there has to be space for 1000 all along. Also, the constructor got called 1000 times, and that could involve a lot of effort from the CPU. That effort is very likely to be completely wasted; in the future, when we are ready to put a person in the array, we are liekly to do something like this:
{ Person temp("Fred","Smithhh","123 Ant St","Frog City");
  people[next]=temp;
  next+=1; }
A totally new person is created, (with the constructor being called of course) and stored in temp. When that Person is copied into the array (on the second line) it completely overwrites the person that was already in the array at that person. All the effort expended in creating that person is totally wasted. Plus that innocent little assignment on the second line could really involve a lot of work; it has to copy every single byte of data from temp into the Person at people[next]. We really want something better.

        The first improvement is made by using pointers to Persons instead of direct Persons in the array, like this:
Person *people[1000];
That declaration creates an array of 1000 pointers-to-Persons. No actual Persons are created, so no constructors are called (pointers can't have constructors of their own). Also, the amount of memory consumed is 1000 x the size of a pointer, rather than 1000 x the size of a Person, so while we are only using a few Persons in the array, we are saving a lot of memory.
        When we want to put a Person in the array, we must of course use a pointer (because that is how the array was declared), so the piece of program would be slightly different:
{ Person *temp;
  temp=new Person("Fred","Smithhh","123 Ant St","Frog City");
  people[next]=temp;
  next+=1; }
I wrote it in four lines so that you can more clearly see the separate operations: declare a temporary pointer variable, create a person, add the person to the array, and update the count. I could have written it like this:
{ Person *temp=new Person("Fred","Smithhh","123 Ant St","Frog City");
  people[next]=temp;
  next+=1; }
in which case you can see that the visual difference is very slight. What happens inside the computer is very different though. The assignment on the second line is this time just copying a pointer, one single little 4-byte value. It is a very fast and efficient operation.
        The rest of the program would be more-or less unchanged. We would just have to remember to use an "->" instead of a "." when accessing members of elements of the array. Also, because we used new to create the actual Persons, we have consumed some permanent memory, and are responsible for releasing it when it is no longer needed. For example, if a Person is to be removed from the array, and never accessed again, we would not just say:
people[deadun]=NULL;
this would correctly remove the Person from the array and make it inaccessible, but it would not release the memory occupied by the Person. Instead, we would use:
delete people[deadun];
people[deadun]=NULL;
Only use delete when the object in question is really done-for, and never to be used again. If you are just removing an objet from the array, but still intend to use it elsewhere, don't delete it.


So, we have saved a lot of memory and time by using an array of pointers, but we still have the problem of arrays having a fixed size. Once we have declared an array like this char thing[1000]; its size is fixed for ever. The same rule applies whether we are talking about an array of chars, Persons or pointers-to-Persons.
        Although we can never make an array change its size, we can do something that at least makes it appear to change its size. To avoid too much confusion, we'll step down to considering an array of ints instead of Persons for a while...

In C and C++ when a variable is declared with the type "int *", it is capable of holding the address of an integer variable. That variable might be a single variable all on its own, or it might be part of a huge array of integers. As far as C and C++ are concerned, it doesn't make any difference. So, in this bit of program:
{ int *p;
  int a;
  int b[1000];
I can properly do any of these four assignments: p=&a; p=&b; p=&b[0]; p=&b[39]; In the first case, p is made to point to a. In the second case, p is made to point to b; b is such a big thing (probably 4000 bytes long) that we have to think about what pointing to it really means. The rule, as you know very well by now, is that when a pointer points to something, it contains the address of the very first byte of memory that that something occupies. So, p=&b makes p contain the address of the first byte occupied by the array b. That is of course, the byte (or one of the bytes) occupied by b[0], so pointing to an array of integers is exactly the same as pointing to a single element of that array. "&b" is exactly the same thing as "&b[0]".
        EXCEPT that C++ has an irritating inconsistency. It always treats the name of an array as a pointer to the beginning of that array. It is as though C++ always automatically slips an & in there for you. That means that if you actually type the & that is logically required, C++ sees two &s, and calls it an error.         So everything is as stated above, except that you must always type "p=b" instead of "p=&b" when b is an array.

        What this means is that pointer declarations in C and C++ can be very confusing. When you see something like "int *p;", or "Person *p", you can't possibly tell whether p is supposed to be a pointer to a single thing, or to a whole array of them. (Aside: this confusion is not normal in programming languages. For example, is Pascal a pointer to an integer is declared like this: "var p: ^integer;", whereas a pointer to an array of integers is declared like this: "var p: ^array[1..1000] of integer")

        This means that there are two ways to create an array of integers. One is with a normal declaration like this "int a[1000]"; it creates an array of 1000 ints in the temporary memory of the current stack frame. The name a is a constant, referring to the beginning of that array.
        The second way is in two stages like this "int *a; a=new int[1000]"; it creates an array of 1000 ints in the permanent memory area, and makes the variable a point to it. In both cases, we can use a in exactly the same way: as an array of integers. In both cases a[0] gets you the first element of the array, and a[999] gets the last. There is no easily detectable difference.
        Of course, it is normal to combine the declaration of a and its assignment into a single statement: "int * a = new int [1000];".

The one detectable difference between the two ways, is that the first method makes a be a constant pointer, so we can't possibly make it point to a different array if for some reason we decide we don't like the original one any more. On the other hand, the second way makes a be a perfectly normal variable. We could make it point somewhere else if we want to. In particular, if we find that the 1000 elements of a isn't enough any more, even though we can't make the array grow, we can replace it with a bigger one.
        Here's a skeletal little example:
{ int *A=new int[100];
  // happily use the array A[0] to A[99] for a while
  for (int i=0; i<100; i+=1)
  { sum = sum + A[i];
    product = product * A[i]; }
  // happy happy happy smile smile smile
  // Now realise that A isn't big enough. We need 10000 elements, so...
  A=new int[10000];
  // That's all we need to do. Now we can happily use A[0] to A[9999].
  // It could hardly be easier.
One annoying little fact is that when we say "A=new int[10000];", we have replaced the old array by a totally new one. Any information left in the old one is totally lost. Normally, we would add a little loop to copy everything from the old array before it is lost. The whole idea could be encapsulated in a little function that is used whenever we want to change the size of an array of ints. It might look something like this:
int *ChangeIntArraySize(int *A, int oldsize, int newsize)
{ int *newarray=new int[newsize];
  int i, smallest;
  if (oldsize<newsize) smallest=oldsize; else smallest=newsize;
  for (i=0; i<smallest; i+=1)
    newarray[i]=A[i];
  delete[] A;
  return newarray; }
Inside this function, the smallest size is calculated, because sometimes we may want to shrink an array instead of grow it. The strange statement "delete[] A;" is used to activate the memory recycling system. Any time you use new to allocate new memory, you are responsible for using delete to release that memory when you have finished with it. You are not required to release memory that you have finished using, but you are responsible for it. If your program doesn't use much memory, or if it is about to terminate anyway, there isn't much point in using delete (If we knew that the world was going to be destroyed in 6 weeks, how much effort would we put into recycling aluminium cans?) On the other hand, if your program is going to continue to run for some while, and it keeps using new without ever using delete, it will eventually run out of memory all together, and die. The empty pair of square brackets after the word delete is to remind the system that it is to delete a whole array, not just one variable.
        In the previous example, we would use the function like this:
{ int *A=new int[100];
  // happily use the array A[0] to A[99] for a while
  for (int i=0; i<100; i+=1)
  { sum = sum + A[i];
    product = product * A[i]; }
  // happy happy happy smile smile smile
  // Now realise that A isn't big enough. We need 10000 elements, so...
  A=ChangeIntArraySize(A,100,10000);
  // That's all we need to do. Now we can happily use A[0] to A[9999].
  // It could hardly be easier.



There is nothing special about ints. The same trick could be used to grow an array of Person objects, or even an array of Pointer-to-Persons. Remember that a growable array of ints has to be declared as "int *A"; so a growable array of Person objects would have to be declared as "Person *A"; and a growable array of Pointer-to-Persons would have to be declared as "Person **A"; There is always an extra star in there.
        Everything else stays the same. A program that deals with an unknown number of Persons might start off with an array of just a hundred, but provide itself with a special function for growing the array, exactly as above, when needed. In the last example, we needed to provide the array-growing function with the two numbers 100 and 10000; if it didn't know the original array size, it wouldn't be able to copy all the old information safely into the new array. So there would also need to be an integer variable telling us how big the array is at the moment.
The example becomes:
{ Person **everyone=new (Person*)[100];
  int array_size=100;
  // happily use the array everyone[0] to everyone[99] for a while
  sum=0;
  for (int i=0; i<100; i+=1)
  { sum = sum + everyone[i]->weight; }
  printf("Average weight = %d\n", sum/100);
  // happy happy happy smile smile smile
  // Now realise that it isn't big enough. We need 10000 elements, so...
  everyone=IncreaseArray(everyone,array_size,array_size+100);
  array_size=array_size+100;
  // That's all we need to do. Now we can happily use everyone[0]
  // to everyone[199]. It could hardly be easier.


Now that we have two special variables, and a special function that has to be used in a very special way, we might think of implementing the whole idea of a growable array of Persons as a new object or class in its own right. Perhaps we could call it a Population, defining a class like this:
class Population
{ public:
    Person **everyone;
    int array_size;
    Population();         // the constructor
    void GrowArray(int newsize);
    Person *person(int i);    // the access function
    etc etc etc; };
Then we could write little programs like this (some of the details have been ignored. We'll get to them another day).
void main()
{ Population world;
  // add data here.
  // play around with the data
  world.GrowArray(1000);
  // play around with the larger array
  printf("This is the entire population of the world:\n");
  for (int i=0; i<1000; i+=1)
    world.person(i)->print();
  // continue playing...


In fact, C++ lets us define operators with very strange properties. We can define the square-brackets operation that is usually used to access entries from an array, so that it also applies to a Population object, returning a "reference" to a particular person's location in the array, and automatically increasing the array size for us if necessary. That would allow us to write programs like this:
void main()
{ Population world;
  world[1]=new Person("Fred","Astaire","3982492 NE 13342nd St");
  world[100000]=new Person("Ginger","Rogers","3982494 NE 13342nd St");
  printf("This is the entire population of the world:\n");
  for (int i=0; i<world.size; i+=1)
    world[i]->print(); }
Which would really simplify programming a lot. That's one of the aims of object oriented programming: you have to learn a lot before you can use it effectively, but once you have cleared that hurdle, and got quite a lot of experience with it, it can allow you to write much simpler and much much more understandable programs.

The definition of the square-brackets operator used above looks very strange. I'll show you it now, just for completeness, but we won't be examining it properly for another couple of weeks. Here it is:
Person * & Population::operator[](int n)
{ if (n>=array_size)
  { Person **temp=new (Person *)[n+1];
    for (int i=0; i<array_size; i+=1)
      temp[i]=everyone[i];
    array_size=i+1; }
  return (everyone[i]); }
Anyway, as I just said, don't worry about this odd operator. Concentrate on understanding the use of "TYPE **p;" to create a growable array of Pointer-to-TYPEs, and the way such things work.