523 First Test, 11 Feb 99

Disc Things

"Surface" means what it means in English. It is one of the possibly many magnetisable rotating disc surfaces used to store data.
"Track" is a circular path on one surface, about the centre of the disc. It is the path followed by the heads if they don't move while the disc rotates. Data is written in one-bit-wide streams along the tracks.
"Block" is a division of a track. Tracks are split into equal-sized segments called blocks. Between each block there is an unused portion of disc surface; this allows some leeway in the timing of reads and writes. A block is the smallest amount of data that can be read or written at one time.
"Sector" is the collection of blocks at the same angular position over all the tracks of a surface. Blocks are aligned so that the Nth block on one track is at the same angular position as the Nth block on all other tracks. The number of sectors is the same as the number of blocks per track.
"Cylinder" is the collection of all tracks at the same radial position over all the surfaces of a disc. The number of cylinders is the same as the number of tracks (per surface).
"Cluster" is a collection of 1 or more blocks, as defined by the operating system. To save space in the tables that record which blockas are in use, blocks are often grouped into clusters each represented by a single bit. A cluster is the minimum unit of disc space that can be allocated at one time.

Most discs these days have 512 bytes per block and 63 sectors, which comes to roughly 32K per track. A 4GB disc would have to have a total of 4G/32K = 125K tracks. Typically there are 10K or more tracks per surface, so maybe we would have 12500 tracks on each of 10 surfaces. That means also 12500 cylinders and a total of 4G/512 = 8M blocks. The number of clusters is determined by the operating system; A cluster could be anything from 1 to 63 blocks, so we could have anything from 8M/63 = 125K to 8M clusters.

"track-to-track seek time" is the amount of time it takes to move the heads from one track to any other track. The time taken to actually move is very small; most of the time is spent waiting for the heads to settle in a stable position.
"rotational latency" is the time spent waiting for the disc to rotate to the right position for a read or write operation.

A disc rotating at 5400RPM takes 60/5400 seconds = 11.11 mS to make a full rotation; the average wait will only be for half a rotation, so the rotational latency should be about 5.5mS. Track-to-track seek time does not depend on rotational speed; it is typically in the region of 3mS.

That makes the average access time 8.5mS.

We have already calculated that there are 8M blocks in the disc; if the average access takes 8.5mS, it would take a total of 8Mx8.5mS = 68000 seconds = 19 hours to read each block individually and count the number of times "cat" appears.
However, if we know we are searching the whole disc, we could speed things up by reading and examining whole tracks at a time. It takes one full rotation time to read a whole track (no need to wait for the start if we're reading it all), which is 11mS, and we can read all the tracks of a cylinder without having to move the heads. We assumed 10 surfaces, so 110mS is enough to read a whole cylinder. Between each cylinder we do have to mobe the heads, so the total is 113mS times 12500 cylinders = 1412 seconds = 24 minutes.
With some systems it is possible to read more than one track of the same cylinder concurrently, so the best conceivable speed would come from reading all 10 tracks of each surface at the same time, reducing it to 11mS + 3mS per cylinder, times 12500 cylinders = 175 seconds, or three minutes.

Circus Animals

You were asked to define in SQL a database that keeps track of animals owned by North American circusses. For each animal we must record name, species, birth-date, parents (if also in the database), trainer (including his name, address, social security number, and birth date), and which circus owns the animal (including circus name and address, and circus owner's name, address, and social security number).

It seems most sensible to split the data into three tables, one for animals, one for circusses, and one for people. Each circus and person needs to be given a unique primary key. People already have their social security number, and we can invent a circus serial number. For ease of use, and to prevent ambiguity in the future, we might as well also give each animal a unique animal serial number.
create table animals
( id int,                       // never 0
  name char(20),
  species char(10),
  birthdate int,
  fatherid int,                 // 0 = no father
  motherid int,                 // 0 = no mother
  trainerid int                 // 0 = no trainer
  ownerid int );

create table circusses
( id int,
  name char(30),
  address char(50),
  ownerid int );

create table people
( ssn int,
  name char(30),
  address char(50),
  birthdate int                 // may be 0 for circus owners
  );

The queries:
  1. Who trained Flossie the Elephant?
    select p.ssn, p.name, p.address
    from people p, animals a
    where a.name='Flossie'
      and a.species='Elephant'
      and a.trainerid=p.ssn;
  2. What other elephants were trained by the same person as Flossie?
    select x.id, x.name, x.species
    from animals x, animals a
    where a.name='Flossie'
      and a.species='Elephant'
      and x.trainerid=a.trainerid
      and x.id<>a.id;
  3. Which circusses own Flossie's brothers and sisters?
    select c.id, c.name, 'owns', x.name
    from circusses c, animals a, animals x
    where a.name='Flossie'
      and a.species='Elephant'
      and (x.mother=a.mother or x.father=a.father)
      and x.id<>a.id
      and x.ownerid=c.id;
  4. How many of Flossie's children are owned by circusses?
    select count(x.*)
    from animals a, animals x
    where a.name='Flossie'
      and a.species='Elephant'
      and (x.motherid=a.id or x.fatherid=a.id)
      and x.ownerid<>0;
  5. Who owns more than one circus?
    select p.ssn, p.name
    from people p, circusses a, circusses b
    where a.ownerid=p.ssn
      and b.ownerid=p.ssn
      and a.id<>b.id
    (this will produce repetitions in the output, but that isn't necessarily a problem).

The Difficult One