523 First Test, 11 Feb 99
Disc Things
"Surface" means what it means in English. It is one of the possibly many
magnetisable rotating disc surfaces used to store data.
"Track" is a circular path on one surface, about the centre of the disc.
It is the path followed by the heads if they don't move while the disc rotates.
Data is written in one-bit-wide streams along the tracks.
"Block" is a division of a track. Tracks are split into equal-sized
segments called blocks. Between each block there is an unused portion
of disc surface; this allows some leeway in the timing of reads and writes.
A block is the smallest amount of data that can be read or written at one time.
"Sector" is the collection of blocks at the same angular position over
all the tracks of a surface. Blocks are aligned so that the Nth block on
one track is at the same angular position as the Nth block on all other tracks.
The number of sectors is the same as the number of blocks per track.
"Cylinder" is the collection of all tracks at the same radial position over
all the surfaces of a disc. The number of cylinders is the same as the number
of tracks (per surface).
"Cluster" is a collection of 1 or more blocks, as defined by the operating
system. To save space in the tables that record which blockas are in use,
blocks are often grouped into clusters each represented by a single bit.
A cluster is the minimum unit of disc space that can be allocated at one time.
Most discs these days have 512 bytes per block and 63 sectors, which comes to
roughly 32K per track. A 4GB disc would have to have a total of 4G/32K = 125K
tracks. Typically there are 10K or more tracks per surface, so maybe we would
have 12500 tracks on each of 10 surfaces. That means also 12500 cylinders
and a total of 4G/512 = 8M blocks. The number of clusters is determined by the
operating system; A cluster could be anything from 1 to 63 blocks, so we could
have anything from 8M/63 = 125K to 8M clusters.
"track-to-track seek time" is the amount of time it takes to move the heads
from one track to any other track. The time taken to actually move is very small;
most of the time is spent waiting for the heads to settle in a stable position.
"rotational latency" is the time spent waiting for the disc to rotate to the
right position for a read or write operation.
A disc rotating at 5400RPM takes 60/5400 seconds = 11.11 mS to make a full rotation;
the average wait will only be for half a rotation, so the rotational latency
should be about 5.5mS. Track-to-track seek time does not depend on rotational speed;
it is typically in the region of 3mS.
That makes the average access time 8.5mS.
We have already calculated that there are 8M blocks in the disc; if the
average access takes 8.5mS, it would take a total of 8Mx8.5mS = 68000 seconds
= 19 hours to read each block individually and count the number of times "cat"
appears.
However, if we know we are searching the whole disc, we could speed things up by
reading and examining whole tracks at a time. It takes one full rotation time
to read a whole track (no need to wait for the start if we're reading it all),
which is 11mS, and we can read all the tracks of a cylinder without having to
move the heads. We assumed 10 surfaces, so 110mS is enough to read
a whole cylinder. Between each cylinder we do have to mobe the
heads, so the total is 113mS times 12500 cylinders = 1412 seconds = 24 minutes.
With some systems it is possible to read more than one track of the same
cylinder concurrently, so the best conceivable speed would come from reading
all 10 tracks of each surface at the same time, reducing it to 11mS + 3mS
per cylinder, times 12500 cylinders = 175 seconds, or three minutes.
Circus Animals
You were asked to define in SQL a database that keeps track of animals owned
by North American circusses. For each animal we must record name, species, birth-date,
parents (if also in the database), trainer (including his name, address, social security number,
and birth date), and which circus owns the animal (including circus name and address,
and circus owner's name, address, and social security number).
It seems most sensible to split the data into three tables, one for animals,
one for circusses, and one for people. Each circus and person needs to be given a
unique primary key. People already have their social security number, and we
can invent a circus serial number. For ease of use, and to prevent ambiguity in
the future, we might as well also give each animal a unique animal serial
number.
create table animals
( id int, // never 0
name char(20),
species char(10),
birthdate int,
fatherid int, // 0 = no father
motherid int, // 0 = no mother
trainerid int // 0 = no trainer
ownerid int );
create table circusses
( id int,
name char(30),
address char(50),
ownerid int );
create table people
( ssn int,
name char(30),
address char(50),
birthdate int // may be 0 for circus owners
);
The queries:
- Who trained Flossie the Elephant?
select p.ssn, p.name, p.address
from people p, animals a
where a.name='Flossie'
and a.species='Elephant'
and a.trainerid=p.ssn;
- What other elephants were trained by the same person as Flossie?
select x.id, x.name, x.species
from animals x, animals a
where a.name='Flossie'
and a.species='Elephant'
and x.trainerid=a.trainerid
and x.id<>a.id;
- Which circusses own Flossie's brothers and sisters?
select c.id, c.name, 'owns', x.name
from circusses c, animals a, animals x
where a.name='Flossie'
and a.species='Elephant'
and (x.mother=a.mother or x.father=a.father)
and x.id<>a.id
and x.ownerid=c.id;
- How many of Flossie's children are owned by circusses?
select count(x.*)
from animals a, animals x
where a.name='Flossie'
and a.species='Elephant'
and (x.motherid=a.id or x.fatherid=a.id)
and x.ownerid<>0;
- Who owns more than one circus?
select p.ssn, p.name
from people p, circusses a, circusses b
where a.ownerid=p.ssn
and b.ownerid=p.ssn
and a.id<>b.id
(this will produce repetitions in the output, but that isn't necessarily
a problem).
The Difficult One