Duality of Tables and Relations
an example
Consider two tables in a database:
- Table ANIMALS
- id: int;
- name: char(20);
- circus_id: int;
- species: char(12);
- birthdate: int;
- etc etc etc
- Table PEOPLE
- id: int;
- name: char(20);
- address: char(50);
- favourite_colour: char(12);
- birthdate: int;
- home_town: char(15);
- etc etc etc
If we want to pair people with animals born on the same date, a simple
SQL query would do the trick:
SELECT p.name, a.name
FROM people p, animals a
WHERE p.birthdate=a.birthdate
The condition p.birthdate=a.birthdate is a simple calculated
condition; in an ER diagram we would naturally represent it with a
diamond shape. However, we could just
ask that query once, then create a new table:
- Table SAMEBIRTHDAYS
- person_id: int;
- animal_id: int;
and just copy the results of the query into it. If we have a sufficiently
sophisticated implementation of SQL, we could even automate that task, with
this command:
INSERT INTO samebirthdays
SELECT p.id, a.id
FROM people p, animals a
WHERE p.birthdate=a.birthdate
Then, any future query about people and animals who were born on the same
days could be answered by a table lookup instead of the computation.
For example, this query:
SELECT p.name, p.address, a.name, c.name
FROM people p, animals a, circusses c
WHERE p.birthdate=a.birthdate AND a.circus_id=c.id
could become:
SELECT p.name, p.address, a.name, c.name
FROM people p, animals a, samebirthdays s, circusses c
WHERE p.id=s.person_id AND s.animal_id=a.id AND a.circus_id=c.id
Even for simple queries like this, it could speed things up (perhaps
for some reason you can't create an index on animal.birthdate but you can
create an index on a new "samebirthdays" table); if the condition is complex,
converting it into a table could make a really significant difference to the
speed.
It should be clear that the same idea could be applied to any condition: any
true-or-false expression can be precomputed and stored in a new table; diamonds
in an ER diagram can always be converted into rectangles if you are willing
to expend the storage space. It may not be sensible
to do so, but it is always possible.
Now consider a database with four tables, extracts from which are shown below:
AnimalNames
|
---|
id | name
|
---|
17 | Flossie
|
... | ...
|
102 | Oinky
|
... | ...
|
144 | Fluffy
|
... | ...
|
Species
|
---|
id | name | kind | size
|
---|
4 | Elephant | mammal | large
|
... | ... | ... | ...
|
13 | Seal | mammal | medium
|
... | ... | ... | ...
|
92 | Penguin | bird | smallish
|
... | ... | ... | ...
|
Circusses
|
---|
id | name
|
---|
1 | Ringling Brothers
|
2 | Al's Discount Circus
|
3 | Barnum and Bailey's
|
... | ...
|
6 | Circusses 'R' Us
|
... | ...
|
Animals
|
---|
id | nameid | speciesid | ownerid |
|
---|
1 | 144 | 4 | 2
|
2 | 17 | 13 | 3
|
3 | 102 | 4 | 6
|
... | ... | ... | ...
|
This collection of tables results from avoiding the redundancies that naturally arise
because lots of animals have the same names, lots of animals have the same species, and
each circus owns lots of animals. The original non-normalised table might have looked
like this:
Animals
|
---|
id | name | species | owner |
|
---|
1 | Fluffy | Elephant | Al's Discount Circus
|
2 | Flossie | Seal | Barnum and Bailey's
|
3 | Oinky | Elephant | Circusses 'R' Us
|
... | ... | ... | ...
|
so you can see that normalisation probably saved a lot of space.
Anyway, The first version of the "Animals" table (with only numbers in it) could be replaced
by a computation:
name(id) = 106*id*id - 445*id + 483
species(id) = -9*id*id + 36*id - 23
owner(id) = id*id - 2*id + 3
You can easily verify that this is correct: name(2) = 106*2*2-445*2-483
= 17. The point is that
mathematics guarantees that a formula exactly fitting any data, no matter how unnatural
it seems, can always be produced. It takes a lot of work to produce the formula, and
it is easy to make a mistake along the way, but it can be done. The formulæ will
generally involve really big numbers, so computation can take a long time, but it is always
possible. We can even simulate indexes on different fields by working out an inverse
formula, for example:
id(owner) = -1*owner*owner/6 + 11*owner/6 -2
so that id(6) = -1*6*6/6+11*6/6-2 = 3.
So, a query that involves accessing data from the animals table, such as:
SELECT n.name
FROM animalnames n, animals a, circusses c
WHERE c.name='Circusses R Us'
AND a.ownerid=c.id
AND a.nameid=n.id
(which finds the name of the animal owned by 'Circusses R Us') can be replaced by an
equivalent query that calculates the connection instead:
SELECT n.name
FROM animalnames n, circusses c
WHERE c.name='Circusses R Us'
AND n.id=106*(-1*c.id*c.id/6+11*c.id/6-2)*(-1*c.id*c.id/6+11*c.id/6-2)
-445*(-1*c.id*c.id/6+11*c.id/6-2)
+483
The same transformation could be applied to any table.
It os not often a sensible thing to do, but any condition can be turned into a table, and
any table can be turned into a condition, so tables and conditions are really equivalent
at a deep level. There is only one kind of thing in a relational database, and it is called
a Relation. A relation can be seen as a table or as a condition, depending on how you choose
to look at (or implement) it.