Duality of Tables and Relations

an example

Consider two tables in a database:

Table ANIMALS
- id: int;
- name: char(20);
- circus_id: int;
- species: char(12);
- birthdate: int;
- etc etc etc

Table PEOPLE
- id: int;
- name: char(20);
- address: char(50);
- favourite_colour: char(12);
- birthdate: int;
- home_town: char(15);
- etc etc etc

If we want to pair people with animals born on the same date, a simple SQL query would do the trick:

        SELECT p.name, a.name
        FROM people p, animals a
        WHERE p.birthdate=a.birthdate

The condition p.birthdate=a.birthdate is a simple calculated condition; in an ER diagram we would naturally represent it with a diamond shape. However, we could just ask that query once, then create a new table:

Table SAMEBIRTHDAYS
- person_id: int;
- animal_id: int;

and just copy the results of the query into it. If we have a sufficiently sophisticated implementation of SQL, we could even automate that task, with this command:

        INSERT INTO samebirthdays
               SELECT p.id, a.id
               FROM people p, animals a
               WHERE p.birthdate=a.birthdate

Then, any future query about people and animals who were born on the same days could be answered by a table lookup instead of the computation. For example, this query:

        SELECT p.name, p.address, a.name, c.name
        FROM people p, animals a, circusses c
        WHERE p.birthdate=a.birthdate AND a.circus_id=c.id

could become:

        SELECT p.name, p.address, a.name, c.name
        FROM people p, animals a, samebirthdays s, circusses c
        WHERE p.id=s.person_id AND s.animal_id=a.id AND a.circus_id=c.id

Even for simple queries like this, it could speed things up (perhaps for some reason you can't create an index on animal.birthdate but you can create an index on a new "samebirthdays" table); if the condition is complex, converting it into a table could make a really significant difference to the speed.

It should be clear that the same idea could be applied to any condition: any true-or-false expression can be precomputed and stored in a new table; diamonds in an ER diagram can always be converted into rectangles if you are willing to expend the storage space. It may not be sensible to do so, but it is always possible.

Now consider a database with four tables, extracts from which are shown below:

AnimalNames
id	name
17	Flossie
...	...
102	Oinky
...	...
144	Fluffy
...	...

Species
id	name	kind	size
4	Elephant	mammal	large
...	...	...	...
13	Seal	mammal	medium
...	...	...	...
92	Penguin	bird	smallish
...	...	...	...

Circusses
id	name
1	Ringling Brothers
2	Al's Discount Circus
3	Barnum and Bailey's
...	...
6	Circusses 'R' Us
...	...

Animals
id	nameid	speciesid	ownerid
1	144	4	2
2	17	13	3
3	102	4	6
...	...	...	...

This collection of tables results from avoiding the redundancies that naturally arise because lots of animals have the same names, lots of animals have the same species, and each circus owns lots of animals. The original non-normalised table might have looked like this:

Animals
id	name	species	owner
1	Fluffy	Elephant	Al's Discount Circus
2	Flossie	Seal	Barnum and Bailey's
3	Oinky	Elephant	Circusses 'R' Us
...	...	...	...

so you can see that normalisation probably saved a lot of space. Anyway, The first version of the "Animals" table (with only numbers in it) could be replaced by a computation:

         name(id) = 106*id*id - 445*id + 483
      species(id) =  -9*id*id +  36*id -  23
        owner(id) =     id*id -   2*id +   3

You can easily verify that this is correct: name(2) = 106*2*2-445*2-483 = 17. The point is that mathematics guarantees that a formula exactly fitting any data, no matter how unnatural it seems, can always be produced. It takes a lot of work to produce the formula, and it is easy to make a mistake along the way, but it can be done. The formulæ will generally involve really big numbers, so computation can take a long time, but it is always possible. We can even simulate indexes on different fields by working out an inverse formula, for example:

       id(owner) = -1*owner*owner/6 + 11*owner/6 -2

so that id(6) = -1*6*6/6+11*6/6-2 = 3.

So, a query that involves accessing data from the animals table, such as:

        SELECT n.name
        FROM animalnames n, animals a, circusses c
        WHERE c.name='Circusses R Us'
          AND a.ownerid=c.id
          AND a.nameid=n.id

(which finds the name of the animal owned by 'Circusses R Us') can be replaced by an equivalent query that calculates the connection instead:

        SELECT n.name
        FROM animalnames n, circusses c
        WHERE c.name='Circusses R Us'
          AND n.id=106*(-1*c.id*c.id/6+11*c.id/6-2)*(-1*c.id*c.id/6+11*c.id/6-2)
                  -445*(-1*c.id*c.id/6+11*c.id/6-2)
                  +483

The same transformation could be applied to any table.

It os not often a sensible thing to do, but any condition can be turned into a table, and any table can be turned into a condition, so tables and conditions are really equivalent at a deep level. There is only one kind of thing in a relational database, and it is called a Relation. A relation can be seen as a table or as a condition, depending on how you choose to look at (or implement) it.