Discs


If the data wanted is at the position marked W, and the read/write heads are currently at the position marked H, the access time is easy to calculate. The disc needs to rotate through the right angle, and the heads need to move to the right track. For a random access, the average time will be the time for half a rotation (= the Rotational Latency) plus the track-to-track seek time.

The speed of rotation for a disc is usually heavily advertised. If we take 7200 R.P.M. as an example, that means 120 revolutions per second, or one revolution in 8.3 mS, for a rotational latency of 4.1 mS.

The track-to-track seek time is less clear. The time to move the heads by a small number of tracks is usually quite a bit less than the time to move the heads right across the disc's surface, but it is not a linear relationship. Moving the heads by one track is probably no faster than moving the heads by 20 tracks. Usually we give the manufacturers the benefit of the doubt and take the published track-to-track seek time as the average, but such a plan is not always correct. Sometimes the "full stroke" seek time is nearly ten times as long as the single track seek time. However, if disc defragmentation and compaction is performed, it is quite likely that each block accessed is quite close to the previous block accessed, so the track-to-track seek time is not totally unreasonable. Typically, this might be 3mS.

That gives an average access time of about 7mS for a modern disc drive, but with a lot of caveats.


Until recently it was safe to assume that there would always be 63 blocks around each track. Now the number is likely to be higher, but for the purposes of estimations, using numbers like 63, 64, or 100 is quite reasonable.

If a program needs to read 25,252 bytes from a file, that will occupy 50 blocks (50*512=25600). If no special care is taken to allocate blocks in an orderly way, that will require 50 random disc accesses, for a total of 350mS. However, if those 50 blocks are all on the same track, it is only necessary to move the heads to that track, then in the worst case, read that whole track, for an access time of about 10mS. Taking care to ensure that consecutive blocks in a file are also consecutive blocks on one track of a disc can speed up disc accesses by a factor of 50 or more.


To refer uniquely to a byte of data on a disc, it is necessary to specify four things:
  1. Which surface (a disc can have more than 2),
  2. Which track on that surface (= which cylinder),
  3. Which block in that track (= which sector), and
  4. which byte in that block.
As the minimum acessible unit is a block (it is not possible to read or write a single byte on disc), the fourth number is not of interest to the file system. The other three can be compressed into a single unique integer:
   (which surface) * (number of cylinders) * (number of sectors)
 + (which cylinder) * (number of sectors)
 + (which sector)
This number (called the or Address) completely identifies any block on a particular disc. A file system refers to places on disc by their block numbers.

Disc Formats

Each disc (or each partition of each disc) needs to be set-up correctly before it can be used. There are generally four essential regions on a disc (or partition):
  1. All of the essential parameters that can't be found out from the hardware, such as the partitions name and any protections that apply to it, the cluster size (if clusters are used), and the positions and sizes of the other three regions, need to be stored in some fixed place. Typically block 0 of each partition is set aside for storing these parameters.
  2. The Free-List, or File-Allocation-Table (or whatever scheme is in use): when creating or extending files, it is essential to be able to find a new block that is not already in use for some other purpose. The free list is really just a list of block numbers. When a partition is formatted, the numbers of all the free blocks are written, in order, in the free list. Whenever a new block is needed, a number is taken from the free list. Whenever a block is realeased (through deleting or shortening a file) its number is added back to the free list.
    If we have a 40GB disc, that means 80M blocks; it takes 4 bytes to store a block number, so the free list will occupy 320M bytes, which requires 640K blocks. If a true free-list system is used, on a 40GB disc, 655360 blocks will be occupied by the free list. That sounds like a lot, but it is less than 1% of the total disc capacity.
  3. The root directory: Directories are just normal files that happen to contain the names and addresses (block numbers) of other files; they are nothing special. Except that the root directory has to be in some known place, so often it is not really stored as a regular file within the file system. The root directory may consist of a small number of blocks in a pre-determined place on the disc.
  4. The usable region for files.
Formatting a disc consists of setting up these four essential structures on the disc (nothing has to be done to set up the usable region for files, and that is by far the largest. Except that it is often considered prudent to test every block before it is used. The formatting process might check that each usable block really is correctly functioning before adding its number to the free list. This can make formatting take a very long time.)

The Free List

The free list just contains the block number of every block that is still available for use. It has to be kept on disc partly because it is very big, and partly because it is permanent information. When a computer is switched off, its file system should survive.

This causes trouble. The free list can not be entirely disc resident, because then every tiem a block is needed or released, two extra disc accesses will be needed (one to read information from the free list, and one to write back the modified portion of the free list). This would slow down all file operations too much to be acceptable.

As the free list is much too big to keep entirely in memory, just part of it is kept in memory, with the rest left on the disc. The free list behaves like a stack, so only the top (active) portion needs to be kept in memory. Typically when the operating system is started, it might read 100 blocks of free block numbers from the free list into memory. As blocks are used or released, only the memory-resident portion is changed. If so many blocks are allocated that the memory-resident "window" is exhausted, it is easy to read a further portion from the disc. Similarly, if so many files are deleted that the memory-resident window gets over-full, it can easily be dumped onto the disc to make more space. The O.S. just has to be careful that the memory-resident window is always saved to disc.

Files

A sensible operating system does not attempt to store all information about a file in directories. There is just too much important information: file name (can be long), dates (creation, last modified, last backed-up), owner, protections, file-type, perhaps even a comment about what the file is, and who-knows-what-else. DOS tries to store evrything in the directory, and as a result almost nothing is known about DOS files.

Every file should have one representative block that stores all of the important information about that file, and somehow ties it all together. This representative block is called the Header Block. Directory entries only need to contain a file's name and the address of its header block. In fact, even the name can be left out of the directory, but that can result in directory listings taking a long time to produce.

The useful information about a file is very unlikely to fill a whole block, so there is normally plenty of space left over in the header block for other things. This is how the whole file is tied together. For the sake of examples, I always assume that there are 112 byte's worth of useful information to be recorded about a file, leaving 400 bytes for administrative purposes. That gives enough space for a nice round 100 pointers to other blocks.


Two possible file formats immeiately present themselves. For really small files (less than 401 bytes of real data) the pointer space in the header block can be used for direct data storage. This results in very efficient use of disc space, but for some unfathomable reason is hardly ever used in real operating systems.
        The other possibility is that the 400 bytes are used to store the addresses of the blocks that contain the actual data. Each block has room for 512 bytes, and we have room for 100 pointers, so (under the stated assumptions) this file format (called a Single Level Index) has a maximum capacity of 51,200 bytes. Not very big, but a reasonable start.


A Two Level Index file also has up to 100 pointers to other disc blocks in its header, but each of those blocks contain northing but pointers to other blocks, and it is those other blocks that contain the data. This gives room for 100 pointers to pointer blocks, each pointer block has room for 128 pointers to data blocks, and each data block as always has room for 512 bytes. A two level index file therefore has a maximum capacity of 100*128*512 = 6,553,600 bytes. Much better, but still not enough for a modern system.

Naturally, there can also be a three level index file, with 100 pointers in the header, each leading to a block of 128 pointers, each leading to another block of 128 pointers, each leading to a block of 512 bytes of data, for a maximum capacity of 838,860,800 bytes. Similarly, a four level index file would have a maximum capacity of 107,374,182,400.


Under unix-like systems, a compromise format, based on I-nodes is used. An I-node is what unix uses as a header blcok for a file, so every file is represented by an I-node. the 400 pointer bytes are divided up into three of four regions of possibly different sizes. For the putposes of this example, I will assume four equally sized regions of 25 pointers each. In the first region, the 25 pointers point directly to data blocks. In the second region, the 25 pointers point to pointer blocks, each containing 128 pointers to data blocks. In the third region, the 25 pointers point to pointer blocks, each of which contains 128 pointers to other pointer blocks, each of which contains 128 pointers to data blocks. The fourth region follows the same pattern. This gives a maximum capacity of 25x512 + 25x128x512 + 25x128x128x512 + 25x128x128x128x512 = about 26.8 GB. Remember that the total capacity can be adjusted by changing the sizes of the four regions, or even adding or removing a region.

Access Speed

Why not just use the format that gives the best maximum capacity? Because that format also gives the worst access time. It is possible for an operating system to support multiple file formats (it just needs one byte in the header to indicate which format is currently in use for this file). Most modern operating systems don't bother to do this. Unix uses the I-node because it gives a reasonable compromise for little extra effort.


When considering the access time, it is important to think of three things:
  1. The time taken to open a file and read the first byte,
  2. The time taken to read the next byte in sequence, and
  3. The time taken to read a random byte from anywhere in the file.
We'll work it all out for a one-level index file as an example; the others can be worked out in the same way. As we have not thought much about exactly what a directory looks like, we will not worry about the time taken to find a file in the directory (it is not really a function of the file format anyway), and just start the timing from when we already know the block number of the file's header block.
  1. Getting the first byte: Two disc accesses: one to read the header and its 100 pointers, then one to read the first data block. Using the previously made assumptions, this comes to 14mS.
  2. Getting the next byte: Once the first byte has been read, the next 511 bytes are automatically available for free (a whole block has to be read at once), it is only after 512 bytes have been read that another block has to be read. So the average time to get the next byte in sequence is one disc access time divided by 512 bytes, or 14 micro-seconds.
  3. Getting a random byte: Taking a simple view, as there are up to 100 blocks in the file, there is a 1% chance that the random byte wanted will be in the same block as the last byte wanted and therefore not require another read, and a 99% chance that it will be in a different block and therefore require one disc access time. But not all files will be of the maximum size, so the 99% figure could be reduced by a large amount. However, it is not until we start dealing with very small files that it makes any appreciable difference to the final result, so we can (almost) safely disregard the possibility of not needing a new disc access. The time for a random byte to be read is 7 milli-seconds.
When calculating the times for larger file formats, the same principles are used, but remember that memory is not as expensive as it used to be. With a two-level index, it could be considered reasonable to read all 100 of the pointer blocks into memory when they are first needed, and keep them permanently in memory until the file is closed. That would only take 51K of memory, and would only be done when random access is required (not very common), so would be a minimal expense, that would result in the average random access time being halved.

It is even possible to do the same for a three-level index. Even though there could be up to 12,800 pointer blocks, requiring a total of 6.4MB to keep them memory-resident, it is something that would only be done when random access of a giant file is being performed very frequeuntly and needs maximum speed. In such rare cases, the ability to effectively speed up disc accesses by a factor of 3 could easily be worth 6MB of ram.


So we can see that as a general rule, access times get longer as the maximum capacity of a file format gets larger. It is also clear that the efficiency for smaller files is similarly reduced. The smallest possible file using a one level index occupies 2 blocks. The smallest possible with a four level index requires 5 blocks, so a single byte of data on disc could occupy 2560 bytes of disc space. This is why supporting multiple formats is good, and unix's I-node is a reasonable compromise. Using I-nodes, small files behave like single level indexes, it is only when the file size exceeds 12,800 bytes that the second region of pointers needs to be used, and it starts to behave like a two level index: you only pay the penalty in performance when you need increase in capacity.



Note: When calculating capacities and other things, a calculator is hardly ever needed. the number of bytes in a block, 512, is 2 to the power of 9 (2^9). The number of pointers in a pointer block is 2 to the power of 7 (2^7), so the maximum capacity of a four level index file is 100 x 128 x 128 x 128 x 512, or 100 x 2^7 X 2^7 x 2^7 x 2^9, whish is 100 x 2^(7+7+7+9) or 100 x 2^30. Everyone can multiply by 100 easily, and everyone should be able to do two-to-the-power-ofs easily. Just remember that 2^10 is very close to 1,000 so 2^30 is very close to 1,000,000,000. So the capacity of a 4 level index file is very close to 100,000,000,000 bytes, or 100GB, without needing a calculator.