Geographical Data
- Major Roads and Intersections in ASCII format.
There are two files: majorroads.txt describes
every major road and ferry crossing in the United States. This is a 1.3MB text file,
so it may take a while to appear. Each line describes one
road, and there are 47,014 lines.
Here are three sample lines from the file:
CO-17 T-- 19536 19455 0.593
I-85 L-- 16586 16593 0.012
US-56 P-- 16598 16595 0.005
The five items on each line are the road's name (a variable length string), its official
designation (a three-character string: "T--" means Through Highway, "L--" means Limited
Access Highway, "F-T" means Toll Ferry Crossing, etc.), then two integers indicating
the locations that it links together, then a floating point number giving its length(*).
Most roads are divided into a large number of short segments.
The integers indicating the locations linked by the road are simply line numbers
(starting from zero) in the locations file.
(*) The units for road length are degrees of the Earth's circumference; one degree comes to
almost exactly 69 miles. So a length recorded as 0.100 is really 6.9 miles.
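As a rough illustration of the line format described above, here is a small Python sketch that parses one road line and converts its length to miles. The names Road, parse_road and MILES_PER_DEGREE are invented for this example. It assumes the fields are separated by whitespace, and it splits from the right so that the last four fields are found even if a road name were to contain spaces.

```python
from typing import NamedTuple

MILES_PER_DEGREE = 69.0  # approximate figure quoted in the text


class Road(NamedTuple):
    name: str             # e.g. "CO-17"
    designation: str      # three characters, e.g. "T--" or "F-T"
    loc_a: int            # zero-based line number in the locations file
    loc_b: int            # zero-based line number in the locations file
    length_degrees: float

    @property
    def length_miles(self) -> float:
        return self.length_degrees * MILES_PER_DEGREE


def parse_road(line: str) -> Road:
    # Assumption: whitespace-separated fields. Splitting from the right
    # keeps the name intact whatever it looks like.
    name, designation, a, b, length = line.strip().rsplit(None, 4)
    return Road(name, designation, int(a), int(b), float(length))


road = parse_road("CO-17 T-- 19536 19455 0.593")
print(road.name, road.designation, round(road.length_miles, 1))
```

Because the two integers are zero-based line numbers, reading the locations file into a list lets road.loc_a and road.loc_b be used directly as list indices.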
The alphaplaces.txt file contains a list of all officially
named places in the U.S., their locations, and some other stuff.
Population is the first number that appears after the place name. Latitude (positive for North)
and Longitude (positive for East) are the last two numbers on a line. There are no irregularities
in the file: all data is strictly aligned in fixed-width columns.
The locations.txt file has one line for each location
mentioned in the majorroads file. This is a 1.5MB text file, so it may take a while
to appear. Each line begins with the longitude and latitude
of the location in degrees, followed by some information intended to help in choosing
a name for it. The third item is the number of miles from the nearest place that has
an official name; the rest of the line describes that nearest named place, giving
its population, what kind of place it is (as above, "city", "town", etc.), the
two-letter abbreviation for its state, and its name. The name may include spaces; it
extends to the end of the line.
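The field order just described could be read with something like the following Python sketch. The names Location and parse_location are invented here, and the sample line is made up purely to show the shape of the data; it is not taken from the file. The sketch assumes whitespace-separated fields and a one-word place kind, and limits the split so that everything after the state abbreviation is kept as the name.

```python
from typing import NamedTuple


class Location(NamedTuple):
    longitude: float       # degrees, positive for East
    latitude: float        # degrees, positive for North
    miles_to_place: float  # distance to the nearest named place
    population: int
    kind: str              # "city", "town", etc.
    state: str             # two-letter abbreviation
    place_name: str        # may contain spaces


def parse_location(line: str) -> Location:
    # Assumption: whitespace-separated fields, kind is a single word.
    # At most six splits, so the name keeps its internal spaces.
    lon, lat, dist, pop, kind, state, name = line.strip().split(None, 6)
    return Location(float(lon), float(lat), float(dist),
                    int(pop), kind, state, name)


# Invented sample line, for illustration only.
sample = "-80.2000 25.8000 3.50 1000 town FL Example Place"
print(parse_location(sample))
```

Appending each parsed line to a list in file order makes the list index equal to the line number used in the roads file.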
There is now a second data set available, containing only interstate highways in
the East (longitude > -90) and the locations connected to them. These files are
about one eighth of the size of the original files, so a quadratic algorithm should
be about 64 times faster processing them. The files are
ielocations.txt and
ieroads.txt.
For quick testing, a third data set exists, containing less precise data but for
a wider range of eastern cities and roads. The files for this reduced data set are
fewlocations.txt and
fewroads.txt.
Additionally, there is a fourth data set available, containing only interstate highways in
Florida and the locations connected to them. These files are not very interesting,
but they are very small, so even the most inefficient program should be able to
work with them quickly enough.
The files are
ifllocations.txt and
iflroads.txt.
- Digital Elevation Model files in Binary format.
These files contain the same information as the ascii files, but in a compressed
binary format, so processing them is much faster and more efficient. There are also
many more of them, covering the whole country at each of the 5 degree, 10 degree,
20 degree, 30 degree, 60 degree, and 80 degree tile sizes.
For a tile that is R pixels high and C pixels wide, the file is exactly
2*(R+1)*C bytes long. The factor of two is because the data is in the "short int"
format: signed 16 bit numbers in little-endian order. The factor of R+1 instead of
R is because there is an extra dummy row at the beginning of the file. This dummy
row does not contain elevation data, but "meta-data": information about the
file itself, written as an ascii string. The R*C two-byte data items that follow
the dummy row are elevations in metres above sea level.
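To make the layout concrete, here is a minimal Python sketch that checks the expected file size and separates the dummy row from the elevation data. The function names expected_size and decode_tile are invented for this example, and the caller is assumed to know R and C already.

```python
import struct


def expected_size(rows: int, cols: int) -> int:
    # Two bytes per value, plus one extra dummy row of the same width.
    return 2 * (rows + 1) * cols


def decode_tile(data: bytes, rows: int, cols: int):
    """Return (dummy_row_text, grid), where grid is a list of rows."""
    if len(data) != expected_size(rows, cols):
        raise ValueError("file size does not match 2*(R+1)*C")
    row_bytes = 2 * cols
    # The dummy row holds ASCII meta-data rather than elevations.
    header = data[:row_bytes].decode("ascii", errors="replace")
    # "<" selects little-endian, "h" is a signed 16-bit integer.
    values = struct.unpack("<%dh" % (rows * cols), data[row_bytes:])
    grid = [list(values[r * cols:(r + 1) * cols]) for r in range(rows)]
    return header, grid


print(expected_size(600, 600))  # size in bytes of a 600 x 600 tile
```

A whole file can be passed in after reading it with open(path, "rb").read(); the grid is then indexed as grid[row][column].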
A typical dummy row contains this information:
rows 600 columns 600 bytesperpixel 2 secondsperpixel 60
leftlongseconds -450000 toplatseconds 180000 min 1 max 4048 specialval -500
In each file the strings are exactly the same and in the same order, but the numbers
may be different. The example indicates that the format of the file is 600 rows (R) and
600 columns (C) and 2 bytes per data value (short ints). The resolution is
60 seconds (i.e. one minute or 1/60 degree) per data value, so an entire row of
600 values will cover ten degrees geographically. The top left hand corner is
at a longitude of -450000 seconds (which is 125 degrees West) and a latitude of
+180000 seconds (which is 50 degrees North). The lowest real data value is 1
(metre above sea level) and the highest is 4048. The value -500 is the special
"marine code": when -500 appears in the data it is not an elevation, but an
indication that this point is out at sea, off the coast.
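The interpretation of the dummy row can be sketched in Python as follows. The helper names parse_metadata, pixel_to_degrees and elevation_or_none are invented for this example. Two things are assumed rather than stated in the description: that any padding after the last number is whitespace or NUL bytes, and that rows run from north to south and columns from west to east, starting at the top left corner.

```python
def parse_metadata(text: str) -> dict:
    # Assumption: padding after the last number is whitespace or NULs.
    tokens = text.replace("\x00", " ").split()
    # The row is a sequence of "name number" pairs.
    return {key: int(value) for key, value in zip(tokens[0::2], tokens[1::2])}


def pixel_to_degrees(meta: dict, row: int, col: int):
    """Longitude and latitude, in degrees, of a pixel's top left corner."""
    step = meta["secondsperpixel"]
    # Assumption: columns run eastwards and rows run southwards.
    lon = (meta["leftlongseconds"] + col * step) / 3600
    lat = (meta["toplatseconds"] - row * step) / 3600
    return lon, lat


def elevation_or_none(meta: dict, value: int):
    # The special value marks a point out at sea, not an elevation.
    return None if value == meta["specialval"] else value


meta = parse_metadata(
    "rows 600 columns 600 bytesperpixel 2 secondsperpixel 60 "
    "leftlongseconds -450000 toplatseconds 180000 min 1 max 4048 specialval -500"
)
print(pixel_to_degrees(meta, 0, 0))
```

With the example row, pixel (0, 0) comes out at 125 degrees West and 50 degrees North, matching the figures given above.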
The files may be found here
The file usaW130N50D80 covers the continental USA, Central America, and the Caribbean.
Here is a Windows executable viewer for the
binary files.