Theory and pragmatics of the tz code and data

Scope of the tz database

The tz database attempts to record the history and predicted future of civil time scales. It organizes time zone and daylight saving time data by partitioning the world into timezones whose clocks all agree about timestamps that occur after the POSIX Epoch (1970-01-01 00:00:00 UTC). Although 1970 is a somewhat-arbitrary cutoff, there are significant challenges to moving the cutoff earlier even by a decade or two, due to the wide variety of local practices before computer timekeeping became prevalent. Most timezones correspond to a notable location and the database records all known clock transitions for that location; some timezones correspond instead to a fixed UTC offset.

Each timezone typically corresponds to a geographical region that is smaller than a traditional time zone, because clocks in a timezone all agree after 1970 whereas a traditional time zone merely specifies current standard time. For example, applications that deal with current and future timestamps in the traditional North American mountain time zone can choose from the timezones America/Denver which observes US-style daylight saving time (DST), and America/Phoenix which does not observe DST. Applications that also deal with past timestamps in the mountain time zone can choose from over a dozen timezones, such as America/Boise, America/Edmonton, and America/Hermosillo, each of which currently uses mountain time but differs from other timezones for some timestamps after 1970.
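
The difference between America/Denver and America/Phoenix can be observed with any library that reads tz data. The following is a minimal sketch, assuming Python 3.9 or later with its standard zoneinfo module and an installed copy of the tz data; the helper name utc_offset_hours is hypothetical and is not part of the tz code.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def utc_offset_hours(zone_name: str, instant: datetime) -> float:
    """Return the UTC offset, in hours, that zone_name observes at instant."""
    local = instant.astimezone(ZoneInfo(zone_name))
    return local.utcoffset().total_seconds() / 3600


# Two instants in 2024: one in northern winter, one in northern summer.
winter = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
summer = datetime(2024, 7, 15, 12, 0, tzinfo=timezone.utc)

for name in ("America/Denver", "America/Phoenix"):
    # Denver's offset changes between the two instants; Phoenix's does not.
    print(name, utc_offset_hours(name, winter), utc_offset_hours(name, summer))
```

Both timezones share an offset in winter, which is why a traditional "mountain time" label alone cannot tell an application which one to use.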

Clock transitions before 1970 are recorded for location-based timezones, because most systems support timestamps before 1970 and could misbehave if data entries were omitted for pre-1970 transitions. However, the database is not designed for and does not suffice for applications requiring accurate handling of all past times everywhere, as it would take far too much effort and guesswork to record all details of pre-1970 civil timekeeping. Although some information outside the scope of the database is collected in a file backzone that is distributed along with the database proper, this file is less reliable and does not necessarily follow database guidelines.

As described below, reference source code for using the tz database is also available. The tz code is upwards compatible with POSIX, an international standard for UNIX-like systems. As of this writing, the current edition of POSIX is: The Open Group Base Specifications Issue 7, IEEE Std 1003.1-2017, 2018 Edition. Because the database's scope encompasses real-world changes to civil timekeeping, its model for describing time is more complex than the standard and daylight saving times supported by POSIX. A tz timezone corresponds to a ruleset that can have more than two changes per year, these changes need not merely flip back and forth between two alternatives, and the rules themselves can change at times. Whether and when a timezone changes its clock, and even the timezone's notional base offset from UTC, are variable. It does not always make sense to talk about a timezone's "base offset", which is not necessarily a single number.

Timezone identifiers

Each timezone has a name that uniquely identifies the timezone. Inexperienced users are not expected to select these names unaided. Distributors should provide documentation and/or a simple selection interface that explains each name via a map or via descriptive text like "Czech Republic" instead of the timezone name "Europe/Prague". If geolocation information is available, a selection interface can locate the user on a timezone map or prioritize names that are geographically close. For an example selection interface, see the tzselect program in the tz code. The Unicode Common Locale Data Repository contains data that may be useful for other selection interfaces; it maps timezone names like Europe/Prague to locale-dependent strings like "Prague", "Praha", "Прага", and "布拉格".

The naming conventions attempt to strike a balance among the following goals:

Names normally have the form AREA/LOCATION, where AREA is a continent or ocean, and LOCATION is a specific location within the area. North and South America share the same area, 'America'. Typical names are 'Africa/Cairo', 'America/New_York', and 'Pacific/Honolulu'. Some names are further qualified to help avoid confusion; for example, 'America/Indiana/Petersburg' distinguishes Petersburg, Indiana from other Petersburgs in America.
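
As an illustration of the AREA/LOCATION form, the sketch below splits a name at its first slash. The function split_tz_name is hypothetical and is not part of the tz code; it treats everything after the first slash as the location, which is one reasonable reading of further-qualified names.

```python
def split_tz_name(name: str) -> tuple[str, str]:
    """Split 'AREA/LOCATION' into (area, location).

    Everything after the first '/' is kept as the location, so a further
    qualified name keeps its qualifier in the location part.
    """
    area, sep, location = name.partition("/")
    if not sep or not area or not location:
        raise ValueError(f"not of the form AREA/LOCATION: {name!r}")
    return area, location


for example in ("Africa/Cairo", "America/New_York", "America/Indiana/Petersburg"):
    print(example, "->", split_tz_name(example))
```

Names that do not have this form are rejected with ValueError, since the text says only that names normally follow it.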

Here are the general guidelines used for choosing timezone names, in decreasing order of importance:

Guidelines have evolved with time, and names following old versions of these guidelines might not follow the current version. When guidelines have changed, old names continue to be supported. Guideline changes have included the following:

The file zone1970.tab lists geographical locations used to name timezones. It is intended to be an exhaustive list of names for geographic regions as described above; this is a subset of the timezones in the data. Although a zone1970.tab location's longitude corresponds to its local mean time (LMT) offset with one hour for every 15° east longitude, this relationship is not exact. The backward-compatibility file zone.tab is similar but conforms to the older-version guidelines related to ISO 3166-1; it lists only one country code per entry and unlike zone1970.tab it can list names defined in backward.
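
The approximate relationship between longitude and LMT offset is simple arithmetic: 24 hours per 360° is 240 seconds per degree. The sketch below applies it to an illustrative longitude of 14°26′ E; the function names are hypothetical, and, as noted above, the result only approximates the LMT offset recorded in the data.

```python
def approx_lmt_offset_seconds(longitude_degrees_east: float) -> int:
    """Approximate an LMT offset from longitude: 240 seconds per degree east."""
    return round(longitude_degrees_east * 240)


def format_hms(seconds: int) -> str:
    """Format a signed count of seconds as [-]H:MM:SS."""
    sign = "-" if seconds < 0 else ""
    hours, remainder = divmod(abs(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{sign}{hours}:{minutes:02d}:{secs:02d}"


# Illustrative longitude: 14 degrees 26 minutes east.
print(format_hms(approx_lmt_offset_seconds(14 + 26 / 60)))
```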

The database defines each timezone name to be a zone, or a link to a zone. The source file backward defines links for backward compatibility; it does not define zones. Although backward was originally designed to be optional, nowadays distributions typically use it and no great weight should be attached to whether a link is defined in backward or in some other file. The source file etcetera defines names that may be useful on platforms that do not support POSIX-style TZ strings; no source file other than backward contains links to its zones. One of etcetera's names is Etc/UTC, used by functions like gmtime to obtain leap second information on platforms that support leap seconds. Another etcetera name, GMT, is used by older code releases.

Time zone abbreviations

When this package is installed, it generates time zone abbreviations like 'EST' to be compatible with human tradition and POSIX. Here are the general guidelines used for choosing time zone abbreviations, in decreasing order of importance:

Application writers should note that these abbreviations are ambiguous in practice: e.g., 'CST' means one thing in China and something else in North America, and 'IST' can refer to time in India, Ireland or Israel. To avoid ambiguity, use numeric UT offsets like '-0600' instead of time zone abbreviations like 'CST'.
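
A numeric UT offset of this kind is straightforward to produce from an offset in seconds. The following is a minimal sketch; the function format_ut_offset is hypothetical, and it assumes offsets that are a whole number of minutes.

```python
def format_ut_offset(offset_seconds: int) -> str:
    """Format a UT offset given in seconds as a string like '-0600'."""
    if offset_seconds % 60 != 0:
        # This sketch handles only whole-minute offsets.
        raise ValueError("offset must be a whole number of minutes")
    sign = "-" if offset_seconds < 0 else "+"
    hours, minutes = divmod(abs(offset_seconds) // 60, 60)
    return f"{sign}{hours:02d}{minutes:02d}"


print(format_ut_offset(-6 * 3600))        # six hours behind UT
print(format_ut_offset(5 * 3600 + 1800))  # five and a half hours ahead of UT
```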

Accuracy of the tz database

The tz database is not authoritative, and it surely has errors. Corrections are welcome and encouraged; see the file CONTRIBUTING. Users requiring authoritative data should consult national standards bodies and the references cited in the database's comments.

Errors in the tz database arise from many sources:

In short, many, perhaps most, of the tz database's pre-1970 and future timestamps are either wrong or misleading. Any attempt to pass the tz database off as the definition of time should be unacceptable to anybody who cares about the facts. In particular, the tz database's LMT offsets should not be considered meaningful, and should not prompt creation of timezones merely because two locations differ in LMT or transitioned to standard time at different dates.

Time and date functions

The tz code contains time and date functions that are upwards compatible with those of POSIX. Code compatible with this package is already part of many platforms, where the primary use of this package is to update obsolete time-related files. To do this, you may need to compile the time zone compiler 'zic' supplied with this package instead of using the system 'zic', since the format of zic's input is occasionally extended, and a platform may still be shipping an older zic.

POSIX properties and limitations

Extensions to POSIX in the tz code

POSIX features no longer needed

POSIX and ISO C define some APIs that are vestigial: they are not needed, and are relics of a too-simple model that does not suffice to handle many real-world timestamps. Although the tz code supports these vestigial APIs for backwards compatibility, they should be avoided in portable applications. The vestigial APIs are:

Other portability notes

Interface stability

The tz code and data supply the following interfaces:

Interface changes in a release attempt to preserve compatibility with recent releases. For example, tz data files typically do not rely on recently-added zic features, so that users can run older zic versions to process newer data files. "Downloading the tz database" describes how releases are tagged and distributed.

Interfaces not listed above are less stable. For example, users should not rely on particular UT offsets or abbreviations for timestamps, as data entries are often based on guesswork and these guesses may be corrected or improved.

Timezone boundaries are not part of the stable interface. For example, even though the Asia/Bangkok timezone currently includes Chiang Mai, Hanoi, and Phnom Penh, this is not part of the stable interface and the timezone can split at any time. If a calendar application records a future event in some location other than Bangkok by putting "Asia/Bangkok" in the event's record, the application should be robust in the presence of timezone splits between now and the future time.

Leap seconds

The tz code and data can account for leap seconds, thanks to code contributed by Bradley White. However, the leap second support of this package is rarely used directly because POSIX requires leap seconds to be excluded and many software packages would mishandle leap seconds if they were present. Instead, leap seconds are more commonly handled by occasionally adjusting the operating system kernel clock as described in "Precision timekeeping", and this package by default installs a leapseconds file commonly used by NTP software that adjusts the kernel clock. However, kernel-clock twiddling approximates UTC only roughly, and systems needing more-precise UTC can use this package's leap second support directly.

The directly-supported mechanism assumes that time_t counts of seconds since the POSIX epoch normally include leap seconds, as opposed to POSIX time_t counts which exclude leap seconds. This modified timescale is converted to UTC at the same point that time zone and DST adjustments are applied – namely, at calls to localtime and analogous functions – and the process is driven by leap second information stored in alternate versions of the TZif files. Because a leap second adjustment may be needed even if no time zone correction is desired, calls to gmtime-like functions also need to consult a TZif file, conventionally named Etc/UTC (GMT in previous versions), to see whether leap second corrections are needed. To convert an application's time_t timestamps to or from POSIX time_t timestamps (for use when, say, embedding or interpreting timestamps in portable tar files), the application can call the utility functions time2posix and posix2time included with this package.
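
The idea behind converting a leap-second-inclusive count to a POSIX count can be sketched as subtracting the cumulative leap second correction in effect at that count. The Python below is a conceptual illustration only, not the package's C implementation of time2posix; the names LEAP_TABLE, leap_correction, and to_posix are hypothetical, and the table is truncated to the first two leap seconds (at the ends of 1972-06-30 and 1972-12-31).

```python
from bisect import bisect_right

# Illustrative, truncated table. Each entry is (leap-second-inclusive count
# at which a correction takes effect, cumulative correction in seconds).
LEAP_TABLE = [
    (78796800, 1),  # leap second at the end of 1972-06-30
    (94694401, 2),  # leap second at the end of 1972-12-31
]
_TRANSITIONS = [transition for transition, _ in LEAP_TABLE]


def leap_correction(t: int) -> int:
    """Return the cumulative correction in effect at leap-inclusive count t."""
    index = bisect_right(_TRANSITIONS, t)
    return LEAP_TABLE[index - 1][1] if index else 0


def to_posix(t: int) -> int:
    """Convert a leap-second-inclusive count to a POSIX count."""
    return t - leap_correction(t)


# One second after the first leap second maps to 1972-07-01 00:00:00 UTC
# in POSIX terms.
print(to_posix(78796801))
```

Note that the leap second itself and the second before it map to the same POSIX count, which reflects the fact that POSIX time has no distinct value for an inserted leap second.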

If the POSIX-compatible TZif file set is installed in a directory whose basename is zoneinfo, the leap-second-aware file set is by default installed in a separate directory zoneinfo-leaps. Although each process can have its own time zone by setting its TZ environment variable, there is no support for some processes being leap-second aware while other processes are POSIX-compatible; the leap-second choice is system-wide. So if you configure your kernel to count leap seconds, you should also discard zoneinfo and rename zoneinfo-leaps to zoneinfo. Alternatively, you can install just one set of TZif files in the first place; see the REDO variable in this package's makefile.

Calendrical issues

Calendrical issues are a bit out of scope for a time zone database, but they indicate the sort of problems that we would run into if we extended the time zone database further into the past. An excellent resource in this area is Edward M. Reingold and Nachum Dershowitz, Calendrical Calculations: The Ultimate Edition, Cambridge University Press (2018). Other information and sources are given in the file 'calendars' in the tz distribution. They sometimes disagree.

Time and time zones on other planets

Some people's work schedules have used Mars time. Jet Propulsion Laboratory (JPL) coordinators kept Mars time on and off during the Mars Pathfinder mission (1997). Some of their family members also adapted to Mars time. Dozens of special Mars watches were built for JPL workers who kept Mars time during the Mars Exploration Rovers (MER) mission (2004–2018). These timepieces looked like normal Seikos and Citizens but were adjusted to use Mars seconds rather than terrestrial seconds, although unfortunately the adjusted watches were unreliable and appear to have had only limited use.

A Mars solar day is called a "sol" and has a mean period equal to about 24 hours 39 minutes 35.244 seconds in terrestrial time. It is divided into a conventional 24-hour clock, so each Mars second equals about 1.02749125 terrestrial seconds. (One MER worker noted, "If I am working Mars hours, and Mars hours are 2.5% more than Earth hours, shouldn't I get an extra 2.5% pay raise?")
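
The length of a Mars second follows directly from the figures above: divide the sol's length in terrestrial seconds by the 86400 seconds of a 24-hour clock. A minimal sketch, with illustrative constant names:

```python
# Mean sol length in terrestrial seconds: 24 h 39 min 35.244 s.
SOL_SECONDS = 24 * 3600 + 39 * 60 + 35.244

# A conventional 24-hour clock has 86400 clock seconds per day.
CLOCK_SECONDS_PER_DAY = 24 * 3600

# Terrestrial seconds per Mars second.
MARS_SECOND = SOL_SECONDS / CLOCK_SECONDS_PER_DAY

print(f"One Mars second is about {MARS_SECOND:.8f} terrestrial seconds")
```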

The prime meridian of Mars goes through the center of the crater Airy-0, named in honor of the British astronomer who built the Greenwich telescope that defines Earth's prime meridian. Mean solar time on the Mars prime meridian is called Mars Coordinated Time (MTC).

Each landed mission on Mars has adopted a different reference for solar timekeeping, so there is no real standard for Mars time zones. For example, the MER mission defined two time zones "Local Solar Time A" and "Local Solar Time B" for its two missions, each zone designed so that its time equals local true solar time at approximately the middle of the nominal mission. The A and B zones differ enough so that an MER worker assigned to the A zone might suffer "Mars lag" when switching to work in the B zone. Such a "time zone" is not particularly suited for any application other than the mission itself.

Many calendars have been proposed for Mars, but none have achieved wide acceptance. Astronomers often use Mars Sol Date (MSD), which is a sequential count of Mars solar days elapsed since about 1873-12-29 12:00 GMT.

In our solar system, Mars is the planet with time and calendar most like Earth's. On other planets, Sun-based time and calendars would work quite differently. For example, although Mercury's sidereal rotation period is 58.646 Earth days, Mercury revolves around the Sun so rapidly that an observer on Mercury's equator would see a sunrise only every 175.97 Earth days, i.e., a Mercury year is 0.5 of a Mercury day. Venus is more complicated, partly because its rotation is slightly retrograde: its year is 1.92 of its days. Gas giants like Jupiter are trickier still, as their polar and equatorial regions rotate at different rates, so that the length of a day depends on latitude. This effect is most pronounced on Neptune, where the day is about 12 hours at the poles and 18 hours at the equator.
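
The Mercury figures can be approximated with the usual relation between a prograde rotator's sidereal rotation period, its orbital period, and its solar day: 1/solar = 1/rotation - 1/orbit. The sketch below uses the rotation period quoted above together with an orbital period of 87.969 Earth days, which is an assumption not taken from the text; with these rounded inputs the result is roughly 176 Earth days, close to the figure above.

```python
def solar_day(rotation_days: float, orbit_days: float) -> float:
    """Solar day length for a prograde rotator, in the same units as the inputs."""
    return 1.0 / (1.0 / rotation_days - 1.0 / orbit_days)


MERCURY_ROTATION_DAYS = 58.646  # sidereal rotation period, from the text
MERCURY_ORBIT_DAYS = 87.969     # orbital period; assumed, not from the text

mercury_solar_day = solar_day(MERCURY_ROTATION_DAYS, MERCURY_ORBIT_DAYS)
year_in_days = MERCURY_ORBIT_DAYS / mercury_solar_day

print(f"Mercury solar day: about {mercury_solar_day:.1f} Earth days")
print(f"Mercury year: about {year_in_days:.2f} Mercury days")
```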

Although the tz database does not support time on other planets, it is documented here in the hopes that support will be added eventually.

Sources for time on other planets: