The Unified Code for Units of Measure
Gunther Schadow, Clement J. McDonald
Copyright © 1999-2014 Regenstrief Institute, Inc. and The UCUM Organization, Indianapolis, IN. All rights reserved. See TermsOfUse for details.
What is it?
The Unified Code for Units of Measure (UCUM) is a code system intended to include all units of measure currently used in international science, engineering, and business. The purpose is to facilitate unambiguous electronic communication of quantities together with their units. The focus is on electronic communication, as opposed to communication between humans. A typical application of The Unified Code for Units of Measure is in electronic data interchange (EDI) protocols, but nothing prevents it from being used in other types of machine communication.
UCUM is based on the ISO 80000:2009 Quantities and units standards series, which specifies the use of International System of Units (SI) units in publications. The ISO 80000 series is developed by Technical Committee 12 of the International Organization for Standardization (ISO/TC 12), Quantities and units, in co-operation with Technical Committee 25 of the International Electrotechnical Commission (IEC/TC 25).
ISO 80000 consists of the following parts, under the general title Quantities and units:
- Part 1: General
- Part 2: Mathematical signs and symbols to be used in the natural sciences and technology
- Part 3: Space and time
- Part 4: Mechanics
- Part 5: Thermodynamics
- Part 6: Electromagnetism
- Part 7: Light
- Part 8: Acoustics
- Part 9: Physical chemistry and molecular physics
- Part 10: Atomic and nuclear physics
- Part 11: Characteristic numbers
- Part 12: Solid state physics
- Part 13: Information science and technology
- Part 14: Telebiometrics related to human physiology.
Thus, UCUM supports EDI protocols for the quantities and units in the domains of knowledge specified in parts 2-14 of the ISO 80000 series listed above.
History of UCUM
The Unified Code for Units of Measure was inspired by and originally heavily based on ISO 2955-1983, ANSI X3.50-1986, and HL7's extensions called ISO+. The respective ISO and ANSI standards are both entitled Representation of [...] units in systems with limited character sets, where ISO 2955 refers to SI and other units provided by ISO 1000-1981, while ANSI X3.50 extends ISO 2955 to include U.S. customary units. Because these standards carry the restriction of limited character sets in their names, they seem to be of less value today, when graphical user interfaces and laser printers are in widespread use, which is why the European standard ENV 12435 in its clause 7.3 declares ISO 2955 obsolete.
ENV 12435 is dedicated exclusively to the communication of measurements between humans in display and print, and does not provide codes that can be used in communication between systems. It does not even provide a specification that would allow communication of units from one system to the screen or printer of another system. The issue with displaying units in the common style defined by the 9th Conférence Générale des Poids et Mesures (CGPM) in 1948 is not just the character set. Although The Unicode Standard and its predecessor ISO/IEC 10646 form the richest character set ever, this is still not enough to specify the presentation of units, because there are important typographical details such as superscripts, subscripts, roman and italics (see note 1).
Why is it needed?
The real value of the restriction on the character set and typographical details, however, is not to cope with legacy systems and less powerful technology, but to facilitate unambiguous communication and interpretation of the meaning of units from one computer system to another. In this respect, ISO 2955 and ANSI X3.50 are not obsolete, because there is no other standard that would fill in for inter-systems communication of units. However, ISO 2955 and ANSI X3.50 currently have severe defects:
- ISO 2955 and ANSI X3.50 contain numerous name conflicts, both direct conflicts (e.g., "a" being used for both year and are) and conflicts that are generated by combining unit symbols with prefixes (e.g., "cd" means both candela and centi-day, and "PEV" means both peta-volt and pico-electronvolt).
- Neither ISO 2955 nor ANSI X3.50 covers all units that are currently used in practice. There are many more units in use than what is allowed by the Système International d'Unités (SI) and accompanying standards. For example, the older CGS units dyne and erg are still used in the science of physiology. Although ANSI X3.50 extends ISO 2955 with some U.S. customary units, it is still not complete in this respect. For example, it does not define the degree Fahrenheit.
- ANSI X3.50 is semantically ambiguous with respect to customary units, even if we do not consider the history and international aspects of customary units. Three systems of mass units are used in the U.S.: avoirdupois, used generally; apothecaries', used by pharmacists; and troy, used in trade with gold and other precious metals. ANSI X3.50 has no way to select any one of those specifically, which is a problem in medicine, where both apothecaries' and avoirdupois weights are frequently used.
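The prefix-combination conflicts described in the first item above can be detected mechanically. The following is a minimal sketch, assuming tiny hypothetical prefix and unit tables built only from the examples quoted in the text ("cd", "PEV"); the table contents and the names `readings` and `is_ambiguous` are illustrative and are not taken from ISO 2955, ANSI X3.50, or UCUM.

```python
# Hypothetical excerpts of a prefix table and a unit table, built only from
# the examples in the text; they are NOT the real ISO 2955 / ANSI X3.50 tables.
PREFIXES = {"c": "centi", "P": "pico", "PE": "peta"}
UNITS = {"cd": "candela", "d": "day", "V": "volt", "EV": "electronvolt"}


def readings(symbol):
    """Return every way to read `symbol` as a bare unit or as prefix + unit."""
    found = []
    if symbol in UNITS:
        found.append(UNITS[symbol])
    for prefix, prefix_name in PREFIXES.items():
        rest = symbol[len(prefix):]
        if symbol.startswith(prefix) and rest in UNITS:
            found.append(f"{prefix_name}-{UNITS[rest]}")
    return found


def is_ambiguous(symbol):
    """A symbol is ambiguous if it has more than one possible reading."""
    return len(readings(symbol)) > 1


for sym in ("cd", "PEV", "V"):
    print(sym, readings(sym), "ambiguous" if is_ambiguous(sym) else "ok")
```

A code system that aims to be free of ambiguities could run a check of this kind over every prefix and unit combination whenever a new symbol is proposed.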
ISO 2955, and all standards that look only to the resolutions and recommendations of the CGPM and the Comité International des Poids et Mesures (CIPM) as published by the Bureau International des Poids et Mesures (BIPM) and to various ISO standards (ISO 1000 and ISO 31), fail to recognize that the needs in practice are often different from the ideal propositions of the CGPM. Although not allowed by the CGPM and related ISO standards, many other units are used in international sciences, healthcare, engineering, and business, some meaningfully and some of questionable meaning. A coding system that is to be useful in practice must cover the requirements and habits of the practice, even some of the bad habits.
None of the current standards attempt to specify a semantics of units that can be deployed in information systems with moderate requirements. Metrological standards such as those published by the BIPM are dedicated to maximal scientific correctness of reproducible definitions of units. These definitions make sense only to human specialists and can hardly be deployed to their full extent by any information system that is not dedicated to metrology. On the other hand, ISO 2955 and ANSI X3.50 provide no semantics at all for the codes they define.
The Unified Code for Units of Measure intends to provide a single coding system for units that is complete, free of all ambiguities, and that assigns to each defined unit a concise semantics. In communication it is not only important that all communicating parties have the same repertoire of signs, but also that all attach the same meaning to the signals they exchange. The common meaning must be computationally verifiable. The Unified Code for Units of Measure assumes a semantics for units based on dimensional analysis (see note 2).
In short, each unit is defined relative to a system of base units by a numeric factor and a vector of exponents by which the base units contribute to the unit to be defined. Although we can reflect all the meaning of units covered by dimensional analysis with this vector notation, the following tables do not show these vectors. One reason is that the vectors depend on the base system chosen and even on the ordering of the base units. The other reason is that these vectors are hard for human readers to understand, while they can easily be derived computationally. Therefore we define new unit symbols using algebraic terms of other units. Those algebraic terms are also valid codes of The Unified Code for Units of Measure.
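The factor-and-exponent-vector idea can be sketched in a few lines. This is an illustrative sketch only, assuming a reduced base system of three units in the order (m, s, g); the class `Unit` and the helpers `base`, `scaled`, and `convert` are hypothetical names and are not part of the specification. Exact rational arithmetic is used so that conversion factors are not blurred by floating-point rounding.

```python
from dataclasses import dataclass
from fractions import Fraction

# Assumed base system and ordering, for illustration only. As the text notes,
# the exponent vectors depend on exactly this choice.
BASE = ("m", "s", "g")


@dataclass(frozen=True)
class Unit:
    factor: Fraction  # numeric factor relative to the base units
    exps: tuple       # exponent of each base unit, in BASE order

    def __mul__(self, other):
        return Unit(self.factor * other.factor,
                    tuple(a + b for a, b in zip(self.exps, other.exps)))

    def __pow__(self, n):
        return Unit(self.factor ** n, tuple(a * n for a in self.exps))

    def __truediv__(self, other):
        return self * other ** -1


def base(i):
    """The i-th base unit: factor 1 and a unit vector of exponents."""
    return Unit(Fraction(1), tuple(int(i == j) for j in range(len(BASE))))


def scaled(k, unit):
    """A unit that is k times `unit` (e.g. a prefixed unit)."""
    return Unit(Fraction(k) * unit.factor, unit.exps)


def convert(value, src, dst):
    """Convert `value` from src to dst; only allowed if dimensions match."""
    if src.exps != dst.exps:
        raise ValueError("units are not commensurable")
    return value * src.factor / dst.factor


m, s, g = (base(i) for i in range(len(BASE)))

# New units defined as algebraic terms of other units.
kg = scaled(1000, g)
cm = scaled(Fraction(1, 100), m)
newton = kg * m / s ** 2
dyne = g * cm / s ** 2
joule = newton * m
erg = dyne * cm

print(newton.exps)                # exponent vector over (m, s, g)
print(convert(1, newton, dyne))   # how many dyne in one newton
print(convert(1, joule, erg))     # how many erg in one joule
```

Two units are comparable exactly when their exponent vectors are equal, and the ratio of their factors then gives the conversion, which is what makes the shared meaning computationally verifiable.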
The Unified Code for Units of Measure is very stable in content. It has already been adopted by some standards organizations, such as DICOM and HL7, and has been referenced as best practice by the Open Geospatial Consortium in their Web Map Service (WMS) and Geography Markup Language (GML) implementation specifications. We are still looking for the best way to establish this specification as a widely used industry standard. The official status and the affiliation may change during that process. However, we try to keep as much as possible of the specification freely available and redistributable to assure the maximum use and benefit. We would also like to keep this specification maintainable and flexible to update. Although the initial version contains more than 250 terminal unit symbols (more than three times as many symbols as in ANSI X3.50), there are areas that are not yet covered completely.
The specification is maintained electronically so that the printed version is guaranteed to contain consistent and tested data that is free from severe name conflicts or random errors. The full specification is now available as an HTML document (whereas it used to be only a PDF file). The new XML format of the specification enabled us to make XML releases of the formal part of the specification, have better sorting and indexing capabilities, etc.
There are currently the following active projects in UCUM:
- Organization and Procedures Project
- Managing Requests for new entries
- Figuring out how Procedure Defined Units fit into UCUM's semantic framework
- Testing and Quality Assurance of UCUM implementations
- Providing translations, internationalization
- Education, promotion and outreach
- Managing relationships with other organizations
- Fighting ticket spam on this site (an unfortunate reality)
There is a set of functional tests for UCUM implementations.
In addition, there are several open source implementations:
- Eclipse UOMo UCUM Implementation based on Eclipse and the Units of Measurement API
- An open-source implementation, instantly usable as a Java applet, that is configured at runtime over the Internet with the latest release of The Unified Code for Units of Measure.
- Eclipse OHF had an earlier implementation available in Java.
- JScience 5 based on the Units of Measurement API (formerly JSR-275)
- An XSLT implementation (link to be provided)
There are also some commercial implementations:
- HL7Connect - interface engine. Win32 COM UCUM implementation is free for any use.
All comments are welcome and usually receive a response within a few business days. We invite commenters to submit a tracked ticket (http://unitsofmeasure.org/trac/newticket). From now on we will also post all email exchanges with responses in order to maintain accountability for any changes and community input. See the CommentsArchive for these.
1) Interestingly the authors of ENV 12435 forgot to include superscripts in the minimum requirements as given by subclause 7.1.4 for which they do not specify an alternative.
2) A more extensive introduction to this semantics of units can be found in: Schadow G, McDonald CJ et al: Units of Measure in Clinical Information Systems. JAMIA. 6(2); Mar/Apr 1999; p. 151-162. Available from: URL: http://www.jamia.org/cgi/reprint/6/2/151