UTF-8
character
(UCS transformation format 8) An ASCII-compatible multibyte Unicode and UCS encoding, used by Java and Plan 9.
The Unicode character set was originally designed to occupy a 16-bit code space. The most obvious Unicode encoding (known as UCS-2) consists of a sequence of 16-bit words. Such strings can contain bytes like ‘\0’ or ‘/’, which have a special meaning in filenames and other C library function parameters. In addition, most Unix tools expect ASCII files and can’t read 16-bit words as characters without major modifications. For these reasons, UCS-2 is not a suitable external encoding of Unicode in filenames, text files, environment variables, etc.
The ISO 10646 Universal Character Set (UCS), a superset of Unicode, occupies a 31-bit code space and the obvious UCS-4 encoding for it (a sequence of 32-bit words) has the same problems.
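A minimal Python sketch of the problem. The helper `fixed_width_bytes` is hypothetical: it writes each character as one fixed-width word, with width 2 standing in for UCS-2 and width 4 for UCS-4. Big-endian byte order is an assumption, since neither form fixes one. Even plain ASCII text then contains zero bytes, and a non-ASCII character such as U+2F00 yields a byte equal to ‘/’.

```python
def fixed_width_bytes(text: str, width: int) -> bytes:
    """Serialize text as big-endian words of `width` bytes (2 ~ UCS-2, 4 ~ UCS-4)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp >= 1 << (8 * width):
            raise ValueError(f"U+{cp:04X} does not fit in {width} bytes")
        out += cp.to_bytes(width, "big")
    return bytes(out)


# Byte values that C string and path functions treat specially.
SPECIAL = {0x00: "NUL", 0x2F: "'/'"}

for sample, width in (("a/b", 2), ("a/b", 4), ("\u2f00", 2)):
    encoded = fixed_width_bytes(sample, width)
    hits = sorted({SPECIAL[b] for b in encoded if b in SPECIAL})
    print(f"{sample!r} width={width}: {encoded.hex(' ')} contains {hits}")
```

The last sample shows why a byte-oriented tool cannot simply scan such data for ‘/’: the match falls inside a character that has nothing to do with path separators.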
The UTF-8 encoding of Unicode and UCS avoids the problems of fixed-length Unicode encodings because an ASCII file encoded in UTF-8 is exactly the same as the original ASCII file, and every byte in the encoding of a non-ASCII character is guaranteed to have the most significant bit (0x80) set. This means that normal tools for text searching etc. work as expected.
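The bit layout can be sketched in Python. The names `utf8_encode` and `encode_text` are hypothetical. The encoder follows the original RFC 2279 scheme, in which 31-bit UCS values take one to six bytes (RFC 3629 later restricted UTF-8 to U+10FFFF, i.e. at most four bytes). Apart from the range check it does no validation; for example, it does not reject surrogate code points. Values below 0x80 are copied through as single bytes, while every byte of a longer sequence has 0x80 set.

```python
# (exclusive upper bound, leading-byte marker) for 2- to 6-byte sequences,
# following the original 31-bit scheme of RFC 2279.
MULTIBYTE_FORMS = (
    (0x800, 0xC0),       # 110xxxxx + 1 continuation byte
    (0x10000, 0xE0),     # 1110xxxx + 2
    (0x200000, 0xF0),    # 11110xxx + 3
    (0x4000000, 0xF8),   # 111110xx + 4
    (0x80000000, 0xFC),  # 1111110x + 5
)


def utf8_encode(cp: int) -> bytes:
    """Encode a single UCS code point (0..0x7FFFFFFF) as UTF-8 bytes."""
    if not 0 <= cp <= 0x7FFFFFFF:
        raise ValueError("code point outside the 31-bit UCS range")
    if cp < 0x80:
        return bytes([cp])  # ASCII: one byte, most significant bit clear
    for nbytes, (limit, lead) in enumerate(MULTIBYTE_FORMS, start=2):
        if cp < limit:
            break  # always reached, since cp < 0x80000000
    tail = []
    for _ in range(nbytes - 1):
        tail.append(0x80 | (cp & 0x3F))  # continuation byte: 10xxxxxx
        cp >>= 6
    return bytes([lead | cp] + tail[::-1])  # leading byte carries the high bits


def encode_text(text: str) -> bytes:
    """Concatenate the UTF-8 encodings of each character in text."""
    return b"".join(utf8_encode(ord(ch)) for ch in text)


print(encode_text("/usr/bin"))            # ASCII input comes back byte for byte
print(encode_text("caf\u00e9").hex(" "))  # é becomes two bytes, both >= 0x80
```

Because a byte below 0x80 in UTF-8 output always stands for itself, a byte-oriented search for an ASCII pattern such as ‘/’ cannot match in the middle of a multibyte character.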
UTF-8 is defined in RFC 2279.
[“File System Safe UCS Transformation Format (FSS_UTF)”, X/Open Preliminary Specification, X/Open Company Ltd., Document Number: P316. This information also appears in ISO/IEC 10646, Annex P].
Plan 9 UTF manual entry (ftp://ftp.uu.net/doc/obi/Bell.Labs/plan9pm/09utf.ps.Z).
(1998-07-29)
