X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?a=blobdiff_plain;f=doc%2Fdev%2Fsystem-file-format.texi;h=a480195857f8d027cc910d9c9ac76b6d6ff26fb7;hb=008fe5fdec4f94535df888ff8cdd94f802a3660d;hp=972b1331b4fb0f0be54ff7b2118f8bb204a71293;hpb=665bff371384dafbbc328fabee9b564260561e44;p=pspp diff --git a/doc/dev/system-file-format.texi b/doc/dev/system-file-format.texi index 972b1331b4..a480195857 100644 --- a/doc/dev/system-file-format.texi +++ b/doc/dev/system-file-format.texi @@ -33,12 +33,40 @@ floating-point numbers, and translates as needed. However, only IEEE has actually been observed in system files, and it is likely that other formats are obsolete or were never used. -The PSPP system-missing value is represented by the largest possible -negative number in the floating point format (@code{-DBL_MAX}). Two -other values are important for use as missing values: @code{HIGHEST}, -represented by the largest possible positive number (@code{DBL_MAX}), -and @code{LOWEST}, represented by the second-largest negative number -(in IEEE 754 format, @code{0xffeffffffffffffe}). +System files use a few floating point values for special purposes: + +@table @asis +@item SYSMIS +The system-missing value is represented by the largest possible +negative number in the floating point format (@code{-DBL_MAX}). + +@item HIGHEST +HIGHEST is used as the high end of a missing value range with an +unbounded maximum. It is represented by the largest possible positive +number (@code{DBL_MAX}). + +@item LOWEST +LOWEST is used as the low end of a missing value range with an +unbounded minimum. It was originally represented by the +second-largest negative number (in IEEE 754 format, +@code{0xffeffffffffffffe}). System files written by SPSS 21 and later +instead use the largest negative number (@code{-DBL_MAX}), the same +value as SYSMIS. This does not lead to ambiguity because LOWEST +appears in system files only in missing value ranges, which never +contain SYSMIS. 
+@end table + +System files may use most character encodings based on an 8-bit unit. +UTF-16 and UTF-32, based on wider units, appear to be unacceptable. +@code{rec_type} in the file header record is sufficient to distinguish +between ASCII and EBCDIC based encodings. The best way to determine +the specific encoding in use is to consult the character encoding +record (@pxref{Character Encoding Record}), if present, and failing +that the @code{character_code} in the machine integer info record +(@pxref{Machine Integer Info Record}). The same encoding should be +used for the dictionary and the data in the file, although it is +possible to artificially synthesize files that use different encodings +(@pxref{Character Encoding Record}). System files are divided into records, each of which begins with a 4-byte record type, usually regarded as an @code{int32}. @@ -60,7 +88,8 @@ if present. Document record, if present. @item -Any records not explicitly included in this list, in any order. +Extension (type 7) records, in ascending numerical order of their +subtypes. @item Dictionary termination record. @@ -79,16 +108,19 @@ Each type of record is described separately below. 
* Machine Integer Info Record:: * Machine Floating-Point Info Record:: * Multiple Response Sets Records:: +* Extra Product Info Record:: * Variable Display Parameter Record:: * Long Variable Names Record:: * Very Long String Record:: * Character Encoding Record:: * Long String Value Labels Record:: +* Long String Missing Values Record:: * Data File and Variable Attributes Records:: * Extended Number of Cases Record:: * Miscellaneous Informational Records:: * Dictionary Termination Record:: * Data Record:: +* Encrypted System Files:: @end menu @node File Header Record @@ -102,7 +134,7 @@ char rec_type[4]; char prod_name[60]; int32 layout_code; int32 nominal_case_size; -int32 compressed; +int32 compression; int32 weight_index; int32 ncases; flt64 bias; @@ -114,7 +146,15 @@ char padding[3]; @table @code @item char rec_type[4]; -Record type code, set to @samp{$FL2}. +Record type code, either @samp{$FL2} for system files with +uncompressed data or data compressed with simple bytecode compression, +or @samp{$FL3} for system files with ZLIB compressed data. + +This is truly a character field that uses the same character encoding as +other strings. Thus, in a file with an ASCII-based character encoding +this field contains @code{24 46 4c 32} or @code{24 46 4c 33}, and in a +file with an EBCDIC-based encoding this field contains @code{5b c6 d3 +f2}. (No EBCDIC-based ZLIB-compressed files have been observed.) @item char prod_name[60]; Product identification string. This always begins with the characters @@ -139,7 +179,10 @@ files written by some systems set this value to -1. In general, it is unsafe for systems reading system files to rely upon this value. @item int32 compressed; -Set to 1 if the data in the file is compressed, 0 otherwise. +Set to 0 if the data in the file is not compressed, 1 if the data is +compressed with simple bytecode compression, 2 if the data is ZLIB +compressed. This field has value 2 if and only if @code{rec_type} is +@samp{$FL3}.
@item int32 weight_index; If one of the variables in the data set is used as a weighting @@ -184,6 +227,10 @@ field is arbitrarily set to @samp{00:00:00}. File label declared by the user, if any (@pxref{FILE LABEL,,,pspp, PSPP Users Guide}). Padded on the right with spaces. +A product that identifies itself as @code{VOXCO INTERVIEWER 4.3} uses +CR-only line ends in this field, rather than the more usual LF-only or +CR LF line ends. + @item char padding[3]; Ignored padding bytes to make the structure a multiple of 32 bits in length. Set to zeros. @@ -253,6 +300,10 @@ respectively. If the variable has a range for missing variables, set to -2; if the variable has a range for missing variables plus a single discrete value, set to -3. +A long string variable always has the value 0 here. A separate record +indicates missing values for long string variables (@pxref{Long String +Missing Values Record}). + @item int32 print; Print format for this variable. See below. @@ -265,6 +316,13 @@ the at-sign (@samp{@@}). Subsequent characters may also be digits, octothorpes (@samp{#}), dollar signs (@samp{$}), underscores (@samp{_}), or full stops (@samp{.}). The variable name is padded on the right with spaces. +The @samp{name} fields should be unique within a system file. System +files written by SPSS that contain very long string variables with +similar names sometimes contain duplicate names that are later +eliminated by resolving the very long string names (@pxref{Very Long +String Record}). PSPP handles duplicates by assigning them new, +unique names. + @item int32 label_len; This field is present only if @code{has_var_label} is set to 1. It is set to the length, in characters, of the variable label. The @@ -390,6 +448,11 @@ Format types are defined as follows: @end multitable @end quotation +A few system files have been observed in the wild with invalid +@code{write} fields, in particular with value 0. 
Readers should +probably treat invalid @code{print} or @code{write} fields as some +default format. + @node Value Labels Records @section Value Labels Records @@ -543,20 +606,53 @@ Floating point representation code. For IEEE 754 systems this is 1. IBM 370 sets this to 2, and DEC VAX E to 3. @item int32 compression_code; -Compression code. Always set to 1. +Compression code. Always set to 1, regardless of whether or how the +file is compressed. @item int32 endianness; Machine endianness. 1 indicates big-endian, 2 indicates little-endian. @item int32 character_code; -@anchor{character-code} -Character code. 1 indicates EBCDIC, 2 indicates 7-bit ASCII, 3 -indicates 8-bit ASCII, 4 indicates DEC Kanji. -Windows code page numbers are also valid. - -Experience has shown that in many files, this field is ignored or incorrect. -For a more reliable indication of the file's character encoding -see @ref{Character Encoding Record}. +@anchor{character-code} Character code. The following values have +been actually observed in system files: + +@table @asis +@item 1 +EBCDIC. + +@item 2 +7-bit ASCII. + +@item 1250 +The @code{windows-1250} code page for Central European and Eastern +European languages. + +@item 1252 +The @code{windows-1252} code page for Western European languages. + +@item 28591 +ISO 8859-1. + +@item 65001 +UTF-8. +@end table + +The following additional values are known to be defined: + +@table @asis +@item 3 +8-bit ``ASCII''. + +@item 4 +DEC Kanji. +@end table + +Other Windows code page numbers are known to be generally valid. + +Old versions of SPSS for Unix and Windows always wrote value 2 in this +field, regardless of the encoding in use. Newer versions also write +the character encoding as a string (see @ref{Character Encoding +Record}). @end table @node Machine Floating-Point Info Record @@ -644,7 +740,8 @@ following: @itemize @bullet @item -The set's name (an identifier that begins with @samp{$}). 
+The set's name (an identifier that begins with @samp{$}), in mixed +upper and lower case. @item An equals sign (@samp{=}). @@ -685,8 +782,8 @@ written if LABELSOURCE=VARLABEL was specified. A space. @item -The names of the variables in the set, each separated from the -previous by a single space. +The short names of the variables in the set, converted to lowercase, +each separated from the previous by a single space. @item A line feed (byte 0x0a). @@ -723,6 +820,44 @@ $d=E 1 2 34 13 third mdgroup k l m $e=E 11 6 choice 0 n o p @end example +@node Extra Product Info Record +@section Extra Product Info Record + +This optional record appears to contain a text string that describes +the program that wrote the file and the source of the data. (This is +redundant with the file label and product info found in the file +header record.) + +@example +/* @r{Header.} */ +int32 rec_type; +int32 subtype; +int32 size; +int32 count; + +/* @r{Exactly @code{count} bytes of data.} */ +char info[]; +@end example + +@table @code +@item int32 rec_type; +Record type. Always set to 7. + +@item int32 subtype; +Record subtype. Always set to 10. + +@item int32 size; +The size of each element in the @code{info} member. Always set to 1. + +@item int32 count; +The total number of bytes in @code{info}. + +@item char info[]; +A text string. A product that identifies itself as @code{VOXCO +INTERVIEWER 4.3} uses CR-only line ends in this field, rather than the +more usual LF-only or CR LF line ends. +@end table + @node Variable Display Parameter Record @section Variable Display Parameter Record @@ -774,8 +909,8 @@ Ordinal Scale Continuous Scale @end table -SPSS 14 sometimes writes a @code{measure} of 0. PSPP interprets this -as nominal scale. +SPSS sometimes writes a @code{measure} of 0. PSPP interprets this as +nominal scale. @item int32 width; The width of the display column for the variable in characters. @@ -955,12 +1090,29 @@ The size of each element in the @code{encoding} member. 
Always set to 1. The total number of bytes in @code{encoding}. @item char encoding[]; -The name of the character encoding. Normally this will be an official IANA characterset name or alias. +The name of the character encoding. Normally this will be an official +IANA character set name or alias. See @url{http://www.iana.org/assignments/character-sets}. +Character set names are not case-sensitive, but SPSS appears to write +them in all-uppercase. @end table -This record is not present in files generated by older software. -See also @ref{character-code}. +This record is not present in files generated by older software. See +also the @code{character_code} field in the machine integer info +record (@pxref{character-code}). + +When the character encoding record and the machine integer info record +are both present, all system files observed in practice indicate the +same character encoding, e.g.@: 1252 as @code{character_code} and +@code{windows-1252} as @code{encoding}, 65001 and @code{UTF-8}, etc. + +If, for testing purposes, a file is crafted with different +@code{character_code} and @code{encoding}, it seems that +@code{character_code} controls the encoding for all strings in the +system file before the dictionary termination record, including +strings in data (e.g.@: string missing values), and @code{encoding} +controls the encoding for strings following the dictionary termination +record. @node Long String Value Labels Record @section Long String Value Labels Record @@ -1035,6 +1187,74 @@ between 0 and 120, is the number of bytes in @code{label}. The @end table @end table +@node Long String Missing Values Record +@section Long String Missing Values Record + +This record, if present, specifies missing values for long string +variables. 
+ +@example +/* @r{Header.} */ +int32 rec_type; +int32 subtype; +int32 size; +int32 count; + +/* @r{Repeated up to exactly @code{count} bytes.} */ +int32 var_name_len; +char var_name[]; +char n_missing_values; +long_string_missing_value values[]; +@end example + +@table @code +@item int32 rec_type; +Record type. Always set to 7. + +@item int32 subtype; +Record subtype. Always set to 22. + +@item int32 size; +Always set to 1. + +@item int32 count; +The number of bytes following the header until the next header. + +@item int32 var_name_len; +@itemx char var_name[]; +The number of bytes in the name of the long string variable that has +missing values, plus the variable name itself, which consists of +exactly @code{var_name_len} bytes. The variable name is not padded to +any particular boundary, nor is it null-terminated. + +@item char n_missing_values; +The number of missing values, either 1, 2, or 3. (This is, unusually, +a single byte instead of a 32-bit number.) + +@item long_string_missing_value values[]; +The missing values themselves. This array contains exactly +@code{n_missing_values} elements, each of which has the following +substructure: + +@example +int32 value_len; +char value[]; +@end example + +@table @code +@item int32 value_len; +The length of the missing value string, in bytes. This value should +be 8, because long string variables are at least 8 bytes wide (by +definition), only the first 8 bytes of a long string variable's +missing values are allowed to be non-spaces, and any spaces within the +first 8 bytes are included in the missing value here. + +@item char value[]; +The missing value string, exactly @code{value_len} bytes, without +any padding or null terminator. +@end table +@end table + @node Data File and Variable Attributes Records @section Data File and Variable Attributes Records @@ -1087,8 +1307,8 @@ element. In record type 18, this field contains a sequence of one or more variable attribute sets. 
If more than one variable attribute set is present, each one after the first is delimited from the previous by -@code{/}. Each variable attribute set consists of a (potentially -long) variable name, +@code{/}. Each variable attribute set consists of a long +variable name, followed by @code{:}, followed by an attribute set with the same syntax as on record type 17. @@ -1109,12 +1329,38 @@ VARIABLE ATTRIBUTE VARIABLES=dummy ATTRIBUTE=bert('123'). will contain a variable attribute record with the following contents: @example -00000000 07 00 00 00 12 00 00 00 01 00 00 00 22 00 00 00 |............"...| -00000010 64 75 6d 6d 79 3a 66 72 65 64 28 27 32 33 27 0a |dummy:fred('23'.| -00000020 27 33 34 27 0a 29 62 65 72 74 28 27 31 32 33 27 |'34'.)bert('123'| -00000030 0a 29 |.) | +0000 07 00 00 00 12 00 00 00 01 00 00 00 22 00 00 00 |............"...| +0010 64 75 6d 6d 79 3a 66 72 65 64 28 27 32 33 27 0a |dummy:fred('23'.| +0020 27 33 34 27 0a 29 62 65 72 74 28 27 31 32 33 27 |'34'.)bert('123'| +0030 0a 29 |.) | @end example +@menu +* Variable Roles:: +@end menu + +@node Variable Roles +@subsection Variable Roles + +A variable's role is represented as an attribute named @code{$@@Role}. +This attribute has a single element whose values and their meanings +are: + +@table @code +@item 0 +Input. This, the default, is the most common role. +@item 1 +Output. +@item 2 +Both. +@item 3 +None. +@item 4 +Partition. +@item 5 +Split. +@end table + @node Extended Number of Cases Record @section Extended Number of Cases Record @@ -1179,7 +1425,9 @@ Record type. Always set to 7. @item int32 subtype; Record subtype. May take any value. According to Aapi H@"am@"al@"ainen, value 5 indicates a set of grouped variables and 6 -indicates date info (probably related to USE). +indicates date info (probably related to USE). Subtype 24 appears to +contain XML that describes how data in the file should be displayed +on-screen. @item int32 size; Size of each piece of data in the data part. 
Should have the value 1, @@ -1216,22 +1464,23 @@ Ignored padding. Should be set to 0. @node Data Record @section Data Record -Data records must follow all other records in the system file. There must -be at least one data record in every system file. +The data record must follow all other records in the system file. +Every system file must have a data record that specifies data for at +least one case. The format of the data record varies depending on the +value of @code{compression} in the file header record: -The format of data records varies depending on whether the data is -compressed. Regardless, the data is arranged in a series of 8-byte -elements. - -When data is not compressed, -each element corresponds to +@table @asis +@item 0: no compression +Data is arranged as a series of 8-byte elements. +Each element corresponds to the variable declared in the respective variable record (@pxref{Variable Record}). Numeric values are given in @code{flt64} format; string values are literal characters string, padded on the right when necessary to fill out 8-byte units. -Compressed data is arranged in the following manner: the first 8 bytes -in the data section is divided into a series of 1-byte command +@item 1: bytecode compression +The first 8 bytes +of the data record is divided into a series of 1-byte command codes. These codes have meanings as described below: @table @asis @@ -1269,8 +1518,287 @@ An 8-byte string value that is all spaces. The system-missing value. @end table -When the end of the an 8-byte group of command bytes is reached, any -blocks of non-compressible values indicated by code 253 are skipped, -and the next element of command bytes is read and interpreted, until -the end of the file or a code with value 252 is reached. +The end of the 8-byte group of bytecodes is followed by any 8-byte +blocks of non-compressible values indicated by code 253. After that +follows another 8-byte group of bytecodes, then those bytecodes' +non-compressible values. 
The pattern repeats to the end of the file +or a code with value 252. + +@item 2: ZLIB compression +The data record consists of the following, in order: + +@itemize @bullet +@item +ZLIB data header, 24 bytes long. + +@item +One or more variable-length blocks of ZLIB compressed data. + +@item +ZLIB data trailer, with a 24-byte fixed header plus an additional 24 +bytes for each preceding ZLIB compressed data block. +@end itemize + +The ZLIB data header has the following format: + +@example +int64 zheader_ofs; +int64 ztrailer_ofs; +int64 ztrailer_len; +@end example + +@table @code +@item int64 zheader_ofs; +The offset, in bytes, of the beginning of this structure within the +system file. + +@item int64 ztrailer_ofs; +The offset, in bytes, of the first byte of the ZLIB data trailer. + +@item int64 ztrailer_len; +The number of bytes in the ZLIB data trailer. This and the previous +field sum to the size of the system file in bytes. +@end table + +The data header is followed by @code{(ztrailer_len - 24) / 24} ZLIB +compressed data blocks. Each ZLIB compressed data block begins with a +ZLIB header as specified in RFC@tie{}1950, e.g.@: hex bytes @code{78 +01} (the only header yet observed in practice). Each block +decompresses to a fixed number of bytes (in practice only +@code{0x3ff000}-byte blocks have been observed), except that the last +block of data may be shorter. The last ZLIB compressed data block +ends just before offset @code{ztrailer_ofs}. + +The result of ZLIB decompression is bytecode compressed data as +described above for compression format 1. + +The ZLIB data trailer begins with the following 24-byte fixed header: + +@example +int64 int_bias; +int64 zero; +int32 block_size; +int32 n_blocks; +@end example + +@table @code +@item int64 int_bias; +The compression bias as a negative integer, e.g.@: if @code{bias} in +the file header record is 100.0, then @code{int_bias} is @minus{}100 +(this is the only value yet observed in practice). +
+@item int64 zero; +Always observed to be zero. + +@item int32 block_size; +The number of bytes in each ZLIB compressed data block, except +possibly the last, following decompression. Only @code{0x3ff000} has +been observed so far. + +@item int32 n_blocks; +The number of ZLIB compressed data blocks, always exactly +@code{(ztrailer_len - 24) / 24}. +@end table + +The fixed header is followed by @code{n_blocks} 24-byte ZLIB data +block descriptors, each of which describes the compressed data block +corresponding to its offset. Each block descriptor has the following +format: + +@example +int64 uncompressed_ofs; +int64 compressed_ofs; +int32 uncompressed_size; +int32 compressed_size; +@end example + +@table @code +@item int64 uncompressed_ofs; +The offset, in bytes, that this block of data would have in a similar +system file that uses compression format 1. This is +@code{zheader_ofs} in the first block descriptor, and in each +succeeding block descriptor it is the sum of the previous descriptor's +@code{uncompressed_ofs} and @code{uncompressed_size}. + +@item int64 compressed_ofs; +The offset, in bytes, of the actual beginning of this compressed data +block. This is @code{zheader_ofs + 24} in the first block descriptor, +and in each succeeding block descriptor it is the sum of the previous +descriptor's @code{compressed_ofs} and @code{compressed_size}. The +final block descriptor's @code{compressed_ofs} and +@code{compressed_size} sum to @code{ztrailer_ofs}. + +@item int32 uncompressed_size; +The number of bytes in this data block, after decompression. This is +@code{block_size} in every data block except the last, which may be +smaller. + +@item int32 compressed_size; +The number of bytes in this data block, as stored compressed in this +system file. +@end table +@end table + @setfilename ignored + +@node Encrypted System Files +@section Encrypted System Files + +SPSS 21 and later support an encrypted system file format.
+@quotation Warning +The SPSS encrypted file format is poorly designed. It is much cheaper +and faster to decrypt a file encrypted this way than if a well +designed alternative were used. If you must use this format, use a +10-byte randomly generated password. +@end quotation + +@subheading Encrypted File Format + +Encrypted system files begin with the following 36-byte fixed header: + +@example +0000 1c 00 00 00 00 00 00 00 45 4e 43 52 59 50 54 45 |........ENCRYPTE| +0010 44 53 41 56 15 00 00 00 00 00 00 00 00 00 00 00 |DSAV............| +0020 00 00 00 00 |....| +@end example + +Following the fixed header is a complete system file in the usual +format, except that each 16-byte block is encrypted with AES-256 in +ECB mode. The AES-256 key is derived from a password in the following +way: + +@enumerate +@item +Start from the literal password typed by the user. Truncate it to at +most 10 bytes, then append null bytes (at least 22) until there +are exactly 32 bytes. Call this @var{password}. + +@item +Let @var{constant} be the following 73-byte constant: + +@example +0000 00 00 00 01 35 27 13 cc 53 a7 78 89 87 53 22 11 +0010 d6 5b 31 58 dc fe 2e 7e 94 da 2f 00 cc 15 71 80 +0020 0a 6c 63 53 00 38 c3 38 ac 22 f3 63 62 0e ce 85 +0030 3f b8 07 4c 4e 2b 77 c7 21 f5 1a 80 1d 67 fb e1 +0040 e1 83 07 d8 0d 00 00 01 00 +@end example + +@item +Compute CMAC-AES-256(@var{password}, @var{constant}). Call the +16-byte result @var{cmac}. + +@item +The 32-byte AES-256 key is @var{cmac} || @var{cmac}, that is, +@var{cmac} repeated twice. +@end enumerate + +@subsubheading Example + +Consider the password @samp{pspp}.
@var{password} is: + +@example +0000 70 73 70 70 00 00 00 00 00 00 00 00 00 00 00 00 |pspp............| +0010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| +@end example + +@noindent +@var{cmac} is: + +@example +0000 3e da 09 8e 66 04 d4 fd f9 63 0c 2c a8 6f b0 45 +@end example + +@noindent +The AES-256 key is: + +@example +0000 3e da 09 8e 66 04 d4 fd f9 63 0c 2c a8 6f b0 45 +0010 3e da 09 8e 66 04 d4 fd f9 63 0c 2c a8 6f b0 45 +@end example + +@subheading Password Encoding + +SPSS also supports what it calls ``encrypted passwords.'' These are +not encrypted. They are encoded with a simple, fixed scheme. An +encoded password is always a multiple of 2 characters long, and never +longer than 20 characters. The characters in an encoded password are +always in the graphic ASCII range 33 through 126. Each successive +pair of characters in the password encodes a single byte in the +plaintext password. + +Use the following algorithm to decode a pair of characters: + +@enumerate +@item +Let @var{a} be the ASCII code of the first character, and @var{b} be +the ASCII code of the second character. + +@item +Let @var{ah} be the most significant 4 bits of @var{a}. Find the line +in the table below that has @var{ah} on the left side. The right side +of the line is a set of possible values for the most significant 4 +bits of the decoded byte. + +@display +@t{2 } @result{} @t{2367} +@t{3 } @result{} @t{0145} +@t{47} @result{} @t{89cd} +@t{56} @result{} @t{abef} +@end display + +@item +Let @var{bh} be the most significant 4 bits of @var{b}. Find the line +in the second table below that has @var{bh} on the left side. The +right side of the line is a set of possible values for the most +significant 4 bits of the decoded byte. Together with the results of +the previous step, only a single possibility is left. 
+ +@display +@t{2 } @result{} @t{139b} +@t{3 } @result{} @t{028a} +@t{47} @result{} @t{46ce} +@t{56} @result{} @t{57df} +@end display + +@item +Let @var{al} be the least significant 4 bits of @var{a}. Find the +line in the table below that has @var{al} on the left side. The right +side of the line is a set of possible values for the least significant +4 bits of the decoded byte. + +@display +@t{03cf} @result{} @t{0145} +@t{12de} @result{} @t{2367} +@t{478b} @result{} @t{89cd} +@t{569a} @result{} @t{abef} +@end display + +@item +Let @var{bl} be the least significant 4 bits of @var{b}. Find the +line in the table below that has @var{bl} on the left side. The right +side of the line is a set of possible values for the least significant +4 bits of the decoded byte. Together with the results of the previous +step, only a single possibility is left. + +@display +@t{03cf} @result{} @t{028a} +@t{12de} @result{} @t{139b} +@t{478b} @result{} @t{46ce} +@t{569a} @result{} @t{57df} +@end display +@end enumerate + +@subsubheading Example + +Consider the encoded character pair @samp{-|}. @var{a} is +0x2d and @var{b} is 0x7c, so @var{ah} is 2, @var{bh} is 7, @var{al} is +0xd, and @var{bl} is 0xc. @var{ah} means that the most significant +four bits of the decoded character is 2, 3, 6, or 7, and @var{bh} +means that they are 4, 6, 0xc, or 0xe. The single possibility in +common is 6, so the most significant four bits are 6. Similarly, +@var{al} means that the least significant four bits are 2, 3, 6, or 7, +and @var{bl} means they are 0, 2, 8, or 0xa, so the least significant +four bits are 2. The decoded character is therefore 0x62, the letter +@samp{b}.
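
The decoding steps above can be sketched in Python as follows. This is only an illustration, not code taken from PSPP or SPSS: the names @code{decode_pair} and @code{decode_password} and the helper functions are hypothetical, and the four lookup tables are transcribed directly from the tables in this section. Each nibble of the decoded byte is found by intersecting the two candidate sets.

```python
def _table(rows):
    """Expand rows such as {"47": "89cd"} into {nibble: set of candidates}."""
    return {
        int(key, 16): {int(c, 16) for c in candidates}
        for keys, candidates in rows.items()
        for key in keys
    }


# Tables transcribed from the text; names are illustrative only.
AH = _table({"2": "2367", "3": "0145", "47": "89cd", "56": "abef"})
BH = _table({"2": "139b", "3": "028a", "47": "46ce", "56": "57df"})
AL = _table({"03cf": "0145", "12de": "2367", "478b": "89cd", "569a": "abef"})
BL = _table({"03cf": "028a", "12de": "139b", "478b": "46ce", "569a": "57df"})


def _nibble(first, second, x, y):
    """Return the single nibble allowed by both tables, or raise ValueError."""
    common = first.get(x, set()) & second.get(y, set())
    if len(common) != 1:
        raise ValueError("not a valid encoded character pair")
    return next(iter(common))


def decode_pair(a, b):
    """Decode two ASCII codes (a, b) into one plaintext byte value."""
    high = _nibble(AH, BH, a >> 4, b >> 4)      # steps 2 and 3
    low = _nibble(AL, BL, a & 0xF, b & 0xF)     # steps 4 and 5
    return (high << 4) | low


def decode_password(encoded):
    """Decode an encoded password string into plaintext bytes."""
    data = encoded.encode("ascii")
    if len(data) % 2 != 0 or len(data) > 20:
        raise ValueError("encoded password must be an even length of at most 20")
    return bytes(decode_pair(data[i], data[i + 1])
                 for i in range(0, len(data), 2))
```

With the pair from the example above, @code{decode_password("-|")} yields the single byte 0x62, the letter @samp{b}. A pair whose nibbles do not appear in the tables is rejected with @code{ValueError} rather than guessed at.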