   1 @node System File Format
   2 @appendix System File Format
   3
   4 A system file encapsulates a set of cases and dictionary information
   5 that describes how they may be interpreted.  This chapter describes
   6 the format of a system file.
   7
   8 System files use four data types: 8-bit characters, 32-bit integers,
   9 64-bit integers,
  10 and 64-bit floating-point numbers, called here @code{char}, @code{int32},
  11 @code{int64}, and
  12 @code{flt64}, respectively.  Data is not necessarily aligned on a word
  13 or double-word boundary: the long variable name record (@pxref{Long
  14 Variable Names Record}) and very long string records (@pxref{Very Long
  15 String Record}) have arbitrary byte length and can therefore cause all
  16 data coming after them in the file to be misaligned.
  17
  18 Integer data in system files may be big-endian or little-endian.  A
  19 reader may detect the endianness of a system file by examining
  20 @code{layout_code} in the file header record
  21 (@pxref{layout_code,,@code{layout_code}}).
  22
  23 Floating-point data in system files may nominally be in IEEE 754, IBM,
  24 or VAX formats.  A reader may detect the floating-point format in use
  25 by examining @code{bias} in the file header record
  26 (@pxref{bias,,@code{bias}}).
  27
  28 PSPP detects big-endian and little-endian integer formats in system
  29 files and translates as necessary.  PSPP also detects the
  30 floating-point format in use, as well as the endianness of IEEE 754
  31 floating-point numbers, and translates as needed.  However, only IEEE
  32 754 numbers with the same endianness as integer data in the same file
  33 have actually been observed in system files, and it is likely that
  34 other formats are obsolete or were never used.
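
The detection described above can be sketched as follows.  This is a
minimal illustration, not PSPP's implementation: the function names are
hypothetical, only IEEE 754 is considered, and the checks assume that
@code{layout_code} is 2 or 3 and that @code{bias} is 100.

```python
import struct


def detect_int_endianness(layout_code_bytes: bytes) -> str:
    """Return '<' (little-endian) or '>' (big-endian) for the 4 raw
    bytes of layout_code, choosing the order that decodes to 2 or 3."""
    for order in ("<", ">"):
        (value,) = struct.unpack(order + "i", layout_code_bytes)
        if value in (2, 3):
            return order
    raise ValueError("layout_code is neither 2 nor 3 in either byte order")


def detect_float_endianness(bias_bytes: bytes, int_order: str) -> str:
    """Return the byte order in which the 8 raw bytes of bias decode to
    IEEE 754 100.0.  If neither order does, fall back to the integer
    byte order, as the text says PSPP assumes."""
    for order in ("<", ">"):
        (value,) = struct.unpack(order + "d", bias_bytes)
        if value == 100.0:
            return order
    return int_order
```

A reader would call @code{detect_int_endianness} first and pass its
result as the fallback for @code{detect_float_endianness}.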
  35
  36 System files use a few floating point values for special purposes:
  37
  38 @table @asis
  39 @item SYSMIS
  40 The system-missing value is represented by the largest possible
  41 negative number in the floating point format (@code{-DBL_MAX}).
  42
  43 @item HIGHEST
  44 HIGHEST is used as the high end of a missing value range with an
  45 unbounded maximum.  It is represented by the largest possible positive
  46 number (@code{DBL_MAX}).
  47
  48 @item LOWEST
  49 LOWEST is used as the low end of a missing value range with an
  50 unbounded minimum.  It was originally represented by the
  51 second-largest negative number (in IEEE 754 format,
  52 @code{0xffeffffffffffffe}).  System files written by SPSS 21 and later
  53 instead use the largest negative number (@code{-DBL_MAX}), the same
  54 value as SYSMIS.  This does not lead to ambiguity because LOWEST
  55 appears in system files only in missing value ranges, which never
  56 contain SYSMIS.
  57 @end table
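
For IEEE 754 files, the three special values can be written down
directly.  The sketch below assumes IEEE 754 doubles; the constant and
function names are illustrative only.

```python
import struct
import sys

# -DBL_MAX and DBL_MAX for IEEE 754 doubles.
SYSMIS = -sys.float_info.max
HIGHEST = sys.float_info.max

# The original LOWEST encoding, given in the text as 0xffeffffffffffffe.
(LOWEST_OLD,) = struct.unpack(">d", bytes.fromhex("ffeffffffffffffe"))


def is_lowest(range_low: float) -> bool:
    """Return True if the low end of a missing value range means LOWEST.

    Only meaningful for the first element of a missing value range,
    where SYSMIS cannot occur, so -DBL_MAX is unambiguous there."""
    return range_low in (LOWEST_OLD, SYSMIS)
```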
  58
  59 System files are divided into records, each of which begins with a
  60 4-byte record type, usually regarded as an @code{int32}.
  61
  62 The records must appear in the following order:
  63
  64 @itemize @bullet
  65 @item
  66 File header record.
  67
  68 @item
  69 Variable records.
  70
  71 @item
  72 All pairs of value labels records and value label variables records,
  73 if present.
  74
  75 @item
  76 Document record, if present.
  77
  78 @item
  79 Extension (type 7) records, in ascending numerical order of their
  80 subtypes.
  81
  82 @item
  83 Dictionary termination record.
  84
  85 @item
  86 Data record.
  87 @end itemize
  88
  89 Each type of record is described separately below.
  90
  91 @menu
  92 * File Header Record::
  93 * Variable Record::
  94 * Value Labels Records::
  95 * Document Record::
  96 * Machine Integer Info Record::
  97 * Machine Floating-Point Info Record::
  98 * Multiple Response Sets Records::
  99 * Extra Product Info Record::
 100 * Variable Display Parameter Record::
 101 * Long Variable Names Record::
 102 * Very Long String Record::
 103 * Character Encoding Record::
 104 * Long String Value Labels Record::
 105 * Long String Missing Values Record::
 106 * Data File and Variable Attributes Records::
 107 * Extended Number of Cases Record::
 108 * Miscellaneous Informational Records::
 109 * Dictionary Termination Record::
 110 * Data Record::
 111 @end menu
 112
 113 @node File Header Record
 114 @section File Header Record
 115
 116 The file header is always the first record in the file.  It has the
 117 following format:
 118
 119 @example
 120 char                rec_type[4];
 121 char                prod_name[60];
 122 int32               layout_code;
 123 int32               nominal_case_size;
 124 int32               compressed;
 125 int32               weight_index;
 126 int32               ncases;
 127 flt64               bias;
 128 char                creation_date[9];
 129 char                creation_time[8];
 130 char                file_label[64];
 131 char                padding[3];
 132 @end example
 133
 134 @table @code
 135 @item char rec_type[4];
 136 Record type code, set to @samp{$FL2}, that is, either @code{24 46 4c
 137 32} if the file uses an ASCII-based character encoding, or @code{5b c6
 138 d3 f2} if the file uses an EBCDIC-based character encoding.
 139
 140 @item char prod_name[60];
 141 Product identification string.  This always begins with the characters
 142 @samp{@@(#) SPSS DATA FILE}.  PSPP uses the remaining characters to
 143 give its version and the operating system name; for example, @samp{GNU
 144 pspp 0.1.4 - sparc-sun-solaris2.5.2}.  The string is truncated if it
 145 would be longer than 60 characters; otherwise it is padded on the right
 146 with spaces.
 147
 148 @anchor{layout_code}
 149 @item int32 layout_code;
 150 Normally set to 2, although a few system files have been spotted in
 151 the wild with a value of 3 here.  PSPP uses this value to determine the
 152 file's integer endianness (@pxref{System File Format}).
 153
 154 @item int32 nominal_case_size;
 155 Number of data elements per case.  This is the number of variables,
 156 except that long string variables add extra data elements (one for every
 157 8 characters after the first 8).  However, string variables do not
 158 contribute to this value beyond the first 255 bytes.   Further, system
 159 files written by some systems set this value to -1.  In general, it is
 160 unsafe for systems reading system files to rely upon this value.
 161
 162 @item int32 compressed;
 163 Set to 1 if the data in the file is compressed, 0 otherwise.
 164
 165 @item int32 weight_index;
 166 If one of the variables in the data set is used as a weighting
 167 variable, set to the dictionary index of that variable, plus 1
 168 (@pxref{Dictionary Index}).  Otherwise, set to 0.
 169
 170 @item int32 ncases;
 171 Set to the number of cases in the file if it is known, or -1 otherwise.
 172
 173 In general, the number of cases that will be written to a system file
 174 is not known at the time that the header is written.  A writer
 175 therefore writes the entire system file, including the header, then
 176 seeks back to the beginning of the file and rewrites just the
 177 @code{ncases} field.  If the output cannot be repositioned, the seek
 178 operation fails, and in that case
 179 @code{ncases} remains -1.
 180
 181 @anchor{bias}
 182 @item flt64 bias;
 183 Compression bias, ordinarily set to 100.  Only integers between
 184 @code{1 - bias} and @code{251 - bias} can be compressed.
 185
 186 By assuming that its value is 100, PSPP uses @code{bias} to determine
 187 the file's floating-point format and endianness (@pxref{System File
 188 Format}).  If the compression bias is not 100, PSPP cannot auto-detect
 189 the floating-point format and assumes that it is IEEE 754 format with
 190 the same endianness as the system file's integers, which is correct
 191 for all known system files.
 192
 193 @item char creation_date[9];
 194 Date of creation of the system file, in @samp{dd mmm yy}
 195 format, with the month as standard English abbreviations, using an
 196 initial capital letter and following with lowercase.  If the date is not
 197 available then this field is arbitrarily set to @samp{01 Jan 70}.
 198
 199 @item char creation_time[8];
 200 Time of creation of the system file, in @samp{hh:mm:ss}
 201 format and using 24-hour time.  If the time is not available then this
 202 field is arbitrarily set to @samp{00:00:00}.
 203
 204 @item char file_label[64];
 205 File label declared by the user, if any (@pxref{FILE LABEL,,,pspp,
 206 PSPP Users Guide}).  Padded on the right with spaces.
 207
 208 A product that identifies itself as @code{VOXCO INTERVIEWER 4.3} uses
 209 CR-only line ends in this field, rather than the more usual LF-only or
 210 CR LF line ends.
 211
 212 @item char padding[3];
 213 Ignored padding bytes to make the structure a multiple of 32 bits in
 214 length.  Set to zeros.
 215 @end table
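
As an illustration of the layout above, the 176-byte header can be
unpacked in one step.  This is a sketch under several assumptions: the
file uses an ASCII-based encoding, @code{bias} is an IEEE 754 double
with the same byte order as the integers, and the names
@code{HEADER_LAYOUT} and @code{parse_file_header} are hypothetical.

```python
import struct

# Field layout after the byte-order prefix; explicit byte order in the
# struct module disables alignment padding, so the size is 176 bytes.
HEADER_LAYOUT = "4s60s5id9s8s64s3s"
HEADER_FIELDS = (
    "rec_type", "prod_name", "layout_code", "nominal_case_size",
    "compressed", "weight_index", "ncases", "bias",
    "creation_date", "creation_time", "file_label", "padding",
)


def parse_file_header(data: bytes) -> dict:
    """Parse a file header record from the first 176 bytes of data."""
    if len(data) < 176:
        raise ValueError("file header record is 176 bytes long")
    # layout_code sits at offset 64, after rec_type and prod_name.
    for order in ("<", ">"):
        (layout_code,) = struct.unpack_from(order + "i", data, 64)
        if layout_code in (2, 3):
            break
    else:
        raise ValueError("cannot determine integer byte order")
    values = struct.unpack_from(order + HEADER_LAYOUT, data, 0)
    header = dict(zip(HEADER_FIELDS, values))
    header["byte_order"] = order
    # file_label is padded on the right with spaces.
    header["file_label"] = header["file_label"].rstrip(b" ")
    return header
```

String fields are returned as raw bytes because the character encoding
is not known until later records have been read.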
 216
 217 @node Variable Record
 218 @section Variable Record
 219
 220 There must be one variable record for each numeric variable and each
 221 string variable with width 8 bytes or less.  String variables wider
 222 than 8 bytes have one variable record for each 8 bytes, rounding up.
 223 The first variable record for a long string specifies the variable's
 224 correct dictionary information.  Subsequent variable records for a
 225 long string are filled with dummy information: a type of -1, no
 226 variable label or missing values, print and write formats that are
 227 ignored, and an empty string as name.  A few system files have been
 228 encountered that include a variable label on dummy variable records,
 229 so readers should take care to parse dummy variable records in the
 230 same way as other variable records.
 231
 232 @anchor{Dictionary Index}
 233 The @dfn{dictionary index} of a variable is its offset in the set of
 234 variable records, including dummy variable records for long string
 235 variables.  The first variable record has a dictionary index of 0, the
 236 second has a dictionary index of 1, and so on.
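
The relationship between string width, the number of variable records,
and dictionary indexes can be sketched as follows.  The function names
are illustrative, and widths are assumed to be at most 255 (0 for a
numeric variable).

```python
def variable_record_count(width: int) -> int:
    """Number of variable records for one variable.

    width is 0 for a numeric variable, otherwise the string width in
    bytes.  Strings wider than 8 bytes need one record per 8 bytes,
    rounding up; the extra records are the dummy records."""
    if width <= 8:
        return 1
    return (width + 7) // 8


def dictionary_indexes(variables):
    """Map each variable name to the dictionary index of its first
    variable record.  variables is a list of (name, width) pairs in
    file order."""
    indexes = {}
    next_index = 0
    for name, width in variables:
        indexes[name] = next_index
        next_index += variable_record_count(width)
    return indexes
```

For example, a numeric variable followed by a 20-byte string and an
8-byte string occupies dictionary indexes 0, 1 through 3, and 4.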
 237
 238 The system file format does not directly support string variables
 239 wider than 255 bytes.  Such very long string variables are represented
 240 by a number of narrower string variables.  @xref{Very Long String
 241 Record}, for details.
 242
 243 @example
 244 int32               rec_type;
 245 int32               type;
 246 int32               has_var_label;
 247 int32               n_missing_values;
 248 int32               print;
 249 int32               write;
 250 char                name[8];
 251
 252 /* @r{Present only if @code{has_var_label} is 1.} */
 253 int32               label_len;
 254 char                label[];
 255
 256 /* @r{Present only if @code{n_missing_values} is nonzero}. */
 257 flt64               missing_values[];
 258 @end example
 259
 260 @table @code
 261 @item int32 rec_type;
 262 Record type code.  Always set to 2.
 263
 264 @item int32 type;
 265 Variable type code.  Set to 0 for a numeric variable.  For a short
 266 string variable or the first part of a long string variable, this is set
 267 to the width of the string.  For the second and subsequent parts of a
 268 long string variable, set to -1, and the remaining fields in the
 269 structure are ignored.
 270
 271 @item int32 has_var_label;
 272 If this variable has a variable label, set to 1; otherwise, set to 0.
 273
 274 @item int32 n_missing_values;
 275 If the variable has no missing values, set to 0.  If the variable has
 276 one, two, or three discrete missing values, set to 1, 2, or 3,
 277 respectively.  If the variable has a range for missing values, set to
 278 -2; if the variable has a range for missing values plus a single
 279 discrete value, set to -3.
 280
 281 A long string variable always has the value 0 here.  A separate record
 282 indicates missing values for long string variables (@pxref{Long String
 283 Missing Values Record}).
 284
 285 @item int32 print;
 286 Print format for this variable.  See below.
 287
 288 @item int32 write;
 289 Write format for this variable.  See below.
 290
 291 @item char name[8];
 292 Variable name.  The variable name must begin with a capital letter or
 293 the at-sign (@samp{@@}).  Subsequent characters may also be digits, octothorpes
 294 (@samp{#}), dollar signs (@samp{$}), underscores (@samp{_}), or full
 295 stops (@samp{.}).  The variable name is padded on the right with spaces.
 296
 297 @item int32 label_len;
 298 This field is present only if @code{has_var_label} is set to 1.  It is
 299 set to the length, in characters, of the variable label.  The
 300 documented maximum length varies from 120 to 255 based on SPSS
 301 version, but some files have been seen with longer labels.  PSPP
 302 accepts longer labels and truncates them to 255 bytes on input.
 303
 304 @item char label[];
 305 This field is present only if @code{has_var_label} is set to 1.  It has
 306 length @code{label_len}, rounded up to the nearest multiple of 32 bits.
 307 The first @code{label_len} characters are the variable's variable label.
 308
 309 @item flt64 missing_values[];
 310 This field is present only if @code{n_missing_values} is nonzero.  It
 311 has the same number of 8-byte elements as the absolute value of
 312 @code{n_missing_values}.  Each element is interpreted as a number for
 313 numeric variables (with HIGHEST and LOWEST indicated as described in
 314 the chapter introduction).  For string variables of width less than 8
 315 bytes, elements are right-padded with spaces; for string variables
 316 wider than 8 bytes, only the first 8 bytes of each missing value are
 317 specified, with the remainder implicitly all spaces.
 318
 319 For discrete missing values, each element represents one missing
 320 value.  When a range is present, the first element denotes the minimum
 321 value in the range, and the second element denotes the maximum value
 322 in the range.  When a range plus a value are present, the third
 323 element denotes the additional discrete missing value.
 324 @end table
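
The variable-length tail of a variable record can be interpreted along
these lines.  This is a sketch only; the helper names are hypothetical
and the elements are assumed to have been decoded already.

```python
def padded_label_size(label_len: int) -> int:
    """Bytes occupied by label[]: label_len rounded up to a multiple
    of 32 bits (4 bytes)."""
    return (label_len + 3) // 4 * 4


def interpret_missing(n_missing_values: int, elements: list) -> dict:
    """Split missing_values[] into an optional range and discrete values.

    n_missing_values is 0..3 for discrete values, -2 for a range, and
    -3 for a range plus one discrete value."""
    if len(elements) != abs(n_missing_values):
        raise ValueError("element count must equal abs(n_missing_values)")
    if 0 <= n_missing_values <= 3:
        return {"range": None, "discrete": list(elements)}
    if n_missing_values in (-2, -3):
        low, high = elements[0], elements[1]
        return {"range": (low, high), "discrete": list(elements[2:])}
    raise ValueError("unsupported n_missing_values")
```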
 325
 326 The @code{print} and @code{write} members of the variable record are output
 327 formats coded into @code{int32} types.  The least-significant byte
 328 of the @code{int32} represents the number of decimal places, and the
 329 next two bytes in order of increasing significance represent field width
 330 and format type, respectively.  The most-significant byte is not
 331 used and should be set to zero.
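
The byte layout just described can be decoded with shifts and masks.
The sketch below uses hypothetical function names, and
@code{FORMAT_NAMES} lists only a few of the format types from the table
that follows.

```python
# A small subset of the format type codes, for illustration.
FORMAT_NAMES = {1: "A", 5: "F", 20: "DATE"}


def decode_format(spec: int) -> tuple:
    """Split a print or write value into (type, width, decimals)."""
    decimals = spec & 0xFF          # least-significant byte
    width = (spec >> 8) & 0xFF      # next byte
    fmt_type = (spec >> 16) & 0xFF  # third byte; top byte is unused
    return fmt_type, width, decimals


def describe_format(spec: int) -> str:
    """Render a format as text, for example F8.2 or A8."""
    fmt_type, width, decimals = decode_format(spec)
    name = FORMAT_NAMES.get(fmt_type, "type%d:" % fmt_type)
    if decimals:
        return "%s%d.%d" % (name, width, decimals)
    return "%s%d" % (name, width)
```

For instance, the value @code{0x050802} decodes to format type 5
(@code{F}), width 8, and 2 decimal places.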
 332
 333 Format types are defined as follows:
 334
 335 @quotation
 336 @multitable {Value} {@code{DATETIME}}
 337 @headitem Value
 338 @tab Meaning
 339 @item 0
 340 @tab Not used.
 341 @item 1
 342 @tab @code{A}
 343 @item 2
 344 @tab @code{AHEX}
 345 @item 3
 346 @tab @code{COMMA}
 347 @item 4
 348 @tab @code{DOLLAR}
 349 @item 5
 350 @tab @code{F}
 351 @item 6
 352 @tab @code{IB}
 353 @item 7
 354 @tab @code{PIBHEX}
 355 @item 8
 356 @tab @code{P}
 357 @item 9
 358 @tab @code{PIB}
 359 @item 10
 360 @tab @code{PK}
 361 @item 11
 362 @tab @code{RB}
 363 @item 12
 364 @tab @code{RBHEX}
 365 @item 13
 366 @tab Not used.
 367 @item 14
 368 @tab Not used.
 369 @item 15
 370 @tab @code{Z}
 371 @item 16
 372 @tab @code{N}
 373 @item 17
 374 @tab @code{E}
 375 @item 18
 376 @tab Not used.
 377 @item 19
 378 @tab Not used.
 379 @item 20
 380 @tab @code{DATE}
 381 @item 21
 382 @tab @code{TIME}
 383 @item 22
 384 @tab @code{DATETIME}
 385 @item 23
 386 @tab @code{ADATE}
 387 @item 24
 388 @tab @code{JDATE}
 389 @item 25
 390 @tab @code{DTIME}
 391 @item 26
 392 @tab @code{WKDAY}
 393 @item 27
 394 @tab @code{MONTH}
 395 @item 28
 396 @tab @code{MOYR}
 397 @item 29
 398 @tab @code{QYR}
 399 @item 30
 400 @tab @code{WKYR}
 401 @item 31
 402 @tab @code{PCT}
 403 @item 32
 404 @tab @code{DOT}
 405 @item 33
 406 @tab @code{CCA}
 407 @item 34
 408 @tab @code{CCB}
 409 @item 35
 410 @tab @code{CCC}
 411 @item 36
 412 @tab @code{CCD}
 413 @item 37
 414 @tab @code{CCE}
 415 @item 38
 416 @tab @code{EDATE}
 417 @item 39
 418 @tab @code{SDATE}
 419 @end multitable
 420 @end quotation
 421
 422 A few system files have been observed in the wild with invalid
 423 @code{write} fields, in particular with value 0.  Readers should
 424 probably treat invalid @code{print} or @code{write} fields as some
 425 default format.
 426
 427 @node Value Labels Records
 428 @section Value Labels Records
 429
 430 The value label records documented in this section are used for
 431 numeric and short string variables only.  Long string variables may
 432 have value labels, but their value labels are recorded using a
 433 different record type (@pxref{Long String Value Labels Record}).
 434
 435 The value label record has the following format:
 436
 437 @example
 438 int32               rec_type;
 439 int32               label_count;
 440
 441 /* @r{Repeated @code{label_count} times}. */
 442 char                value[8];
 443 char                label_len;
 444 char                label[];
 445 @end example
 446
 447 @table @code
 448 @item int32 rec_type;
 449 Record type.  Always set to 3.
 450
 451 @item int32 label_count;
 452 Number of value labels present in this record.
 453 @end table
 454
 455 The remaining fields are repeated @code{label_count} times.  Each
 456 repetition specifies one value label.
 457
 458 @table @code
 459 @item char value[8];
 460 A numeric value or a short string value padded as necessary to 8 bytes
 461 in length.  Its type and width cannot be determined until the
 462 following value label variables record (see below) is read.
 463
 464 @item char label_len;
 465 The label's length, in bytes.  The documented maximum length varies
 466 from 60 to 120 based on SPSS version.  PSPP supports value labels up
 467 to 255 bytes long.
 468
 469 @item char label[];
 470 @code{label_len} bytes of the actual label, followed by up to 7 bytes
 471 of padding to bring @code{label} and @code{label_len} together to a
 472 multiple of 8 bytes in length.
 473 @end table
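
The padding rule for @code{label} means that each value label occupies
8 bytes for the value plus a multiple of 8 bytes for the length byte
and label together.  A sketch of reading one repetition, with a
hypothetical function name, might look like this:

```python
def read_value_label(data: bytes, offset: int) -> tuple:
    """Read one (value, label) pair starting at offset.

    Returns (value, label, next_offset).  value is the raw 8 bytes; it
    cannot be interpreted until the following value label variables
    record shows whether the variables are numeric or string."""
    value = data[offset:offset + 8]
    label_len = data[offset + 8]
    label = data[offset + 9:offset + 9 + label_len]
    # label_len (1 byte) plus label, rounded up to a multiple of 8.
    padded = (1 + label_len + 7) // 8 * 8
    return value, label, offset + 8 + padded
```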
 474
 475 The value label record is always immediately followed by a value label
 476 variables record with the following format:
 477
 478 @example
 479 int32               rec_type;
 480 int32               var_count;
 481 int32               vars[];
 482 @end example
 483
 484 @table @code
 485 @item int32 rec_type;
 486 Record type.  Always set to 4.
 487
 488 @item int32 var_count;
 489 Number of variables to which the associated value labels from the value
 490 label record are to be applied.
 491
 492 @item int32 vars[];
 493 A list of dictionary indexes of variables to which to apply the value
 494 labels (@pxref{Dictionary Index}).  There are @code{var_count}
 495 elements.
 496
 497 String variables wider than 8 bytes may not be specified in this list.
 498 @end table
 499
 500 @node Document Record
 501 @section Document Record
 502
 503 The document record, if present, has the following format:
 504
 505 @example
 506 int32               rec_type;
 507 int32               n_lines;
 508 char                lines[][80];
 509 @end example
 510
 511 @table @code
 512 @item int32 rec_type;
 513 Record type.  Always set to 6.
 514
 515 @item int32 n_lines;
 516 Number of lines of documents present.
 517
 518 @item char lines[][80];
 519 Document lines.  The number of elements is defined by @code{n_lines}.
 520 Lines shorter than 80 characters are padded on the right with spaces.
 521 @end table
 522
 523 @node Machine Integer Info Record
 524 @section Machine Integer Info Record
 525
 526 The integer info record, if present, has the following format:
 527
 528 @example
 529 /* @r{Header.} */
 530 int32               rec_type;
 531 int32               subtype;
 532 int32               size;
 533 int32               count;
 534
 535 /* @r{Data.} */
 536 int32               version_major;
 537 int32               version_minor;
 538 int32               version_revision;
 539 int32               machine_code;
 540 int32               floating_point_rep;
 541 int32               compression_code;
 542 int32               endianness;
 543 int32               character_code;
 544 @end example
 545
 546 @table @code
 547 @item int32 rec_type;
 548 Record type.  Always set to 7.
 549
 550 @item int32 subtype;
 551 Record subtype.  Always set to 3.
 552
 553 @item int32 size;
 554 Size of each piece of data in the data part, in bytes.  Always set to 4.
 555
 556 @item int32 count;
 557 Number of pieces of data in the data part.  Always set to 8.
 558
 559 @item int32 version_major;
 560 PSPP major version number.  In version @var{x}.@var{y}.@var{z}, this
 561 is @var{x}.
 562
 563 @item int32 version_minor;
 564 PSPP minor version number.  In version @var{x}.@var{y}.@var{z}, this
 565 is @var{y}.
 566
 567 @item int32 version_revision;
 568 PSPP version revision number.  In version @var{x}.@var{y}.@var{z},
 569 this is @var{z}.
 570
 571 @item int32 machine_code;
 572 Machine code.  PSPP always sets this field to -1, but other
 573 values may appear.
 574
 575 @item int32 floating_point_rep;
 576 Floating point representation code.  For IEEE 754 systems this is 1.
 577 IBM 370 sets this to 2, and DEC VAX E to 3.
 578
 579 @item int32 compression_code;
 580 Compression code.  Always set to 1.
 581
 582 @item int32 endianness;
 583 Machine endianness.  1 indicates big-endian, 2 indicates little-endian.
 584
 585 @item int32 character_code;
 586 @anchor{character-code} Character code.  The following values have
 587 been actually observed in system files:
 588
 589 @table @asis
 590 @item 1
 591 EBCDIC.
 592
 593 @item 2
 594 7-bit ASCII.
 595
 596 @item 1250
 597 The @code{windows-1250} code page for Central European and Eastern
 598 European languages.
 599
 600 @item 1252
 601 The @code{windows-1252} code page for Western European languages.
 602
 603 @item 28591
 604 ISO 8859-1.
 605
 606 @item 65001
 607 UTF-8.
 608 @end table
 609
 610 The following additional values are known to be defined:
 611
 612 @table @asis
 613 @item 3
 614 8-bit ``ASCII''.
 615
 616 @item 4
 617 DEC Kanji.
 618 @end table
 619
 620 Other Windows code page numbers are known to be generally valid.
 621
 622 Old versions of SPSS for Unix and Windows always wrote value 2 in this
 623 field, regardless of the encoding in use.  Newer versions also write
 624 the character encoding as a string (see @ref{Character Encoding
 625 Record}).
 626 @end table
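
A reader might unpack this record as sketched below.  The function name
is hypothetical, and the byte order is assumed to have been determined
already from the file header.

```python
import struct

INTEGER_INFO_FIELDS = (
    "version_major", "version_minor", "version_revision", "machine_code",
    "floating_point_rep", "compression_code", "endianness", "character_code",
)


def parse_integer_info(data: bytes, order: str) -> dict:
    """Parse a machine integer info record.

    data starts at rec_type; order is '<' or '>'."""
    header = struct.unpack_from(order + "4i", data, 0)
    if header != (7, 3, 4, 8):
        raise ValueError("not a machine integer info record: %r" % (header,))
    values = struct.unpack_from(order + "8i", data, 16)
    return dict(zip(INTEGER_INFO_FIELDS, values))
```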
 627
 628 @node Machine Floating-Point Info Record
 629 @section Machine Floating-Point Info Record
 630
 631 The floating-point info record, if present, has the following format:
 632
 633 @example
 634 /* @r{Header.} */
 635 int32               rec_type;
 636 int32               subtype;
 637 int32               size;
 638 int32               count;
 639
 640 /* @r{Data.} */
 641 flt64               sysmis;
 642 flt64               highest;
 643 flt64               lowest;
 644 @end example
 645
 646 @table @code
 647 @item int32 rec_type;
 648 Record type.  Always set to 7.
 649
 650 @item int32 subtype;
 651 Record subtype.  Always set to 4.
 652
 653 @item int32 size;
 654 Size of each piece of data in the data part, in bytes.  Always set to 8.
 655
 656 @item int32 count;
 657 Number of pieces of data in the data part.  Always set to 3.
 658
 659 @item flt64 sysmis;
 660 The system missing value.
 661
 662 @item flt64 highest;
 663 The value used for HIGHEST in missing values.
 664
 665 @item flt64 lowest;
 666 The value used for LOWEST in missing values.
 667 @end table
 668
 669 @node Multiple Response Sets Records
 670 @section Multiple Response Sets Records
 671
 672 The system file format has two different types of records that
 673 represent multiple response sets (@pxref{MRSETS,,,pspp, PSPP Users
 674 Guide}).  The first type of record describes multiple response sets
 675 that can be understood by SPSS before version 14.  The second type of
 676 record, with a closely related format, is used for multiple dichotomy
 677 sets that use the CATEGORYLABELS=COUNTEDVALUES feature added in
 678 version 14.
 679
 680 @example
 681 /* @r{Header.} */
 682 int32               rec_type;
 683 int32               subtype;
 684 int32               size;
 685 int32               count;
 686
 687 /* @r{Exactly @code{count} bytes of data.} */
 688 char                mrsets[];
 689 @end example
 690
 691 @table @code
 692 @item int32 rec_type;
 693 Record type.  Always set to 7.
 694
 695 @item int32 subtype;
 696 Record subtype.  Set to 7 for records that describe multiple response
 697 sets understood by SPSS before version 14, or to 19 for records that
 698 describe dichotomy sets that use the CATEGORYLABELS=COUNTEDVALUES
 699 feature added in version 14.
 700
 701 @item int32 size;
 702 The size of each element in the @code{mrsets} member. Always set to 1.
 703
 704 @item int32 count;
 705 The total number of bytes in @code{mrsets}.
 706
 707 @item char mrsets[];
 708 A series of multiple response sets, each of which consists of the
 709 following:
 710
 711 @itemize @bullet
 712 @item
 713 The set's name (an identifier that begins with @samp{$}), in mixed
 714 upper and lower case.
 715
 716 @item
 717 An equals sign (@samp{=}).
 718
 719 @item
 720 @samp{C} for a multiple category set, @samp{D} for a multiple
 721 dichotomy set with CATEGORYLABELS=VARLABELS, or @samp{E} for a
 722 multiple dichotomy set with CATEGORYLABELS=COUNTEDVALUES.
 723
 724 @item
 725 For a multiple dichotomy set with CATEGORYLABELS=COUNTEDVALUES, a
 726 space, followed by a number expressed as decimal digits, followed by a
 727 space.  If LABELSOURCE=VARLABEL was specified on MRSETS, then the
 728 number is 11; otherwise it is 1.@footnote{This part of the format may
 729 not be fully understood, because only a single example of each
 730 possibility has been examined.}
 731
 732 @item
 733 For either kind of multiple dichotomy set, the counted value, as a
 734 positive integer count specified as decimal digits, followed by a
 735 space, followed by as many string bytes as specified in the count.  If
 736 the set contains numeric variables, the string consists of the counted
 737 integer value expressed as decimal digits.  If the set contains string
 738 variables, the string contains the counted string value.  Either way,
 739 the string may be padded on the right with spaces (older versions of
 740 SPSS seem to always pad to a width of 8 bytes; newer versions don't).
 741
 742 @item
 743 A space.
 744
 745 @item
 746 The multiple response set's label, using the same format as for the
 747 counted value for multiple dichotomy sets.  A string of length 0 means
 748 that the set does not have a label.  A string of length 0 is also
 749 written if LABELSOURCE=VARLABEL was specified.
 750
 751 @item
 752 A space.
 753
 754 @item
 755 The short names of the variables in the set, converted to lowercase,
 756 each separated from the previous by a single space.
 757
 758 @item
 759 A line feed (byte 0x0a).
 760 @end itemize
 761 @end table
 762
 763 Example: Given appropriate variable definitions, consider the
 764 following MRSETS command:
 765
 766 @example
 767 MRSETS /MCGROUP NAME=$a LABEL='my mcgroup' VARIABLES=a b c
 768        /MDGROUP NAME=$b VARIABLES=g e f d VALUE=55
 769        /MDGROUP NAME=$c LABEL='mdgroup #2' VARIABLES=h i j VALUE='Yes'
 770        /MDGROUP NAME=$d LABEL='third mdgroup' CATEGORYLABELS=COUNTEDVALUES
 771         VARIABLES=k l m VALUE=34
 772        /MDGROUP NAME=$e CATEGORYLABELS=COUNTEDVALUES LABELSOURCE=VARLABEL
 773         VARIABLES=n o p VALUE='choice'.
 774 @end example
 775
 776 The above would generate the following multiple response set record of
 777 subtype 7:
 778
 779 @example
 780 $a=C 10 my mcgroup a b c
 781 $b=D2 55 0  g e f d
 782 $c=D3 Yes 10 mdgroup #2 h i j
 783 @end example
 784
 785 It would also generate the following multiple response set record with
 786 subtype 19:
 787
 788 @example
 789 $d=E 1 2 34 13 third mdgroup k l m
 790 $e=E 11 6 choice 0  n o p
 791 @end example
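
The grammar above can be parsed with a small hand-written scanner.  The
following sketch works on the raw bytes of @code{mrsets}; the function
names and dictionary keys (such as @code{label_flag} for the number
that follows @samp{E}) are hypothetical, and input is assumed to be
well formed.

```python
def parse_mrset(line: bytes) -> dict:
    """Parse one multiple response set (one line, without the LF)."""
    name, rest = line.split(b"=", 1)
    pos = 1  # position just past the type letter

    def read_int(pos):
        end = pos
        while rest[end:end + 1].isdigit():
            end += 1
        return int(rest[pos:end]), end

    def read_counted(pos):
        # A decimal count, a space, then that many string bytes.
        count, pos = read_int(pos)
        pos += 1
        return rest[pos:pos + count], pos + count

    result = {"name": name, "type": rest[:1]}
    if result["type"] == b"E":
        flag, pos = read_int(pos + 1)  # skip space, read 1 or 11
        result["label_flag"] = flag
        pos += 1                       # skip space after the number
    if result["type"] in (b"D", b"E"):
        result["counted"], pos = read_counted(pos)
    label, pos = read_counted(pos + 1)  # skip space before the label
    result["label"] = label
    result["vars"] = rest[pos + 1:].split(b" ")
    return result


def parse_mrsets(payload: bytes) -> list:
    """Parse every line-feed-terminated set in an mrsets payload."""
    return [parse_mrset(line) for line in payload.split(b"\n") if line]
```

Applied to the two example records above, this yields five sets, with
the counted values @samp{55}, @samp{Yes}, @samp{34}, and @samp{choice}
for the four dichotomy sets.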
 792
 793 @node Extra Product Info Record
 794 @section Extra Product Info Record
 795
 796 This optional record appears to contain a text string that describes
 797 the program that wrote the file and the source of the data.  (This is
 798 redundant with the file label and product info found in the file
 799 header record.)
 800
 801 @example
 802 /* @r{Header.} */
 803 int32               rec_type;
 804 int32               subtype;
 805 int32               size;
 806 int32               count;
 807
 808 /* @r{Exactly @code{count} bytes of data.} */
 809 char                info[];
 810 @end example
 811
 812 @table @code
 813 @item int32 rec_type;
 814 Record type.  Always set to 7.
 815
 816 @item int32 subtype;
 817 Record subtype.  Always set to 10.
 818
 819 @item int32 size;
 820 The size of each element in the @code{info} member. Always set to 1.
 821
 822 @item int32 count;
 823 The total number of bytes in @code{info}.
 824
 825 @item char info[];
 826 A text string.  A product that identifies itself as @code{VOXCO
 827 INTERVIEWER 4.3} uses CR-only line ends in this field, rather than the
 828 more usual LF-only or CR LF line ends.
 829 @end table
 830
 831 @node Variable Display Parameter Record
 832 @section Variable Display Parameter Record
 833
 834 The variable display parameter record, if present, has the following
 835 format:
 836
 837 @example
 838 /* @r{Header.} */
 839 int32               rec_type;
 840 int32               subtype;
 841 int32               size;
 842 int32               count;
 843
 844 /* @r{Repeated once per variable}. */
 845 int32               measure;
 846 int32               width;           /* @r{Not always present.} */
 847 int32               alignment;
 848 @end example
 849
 850 @table @code
 851 @item int32 rec_type;
 852 Record type.  Always set to 7.
 853
 854 @item int32 subtype;
 855 Record subtype.  Always set to 11.
 856
 857 @item int32 size;
 858 The size of @code{int32}.  Always set to 4.
 859
 860 @item int32 count;
 861 The number of sets of variable display parameters (ordinarily the
 862 number of variables in the dictionary), times 2 or 3.
 863 @end table
 864
 865 The remaining members are repeated once per set of parameters, in the same
 866 order as the variable records.  No element corresponds to variable
 867 records that continue long string variables.  The meanings of these
 868 members are as follows:
 869
 870 @table @code
 871 @item int32 measure;
 872 The measurement type of the variable:
 873 @table @asis
 874 @item 1
 875 Nominal Scale
 876 @item 2
 877 Ordinal Scale
 878 @item 3
 879 Continuous Scale
 880 @end table
 881
 882 SPSS sometimes writes a @code{measure} of 0.  PSPP interprets this as
 883 nominal scale.
 884
 885 @item int32 width;
 886 The width of the display column for the variable in characters.
 887
 888 This field is present if @var{count} is 3 times the number of
 889 variables in the dictionary.  It is omitted if @var{count} is 2 times
 890 the number of variables.
 891
 892 @item int32 alignment;
 893 The alignment of the variable for display purposes:
 894
 895 @table @asis
 896 @item 0
 897 Left aligned
 898 @item 1
 899 Right aligned
 900 @item 2
 901 Centre aligned
 902 @end table
 903 @end table
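
Because @code{width} is optional, a reader has to compare @code{count}
with the number of variables before it can split the data.  A sketch,
assuming the @code{int32} values have already been decoded and that
@code{n_vars} excludes the records that continue long string variables
(the function name is hypothetical):

```python
def parse_display_params(values: list, n_vars: int) -> list:
    """Split the record's int32 values into per-variable tuples.

    Returns a list of (measure, width, alignment); width is None when
    the record carries only two values per variable."""
    if n_vars <= 0:
        raise ValueError("n_vars must be positive")
    if len(values) == 3 * n_vars:
        step = 3
    elif len(values) == 2 * n_vars:
        step = 2
    else:
        raise ValueError("count is neither 2 nor 3 times the number of variables")
    params = []
    for i in range(0, len(values), step):
        chunk = values[i:i + step]
        width = chunk[1] if step == 3 else None
        params.append((chunk[0], width, chunk[-1]))
    return params
```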
 904
 905 @node Long Variable Names Record
 906 @section Long Variable Names Record
 907
 908 If present, the long variable names record has the following format:
 909
 910 @example
 911 /* @r{Header.} */
 912 int32               rec_type;
 913 int32               subtype;
 914 int32               size;
 915 int32               count;
 916
 917 /* @r{Exactly @code{count} bytes of data.} */
 918 char                var_name_pairs[];
 919 @end example
 920
 921 @table @code
 922 @item int32 rec_type;
 923 Record type.  Always set to 7.
 924
 925 @item int32 subtype;
 926 Record subtype.  Always set to 13.
 927
 928 @item int32 size;
 929 The size of each element in the @code{var_name_pairs} member. Always set to 1.
 930
 931 @item int32 count;
 932 The total number of bytes in @code{var_name_pairs}.
 933
 934 @item char var_name_pairs[];
 935 A list of @var{key}--@var{value} tuples, where @var{key} is the name
 936 of a variable, and @var{value} is its long variable name.
 937 The @var{key} field is at most 8 bytes long and must match the
 938 name of a variable which appears in the variable record (@pxref{Variable
 939 Record}).
 940 The @var{value} field is at most 64 bytes long.
 941 The @var{key} and @var{value} fields are separated by a @samp{=} byte.
 942 Tuples are separated from each other by a tab byte (value 09).  There is no
 943 trailing separator following the last tuple.
 944 The total length is @code{count} bytes.
 945 @end table
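
The tuple syntax above can be sketched in Python as follows.  This is
a minimal illustration, not PSPP's implementation; the function name
@code{parse_long_var_names} is hypothetical, and names are kept as
byte strings because their encoding is determined elsewhere in the
file.

```python
def parse_long_var_names(payload):
    """Return a list of (short_name, long_name) pairs of byte strings.

    `payload` is the `count` bytes of var_name_pairs.
    """
    if not payload:
        return []
    pairs = []
    for item in payload.split(b"\t"):      # tuples are tab-separated
        key, sep, value = item.partition(b"=")
        if not sep:
            raise ValueError("tuple lacks '=' separator")
        if len(key) > 8 or len(value) > 64:
            raise ValueError("key or value exceeds its maximum length")
        pairs.append((key, value))
    return pairs
```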
 946
 947 @node Very Long String Record
 948 @section Very Long String Record
 949
 950 Old versions of SPSS limited string variables to a width of 255 bytes.
 951 For backward compatibility with these older versions, the system file
 952 format represents a string longer than 255 bytes, called a @dfn{very
 953 long string}, as a collection of strings no longer than 255 bytes
 954 each.  The strings concatenated to make a very long string are called
 955 its @dfn{segments}; for consistency, variables other than very long
 956 strings are considered to have a single segment.
 957
 958 A very long string with a width of @var{w} has @var{n} =
 959 (@var{w} + 251) / 252 segments, that is, one segment for every
 960 252 bytes of width, rounding up.  It would be logical, then, for each
 961 of the segments except the last to have a width of 252 and the last
 962 segment to have the remainder, but this is not the case.  In fact,
 963 each segment except the last has a width of 255 bytes.  The last
 964 segment has width @var{w} - (@var{n} - 1) * 252; some versions
 965 of SPSS make it slightly wider, but not wide enough to make the last
 966 segment require another 8 bytes of data.
 967
 968 Data is packed tightly into segments of a very long string, 255 bytes
 969 per segment.  Because 255 bytes of segment data are allocated for
 970 every 252 bytes of the very long string's width (approximately), some
 971 unused space is left over at the end of the allocated segments.  Data
 972 in unused space is ignored.
 973
 974 Example: Consider a very long string of width 20,000.  Such a very
 975 long string has 20,000 / 252 = 80 (rounding up) segments.  The first
 976 79 segments have width 255; the last segment has width 20,000 - 79 *
 977 252 = 92 or slightly wider (up to 96 bytes, the next multiple of 8).
 978 The very long string's data is actually stored in the 19,890 bytes in
 979 the first 78 segments, plus the first 110 bytes of the 79th segment
 980 (19,890 + 110 = 20,000).  The remaining 145 bytes of the 79th segment
 981 and all 92 bytes of the 80th segment are unused.
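
The arithmetic in this example can be checked with a short Python
sketch.  The function names are hypothetical, and the sketch assumes
the nominal last-segment width (not the slightly wider variant that
some versions of SPSS write).

```python
def segment_widths(w):
    """Return the declared width of each segment of a very long string."""
    if w <= 255:
        raise ValueError("not a very long string")
    n = (w + 251) // 252                   # one segment per 252 bytes, rounded up
    return [255] * (n - 1) + [w - (n - 1) * 252]

def used_bytes(w):
    """Return (fully used segments, bytes used in the following segment)."""
    return divmod(w, 255)                  # data is packed 255 bytes per segment

def join_segments(segments, w):
    """Concatenate segment data and drop the unused space at the end."""
    return b"".join(segments)[:w]
```

For a width of 20,000, @code{segment_widths} yields 79 segments of
width 255 followed by one of width 92, and @code{used_bytes} yields 78
full segments plus 110 bytes, matching the figures above.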
 982
 983 The very long string record explains how to stitch together segments
 984 to obtain very long string data.  For each of the very long string
 985 variables in the dictionary, it specifies the name of its first
 986 segment's variable and the very long string variable's actual width.
 987 The remaining segments immediately follow the named variable in the
 988 system file's dictionary.
 989
 990 The very long string record, which is present only if the system file
 991 contains very long string variables, has the following format:
 992
 993 @example
 994 /* @r{Header.} */
 995 int32               rec_type;
 996 int32               subtype;
 997 int32               size;
 998 int32               count;
 999
1000 /* @r{Exactly @code{count} bytes of data.} */
1001 char                string_lengths[];
1002 @end example
1003
1004 @table @code
1005 @item int32 rec_type;
1006 Record type.  Always set to 7.
1007
1008 @item int32 subtype;
1009 Record subtype.  Always set to 14.
1010
1011 @item int32 size;
1012 The size of each element in the @code{string_lengths} member.  Always set to 1.
1013
1014 @item int32 count;
1015 The total number of bytes in @code{string_lengths}.
1016
1017 @item char string_lengths[];
1018 A list of @var{key}--@var{value} tuples, where @var{key} is the name
1019 of a variable, and @var{value} is its length.
1020 The @var{key} field is at most 8 bytes long and must match the
1021 name of a variable which appears in the variable record (@pxref{Variable
1022 Record}).
1023 The @var{value} field is exactly 5 bytes long.  It is the width of the
1024 variable, written as a zero-padded, ASCII-encoded decimal number.
1025 The @var{key} and @var{value} fields are separated by a @samp{=} byte.
1026 Tuples are delimited by a two-byte sequence @{00, 09@}.
1027 After the last tuple, there may be a single byte 00, or @{00, 09@}.
1028 The total length is @code{count} bytes.
1029 @end table
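
A reader might handle the delimiter rules above roughly as in the
following Python sketch.  The function name
@code{parse_very_long_strings} is hypothetical, and the sketch
tolerates either form of trailing delimiter.

```python
def parse_very_long_strings(payload):
    """Return a list of (variable_name, width) pairs.

    `payload` is the `count` bytes of string_lengths.
    """
    lengths = []
    for item in payload.split(b"\x00\t"):  # tuples end with bytes 00 09
        item = item.rstrip(b"\x00")        # last tuple may end with a lone 00
        if not item:
            continue
        key, sep, value = item.partition(b"=")
        if not sep or len(value) != 5 or not value.isdigit():
            raise ValueError("malformed very long string tuple")
        lengths.append((key, int(value.decode("ascii"))))
    return lengths
```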
1030
1031 @node Character Encoding Record
1032 @section Character Encoding Record
1033
1034 This record, if present, indicates the character encoding for string data,
1035 long variable names, variable labels, value labels and other strings in the
1036 file.
1037
1038 @example
1039 /* @r{Header.} */
1040 int32               rec_type;
1041 int32               subtype;
1042 int32               size;
1043 int32               count;
1044
1045 /* @r{Exactly @code{count} bytes of data.} */
1046 char                encoding[];
1047 @end example
1048
1049 @table @code
1050 @item int32 rec_type;
1051 Record type.  Always set to 7.
1052
1053 @item int32 subtype;
1054 Record subtype.  Always set to 20.
1055
1056 @item int32 size;
1057 The size of each element in the @code{encoding} member.  Always set to 1.
1058
1059 @item int32 count;
1060 The total number of bytes in @code{encoding}.
1061
1062 @item char encoding[];
1063 The name of the character encoding.  Normally this will be an official
1064 IANA character set name or alias.
1065 See @url{http://www.iana.org/assignments/character-sets}.
1066 Character set names are not case-sensitive, but SPSS appears to write
1067 them in all-uppercase.
1068 @end table
1069
1070 This record is not present in files generated by older software.  See
1071 also the @code{character_code} field in the machine integer info
1072 record (@pxref{character-code}).
1073
1074 When the character encoding record and the machine integer info record
1075 are both present, all system files observed in practice indicate the
1076 same character encoding, e.g.@: 1252 as @code{character_code} and
1077 @code{windows-1252} as @code{encoding}, 65001 and @code{UTF-8}, etc.
1078
1079 If, for testing purposes, a file is crafted with different
1080 @code{character_code} and @code{encoding}, it seems that
1081 @code{character_code} controls the encoding for all strings in the
1082 system file before the dictionary termination record, including
1083 strings in data (e.g.@: string missing values), and @code{encoding}
1084 controls the encoding for strings following the dictionary termination
1085 record.
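
As a rough sketch of how a reader might pick a decoder, the Python
function below prefers the @code{encoding} name and falls back on
@code{character_code}.  The function name @code{choose_codec} is
hypothetical, the mapping from a Windows code page number to a Python
codec named @code{cp} plus the number is an assumption of this sketch,
and Python does not know every IANA name.  It ignores the artificial
case in which the two fields disagree.

```python
import codecs

def choose_codec(encoding_name=None, character_code=None):
    """Return a Python codec name, or None if nothing is known.

    `encoding_name` is the raw bytes of the encoding member, if present.
    `character_code` is the int32 from the machine integer info record.
    """
    if encoding_name:
        name = encoding_name.strip().decode("ascii")
        return codecs.lookup(name).name    # lookup is case-insensitive
    if character_code == 65001:
        return "utf-8"
    if character_code is not None:
        # Assumption: treat the value as a Windows code page number.
        return codecs.lookup("cp%d" % character_code).name
    return None
```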
1086
1087 @node Long String Value Labels Record
1088 @section Long String Value Labels Record
1089
1090 This record, if present, specifies value labels for long string
1091 variables.
1092
1093 @example
1094 /* @r{Header.} */
1095 int32               rec_type;
1096 int32               subtype;
1097 int32               size;
1098 int32               count;
1099
1100 /* @r{Repeated until exactly @code{count} bytes have been consumed.} */
1101 int32               var_name_len;
1102 char                var_name[];
1103 int32               var_width;
1104 int32               n_labels;
1105 long_string_label   labels[];
1106 @end example
1107
1108 @table @code
1109 @item int32 rec_type;
1110 Record type.  Always set to 7.
1111
1112 @item int32 subtype;
1113 Record subtype.  Always set to 21.
1114
1115 @item int32 size;
1116 Always set to 1.
1117
1118 @item int32 count;
1119 The number of bytes following the header until the next header.
1120
1121 @item int32 var_name_len;
1122 @itemx char var_name[];
1123 The number of bytes in the name of the variable that has long string
1124 value labels, plus the variable name itself, which consists of exactly
1125 @code{var_name_len} bytes.  The variable name is not padded to any
1126 particular boundary, nor is it null-terminated.
1127
1128 @item int32 var_width;
1129 The width of the variable, in bytes, which will be between 9 and
1130 32767.
1131
1132 @item int32 n_labels;
1133 @itemx long_string_label labels[];
1134 The long string labels themselves.  The @code{labels} array contains
1135 exactly @code{n_labels} elements, each of which has the following
1136 substructure:
1137
1138 @example
1139 int32               value_len;
1140 char                value[];
1141 int32               label_len;
1142 char                label[];
1143 @end example
1144
1145 @table @code
1146 @item int32 value_len;
1147 @itemx char value[];
1148 The string value being labeled.  @code{value_len} is the number of
1149 bytes in @code{value}; it is equal to @code{var_width}.  The
1150 @code{value} array is not padded or null-terminated.
1151
1152 @item int32 label_len;
1153 @itemx char label[];
1154 The label for the string value.  @code{label_len}, which must be
1155 between 0 and 120, is the number of bytes in @code{label}.  The
1156 @code{label} array is not padded or null-terminated.
1157 @end table
1158 @end table
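
The nested, length-prefixed layout above could be walked as in the
following Python sketch.  It is an illustration under stated
assumptions: the function name is hypothetical, the default byte order
is little-endian, and strings are returned as undecoded bytes.

```python
import struct

def parse_long_string_value_labels(payload, endian="<"):
    """Return a list of (var_name, var_width, [(value, label), ...])."""
    pos = 0

    def read_int32():
        nonlocal pos
        (value,) = struct.unpack_from(endian + "i", payload, pos)
        pos += 4
        return value

    def read_bytes(n):
        nonlocal pos
        if n < 0 or pos + n > len(payload):
            raise ValueError("length runs past the end of the record")
        chunk = payload[pos:pos + n]
        pos += n
        return chunk

    result = []
    while pos < len(payload):              # repeat until count bytes are used
        var_name = read_bytes(read_int32())
        var_width = read_int32()
        n_labels = read_int32()
        labels = []
        for _ in range(n_labels):
            value = read_bytes(read_int32())
            label = read_bytes(read_int32())
            labels.append((value, label))
        result.append((var_name, var_width, labels))
    return result
```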
1159
1160 @node Long String Missing Values Record
1161 @section Long String Missing Values Record
1162
1163 This record, if present, specifies missing values for long string
1164 variables.
1165
1166 @example
1167 /* @r{Header.} */
1168 int32               rec_type;
1169 int32               subtype;
1170 int32               size;
1171 int32               count;
1172
1173 /* @r{Repeated until exactly @code{count} bytes have been consumed.} */
1174 int32               var_name_len;
1175 char                var_name[];
1176 char                n_missing_values;
1177 long_string_missing_value   values[];
1178 @end example
1179
1180 @table @code
1181 @item int32 rec_type;
1182 Record type.  Always set to 7.
1183
1184 @item int32 subtype;
1185 Record subtype.  Always set to 22.
1186
1187 @item int32 size;
1188 Always set to 1.
1189
1190 @item int32 count;
1191 The number of bytes following the header until the next header.
1192
1193 @item int32 var_name_len;
1194 @itemx char var_name[];
1195 The number of bytes in the name of the long string variable that has
1196 missing values, plus the variable name itself, which consists of
1197 exactly @code{var_name_len} bytes.  The variable name is not padded to
1198 any particular boundary, nor is it null-terminated.
1199
1200 @item char n_missing_values;
1201 The number of missing values, either 1, 2, or 3.  (This is, unusually,
1202 a single byte instead of a 32-bit number.)
1203
1204 @item long_string_missing_value values[];
1205 The missing values themselves.  This array contains exactly
1206 @code{n_missing_values} elements, each of which has the following
1207 substructure:
1208
1209 @example
1210 int32               value_len;
1211 char                value[];
1212 @end example
1213
1214 @table @code
1215 @item int32 value_len;
1216 The length of the missing value string, in bytes.  This value should
1217 be 8, because long string variables are at least 8 bytes wide (by
1218 definition), only the first 8 bytes of a long string variable's
1219 missing values are allowed to be non-spaces, and any spaces within the
1220 first 8 bytes are included in the missing value here.
1221
1222 @item char value[];
1223 The missing value string, exactly @code{value_len} bytes, without
1224 any padding or null terminator.
1225 @end table
1226 @end table
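
The following Python sketch illustrates one way to read this layout,
including the single-byte @code{n_missing_values}.  The function name
is hypothetical and the little-endian default is an assumption.

```python
import struct

def parse_long_string_missing_values(payload, endian="<"):
    """Return a list of (var_name, [missing_value, ...]) as bytes."""
    pos = 0
    result = []
    while pos < len(payload):
        (name_len,) = struct.unpack_from(endian + "i", payload, pos)
        pos += 4
        if name_len < 0 or pos + name_len >= len(payload):
            raise ValueError("bad variable name length")
        var_name = payload[pos:pos + name_len]
        pos += name_len
        n_missing = payload[pos]           # one byte, not an int32
        pos += 1
        if not 1 <= n_missing <= 3:
            raise ValueError("n_missing_values must be 1, 2, or 3")
        values = []
        for _ in range(n_missing):
            (value_len,) = struct.unpack_from(endian + "i", payload, pos)
            pos += 4
            if value_len < 0 or pos + value_len > len(payload):
                raise ValueError("bad missing value length")
            values.append(payload[pos:pos + value_len])
            pos += value_len
        result.append((var_name, values))
    return result
```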
1227
1228 @node Data File and Variable Attributes Records
1229 @section Data File and Variable Attributes Records
1230
1231 The data file and variable attributes records represent custom
1232 attributes for the system file or for individual variables in the
1233 system file, as defined on the DATAFILE ATTRIBUTE (@pxref{DATAFILE
1234 ATTRIBUTE,,,pspp, PSPP Users Guide}) and VARIABLE ATTRIBUTE commands
1235 (@pxref{VARIABLE ATTRIBUTE,,,pspp, PSPP Users Guide}), respectively.
1236
1237 @example
1238 /* @r{Header.} */
1239 int32               rec_type;
1240 int32               subtype;
1241 int32               size;
1242 int32               count;
1243
1244 /* @r{Exactly @code{count} bytes of data.} */
1245 char                attributes[];
1246 @end example
1247
1248 @table @code
1249 @item int32 rec_type;
1250 Record type.  Always set to 7.
1251
1252 @item int32 subtype;
1253 Record subtype.  Always set to 17 for a data file attribute record or
1254 to 18 for a variable attributes record.
1255
1256 @item int32 size;
1257 The size of each element in the @code{attributes} member.  Always set to 1.
1258
1259 @item int32 count;
1260 The total number of bytes in @code{attributes}.
1261
1262 @item char attributes[];
1263 The attributes, in a text-based format.
1264
1265 In record subtype 17, this field contains a single attribute set.  An
1266 attribute set is a sequence of one or more attributes concatenated
1267 together.  Each attribute consists of a name, which has the same
1268 syntax as a variable name, followed by, inside parentheses, a sequence
1269 of one or more values.  Each value consists of a string enclosed in
1270 single quotes (@code{'}) followed by a line feed (byte 0x0a).  A value
1271 may contain single quote characters, which are not themselves escaped
1272 or quoted or required to be present in pairs.  There is no apparent
1273 way to embed a line feed in a value.  There is no distinction between
1274 an attribute with a single value and an attribute array with one
1275 element.
1276
1277 In record subtype 18, this field contains a sequence of one or more
1278 variable attribute sets.  If more than one variable attribute set is
1279 present, each one after the first is delimited from the previous by
1280 @code{/}.  Each variable attribute set consists of a long
1281 variable name,
1282 followed by @code{:}, followed by an attribute set with the same
1283 syntax as in record subtype 17.
1284
1285 The total length is @code{count} bytes.
1286 @end table
1287
1288 @subheading Example
1289
1290 A system file produced with the following VARIABLE ATTRIBUTE commands
1291 in effect:
1292
1293 @example
1294 VARIABLE ATTRIBUTE VARIABLES=dummy ATTRIBUTE=fred[1]('23') fred[2]('34').
1295 VARIABLE ATTRIBUTE VARIABLES=dummy ATTRIBUTE=bert('123').
1296 @end example
1297
1298 @noindent
1299 will contain a variable attribute record with the following contents:
1300
1301 @example
1302 00000000  07 00 00 00 12 00 00 00  01 00 00 00 22 00 00 00  |............"...|
1303 00000010  64 75 6d 6d 79 3a 66 72  65 64 28 27 32 33 27 0a  |dummy:fred('23'.|
1304 00000020  27 33 34 27 0a 29 62 65  72 74 28 27 31 32 33 27  |'34'.)bert('123'|
1305 00000030  0a 29                                             |.)              |
1306 @end example
1307
1308 @menu
1309 * Variable Roles::
1310 @end menu
1311
1312 @node Variable Roles
1313 @subsection Variable Roles
1314
1315 A variable's role is represented as an attribute named @code{$@@Role}.
1316 This attribute has a single element whose values and their meanings
1317 are:
1318
1319 @table @code
1320 @item 0
1321 Input.  This, the default, is the most common role.
1322 @item 1
1323 Output.
1324 @item 2
1325 Both.
1326 @item 3
1327 None.
1328 @item 4
1329 Partition.
1330 @item 5
1331 Split.
1332 @end table
1333
1334 @node Extended Number of Cases Record
1335 @section Extended Number of Cases Record
1336
1337 The file header record expresses the number of cases in the system
1338 file as an int32 (@pxref{File Header Record}).  This record allows the
1339 number of cases in the system file to be expressed as a 64-bit number.
1340
1341 @example
1342 int32               rec_type;
1343 int32               subtype;
1344 int32               size;
1345 int32               count;
1346 int64               unknown;
1347 int64               ncases64;
1348 @end example
1349
1350 @table @code
1351 @item int32 rec_type;
1352 Record type.  Always set to 7.
1353
1354 @item int32 subtype;
1355 Record subtype.  Always set to 16.
1356
1357 @item int32 size;
1358 Size of each element.  Always set to 8.
1359
1360 @item int32 count;
1361 Number of pieces of data in the data part.  Always set to 2.
1362
1363 @item int64 unknown;
1364 Meaning unknown.  Always set to 1.
1365
1366 @item int64 ncases64;
1367 Number of cases in the file as a 64-bit integer.  Presumably this
1368 could be -1 to indicate that the number of cases is unknown, for the
1369 same reason as @code{ncases} in the file header record, but this has
1370 not been observed in the wild.
1371 @end table
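
Because every member has a fixed size, the whole record can be
unpacked in one step, as in this Python sketch.  The function name
@code{parse_ncases64} is hypothetical and the little-endian default is
an assumption.

```python
import struct

def parse_ncases64(record, endian="<"):
    """Return ncases64 from a complete 32-byte record."""
    # Four int32 header members followed by two int64 members.
    rec_type, subtype, size, count, unknown, ncases64 = struct.unpack(
        endian + "4i2q", record)
    if (rec_type, subtype, size, count) != (7, 16, 8, 2):
        raise ValueError("not an extended number of cases record")
    return ncases64
```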
1372
1373 @node Miscellaneous Informational Records
1374 @section Miscellaneous Informational Records
1375
1376 Some specific types of miscellaneous informational records are
1377 documented here, but others are known to exist.  PSPP ignores unknown
1378 miscellaneous informational records when reading system files.
1379
1380 @example
1381 /* @r{Header.} */
1382 int32               rec_type;
1383 int32               subtype;
1384 int32               size;
1385 int32               count;
1386
1387 /* @r{Exactly @code{size * count} bytes of data.} */
1388 char                data[];
1389 @end example
1390
1391 @table @code
1392 @item int32 rec_type;
1393 Record type.  Always set to 7.
1394
1395 @item int32 subtype;
1396 Record subtype.  May take any value.  According to Aapi
1397 H@"am@"al@"ainen, value 5 indicates a set of grouped variables and 6
1398 indicates date info (probably related to USE).  Subtype 24 appears to
1399 contain XML that describes how data in the file should be displayed
1400 on-screen.
1401
1402 @item int32 size;
1403 Size of each piece of data in the data part.  Should have the value 1,
1404 4, or 8, for @code{char}, @code{int32}, and @code{flt64} format data,
1405 respectively.
1406
1407 @item int32 count;
1408 Number of pieces of data in the data part.
1409
1410 @item char data[];
1411 Arbitrary data.  There must be @code{size} times @code{count} bytes of
1412 data.
1413 @end table
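
Since the header gives the length of the data part, a reader can skip
records whose subtype it does not understand.  The Python sketch
below shows the idea; the function name @code{read_info_record} is
hypothetical, the little-endian default is an assumption, and the
stream is assumed to be positioned at the start of a record.

```python
import io
import struct

def read_info_record(stream, endian="<"):
    """Read one record of type 7; return (subtype, size, count, data)."""
    header = stream.read(16)
    if len(header) != 16:
        raise EOFError("truncated record header")
    rec_type, subtype, size, count = struct.unpack(endian + "4i", header)
    if rec_type != 7:
        raise ValueError("not a miscellaneous informational record")
    if size < 0 or count < 0:
        raise ValueError("negative size or count")
    data = stream.read(size * count)       # exactly size * count bytes
    if len(data) != size * count:
        raise EOFError("truncated record data")
    return subtype, size, count, data
```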
1414
1415 @node Dictionary Termination Record
1416 @section Dictionary Termination Record
1417
1418 The dictionary termination record separates all other records from the
1419 data records.
1420
1421 @example
1422 int32               rec_type;
1423 int32               filler;
1424 @end example
1425
1426 @table @code
1427 @item int32 rec_type;
1428 Record type.  Always set to 999.
1429
1430 @item int32 filler;
1431 Ignored padding.  Should be set to 0.
1432 @end table
1433
1434 @node Data Record
1435 @section Data Record
1436
1437 Data records must follow all other records in the system file.  There must
1438 be at least one data record in every system file.
1439
1440 The format of data records varies depending on whether the data is
1441 compressed.  Regardless, the data is arranged in a series of 8-byte
1442 elements.
1443
1444 When data is not compressed,
1445 each element corresponds to
1446 the variable declared in the respective variable record (@pxref{Variable
1447 Record}).  Numeric values are given in @code{flt64} format; string
1448 values are literal character strings, padded on the right when
1449 necessary to fill out 8-byte units.
1450
1451 Compressed data is arranged in the following manner: the first 8 bytes
1452 in the data section are divided into a series of 1-byte command
1453 codes.  These codes have meanings as described below:
1454
1455 @table @asis
1456 @item 0
1457 Ignored.  If the program writing the system file accumulates compressed
1458 data in blocks of fixed length, 0 bytes can be used to pad out extra
1459 bytes remaining at the end of a fixed-size block.
1460
1461 @item 1 through 251
1462 A number with
1463 value @var{code} - @var{bias}, where
1464 @var{code} is the value of the compression code and @var{bias} is the
1465 variable @code{bias} from the file header.  For example,
1466 code 105 with bias 100.0 (the normal value) indicates a numeric value
1467 of 5.
1468 One file has been seen written by SPSS 14 that contained such a code
1469 in a @emph{string} field with the value 0 (after the bias is
1470 subtracted) as a way of encoding null bytes.
1471
1472 @item 252
1473 End of file.  This code may or may not appear at the end of the data
1474 stream.  PSPP always outputs this code but its use is not required.
1475
1476 @item 253
1477 A numeric or string value that is not
1478 compressible.  The value is stored in the 8 bytes following the
1479 current block of command bytes.  If this value appears twice in a block
1480 of command bytes, then it indicates the second group of 8 bytes following the
1481 command bytes, and so on.
1482
1483 @item 254
1484 An 8-byte string value that is all spaces.
1485
1486 @item 255
1487 The system-missing value.
1488 @end table
1489
1490 When the end of an 8-byte group of command bytes is reached, any
1491 blocks of non-compressible values indicated by code 253 are skipped,
1492 and the next element of command bytes is read and interpreted, until
1493 the end of the file or a code with value 252 is reached.
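
The command codes above can be interpreted as in the following Python
sketch, offered as an illustration rather than as PSPP's reader.  The
function name @code{decompress} is hypothetical.  Values stored under
code 253 are returned as raw 8-byte strings, because only the
variable's type tells whether they hold a @code{flt64} or string data,
and the system-missing value is represented here by @code{None}.

```python
def decompress(data, bias=100.0):
    """Return the list of elements encoded in compressed data records."""
    elements = []
    pos = 0
    while pos + 8 <= len(data):
        commands = data[pos:pos + 8]       # one group of 8 command bytes
        pos += 8
        for code in commands:
            if code == 0:
                continue                   # padding, ignored
            elif code <= 251:
                elements.append(code - bias)
            elif code == 252:
                return elements            # explicit end of file
            elif code == 253:
                raw = data[pos:pos + 8]    # next uncompressed 8-byte value
                if len(raw) != 8:
                    raise ValueError("truncated uncompressed value")
                pos += 8
                elements.append(raw)
            elif code == 254:
                elements.append(b" " * 8)  # string of 8 spaces
            else:
                elements.append(None)      # 255: system-missing
    return elements
```

Advancing @code{pos} past each raw value as it is consumed means that
the next group of command bytes is found directly after the last
value used by the current group.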
1494 @setfilename ignored