doc/files.texi

   1 @node System and Portable File IO
   2 @chapter System and Portable File I/O
   3
   4 The commands in this chapter read, write, and examine system files and
   5 portable files.
   6
   7 @menu
   8 * APPLY DICTIONARY::            Apply system file dictionary to active dataset.
   9 * EXPORT::                      Write to a portable file.
  10 * GET::                         Read from a system file.
  11 * GET DATA::                    Read from foreign files.
  12 * IMPORT::                      Read from a portable file.
  13 * SAVE::                        Write to a system file.
  14 * SAVE TRANSLATE::              Write data in foreign file formats.
  15 * SYSFILE INFO::                Display system file dictionary.
  16 * XEXPORT::                     Write to a portable file, as a transformation.
  17 * XSAVE::                       Write to a system file, as a transformation.
  18 @end menu
  19
  20 @node APPLY DICTIONARY
  21 @section APPLY DICTIONARY
  22 @vindex APPLY DICTIONARY
  23
  24 @display
  25 APPLY DICTIONARY FROM=@{'file-name',file_handle@}.
  26 @end display
  27
  28 @cmd{APPLY DICTIONARY} applies the variable labels, value labels,
  29 and missing values taken from a file to corresponding
  30 variables in the active dataset.  In some cases it also updates the
  31 weighting variable.
  32
  33 Specify a system file or portable file's name, a data set name
  34 (@pxref{Datasets}), or a file handle name (@pxref{File Handles}).  The
  35 dictionary in the file will be read, but it will not replace the
  36 active dataset's dictionary.  The file's data will not be read.
  37
  38 Only variables with names that exist in both the active dataset and the
  39 system file are considered.  Variables with the same name but different
  40 types (numeric, string) will cause an error message.  Otherwise, the
  41 system file variables' attributes will replace those in their matching
  42 active dataset variables:
  43
  44 @itemize @bullet
  45 @item
  46 If a system file variable has a variable label, then it will replace
  47 the variable label of the active dataset variable.  If the system
  48 file variable does not have a variable label, then the active dataset
  49 variable's variable label, if any, will be retained.
  50
  51 @item
  52 If the system file variable has custom attributes (@pxref{VARIABLE
  53 ATTRIBUTE}), then those attributes replace the active dataset variable's
  54 custom attributes.  If the system file variable does not have custom
  55 attributes, then the active dataset variable's custom attributes, if any,
  56 will be retained.
  57
  58 @item
  59 If the active dataset variable is numeric or short string, then value
  60 labels and missing values from the system file variable, if any, will
  61 be copied to it.  If the system file variable does not have value labels or
  62 missing values, then those in the active dataset variable, if any, will not
  63 be disturbed.
  64 @end itemize
  65
  66 In addition to properties of variables, some properties of the active
  67 dataset dictionary as a whole are updated:
  68
  69 @itemize @bullet
  70 @item
  71 If the system file has custom attributes (@pxref{DATAFILE ATTRIBUTE}),
  72 then those attributes replace the active dataset's custom
  73 attributes.
  74
  75 @item
  76 If the active dataset has a weighting variable (@pxref{WEIGHT}), and the
  77 system file does not, or if the weighting variable in the system file
  78 does not exist in the active dataset, then the active dataset weighting
  79 variable, if any, is retained.  Otherwise, the weighting variable in
  80 the system file becomes the active dataset weighting variable.
  81 @end itemize
  82
  83 @cmd{APPLY DICTIONARY} takes effect immediately.  It does not read the
  84 active dataset.  The system file is not modified.
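
As a minimal sketch, assuming a hypothetical system file named
@samp{labels.sav} whose variable names match those in the active
dataset, the following applies its labels, missing values, and custom
attributes to the matching variables:

@example
APPLY DICTIONARY FROM='labels.sav'.
@end example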
  85
  86 @node EXPORT
  87 @section EXPORT
  88 @vindex EXPORT
  89
  90 @display
  91 EXPORT
  92         /OUTFILE='file-name'
  93         /UNSELECTED=@{RETAIN,DELETE@}
  94         /DIGITS=n
  95         /DROP=var_list
  96         /KEEP=var_list
  97         /RENAME=(src_names=target_names)@dots{}
  98         /TYPE=@{COMM,TAPE@}
  99         /MAP
 100 @end display
 101
 102 The @cmd{EXPORT} procedure writes the active dataset's dictionary and
 103 data to a specified portable file.
 104
 105 By default, cases excluded with FILTER are written to the
 106 file.  These can be excluded by specifying DELETE on the UNSELECTED
 107 subcommand.  Specifying RETAIN makes the default explicit.
 108
 109 Portable files express real numbers in base 30.  Integers are always
 110 expressed to the maximum precision needed to make them exact.
 111 Non-integers are, by default, expressed to the machine's maximum
 112 natural precision (approximately 15 decimal digits on many machines).
 113 If many numbers require this many digits, the portable file may
 114 significantly increase in size.  As an alternative, the DIGITS
 115 subcommand may be used to specify the number of decimal digits of
 116 precision to write.  DIGITS applies only to non-integers.
 117
 118 The OUTFILE subcommand, which is the only required subcommand, specifies
 119 the portable file to be written as a file name string or
 120 a file handle (@pxref{File Handles}).
 121
 122 DROP, KEEP, and RENAME follow the same format as the SAVE procedure
 123 (@pxref{SAVE}).
 124
 125 The TYPE subcommand specifies the character set for use in the
 126 portable file.  Its value is currently not used.
 127
 128 The MAP subcommand is currently ignored.
 129
 130 @cmd{EXPORT} is a procedure.  It causes the active dataset to be read.
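
For illustration, the following sketch writes a portable file in which
non-integers are limited to 6 decimal digits of precision and cases
excluded with FILTER are left out.  The file name @samp{results.por}
is hypothetical:

@example
EXPORT /OUTFILE='results.por'
        /UNSELECTED=DELETE
        /DIGITS=6.
@end example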
 131
 132 @node GET
 133 @section GET
 134 @vindex GET
 135
 136 @display
 137 GET
 138         /FILE=@{'file-name',file_handle@}
 139         /DROP=var_list
 140         /KEEP=var_list
 141         /RENAME=(src_names=target_names)@dots{}
 142 @end display
 143
 144 @cmd{GET} clears the current dictionary and active dataset and
 145 replaces them with the dictionary and data from a specified file.
 146
 147 The FILE subcommand is the only required subcommand.  Specify the system
 148 file or portable file to be read as a string file name or
 149 a file handle (@pxref{File Handles}).
 150
 151 By default, all the variables in a file are read.  The DROP
 152 subcommand can be used to specify a list of variables that are not to be
 153 read.  By contrast, the KEEP subcommand can be used to specify variables
 154 that are to be read, with all other variables not read.
 155
 156 Normally variables in a file retain the names that they were
 157 saved under.  Use the RENAME subcommand to change these names.  Specify,
 158 within parentheses, a list of variable names followed by an equals sign
 159 (@samp{=}) and the names that they should be renamed to.  Multiple
 160 parenthesized groups of variable names can be included on a single
 161 RENAME subcommand.  Variables' names may be swapped using a RENAME
 162 subcommand of the form @samp{/RENAME=(A B=B A)}.
 163
 164 Alternate syntax for the RENAME subcommand allows the parentheses to be
 165 eliminated.  When this is done, only a single variable may be renamed at
 166 once.  For instance, @samp{/RENAME=A=B}.  This alternate syntax is
 167 deprecated.
 168
 169 DROP, KEEP, and RENAME are executed in left-to-right order.
 170 Each may be present any number of times.  @cmd{GET} never modifies a
 171 file on disk.  Only the active dataset read from the file
 172 is affected by these subcommands.
 173
 174 @cmd{GET} does not cause the data to be read, only the dictionary.  The data
 175 is read later, when a procedure is executed.
 176
 177 Use of @cmd{GET} to read a portable file is a PSPP extension.
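
The following sketch combines DROP and RENAME.  The file name and the
variable names (@samp{id}, @samp{age}, @samp{income}, @samp{years},
and @samp{earnings}) are hypothetical:

@example
GET /FILE='survey.sav'
        /DROP=id
        /RENAME=(age income=years earnings).
@end example

@noindent
Here @samp{id} is not read, and @samp{age} and @samp{income} appear in
the active dataset as @samp{years} and @samp{earnings}.  The file on
disk is unchanged.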
 178
 179 @node GET DATA
 180 @section GET DATA
 181 @vindex GET DATA
 182
 183 @display
 184 GET DATA
 185         /TYPE=@{GNM,PSQL,TXT@}
 186         @dots{}additional subcommands depending on TYPE@dots{}
 187 @end display
 188
 189 The @cmd{GET DATA} command is used to read files and other data
 190 sources created by other applications.  When this command is executed,
 191 the current dictionary and active dataset are replaced with variables
 192 and data read from the specified source.
 193
 194 The TYPE subcommand is mandatory and must be the first subcommand
 195 specified.  It determines the type of the file or source to read.
 196 PSPP currently supports the following file types:
 197
 198 @table @asis
 199 @item GNM
 200 Spreadsheet files created by Gnumeric (@url{http://gnumeric.org}).
 201
 202 @item PSQL
 203 Relations from PostgreSQL databases (@url{http://postgresql.org}).
 204
 205 @item TXT
 206 Textual data files in columnar and delimited formats.
 207 @end table
 208
 209 Each supported file type has additional subcommands, explained in
 210 separate sections below.
 211
 212 @menu
 213 * GET DATA /TYPE=GNM::
 214 * GET DATA /TYPE=PSQL::
 215 * GET DATA /TYPE=TXT::
 216 @end menu
 217
 218 @node GET DATA /TYPE=GNM
 219 @subsection Gnumeric Spreadsheet Files
 220
 221 @display
 222 GET DATA /TYPE=GNM
 223         /FILE=@{'file-name'@}
 224         /SHEET=@{NAME 'sheet-name', INDEX n@}
 225         /CELLRANGE=@{RANGE 'range', FULL@}
 226         /READNAMES=@{ON, OFF@}
 227         /ASSUMEDVARWIDTH=n.
 228 @end display
 229
 230 @cindex Gnumeric
 231 @cindex spreadsheet files
 232 To use GET DATA to read a spreadsheet file created by Gnumeric
 233 (@url{http://gnumeric.org}), specify TYPE=GNM to indicate the file's
 234 format and use FILE to indicate the Gnumeric file to be read.  All
 235 other subcommands are optional.
 236
 237 The format of each variable is determined by the format of the spreadsheet
 238 cell containing the first datum for the variable.
 239 If this cell is of string (text) format, then the width of the variable is
 240 determined from the length of the string it contains, unless the
 241 ASSUMEDVARWIDTH subcommand is given.
 242
 243
 244 The FILE subcommand is mandatory. Specify the name of the file
 245 to be read.
 246
 247 The SHEET subcommand specifies the sheet within the spreadsheet file to read.
 248 There are two forms of the SHEET subcommand.
 249 In the first form,
 250 @samp{/SHEET=name @var{sheet-name}}, the string @var{sheet-name} is the
 251 name of the sheet to read.
 252 In the second form, @samp{/SHEET=index @var{idx}}, @var{idx} is an
 253 integer which is the index of the sheet to read.
 254 The first sheet has the index 1.
 255 If the SHEET subcommand is omitted, then the command will read the
 256 first sheet in the file.
 257
 258 The CELLRANGE subcommand specifies the range of cells within the sheet to read.
 259 If the subcommand is given as @samp{/CELLRANGE=FULL}, then the entire
 260 sheet is read.
 261 To read only part of a sheet, use the form
 262 @samp{/CELLRANGE=range '@var{top-left-cell}:@var{bottom-right-cell}'}.
 263 For example, the subcommand @samp{/CELLRANGE=range 'C3:P19'} reads
 264 columns C--P, and rows 3--19 inclusive.
 265 If no CELLRANGE subcommand is given, then the entire sheet is read.
 266
 267 If @samp{/READNAMES=ON} is specified, then the contents of cells of
 268 the first row are used as the names of the variables in which to store
 269 the data from subsequent rows.
 270 If the READNAMES subcommand is omitted, or if @samp{/READNAMES=OFF} is
 271 used, then the variables receive automatically assigned names.
 272
 273 The ASSUMEDVARWIDTH subcommand specifies the maximum width of string
 274 variables read from the file.
 275 If omitted, the default value is determined from the length of the
 276 string in the first spreadsheet cell for each variable.
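
As a sketch of how these subcommands fit together, the following reads
cells C3 through P19 from the second sheet of a hypothetical Gnumeric
file named @samp{budget.gnumeric}, using the READNAMES=ON setting
described above to take variable names from the spreadsheet:

@example
GET DATA /TYPE=GNM /FILE='budget.gnumeric'
        /SHEET=INDEX 2
        /CELLRANGE=RANGE 'C3:P19'
        /READNAMES=ON.
@end example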
 277
 278
 279 @node GET DATA /TYPE=PSQL
 280 @subsection Postgres Database Queries
 281
 282 @display
 283 GET DATA /TYPE=PSQL
 284          /CONNECT=@{connection info@}
 285          /SQL=@{query@}
 286          [/ASSUMEDVARWIDTH=n]
 287          [/UNENCRYPTED]
 288          [/BSIZE=n].
 289 @end display
 290
 291 @cindex postgres
 292 @cindex databases
 293
 294 The PSQL type is used to import data from a postgres database server.
 295 The server may be located locally or remotely.
 296 Variables are automatically created based on the table column names
 297 or the names specified in the SQL query.
 298 Postgres data types of high precision will lose precision when
 299 imported into PSPP.
 300 Not all the postgres data types are able to be represented in PSPP.
 301 If a datum cannot be represented, a warning will be issued and that
 302 datum will be set to SYSMIS.
 303
 304 The CONNECT subcommand is mandatory.
 305 It is a string specifying the parameters of the database server from
 306 which the data should be fetched.
 307 The format of the string is given in the postgres manual
 308 @url{http://www.postgresql.org/docs/8.0/static/libpq.html#LIBPQ-CONNECT}.
 309
 310 The SQL subcommand is mandatory.
 311 It must be a valid SQL string to retrieve data from the database.
 312
 313 The ASSUMEDVARWIDTH subcommand specifies the maximum width of string
 314 variables read from the database.
 315 If omitted, the default value is determined from the length of the
 316 string in the first value read for each variable.
 317
 318 The UNENCRYPTED subcommand allows data to be retrieved over an insecure
 319 connection.
 320 If the connection is not encrypted, and the UNENCRYPTED subcommand is not
 321 given, then an error will occur.
 322 Whether or not the connection is
 323 encrypted depends upon the underlying psql library and the
 324 capabilities of the database server.
 325
 326 The BSIZE subcommand serves only to optimise the speed of data transfer.
 327 It specifies an upper limit on the
 328 number of cases to fetch from the database at once.
 329 The default value is 4096.
 330 If your SQL statement fetches a large number of cases but only a small number of
 331 variables, then the data transfer may be faster if you increase this value.
 332 Conversely, if the number of variables is large, or if the machine on which
 333 PSPP is running has only a
 334 small amount of memory, then a smaller value will be better.
 335
 336
 337 The following syntax is an example:
 338 @example
 339 GET DATA /TYPE=PSQL
 340      /CONNECT='host=example.com port=5432 dbname=product user=fred password=xxxx'
 341      /SQL='select * from manufacturer'.
 342 @end example
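
A second sketch shows the optional subcommands.  It assumes a database
server on the local machine that accepts unencrypted connections, and
the column names @samp{name} and @samp{price} are hypothetical:

@example
GET DATA /TYPE=PSQL
     /CONNECT='host=localhost dbname=product'
     /SQL='select name, price from manufacturer'
     /ASSUMEDVARWIDTH=40
     /UNENCRYPTED
     /BSIZE=10000.
@end example

@noindent
Because this query fetches only two variables, a BSIZE larger than the
default of 4096 may speed up the transfer.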
 343
 344
 345 @node GET DATA /TYPE=TXT
 346 @subsection Textual Data Files
 347
 348 @display
 349 GET DATA /TYPE=TXT
 350         /FILE=@{'file-name',file_handle@}
 351         [/ARRANGEMENT=@{DELIMITED,FIXED@}]
 352         [/FIRSTCASE=@{first_case@}]
 353         [/IMPORTCASE=@{ALL,FIRST max_cases,PERCENT percent@}]
 354         @dots{}additional subcommands depending on ARRANGEMENT@dots{}
 355 @end display
 356
 357 @cindex text files
 358 @cindex data files
 359 When TYPE=TXT is specified, GET DATA reads data in a delimited or
 360 fixed columnar format, much like DATA LIST (@pxref{DATA LIST}).
 361
 362 The FILE subcommand is mandatory.  Specify the file to be read as
 363 a string file name or (for textual data
 364 only) a file handle (@pxref{File Handles}).
 365
 366 The ARRANGEMENT subcommand determines the file's basic format.
 367 DELIMITED, the default setting, specifies that fields in the input
 368 data are separated by spaces, tabs, or other user-specified
 369 delimiters.  FIXED specifies that fields in the input data appear at
 370 particular fixed column positions within records of a case.
 371
 372 By default, cases are read from the input file starting from the first
 373 line.  To skip lines at the beginning of an input file, set FIRSTCASE
 374 to the number of the first line to read: 2 to skip the first line, 3
 375 to skip the first two lines, and so on.
 376
 377 IMPORTCASE can be used to limit the number of cases read from the
 378 input file.  With the default setting, ALL, all cases in the file are
 379 read.  Specify FIRST @i{max_cases} to read at most @i{max_cases} cases
 380 from the file.  Use PERCENT @i{percent} to read only @i{percent}
 381 percent, approximately, of the cases contained in the file.  (The
 382 percentage is approximate, because there is no way to accurately count
 383 the number of cases in the file without reading the entire file.  The
 384 number of cases in some kinds of unusual files cannot be estimated;
 385 PSPP will read all cases in such files.)
 386
 387 FIRSTCASE and IMPORTCASE may be used with delimited and fixed-format
 388 data.  The remaining subcommands, which apply only to one of the two file
 389 arrangements, are described below.
 390
 391 @menu
 392 * GET DATA /TYPE=TXT /ARRANGEMENT=DELIMITED::
 393 * GET DATA /TYPE=TXT /ARRANGEMENT=FIXED::
 394 @end menu
 395
 396 @node GET DATA /TYPE=TXT /ARRANGEMENT=DELIMITED
 397 @subsubsection Reading Delimited Data
 398
 399 @display
 400 GET DATA /TYPE=TXT
 401         /FILE=@{'file-name',file_handle@}
 402         [/ARRANGEMENT=@{DELIMITED,FIXED@}]
 403         [/FIRSTCASE=@{first_case@}]
 404         [/IMPORTCASE=@{ALL,FIRST max_cases,PERCENT percent@}]
 405
 406         /DELIMITERS="delimiters"
 407         [/QUALIFIER="quotes" [/ESCAPE]]
 408         [/DELCASE=@{LINE,VARIABLES n_variables@}]
 409         /VARIABLES=del_var [del_var]@dots{}
 410 where each del_var takes the form:
 411         variable format
 412 @end display
 413
 414 The GET DATA command with TYPE=TXT and ARRANGEMENT=DELIMITED reads
 415 input data from text files in delimited format, where fields are
 416 separated by a set of user-specified delimiters.  Its capabilities are
 417 similar to those of DATA LIST FREE (@pxref{DATA LIST FREE}), with a
 418 few enhancements.
 419
 420 The required FILE subcommand and optional FIRSTCASE and IMPORTCASE
 421 subcommands are described above (@pxref{GET DATA /TYPE=TXT}).
 422
 423 DELIMITERS, which is required, specifies the set of characters that
 424 may separate fields.  Each character in the string specified on
 425 DELIMITERS separates one field from the next.  The end of a line also
 426 separates fields, regardless of DELIMITERS.  Two consecutive
 427 delimiters in the input yield an empty field, as does a delimiter at
 428 the end of a line.  A space character as a delimiter is an exception:
 429 consecutive spaces do not yield an empty field and neither does any
 430 number of spaces at the end of a line.
 431
 432 To use a tab as a delimiter, specify @samp{\t} at the beginning of the
 433 DELIMITERS string.  To use a backslash as a delimiter, specify
 434 @samp{\\} as the first delimiter or, if a tab should also be a
 435 delimiter, immediately following @samp{\t}.  To read a data file in
 436 which each field appears on a separate line, specify the empty string
 437 for DELIMITERS.
 438
 439 The optional QUALIFIER subcommand names one or more characters that
 440 can be used to quote values within fields in the input.  A field that
 441 begins with one of the specified quote characters ends at the next
 442 matching quote.  Intervening delimiters become part of the field,
 443 instead of terminating it.  The ability to specify more than one quote
 444 character is a PSPP extension.
 445
 446 By default, a character specified on QUALIFIER cannot itself be
 447 embedded within a field that it quotes, because the quote character
 448 always terminates the quoted field.  With ESCAPE, however, a doubled
 449 quote character within a quoted field inserts a single instance of the
 450 quote into the field.  For example, if @samp{'} is specified on
 451 QUALIFIER, then without ESCAPE @code{'a''b'} specifies a pair of
 452 fields that contain @samp{a} and @samp{b}, but with ESCAPE it
 453 specifies a single field that contains @samp{a'b}.  ESCAPE is a PSPP
 454 extension.
 455
 456 The DELCASE subcommand controls how data may be broken across lines in
 457 the data file.  With LINE, the default setting, each line must contain
 458 all the data for exactly one case.  For additional flexibility, to
 459 allow a single case to be split among lines or multiple cases to be
 460 contained on a single line, specify VARIABLES @i{n_variables}, where
 461 @i{n_variables} is the number of variables per case.
 462
 463 The VARIABLES subcommand is required and must be the last subcommand.
 464 Specify the name of each variable and its input format (@pxref{Input
 465 and Output Formats}) in the order they should be read from the input
 466 file.
 467
 468 @subsubheading Examples
 469
 470 @noindent
 471 On a Unix-like system, the @samp{/etc/passwd} file has a format
 472 similar to this:
 473
 474 @example
 475 root:$1$nyeSP5gD$pDq/:0:0:,,,:/root:/bin/bash
 476 blp:$1$BrP/pFg4$g7OG:1000:1000:Ben Pfaff,,,:/home/blp:/bin/bash
 477 john:$1$JBuq/Fioq$g4A:1001:1001:John Darrington,,,:/home/john:/bin/bash
 478 jhs:$1$D3li4hPL$88X1:1002:1002:Jason Stover,,,:/home/jhs:/bin/csh
 479 @end example
 480
 481 @noindent
 482 The following syntax reads a file in the format used by
 483 @samp{/etc/passwd}:
 484
 485 @c If you change this example, change the regression test in
 486 @c tests/language/data-io/get-data.at to match.
 487 @example
 488 GET DATA /TYPE=TXT /FILE='/etc/passwd' /DELIMITERS=':'
 489         /VARIABLES=username A20
 490                    password A40
 491                    uid F10
 492                    gid F10
 493                    gecos A40
 494                    home A40
 495                    shell A40.
 496 @end example
 497
 498 @noindent
 499 Consider the following data on used cars:
 500
 501 @example
 502 model   year    mileage price   type    age
 503 Civic   2002    29883   15900   Si      2
 504 Civic   2003    13415   15900   EX      1
 505 Civic   1992    107000  3800    n/a     12
 506 Accord  2002    26613   17900   EX      1
 507 @end example
 508
 509 @noindent
 510 The following syntax can be used to read the used car data:
 511
 512 @c If you change this example, change the regression test in
 513 @c tests/language/data-io/get-data.at to match.
 514 @example
 515 GET DATA /TYPE=TXT /FILE='cars.data' /DELIMITERS=' ' /FIRSTCASE=2
 516         /VARIABLES=model A8
 517                    year F4
 518                    mileage F6
 519                    price F5
 520                    type A4
 521                    age F2.
 522 @end example
 523
 524 @noindent
 525 Consider the following information on animals in a pet store:
 526
 527 @example
 528 'Pet''s Name', "Age", "Color", "Date Received", "Price", "Height", "Type"
 529 , (Years), , , (Dollars), ,
 530 "Rover", 4.5, Brown, "12 Feb 2004", 80, '1''4"', "Dog"
 531 "Charlie", , Gold, "5 Apr 2007", 12.3, "3""", "Fish"
 532 "Molly", 2, Black, "12 Dec 2006", 25, '5"', "Cat"
 533 "Gilly", , White, "10 Apr 2007", 10, "3""", "Guinea Pig"
 534 @end example
 535
 536 @noindent
 537 The following syntax can be used to read the pet store data:
 538
 539 @c If you change this example, change the regression test in
 540 @c tests/language/data-io/get-data.at to match.
 541 @example
 542 GET DATA /TYPE=TXT /FILE='pets.data' /DELIMITERS=', ' /QUALIFIER='''"' /ESCAPE
 543         /FIRSTCASE=3
 544         /VARIABLES=name A10
 545                    age F3.1
 546                    color A5
 547                    received EDATE10
 548                    price F5.2
 549                    height a5
 550                    type a10.
 551 @end example
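
@noindent
As a further sketch, the following reads a tab-delimited file in which
a case may be split across lines, so that every three fields make up
one case.  The file name @samp{readings.tsv} and the variable names
are hypothetical:

@example
GET DATA /TYPE=TXT /FILE='readings.tsv' /DELIMITERS='\t'
        /DELCASE=VARIABLES 3
        /VARIABLES=depth F5.1
                   temp F5.1
                   site A8.
@end example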
 552
 553 @node GET DATA /TYPE=TXT /ARRANGEMENT=FIXED
 554 @subsubsection Reading Fixed Columnar Data
 555
 556 @display
 557 GET DATA /TYPE=TXT
 558         /FILE=@{'file-name',file_handle@}
 559         [/ARRANGEMENT=@{DELIMITED,FIXED@}]
 560         [/FIRSTCASE=@{first_case@}]
 561         [/IMPORTCASE=@{ALL,FIRST max_cases,PERCENT percent@}]
 562
 563         [/FIXCASE=n]
 564         /VARIABLES fixed_var [fixed_var]@dots{}
 565             [/rec# fixed_var [fixed_var]@dots{}]@dots{}
 566 where each fixed_var takes the form:
 567         variable start-end format
 568 @end display
 569
 570 The GET DATA command with TYPE=TXT and ARRANGEMENT=FIXED reads input
 571 data from text files in fixed format, where each field is located in
 572 particular fixed column positions within records of a case.  Its
 573 capabilities are similar to those of DATA LIST FIXED (@pxref{DATA LIST
 574 FIXED}), with a few enhancements.
 575
 576 The required FILE subcommand and optional FIRSTCASE and IMPORTCASE
 577 subcommands are described above (@pxref{GET DATA /TYPE=TXT}).
 578
 579 The optional FIXCASE subcommand may be used to specify the positive
 580 integer number of input lines that make up each case.  The default
 581 value is 1.
 582
 583 The VARIABLES subcommand, which is required, specifies the positions
 584 at which each variable can be found.  For each variable, specify its
 585 name, followed by its start and end column separated by @samp{-}
 586 (e.g.@: @samp{0-9}), followed by the input format type (e.g.@:
 587 @samp{F}).  For this command, columns are numbered starting from 0 at
 588 the left column.  Introduce the variables in the second and later
 589 lines of a case by a slash followed by the number of the line within
 590 the case, e.g.@: @samp{/2} for the second line.
 591
 592 @subsubheading Examples
 593
 594 @noindent
 595 Consider the following data on used cars:
 596
 597 @example
 598 model   year    mileage price   type    age
 599 Civic   2002    29883   15900   Si      2
 600 Civic   2003    13415   15900   EX      1
 601 Civic   1992    107000  3800    n/a     12
 602 Accord  2002    26613   17900   EX      1
 603 @end example
 604
 605 @noindent
 606 The following syntax can be used to read the used car data:
 607
 608 @c If you change this example, change the regression test in
 609 @c tests/language/data-io/get-data.at to match.
 610 @example
 611 GET DATA /TYPE=TXT /FILE='cars.data' /ARRANGEMENT=FIXED /FIRSTCASE=2
 612         /VARIABLES=model 0-7 A
 613                    year 8-15 F
 614                    mileage 16-23 F
 615                    price 24-31 F
 616                    type 32-39 A
 617                    age 40-47 F.
 618 @end example
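
@noindent
The following sketch reads a hypothetical file, @samp{patients.data},
in which each case occupies two lines.  The variable names and column
positions are assumptions made for illustration:

@example
GET DATA /TYPE=TXT /FILE='patients.data' /ARRANGEMENT=FIXED /FIXCASE=2
        /VARIABLES=name 0-19 A
                   age 20-22 F
        /2 weight 0-5 F
           height 6-11 F.
@end example

@noindent
Here @samp{name} and @samp{age} come from the first line of each case,
and @samp{weight} and @samp{height} come from the second.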
 619
 620 @node IMPORT
 621 @section IMPORT
 622 @vindex IMPORT
 623
 624 @display
 625 IMPORT
 626         /FILE='file-name'
 627         /TYPE=@{COMM,TAPE@}
 628         /DROP=var_list
 629         /KEEP=var_list
 630         /RENAME=(src_names=target_names)@dots{}
 631 @end display
 632
 633 The @cmd{IMPORT} transformation clears the active dataset dictionary and
 634 data and
 635 replaces them with a dictionary and data from a system file or
 636 portable file.
 637
 638 The FILE subcommand, which is the only required subcommand, specifies
 639 the portable file to be read as a file name string or a file handle
 640 (@pxref{File Handles}).
 641
 642 The TYPE subcommand is currently not used.
 643
 644 DROP, KEEP, and RENAME follow the syntax used by @cmd{GET} (@pxref{GET}).
 645
 646 @cmd{IMPORT} does not cause the data to be read, only the dictionary.  The
 647 data is read later, when a procedure is executed.
 648
 649 Use of @cmd{IMPORT} to read a system file is a PSPP extension.
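
For example, assuming a hypothetical portable file named
@samp{results.por} that contains variables @samp{id} and @samp{score},
the following reads only those two variables:

@example
IMPORT /FILE='results.por' /KEEP=id score.
@end example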
 650
 651 @node SAVE
 652 @section SAVE
 653 @vindex SAVE
 654
 655 @display
 656 SAVE
 657         /OUTFILE=@{'file-name',file_handle@}
 658         /UNSELECTED=@{RETAIN,DELETE@}
 659         /@{COMPRESSED,UNCOMPRESSED@}
 660         /PERMISSIONS=@{WRITEABLE,READONLY@}
 661         /DROP=var_list
 662         /KEEP=var_list
 663         /VERSION=version
 664         /RENAME=(src_names=target_names)@dots{}
 665         /NAMES
 666         /MAP
 667 @end display
 668
 669 The @cmd{SAVE} procedure causes the dictionary and data in the active
 670 dataset to
 671 be written to a system file.
 672
 673 OUTFILE is the only required subcommand.  Specify the system file
 674 to be written as a string file name or a file handle
 675 (@pxref{File Handles}).
 676
 677 By default, cases excluded with FILTER are written to the system file.
 678 These can be excluded by specifying DELETE on the UNSELECTED
 679 subcommand.  Specifying RETAIN makes the default explicit.
 680
 681 The COMPRESSED and UNCOMPRESSED subcommands determine whether the saved
 682 system file is compressed.  By default, system files are compressed.
 683 This default can be changed with the SET command (@pxref{SET}).
 684
 685 The PERMISSIONS subcommand specifies permissions for the new system
 686 file.  WRITEABLE, the default, creates the file with read and write
 687 permission.  READONLY creates the file for read-only access.
 688
 689 By default, all the variables in the active dataset dictionary are written
 690 to the system file.  The DROP subcommand can be used to specify a list
 691 of variables not to be written.  In contrast, KEEP specifies variables
 692 to be written, with all other variables omitted.
 693
 694 Normally variables are saved to a system file under the same names they
 695 have in the active dataset.  Use the RENAME subcommand to change these names.
 696 Specify, within parentheses, a list of variable names followed by an
 697 equals sign (@samp{=}) and the names that they should be renamed to.
 698 Multiple parenthesized groups of variable names can be included on a
 699 single RENAME subcommand.  Variables' names may be swapped using a
 700 RENAME subcommand of the form @samp{/RENAME=(A B=B A)}.
 701
 702 Alternate syntax for the RENAME subcommand allows the parentheses to be
 703 eliminated.  When this is done, only a single variable may be renamed at
 704 once.  For instance, @samp{/RENAME=A=B}.  This alternate syntax is
 705 deprecated.
 706
 707 DROP, KEEP, and RENAME are performed in left-to-right order.  They
 708 each may be present any number of times.  @cmd{SAVE} never modifies
 709 the active dataset.  DROP, KEEP, and RENAME only affect the system file
 710 written to disk.
 711
 712 The VERSION subcommand specifies the version of the file format. Valid
 713 versions are 2 and 3.  The default version is 3.  In version 2 system
 714 files, variable names longer than 8 bytes will be truncated.  The two
 715 versions are otherwise identical.
 716
 717 The NAMES and MAP subcommands are currently ignored.
 718
 719 @cmd{SAVE} causes the data to be read.  It is a procedure.
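
As a sketch, the following writes a compressed, read-only system file
that omits cases excluded with FILTER and two scratch variables.  The
file name and the variable names are hypothetical:

@example
SAVE /OUTFILE='clean.sav'
        /UNSELECTED=DELETE
        /COMPRESSED
        /PERMISSIONS=READONLY
        /DROP=scratch1 scratch2.
@end example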
 720
 721 @node SAVE TRANSLATE
 722 @section SAVE TRANSLATE
 723 @vindex SAVE TRANSLATE
 724
 725 @display
 726 SAVE TRANSLATE
 727         /OUTFILE=@{'file-name',file_handle@}
 728         /TYPE=@{CSV,TAB@}
 729         [/REPLACE]
 730         [/MISSING=@{IGNORE,RECODE@}]
 731
 732         [/DROP=var_list]
 733         [/KEEP=var_list]
 734         [/RENAME=(src_names=target_names)@dots{}]
 735         [/UNSELECTED=@{RETAIN,DELETE@}]
 736         [/MAP]
 737
 738         @dots{}additional subcommands depending on TYPE@dots{}
 739 @end display
 740
 741 The @cmd{SAVE TRANSLATE} command is used to save data into various
 742 formats understood by other applications.
 743
 744 The OUTFILE and TYPE subcommands are mandatory.  OUTFILE specifies the
 745 file to be written, as a string file name or a file handle
 746 (@pxref{File Handles}).  TYPE determines the type of the file to
 747 write.  It must be one of the following:
 748
 749 @table @asis
 750 @item CSV
 751 Comma-separated value format.
 752
 753 @item TAB
 754 Tab-delimited format.
 755 @end table
 756
 757 By default, SAVE TRANSLATE will not overwrite an existing file.  Use
 758 REPLACE to force an existing file to be overwritten.
 759
 760 With MISSING=IGNORE, the default, SAVE TRANSLATE treats user-missing
 761 values as if they were not missing.  Specify MISSING=RECODE to output
 762 numeric user-missing values like system-missing values and string
 763 user-missing values as all spaces.
 764
 765 By default, all the variables in the active dataset dictionary are saved
 766 to the output file, but DROP or KEEP can select a subset of variables
 767 to save.  The RENAME subcommand can also be used to change the names
 768 under which variables are saved.  UNSELECTED determines whether cases
 769 filtered out by the FILTER command are written to the output file.
 770 These subcommands have the same syntax and meaning as on the
 771 @cmd{SAVE} command (@pxref{SAVE}).
 772
 773 Each supported file type has additional subcommands, explained in
 774 separate sections below.
 775
 776 @cmd{SAVE TRANSLATE} causes the data to be read.  It is a procedure.
 777
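The following is a minimal sketch of a complete invocation.  The file
name @file{survey.csv} and the variables @code{id}, @code{age}, and
@code{income} are hypothetical and are assumed to exist in the active
dataset.

@example
* Hypothetical variables: id, age, income.
SAVE TRANSLATE
        /OUTFILE='survey.csv'
        /TYPE=CSV
        /REPLACE
        /MISSING=RECODE
        /KEEP=id age income
        /RENAME=(income=annual_income).
@end example

Here REPLACE allows an existing @file{survey.csv} to be overwritten,
MISSING=RECODE writes numeric user-missing values like system-missing
values, and KEEP and RENAME restrict and rename the variables that are
written.
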
 778 @menu
 779 * SAVE TRANSLATE /TYPE=CSV and TYPE=TAB::
 780 @end menu
 781
 782 @node SAVE TRANSLATE /TYPE=CSV and TYPE=TAB
 783 @subsection Writing Comma- and Tab-Separated Data Files
 784
 785 @display
 786 SAVE TRANSLATE
 787         /OUTFILE=@{'file-name',file_handle@}
        /TYPE=@{CSV,TAB@}
 789         [/REPLACE]
 790         [/MISSING=@{IGNORE,RECODE@}]
 791
 792         [/DROP=var_list]
 793         [/KEEP=var_list]
 794         [/RENAME=(src_names=target_names)@dots{}]
 795         [/UNSELECTED=@{RETAIN,DELETE@}]
 796
 797         [/FIELDNAMES]
 798         [/CELLS=@{VALUES,LABELS@}]
 799         [/TEXTOPTIONS DELIMITER='delimiter']
 800         [/TEXTOPTIONS QUALIFIER='qualifier']
 801         [/TEXTOPTIONS DECIMAL=@{DOT,COMMA@}]
 802         [/TEXTOPTIONS FORMAT=@{PLAIN,VARIABLE@}]
 803 @end display
 804
 805 The SAVE TRANSLATE command with TYPE=CSV or TYPE=TAB writes data in a
 806 comma- or tab-separated value format similar to that described by
 807 RFC@tie{}4180.  Each variable becomes one output column, and each case
 808 becomes one line of output.  If FIELDNAMES is specified, an additional
 809 line at the top of the output file lists variable names.
 810
 811 The CELLS and TEXTOPTIONS FORMAT settings determine how values are
 812 written to the output file:
 813
 814 @table @asis
 815 @item CELLS=VALUES FORMAT=PLAIN (the default settings)
 816 Writes variables to the output in ``plain'' formats that ignore the
 817 details of variable formats.  Numeric values are written as plain
 818 decimal numbers with enough digits to indicate their exact values in
 819 machine representation.  Numeric values include @samp{e} followed by
 820 an exponent if the exponent value would be less than -4 or greater
 821 than 16.  Dates are written in MM/DD/YYYY format and times in HH:MM:SS
 822 format.  WKDAY and MONTH values are written as decimal numbers.
 823
 824 Numeric values use, by default, the decimal point character set with
 825 SET DECIMAL (@pxref{SET DECIMAL}).  Use DECIMAL=DOT or DECIMAL=COMMA
 826 to force a particular decimal point character.
 827
 828 @item CELLS=VALUES FORMAT=VARIABLE
 829 Writes variables using their print formats.  Leading and trailing
 830 spaces are removed from numeric values, and trailing spaces are
 831 removed from string values.
 832
@item CELLS=LABELS FORMAT=PLAIN
@itemx CELLS=LABELS FORMAT=VARIABLE
 835 Writes value labels where they exist, and otherwise writes the values
 836 themselves as described above.
 837 @end table
 838
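As a sketch of how these settings combine, the commands below write the
same active dataset twice, first with the default settings and then
with value labels and print formats.  The file names are hypothetical,
as is the assumption that some variables carry value labels.

@example
* Default: CELLS=VALUES and FORMAT=PLAIN.
SAVE TRANSLATE
        /OUTFILE='plain.csv'
        /TYPE=CSV
        /REPLACE.

* Value labels where they exist, print formats otherwise.
SAVE TRANSLATE
        /OUTFILE='labeled.csv'
        /TYPE=CSV
        /REPLACE
        /CELLS=LABELS
        /TEXTOPTIONS FORMAT=VARIABLE.
@end example
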
 839 Regardless of CELLS and TEXTOPTIONS FORMAT, numeric system-missing
 840 values are output as a single space.
 841
 842 For TYPE=TAB, tab characters delimit values.  For TYPE=CSV, the
TEXTOPTIONS DELIMITER and DECIMAL settings determine the character
that separates values within a line.  If DELIMITER is specified, then
the specified string separates values.  If DELIMITER is not specified,
 846 then the default is a comma with DECIMAL=DOT or a semicolon with
 847 DECIMAL=COMMA.  If DECIMAL is not given either, it is implied by the
 848 decimal point character set with SET DECIMAL (@pxref{SET DECIMAL}).
 849
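For example, a file meant for an application that expects decimal
commas might be written as in the following sketch, in which the file
name is hypothetical:

@example
SAVE TRANSLATE
        /OUTFILE='decimal-comma.csv'
        /TYPE=CSV
        /REPLACE
        /TEXTOPTIONS DECIMAL=COMMA.
@end example

Because DELIMITER is not given here, the rule above implies that values
are separated by semicolons.  Adding
@code{/TEXTOPTIONS DELIMITER='|'} would separate them with vertical
bars instead.
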
 850 The TEXTOPTIONS QUALIFIER setting specifies a character that is output
 851 before and after a value that contains the delimiter character or the
qualifier character.  The default is a double quote (@samp{"}).  A
 853 qualifier character that appears within a value is doubled.
 854
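As an illustration of the doubling rule, assume the default comma
delimiter and double-quote qualifier.  A hypothetical string value such
as

@example
Smith, "Bob"
@end example

@noindent
would then be expected to appear in the output as

@example
"Smith, ""Bob"""
@end example
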
 855 @node SYSFILE INFO
 856 @section SYSFILE INFO
 857 @vindex SYSFILE INFO
 858
 859 @display
 860 SYSFILE INFO FILE='file-name'.
 861 @end display
 862
@cmd{SYSFILE INFO} reads the dictionary in a system file and displays
its contents.

Specify the file as a file name or file handle (@pxref{File Handles}).
 868
 869 @cmd{SYSFILE INFO} does not affect the current active dataset.
 870
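For instance, assuming that a system file with the hypothetical name
@file{survey.sav} exists in the current directory:

@example
SYSFILE INFO FILE='survey.sav'.
@end example
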
 871 @node XEXPORT
 872 @section XEXPORT
 873 @vindex XEXPORT
 874
 875 @display
 876 XEXPORT
 877         /OUTFILE='file-name'
 878         /DIGITS=n
 879         /DROP=var_list
 880         /KEEP=var_list
 881         /RENAME=(src_names=target_names)@dots{}
 882         /TYPE=@{COMM,TAPE@}
 883         /MAP
 884 @end display
 885
The @cmd{XEXPORT} transformation writes the active dataset dictionary and
 887 data to a specified portable file.
 888
 889 This transformation is a PSPP extension.
 890
 891 It is similar to the @cmd{EXPORT} procedure, with two differences:
 892
 893 @itemize
 894 @item
 895 @cmd{XEXPORT} is a transformation, not a procedure.  It is executed when
 896 the data is read by a procedure or procedure-like command.
 897
 898 @item
 899 @cmd{XEXPORT} does not support the UNSELECTED subcommand.
 900 @end itemize
 901
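The sketch below illustrates the deferred execution.  The variables
@code{a} and @code{b} and the file name @file{totals.por} are
hypothetical.

@example
COMPUTE total = a + b.
XEXPORT /OUTFILE='totals.por'.
* No cases have been written at this point.
EXECUTE.
* EXECUTE reads the data, which runs XEXPORT on each case.
@end example
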
 902 @xref{EXPORT}, for more information.
 903
 904 @node XSAVE
 905 @section XSAVE
 906 @vindex XSAVE
 907
 908 @display
 909 XSAVE
 910         /OUTFILE='file-name'
 911         /@{COMPRESSED,UNCOMPRESSED@}
 912         /PERMISSIONS=@{WRITEABLE,READONLY@}
 913         /DROP=var_list
 914         /KEEP=var_list
 915         /VERSION=version
 916         /RENAME=(src_names=target_names)@dots{}
 917         /NAMES
 918         /MAP
 919 @end display
 920
 921 The @cmd{XSAVE} transformation writes the active dataset's dictionary and
 922 data to a system file.  It is similar to the @cmd{SAVE}
 923 procedure, with two differences:
 924
 925 @itemize
 926 @item
 927 @cmd{XSAVE} is a transformation, not a procedure.  It is executed when
 928 the data is read by a procedure or procedure-like command.
 929
 930 @item
 931 @cmd{XSAVE} does not support the UNSELECTED subcommand.
 932 @end itemize
 933
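Because @cmd{XSAVE} is a transformation, it can share a single pass
over the data with a procedure.  A minimal sketch, with hypothetical
variable and file names:

@example
COMPUTE total = a + b.
XSAVE /OUTFILE='scored.sav' /KEEP=id total.
* FREQUENCIES reads the data, which also runs XSAVE on each case.
FREQUENCIES /VARIABLES=total.
@end example
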
 934 @xref{SAVE}, for more information.