   1 @node System and Portable File IO
   2 @chapter System and Portable File I/O
   3
   4 The commands in this chapter read, write, and examine system files and
   5 portable files.
   6
   7 @menu
   8 * APPLY DICTIONARY::            Apply system file dictionary to active dataset.
   9 * EXPORT::                      Write to a portable file.
  10 * GET::                         Read from a system file.
  11 * GET DATA::                    Read from foreign files.
  12 * IMPORT::                      Read from a portable file.
  13 * SAVE::                        Write to a system file.
  14 * SAVE TRANSLATE::              Write data in foreign file formats.
  15 * SYSFILE INFO::                Display system file dictionary.
  16 * XEXPORT::                     Write to a portable file, as a transformation.
  17 * XSAVE::                       Write to a system file, as a transformation.
  18 @end menu
  19
  20 @node APPLY DICTIONARY
  21 @section APPLY DICTIONARY
  22 @vindex APPLY DICTIONARY
  23
  24 @display
  25 APPLY DICTIONARY FROM=@{'file-name',file_handle@}.
  26 @end display
  27
  28 @cmd{APPLY DICTIONARY} applies the variable labels, value labels,
  29 and missing values taken from a file to corresponding
  30 variables in the active dataset.  In some cases it also updates the
  31 weighting variable.
  32
  33 Specify a system file or portable file's name, a data set name
  34 (@pxref{Datasets}), or a file handle name (@pxref{File Handles}).  The
  35 dictionary in the file will be read, but it will not replace the
  36 active dataset's dictionary.  The file's data will not be read.
  37
  38 Only variables with names that exist in both the active dataset and the
  39 system file are considered.  Variables with the same name but different
  40 types (numeric, string) will cause an error message.  Otherwise, the
  41 system file variables' attributes will replace those in their matching
  42 active dataset variables:
  43
  44 @itemize @bullet
  45 @item
  46 If a system file variable has a variable label, then it will replace
  47 the variable label of the active dataset variable.  If the system
  48 file variable does not have a variable label, then the active dataset
  49 variable's variable label, if any, will be retained.
  50
  51 @item
  52 If the system file variable has custom attributes (@pxref{VARIABLE
  53 ATTRIBUTE}), then those attributes replace the active dataset variable's
  54 custom attributes.  If the system file variable does not have custom
  55 attributes, then the active dataset variable's custom attributes, if any,
  56 will be retained.
  57
  58 @item
  59 If the active dataset variable is numeric or short string, then value
  60 labels and missing values, if any, will be copied to the active dataset
  61 variable.  If the system file variable does not have value labels or
  62 missing values, then those in the active dataset variable, if any, will not
  63 be disturbed.
  64 @end itemize
  65
  66 In addition to properties of variables, some properties of the active
  67 dataset's dictionary as a whole are updated:
  68
  69 @itemize @bullet
  70 @item
  71 If the system file has custom attributes (@pxref{DATAFILE ATTRIBUTE}),
  72 then those attributes replace the active dataset's custom
  73 attributes.
  74
  75 @item
  76 If the active dataset has a weighting variable (@pxref{WEIGHT}), and the
  77 system file does not, or if the weighting variable in the system file
  78 does not exist in the active dataset, then the active dataset weighting
  79 variable, if any, is retained.  Otherwise, the weighting variable in
  80 the system file becomes the active dataset weighting variable.
  81 @end itemize
  82
  83 @cmd{APPLY DICTIONARY} takes effect immediately.  It does not read the
  84 active dataset.  The system file is not modified.
  85
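As a brief illustration, the following sketch copies labels, missing
values, and custom attributes from a previously saved system file onto
the matching variables of the active dataset.  The file name
@samp{survey-labels.sav} is hypothetical.

@example
APPLY DICTIONARY FROM='survey-labels.sav'.
@end example

Only variables whose names appear in both dictionaries are affected.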
  86 @node EXPORT
  87 @section EXPORT
  88 @vindex EXPORT
  89
  90 @display
  91 EXPORT
  92         /OUTFILE='file-name'
  93         /UNSELECTED=@{RETAIN,DELETE@}
  94         /DIGITS=n
  95         /DROP=var_list
  96         /KEEP=var_list
  97         /RENAME=(src_names=target_names)@dots{}
  98         /TYPE=@{COMM,TAPE@}
  99         /MAP
 100 @end display
 101
 102 The @cmd{EXPORT} procedure writes the active dataset's dictionary and
 103 data to a specified portable file.
 104
 105 By default, cases excluded with FILTER are written to the
 106 file.  These can be excluded by specifying DELETE on the UNSELECTED
 107 subcommand.  Specifying RETAIN makes the default explicit.
 108
 109 Portable files express real numbers in base 30.  Integers are always
 110 expressed to the maximum precision needed to make them exact.
 111 Non-integers are, by default, expressed to the machine's maximum
 112 natural precision (approximately 15 decimal digits on many machines).
 113 If many numbers require this many digits, the portable file may
 114 significantly increase in size.  As an alternative, the DIGITS
 115 subcommand may be used to specify the number of decimal digits of
 116 precision to write.  DIGITS applies only to non-integers.
 117
 118 The OUTFILE subcommand, which is the only required subcommand, specifies
 119 the portable file to be written as a file name string or
 120 a file handle (@pxref{File Handles}).
 121
 122 DROP, KEEP, and RENAME follow the same format as the SAVE procedure
 123 (@pxref{SAVE}).
 124
 125 The TYPE subcommand specifies the character set for use in the
 126 portable file.  Its value is currently not used.
 127
 128 The MAP subcommand is currently ignored.
 129
 130 @cmd{EXPORT} is a procedure.  It causes the active dataset to be read.
 131
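For instance, a sketch that writes a portable file, omits cases
excluded with FILTER, and limits non-integers to 6 decimal digits of
precision might look like this.  The file name @samp{survey.por} is
hypothetical.

@example
EXPORT OUTFILE='survey.por'
        /UNSELECTED=DELETE
        /DIGITS=6.
@end example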
 132 @node GET
 133 @section GET
 134 @vindex GET
 135
 136 @display
 137 GET
 138         /FILE=@{'file-name',file_handle@}
 139         /DROP=var_list
 140         /KEEP=var_list
 141         /RENAME=(src_names=target_names)@dots{}
 142         /ENCODING='encoding'
 143 @end display
 144
 145 @cmd{GET} clears the current dictionary and active dataset and
 146 replaces them with the dictionary and data from a specified file.
 147
 148 The FILE subcommand is the only required subcommand.  Specify the system
 149 file or portable file to be read as a string file name or
 150 a file handle (@pxref{File Handles}).
 151
 152 By default, all the variables in a file are read.  The DROP
 153 subcommand can be used to specify a list of variables that are not to be
 154 read.  By contrast, the KEEP subcommand can be used to specify variables
 155 that are to be read, with all other variables not read.
 156
 157 Normally variables in a file retain the names that they were
 158 saved under.  Use the RENAME subcommand to change these names.  Specify,
 159 within parentheses, a list of variable names followed by an equals sign
 160 (@samp{=}) and the names that they should be renamed to.  Multiple
 161 parenthesized groups of variable names can be included on a single
 162 RENAME subcommand.  Variables' names may be swapped using a RENAME
 163 subcommand of the form @samp{/RENAME=(A B=B A)}.
 164
 165 Alternate syntax for the RENAME subcommand allows the parentheses to be
 166 eliminated.  When this is done, only a single variable may be renamed at
 167 once.  For instance, @samp{/RENAME=A=B}.  This alternate syntax is
 168 deprecated.
 169
 170 DROP, KEEP, and RENAME are executed in left-to-right order.
 171 Each may be present any number of times.  @cmd{GET} never modifies a
 172 file on disk.  Only the active dataset read from the file
 173 is affected by these subcommands.
 174
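The left-to-right ordering can be sketched as follows.  The file name
and the variable names (@samp{ssn}, @samp{q1}, @samp{q2},
@samp{region}) are hypothetical.

@example
GET FILE='survey.sav'
        /DROP=ssn
        /RENAME=(q1 q2=age income)
        /KEEP=age income region.
@end example

Because RENAME is executed before KEEP here, the KEEP list should
refer to the new names @samp{age} and @samp{income}.  The file
@samp{survey.sav} itself is left unchanged.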
 175 PSPP tries to automatically detect the encoding of string data in the
 176 file.  Sometimes, however, this does not work well,
 177 especially for files written by old versions of SPSS or PSPP.  Specify
 178 the ENCODING subcommand with an IANA character set name as its string
 179 argument to override the default.  The ENCODING subcommand is a PSPP
 180 extension.
 181
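A minimal sketch of overriding the detected encoding, assuming a
hypothetical file @samp{legacy.sav} whose strings are known to be in
the IANA character set @samp{windows-1252}:

@example
GET FILE='legacy.sav' /ENCODING='windows-1252'.
@end example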
 182 @cmd{GET} does not cause the data to be read, only the dictionary.  The data
 183 is read later, when a procedure is executed.
 184
 185 Use of @cmd{GET} to read a portable file is a PSPP extension.
 186
 187 @node GET DATA
 188 @section GET DATA
 189 @vindex GET DATA
 190
 191 @display
 192 GET DATA
 193         /TYPE=@{GNM,ODS,PSQL,TXT@}
 194         @dots{}additional subcommands depending on TYPE@dots{}
 195 @end display
 196
 197 The @cmd{GET DATA} command is used to read files and other data
 198 sources created by other applications.  When this command is executed,
 199 the current dictionary and active dataset are replaced with variables
 200 and data read from the specified source.
 201
 202 The TYPE subcommand is mandatory and must be the first subcommand
 203 specified.  It determines the type of the file or source to read.
 204 PSPP currently supports the following file types:
 205
 206 @table @asis
 207 @item GNM
 208 Spreadsheet files created by Gnumeric (@url{http://gnumeric.org}).
 209
 210 @item ODS
 211 Spreadsheet files in OpenDocument format.
 212
 213 @item PSQL
 214 Relations from PostgreSQL databases (@url{http://postgresql.org}).
 215
 216 @item TXT
 217 Textual data files in columnar and delimited formats.
 218 @end table
 219
 220 Each supported file type has additional subcommands, explained in
 221 separate sections below.
 222
 223 @menu
 224 * GET DATA /TYPE=GNM/ODS::     Spreadsheets
 225 * GET DATA /TYPE=PSQL::        Databases
 226 * GET DATA /TYPE=TXT::         Delimited Text Files
 227 @end menu
 228
 229 @node GET DATA /TYPE=GNM/ODS
 230 @subsection Spreadsheet Files
 231
 232 @display
 233 GET DATA /TYPE=@{GNM, ODS@}
 234         /FILE=@{'file-name'@}
 235         /SHEET=@{NAME 'sheet-name', INDEX n@}
 236         /CELLRANGE=@{RANGE 'range', FULL@}
 237         /READNAMES=@{ON, OFF@}
 238         /ASSUMEDVARWIDTH=n.
 239 @end display
 240
 241 @cindex Gnumeric
 242 @cindex OpenDocument
 243 @cindex spreadsheet files
 244
 245 Gnumeric spreadsheets (@url{http://gnumeric.org}), and spreadsheets
 246 in OpenDocument format
 247 (@url{http://libreplanet.org/wiki/Group:OpenDocument/Software})
 248 can be read using the GET DATA command.
 249 Use the TYPE subcommand to indicate the file's format.
 250 /TYPE=GNM indicates Gnumeric files;
 251 /TYPE=ODS indicates OpenDocument.
 252 The FILE subcommand is mandatory.
 253 Use it to specify the name of the file to be read.
 254 All other subcommands are optional.
 255
 256 The format of each variable is determined by the format of the spreadsheet
 257 cell containing the first datum for the variable.
 258 If this cell is of string (text) format, then the width of the variable is
 259 determined from the length of the string it contains, unless the
 260 ASSUMEDVARWIDTH subcommand is given.
 261
 262 The SHEET subcommand specifies the sheet within the spreadsheet file to read.
 263 There are two forms of the SHEET subcommand.
 264 In the first form,
 265 @samp{/SHEET=name @var{sheet-name}}, the string @var{sheet-name} is the
 266 name of the sheet to read.
 267 In the second form, @samp{/SHEET=index @var{idx}}, @var{idx} is an
 268 integer that is the index of the sheet to read.
 269 The first sheet has the index 1.
 270 If the SHEET subcommand is omitted, then the command will read the
 271 first sheet in the file.
 272
 273 The CELLRANGE subcommand specifies the range of cells within the sheet to read.
 274 If the subcommand is given as @samp{/CELLRANGE=FULL}, then the entire
 275 sheet is read.
 276 To read only part of a sheet, use the form
 277 @samp{/CELLRANGE=range '@var{top-left-cell}:@var{bottom-right-cell}'}.
 278 For example, the subcommand @samp{/CELLRANGE=range 'C3:P19'} reads
 279 columns C--P, and rows 3--19 inclusive.
 280 If no CELLRANGE subcommand is given, then the entire sheet is read.
 281
 282 If @samp{/READNAMES=ON} is specified, then the contents of cells of
 283 the first row are used as the names of the variables in which to store
 284 the data from subsequent rows.  This is the default.
 285 If @samp{/READNAMES=OFF} is
 286 used, then the variables receive automatically assigned names.
 287
 288 The ASSUMEDVARWIDTH subcommand specifies the maximum width of string
 289 variables read from the file.
 290 If omitted, the default value is determined from the length of the
 291 string in the first spreadsheet cell for each variable.
 292
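The subcommands above can be combined as in the following sketch.  The
file name @samp{budget.ods} and the sheet name @samp{Q1} are
hypothetical; the cell range is the one used in the text above.

@example
GET DATA /TYPE=ODS
        /FILE='budget.ods'
        /SHEET=NAME 'Q1'
        /CELLRANGE=RANGE 'C3:P19'
        /READNAMES=ON
        /ASSUMEDVARWIDTH=40.
@end example

Here variable names are taken from the first row that is read, and
string variables are limited to a width of 40.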
 293
 294 @node GET DATA /TYPE=PSQL
 295 @subsection Postgres Database Queries
 296
 297 @display
 298 GET DATA /TYPE=PSQL
 299          /CONNECT=@{connection info@}
 300          /SQL=@{query@}
 301          [/ASSUMEDVARWIDTH=n]
 302          [/UNENCRYPTED]
 303          [/BSIZE=n].
 304 @end display
 305
 306 @cindex postgres
 307 @cindex databases
 308
 309 The PSQL type is used to import data from a PostgreSQL database server.
 310 The server may be located locally or remotely.
 311 Variables are automatically created based on the table column names
 312 or the names specified in the SQL query.
 313 PostgreSQL data types of high precision lose precision when
 314 imported into PSPP.
 315 Not all PostgreSQL data types can be represented in PSPP.
 316 If a datum cannot be represented, a warning is issued and that
 317 datum is set to SYSMIS.
 318
 319 The CONNECT subcommand is mandatory.
 320 It is a string specifying the parameters of the database server from
 321 which the data should be fetched.
 322 The format of the string is given in the postgres manual
 323 @url{http://www.postgresql.org/docs/8.0/static/libpq.html#LIBPQ-CONNECT}.
 324
 325 The SQL subcommand is mandatory.
 326 It must be a valid SQL string to retrieve data from the database.
 327
 328 The ASSUMEDVARWIDTH subcommand specifies the maximum width of string
 329 variables read from the database.
 330 If omitted, the default value is determined from the length of the
 331 string in the first value read for each variable.
 332
 333 The UNENCRYPTED subcommand allows data to be retrieved over an insecure
 334 connection.
 335 If the connection is not encrypted, and the UNENCRYPTED subcommand is not
 336 given, then an error will occur.
 337 Whether or not the connection is
 338 encrypted depends upon the underlying psql library and the
 339 capabilities of the database server.
 340
 341 The BSIZE subcommand serves only to optimise the speed of data transfer.
 342 It specifies an upper limit on the
 343 number of cases to fetch from the database at once.
 344 The default value is 4096.
 345 If your SQL statement fetches a large number of cases but only a small number of
 346 variables, then the data transfer may be faster if you increase this value.
 347 Conversely, if the number of variables is large, or if the machine on which
 348 PSPP is running has only a
 349 small amount of memory, then a smaller value will be better.
 350
 351
 352 The following syntax is an example:
 353 @example
 354 GET DATA /TYPE=PSQL
 355      /CONNECT='host=example.com port=5432 dbname=product user=fred password=xxxx'
 356      /SQL='select * from manufacturer'.
 357 @end example
 358
 359
 360 @node GET DATA /TYPE=TXT
 361 @subsection Textual Data Files
 362
 363 @display
 364 GET DATA /TYPE=TXT
 365         /FILE=@{'file-name',file_handle@}
 366         [/ARRANGEMENT=@{DELIMITED,FIXED@}]
 367         [/FIRSTCASE=@{first_case@}]
 368         [/IMPORTCASE=@{ALL,FIRST max_cases,PERCENT percent@}]
 369         @dots{}additional subcommands depending on ARRANGEMENT@dots{}
 370 @end display
 371
 372 @cindex text files
 373 @cindex data files
 374 When TYPE=TXT is specified, GET DATA reads data in a delimited or
 375 fixed columnar format, much like DATA LIST (@pxref{DATA LIST}).
 376
 377 The FILE subcommand is mandatory.  Specify the file to be read as
 378 a string file name or (for textual data
 379 only) a file handle (@pxref{File Handles}).
 380
 381 The ARRANGEMENT subcommand determines the file's basic format.
 382 DELIMITED, the default setting, specifies that fields in the input
 383 data are separated by spaces, tabs, or other user-specified
 384 delimiters.  FIXED specifies that fields in the input data appear at
 385 particular fixed column positions within records of a case.
 386
 387 By default, cases are read from the input file starting from the first
 388 line.  To skip lines at the beginning of an input file, set FIRSTCASE
 389 to the number of the first line to read: 2 to skip the first line, 3
 390 to skip the first two lines, and so on.
 391
 392 IMPORTCASE can be used to limit the number of cases read from the
 393 input file.  With the default setting, ALL, all cases in the file are
 394 read.  Specify FIRST @i{max_cases} to read at most @i{max_cases} cases
 395 from the file.  Use PERCENT @i{percent} to read only @i{percent}
 396 percent, approximately, of the cases contained in the file.  (The
 397 percentage is approximate, because there is no way to accurately count
 398 the number of cases in the file without reading the entire file.  The
 399 number of cases in some kinds of unusual files cannot be estimated;
 400 PSPP will read all cases in such files.)
 401
 402 FIRSTCASE and IMPORTCASE may be used with delimited and fixed-format
 403 data.  The remaining subcommands, which apply only to one of the two file
 404 arrangements, are described below.
 405
 406 @menu
 407 * GET DATA /TYPE=TXT /ARRANGEMENT=DELIMITED::
 408 * GET DATA /TYPE=TXT /ARRANGEMENT=FIXED::
 409 @end menu
 410
 411 @node GET DATA /TYPE=TXT /ARRANGEMENT=DELIMITED
 412 @subsubsection Reading Delimited Data
 413
 414 @display
 415 GET DATA /TYPE=TXT
 416         /FILE=@{'file-name',file_handle@}
 417         [/ARRANGEMENT=@{DELIMITED,FIXED@}]
 418         [/FIRSTCASE=@{first_case@}]
 419         [/IMPORTCASE=@{ALL,FIRST max_cases,PERCENT percent@}]
 420
 421         /DELIMITERS="delimiters"
 422         [/QUALIFIER="quotes" [/ESCAPE]]
 423         [/DELCASE=@{LINE,VARIABLES n_variables@}]
 424         /VARIABLES=del_var [del_var]@dots{}
 425 where each del_var takes the form:
 426         variable format
 427 @end display
 428
 429 The GET DATA command with TYPE=TXT and ARRANGEMENT=DELIMITED reads
 430 input data from text files in delimited format, where fields are
 431 separated by a set of user-specified delimiters.  Its capabilities are
 432 similar to those of DATA LIST FREE (@pxref{DATA LIST FREE}), with a
 433 few enhancements.
 434
 435 The required FILE subcommand and optional FIRSTCASE and IMPORTCASE
 436 subcommands are described above (@pxref{GET DATA /TYPE=TXT}).
 437
 438 DELIMITERS, which is required, specifies the set of characters that
 439 may separate fields.  Each character in the string specified on
 440 DELIMITERS separates one field from the next.  The end of a line also
 441 separates fields, regardless of DELIMITERS.  Two consecutive
 442 delimiters in the input yield an empty field, as does a delimiter at
 443 the end of a line.  A space character as a delimiter is an exception:
 444 consecutive spaces do not yield an empty field and neither does any
 445 number of spaces at the end of a line.
 446
 447 To use a tab as a delimiter, specify @samp{\t} at the beginning of the
 448 DELIMITERS string.  To use a backslash as a delimiter, specify
 449 @samp{\\} as the first delimiter or, if a tab should also be a
 450 delimiter, immediately following @samp{\t}.  To read a data file in
 451 which each field appears on a separate line, specify the empty string
 452 for DELIMITERS.
 453
 454 The optional QUALIFIER subcommand names one or more characters that
 455 can be used to quote values within fields in the input.  A field that
 456 begins with one of the specified quote characters ends at the next
 457 matching quote.  Intervening delimiters become part of the field,
 458 instead of terminating it.  The ability to specify more than one quote
 459 character is a PSPP extension.
 460
 461 By default, a character specified on QUALIFIER cannot itself be
 462 embedded within a field that it quotes, because the quote character
 463 always terminates the quoted field.  With ESCAPE, however, a doubled
 464 quote character within a quoted field inserts a single instance of the
 465 quote into the field.  For example, if @samp{'} is specified on
 466 QUALIFIER, then without ESCAPE @code{'a''b'} specifies a pair of
 467 fields that contain @samp{a} and @samp{b}, but with ESCAPE it
 468 specifies a single field that contains @samp{a'b}.  ESCAPE is a PSPP
 469 extension.
 470
 471 The DELCASE subcommand controls how data may be broken across lines in
 472 the data file.  With LINE, the default setting, each line must contain
 473 all the data for exactly one case.  For additional flexibility, to
 474 allow a single case to be split among lines or multiple cases to be
 475 contained on a single line, specify VARIABLES @i{n_variables}, where
 476 @i{n_variables} is the number of variables per case.
 477
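As a sketch of the VARIABLES form of DELCASE, suppose a hypothetical
file @samp{readings.txt} holds space-separated numbers that wrap across
lines arbitrarily, with three values per case:

@example
GET DATA /TYPE=TXT /FILE='readings.txt' /DELIMITERS=' '
        /DELCASE=VARIABLES 3
        /VARIABLES=x F8.2
                   y F8.2
                   z F8.2.
@end example

With this setting, every three fields form one case regardless of
where the line breaks fall.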
 478 The VARIABLES subcommand is required and must be the last subcommand.
 479 Specify the name of each variable and its input format (@pxref{Input
 480 and Output Formats}) in the order they should be read from the input
 481 file.
 482
 483 @subsubheading Examples
 484
 485 @noindent
 486 On a Unix-like system, the @samp{/etc/passwd} file has a format
 487 similar to this:
 488
 489 @example
 490 root:$1$nyeSP5gD$pDq/:0:0:,,,:/root:/bin/bash
 491 blp:$1$BrP/pFg4$g7OG:1000:1000:Ben Pfaff,,,:/home/blp:/bin/bash
 492 john:$1$JBuq/Fioq$g4A:1001:1001:John Darrington,,,:/home/john:/bin/bash
 493 jhs:$1$D3li4hPL$88X1:1002:1002:Jason Stover,,,:/home/jhs:/bin/csh
 494 @end example
 495
 496 @noindent
 497 The following syntax reads a file in the format used by
 498 @samp{/etc/passwd}:
 499
 500 @c If you change this example, change the regression test in
 501 @c tests/language/data-io/get-data.at to match.
 502 @example
 503 GET DATA /TYPE=TXT /FILE='/etc/passwd' /DELIMITERS=':'
 504         /VARIABLES=username A20
 505                    password A40
 506                    uid F10
 507                    gid F10
 508                    gecos A40
 509                    home A40
 510                    shell A40.
 511 @end example
 512
 513 @noindent
 514 Consider the following data on used cars:
 515
 516 @example
 517 model   year    mileage price   type    age
 518 Civic   2002    29883   15900   Si      2
 519 Civic   2003    13415   15900   EX      1
 520 Civic   1992    107000  3800    n/a     12
 521 Accord  2002    26613   17900   EX      1
 522 @end example
 523
 524 @noindent
 525 The following syntax can be used to read the used car data:
 526
 527 @c If you change this example, change the regression test in
 528 @c tests/language/data-io/get-data.at to match.
 529 @example
 530 GET DATA /TYPE=TXT /FILE='cars.data' /DELIMITERS=' ' /FIRSTCASE=2
 531         /VARIABLES=model A8
 532                    year F4
 533                    mileage F6
 534                    price F5
 535                    type A4
 536                    age F2.
 537 @end example
 538
 539 @noindent
 540 Consider the following information on animals in a pet store:
 541
 542 @example
 543 'Pet''s Name', "Age", "Color", "Date Received", "Price", "Height", "Type"
 544 , (Years), , , (Dollars), ,
 545 "Rover", 4.5, Brown, "12 Feb 2004", 80, '1''4"', "Dog"
 546 "Charlie", , Gold, "5 Apr 2007", 12.3, "3""", "Fish"
 547 "Molly", 2, Black, "12 Dec 2006", 25, '5"', "Cat"
 548 "Gilly", , White, "10 Apr 2007", 10, "3""", "Guinea Pig"
 549 @end example
 550
 551 @noindent
 552 The following syntax can be used to read the pet store data:
 553
 554 @c If you change this example, change the regression test in
 555 @c tests/language/data-io/get-data.at to match.
 556 @example
 557 GET DATA /TYPE=TXT /FILE='pets.data' /DELIMITERS=', ' /QUALIFIER='''"' /ESCAPE
 558         /FIRSTCASE=3
 559         /VARIABLES=name A10
 560                    age F3.1
 561                    color A5
 562                    received EDATE10
 563                    price F5.2
 564                    height a5
 565                    type a10.
 566 @end example
 567
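@noindent
As a further sketch, the following syntax reads a tab-delimited file
with one header line and stops after at most 100 cases.  The file name
@samp{scores.tsv} and the variable names are hypothetical.

@example
GET DATA /TYPE=TXT /FILE='scores.tsv' /DELIMITERS='\t' /FIRSTCASE=2
        /IMPORTCASE=FIRST 100
        /VARIABLES=id F8
                   name A20
                   score F5.1.
@end example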
 568 @node GET DATA /TYPE=TXT /ARRANGEMENT=FIXED
 569 @subsubsection Reading Fixed Columnar Data
 570
 571 @display
 572 GET DATA /TYPE=TXT
 573         /FILE=@{'file-name',file_handle@}
 574         [/ARRANGEMENT=@{DELIMITED,FIXED@}]
 575         [/FIRSTCASE=@{first_case@}]
 576         [/IMPORTCASE=@{ALL,FIRST max_cases,PERCENT percent@}]
 577
 578         [/FIXCASE=n]
 579         /VARIABLES fixed_var [fixed_var]@dots{}
 580             [/rec# fixed_var [fixed_var]@dots{}]@dots{}
 581 where each fixed_var takes the form:
 582         variable start-end format
 583 @end display
 584
 585 The GET DATA command with TYPE=TXT and ARRANGEMENT=FIXED reads input
 586 data from text files in fixed format, where each field is located in
 587 particular fixed column positions within records of a case.  Its
 588 capabilities are similar to those of DATA LIST FIXED (@pxref{DATA LIST
 589 FIXED}), with a few enhancements.
 590
 591 The required FILE subcommand and optional FIRSTCASE and IMPORTCASE
 592 subcommands are described above (@pxref{GET DATA /TYPE=TXT}).
 593
 594 The optional FIXCASE subcommand may be used to specify the positive
 595 integer number of input lines that make up each case.  The default
 596 value is 1.
 597
 598 The VARIABLES subcommand, which is required, specifies the positions
 599 at which each variable can be found.  For each variable, specify its
 600 name, followed by its start and end column separated by @samp{-}
 601 (e.g.@: @samp{0-9}), followed by an input format type (e.g.@:
 602 @samp{F}) or a full format specification (e.g.@: @samp{DOLLAR12.2}).
 603 For this command, columns are numbered starting from 0 at
 604 the left column.  Introduce the variables in the second and later
 605 lines of a case by a slash followed by the number of the line within
 606 the case, e.g.@: @samp{/2} for the second line.
 607
 608 @subsubheading Examples
 609
 610 @noindent
 611 Consider the following data on used cars:
 612
 613 @example
 614 model   year    mileage price   type    age
 615 Civic   2002    29883   15900   Si      2
 616 Civic   2003    13415   15900   EX      1
 617 Civic   1992    107000  3800    n/a     12
 618 Accord  2002    26613   17900   EX      1
 619 @end example
 620
 621 @noindent
 622 The following syntax can be used to read the used car data:
 623
 624 @c If you change this example, change the regression test in
 625 @c tests/language/data-io/get-data.at to match.
 626 @example
 627 GET DATA /TYPE=TXT /FILE='cars.data' /ARRANGEMENT=FIXED /FIRSTCASE=2
 628         /VARIABLES=model 0-7 A
 629                    year 8-15 F
 630                    mileage 16-23 F
 631                    price 24-31 F
 632                    type 32-39 A
 633                    age 40-47 F.
 634 @end example
 635
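@noindent
When each case spans more than one line, FIXCASE and the slash
notation described above can be combined.  The following sketch
assumes a hypothetical file @samp{patients.txt} in which every case
occupies two lines; the variable names and column positions are
illustrative only.

@example
GET DATA /TYPE=TXT /FILE='patients.txt' /ARRANGEMENT=FIXED /FIXCASE=2
        /VARIABLES=id 0-5 F
                   name 6-25 A
                   /2 weight 0-5 F
                      height 6-11 F.
@end example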
 636 @node IMPORT
 637 @section IMPORT
 638 @vindex IMPORT
 639
 640 @display
 641 IMPORT
 642         /FILE='file-name'
 643         /TYPE=@{COMM,TAPE@}
 644         /DROP=var_list
 645         /KEEP=var_list
 646         /RENAME=(src_names=target_names)@dots{}
 647 @end display
 648
 649 The @cmd{IMPORT} transformation clears the active dataset dictionary and
 650 data and
 651 replaces them with a dictionary and data from a system file or
 652 portable file.
 653
 654 The FILE subcommand, which is the only required subcommand, specifies
 655 the portable file to be read as a file name string or a file handle
 656 (@pxref{File Handles}).
 657
 658 The TYPE subcommand is currently not used.
 659
 660 DROP, KEEP, and RENAME follow the syntax used by @cmd{GET} (@pxref{GET}).
 661
 662 @cmd{IMPORT} does not cause the data to be read, only the dictionary.  The
 663 data is read later, when a procedure is executed.
 664
 665 Use of @cmd{IMPORT} to read a system file is a PSPP extension.
 666
 667 @node SAVE
 668 @section SAVE
 669 @vindex SAVE
 670
 671 @display
 672 SAVE
 673         /OUTFILE=@{'file-name',file_handle@}
 674         /UNSELECTED=@{RETAIN,DELETE@}
 675         /@{COMPRESSED,UNCOMPRESSED@}
 676         /PERMISSIONS=@{WRITEABLE,READONLY@}
 677         /DROP=var_list
 678         /KEEP=var_list
 679         /VERSION=version
 680         /RENAME=(src_names=target_names)@dots{}
 681         /NAMES
 682         /MAP
 683 @end display
 684
 685 The @cmd{SAVE} procedure causes the dictionary and data in the active
 686 dataset to
 687 be written to a system file.
 688
 689 OUTFILE is the only required subcommand.  Specify the system file
 690 to be written as a string file name or a file handle
 691 (@pxref{File Handles}).
 692
 693 By default, cases excluded with FILTER are written to the system file.
 694 These can be excluded by specifying DELETE on the UNSELECTED
 695 subcommand.  Specifying RETAIN makes the default explicit.
 696
 697 The COMPRESS and UNCOMPRESS subcommand determine whether the saved
 698 system file is compressed.  By default, system files are compressed.
 699 This default can be changed with the SET command (@pxref{SET}).
 700
 701 The PERMISSIONS subcommand specifies permissions for the new system
 702 file.  WRITEABLE, the default, creates the file with read and write
 703 permission.  READONLY creates the file for read-only access.
 704
 705 By default, all the variables in the active dataset dictionary are written
 706 to the system file.  The DROP subcommand can be used to specify a list
 707 of variables not to be written.  In contrast, KEEP specifies variables
 708 to be written, with all other variables omitted.
 709
 710 Normally variables are saved to a system file under the same names they
 711 have in the active dataset.  Use the RENAME subcommand to change these names.
 712 Specify, within parentheses, a list of variable names followed by an
 713 equals sign (@samp{=}) and the names that they should be renamed to.
 714 Multiple parenthesized groups of variable names can be included on a
 715 single RENAME subcommand.  Variables' names may be swapped using a
 716 RENAME subcommand of the form @samp{/RENAME=(A B=B A)}.
 717
 718 Alternate syntax for the RENAME subcommand allows the parentheses to be
 719 eliminated.  When this is done, only a single variable may be renamed at
 720 once.  For instance, @samp{/RENAME=A=B}.  This alternate syntax is
 721 deprecated.
 722
 723 DROP, KEEP, and RENAME are performed in left-to-right order.  They
 724 each may be present any number of times.  @cmd{SAVE} never modifies
 725 the active dataset.  DROP, KEEP, and RENAME only affect the system file
 726 written to disk.
 727
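A short sketch that combines several of these subcommands follows.
The file name @samp{subset.sav} and the variable names are
hypothetical.

@example
SAVE OUTFILE='subset.sav'
        /UNSELECTED=DELETE
        /DROP=ssn
        /RENAME=(q1 q2=age income)
        /PERMISSIONS=READONLY.
@end example

The active dataset keeps all of its variables and their original
names; only the contents of @samp{subset.sav} reflect DROP and RENAME.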
 728 The VERSION subcommand specifies the version of the file format. Valid
 729 versions are 2 and 3.  The default version is 3.  In version 2 system
 730 files, variable names longer than 8 bytes will be truncated.  The two
 731 versions are otherwise identical.
 732
 733 The NAMES and MAP subcommands are currently ignored.
 734
 735 @cmd{SAVE} causes the data to be read.  It is a procedure.
 736
 737 @node SAVE TRANSLATE
 738 @section SAVE TRANSLATE
 739 @vindex SAVE TRANSLATE
 740
 741 @display
 742 SAVE TRANSLATE
 743         /OUTFILE=@{'file-name',file_handle@}
 744         /TYPE=@{CSV,TAB@}
 745         [/REPLACE]
 746         [/MISSING=@{IGNORE,RECODE@}]
 747
 748         [/DROP=var_list]
 749         [/KEEP=var_list]
 750         [/RENAME=(src_names=target_names)@dots{}]
 751         [/UNSELECTED=@{RETAIN,DELETE@}]
 752         [/MAP]
 753
 754         @dots{}additional subcommands depending on TYPE@dots{}
 755 @end display
 756
 757 The @cmd{SAVE TRANSLATE} command is used to save data into various
 758 formats understood by other applications.
 759
 760 The OUTFILE and TYPE subcommands are mandatory.  OUTFILE specifies the
 761 file to be written, as a string file name or a file handle
 762 (@pxref{File Handles}).  TYPE determines the type of the file
 763 to write.  It must be one of the following:
 764
 765 @table @asis
 766 @item CSV
 767 Comma-separated value format.
 768
 769 @item TAB
 770 Tab-delimited format.
 771 @end table
 772
 773 By default, SAVE TRANSLATE will not overwrite an existing file.  Use
 774 REPLACE to force an existing file to be overwritten.
 775
 776 With MISSING=IGNORE, the default, SAVE TRANSLATE treats user-missing
 777 values as if they were not missing.  Specify MISSING=RECODE to output
 778 numeric user-missing values like system-missing values and string
 779 user-missing values as all spaces.
 780
 781 By default, all the variables in the active dataset dictionary are saved
 782 to the output file, but DROP or KEEP can select a subset of variables
 783 to save.  The RENAME subcommand can also be used to change the names
 784 under which variables are saved.  UNSELECTED determines whether cases
 785 filtered out by the FILTER command are written to the output file.
 786 These subcommands have the same syntax and meaning as on the
 787 @cmd{SAVE} command (@pxref{SAVE}).
 788
 789 Each supported file type has additional subcommands, explained in
 790 separate sections below.
 791
 792 @cmd{SAVE TRANSLATE} causes the data to be read.  It is a procedure.
 793
 794 @menu
 795 * SAVE TRANSLATE /TYPE=CSV and TYPE=TAB::
 796 @end menu
 797
 798 @node SAVE TRANSLATE /TYPE=CSV and TYPE=TAB
 799 @subsection Writing Comma- and Tab-Separated Data Files
 800
 801 @display
 802 SAVE TRANSLATE
 803         /OUTFILE=@{'file-name',file_handle@}
 804         /TYPE=CSV
 805         [/REPLACE]
 806         [/MISSING=@{IGNORE,RECODE@}]
 807
 808         [/DROP=var_list]
 809         [/KEEP=var_list]
 810         [/RENAME=(src_names=target_names)@dots{}]
 811         [/UNSELECTED=@{RETAIN,DELETE@}]
 812
 813         [/FIELDNAMES]
 814         [/CELLS=@{VALUES,LABELS@}]
 815         [/TEXTOPTIONS DELIMITER='delimiter']
 816         [/TEXTOPTIONS QUALIFIER='qualifier']
 817         [/TEXTOPTIONS DECIMAL=@{DOT,COMMA@}]
 818         [/TEXTOPTIONS FORMAT=@{PLAIN,VARIABLE@}]
 819 @end display
 820
 821 The SAVE TRANSLATE command with TYPE=CSV or TYPE=TAB writes data in a
 822 comma- or tab-separated value format similar to that described by
 823 RFC@tie{}4180.  Each variable becomes one output column, and each case
 824 becomes one line of output.  If FIELDNAMES is specified, an additional
 825 line at the top of the output file lists variable names.
 826
 827 The CELLS and TEXTOPTIONS FORMAT settings determine how values are
 828 written to the output file:
 829
 830 @table @asis
 831 @item CELLS=VALUES FORMAT=PLAIN (the default settings)
 832 Writes variables to the output in ``plain'' formats that ignore the
 833 details of variable formats.  Numeric values are written as plain
 834 decimal numbers with enough digits to indicate their exact values in
 835 machine representation.  Numeric values include @samp{e} followed by
 836 an exponent if the exponent value would be less than -4 or greater
 837 than 16.  Dates are written in MM/DD/YYYY format and times in HH:MM:SS
 838 format.  WKDAY and MONTH values are written as decimal numbers.
 839
 840 Numeric values use, by default, the decimal point character set with
 841 SET DECIMAL (@pxref{SET DECIMAL}).  Use DECIMAL=DOT or DECIMAL=COMMA
 842 to force a particular decimal point character.
 843
 844 @item CELLS=VALUES FORMAT=VARIABLE
 845 Writes variables using their print formats.  Leading and trailing
 846 spaces are removed from numeric values, and trailing spaces are
 847 removed from string values.
 848
 849 @item CELLS=LABELS FORMAT=PLAIN
 850 @itemx CELLS=LABELS FORMAT=VARIABLE
 851 Writes value labels where they exist, and otherwise writes the values
 852 themselves as described above.
 853 @end table
 854
 855 Regardless of CELLS and TEXTOPTIONS FORMAT, numeric system-missing
 856 values are output as a single space.
 857
 858 For TYPE=TAB, tab characters delimit values.  For TYPE=CSV, the
 859 TEXTOPTIONS DELIMITER and DECIMAL settings determine the character
 860 that separates values within a line.  If DELIMITER is specified, then
 861 the specified string separates values.  If DELIMITER is not specified,
 862 then the default is a comma with DECIMAL=DOT or a semicolon with
 863 DECIMAL=COMMA.  If DECIMAL is not given either, it is implied by the
 864 decimal point character set with SET DECIMAL (@pxref{SET DECIMAL}).
 865
 866 The TEXTOPTIONS QUALIFIER setting specifies a character that is output
 867 before and after a value that contains the delimiter character or the
 868 qualifier character.  The default is a double quote (@samp{"}).  A
 869 qualifier character that appears within a value is doubled.
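
The following is an illustrative sketch rather than a recommended
recipe.  It combines several of the subcommands described above to
write a semicolon-delimited file with a header line, value labels in
place of values where labels exist, and a comma as the decimal point.
The file name @file{survey.csv} is a placeholder.

@example
SAVE TRANSLATE
        /OUTFILE='survey.csv'
        /TYPE=CSV
        /REPLACE
        /MISSING=RECODE
        /FIELDNAMES
        /CELLS=LABELS
        /TEXTOPTIONS DELIMITER=';'
        /TEXTOPTIONS DECIMAL=COMMA.
@end example

Because DECIMAL=COMMA already implies a semicolon delimiter, the
DELIMITER setting here only makes that choice explicit.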
 870
 871 @node SYSFILE INFO
 872 @section SYSFILE INFO
 873 @vindex SYSFILE INFO
 874
 875 @display
 876 SYSFILE INFO FILE='file-name'.
 877 @end display
 878
 879 @cmd{SYSFILE INFO} reads the dictionary in a system file and
 880 displays the information in its dictionary.
 881
 882 Specify the system file to examine, either as a file name string or as
 883 a file handle (@pxref{File Handles}).
 884
 885 @cmd{SYSFILE INFO} does not affect the current active dataset.
 886
 887 @node XEXPORT
 888 @section XEXPORT
 889 @vindex XEXPORT
 890
 891 @display
 892 XEXPORT
 893         /OUTFILE='file-name'
 894         /DIGITS=n
 895         /DROP=var_list
 896         /KEEP=var_list
 897         /RENAME=(src_names=target_names)@dots{}
 898         /TYPE=@{COMM,TAPE@}
 899         /MAP
 900 @end display
 901
 902 The @cmd{XEXPORT} transformation writes the active dataset dictionary and
 903 data to a specified portable file.
 904
 905 This transformation is a PSPP extension.
 906
 907 It is similar to the @cmd{EXPORT} procedure, with two differences:
 908
 909 @itemize
 910 @item
 911 @cmd{XEXPORT} is a transformation, not a procedure.  It is executed when
 912 the data is read by a procedure or procedure-like command.
 913
 914 @item
 915 @cmd{XEXPORT} does not support the UNSELECTED subcommand.
 916 @end itemize
 917
 918 @xref{EXPORT}, for more information.
 919
 920 @node XSAVE
 921 @section XSAVE
 922 @vindex XSAVE
 923
 924 @display
 925 XSAVE
 926         /OUTFILE='file-name'
 927         /@{COMPRESSED,UNCOMPRESSED@}
 928         /PERMISSIONS=@{WRITEABLE,READONLY@}
 929         /DROP=var_list
 930         /KEEP=var_list
 931         /VERSION=version
 932         /RENAME=(src_names=target_names)@dots{}
 933         /NAMES
 934         /MAP
 935 @end display
 936
 937 The @cmd{XSAVE} transformation writes the active dataset's dictionary and
 938 data to a system file.  It is similar to the @cmd{SAVE}
 939 procedure, with two differences:
 940
 941 @itemize
 942 @item
 943 @cmd{XSAVE} is a transformation, not a procedure.  It is executed when
 944 the data is read by a procedure or procedure-like command.
 945
 946 @item
 947 @cmd{XSAVE} does not support the UNSELECTED subcommand.
 948 @end itemize
 949
 950 @xref{SAVE}, for more information.
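
As a sketch of how a transformation such as @cmd{XSAVE} takes effect,
consider the following sequence.  The file names and the variable names
@code{height} and @code{weight} are placeholders chosen for
illustration.  @cmd{XSAVE} by itself writes nothing; the system file is
written during the pass over the data that the later procedure, here
@cmd{DESCRIPTIVES}, performs.

@example
GET FILE='input.sav'.
XSAVE OUTFILE='snapshot.sav'
        /KEEP=height weight.
DESCRIPTIVES VARIABLES=height weight.
@end example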