   1 @node System and Portable File IO
   2 @chapter System and Portable File I/O
   3
   4 The commands in this chapter read, write, and examine system files and
   5 portable files.
   6
   7 @menu
   8 * APPLY DICTIONARY::            Apply system file dictionary to active dataset.
   9 * EXPORT::                      Write to a portable file.
  10 * GET::                         Read from a system file.
  11 * GET DATA::                    Read from foreign files.
  12 * IMPORT::                      Read from a portable file.
  13 * SAVE::                        Write to a system file.
  14 * SAVE TRANSLATE::              Write data in foreign file formats.
  15 * SYSFILE INFO::                Display system file dictionary.
  16 * XEXPORT::                     Write to a portable file, as a transformation.
  17 * XSAVE::                       Write to a system file, as a transformation.
  18 @end menu
  19
  20 @node APPLY DICTIONARY
  21 @section APPLY DICTIONARY
  22 @vindex APPLY DICTIONARY
  23
  24 @display
  25 APPLY DICTIONARY FROM=@{'@var{file_name}',@var{file_handle}@}.
  26 @end display
  27
  28 @cmd{APPLY DICTIONARY} applies the variable labels, value labels,
  29 and missing values taken from a file to corresponding
  30 variables in the active dataset.  In some cases it also updates the
  31 weighting variable.
  32
  33 Specify a system file or portable file's name, a data set name
  34 (@pxref{Datasets}), or a file handle name (@pxref{File Handles}).  The
  35 dictionary in the file will be read, but it will not replace the
  36 active dataset's dictionary.  The file's data will not be read.
  37
  38 Only variables with names that exist in both the active dataset and the
  39 system file are considered.  Variables with the same name but different
  40 types (numeric, string) will cause an error message.  Otherwise, the
  41 system file variables' attributes will replace those in their matching
  42 active dataset variables:
  43
  44 @itemize @bullet
  45 @item
  46 If a system file variable has a variable label, then it will replace
  47 the variable label of the active dataset variable.  If the system
  48 file variable does not have a variable label, then the active dataset
  49 variable's variable label, if any, will be retained.
  50
  51 @item
  52 If the system file variable has custom attributes (@pxref{VARIABLE
  53 ATTRIBUTE}), then those attributes replace the active dataset variable's
  54 custom attributes.  If the system file variable does not have custom
  55 attributes, then the active dataset variable's custom attributes, if any,
  56 will be retained.
  57
  58 @item
If the active dataset variable is numeric or short string, then value
labels and missing values, if any, will be copied from the system file
variable to the active dataset variable.  If the system file variable
does not have value labels or missing values, then those in the active
dataset variable, if any, will not be disturbed.
  64 @end itemize
  65
In addition to properties of variables, some properties of the active
dataset dictionary as a whole are updated:
  68
  69 @itemize @bullet
  70 @item
If the system file has custom attributes (@pxref{DATAFILE ATTRIBUTE}),
then those attributes replace the active dataset's custom
attributes.
  74
  75 @item
  76 If the active dataset has a weighting variable (@pxref{WEIGHT}), and the
  77 system file does not, or if the weighting variable in the system file
  78 does not exist in the active dataset, then the active dataset weighting
  79 variable, if any, is retained.  Otherwise, the weighting variable in
  80 the system file becomes the active dataset weighting variable.
  81 @end itemize
  82
  83 @cmd{APPLY DICTIONARY} takes effect immediately.  It does not read the
  84 active dataset.  The system file is not modified.
  85
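As a minimal sketch of typical use, the following syntax reads a data
file and then copies labels and missing values onto it from a second
file.  Both file names are hypothetical:

@example
GET /FILE='wave2.sav'.
APPLY DICTIONARY FROM='wave1.sav'.
@end example
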
  86 @node EXPORT
  87 @section EXPORT
  88 @vindex EXPORT
  89
  90 @display
  91 EXPORT
  92         /OUTFILE='@var{file_name}'
  93         /UNSELECTED=@{RETAIN,DELETE@}
  94         /DIGITS=@var{n}
  95         /DROP=@var{var_list}
  96         /KEEP=@var{var_list}
  97         /RENAME=(@var{src_names}=@var{target_names})@dots{}
  98         /TYPE=@{COMM,TAPE@}
  99         /MAP
 100 @end display
 101
 102 The @cmd{EXPORT} procedure writes the active dataset's dictionary and
 103 data to a specified portable file.
 104
 105 By default, cases excluded with FILTER are written to the
 106 file.  These can be excluded by specifying DELETE on the @subcmd{UNSELECTED}
 107 subcommand.  Specifying RETAIN makes the default explicit.
 108
 109 Portable files express real numbers in base 30.  Integers are always
 110 expressed to the maximum precision needed to make them exact.
 111 Non-integers are, by default, expressed to the machine's maximum
 112 natural precision (approximately 15 decimal digits on many machines).
 113 If many numbers require this many digits, the portable file may
 114 significantly increase in size.  As an alternative, the @subcmd{DIGITS}
 115 subcommand may be used to specify the number of decimal digits of
 116 precision to write.  @subcmd{DIGITS} applies only to non-integers.
 117
 118 The @subcmd{OUTFILE} subcommand, which is the only required subcommand, specifies
 119 the portable file to be written as a file name string or
 120 a file handle (@pxref{File Handles}).
 121
@subcmd{DROP}, @subcmd{KEEP}, and @subcmd{RENAME} follow the same format as the
@cmd{SAVE} procedure (@pxref{SAVE}).
 124
 125 The @subcmd{TYPE} subcommand specifies the character set for use in the
 126 portable file.  Its value is currently not used.
 127
 128 The @subcmd{MAP} subcommand is currently ignored.
 129
 130 @cmd{EXPORT} is a procedure.  It causes the active dataset to be read.
 131
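For illustration only, the following sketch writes the active dataset
to a portable file, omitting cases excluded with FILTER and limiting
non-integers to 6 decimal digits of precision.  The file and variable
names are hypothetical:

@example
EXPORT /OUTFILE='survey.por'
        /UNSELECTED=DELETE
        /DIGITS=6
        /DROP=scratch1 scratch2.
@end example
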
 132 @node GET
 133 @section GET
 134 @vindex GET
 135
 136 @display
 137 GET
 138         /FILE=@{'@var{file_name}',@var{file_handle}@}
 139         /DROP=@var{var_list}
 140         /KEEP=@var{var_list}
 141         /RENAME=(@var{src_names}=@var{target_names})@dots{}
 142         /ENCODING='@var{encoding}'
 143 @end display
 144
 145 @cmd{GET} clears the current dictionary and active dataset and
 146 replaces them with the dictionary and data from a specified file.
 147
 148 The @subcmd{FILE} subcommand is the only required subcommand.  Specify
 149 the SPSS system file, SPSS/PC+ system file, or SPSS portable file to
 150 be read as a string file name or a file handle (@pxref{File Handles}).
 151
By default, all the variables in a file are read.  The @subcmd{DROP}
subcommand can be used to specify a list of variables that are not to be
read.  By contrast, the @subcmd{KEEP} subcommand can be used to specify
variables that are to be read, with all other variables not read.
 156
 157 Normally variables in a file retain the names that they were
 158 saved under.  Use the @subcmd{RENAME} subcommand to change these names.
 159 Specify,
 160 within parentheses, a list of variable names followed by an equals sign
 161 (@samp{=}) and the names that they should be renamed to.  Multiple
 162 parenthesized groups of variable names can be included on a single
 163 @subcmd{RENAME} subcommand.
 164 Variables' names may be swapped using a @subcmd{RENAME}
 165 subcommand of the form @subcmd{/RENAME=(@var{A} @var{B}=@var{B} @var{A})}.
 166
 167 Alternate syntax for the @subcmd{RENAME} subcommand allows the parentheses to be
 168 eliminated.  When this is done, only a single variable may be renamed at
 169 once.  For instance, @subcmd{/RENAME=@var{A}=@var{B}}.  This alternate syntax is
 170 deprecated.
 171
 172 @subcmd{DROP}, @subcmd{KEEP}, and @subcmd{RENAME} are executed in left-to-right order.
 173 Each may be present any number of times.  @cmd{GET} never modifies a
 174 file on disk.  Only the active dataset read from the file
 175 is affected by these subcommands.
 176
 177 @pspp{} automatically detects the encoding of string data in the file,
 178 when possible.  The character encoding of old SPSS system files cannot
 179 always be guessed correctly, and SPSS/PC+ system files do not include
 180 any indication of their encoding.  Specify the @subcmd{ENCODING}
 181 subcommand with an @acronym{IANA} character set name as its string
 182 argument to override the default.  Use @cmd{SYSFILE INFO} to analyze
 183 the encodings that might be valid for a system file.  The
 184 @subcmd{ENCODING} subcommand is a @pspp{} extension.
 185
 186 @cmd{GET} does not cause the data to be read, only the dictionary.  The data
 187 is read later, when a procedure is executed.
 188
 189 Use of @cmd{GET} to read a portable file is a @pspp{} extension.
 190
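The subcommands above might be combined as in the following sketch,
which assumes a hypothetical system file @file{legacy.sav} in the
windows-1252 encoding that contains variables named @code{id},
@code{q1}, and @code{q2}:

@example
GET /FILE='legacy.sav'
        /ENCODING='windows-1252'
        /KEEP=id q1 q2
        /RENAME=(q1 q2=age income).
@end example

@noindent
Because the subcommands are executed from left to right, @subcmd{RENAME}
here refers to the variables that remain after @subcmd{KEEP}.
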
 191 @node GET DATA
 192 @section GET DATA
 193 @vindex GET DATA
 194
 195 @display
 196 GET DATA
 197         /TYPE=@{GNM,ODS,PSQL,TXT@}
 198         @dots{}additional subcommands depending on TYPE@dots{}
 199 @end display
 200
 201 The @cmd{GET DATA} command is used to read files and other data
 202 sources created by other applications.  When this command is executed,
 203 the current dictionary and active dataset are replaced with variables
 204 and data read from the specified source.
 205
 206 The @subcmd{TYPE} subcommand is mandatory and must be the first subcommand
 207 specified.  It determines the type of the file or source to read.
 208 @pspp{} currently supports the following file types:
 209
 210 @table @asis
 211 @item GNM
 212 Spreadsheet files created by Gnumeric (@url{http://gnumeric.org}).
 213
 214 @item ODS
 215 Spreadsheet files in OpenDocument format (@url{http://opendocumentformat.org}).
 216
 217 @item PSQL
 218 Relations from PostgreSQL databases (@url{http://postgresql.org}).
 219
 220 @item TXT
 221 Textual data files in columnar and delimited formats.
 222 @end table
 223
 224 Each supported file type has additional subcommands, explained in
 225 separate sections below.
 226
 227 @menu
 228 * GET DATA /TYPE=GNM/ODS::     Spreadsheets
 229 * GET DATA /TYPE=PSQL::        Databases
 230 * GET DATA /TYPE=TXT::         Delimited Text Files
 231 @end menu
 232
 233 @node GET DATA /TYPE=GNM/ODS
 234 @subsection Spreadsheet Files
 235
 236 @display
 237 GET DATA /TYPE=@{GNM, ODS@}
 238         /FILE=@{'@var{file_name}'@}
 239         /SHEET=@{NAME '@var{sheet_name}', INDEX @var{n}@}
 240         /CELLRANGE=@{RANGE '@var{range}', FULL@}
 241         /READNAMES=@{ON, OFF@}
 242         /ASSUMEDSTRWIDTH=@var{n}.
 243 @end display
 244
 245 @cindex Gnumeric
 246 @cindex OpenDocument
 247 @cindex spreadsheet files
 248
 249 Gnumeric spreadsheets (@url{http://gnumeric.org}), and spreadsheets
 250 in OpenDocument format
 251 (@url{http://libreplanet.org/wiki/Group:OpenDocument/Software})
 252 can be read using the @cmd{GET DATA} command.
 253 Use the @subcmd{TYPE} subcommand to indicate the file's format.
@subcmd{/TYPE=GNM} indicates Gnumeric files, and
@subcmd{/TYPE=ODS} indicates OpenDocument files.
The @subcmd{FILE} subcommand is mandatory.
Use it to specify the name of the file to be read.
 258 All other subcommands are optional.
 259
 260 The format of each variable is determined by the format of the spreadsheet
 261 cell containing the first datum for the variable.
 262 If this cell is of string (text) format, then the width of the variable is
 263 determined from the length of the string it contains, unless the
 264 @subcmd{ASSUMEDSTRWIDTH} subcommand is given.
 265
 266 The @subcmd{SHEET} subcommand specifies the sheet within the spreadsheet file to read.
 267 There are two forms of the @subcmd{SHEET} subcommand.
 268 In the first form,
 269 @subcmd{/SHEET=name @var{sheet_name}}, the string @var{sheet_name} is the
 270 name of the sheet to read.
In the second form, @subcmd{/SHEET=index @var{idx}}, @var{idx} is an
integer that gives the index of the sheet to read.
 273 The first sheet has the index 1.
 274 If the @subcmd{SHEET} subcommand is omitted, then the command will read the
 275 first sheet in the file.
 276
 277 The @subcmd{CELLRANGE} subcommand specifies the range of cells within the sheet to read.
 278 If the subcommand is given as @subcmd{/CELLRANGE=FULL}, then the entire
 279 sheet  is read.
 280 To read only part of a sheet, use the form
 281 @subcmd{/CELLRANGE=range '@var{top_left_cell}:@var{bottom_right_cell}'}.
 282 For example, the subcommand @subcmd{/CELLRANGE=range 'C3:P19'} reads
 283 columns C--P, and rows 3--19 inclusive.
 284 If no @subcmd{CELLRANGE} subcommand is given, then the entire sheet is read.
 285
 286 If @subcmd{/READNAMES=ON} is specified, then the contents of cells of
 287 the first row are used as the names of the variables in which to store
 288 the data from subsequent rows.  This is the default.
 289 If @subcmd{/READNAMES=OFF} is
 290 used, then the variables  receive automatically assigned names.
 291
 292 The @subcmd{ASSUMEDSTRWIDTH} subcommand specifies the maximum width of string
 293 variables read  from the file.
 294 If omitted, the default value is determined from the length of the
 295 string in the first spreadsheet cell for each variable.
 296
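As a hedged illustration, the following syntax reads a rectangular
block of cells from the second sheet of an OpenDocument file and takes
variable names from the first row that it reads.  The file name is
hypothetical:

@example
GET DATA /TYPE=ODS
        /FILE='budget.ods'
        /SHEET=INDEX 2
        /CELLRANGE=RANGE 'C3:P19'
        /READNAMES=ON
        /ASSUMEDSTRWIDTH=32.
@end example
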
 297
 298 @node GET DATA /TYPE=PSQL
 299 @subsection Postgres Database Queries
 300
 301 @display
 302 GET DATA /TYPE=PSQL
 303          /CONNECT=@{@var{connection info}@}
 304          /SQL=@{@var{query}@}
 305          [/ASSUMEDSTRWIDTH=@var{w}]
 306          [/UNENCRYPTED]
 307          [/BSIZE=@var{n}].
 308 @end display
 309
 310 @cindex postgres
 311 @cindex databases
 312
 313 The PSQL type is used to import data from a postgres database server.
 314 The server may be located locally or remotely.
 315 Variables are automatically created based on the table column names
 316 or the names specified in the SQL query.
Postgres data types of high precision will lose precision when
imported into @pspp{}.
Not all postgres data types can be represented in @pspp{}.
If a datum cannot be represented, a warning is issued and that
datum is set to SYSMIS.
 322
 323 The @subcmd{CONNECT} subcommand is mandatory.
 324 It is a string specifying the parameters of the database server from
 325 which the data should be fetched.
 326 The format of the string is given in the postgres manual
 327 @url{http://www.postgresql.org/docs/8.0/static/libpq.html#LIBPQ-CONNECT}.
 328
 329 The @subcmd{SQL} subcommand is mandatory.
 330 It must be a valid SQL string to retrieve data from the database.
 331
 332 The @subcmd{ASSUMEDSTRWIDTH} subcommand specifies the maximum width of string
 333 variables read  from the database.
 334 If omitted, the default value is determined from the length of the
 335 string in the first value read for each variable.
 336
 337 The @subcmd{UNENCRYPTED} subcommand allows data to be retrieved over an insecure
 338 connection.
 339 If the connection is not encrypted, and the @subcmd{UNENCRYPTED} subcommand is
 340 not given, then an error will occur.
 341 Whether or not the connection is
 342 encrypted depends upon the underlying psql library and the
 343 capabilities of the database server.
 344
 345 The @subcmd{BSIZE} subcommand serves only to optimise the speed of data transfer.
It specifies an upper limit on the
number of cases to fetch from the database at once.
 348 The default value is 4096.
 349 If your SQL statement fetches a large number of cases but only a small number of
 350 variables, then the data transfer may be faster if you increase this value.
 351 Conversely, if the number of variables is large, or if the machine on which
 352 @pspp{} is running has only a
 353 small amount of memory, then a smaller value will be better.
 354
 355
 356 The following syntax is an example:
 357 @example
GET DATA /TYPE=PSQL
     /CONNECT='host=example.com port=5432 dbname=product user=fred password=xxxx'
     /SQL='select * from manufacturer'.
 361 @end example
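
A second sketch, for a query that returns many cases with few
variables, raises @subcmd{BSIZE} above its default and permits an
unencrypted connection.  The host, database, user, table, and column
names are hypothetical:

@example
GET DATA /TYPE=PSQL
     /CONNECT='host=localhost dbname=product user=fred'
     /SQL='select id, name from manufacturer'
     /ASSUMEDSTRWIDTH=64
     /UNENCRYPTED
     /BSIZE=20000.
@end example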
 362
 363
 364 @node GET DATA /TYPE=TXT
 365 @subsection Textual Data Files
 366
 367 @display
GET DATA /TYPE=TXT
        /FILE=@{'@var{file_name}',@var{file_handle}@}
        [/ENCODING='@var{encoding}']
        [/ARRANGEMENT=@{DELIMITED,FIXED@}]
        [/FIRSTCASE=@{@var{first_case}@}]
        [/IMPORTCASE=@{ALL,FIRST @var{max_cases},PERCENT @var{percent}@}]
        @dots{}additional subcommands depending on ARRANGEMENT@dots{}
 375 @end display
 376
 377 @cindex text files
 378 @cindex data files
When TYPE=TXT is specified, @cmd{GET DATA} reads data in a delimited or
fixed columnar format, much like @cmd{DATA LIST} (@pxref{DATA LIST}).
 381
 382 The @subcmd{FILE} subcommand is mandatory.  Specify the file to be read as
 383 a string file name or (for textual data only) a
 384 file handle (@pxref{File Handles}).
 385
 386 The @subcmd{ENCODING} subcommand specifies the character encoding of
 387 the file to be read.  @xref{INSERT}, for information on supported
 388 encodings.
 389
 390 The @subcmd{ARRANGEMENT} subcommand determines the file's basic format.
 391 DELIMITED, the default setting, specifies that fields in the input
 392 data are separated by spaces, tabs, or other user-specified
 393 delimiters.  FIXED specifies that fields in the input data appear at
 394 particular fixed column positions within records of a case.
 395
 396 By default, cases are read from the input file starting from the first
 397 line.  To skip lines at the beginning of an input file, set @subcmd{FIRSTCASE}
 398 to the number of the first line to read: 2 to skip the first line, 3
 399 to skip the first two lines, and so on.
 400
 401 @subcmd{IMPORTCASE} can be used to limit the number of cases read from the
 402 input file.  With the default setting, ALL, all cases in the file are
 403 read.  Specify FIRST @var{max_cases} to read at most @var{max_cases} cases
 404 from the file.  Use @subcmd{PERCENT @var{percent}} to read only @var{percent}
 405 percent, approximately, of the cases contained in the file.  (The
 406 percentage is approximate, because there is no way to accurately count
 407 the number of cases in the file without reading the entire file.  The
 408 number of cases in some kinds of unusual files cannot be estimated;
 409 @pspp{} will read all cases in such files.)
 410
 411 @subcmd{FIRSTCASE} and @subcmd{IMPORTCASE} may be used with delimited and fixed-format
 412 data.  The remaining subcommands, which apply only to one of the two  file
 413 arrangements, are described below.
 414
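As a sketch of how these common subcommands combine, the following
syntax skips a header line and reads at most the first 100 cases of a
UTF-8 comma-separated file.  The file and variable names are
hypothetical, and @subcmd{DELIMITERS} and @subcmd{VARIABLES} belong to
the delimited arrangement:

@example
GET DATA /TYPE=TXT /FILE='customers.csv' /ENCODING='UTF-8'
        /ARRANGEMENT=DELIMITED
        /FIRSTCASE=2
        /IMPORTCASE=FIRST 100
        /DELIMITERS=','
        /VARIABLES=id F8
                   name A20
                   balance F10.2.
@end example
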
 415 @menu
 416 * GET DATA /TYPE=TXT /ARRANGEMENT=DELIMITED::
 417 * GET DATA /TYPE=TXT /ARRANGEMENT=FIXED::
 418 @end menu
 419
 420 @node GET DATA /TYPE=TXT /ARRANGEMENT=DELIMITED
 421 @subsubsection Reading Delimited Data
 422
 423 @display
GET DATA /TYPE=TXT
        /FILE=@{'@var{file_name}',@var{file_handle}@}
        [/ARRANGEMENT=@{DELIMITED,FIXED@}]
        [/FIRSTCASE=@{@var{first_case}@}]
        [/IMPORTCASE=@{ALL,FIRST @var{max_cases},PERCENT @var{percent}@}]

        /DELIMITERS="@var{delimiters}"
        [/QUALIFIER="@var{quotes}"]
        [/DELCASE=@{LINE,VARIABLES @var{n_variables}@}]
        /VARIABLES=@var{del_var1} [@var{del_var2}]@dots{}
where each @var{del_var} takes the form:
        @var{variable} @var{format}
 436 @end display
 437
 438 The GET DATA command with TYPE=TXT and ARRANGEMENT=DELIMITED reads
 439 input data from text files in delimited format, where fields are
 440 separated by a set of user-specified delimiters.  Its capabilities are
 441 similar to those of DATA LIST FREE (@pxref{DATA LIST FREE}), with a
 442 few enhancements.
 443
 444 The required @subcmd{FILE} subcommand and optional @subcmd{FIRSTCASE} and @subcmd{IMPORTCASE}
 445 subcommands are described above (@pxref{GET DATA /TYPE=TXT}).
 446
 447 @subcmd{DELIMITERS}, which is required, specifies the set of characters that
 448 may separate fields.  Each character in the string specified on
 449 @subcmd{DELIMITERS} separates one field from the next.  The end of a line also
 450 separates fields, regardless of @subcmd{DELIMITERS}.  Two consecutive
 451 delimiters in the input yield an empty field, as does a delimiter at
 452 the end of a line.  A space character as a delimiter is an exception:
 453 consecutive spaces do not yield an empty field and neither does any
 454 number of spaces at the end of a line.
 455
 456 To use a tab as a delimiter, specify @samp{\t} at the beginning of the
 457 @subcmd{DELIMITERS} string.  To use a backslash as a delimiter, specify
 458 @samp{\\} as the first delimiter or, if a tab should also be a
 459 delimiter, immediately following @samp{\t}.  To read a data file in
 460 which each field appears on a separate line, specify the empty string
 461 for @subcmd{DELIMITERS}.
 462
 463 The optional @subcmd{QUALIFIER} subcommand names one or more characters that
 464 can be used to quote values within fields in the input.  A field that
 465 begins with one of the specified quote characters ends at the next
 466 matching quote.  Intervening delimiters become part of the field,
 467 instead of terminating it.  The ability to specify more than one quote
 468 character is a @pspp{} extension.
 469
 470 The character specified on @subcmd{QUALIFIER} can be embedded within a
 471 field that it quotes by doubling the qualifier.  For example, if
 472 @samp{'} is specified on @subcmd{QUALIFIER}, then @code{'a''b'}
 473 specifies a field that contains @samp{a'b}.
 474
 475 The @subcmd{DELCASE} subcommand controls how data may be broken across lines in
 476 the data file.  With LINE, the default setting, each line must contain
 477 all the data for exactly one case.  For additional flexibility, to
 478 allow a single case to be split among lines or multiple cases to be
contained on a single line, specify VARIABLES @var{n_variables}, where
@var{n_variables} is the number of variables per case.
 481
 482 The @subcmd{VARIABLES} subcommand is required and must be the last subcommand.
 483 Specify the name of each variable and its input format (@pxref{Input
 484 and Output Formats}) in the order they should be read from the input
 485 file.
 486
 487 @subsubheading Examples
 488
 489 @noindent
 490 On a Unix-like system, the @samp{/etc/passwd} file has a format
 491 similar to this:
 492
 493 @example
 494 root:$1$nyeSP5gD$pDq/:0:0:,,,:/root:/bin/bash
 495 blp:$1$BrP/pFg4$g7OG:1000:1000:Ben Pfaff,,,:/home/blp:/bin/bash
 496 john:$1$JBuq/Fioq$g4A:1001:1001:John Darrington,,,:/home/john:/bin/bash
 497 jhs:$1$D3li4hPL$88X1:1002:1002:Jason Stover,,,:/home/jhs:/bin/csh
 498 @end example
 499
 500 @noindent
 501 The following syntax reads a file in the format used by
 502 @samp{/etc/passwd}:
 503
 504 @c If you change this example, change the regression test in
 505 @c tests/language/data-io/get-data.at to match.
 506 @example
 507 GET DATA /TYPE=TXT /FILE='/etc/passwd' /DELIMITERS=':'
 508         /VARIABLES=username A20
 509                    password A40
 510                    uid F10
 511                    gid F10
 512                    gecos A40
 513                    home A40
 514                    shell A40.
 515 @end example
 516
 517 @noindent
 518 Consider the following data on used cars:
 519
 520 @example
 521 model   year    mileage price   type    age
 522 Civic   2002    29883   15900   Si      2
 523 Civic   2003    13415   15900   EX      1
 524 Civic   1992    107000  3800    n/a     12
 525 Accord  2002    26613   17900   EX      1
 526 @end example
 527
 528 @noindent
 529 The following syntax can be used to read the used car data:
 530
 531 @c If you change this example, change the regression test in
 532 @c tests/language/data-io/get-data.at to match.
 533 @example
 534 GET DATA /TYPE=TXT /FILE='cars.data' /DELIMITERS=' ' /FIRSTCASE=2
 535         /VARIABLES=model A8
 536                    year F4
 537                    mileage F6
 538                    price F5
 539                    type A4
 540                    age F2.
 541 @end example
 542
 543 @noindent
 544 Consider the following information on animals in a pet store:
 545
 546 @example
 547 'Pet''s Name', "Age", "Color", "Date Received", "Price", "Height", "Type"
 548 , (Years), , , (Dollars), ,
 549 "Rover", 4.5, Brown, "12 Feb 2004", 80, '1''4"', "Dog"
 550 "Charlie", , Gold, "5 Apr 2007", 12.3, "3""", "Fish"
 551 "Molly", 2, Black, "12 Dec 2006", 25, '5"', "Cat"
 552 "Gilly", , White, "10 Apr 2007", 10, "3""", "Guinea Pig"
 553 @end example
 554
 555 @noindent
 556 The following syntax can be used to read the pet store data:
 557
 558 @c If you change this example, change the regression test in
 559 @c tests/language/data-io/get-data.at to match.
 560 @example
 561 GET DATA /TYPE=TXT /FILE='pets.data' /DELIMITERS=', ' /QUALIFIER='''"' /ESCAPE
 562         /FIRSTCASE=3
 563         /VARIABLES=name A10
 564                    age F3.1
 565                    color A5
 566                    received EDATE10
 567                    price F5.2
 568                    height a5
 569                    type a10.
 570 @end example
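
@noindent
As a further sketch, suppose a hypothetical file @file{readings.txt}
separates values with tabs or commas and does not keep each case on
its own line.  @subcmd{DELCASE} with VARIABLES lets cases span or share
lines:

@example
GET DATA /TYPE=TXT /FILE='readings.txt' /DELIMITERS='\t,'
        /DELCASE=VARIABLES 3
        /VARIABLES=station A8
                   temp F5.1
                   humidity F3.
@end example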
 571
 572 @node GET DATA /TYPE=TXT /ARRANGEMENT=FIXED
 573 @subsubsection Reading Fixed Columnar Data
 574
 575 @c (modify-syntax-entry ?_ "w")
 576 @c (modify-syntax-entry ?' "'")
 577 @c (modify-syntax-entry ?@ "'")
 578
 579 @display
GET DATA /TYPE=TXT
        /FILE=@{'@var{file_name}',@var{file_handle}@}
        [/ARRANGEMENT=@{DELIMITED,FIXED@}]
        [/FIRSTCASE=@{@var{first_case}@}]
        [/IMPORTCASE=@{ALL,FIRST @var{max_cases},PERCENT @var{percent}@}]

        [/FIXCASE=@var{n}]
        /VARIABLES @var{fixed_var} [@var{fixed_var}]@dots{}
            [/rec# @var{fixed_var} [@var{fixed_var}]@dots{}]@dots{}
where each @var{fixed_var} takes the form:
        @var{variable} @var{start}-@var{end} @var{format}
 591 @end display
 592
 593 The @cmd{GET DATA} command with TYPE=TXT and ARRANGEMENT=FIXED reads input
 594 data from text files in fixed format, where each field is located in
 595 particular fixed column positions within records of a case.  Its
 596 capabilities are similar to those of DATA LIST FIXED (@pxref{DATA LIST
 597 FIXED}), with a few enhancements.
 598
 599 The required @subcmd{FILE} subcommand and optional @subcmd{FIRSTCASE} and @subcmd{IMPORTCASE}
 600 subcommands are described above (@pxref{GET DATA /TYPE=TXT}).
 601
 602 The optional @subcmd{FIXCASE} subcommand may be used to specify the positive
 603 integer number of input lines that make up each case.  The default
 604 value is 1.
 605
 606 The @subcmd{VARIABLES} subcommand, which is required, specifies the positions
 607 at which each variable can be found.  For each variable, specify its
 608 name, followed by its start and end column separated by @samp{-}
 609 (e.g.@: @samp{0-9}), followed by an input format type (e.g.@:
 610 @samp{F}) or a full format specification (e.g.@: @samp{DOLLAR12.2}).
 611 For this command, columns are numbered starting from 0 at
 612 the left column.  Introduce the variables in the second and later
 613 lines of a case by a slash followed by the number of the line within
 614 the case, e.g.@: @samp{/2} for the second line.
 615
 616 @subsubheading Examples
 617
 618 @noindent
 619 Consider the following data on used cars:
 620
 621 @example
 622 model   year    mileage price   type    age
 623 Civic   2002    29883   15900   Si      2
 624 Civic   2003    13415   15900   EX      1
 625 Civic   1992    107000  3800    n/a     12
 626 Accord  2002    26613   17900   EX      1
 627 @end example
 628
 629 @noindent
 630 The following syntax can be used to read the used car data:
 631
 632 @c If you change this example, change the regression test in
 633 @c tests/language/data-io/get-data.at to match.
 634 @example
 635 GET DATA /TYPE=TXT /FILE='cars.data' /ARRANGEMENT=FIXED /FIRSTCASE=2
 636         /VARIABLES=model 0-7 A
 637                    year 8-15 F
 638                    mileage 16-23 F
 639                    price 24-31 F
 640                    type 32-40 A
 641                    age 40-47 F.
 642 @end example
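
@noindent
As a sketch of a multi-line layout, suppose each case in a
hypothetical file @file{patients.txt} occupies two lines, with an
identifier and name on the first line and two measurements on the
second:

@example
GET DATA /TYPE=TXT /FILE='patients.txt' /ARRANGEMENT=FIXED /FIXCASE=2
        /VARIABLES=id 0-4 F
                   name 5-24 A
                   /2 weight 0-4 F
                      height 5-9 F.
@end example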
 643
 644 @node IMPORT
 645 @section IMPORT
 646 @vindex IMPORT
 647
 648 @display
 649 IMPORT
 650         /FILE='@var{file_name}'
 651         /TYPE=@{COMM,TAPE@}
 652         /DROP=@var{var_list}
 653         /KEEP=@var{var_list}
 654         /RENAME=(@var{src_names}=@var{target_names})@dots{}
 655 @end display
 656
The @cmd{IMPORT} command clears the active dataset dictionary and
data and replaces them with a dictionary and data from a system file or
portable file.
 661
 662 The @subcmd{FILE} subcommand, which is the only required subcommand, specifies
 663 the portable file to be read as a file name string or a file handle
 664 (@pxref{File Handles}).
 665
 666 The @subcmd{TYPE} subcommand is currently not used.
 667
 668 @subcmd{DROP}, @subcmd{KEEP}, and @subcmd{RENAME} follow the syntax used by @cmd{GET} (@pxref{GET}).
 669
 670 @cmd{IMPORT} does not cause the data to be read; only the dictionary.  The
 671 data is read later, when a procedure is executed.
 672
 673 Use of @cmd{IMPORT} to read a system file is a @pspp{} extension.
 674
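For illustration, the following sketch reads three variables from a
portable file.  The file and variable names are hypothetical:

@example
IMPORT /FILE='survey.por'
        /KEEP=id age income.
@end example
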
 675 @node SAVE
 676 @section SAVE
 677 @vindex SAVE
 678
 679 @display
 680 SAVE
 681         /OUTFILE=@{'@var{file_name}',@var{file_handle}@}
 682         /UNSELECTED=@{RETAIN,DELETE@}
 683         /@{UNCOMPRESSED,COMPRESSED,ZCOMPRESSED@}
 684         /PERMISSIONS=@{WRITEABLE,READONLY@}
 685         /DROP=@var{var_list}
 686         /KEEP=@var{var_list}
 687         /VERSION=@var{version}
 688         /RENAME=(@var{src_names}=@var{target_names})@dots{}
 689         /NAMES
 690         /MAP
 691 @end display
 692
 693 The @cmd{SAVE} procedure causes the dictionary and data in the active
 694 dataset to
 695 be written to a system file.
 696
@subcmd{OUTFILE} is the only required subcommand.  Specify the system file
to be written as a string file name or a file handle
(@pxref{File Handles}).
 700
 701 By default, cases excluded with FILTER are written to the system file.
 702 These can be excluded by specifying @subcmd{DELETE} on the @subcmd{UNSELECTED}
 703 subcommand.  Specifying @subcmd{RETAIN} makes the default explicit.
 704
The @subcmd{UNCOMPRESSED}, @subcmd{COMPRESSED}, and
@subcmd{ZCOMPRESSED} subcommands determine the system file's
compression level:
 708
 709 @table @code
 710 @item UNCOMPRESSED
 711 Data is not compressed.  Each numeric value uses 8 bytes of disk
 712 space.  Each string value uses one byte per column width, rounded up
 713 to a multiple of 8 bytes.
 714
 715 @item COMPRESSED
 716 Data is compressed with a simple algorithm.  Each integer numeric
 717 value between @minus{}99 and 151, inclusive, or system missing value
 718 uses one byte of disk space.  Each 8-byte segment of a string that
 719 consists only of spaces uses 1 byte.  Any other numeric value or
 720 8-byte string segment uses 9 bytes of disk space.
 721
 722 @item ZCOMPRESSED
 723 Data is compressed with the ``deflate'' compression algorithm
 724 specified in RFC@tie{}1951 (the same algorithm used by
 725 @command{gzip}).  Files written with this compression level cannot be
 726 read by PSPP 0.8.1 or earlier or by SPSS 20 or earlier.
 727 @end table
 728
 729 @subcmd{COMPRESSED} is the default compression level.  The SET command
 730 (@pxref{SET}) can change this default.
 731
 732 The @subcmd{PERMISSIONS} subcommand specifies permissions for the new system
 733 file.  WRITEABLE, the default, creates the file with read and write
 734 permission.  READONLY creates the file for read-only access.
 735
By default, all the variables in the active dataset dictionary are written
to the system file.  The @subcmd{DROP} subcommand can be used to specify a list
of variables not to be written.  In contrast, @subcmd{KEEP} specifies the
variables to be written; all other variables are omitted.
 740
 741 Normally variables are saved to a system file under the same names they
 742 have in the active dataset.  Use the @subcmd{RENAME} subcommand to change these names.
 743 Specify, within parentheses, a list of variable names followed by an
 744 equals sign (@samp{=}) and the names that they should be renamed to.
 745 Multiple parenthesized groups of variable names can be included on a
 746 single @subcmd{RENAME} subcommand.  Variables' names may be swapped using a
 747 @subcmd{RENAME} subcommand of the
 748 form @subcmd{/RENAME=(@var{A} @var{B}=@var{B} @var{A})}.
 749
 750 Alternate syntax for the @subcmd{RENAME} subcommand allows the parentheses to be
 751 eliminated.  When this is done, only a single variable may be renamed at
 752 once.  For instance, @subcmd{/RENAME=@var{A}=@var{B}}.  This alternate syntax is
 753 deprecated.
 754
 755 @subcmd{DROP}, @subcmd{KEEP}, and @subcmd{RENAME} are performed in
 756 left-to-right order.  They
 757 each may be present any number of times.  @cmd{SAVE} never modifies
 758 the active dataset.  @subcmd{DROP}, @subcmd{KEEP}, and @subcmd{RENAME} only
 759 affect the system file written to disk.
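
     As a minimal sketch, assuming an active dataset that contains the
     hypothetical variables @code{id}, @code{age}, and @code{income}, the
     following writes only those three variables and stores @code{income}
     under the name @code{salary}.  The file name is also an assumption.

     @example
     SAVE /OUTFILE='subset.sav'
             /KEEP=id age income
             /RENAME=(income=salary).
     @end example

     The active dataset keeps all of its variables and their original
     names.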
 760
 761 The @subcmd{VERSION} subcommand specifies the version of the file format.  Valid
 762 versions are 2 and 3.  The default version is 3.  In version 2 system
 763 files, variable names longer than 8 bytes will be truncated.  The two
 764 versions are otherwise identical.
 765
 766 The @subcmd{NAMES} and @subcmd{MAP} subcommands are currently ignored.
 767
 768 @cmd{SAVE} causes the data to be read.  It is a procedure.
 769
 770 @node SAVE TRANSLATE
 771 @section SAVE TRANSLATE
 772 @vindex SAVE TRANSLATE
 773
 774 @display
 775 SAVE TRANSLATE
 776         /OUTFILE=@{'@var{file_name}',@var{file_handle}@}
 777         /TYPE=@{CSV,TAB@}
 778         [/REPLACE]
 779         [/MISSING=@{IGNORE,RECODE@}]
 780
 781         [/DROP=@var{var_list}]
 782         [/KEEP=@var{var_list}]
 783         [/RENAME=(@var{src_names}=@var{target_names})@dots{}]
 784         [/UNSELECTED=@{RETAIN,DELETE@}]
 785         [/MAP]
 786
 787         @dots{}additional subcommands depending on TYPE@dots{}
 788 @end display
 789
 790 The @cmd{SAVE TRANSLATE} command is used to save data into various
 791 formats understood by other applications.
 792
 793 The @subcmd{OUTFILE} and @subcmd{TYPE} subcommands are mandatory.
 794 @subcmd{OUTFILE} specifies the file to be written, as a string file name or a file handle
 795 (@pxref{File Handles}).  @subcmd{TYPE} determines the type of file to
 796 write.  It must be one of the following:
 797
 798 @table @asis
 799 @item CSV
 800 Comma-separated value format.
 801
 802 @item TAB
 803 Tab-delimited format.
 804 @end table
 805
 806 By default, @cmd{SAVE TRANSLATE} will not overwrite an existing file.  Use
 807 @subcmd{REPLACE} to force an existing file to be overwritten.
 808
 809 With MISSING=IGNORE, the default, @cmd{SAVE TRANSLATE} treats user-missing
 810 values as if they were not missing.  Specify MISSING=RECODE to output
 811 numeric user-missing values like system-missing values and string
 812 user-missing values as all spaces.
 813
 814 By default, all the variables in the active dataset dictionary are saved
 815 to the output file, but @subcmd{DROP} or @subcmd{KEEP} can select a subset of variables
 816 to save.  The @subcmd{RENAME} subcommand can also be used to change the names
 817 under which variables are saved.  @subcmd{UNSELECTED} determines whether cases
 818 filtered out by the @cmd{FILTER} command are written to the output file.
 819 These subcommands have the same syntax and meaning as on the
 820 @cmd{SAVE} command (@pxref{SAVE}).
 821
 822 Each supported file type has additional subcommands, explained in
 823 separate sections below.
 824
 825 @cmd{SAVE TRANSLATE} causes the data to be read.  It is a procedure.
 826
 827 @menu
 828 * SAVE TRANSLATE /TYPE=CSV and TYPE=TAB::
 829 @end menu
 830
 831 @node SAVE TRANSLATE /TYPE=CSV and TYPE=TAB
 832 @subsection Writing Comma- and Tab-Separated Data Files
 833
 834 @display
 835 SAVE TRANSLATE
 836         /OUTFILE=@{'@var{file_name}',@var{file_handle}@}
 837         /TYPE=@{CSV,TAB@}
 838         [/REPLACE]
 839         [/MISSING=@{IGNORE,RECODE@}]
 840
 841         [/DROP=@var{var_list}]
 842         [/KEEP=@var{var_list}]
 843         [/RENAME=(@var{src_names}=@var{target_names})@dots{}]
 844         [/UNSELECTED=@{RETAIN,DELETE@}]
 845
 846         [/FIELDNAMES]
 847         [/CELLS=@{VALUES,LABELS@}]
 848         [/TEXTOPTIONS DELIMITER='@var{delimiter}']
 849         [/TEXTOPTIONS QUALIFIER='@var{qualifier}']
 850         [/TEXTOPTIONS DECIMAL=@{DOT,COMMA@}]
 851         [/TEXTOPTIONS FORMAT=@{PLAIN,VARIABLE@}]
 852 @end display
 853
 854 The SAVE TRANSLATE command with TYPE=CSV or TYPE=TAB writes data in a
 855 comma- or tab-separated value format similar to that described by
 856 RFC@tie{}4180.  Each variable becomes one output column, and each case
 857 becomes one line of output.  If FIELDNAMES is specified, an additional
 858 line at the top of the output file lists variable names.
 859
 860 The CELLS and TEXTOPTIONS FORMAT settings determine how values are
 861 written to the output file:
 862
 863 @table @asis
 864 @item CELLS=VALUES FORMAT=PLAIN (the default settings)
 865 Writes variables to the output in ``plain'' formats that ignore the
 866 details of variable formats.  Numeric values are written as plain
 867 decimal numbers with enough digits to indicate their exact values in
 868 machine representation.  Numeric values include @samp{e} followed by
 869 an exponent if the exponent value would be less than @minus{}4 or greater
 870 than 16.  Dates are written in MM/DD/YYYY format and times in HH:MM:SS
 871 format.  WKDAY and MONTH values are written as decimal numbers.
 872
 873 Numeric values use, by default, the decimal point character set with
 874 SET DECIMAL (@pxref{SET DECIMAL}).  Use DECIMAL=DOT or DECIMAL=COMMA
 875 to force a particular decimal point character.
 876
 877 @item CELLS=VALUES FORMAT=VARIABLE
 878 Writes variables using their print formats.  Leading and trailing
 879 spaces are removed from numeric values, and trailing spaces are
 880 removed from string values.
 881
 882 @item CELLS=LABELS FORMAT=PLAIN
 883 @itemx CELLS=LABELS FORMAT=VARIABLE
 884 Writes value labels where they exist, and otherwise writes the values
 885 themselves as described above.
 886 @end table
 887
 888 Regardless of CELLS and TEXTOPTIONS FORMAT, numeric system-missing
 889 values are output as a single space.
 890
 891 For TYPE=TAB, tab characters delimit values.  For TYPE=CSV, the
 892 TEXTOPTIONS DELIMITER and DECIMAL settings determine the character
 893 that separates values within a line.  If DELIMITER is specified, then
 894 the specified string separates values.  If DELIMITER is not specified,
 895 then the default is a comma with DECIMAL=DOT or a semicolon with
 896 DECIMAL=COMMA.  If DECIMAL is not given either, it is implied by the
 897 decimal point character set with SET DECIMAL (@pxref{SET DECIMAL}).
 898
 899 The TEXTOPTIONS QUALIFIER setting specifies a character that is output
 900 before and after a value that contains the delimiter character or the
 901 qualifier character.  The default is a double quote (@samp{"}).  A
 902 qualifier character that appears within a value is doubled.
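
     As a sketch of how these settings combine (the file name is
     hypothetical), the following writes a semicolon-delimited file whose
     first line lists variable names and whose cells show value labels
     where they exist:

     @example
     SAVE TRANSLATE
             /OUTFILE='survey.csv'
             /TYPE=CSV
             /REPLACE
             /FIELDNAMES
             /CELLS=LABELS
             /TEXTOPTIONS DELIMITER=';'.
     @end example

     With this delimiter and the default qualifier, a hypothetical string
     value that contains both characters is enclosed in double quotes,
     and its embedded double quotes are doubled:

     @example
     value:   a;b "c"
     output:  "a;b ""c"""
     @end example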
 903
 904 @node SYSFILE INFO
 905 @section SYSFILE INFO
 906 @vindex SYSFILE INFO
 907
 908 @display
 909 SYSFILE INFO FILE='@var{file_name}' [ENCODING='@var{encoding}'].
 910 @end display
 911
 912 @cmd{SYSFILE INFO} reads the dictionary in an SPSS system file,
 913 SPSS/PC+ system file, or SPSS portable file, and displays the
 914 information in its dictionary.
 915
 916 Specify the file to examine on @subcmd{FILE}, as a file name or file
 917 handle.
 918
 919 @pspp{} automatically detects the encoding of string data in the file,
 920 when possible.  The character encoding of old SPSS system files cannot
 921 always be guessed correctly, and SPSS/PC+ system files do not include
 922 any indication of their encoding.  Specify the @subcmd{ENCODING}
 923 subcommand with an @acronym{IANA} character set name as its string
 924 argument to override the default, or specify @code{ENCODING='DETECT'}
 925 to analyze and report possibly valid encodings for the system file.
 926 The @subcmd{ENCODING} subcommand is a @pspp{} extension.
 927
 928 @cmd{SYSFILE INFO} does not affect the current active dataset.
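
     For example, assuming a hypothetical file @file{legacy.sav} whose
     encoding is not detected correctly, the first command below forces a
     specific encoding and the second asks for a report of encodings that
     might be valid:

     @example
     SYSFILE INFO FILE='legacy.sav' ENCODING='windows-1252'.
     SYSFILE INFO FILE='legacy.sav' ENCODING='DETECT'.
     @end example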
 929
 930 @node XEXPORT
 931 @section XEXPORT
 932 @vindex XEXPORT
 933
 934 @display
 935 XEXPORT
 936         /OUTFILE='@var{file_name}'
 937         /DIGITS=@var{n}
 938         /DROP=@var{var_list}
 939         /KEEP=@var{var_list}
 940         /RENAME=(@var{src_names}=@var{target_names})@dots{}
 941         /TYPE=@{COMM,TAPE@}
 942         /MAP
 943 @end display
 944
 945 The @cmd{XEXPORT} transformation writes the active dataset dictionary and
 946 data to a specified portable file.
 947
 948 This transformation is a @pspp{} extension.
 949
 950 It is similar to the @cmd{EXPORT} procedure, with two differences:
 951
 952 @itemize
 953 @item
 954 @cmd{XEXPORT} is a transformation, not a procedure.  It is executed when
 955 the data is read by a procedure or procedure-like command.
 956
 957 @item
 958 @cmd{XEXPORT} does not support the @subcmd{UNSELECTED} subcommand.
 959 @end itemize
 960
 961 @xref{EXPORT}, for more information.
 962
 963 @node XSAVE
 964 @section XSAVE
 965 @vindex XSAVE
 966
 967 @display
 968 XSAVE
 969         /OUTFILE='@var{file_name}'
 970         /@{UNCOMPRESSED,COMPRESSED,ZCOMPRESSED@}
 971         /PERMISSIONS=@{WRITEABLE,READONLY@}
 972         /DROP=@var{var_list}
 973         /KEEP=@var{var_list}
 974         /VERSION=@var{version}
 975         /RENAME=(@var{src_names}=@var{target_names})@dots{}
 976         /NAMES
 977         /MAP
 978 @end display
 979
 980 The @cmd{XSAVE} transformation writes the active dataset's dictionary and
 981 data to a system file.  It is similar to the @cmd{SAVE}
 982 procedure, with two differences:
 983
 984 @itemize
 985 @item
 986 @cmd{XSAVE} is a transformation, not a procedure.  It is executed when
 987 the data is read by a procedure or procedure-like command.
 988
 989 @item
 990 @cmd{XSAVE} does not support the @subcmd{UNSELECTED} subcommand.
 991 @end itemize
 992
 993 @xref{SAVE}, for more information.
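
     As a minimal sketch of the difference from @cmd{SAVE}, assume
     hypothetical numeric variables @code{a} and @code{b} and a
     hypothetical file name.  Because @cmd{XSAVE} is a transformation, the
     file below is not written until @cmd{EXECUTE}, or another command
     that reads the data, runs:

     @example
     COMPUTE total = a + b.
     XSAVE /OUTFILE='stage1.sav'.
     EXECUTE.
     @end example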