@c PSPP - a program for statistical analysis.
@c Copyright (C) 2017 Free Software Foundation, Inc.
@c Permission is granted to copy, distribute and/or modify this document
@c under the terms of the GNU Free Documentation License, Version 1.3
@c or any later version published by the Free Software Foundation;
@c with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
@c A copy of the license is included in the section entitled "GNU
@c Free Documentation License".
@c
@node System and Portable File IO
@chapter System and Portable File I/O

The commands in this chapter read, write, and examine system files and
portable files.

@menu
* APPLY DICTIONARY::            Apply system file dictionary to active dataset.
* EXPORT::                      Write to a portable file.
* GET::                         Read from a system file.
* GET DATA::                    Read from foreign files.
* IMPORT::                      Read from a portable file.
* SAVE::                        Write to a system file.
* SAVE DATA COLLECTION::        Write to a system file and metadata file.
* SAVE TRANSLATE::              Write data in foreign file formats.
* SYSFILE INFO::                Display system file dictionary.
* XEXPORT::                     Write to a portable file, as a transformation.
* XSAVE::                       Write to a system file, as a transformation.
@end menu

@node APPLY DICTIONARY
@section APPLY DICTIONARY
@vindex APPLY DICTIONARY

@display
APPLY DICTIONARY FROM=@{'@var{file_name}',@var{file_handle}@}.
@end display

@cmd{APPLY DICTIONARY} applies the variable labels, value labels,
and missing values taken from a file to corresponding
variables in the active dataset.  In some cases it also updates the
weighting variable.

Specify a system file or portable file's name, a dataset name
(@pxref{Datasets}), or a file handle name (@pxref{File Handles}).  The
dictionary in the file will be read, but it will not replace the
active dataset's dictionary.  The file's data will not be read.

Only variables with names that exist in both the active dataset and the
system file are considered.  Variables with the same name but different
types (numeric, string) will cause an error message.  Otherwise, the
system file variables' attributes will replace those in their matching
active dataset variables:

@itemize @bullet
@item
If a system file variable has a variable label, then it will replace
the variable label of the active dataset variable.  If the system
file variable does not have a variable label, then the active dataset
variable's variable label, if any, will be retained.

@item
If the system file variable has custom attributes (@pxref{VARIABLE
ATTRIBUTE}), then those attributes replace the active dataset variable's
custom attributes.  If the system file variable does not have custom
attributes, then the active dataset variable's custom attributes, if any,
will be retained.

@item
If the active dataset variable is numeric or short string, then value
labels and missing values, if any, will be copied to the active dataset
variable.  If the system file variable does not have value labels or
missing values, then those in the active dataset variable, if any, will not
be disturbed.
@end itemize

In addition to properties of variables, some properties of the active
dataset dictionary as a whole are updated:

@itemize @bullet
@item
If the system file has custom attributes (@pxref{DATAFILE ATTRIBUTE}),
then those attributes replace the active dataset's custom
attributes.

@item
If the active dataset has a weighting variable (@pxref{WEIGHT}), and the
system file does not, or if the weighting variable in the system file
does not exist in the active dataset, then the active dataset weighting
variable, if any, is retained.  Otherwise, the weighting variable in
the system file becomes the active dataset weighting variable.
@end itemize

@cmd{APPLY DICTIONARY} takes effect immediately.  It does not read the
active dataset.  The system file is not modified.
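
For example, the following sketch first reads the data and dictionary
of a hypothetical system file @file{wave2.sav}.  It then copies labels,
missing values, and custom attributes from the matching variables in a
second hypothetical file, @file{wave1.sav}.  Only the dictionary of
@file{wave1.sav} is consulted.  Its data is not read, and the file
itself is left unchanged.

@example
GET /FILE='wave2.sav'.
APPLY DICTIONARY FROM='wave1.sav'.
@end example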

@node EXPORT
@section EXPORT
@vindex EXPORT

@display
EXPORT
        /OUTFILE='@var{file_name}'
        /UNSELECTED=@{RETAIN,DELETE@}
        /DIGITS=@var{n}
        /DROP=@var{var_list}
        /KEEP=@var{var_list}
        /RENAME=(@var{src_names}=@var{target_names})@dots{}
        /TYPE=@{COMM,TAPE@}
        /MAP
@end display

The @cmd{EXPORT} procedure writes the active dataset's dictionary and
data to a specified portable file.

By default, cases excluded with @cmd{FILTER} are written to the
file.  These can be excluded by specifying @subcmd{DELETE} on the @subcmd{UNSELECTED}
subcommand.  Specifying @subcmd{RETAIN} makes the default explicit.

Portable files express real numbers in base 30.  Integers are always
expressed to the maximum precision needed to make them exact.
Non-integers are, by default, expressed to the machine's maximum
natural precision (approximately 15 decimal digits on many machines).
If many numbers require this many digits, the portable file may
become significantly larger.  As an alternative, the @subcmd{DIGITS}
subcommand may be used to specify the number of decimal digits of
precision to write.  @subcmd{DIGITS} applies only to non-integers.

The @subcmd{OUTFILE} subcommand, which is the only required subcommand, specifies
the portable file to be written as a file name string or
a file handle (@pxref{File Handles}).

@subcmd{DROP}, @subcmd{KEEP}, and @subcmd{RENAME} follow the same format as the
@cmd{SAVE} procedure (@pxref{SAVE}).

The @subcmd{TYPE} subcommand specifies the character set for use in the
portable file.  Its value is currently not used.

The @subcmd{MAP} subcommand is currently ignored.

@cmd{EXPORT} is a procedure.  It causes the active dataset to be read.
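
As an illustration, the following sketch writes the active dataset to
a portable file named @file{survey.por} (a hypothetical name).  It
omits cases excluded by @cmd{FILTER}, writes non-integers with 6
decimal digits of precision, and leaves out a hypothetical variable
@code{notes}:

@example
EXPORT /OUTFILE='survey.por'
        /UNSELECTED=DELETE
        /DIGITS=6
        /DROP=notes.
@end example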

@node GET
@section GET
@vindex GET

@display
GET
        /FILE=@{'@var{file_name}',@var{file_handle}@}
        /DROP=@var{var_list}
        /KEEP=@var{var_list}
        /RENAME=(@var{src_names}=@var{target_names})@dots{}
        /ENCODING='@var{encoding}'
@end display

@cmd{GET} clears the current dictionary and active dataset and
replaces them with the dictionary and data from a specified file.

The @subcmd{FILE} subcommand is the only required subcommand.  Specify
the SPSS system file, SPSS/PC+ system file, or SPSS portable file to
be read as a string file name or a file handle (@pxref{File Handles}).

By default, all the variables in a file are read.  The @subcmd{DROP}
subcommand can be used to specify a list of variables that are not to be
read.  By contrast, the @subcmd{KEEP} subcommand can be used to specify
variables that are to be read, with all other variables not read.

Normally variables in a file retain the names that they were
saved under.  Use the @subcmd{RENAME} subcommand to change these names.
Specify, within parentheses, a list of variable names followed by an
equals sign (@samp{=}) and the names that they should be renamed to.
Multiple parenthesized groups of variable names can be included on a
single @subcmd{RENAME} subcommand.  Variables' names may be swapped
using a @subcmd{RENAME} subcommand of the form
@subcmd{/RENAME=(@var{A} @var{B}=@var{B} @var{A})}.

Alternate syntax for the @subcmd{RENAME} subcommand allows the parentheses to be
eliminated.  When this is done, only a single variable may be renamed at
once.  For instance, @subcmd{/RENAME=@var{A}=@var{B}}.  This alternate syntax is
deprecated.

@subcmd{DROP}, @subcmd{KEEP}, and @subcmd{RENAME} are executed in left-to-right order.
Each may be present any number of times.  @cmd{GET} never modifies a
file on disk.  Only the active dataset read from the file
is affected by these subcommands.
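
Because these subcommands are applied from left to right, a later
subcommand refers to variables by the names that earlier subcommands
produce.  The following sketch assumes a hypothetical file
@file{survey.sav} that contains variables @code{id}, @code{q1}, and
@code{q2}.  The @subcmd{RENAME} first swaps the names of @code{q1} and
@code{q2}.  The @subcmd{KEEP} that follows therefore retains @code{id}
and the variable that was stored in the file as @code{q2}:

@example
GET /FILE='survey.sav'
        /RENAME=(q1 q2=q2 q1)
        /KEEP=id q1.
@end example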

@pspp{} automatically detects the encoding of string data in the file,
when possible.  The character encoding of old SPSS system files cannot
always be guessed correctly, and SPSS/PC+ system files do not include
any indication of their encoding.  Specify the @subcmd{ENCODING}
subcommand with an @acronym{IANA} character set name as its string
argument to override the default.  Use @cmd{SYSFILE INFO} to analyze
the encodings that might be valid for a system file.  The
@subcmd{ENCODING} subcommand is a @pspp{} extension.
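
For instance, suppose @cmd{SYSFILE INFO} suggests that an older system
file was written in the Windows-1252 character set.  A sketch like the
following then overrides automatic detection.  The file name
@file{legacy.sav} is hypothetical, and @samp{windows-1252} is only one
example of an @acronym{IANA} character set name:

@example
GET /FILE='legacy.sav' /ENCODING='windows-1252'.
@end example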

@cmd{GET} does not cause the data to be read, only the dictionary.  The data
is read later, when a procedure is executed.

Use of @cmd{GET} to read a portable file is a @pspp{} extension.

@node GET DATA
@section GET DATA
@vindex GET DATA

@display
GET DATA
        /TYPE=@{GNM,ODS,PSQL,TXT@}
        @dots{}additional subcommands depending on TYPE@dots{}
@end display

The @cmd{GET DATA} command is used to read files and other data
sources created by other applications.  When this command is executed,
the current dictionary and active dataset are replaced with variables
and data read from the specified source.

The @subcmd{TYPE} subcommand is mandatory and must be the first subcommand
specified.  It determines the type of the file or source to read.
@pspp{} currently supports the following file types:

@table @asis
@item GNM
Spreadsheet files created by Gnumeric (@url{http://gnumeric.org}).

@item ODS
Spreadsheet files in OpenDocument format (@url{http://opendocumentformat.org}).

@item PSQL
Relations from PostgreSQL databases (@url{http://postgresql.org}).

@item TXT
Textual data files in columnar and delimited formats.
@end table

Each supported file type has additional subcommands, explained in
separate sections below.

@menu
* GET DATA /TYPE=GNM/ODS::     Spreadsheets
* GET DATA /TYPE=PSQL::        Databases
* GET DATA /TYPE=TXT::         Delimited Text Files
@end menu

@node GET DATA /TYPE=GNM/ODS
@subsection Spreadsheet Files

@display
GET DATA /TYPE=@{GNM, ODS@}
        /FILE=@{'@var{file_name}'@}
        /SHEET=@{NAME '@var{sheet_name}', INDEX @var{n}@}
        /CELLRANGE=@{RANGE '@var{range}', FULL@}
        /READNAMES=@{ON, OFF@}
        /ASSUMEDSTRWIDTH=@var{n}.
@end display

@cindex Gnumeric
@cindex OpenDocument
@cindex spreadsheet files

Gnumeric spreadsheets (@url{http://gnumeric.org}) and spreadsheets
in OpenDocument format
(@url{http://libreplanet.org/wiki/Group:OpenDocument/Software})
can be read using the @cmd{GET DATA} command.
Use the @subcmd{TYPE} subcommand to indicate the file's format:
@subcmd{/TYPE=GNM} indicates Gnumeric files, and
@subcmd{/TYPE=ODS} indicates OpenDocument files.
The @subcmd{FILE} subcommand is mandatory.
Use it to specify the name of the file to be read.
All other subcommands are optional.

The format of each variable is determined by the format of the spreadsheet
cell containing the first datum for the variable.
If this cell is of string (text) format, then the width of the variable is
determined from the length of the string it contains, unless the
@subcmd{ASSUMEDSTRWIDTH} subcommand is given.

The @subcmd{SHEET} subcommand specifies the sheet within the spreadsheet file to read.
There are two forms of the @subcmd{SHEET} subcommand.
In the first form,
@subcmd{/SHEET=name @var{sheet_name}}, the string @var{sheet_name} is the
name of the sheet to read.
In the second form, @subcmd{/SHEET=index @var{idx}}, @var{idx} is an
integer that gives the index of the sheet to read.
The first sheet has the index 1.
If the @subcmd{SHEET} subcommand is omitted, then the command will read the
first sheet in the file.

The @subcmd{CELLRANGE} subcommand specifies the range of cells within the sheet to read.
If the subcommand is given as @subcmd{/CELLRANGE=FULL}, then the entire
sheet is read.
To read only part of a sheet, use the form
@subcmd{/CELLRANGE=range '@var{top_left_cell}:@var{bottom_right_cell}'}.
For example, the subcommand @subcmd{/CELLRANGE=range 'C3:P19'} reads
columns C--P and rows 3--19, inclusive.
If no @subcmd{CELLRANGE} subcommand is given, then the entire sheet is read.

If @subcmd{/READNAMES=ON} is specified, then the contents of cells of
the first row are used as the names of the variables in which to store
the data from subsequent rows.  This is the default.
If @subcmd{/READNAMES=OFF} is
used, then the variables receive automatically assigned names.

The @subcmd{ASSUMEDSTRWIDTH} subcommand specifies the maximum width of string
variables read from the file.
If omitted, the default value is determined from the length of the
string in the first spreadsheet cell for each variable.
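
For example, the following sketch reads cells A1 through D100 of a
sheet named @samp{Stock} in an OpenDocument spreadsheet.  It takes the
variable names from the first row and limits string variables to a
width of 32.  The file name @file{inventory.ods} and the sheet name are
hypothetical:

@example
GET DATA /TYPE=ODS
        /FILE='inventory.ods'
        /SHEET=NAME 'Stock'
        /CELLRANGE=RANGE 'A1:D100'
        /READNAMES=ON
        /ASSUMEDSTRWIDTH=32.
@end example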


@node GET DATA /TYPE=PSQL
@subsection Postgres Database Queries

@display
GET DATA /TYPE=PSQL
         /CONNECT=@{@var{connection info}@}
         /SQL=@{@var{query}@}
         [/ASSUMEDSTRWIDTH=@var{w}]
         [/UNENCRYPTED]
         [/BSIZE=@var{n}].
@end display

@cindex postgres
@cindex databases

The PSQL type is used to import data from a PostgreSQL database server.
The server may be located locally or remotely.
Variables are automatically created based on the table column names
or the names specified in the SQL query.
PostgreSQL data types of high precision will lose precision when
imported into @pspp{}.
Not all PostgreSQL data types can be represented in @pspp{}.
If a datum cannot be represented, a warning will be issued and that
datum will be set to SYSMIS.

The @subcmd{CONNECT} subcommand is mandatory.
It is a string specifying the parameters of the database server from
which the data should be fetched.
The format of the string is given in the PostgreSQL manual
@url{http://www.postgresql.org/docs/8.0/static/libpq.html#LIBPQ-CONNECT}.

The @subcmd{SQL} subcommand is mandatory.
It must be a valid SQL string to retrieve data from the database.

The @subcmd{ASSUMEDSTRWIDTH} subcommand specifies the maximum width of string
variables read from the database.
If omitted, the default value is determined from the length of the
string in the first value read for each variable.

The @subcmd{UNENCRYPTED} subcommand allows data to be retrieved over an insecure
connection.
If the connection is not encrypted, and the @subcmd{UNENCRYPTED} subcommand is
not given, then an error will occur.
Whether or not the connection is
encrypted depends upon the underlying PostgreSQL client library (libpq) and the
capabilities of the database server.

The @subcmd{BSIZE} subcommand serves only to optimise the speed of data transfer.
It specifies an upper limit on the
number of cases to fetch from the database at once.
The default value is 4096.
If your SQL statement fetches a large number of cases but only a small number of
variables, then the data transfer may be faster if you increase this value.
Conversely, if the number of variables is large, or if the machine on which
@pspp{} is running has only a
small amount of memory, then a smaller value will be better.


The following syntax is an example:
@example
GET DATA /TYPE=PSQL
     /CONNECT='host=example.com port=5432 dbname=product user=fred password=xxxx'
     /SQL='select * from manufacturer'.
@end example
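
A variant of the example above might also set the optional
subcommands.  In the following sketch, the host, database, user, and
query are hypothetical.  @subcmd{UNENCRYPTED} permits the transfer even
if the connection turns out not to be encrypted.  @subcmd{BSIZE} fetches
at most 1024 cases at once, and @subcmd{ASSUMEDSTRWIDTH} caps string
variables at a width of 40:

@example
GET DATA /TYPE=PSQL
     /CONNECT='host=localhost dbname=product user=fred'
     /SQL='select id, name from manufacturer'
     /ASSUMEDSTRWIDTH=40
     /UNENCRYPTED
     /BSIZE=1024.
@end example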


@node GET DATA /TYPE=TXT
@subsection Textual Data Files

@display
GET DATA /TYPE=TXT
        /FILE=@{'@var{file_name}',@var{file_handle}@}
        [/ENCODING='@var{encoding}']
        [/ARRANGEMENT=@{DELIMITED,FIXED@}]
        [/FIRSTCASE=@{@var{first_case}@}]
        [/IMPORTCASES=...]
        @dots{}additional subcommands depending on ARRANGEMENT@dots{}
@end display

@cindex text files
@cindex data files
When TYPE=TXT is specified, @cmd{GET DATA} reads data in a delimited or
fixed columnar format, much like @cmd{DATA LIST} (@pxref{DATA LIST}).

The @subcmd{FILE} subcommand is mandatory.  Specify the file to be read as
a string file name or (for textual data only) a
file handle (@pxref{File Handles}).

The @subcmd{ENCODING} subcommand specifies the character encoding of
the file to be read.  @xref{INSERT}, for information on supported
encodings.

The @subcmd{ARRANGEMENT} subcommand determines the file's basic format.
DELIMITED, the default setting, specifies that fields in the input
data are separated by spaces, tabs, or other user-specified
delimiters.  FIXED specifies that fields in the input data appear at
particular fixed column positions within records of a case.

By default, cases are read from the input file starting from the first
line.  To skip lines at the beginning of an input file, set @subcmd{FIRSTCASE}
to the number of the first line to read: 2 to skip the first line, 3
to skip the first two lines, and so on.

@subcmd{IMPORTCASES} is ignored, for compatibility.  Use @cmd{N OF
CASES} to limit the number of cases read from a file (@pxref{N OF
CASES}), or @cmd{SAMPLE} to obtain a random sample of cases
(@pxref{SAMPLE}).
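
For example, a sketch like the following reads only the first 100
cases of a delimited file.  It follows @cmd{GET DATA} with
@cmd{N OF CASES} instead of relying on @subcmd{IMPORTCASES}.  The file
name @file{measurements.txt} and the variable names are hypothetical:

@example
GET DATA /TYPE=TXT /FILE='measurements.txt' /DELIMITERS=' '
        /VARIABLES=id F6 reading F8.2.
N OF CASES 100.
@end example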

The remaining subcommands apply only to one of the two file
arrangements, described below.

@menu
* GET DATA /TYPE=TXT /ARRANGEMENT=DELIMITED::
* GET DATA /TYPE=TXT /ARRANGEMENT=FIXED::
@end menu

@node GET DATA /TYPE=TXT /ARRANGEMENT=DELIMITED
@subsubsection Reading Delimited Data

@display
GET DATA /TYPE=TXT
        /FILE=@{'@var{file_name}',@var{file_handle}@}
        [/ARRANGEMENT=@{DELIMITED,FIXED@}]
        [/FIRSTCASE=@{@var{first_case}@}]
        [/IMPORTCASE=@{ALL,FIRST @var{max_cases},PERCENT @var{percent}@}]

        /DELIMITERS="@var{delimiters}"
        [/QUALIFIER="@var{quotes}"]
        [/DELCASE=@{LINE,VARIABLES @var{n_variables}@}]
        /VARIABLES=@var{del_var1} [@var{del_var2}]@dots{}
where each @var{del_var} takes the form:
        @var{variable} @var{format}
@end display

The @cmd{GET DATA} command with TYPE=TXT and ARRANGEMENT=DELIMITED reads
input data from text files in delimited format, where fields are
separated by a set of user-specified delimiters.  Its capabilities are
similar to those of @cmd{DATA LIST FREE} (@pxref{DATA LIST FREE}), with a
few enhancements.

The required @subcmd{FILE} subcommand and optional @subcmd{FIRSTCASE} and @subcmd{IMPORTCASE}
subcommands are described above (@pxref{GET DATA /TYPE=TXT}).

@subcmd{DELIMITERS}, which is required, specifies the set of characters that
may separate fields.  Each character in the string specified on
@subcmd{DELIMITERS} separates one field from the next.  The end of a line also
separates fields, regardless of @subcmd{DELIMITERS}.  Two consecutive
delimiters in the input yield an empty field, as does a delimiter at
the end of a line.  A space character as a delimiter is an exception:
consecutive spaces do not yield an empty field and neither does any
number of spaces at the end of a line.

To use a tab as a delimiter, specify @samp{\t} at the beginning of the
@subcmd{DELIMITERS} string.  To use a backslash as a delimiter, specify
@samp{\\} as the first delimiter or, if a tab should also be a
delimiter, immediately following @samp{\t}.  To read a data file in
which each field appears on a separate line, specify the empty string
for @subcmd{DELIMITERS}.
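
For instance, the following sketch reads a hypothetical tab-separated
file @file{scores.tsv} with two fields per line.  The variable names
are also hypothetical.  Adding a comma after @samp{\t} in the
@subcmd{DELIMITERS} string would make commas act as delimiters as well:

@example
GET DATA /TYPE=TXT /FILE='scores.tsv' /DELIMITERS="\t"
        /VARIABLES=name A20 score F5.1.
@end example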

The optional @subcmd{QUALIFIER} subcommand names one or more characters that
can be used to quote values within fields in the input.  A field that
begins with one of the specified quote characters ends at the next
matching quote.  Intervening delimiters become part of the field,
instead of terminating it.  The ability to specify more than one quote
character is a @pspp{} extension.

The character specified on @subcmd{QUALIFIER} can be embedded within a
field that it quotes by doubling the qualifier.  For example, if
@samp{'} is specified on @subcmd{QUALIFIER}, then @code{'a''b'}
specifies a field that contains @samp{a'b}.
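
As an example, suppose a hypothetical file @file{people.csv} quotes
any name that contains a comma:

@example
"Smith, John",Boston
"O'Neil, Pat",Chicago
@end example

@noindent
With @samp{"} as the qualifier, the comma inside the quotes becomes
part of the first field instead of separating fields.  A sketch like
the following should therefore read @samp{Smith, John} and
@samp{O'Neil, Pat} as names.  The variable names are hypothetical:

@example
GET DATA /TYPE=TXT /FILE='people.csv' /DELIMITERS=',' /QUALIFIER='"'
        /VARIABLES=name A20 city A20.
@end example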

The @subcmd{DELCASE} subcommand controls how data may be broken across lines in
the data file.  With LINE, the default setting, each line must contain
all the data for exactly one case.  For additional flexibility, to
allow a single case to be split among lines or multiple cases to be
contained on a single line, specify VARIABLES @var{n_variables}, where
@var{n_variables} is the number of variables per case.
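
For instance, consider a hypothetical file @file{numbers.txt} that
contains:

@example
1 2 3 4 5 6
7 8 9
@end example

@noindent
The following sketch specifies three variables per case.  It should
read three cases, (1, 2, 3), (4, 5, 6), and (7, 8, 9), even though the
first line holds two cases.  The variable names are hypothetical:

@example
GET DATA /TYPE=TXT /FILE='numbers.txt' /DELIMITERS=' '
        /DELCASE=VARIABLES 3
        /VARIABLES=x F2 y F2 z F2.
@end example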

The @subcmd{VARIABLES} subcommand is required and must be the last subcommand.
Specify the name of each variable and its input format (@pxref{Input
and Output Formats}) in the order they should be read from the input
file.

@subsubheading Examples

@noindent
On a Unix-like system, the @samp{/etc/passwd} file has a format
similar to this:

@example
root:$1$nyeSP5gD$pDq/:0:0:,,,:/root:/bin/bash
blp:$1$BrP/pFg4$g7OG:1000:1000:Ben Pfaff,,,:/home/blp:/bin/bash
john:$1$JBuq/Fioq$g4A:1001:1001:John Darrington,,,:/home/john:/bin/bash
jhs:$1$D3li4hPL$88X1:1002:1002:Jason Stover,,,:/home/jhs:/bin/csh
@end example

@noindent
The following syntax reads a file in the format used by
@samp{/etc/passwd}:

@c If you change this example, change the regression test in
@c tests/language/data-io/get-data.at to match.
@example
GET DATA /TYPE=TXT /FILE='/etc/passwd' /DELIMITERS=':'
        /VARIABLES=username A20
                   password A40
                   uid F10
                   gid F10
                   gecos A40
                   home A40
                   shell A40.
@end example

@noindent
Consider the following data on used cars:

@example
model   year    mileage price   type    age
Civic   2002    29883   15900   Si      2
Civic   2003    13415   15900   EX      1
Civic   1992    107000  3800    n/a     12
Accord  2002    26613   17900   EX      1
@end example

@noindent
The following syntax can be used to read the used car data:

@c If you change this example, change the regression test in
@c tests/language/data-io/get-data.at to match.
@example
GET DATA /TYPE=TXT /FILE='cars.data' /DELIMITERS=' ' /FIRSTCASE=2
        /VARIABLES=model A8
                   year F4
                   mileage F6
                   price F5
                   type A4
                   age F2.
@end example

@noindent
Consider the following information on animals in a pet store:

@example
'Pet''s Name', "Age", "Color", "Date Received", "Price", "Height", "Type"
, (Years), , , (Dollars), ,
"Rover", 4.5, Brown, "12 Feb 2004", 80, '1''4"', "Dog"
"Charlie", , Gold, "5 Apr 2007", 12.3, "3""", "Fish"
"Molly", 2, Black, "12 Dec 2006", 25, '5"', "Cat"
"Gilly", , White, "10 Apr 2007", 10, "3""", "Guinea Pig"
@end example

@noindent
The following syntax can be used to read the pet store data:

@c If you change this example, change the regression test in
@c tests/language/data-io/get-data.at to match.
@example
GET DATA /TYPE=TXT /FILE='pets.data' /DELIMITERS=', ' /QUALIFIER='''"' /ESCAPE
        /FIRSTCASE=3
        /VARIABLES=name A10
                   age F3.1
                   color A5
                   received EDATE10
                   price F5.2
                   height a5
                   type a10.
@end example

@node GET DATA /TYPE=TXT /ARRANGEMENT=FIXED
@subsubsection Reading Fixed Columnar Data

@c (modify-syntax-entry ?_ "w")
@c (modify-syntax-entry ?' "'")
@c (modify-syntax-entry ?@ "'")

@display
GET DATA /TYPE=TXT
        /FILE=@{'@var{file_name}',@var{file_handle}@}
        [/ARRANGEMENT=@{DELIMITED,FIXED@}]
        [/FIRSTCASE=@{@var{first_case}@}]
        [/IMPORTCASE=@{ALL,FIRST @var{max_cases},PERCENT @var{percent}@}]

        [/FIXCASE=@var{n}]
        /VARIABLES @var{fixed_var} [@var{fixed_var}]@dots{}
            [/rec# @var{fixed_var} [@var{fixed_var}]@dots{}]@dots{}
where each @var{fixed_var} takes the form:
        @var{variable} @var{start}-@var{end} @var{format}
@end display

The @cmd{GET DATA} command with TYPE=TXT and ARRANGEMENT=FIXED reads input
data from text files in fixed format, where each field is located in
particular fixed column positions within records of a case.  Its
capabilities are similar to those of @cmd{DATA LIST FIXED} (@pxref{DATA LIST
FIXED}), with a few enhancements.

The required @subcmd{FILE} subcommand and optional @subcmd{FIRSTCASE} and @subcmd{IMPORTCASE}
subcommands are described above (@pxref{GET DATA /TYPE=TXT}).

The optional @subcmd{FIXCASE} subcommand may be used to specify the positive
integer number of input lines that make up each case.  The default
value is 1.

The @subcmd{VARIABLES} subcommand, which is required, specifies the positions
at which each variable can be found.  For each variable, specify its
name, followed by its start and end column separated by @samp{-}
(e.g.@: @samp{0-9}), followed by an input format type (e.g.@:
@samp{F}) or a full format specification (e.g.@: @samp{DOLLAR12.2}).
For this command, columns are numbered starting from 0 at
the left column.  Introduce the variables in the second and later
lines of a case by a slash followed by the number of the line within
the case, e.g.@: @samp{/2} for the second line.
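
For example, consider a hypothetical file @file{people.txt} in which
each case spans two lines.  The first line holds a name in columns
0--19 and an age in columns 20--21.  The second line holds a city in
columns 0--19:

@example
Ann Smith           34
Boston
Bob Jones           51
Chicago
@end example

@noindent
A sketch like the following reads one case from each pair of lines.
The variable names are hypothetical:

@example
GET DATA /TYPE=TXT /FILE='people.txt' /ARRANGEMENT=FIXED /FIXCASE=2
        /VARIABLES=name 0-19 A
                   age 20-21 F
        /2 city 0-19 A.
@end example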

@subsubheading Examples

@noindent
Consider the following data on used cars:

@example
model   year    mileage price   type    age
Civic   2002    29883   15900   Si      2
Civic   2003    13415   15900   EX      1
Civic   1992    107000  3800    n/a     12
Accord  2002    26613   17900   EX      1
@end example

@noindent
The following syntax can be used to read the used car data:

@c If you change this example, change the regression test in
@c tests/language/data-io/get-data.at to match.
@example
GET DATA /TYPE=TXT /FILE='cars.data' /ARRANGEMENT=FIXED /FIRSTCASE=2
        /VARIABLES=model 0-7 A
                   year 8-15 F
                   mileage 16-23 F
                   price 24-31 F
                   type 32-39 A
                   age 40-47 F.
@end example

@node IMPORT
@section IMPORT
@vindex IMPORT

@display
IMPORT
        /FILE='@var{file_name}'
        /TYPE=@{COMM,TAPE@}
        /DROP=@var{var_list}
        /KEEP=@var{var_list}
        /RENAME=(@var{src_names}=@var{target_names})@dots{}
@end display

The @cmd{IMPORT} transformation clears the active dataset dictionary and
data and replaces them with a dictionary and data from a system file or
portable file.

The @subcmd{FILE} subcommand, which is the only required subcommand, specifies
the portable file to be read as a file name string or a file handle
(@pxref{File Handles}).

The @subcmd{TYPE} subcommand is currently not used.

@subcmd{DROP}, @subcmd{KEEP}, and @subcmd{RENAME} follow the syntax used by @cmd{GET} (@pxref{GET}).

@cmd{IMPORT} does not cause the data to be read, only the dictionary.  The
data is read later, when a procedure is executed.

Use of @cmd{IMPORT} to read a system file is a @pspp{} extension.
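
For example, the following sketch replaces the active dataset with the
variables @code{id}, @code{age}, and @code{income} from a portable
file.  The file and variable names are hypothetical:

@example
IMPORT /FILE='survey.por'
        /KEEP=id age income.
@end example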

@node SAVE
@section SAVE
@vindex SAVE

@display
SAVE
        /OUTFILE=@{'@var{file_name}',@var{file_handle}@}
        /UNSELECTED=@{RETAIN,DELETE@}
        /@{UNCOMPRESSED,COMPRESSED,ZCOMPRESSED@}
        /PERMISSIONS=@{WRITEABLE,READONLY@}
        /DROP=@var{var_list}
        /KEEP=@var{var_list}
        /VERSION=@var{version}
        /RENAME=(@var{src_names}=@var{target_names})@dots{}
        /NAMES
        /MAP
@end display

The @cmd{SAVE} procedure causes the dictionary and data in the active
dataset to be written to a system file.

@subcmd{OUTFILE} is the only required subcommand.  Specify the system file
to be written as a string file name or a file handle
(@pxref{File Handles}).

By default, cases excluded with @cmd{FILTER} are written to the system file.
These can be excluded by specifying @subcmd{DELETE} on the @subcmd{UNSELECTED}
subcommand.  Specifying @subcmd{RETAIN} makes the default explicit.

The @subcmd{UNCOMPRESSED}, @subcmd{COMPRESSED}, and
@subcmd{ZCOMPRESSED} subcommands determine the system file's
compression level:

@table @code
@item UNCOMPRESSED
Data is not compressed.  Each numeric value uses 8 bytes of disk
space.  Each string value uses one byte per column width, rounded up
to a multiple of 8 bytes.

@item COMPRESSED
Data is compressed with a simple algorithm.  Each integer numeric
value between @minus{}99 and 151, inclusive, or system missing value
uses one byte of disk space.  Each 8-byte segment of a string that
consists only of spaces uses 1 byte.  Any other numeric value or
8-byte string segment uses 9 bytes of disk space.

@item ZCOMPRESSED
Data is compressed with the ``deflate'' compression algorithm
specified in RFC@tie{}1951 (the same algorithm used by
@command{gzip}).  Files written with this compression level cannot be
read by PSPP 0.8.1 or earlier or by SPSS 20 or earlier.
@end table

@subcmd{COMPRESSED} is the default compression level.  The @cmd{SET} command
(@pxref{SET}) can change this default.
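
As a worked example of the per-value costs listed above, consider a
case with three variables: a numeric value of 42, a numeric value of
3.5, and an 8-byte string that consists only of spaces.  With
@subcmd{UNCOMPRESSED}, the case uses 8 + 8 + 8 = 24 bytes.  With
@subcmd{COMPRESSED}, 42 is an integer between @minus{}99 and 151 and
uses 1 byte, 3.5 uses 9 bytes, and the blank string segment uses 1
byte, for a total of 1 + 9 + 1 = 11 bytes.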
 735
 736 The @subcmd{PERMISSIONS} subcommand specifies permissions for the new system
 737 file.  WRITEABLE, the default, creates the file with read and write
 738 permission.  READONLY creates the file for read-only access.
 739
 740 By default, all the variables in the active dataset dictionary are written
 741 to the system file.  The @subcmd{DROP} subcommand can be used to specify a list
 742 of variables not to be written.  In contrast, KEEP specifies variables
 743 to be written, with all variables not specified not written.

Normally, variables are saved to a system file under the same names they
have in the active dataset.  Use the @subcmd{RENAME} subcommand to change these names.
Specify, within parentheses, a list of variable names followed by an
equals sign (@samp{=}) and the names that they should be renamed to.
Multiple parenthesized groups of variable names can be included on a
single @subcmd{RENAME} subcommand.  Variables' names may be swapped using a
@subcmd{RENAME} subcommand of the
form @subcmd{/RENAME=(@var{A} @var{B}=@var{B} @var{A})}.

An alternate syntax for the @subcmd{RENAME} subcommand omits the
parentheses.  In this form, only a single variable may be renamed, for
example @subcmd{/RENAME=@var{A}=@var{B}}.  This alternate syntax is
deprecated.
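
The sketch below, using hypothetical variable and file names, renames
two variables in one parenthesized group and swaps the names of two
others in a second group:

@example
* Q1 becomes SATISFACTION, Q2 becomes LOYALTY, and A and B trade names.
SAVE OUTFILE='renamed.sav'
        /RENAME=(q1 q2=satisfaction loyalty) (a b=b a).
@end example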

@subcmd{DROP}, @subcmd{KEEP}, and @subcmd{RENAME} are performed in
left-to-right order, and each may be specified any number of times.
@cmd{SAVE} never modifies the active dataset: @subcmd{DROP},
@subcmd{KEEP}, and @subcmd{RENAME} only affect the system file written
to disk.
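
Because the subcommands are applied from left to right, a @subcmd{KEEP}
that follows a @subcmd{RENAME} refers to the new names.  In this
sketch, which assumes a hypothetical variable @code{raw}, the system
file contains only @code{score}, while the active dataset still
contains @code{raw}:

@example
* RENAME runs first, so KEEP names the renamed variable.
SAVE OUTFILE='scores.sav' /RENAME=(raw=score) /KEEP=score.
@end example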

The @subcmd{VERSION} subcommand specifies the version of the file format. Valid
versions are 2 and 3.  The default version is 3.  In version 2 system
files, variable names longer than 8 bytes are truncated.  The two
versions are otherwise identical.
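
For example, to write a version 2 system file for older software (the
file name is hypothetical):

@example
* Variable names longer than 8 bytes are truncated in this file.
SAVE OUTFILE='legacy.sav' /VERSION=2.
@end example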

The @subcmd{NAMES} and @subcmd{MAP} subcommands are currently ignored.

@cmd{SAVE} causes the data to be read.  It is a procedure.

@node SAVE DATA COLLECTION
@section SAVE DATA COLLECTION
@vindex SAVE DATA COLLECTION

@display
SAVE DATA COLLECTION
        /OUTFILE=@{'@var{file_name}',@var{file_handle}@}
        /METADATA=@{'@var{file_name}',@var{file_handle}@}
        /@{UNCOMPRESSED,COMPRESSED,ZCOMPRESSED@}
        /PERMISSIONS=@{WRITEABLE,READONLY@}
        /DROP=@var{var_list}
        /KEEP=@var{var_list}
        /VERSION=@var{version}
        /RENAME=(@var{src_names}=@var{target_names})@dots{}
        /NAMES
        /MAP
@end display

Like @cmd{SAVE}, @cmd{SAVE DATA COLLECTION} writes the dictionary and
data in the active dataset to a system file.  It also writes metadata
to a separate XML metadata file.

@subcmd{OUTFILE} is required.  Specify the system file to be written as a
string file name or a file handle (@pxref{File Handles}).

@subcmd{METADATA} is also required.  Specify the metadata file to be written as
a string file name or a file handle.  Metadata files customarily use a
@file{.mdd} extension.

The current implementation of this command is experimental.  It only
outputs an approximation of the metadata file format.  Please report
bugs.

Other subcommands are optional.  They have the same meanings as in the
@cmd{SAVE} command.

@cmd{SAVE DATA COLLECTION} causes the data to be read.  It is a
procedure.
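
A minimal sketch, assuming an existing active dataset and hypothetical
file names, that writes both the system file and its metadata file:

@example
* Both OUTFILE and METADATA are required.
SAVE DATA COLLECTION
        /OUTFILE='survey.sav'
        /METADATA='survey.mdd'
        /ZCOMPRESSED.
@end example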

@node SAVE TRANSLATE
@section SAVE TRANSLATE
@vindex SAVE TRANSLATE

@display
SAVE TRANSLATE
        /OUTFILE=@{'@var{file_name}',@var{file_handle}@}
        /TYPE=@{CSV,TAB@}
        [/REPLACE]
        [/MISSING=@{IGNORE,RECODE@}]

        [/DROP=@var{var_list}]
        [/KEEP=@var{var_list}]
        [/RENAME=(@var{src_names}=@var{target_names})@dots{}]
        [/UNSELECTED=@{RETAIN,DELETE@}]
        [/MAP]

        @dots{}additional subcommands depending on TYPE@dots{}
@end display

The @cmd{SAVE TRANSLATE} command saves data in formats understood by
other applications.

The @subcmd{OUTFILE} and @subcmd{TYPE} subcommands are mandatory.
@subcmd{OUTFILE} specifies the file to be written, as a string file name or a file handle
(@pxref{File Handles}).  @subcmd{TYPE} determines the type of the file to
write.  It must be one of the following:

@table @asis
@item CSV
Comma-separated value format.

@item TAB
Tab-delimited format.
@end table

By default, @cmd{SAVE TRANSLATE} will not overwrite an existing file.  Use
@subcmd{REPLACE} to force an existing file to be overwritten.
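
For example, the following writes the active dataset to a
hypothetical @file{data.csv}, replacing that file if it already exists:

@example
* Without REPLACE, an existing data.csv would not be overwritten.
SAVE TRANSLATE /OUTFILE='data.csv' /TYPE=CSV /REPLACE.
@end example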

With MISSING=IGNORE, the default, @cmd{SAVE TRANSLATE} treats user-missing
values as if they were not missing.  Specify MISSING=RECODE to output
numeric user-missing values like system-missing values and string
user-missing values as all spaces.
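
As a sketch, suppose a hypothetical numeric variable @code{age} uses 99
as its user-missing value.  With MISSING=RECODE, cases whose @code{age}
is 99 are written as if @code{age} were system-missing:

@example
* Declare 99 as user-missing for the hypothetical AGE variable.
MISSING VALUES age (99).
SAVE TRANSLATE /OUTFILE='ages.csv' /TYPE=CSV /REPLACE /MISSING=RECODE.
@end example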

By default, all the variables in the active dataset dictionary are
saved to the output file, but @subcmd{DROP} or @subcmd{KEEP} can
select a subset of variables to save.  The @subcmd{RENAME} subcommand
can also be used to change the names under which variables are saved;
because they are used only in the output, these names do not have to
conform to the usual PSPP variable naming rules.  @subcmd{UNSELECTED}
determines whether cases filtered out by the @cmd{FILTER} command are
written to the output file.  These subcommands have the same syntax
and meaning as on the @cmd{SAVE} command (@pxref{SAVE}).

Each supported file type has additional subcommands, explained in
separate sections below.

@cmd{SAVE TRANSLATE} causes the data to be read.  It is a procedure.

@menu
* SAVE TRANSLATE /TYPE=CSV and TYPE=TAB::
@end menu

@node SAVE TRANSLATE /TYPE=CSV and TYPE=TAB
@subsection Writing Comma- and Tab-Separated Data Files

@display
SAVE TRANSLATE
        /OUTFILE=@{'@var{file_name}',@var{file_handle}@}
        /TYPE=@{CSV,TAB@}
        [/REPLACE]
        [/MISSING=@{IGNORE,RECODE@}]

        [/DROP=@var{var_list}]
        [/KEEP=@var{var_list}]
        [/RENAME=(@var{src_names}=@var{target_names})@dots{}]
        [/UNSELECTED=@{RETAIN,DELETE@}]

        [/FIELDNAMES]
        [/CELLS=@{VALUES,LABELS@}]
        [/TEXTOPTIONS DELIMITER='@var{delimiter}']
        [/TEXTOPTIONS QUALIFIER='@var{qualifier}']
        [/TEXTOPTIONS DECIMAL=@{DOT,COMMA@}]
        [/TEXTOPTIONS FORMAT=@{PLAIN,VARIABLE@}]
@end display

The @cmd{SAVE TRANSLATE} command with TYPE=CSV or TYPE=TAB writes data in a
comma- or tab-separated value format similar to that described by
RFC@tie{}4180.  Each variable becomes one output column, and each case
becomes one line of output.  If @subcmd{FIELDNAMES} is specified, an additional
line at the top of the output file lists variable names.
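
For instance, the sketch below enters two cases of hypothetical data
and writes them with a header line.  Assuming the default settings and
a period as the decimal point, the first line of @file{people.csv}
should list the variable names, @samp{id,age}, followed by one line per
case:

@example
* Two hypothetical cases with numeric variables ID and AGE.
DATA LIST LIST /id age.
BEGIN DATA.
1 34
2 27
END DATA.
SAVE TRANSLATE /OUTFILE='people.csv' /TYPE=CSV /REPLACE /FIELDNAMES.
@end example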

The CELLS and TEXTOPTIONS FORMAT settings determine how values are
written to the output file:

@table @asis
@item CELLS=VALUES FORMAT=PLAIN (the default settings)
Writes variables to the output in ``plain'' formats that ignore the
details of variable formats.  Numeric values are written as plain
decimal numbers with enough digits to indicate their exact values in
machine representation.  Numeric values include @samp{e} followed by
an exponent if the exponent value would be less than @minus{}4 or greater
than 16.  Dates are written in MM/DD/YYYY format and times in HH:MM:SS
format.  WKDAY and MONTH values are written as decimal numbers.

By default, numeric values use the decimal point character set with
SET DECIMAL (@pxref{SET DECIMAL}).  Use DECIMAL=DOT or DECIMAL=COMMA
to force a particular decimal point character.

@item CELLS=VALUES FORMAT=VARIABLE
Writes variables using their print formats.  Leading and trailing
spaces are removed from numeric values, and trailing spaces are
removed from string values.

@item CELLS=LABELS FORMAT=PLAIN
@itemx CELLS=LABELS FORMAT=VARIABLE
Writes value labels where they exist, and otherwise writes the values
themselves as described above.
@end table
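
As an illustration, assuming a hypothetical numeric variable
@code{smoker} coded 0 and 1, the following writes its value labels
instead of the codes; values that have no label are written using
their print formats:

@example
* Attach labels to the hypothetical SMOKER codes.
VALUE LABELS smoker 0 'No' 1 'Yes'.
SAVE TRANSLATE /OUTFILE='labels.csv' /TYPE=CSV /REPLACE
        /CELLS=LABELS /TEXTOPTIONS FORMAT=VARIABLE.
@end example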

Regardless of CELLS and TEXTOPTIONS FORMAT, numeric system-missing
values are output as a single space.

For TYPE=TAB, tab characters delimit values.  For TYPE=CSV, the
TEXTOPTIONS DELIMITER and DECIMAL settings determine the character
that separates values within a line.  If DELIMITER is specified, then
the specified string separates values.  If DELIMITER is not specified,
then the default is a comma with DECIMAL=DOT or a semicolon with
DECIMAL=COMMA.  If DECIMAL is not given either, it is implied by the
decimal point character set with SET DECIMAL (@pxref{SET DECIMAL}).
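
For example, with hypothetical output file names, the first command
below uses a comma as the decimal point and therefore, by default,
separates values with semicolons; the second separates values with
@samp{|} whatever the decimal point character:

@example
* Comma decimal point, so values are separated by semicolons.
SAVE TRANSLATE /OUTFILE='semicolons.csv' /TYPE=CSV /REPLACE
        /TEXTOPTIONS DECIMAL=COMMA.

* An explicit delimiter overrides the default.
SAVE TRANSLATE /OUTFILE='pipes.txt' /TYPE=CSV /REPLACE
        /TEXTOPTIONS DELIMITER='|'.
@end example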

The TEXTOPTIONS QUALIFIER setting specifies a character that is output
before and after a value that contains the delimiter character or the
qualifier character.  The default is a double quote (@samp{"}).  A
qualifier character that appears within a value is doubled.
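
With a comma delimiter and the default qualifier, for example, a string
value @samp{say "hi", Bob} contains the delimiter, so it would be
written as @samp{"say ""hi"", Bob"}.  The following sketch, with a
hypothetical file name, selects a single quote as the qualifier
instead:

@example
* The single quote is given as a double-quoted string.
SAVE TRANSLATE /OUTFILE='quoted.csv' /TYPE=CSV /REPLACE
        /TEXTOPTIONS QUALIFIER="'".
@end example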

@node SYSFILE INFO
@section SYSFILE INFO
@vindex SYSFILE INFO

@display
SYSFILE INFO FILE='@var{file_name}' [ENCODING='@var{encoding}'].
@end display

@cmd{SYSFILE INFO} reads the dictionary in an SPSS system file,
SPSS/PC+ system file, or SPSS portable file, and displays the
information in its dictionary.

Specify the file to read on the @subcmd{FILE} subcommand, as a string
file name or a file handle (@pxref{File Handles}).

@pspp{} automatically detects the encoding of string data in the file,
when possible.  The character encoding of old SPSS system files cannot
always be guessed correctly, and SPSS/PC+ system files do not include
any indication of their encoding.  Specify the @subcmd{ENCODING}
subcommand with an @acronym{IANA} character set name as its string
argument to override the default, or specify @code{ENCODING='DETECT'}
to analyze and report possibly valid encodings for the system file.
The @subcmd{ENCODING} subcommand is a @pspp{} extension.

@cmd{SYSFILE INFO} does not affect the current active dataset.
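
For example, for a hypothetical old system file @file{old.sav}, the
first command below reports encodings that might be valid, and the
second reads the dictionary as if the file were encoded in
@code{windows-1252}:

@example
* Report candidate encodings for the hypothetical file.
SYSFILE INFO FILE='old.sav' ENCODING='DETECT'.

* Override detection with an explicit IANA character set name.
SYSFILE INFO FILE='old.sav' ENCODING='windows-1252'.
@end example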

@node XEXPORT
@section XEXPORT
@vindex XEXPORT

@display
XEXPORT
        /OUTFILE='@var{file_name}'
        /DIGITS=@var{n}
        /DROP=@var{var_list}
        /KEEP=@var{var_list}
        /RENAME=(@var{src_names}=@var{target_names})@dots{}
        /TYPE=@{COMM,TAPE@}
        /MAP
@end display

The @cmd{XEXPORT} transformation writes the active dataset's dictionary and
data to a specified portable file.

This transformation is a @pspp{} extension.

It is similar to the @cmd{EXPORT} procedure, with two differences:

@itemize
@item
@cmd{XEXPORT} is a transformation, not a procedure.  It is executed when
the data is read by a procedure or procedure-like command.

@item
@cmd{XEXPORT} does not support the @subcmd{UNSELECTED} subcommand.
@end itemize

@xref{EXPORT}, for more information.
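
Because @cmd{XEXPORT} is a transformation, nothing is written until a
procedure reads the data.  In this sketch, which assumes hypothetical
numeric variables @code{a} and @code{b}, @cmd{EXECUTE} forces that
data pass, so the portable file includes the computed variable
@code{total}:

@example
COMPUTE total = a + b.
* Nothing is written until a procedure such as EXECUTE reads the data.
XEXPORT /OUTFILE='with-total.por'.
EXECUTE.
@end example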

@node XSAVE
@section XSAVE
@vindex XSAVE

@display
XSAVE
        /OUTFILE='@var{file_name}'
        /@{UNCOMPRESSED,COMPRESSED,ZCOMPRESSED@}
        /PERMISSIONS=@{WRITEABLE,READONLY@}
        /DROP=@var{var_list}
        /KEEP=@var{var_list}
        /VERSION=@var{version}
        /RENAME=(@var{src_names}=@var{target_names})@dots{}
        /NAMES
        /MAP
@end display

The @cmd{XSAVE} transformation writes the active dataset's dictionary and
data to a system file.  It is similar to the @cmd{SAVE}
procedure, with two differences:

@itemize
@item
@cmd{XSAVE} is a transformation, not a procedure.  It is executed when
the data is read by a procedure or procedure-like command.

@item
@cmd{XSAVE} does not support the @subcmd{UNSELECTED} subcommand.
@end itemize

@xref{SAVE}, for more information.
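
One use of @cmd{XSAVE} is to save cases before a later transformation
discards them, within the same data pass.  In this sketch, which
assumes a hypothetical numeric variable @code{age},
@file{all-cases.sav} receives every case, while @cmd{FREQUENCIES} sees
only cases whose @code{age} is 18 or more:

@example
* XSAVE runs before SELECT IF, so every case reaches the file.
XSAVE /OUTFILE='all-cases.sav'.
SELECT IF (age >= 18).
* FREQUENCIES reads the data and so triggers the XSAVE as well.
FREQUENCIES /VARIABLES=age.
@end example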