1 @node System and Portable File IO
2 @chapter System and Portable File I/O
The commands in this chapter read, write, and examine system files and
portable files.
8 * APPLY DICTIONARY:: Apply system file dictionary to active dataset.
9 * EXPORT:: Write to a portable file.
10 * GET:: Read from a system file.
11 * GET DATA:: Read from foreign files.
12 * IMPORT:: Read from a portable file.
13 * SAVE:: Write to a system file.
14 * SAVE TRANSLATE:: Write data in foreign file formats.
15 * SYSFILE INFO:: Display system file dictionary.
16 * XEXPORT:: Write to a portable file, as a transformation.
17 * XSAVE:: Write to a system file, as a transformation.
20 @node APPLY DICTIONARY
21 @section APPLY DICTIONARY
22 @vindex APPLY DICTIONARY
25 APPLY DICTIONARY FROM=@{'@var{file_name}',@var{file_handle}@}.
28 @cmd{APPLY DICTIONARY} applies the variable labels, value labels,
29 and missing values taken from a file to corresponding
30 variables in the active dataset. In some cases it also updates the
33 Specify a system file or portable file's name, a data set name
34 (@pxref{Datasets}), or a file handle name (@pxref{File Handles}). The
35 dictionary in the file will be read, but it will not replace the
36 active dataset's dictionary. The file's data will not be read.
38 Only variables with names that exist in both the active dataset and the
39 system file are considered. Variables with the same name but different
40 types (numeric, string) will cause an error message. Otherwise, the
41 system file variables' attributes will replace those in their matching
42 active dataset variables:
46 If a system file variable has a variable label, then it will replace
47 the variable label of the active dataset variable. If the system
48 file variable does not have a variable label, then the active dataset
49 variable's variable label, if any, will be retained.
52 If the system file variable has custom attributes (@pxref{VARIABLE
53 ATTRIBUTE}), then those attributes replace the active dataset variable's
54 custom attributes. If the system file variable does not have custom
55 attributes, then the active dataset variable's custom attributes, if any,
59 If the active dataset variable is numeric or short string, then value
60 labels and missing values, if any, will be copied to the active dataset
61 variable. If the system file variable does not have value labels or
62 missing values, then those in the active dataset variable, if any, will not
In addition to properties of variables, some properties of the active
dataset's dictionary as a whole are updated:
If the system file has custom attributes (@pxref{DATAFILE ATTRIBUTE}),
then those attributes replace the active dataset's custom attributes.
76 If the active dataset has a weighting variable (@pxref{WEIGHT}), and the
77 system file does not, or if the weighting variable in the system file
78 does not exist in the active dataset, then the active dataset weighting
79 variable, if any, is retained. Otherwise, the weighting variable in
80 the system file becomes the active dataset weighting variable.
83 @cmd{APPLY DICTIONARY} takes effect immediately. It does not read the
84 active dataset. The system file is not modified.
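
As an illustrative sketch (the file names @file{survey_new.sav} and
@file{survey_old.sav} are hypothetical), the following syntax loads one
system file and then borrows variable labels, value labels, and missing
values from another file that shares some of its variable names:

@example
GET /FILE='survey_new.sav'.
APPLY DICTIONARY FROM='survey_old.sav'.
DISPLAY DICTIONARY.
@end example

Only the dictionary of @file{survey_old.sav} is consulted.  The data in
the active dataset stay as read by @cmd{GET}, and @cmd{DISPLAY
DICTIONARY} can be used to inspect the result.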
92 /OUTFILE='@var{file_name}'
93 /UNSELECTED=@{RETAIN,DELETE@}
97 /RENAME=(@var{src_names}=@var{target_names})@dots{}
102 The @cmd{EXPORT} procedure writes the active dataset's dictionary and
103 data to a specified portable file.
105 By default, cases excluded with FILTER are written to the
106 file. These can be excluded by specifying DELETE on the @subcmd{UNSELECTED}
107 subcommand. Specifying RETAIN makes the default explicit.
109 Portable files express real numbers in base 30. Integers are always
110 expressed to the maximum precision needed to make them exact.
111 Non-integers are, by default, expressed to the machine's maximum
112 natural precision (approximately 15 decimal digits on many machines).
113 If many numbers require this many digits, the portable file may
114 significantly increase in size. As an alternative, the @subcmd{DIGITS}
115 subcommand may be used to specify the number of decimal digits of
116 precision to write. @subcmd{DIGITS} applies only to non-integers.
118 The @subcmd{OUTFILE} subcommand, which is the only required subcommand, specifies
119 the portable file to be written as a file name string or
120 a file handle (@pxref{File Handles}).
@subcmd{DROP}, @subcmd{KEEP}, and @subcmd{RENAME} follow the same syntax as on the
@cmd{SAVE} procedure (@pxref{SAVE}).
125 The @subcmd{TYPE} subcommand specifies the character set for use in the
126 portable file. Its value is currently not used.
128 The @subcmd{MAP} subcommand is currently ignored.
130 @cmd{EXPORT} is a procedure. It causes the active dataset to be read.
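
A minimal sketch of the options described above, assuming a
hypothetical output file @file{results.por} and hypothetical scratch
variables @code{tmp1} and @code{tmp2}:

@example
EXPORT /OUTFILE='results.por'
       /UNSELECTED=DELETE
       /DIGITS=6
       /DROP=tmp1 tmp2.
@end example

Here cases excluded by @cmd{FILTER} are left out of the file,
non-integers are written with 6 decimal digits of precision, and the two
scratch variables are not written.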
138 /FILE=@{'@var{file_name}',@var{file_handle}@}
141 /RENAME=(@var{src_names}=@var{target_names})@dots{}
142 /ENCODING='@var{encoding}'
145 @cmd{GET} clears the current dictionary and active dataset and
146 replaces them with the dictionary and data from a specified file.
148 The @subcmd{FILE} subcommand is the only required subcommand. Specify
149 the SPSS system file, SPSS/PC+ system file, or SPSS portable file to
150 be read as a string file name or a file handle (@pxref{File Handles}).
By default, all the variables in a file are read.  The @subcmd{DROP}
subcommand can be used to specify a list of variables that are not to be
read.  By contrast, the @subcmd{KEEP} subcommand can be used to specify
variables that are to be read, with all other variables not read.
157 Normally variables in a file retain the names that they were
158 saved under. Use the @subcmd{RENAME} subcommand to change these names.
Specify, within parentheses, a list of variable names followed by an equals sign
161 (@samp{=}) and the names that they should be renamed to. Multiple
162 parenthesized groups of variable names can be included on a single
163 @subcmd{RENAME} subcommand.
164 Variables' names may be swapped using a @subcmd{RENAME}
165 subcommand of the form @subcmd{/RENAME=(@var{A} @var{B}=@var{B} @var{A})}.
167 Alternate syntax for the @subcmd{RENAME} subcommand allows the parentheses to be
168 eliminated. When this is done, only a single variable may be renamed at
169 once. For instance, @subcmd{/RENAME=@var{A}=@var{B}}. This alternate syntax is
172 @subcmd{DROP}, @subcmd{KEEP}, and @subcmd{RENAME} are executed in left-to-right order.
173 Each may be present any number of times. @cmd{GET} never modifies a
174 file on disk. Only the active dataset read from the file
175 is affected by these subcommands.
177 @pspp{} automatically detects the encoding of string data in the file,
178 when possible. The character encoding of old SPSS system files cannot
179 always be guessed correctly, and SPSS/PC+ system files do not include
180 any indication of their encoding. Specify the @subcmd{ENCODING}
181 subcommand with an @acronym{IANA} character set name as its string
182 argument to override the default. Use @cmd{SYSFILE INFO} to analyze
183 the encodings that might be valid for a system file. The
184 @subcmd{ENCODING} subcommand is a @pspp{} extension.
186 @cmd{GET} does not cause the data to be read, only the dictionary. The data
187 is read later, when a procedure is executed.
189 Use of @cmd{GET} to read a portable file is a @pspp{} extension.
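
The following sketch shows how these subcommands can be combined.  The
file names, variable names, and the encoding are hypothetical and should
be adapted to the file being read:

@example
GET /FILE='survey.sav'
    /KEEP=id age income
    /RENAME=(age income=age_years income_usd).

GET /FILE='legacy.sav'
    /ENCODING='windows-1252'.
@end example

In the first command, @subcmd{KEEP} is applied before @subcmd{RENAME}
because the subcommands are executed left to right, so @subcmd{RENAME}
refers to the names as they appear in the file.  The second command
overrides the detected encoding for a file whose encoding would
otherwise be guessed.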
197 /TYPE=@{GNM,ODS,PSQL,TXT@}
198 @dots{}additional subcommands depending on TYPE@dots{}
201 The @cmd{GET DATA} command is used to read files and other data
202 sources created by other applications. When this command is executed,
203 the current dictionary and active dataset are replaced with variables
204 and data read from the specified source.
206 The @subcmd{TYPE} subcommand is mandatory and must be the first subcommand
207 specified. It determines the type of the file or source to read.
208 @pspp{} currently supports the following file types:
212 Spreadsheet files created by Gnumeric (@url{http://gnumeric.org}).
215 Spreadsheet files in OpenDocument format (@url{http://opendocumentformat.org}).
218 Relations from PostgreSQL databases (@url{http://postgresql.org}).
221 Textual data files in columnar and delimited formats.
224 Each supported file type has additional subcommands, explained in
225 separate sections below.
228 * GET DATA /TYPE=GNM/ODS:: Spreadsheets
229 * GET DATA /TYPE=PSQL:: Databases
230 * GET DATA /TYPE=TXT:: Delimited Text Files
233 @node GET DATA /TYPE=GNM/ODS
234 @subsection Spreadsheet Files
237 GET DATA /TYPE=@{GNM, ODS@}
238 /FILE=@{'@var{file_name}'@}
239 /SHEET=@{NAME '@var{sheet_name}', INDEX @var{n}@}
240 /CELLRANGE=@{RANGE '@var{range}', FULL@}
241 /READNAMES=@{ON, OFF@}
242 /ASSUMEDSTRWIDTH=@var{n}.
247 @cindex spreadsheet files
249 Gnumeric spreadsheets (@url{http://gnumeric.org}), and spreadsheets
250 in OpenDocument format
251 (@url{http://libreplanet.org/wiki/Group:OpenDocument/Software})
252 can be read using the @cmd{GET DATA} command.
253 Use the @subcmd{TYPE} subcommand to indicate the file's format.
254 /TYPE=GNM indicates Gnumeric files,
255 /TYPE=ODS indicates OpenDocument.
256 The @subcmd{FILE} subcommand is mandatory.
Use it to specify the name of the file to be read.
258 All other subcommands are optional.
260 The format of each variable is determined by the format of the spreadsheet
261 cell containing the first datum for the variable.
262 If this cell is of string (text) format, then the width of the variable is
263 determined from the length of the string it contains, unless the
264 @subcmd{ASSUMEDSTRWIDTH} subcommand is given.
266 The @subcmd{SHEET} subcommand specifies the sheet within the spreadsheet file to read.
267 There are two forms of the @subcmd{SHEET} subcommand.
In the first form, @subcmd{/SHEET=name @var{sheet_name}}, the string @var{sheet_name} is the
270 name of the sheet to read.
In the second form, @subcmd{/SHEET=index @var{idx}}, @var{idx} is an
integer which is the index of the sheet to read.
273 The first sheet has the index 1.
274 If the @subcmd{SHEET} subcommand is omitted, then the command will read the
275 first sheet in the file.
277 The @subcmd{CELLRANGE} subcommand specifies the range of cells within the sheet to read.
If the subcommand is given as @subcmd{/CELLRANGE=FULL}, then the entire
sheet is read.
280 To read only part of a sheet, use the form
281 @subcmd{/CELLRANGE=range '@var{top_left_cell}:@var{bottom_right_cell}'}.
282 For example, the subcommand @subcmd{/CELLRANGE=range 'C3:P19'} reads
283 columns C--P, and rows 3--19 inclusive.
284 If no @subcmd{CELLRANGE} subcommand is given, then the entire sheet is read.
286 If @subcmd{/READNAMES=ON} is specified, then the contents of cells of
287 the first row are used as the names of the variables in which to store
288 the data from subsequent rows. This is the default.
289 If @subcmd{/READNAMES=OFF} is
290 used, then the variables receive automatically assigned names.
292 The @subcmd{ASSUMEDSTRWIDTH} subcommand specifies the maximum width of string
293 variables read from the file.
294 If omitted, the default value is determined from the length of the
295 string in the first spreadsheet cell for each variable.
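
As a sketch of how the spreadsheet subcommands fit together, the
following syntax reads part of the second sheet of a hypothetical
OpenDocument file @file{sales.ods}:

@example
GET DATA /TYPE=ODS
         /FILE='sales.ods'
         /SHEET=INDEX 2
         /CELLRANGE=RANGE 'C3:P19'
         /READNAMES=ON
         /ASSUMEDSTRWIDTH=40.
@end example

With these settings the first row read supplies the variable names, and
string variables are given a width of 40 instead of a width taken from
the first datum.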
298 @node GET DATA /TYPE=PSQL
299 @subsection Postgres Database Queries
303 /CONNECT=@{@var{connection info}@}
305 [/ASSUMEDSTRWIDTH=@var{w}]
313 The PSQL type is used to import data from a postgres database server.
314 The server may be located locally or remotely.
315 Variables are automatically created based on the table column names
316 or the names specified in the SQL query.
High-precision Postgres data types lose precision when
imported into @pspp{}.
Not all Postgres data types can be represented in @pspp{}.
If a datum cannot be represented, a warning is issued and that
datum is set to SYSMIS.
323 The @subcmd{CONNECT} subcommand is mandatory.
324 It is a string specifying the parameters of the database server from
325 which the data should be fetched.
326 The format of the string is given in the postgres manual
327 @url{http://www.postgresql.org/docs/8.0/static/libpq.html#LIBPQ-CONNECT}.
329 The @subcmd{SQL} subcommand is mandatory.
330 It must be a valid SQL string to retrieve data from the database.
332 The @subcmd{ASSUMEDSTRWIDTH} subcommand specifies the maximum width of string
333 variables read from the database.
334 If omitted, the default value is determined from the length of the
335 string in the first value read for each variable.
The @subcmd{UNENCRYPTED} subcommand allows data to be retrieved over an insecure
connection.
339 If the connection is not encrypted, and the @subcmd{UNENCRYPTED} subcommand is
340 not given, then an error will occur.
341 Whether or not the connection is
342 encrypted depends upon the underlying psql library and the
343 capabilities of the database server.
345 The @subcmd{BSIZE} subcommand serves only to optimise the speed of data transfer.
It specifies an upper limit on the
number of cases to fetch from the database at once.
348 The default value is 4096.
349 If your SQL statement fetches a large number of cases but only a small number of
350 variables, then the data transfer may be faster if you increase this value.
351 Conversely, if the number of variables is large, or if the machine on which
352 @pspp{} is running has only a
353 small amount of memory, then a smaller value will be better.
356 The following syntax is an example:
         /CONNECT='host=example.com port=5432 dbname=product user=fred password=xxxx'
360 /SQL='select * from manufacturer'.
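
A further sketch shows the optional subcommands.  The connection
parameters, the column names @code{id} and @code{name}, and the batch
size are assumptions chosen for illustration, and whether a connection
without a password is accepted depends on the server's configuration:

@example
GET DATA /TYPE=PSQL
         /CONNECT='host=localhost dbname=product user=fred'
         /SQL='select id, name from manufacturer'
         /ASSUMEDSTRWIDTH=64
         /UNENCRYPTED
         /BSIZE=20000.
@end example

This query fetches many cases but few variables, which is the situation
in which a value of @subcmd{BSIZE} above the default may help.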
364 @node GET DATA /TYPE=TXT
365 @subsection Textual Data Files
369 /FILE=@{'@var{file_name}',@var{file_handle}@}
        [/ENCODING='@var{encoding}']
371 [/ARRANGEMENT=@{DELIMITED,FIXED@}]
372 [/FIRSTCASE=@{@var{first_case}@}]
374 @dots{}additional subcommands depending on ARRANGEMENT@dots{}
379 When TYPE=TXT is specified, GET DATA reads data in a delimited or
380 fixed columnar format, much like DATA LIST (@pxref{DATA LIST}).
382 The @subcmd{FILE} subcommand is mandatory. Specify the file to be read as
383 a string file name or (for textual data only) a
384 file handle (@pxref{File Handles}).
386 The @subcmd{ENCODING} subcommand specifies the character encoding of
387 the file to be read. @xref{INSERT}, for information on supported
390 The @subcmd{ARRANGEMENT} subcommand determines the file's basic format.
391 DELIMITED, the default setting, specifies that fields in the input
392 data are separated by spaces, tabs, or other user-specified
393 delimiters. FIXED specifies that fields in the input data appear at
394 particular fixed column positions within records of a case.
396 By default, cases are read from the input file starting from the first
397 line. To skip lines at the beginning of an input file, set @subcmd{FIRSTCASE}
398 to the number of the first line to read: 2 to skip the first line, 3
399 to skip the first two lines, and so on.
@subcmd{IMPORTCASE} is ignored, for compatibility.  Use @cmd{N OF
402 CASES} to limit the number of cases read from a file (@pxref{N OF
403 CASES}), or @cmd{SAMPLE} to obtain a random sample of cases
406 The remaining subcommands apply only to one of the two file
407 arrangements, described below.
410 * GET DATA /TYPE=TXT /ARRANGEMENT=DELIMITED::
411 * GET DATA /TYPE=TXT /ARRANGEMENT=FIXED::
414 @node GET DATA /TYPE=TXT /ARRANGEMENT=DELIMITED
415 @subsubsection Reading Delimited Data
419 /FILE=@{'@var{file_name}',@var{file_handle}@}
420 [/ARRANGEMENT=@{DELIMITED,FIXED@}]
421 [/FIRSTCASE=@{@var{first_case}@}]
422 [/IMPORTCASE=@{ALL,FIRST @var{max_cases},PERCENT @var{percent}@}]
424 /DELIMITERS="@var{delimiters}"
        [/QUALIFIER="@var{quotes}"]
426 [/DELCASE=@{LINE,VARIABLES @var{n_variables}@}]
427 /VARIABLES=@var{del_var1} [@var{del_var2}]@dots{}
428 where each @var{del_var} takes the form:
432 The GET DATA command with TYPE=TXT and ARRANGEMENT=DELIMITED reads
433 input data from text files in delimited format, where fields are
434 separated by a set of user-specified delimiters. Its capabilities are
435 similar to those of DATA LIST FREE (@pxref{DATA LIST FREE}), with a
438 The required @subcmd{FILE} subcommand and optional @subcmd{FIRSTCASE} and @subcmd{IMPORTCASE}
439 subcommands are described above (@pxref{GET DATA /TYPE=TXT}).
441 @subcmd{DELIMITERS}, which is required, specifies the set of characters that
442 may separate fields. Each character in the string specified on
443 @subcmd{DELIMITERS} separates one field from the next. The end of a line also
444 separates fields, regardless of @subcmd{DELIMITERS}. Two consecutive
445 delimiters in the input yield an empty field, as does a delimiter at
446 the end of a line. A space character as a delimiter is an exception:
447 consecutive spaces do not yield an empty field and neither does any
448 number of spaces at the end of a line.
450 To use a tab as a delimiter, specify @samp{\t} at the beginning of the
451 @subcmd{DELIMITERS} string. To use a backslash as a delimiter, specify
452 @samp{\\} as the first delimiter or, if a tab should also be a
453 delimiter, immediately following @samp{\t}. To read a data file in
454 which each field appears on a separate line, specify the empty string
455 for @subcmd{DELIMITERS}.
457 The optional @subcmd{QUALIFIER} subcommand names one or more characters that
458 can be used to quote values within fields in the input. A field that
459 begins with one of the specified quote characters ends at the next
460 matching quote. Intervening delimiters become part of the field,
461 instead of terminating it. The ability to specify more than one quote
462 character is a @pspp{} extension.
464 The character specified on @subcmd{QUALIFIER} can be embedded within a
465 field that it quotes by doubling the qualifier. For example, if
466 @samp{'} is specified on @subcmd{QUALIFIER}, then @code{'a''b'}
467 specifies a field that contains @samp{a'b}.
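
For instance, a tab-separated file with a header line and
double-quoted fields might be read as sketched below.  The file name
@file{contacts.tsv} and the variables are hypothetical:

@example
GET DATA /TYPE=TXT /FILE='contacts.tsv'
         /DELIMITERS='\t'
         /QUALIFIER='"'
         /FIRSTCASE=2
         /VARIABLES=name A30
                    city A20
                    age F3.0.
@end example

Because only the tab is listed on @subcmd{DELIMITERS}, spaces inside a
field do not split it.  The qualifier is then needed only for fields
that contain a tab.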
469 The @subcmd{DELCASE} subcommand controls how data may be broken across lines in
470 the data file. With LINE, the default setting, each line must contain
471 all the data for exactly one case. For additional flexibility, to
472 allow a single case to be split among lines or multiple cases to be
contained on a single line, specify VARIABLES @var{n_variables}, where
@var{n_variables} is the number of variables per case.
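
As a sketch, suppose a hypothetical file @file{stream.txt} holds
space-separated numbers, three per case, with line breaks at arbitrary
points.  It could be read as follows:

@example
GET DATA /TYPE=TXT /FILE='stream.txt'
         /DELIMITERS=' '
         /DELCASE=VARIABLES 3
         /VARIABLES=x F8.2
                    y F8.2
                    z F8.2.
@end example

Every three fields form one case, regardless of how the fields are
distributed across lines.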
476 The @subcmd{VARIABLES} subcommand is required and must be the last subcommand.
477 Specify the name of each variable and its input format (@pxref{Input
478 and Output Formats}) in the order they should be read from the input
481 @subsubheading Examples
484 On a Unix-like system, the @samp{/etc/passwd} file has a format
488 root:$1$nyeSP5gD$pDq/:0:0:,,,:/root:/bin/bash
489 blp:$1$BrP/pFg4$g7OG:1000:1000:Ben Pfaff,,,:/home/blp:/bin/bash
490 john:$1$JBuq/Fioq$g4A:1001:1001:John Darrington,,,:/home/john:/bin/bash
491 jhs:$1$D3li4hPL$88X1:1002:1002:Jason Stover,,,:/home/jhs:/bin/csh
495 The following syntax reads a file in the format used by
498 @c If you change this example, change the regression test in
499 @c tests/language/data-io/get-data.at to match.
501 GET DATA /TYPE=TXT /FILE='/etc/passwd' /DELIMITERS=':'
502 /VARIABLES=username A20
512 Consider the following data on used cars:
515 model year mileage price type age
516 Civic 2002 29883 15900 Si 2
517 Civic 2003 13415 15900 EX 1
518 Civic 1992 107000 3800 n/a 12
519 Accord 2002 26613 17900 EX 1
523 The following syntax can be used to read the used car data:
525 @c If you change this example, change the regression test in
526 @c tests/language/data-io/get-data.at to match.
528 GET DATA /TYPE=TXT /FILE='cars.data' /DELIMITERS=' ' /FIRSTCASE=2
538 Consider the following information on animals in a pet store:
541 'Pet''s Name', "Age", "Color", "Date Received", "Price", "Height", "Type"
542 , (Years), , , (Dollars), ,
543 "Rover", 4.5, Brown, "12 Feb 2004", 80, '1''4"', "Dog"
544 "Charlie", , Gold, "5 Apr 2007", 12.3, "3""", "Fish"
545 "Molly", 2, Black, "12 Dec 2006", 25, '5"', "Cat"
546 "Gilly", , White, "10 Apr 2007", 10, "3""", "Guinea Pig"
550 The following syntax can be used to read the pet store data:
552 @c If you change this example, change the regression test in
553 @c tests/language/data-io/get-data.at to match.
555 GET DATA /TYPE=TXT /FILE='pets.data' /DELIMITERS=', ' /QUALIFIER='''"' /ESCAPE
566 @node GET DATA /TYPE=TXT /ARRANGEMENT=FIXED
567 @subsubsection Reading Fixed Columnar Data
569 @c (modify-syntax-entry ?_ "w")
570 @c (modify-syntax-entry ?' "'")
571 @c (modify-syntax-entry ?@ "'")
        /FILE=@{'@var{file_name}',@var{file_handle}@}
576 [/ARRANGEMENT=@{DELIMITED,FIXED@}]
577 [/FIRSTCASE=@{@var{first_case}@}]
578 [/IMPORTCASE=@{ALL,FIRST @var{max_cases},PERCENT @var{percent}@}]
581 /VARIABLES @var{fixed_var} [@var{fixed_var}]@dots{}
582 [/rec# @var{fixed_var} [@var{fixed_var}]@dots{}]@dots{}
583 where each @var{fixed_var} takes the form:
584 @var{variable} @var{start}-@var{end} @var{format}
587 The @cmd{GET DATA} command with TYPE=TXT and ARRANGEMENT=FIXED reads input
588 data from text files in fixed format, where each field is located in
589 particular fixed column positions within records of a case. Its
590 capabilities are similar to those of DATA LIST FIXED (@pxref{DATA LIST
591 FIXED}), with a few enhancements.
593 The required @subcmd{FILE} subcommand and optional @subcmd{FIRSTCASE} and @subcmd{IMPORTCASE}
594 subcommands are described above (@pxref{GET DATA /TYPE=TXT}).
596 The optional @subcmd{FIXCASE} subcommand may be used to specify the positive
597 integer number of input lines that make up each case. The default
600 The @subcmd{VARIABLES} subcommand, which is required, specifies the positions
601 at which each variable can be found. For each variable, specify its
602 name, followed by its start and end column separated by @samp{-}
603 (e.g.@: @samp{0-9}), followed by an input format type (e.g.@:
604 @samp{F}) or a full format specification (e.g.@: @samp{DOLLAR12.2}).
605 For this command, columns are numbered starting from 0 at
606 the left column. Introduce the variables in the second and later
607 lines of a case by a slash followed by the number of the line within
608 the case, e.g.@: @samp{/2} for the second line.
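
The following sketch illustrates a case that spans two input lines.
The file name @file{survey.txt}, the variable names, and the column
positions are hypothetical:

@example
GET DATA /TYPE=TXT /FILE='survey.txt' /ARRANGEMENT=FIXED
         /FIXCASE=2
         /VARIABLES=id 0-3 F
                    name 4-23 A
                    /2 score 0-4 F
                       paid 5-16 DOLLAR12.2.
@end example

Here @code{id} and @code{name} come from the first line of each case,
and @code{score} and @code{paid} from the second.  Columns 0--3 are the
first four characters of a line, since columns are numbered from 0.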
610 @subsubheading Examples
613 Consider the following data on used cars:
616 model year mileage price type age
617 Civic 2002 29883 15900 Si 2
618 Civic 2003 13415 15900 EX 1
619 Civic 1992 107000 3800 n/a 12
620 Accord 2002 26613 17900 EX 1
624 The following syntax can be used to read the used car data:
626 @c If you change this example, change the regression test in
627 @c tests/language/data-io/get-data.at to match.
629 GET DATA /TYPE=TXT /FILE='cars.data' /ARRANGEMENT=FIXED /FIRSTCASE=2
630 /VARIABLES=model 0-7 A
644 /FILE='@var{file_name}'
648 /RENAME=(@var{src_names}=@var{target_names})@dots{}
651 The @cmd{IMPORT} transformation clears the active dataset dictionary and
653 replaces them with a dictionary and data from a system file or
656 The @subcmd{FILE} subcommand, which is the only required subcommand, specifies
657 the portable file to be read as a file name string or a file handle
658 (@pxref{File Handles}).
660 The @subcmd{TYPE} subcommand is currently not used.
662 @subcmd{DROP}, @subcmd{KEEP}, and @subcmd{RENAME} follow the syntax used by @cmd{GET} (@pxref{GET}).
@cmd{IMPORT} does not cause the data to be read, only the dictionary.  The
665 data is read later, when a procedure is executed.
667 Use of @cmd{IMPORT} to read a system file is a @pspp{} extension.
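
A minimal sketch, assuming a hypothetical portable file
@file{results.por} that contains variables @code{tmp1} and @code{tmp2}:

@example
IMPORT /FILE='results.por' /DROP=tmp1 tmp2.
LIST.
@end example

@cmd{IMPORT} itself reads only the dictionary.  The data are read when
@cmd{LIST}, a procedure, is executed.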
675 /OUTFILE=@{'@var{file_name}',@var{file_handle}@}
676 /UNSELECTED=@{RETAIN,DELETE@}
677 /@{UNCOMPRESSED,COMPRESSED,ZCOMPRESSED@}
678 /PERMISSIONS=@{WRITEABLE,READONLY@}
681 /VERSION=@var{version}
682 /RENAME=(@var{src_names}=@var{target_names})@dots{}
The @cmd{SAVE} procedure causes the dictionary and data in the active
dataset to be written to a system file.
@subcmd{OUTFILE} is the only required subcommand.  Specify the system file
692 to be written as a string file name or a file handle
693 (@pxref{File Handles}).
695 By default, cases excluded with FILTER are written to the system file.
696 These can be excluded by specifying @subcmd{DELETE} on the @subcmd{UNSELECTED}
697 subcommand. Specifying @subcmd{RETAIN} makes the default explicit.
The @subcmd{UNCOMPRESSED}, @subcmd{COMPRESSED}, and
@subcmd{ZCOMPRESSED} subcommands determine the system file's
705 Data is not compressed. Each numeric value uses 8 bytes of disk
706 space. Each string value uses one byte per column width, rounded up
707 to a multiple of 8 bytes.
710 Data is compressed with a simple algorithm. Each integer numeric
711 value between @minus{}99 and 151, inclusive, or system missing value
712 uses one byte of disk space. Each 8-byte segment of a string that
713 consists only of spaces uses 1 byte. Any other numeric value or
714 8-byte string segment uses 9 bytes of disk space.
717 Data is compressed with the ``deflate'' compression algorithm
718 specified in RFC@tie{}1951 (the same algorithm used by
719 @command{gzip}). Files written with this compression level cannot be
720 read by PSPP 0.8.1 or earlier or by SPSS 20 or earlier.
723 @subcmd{COMPRESSED} is the default compression level. The SET command
724 (@pxref{SET}) can change this default.
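
As a rough worked example of the per-value costs stated above (ignoring
file headers and any padding that the file format may add), consider a
hypothetical case with three numeric values, 5, 1234.5, and the
system-missing value, plus one 16-byte string variable that holds
@samp{ab} followed by spaces:

@example
UNCOMPRESSED:  8 + 8 + 8 + 16        = 40 bytes
COMPRESSED:    1 + 9 + 1 + (9 + 1)   = 21 bytes
@end example

Under @subcmd{COMPRESSED}, the value 5 and the system-missing value use
1 byte each, 1234.5 uses 9 bytes, the first 8-byte string segment
(which contains @samp{ab}) uses 9 bytes, and the second segment, which
is all spaces, uses 1 byte.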
726 The @subcmd{PERMISSIONS} subcommand specifies permissions for the new system
727 file. WRITEABLE, the default, creates the file with read and write
728 permission. READONLY creates the file for read-only access.
730 By default, all the variables in the active dataset dictionary are written
731 to the system file. The @subcmd{DROP} subcommand can be used to specify a list
of variables not to be written.  In contrast, @subcmd{KEEP} specifies variables
to be written; all other variables are omitted.
735 Normally variables are saved to a system file under the same names they
736 have in the active dataset. Use the @subcmd{RENAME} subcommand to change these names.
737 Specify, within parentheses, a list of variable names followed by an
738 equals sign (@samp{=}) and the names that they should be renamed to.
739 Multiple parenthesized groups of variable names can be included on a
740 single @subcmd{RENAME} subcommand. Variables' names may be swapped using a
741 @subcmd{RENAME} subcommand of the
742 form @subcmd{/RENAME=(@var{A} @var{B}=@var{B} @var{A})}.
744 Alternate syntax for the @subcmd{RENAME} subcommand allows the parentheses to be
745 eliminated. When this is done, only a single variable may be renamed at
746 once. For instance, @subcmd{/RENAME=@var{A}=@var{B}}. This alternate syntax is
@subcmd{DROP}, @subcmd{KEEP}, and @subcmd{RENAME} are performed in
left-to-right order.  Each may be present any number of times.
@cmd{SAVE} never modifies
752 the active dataset. @subcmd{DROP}, @subcmd{KEEP}, and @subcmd{RENAME} only
753 affect the system file written to disk.
755 The @subcmd{VERSION} subcommand specifies the version of the file format. Valid
756 versions are 2 and 3. The default version is 3. In version 2 system
757 files, variable names longer than 8 bytes will be truncated. The two
758 versions are otherwise identical.
760 The @subcmd{NAMES} and @subcmd{MAP} subcommands are currently ignored.
762 @cmd{SAVE} causes the data to be read. It is a procedure.
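
The following sketch combines several of the subcommands described
above.  The output file name and the variable names are hypothetical:

@example
SAVE /OUTFILE='clean.sav'
     /ZCOMPRESSED
     /PERMISSIONS=READONLY
     /DROP=scratch1 scratch2
     /RENAME=(q1 q2=age income).
@end example

The resulting file omits the two scratch variables and stores
@code{q1} and @code{q2} under new names, while the active dataset keeps
its original variables and names.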
765 @section SAVE TRANSLATE
766 @vindex SAVE TRANSLATE
770 /OUTFILE=@{'@var{file_name}',@var{file_handle}@}
773 [/MISSING=@{IGNORE,RECODE@}]
775 [/DROP=@var{var_list}]
776 [/KEEP=@var{var_list}]
777 [/RENAME=(@var{src_names}=@var{target_names})@dots{}]
778 [/UNSELECTED=@{RETAIN,DELETE@}]
781 @dots{}additional subcommands depending on TYPE@dots{}
784 The @cmd{SAVE TRANSLATE} command is used to save data into various
785 formats understood by other applications.
787 The @subcmd{OUTFILE} and @subcmd{TYPE} subcommands are mandatory.
788 @subcmd{OUTFILE} specifies the file to be written, as a string file name or a file handle
(@pxref{File Handles}).  @subcmd{TYPE} determines the type of file to
write.  It must be one of the following:
Comma-separated value format.
797 Tab-delimited format.
800 By default, @cmd{SAVE TRANSLATE} will not overwrite an existing file. Use
801 @subcmd{REPLACE} to force an existing file to be overwritten.
With MISSING=IGNORE, the default, @cmd{SAVE TRANSLATE} treats user-missing
804 values as if they were not missing. Specify MISSING=RECODE to output
805 numeric user-missing values like system-missing values and string
806 user-missing values as all spaces.
By default, all the variables in the active dataset dictionary are saved
to the output file, but @subcmd{DROP} or @subcmd{KEEP} can select a subset of variables
to save.  The @subcmd{RENAME} subcommand can also be used to change the names
811 under which variables are saved. @subcmd{UNSELECTED} determines whether cases
812 filtered out by the @cmd{FILTER} command are written to the output file.
813 These subcommands have the same syntax and meaning as on the
814 @cmd{SAVE} command (@pxref{SAVE}).
816 Each supported file type has additional subcommands, explained in
817 separate sections below.
819 @cmd{SAVE TRANSLATE} causes the data to be read. It is a procedure.
822 * SAVE TRANSLATE /TYPE=CSV and TYPE=TAB::
825 @node SAVE TRANSLATE /TYPE=CSV and TYPE=TAB
826 @subsection Writing Comma- and Tab-Separated Data Files
830 /OUTFILE=@{'@var{file_name}',@var{file_handle}@}
833 [/MISSING=@{IGNORE,RECODE@}]
835 [/DROP=@var{var_list}]
836 [/KEEP=@var{var_list}]
837 [/RENAME=(@var{src_names}=@var{target_names})@dots{}]
838 [/UNSELECTED=@{RETAIN,DELETE@}]
841 [/CELLS=@{VALUES,LABELS@}]
842 [/TEXTOPTIONS DELIMITER='@var{delimiter}']
843 [/TEXTOPTIONS QUALIFIER='@var{qualifier}']
844 [/TEXTOPTIONS DECIMAL=@{DOT,COMMA@}]
845 [/TEXTOPTIONS FORMAT=@{PLAIN,VARIABLE@}]
848 The SAVE TRANSLATE command with TYPE=CSV or TYPE=TAB writes data in a
849 comma- or tab-separated value format similar to that described by
850 RFC@tie{}4180. Each variable becomes one output column, and each case
851 becomes one line of output. If FIELDNAMES is specified, an additional
852 line at the top of the output file lists variable names.
854 The CELLS and TEXTOPTIONS FORMAT settings determine how values are
855 written to the output file:
858 @item CELLS=VALUES FORMAT=PLAIN (the default settings)
859 Writes variables to the output in ``plain'' formats that ignore the
860 details of variable formats. Numeric values are written as plain
861 decimal numbers with enough digits to indicate their exact values in
862 machine representation. Numeric values include @samp{e} followed by
863 an exponent if the exponent value would be less than -4 or greater
864 than 16. Dates are written in MM/DD/YYYY format and times in HH:MM:SS
865 format. WKDAY and MONTH values are written as decimal numbers.
867 Numeric values use, by default, the decimal point character set with
868 SET DECIMAL (@pxref{SET DECIMAL}). Use DECIMAL=DOT or DECIMAL=COMMA
869 to force a particular decimal point character.
871 @item CELLS=VALUES FORMAT=VARIABLE
872 Writes variables using their print formats. Leading and trailing
873 spaces are removed from numeric values, and trailing spaces are
874 removed from string values.
@item CELLS=LABELS FORMAT=PLAIN
@itemx CELLS=LABELS FORMAT=VARIABLE
878 Writes value labels where they exist, and otherwise writes the values
879 themselves as described above.
882 Regardless of CELLS and TEXTOPTIONS FORMAT, numeric system-missing
883 values are output as a single space.
885 For TYPE=TAB, tab characters delimit values. For TYPE=CSV, the
886 TEXTOPTIONS DELIMITER and DECIMAL settings determine the character
that separates values within a line.  If DELIMITER is specified, then
the specified string separates values.  If DELIMITER is not specified,
889 then the default is a comma with DECIMAL=DOT or a semicolon with
890 DECIMAL=COMMA. If DECIMAL is not given either, it is implied by the
891 decimal point character set with SET DECIMAL (@pxref{SET DECIMAL}).
893 The TEXTOPTIONS QUALIFIER setting specifies a character that is output
894 before and after a value that contains the delimiter character or the
895 qualifier character. The default is a double quote (@samp{"}). A
896 qualifier character that appears within a value is doubled.
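
As a sketch of these settings in combination, the following syntax
writes a hypothetical file @file{report.csv} with a header line, value
labels in place of labeled values, and a comma as the decimal point:

@example
SAVE TRANSLATE /OUTFILE='report.csv' /TYPE=CSV
               /REPLACE
               /FIELDNAMES
               /CELLS=LABELS
               /TEXTOPTIONS DECIMAL=COMMA.
@end example

Because DELIMITER is not given and DECIMAL=COMMA is, values within a
line are separated by semicolons, as described above.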
899 @section SYSFILE INFO
903 SYSFILE INFO FILE='@var{file_name}' [ENCODING='@var{encoding}'].
906 @cmd{SYSFILE INFO} reads the dictionary in an SPSS system file,
907 SPSS/PC+ system file, or SPSS portable file, and displays the
908 information in its dictionary.
910 Specify a file name or file handle. @cmd{SYSFILE INFO} reads that
911 file and displays information on its dictionary.
913 @pspp{} automatically detects the encoding of string data in the file,
914 when possible. The character encoding of old SPSS system files cannot
915 always be guessed correctly, and SPSS/PC+ system files do not include
916 any indication of their encoding. Specify the @subcmd{ENCODING}
917 subcommand with an @acronym{IANA} character set name as its string
918 argument to override the default, or specify @code{ENCODING='DETECT'}
919 to analyze and report possibly valid encodings for the system file.
920 The @subcmd{ENCODING} subcommand is a @pspp{} extension.
922 @cmd{SYSFILE INFO} does not affect the current active dataset.
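
For example, with a hypothetical file @file{legacy.sav} whose encoding
is uncertain, one might first ask for candidate encodings and then
display the dictionary using one of them.  The encoding name in the
second command is only an assumption for illustration:

@example
SYSFILE INFO FILE='legacy.sav' ENCODING='DETECT'.
SYSFILE INFO FILE='legacy.sav' ENCODING='windows-1252'.
@end example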
930 /OUTFILE='@var{file_name}'
934 /RENAME=(@var{src_names}=@var{target_names})@dots{}
The @cmd{XEXPORT} transformation writes the active dataset dictionary and
940 data to a specified portable file.
942 This transformation is a @pspp{} extension.
944 It is similar to the @cmd{EXPORT} procedure, with two differences:
948 @cmd{XEXPORT} is a transformation, not a procedure. It is executed when
949 the data is read by a procedure or procedure-like command.
952 @cmd{XEXPORT} does not support the @subcmd{UNSELECTED} subcommand.
955 @xref{EXPORT}, for more information.
963 /OUTFILE='@var{file_name}'
964 /@{UNCOMPRESSED,COMPRESSED,ZCOMPRESSED@}
965 /PERMISSIONS=@{WRITEABLE,READONLY@}
968 /VERSION=@var{version}
969 /RENAME=(@var{src_names}=@var{target_names})@dots{}
974 The @cmd{XSAVE} transformation writes the active dataset's dictionary and
975 data to a system file. It is similar to the @cmd{SAVE}
976 procedure, with two differences:
980 @cmd{XSAVE} is a transformation, not a procedure. It is executed when
981 the data is read by a procedure or procedure-like command.
984 @cmd{XSAVE} does not support the @subcmd{UNSELECTED} subcommand.
987 @xref{SAVE}, for more information.
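
The sketch below illustrates the difference in practice.  The file and
variable names are hypothetical:

@example
GET /FILE='all.sav'.
XSAVE /OUTFILE='snapshot.sav' /KEEP=id score.
COMPUTE score2 = score ** 2.
FREQUENCIES /VARIABLES=score2.
@end example

@cmd{XSAVE} does not read the data itself.  The file
@file{snapshot.sav} is written when @cmd{FREQUENCIES}, a procedure,
causes the data to be read, so the file is produced without a separate
pass over the data.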