X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?a=blobdiff_plain;ds=sidebyside;f=doc%2Ffiles.texi;h=4085e386af34d51f17dfa4d1ec228b6f7b0eb637;hb=b099405139cbc42700d5dd62a7741765785410f0;hp=2e03fc9dd334e0a5b9668fc8b2cfbffade1a5ea2;hpb=9d160f03d77979c6e3fbc177a19b7e41438e5c00;p=pspp

diff --git a/doc/files.texi b/doc/files.texi
index 2e03fc9dd3..4085e386af 100644
--- a/doc/files.texi
+++ b/doc/files.texi
@@ -1,3 +1,12 @@
+@c PSPP - a program for statistical analysis.
+@c Copyright (C) 2017 Free Software Foundation, Inc.
+@c Permission is granted to copy, distribute and/or modify this document
+@c under the terms of the GNU Free Documentation License, Version 1.3
+@c or any later version published by the Free Software Foundation;
+@c with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
+@c A copy of the license is included in the section entitled "GNU
+@c Free Documentation License".
+@c
 @node System and Portable File IO
 @chapter System and Portable File I/O
 
@@ -11,6 +20,7 @@ portable files.
 * GET DATA:: Read from foreign files.
 * IMPORT:: Read from a portable file.
 * SAVE:: Write to a system file.
+* SAVE DATA COLLECTION:: Write to a system file and metadata file.
 * SAVE TRANSLATE:: Write data in foreign file formats.
 * SYSFILE INFO:: Display system file dictionary.
 * XEXPORT:: Write to a portable file, as a transformation.
@@ -145,10 +155,9 @@ GET
 @cmd{GET} clears the current dictionary and active dataset and replaces them with the dictionary and data from a specified file.
 
-The @subcmd{FILE} subcommand is the only required subcommand.
-Specify the system
-file or portable file to be read as a string file name or
-a file handle (@pxref{File Handles}).
+The @subcmd{FILE} subcommand is the only required subcommand. Specify
+the SPSS system file, SPSS/PC+ system file, or SPSS portable file to
+be read as a string file name or a file handle (@pxref{File Handles}).
 
 By default, all the variables in a file are read.
 The DROP subcommand can be used to specify a list of variables that are not to be
@@ -175,10 +184,11 @@ Each may be present any number of times.
 @cmd{GET} never modifies a file on disk. Only the active dataset read from the file is affected by these subcommands.
 
-@pspp{} tries to automatically detect the encoding of string data in the
-file. Sometimes, however, this does not work well,
-especially for files written by old versions of SPSS or @pspp{}. Specify
-the @subcmd{ENCODING} subcommand with an @acronym{IANA} character set name as its string
+@pspp{} automatically detects the encoding of string data in the file,
+when possible. The character encoding of old SPSS system files cannot
+always be guessed correctly, and SPSS/PC+ system files do not include
+any indication of their encoding. Specify the @subcmd{ENCODING}
+subcommand with an @acronym{IANA} character set name as its string
 argument to override the default. Use @cmd{SYSFILE INFO} to analyze the encodings that might be valid for a system file. The @subcmd{ENCODING} subcommand is a @pspp{} extension.
@@ -370,7 +380,7 @@ GET DATA /TYPE=TXT
  [ENCODING='@var{encoding}']
  [/ARRANGEMENT=@{DELIMITED,FIXED@}]
  [/FIRSTCASE=@{@var{first_case}@}]
- [/IMPORTCASE=@{ALL,FIRST @var{max_cases},PERCENT @var{percent}@}]
+ [/IMPORTCASES=...]
 @dots{}additional subcommands depending on ARRANGEMENT@dots{}
 @end display
 
@@ -398,19 +408,13 @@ line. To skip lines at the beginning of an input file, set
 @subcmd{FIRSTCASE} to the number of the first line to read: 2 to skip the first line, 3 to skip the first two lines, and so on.
 
-@subcmd{IMPORTCASE} can be used to limit the number of cases read from the
-input file. With the default setting, ALL, all cases in the file are
-read. Specify FIRST @var{max_cases} to read at most @var{max_cases} cases
-from the file. Use @subcmd{PERCENT @var{percent}} to read only @var{percent}
-percent, approximately, of the cases contained in the file.
(The
-percentage is approximate, because there is no way to accurately count
-the number of cases in the file without reading the entire file. The
-number of cases in some kinds of unusual files cannot be estimated;
-@pspp{} will read all cases in such files.)
+@subcmd{IMPORTCASES} is ignored, for compatibility. Use @cmd{N OF
+CASES} to limit the number of cases read from a file (@pxref{N OF
+CASES}), or @cmd{SAMPLE} to obtain a random sample of cases
+(@pxref{SAMPLE}).
 
-@subcmd{FIRSTCASE} and @subcmd{IMPORTCASE} may be used with delimited and fixed-format
-data. The remaining subcommands, which apply only to one of the two file
-arrangements, are described below.
+The remaining subcommands apply only to one of the two file
+arrangements, described below.
 
 @menu
 * GET DATA /TYPE=TXT /ARRANGEMENT=DELIMITED::
@@ -428,7 +432,7 @@ GET DATA /TYPE=TXT
  [/IMPORTCASE=@{ALL,FIRST @var{max_cases},PERCENT @var{percent}@}]
 
  /DELIMITERS="@var{delimiters}"
- [/QUALIFIER="@var{quotes}" [/ESCAPE]]
+ [/QUALIFIER="@var{quotes}"
  [/DELCASE=@{LINE,VARIABLES @var{n_variables}@}]
  /VARIABLES=@var{del_var1} [@var{del_var2}]@dots{}
 where each @var{del_var} takes the form:
@@ -467,15 +471,10 @@ matching quote. Intervening delimiters become part of the field,
 instead of terminating it. The ability to specify more than one quote character is a @pspp{} extension.
 
-By default, a character specified on @subcmd{QUALIFIER} cannot itself be
-embedded within a field that it quotes, because the quote character
-always terminates the quoted field. With ESCAPE, however, a doubled
-quote character within a quoted field inserts a single instance of the
-quote into the field. For example, if @samp{'} is specified on
-@subcmd{QUALIFIER}, then without ESCAPE @code{'a''b'} specifies a pair of
-fields that contain @samp{a} and @samp{b}, but with ESCAPE it
-specifies a single field that contains @samp{a'b}. ESCAPE is a @pspp{}
-extension.
+The character specified on @subcmd{QUALIFIER} can be embedded within a
+field that it quotes by doubling the qualifier. For example, if
+@samp{'} is specified on @subcmd{QUALIFIER}, then @code{'a''b'}
+specifies a field that contains @samp{a'b}.
 
 The @subcmd{DELCASE} subcommand controls how data may be broken across lines in the data file. With LINE, the default setting, each line must contain
@@ -772,6 +771,45 @@ The @subcmd{NAMES} and @subcmd{MAP} subcommands are currently ignored.
 
 @cmd{SAVE} causes the data to be read. It is a procedure.
 
+@node SAVE DATA COLLECTION
+@section SAVE DATA COLLECTION
+@vindex SAVE DATA COLLECTION
+
+@display
+SAVE DATA COLLECTION
+ /OUTFILE=@{'@var{file_name}',@var{file_handle}@}
+ /METADATA=@{'@var{file_name}',@var{file_handle}@}
+ /@{UNCOMPRESSED,COMPRESSED,ZCOMPRESSED@}
+ /PERMISSIONS=@{WRITEABLE,READONLY@}
+ /DROP=@var{var_list}
+ /KEEP=@var{var_list}
+ /VERSION=@var{version}
+ /RENAME=(@var{src_names}=@var{target_names})@dots{}
+ /NAMES
+ /MAP
+@end display
+
+Like @cmd{SAVE}, @cmd{SAVE DATA COLLECTION} writes the dictionary and
+data in the active dataset to a system file. In addition, it writes
+metadata to an additional XML metadata file.
+
+OUTFILE is required. Specify the system file to be written as a
+string file name or a file handle (@pxref{File Handles}).
+
+METADATA is also required. Specify the metadata file to be written as
+a string file name or a file handle. Metadata files customarily use a
+@file{.mdd} extension.
+
+The current implementation of this command is experimental. It only
+outputs an approximation of the metadata file format. Please report
+bugs.
+
+Other subcommands are optional. They have the same meanings as in the
+@cmd{SAVE} command.
+
+@cmd{SAVE DATA COLLECTION} causes the data to be read. It is a
+procedure.
+
 @node SAVE TRANSLATE
 @section SAVE TRANSLATE
 @vindex SAVE TRANSLATE
@@ -816,13 +854,15 @@ values as if they were not missing.
 Specify MISSING=RECODE to output numeric user-missing values like system-missing values and string user-missing values as all spaces.
 
-By default, all the variables in the active dataset dictionary are saved
-to the system file, but @subcmd{DROP} or @subcmd{KEEP} can select a subset of variable
-to save. The @subcmd{RENAME} subcommand can also be used to change the names
-under which variables are saved. @subcmd{UNSELECTED} determines whether cases
-filtered out by the @cmd{FILTER} command are written to the output file.
-These subcommands have the same syntax and meaning as on the
-@cmd{SAVE} command (@pxref{SAVE}).
+By default, all the variables in the active dataset dictionary are
+saved to the system file, but @subcmd{DROP} or @subcmd{KEEP} can
+select a subset of variable to save. The @subcmd{RENAME} subcommand
+can also be used to change the names under which variables are saved;
+because they are used only in the output, these names do not have to
+conform to the usual PSPP variable naming rules. @subcmd{UNSELECTED}
+determines whether cases filtered out by the @cmd{FILTER} command are
+written to the output file. These subcommands have the same syntax
+and meaning as on the @cmd{SAVE} command (@pxref{SAVE}).
 
 Each supported file type has additional subcommands, explained in separate sections below.
@@ -914,20 +954,21 @@ qualifier character that appears within a value is doubled.
 SYSFILE INFO FILE='@var{file_name}' [ENCODING='@var{encoding}'].
 @end display
 
-@cmd{SYSFILE INFO} reads the dictionary in a system file and
-displays the information in its dictionary.
+@cmd{SYSFILE INFO} reads the dictionary in an SPSS system file,
+SPSS/PC+ system file, or SPSS portable file, and displays the
+information in its dictionary.
 
-Specify a file name or file handle. @cmd{SYSFILE INFO} reads that file as
-a system file and displays information on its dictionary.
+Specify a file name or file handle.
@cmd{SYSFILE INFO} reads that
+file and displays information on its dictionary.
 
-@pspp{} tries to automatically detect the encoding of string data in
-the file. Sometimes, however, this does not work well, especially for
-files written by old versions of SPSS or @pspp{}. Specify the
-@subcmd{ENCODING} subcommand with an @acronym{IANA} character set name
-as its string argument to override the default, or specify
-@code{ENCODING='DETECT'} to analyze and report possibly valid
-encodings for the system file. The @subcmd{ENCODING} subcommand is a
-@pspp{} extension.
+@pspp{} automatically detects the encoding of string data in the file,
+when possible. The character encoding of old SPSS system files cannot
+always be guessed correctly, and SPSS/PC+ system files do not include
+any indication of their encoding. Specify the @subcmd{ENCODING}
+subcommand with an @acronym{IANA} character set name as its string
+argument to override the default, or specify @code{ENCODING='DETECT'}
+to analyze and report possibly valid encodings for the system file.
+The @subcmd{ENCODING} subcommand is a @pspp{} extension.
 
 @cmd{SYSFILE INFO} does not affect the current active dataset.
 
@@ -946,7 +987,7 @@ XEXPORT
  /MAP
 @end display
 
-The @cmd{EXPORT} transformation writes the active dataset dictionary and
+The @cmd{XEXPORT} transformation writes the active dataset dictionary and
 data to a specified portable file.
 
 This transformation is a @pspp{} extension.