From: Ben Pfaff Date: Thu, 22 Apr 2010 04:11:48 +0000 (-0700) Subject: Document system file record type 7, subtype 16 as 64-bit number of cases. X-Git-Tag: v0.7.5~51 X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=b8f75b2ac6bc701ecacaa248d630918d7a7346e2;p=pspp-builds.git Document system file record type 7, subtype 16 as 64-bit number of cases. I found this record type and subtype in all .sav files written by SPSS 14 and later (and in one file written by SPSS 13). After scratching my head a bit I realized that it always contained the same value as the "ncases" field in the top-level file header, but written as a 64-bit number. So presumably the purpose is to allow for a 64-bit count of cases. --- diff --git a/doc/dev/system-file-format.texi b/doc/dev/system-file-format.texi index c1d1e421..d1d00873 100644 --- a/doc/dev/system-file-format.texi +++ b/doc/dev/system-file-format.texi @@ -5,8 +5,10 @@ A system file encapsulates a set of cases and dictionary information that describes how they may be interpreted. This chapter describes the format of a system file. -System files use three data types: 8-bit characters, 32-bit integers, -and 64-bit floating points, called here @code{char}, @code{int32}, and +System files use four data types: 8-bit characters, 32-bit integers, +64-bit integers, +and 64-bit floating points, called here @code{char}, @code{int32}, +@code{int64}, and @code{flt64}, respectively. Data is not necessarily aligned on a word or double-word boundary: the long variable name record (@pxref{Long Variable Names Record}) and very long string records (@pxref{Very Long @@ -58,24 +60,7 @@ if present. Document record, if present. @item -Any of the following records, if present, in any order: - -@itemize @minus -@item -Machine integer info record. - -@item -Machine floating-point info record. - -@item -Variable display parameter record. - -@item -Long variable names record. - -@item -Miscellaneous informational records. 
-@end itemize +Any records not explicitly included in this list, in any order. @item Dictionary termination record. @@ -99,6 +84,7 @@ Each type of record is described separately below. * Character Encoding Record:: * Long String Value Labels Record:: * Data File and Variable Attributes Records:: +* Extended Number of Cases Record:: * Miscellaneous Informational Records:: * Dictionary Termination Record:: * Data Record:: @@ -166,7 +152,7 @@ In the general case it is not possible to determine the number of cases that will be output to a system file at the time that the header is written. The way that this is dealt with is by writing the entire system file, including the header, then seeking back to the beginning of -the file and writing just the @code{ncases} field. For `files' in which +the file and writing just the @code{ncases} field. For files in which this is not valid, the seek operation fails. In this case, @code{ncases} remains -1. @@ -1000,6 +986,45 @@ will contain a variable attribute record with the following contents: 00000030 0a 29 |.) | @end example +@node Extended Number of Cases Record +@section Extended Number of Cases Record + +The file header record expresses the number of cases in the system +file as an int32 (@pxref{File Header Record}). This record allows the +number of cases in the system file to be expressed as a 64-bit number. + +@example +int32 rec_type; +int32 subtype; +int32 size; +int32 count; +int64 unknown; +int64 ncases64; +@end example + +@table @code +@item int32 rec_type; +Record type. Always set to 7. + +@item int32 subtype; +Record subtype. Always set to 16. + +@item int32 size; +Size of each element. Always set to 8. + +@item int32 count; +Number of pieces of data in the data part. Always set to 2. + +@item int64 unknown; +Meaning unknown. Always set to 1. + +@item int64 ncases64; +Number of cases in the file as a 64-bit integer. 
Presumably this +could be -1 to indicate that the number of cases is unknown, for the +same reason as @code{ncases} in the file header record, but this has +not been observed in the wild. +@end table + @node Miscellaneous Informational Records @section Miscellaneous Informational Records diff --git a/src/data/sys-file-reader.c b/src/data/sys-file-reader.c index f63a122f..d9d26d0a 100644 --- a/src/data/sys-file-reader.c +++ b/src/data/sys-file-reader.c @@ -1,5 +1,5 @@ /* PSPP - a program for statistical analysis. - Copyright (C) 1997-9, 2000, 2006, 2007, 2009 Free Software Foundation, Inc. + Copyright (C) 1997-9, 2000, 2006, 2007, 2009, 2010 Free Software Foundation, Inc. This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by @@ -845,7 +845,7 @@ read_extension_record (struct sfm_reader *r, struct dictionary *dict, return; case 16: - /* New in SPSS v14? Unknown purpose. */ + /* Extended number of cases. Not important. */ break; case 17: diff --git a/tests/dissect-sysfile.c b/tests/dissect-sysfile.c index 8bab745f..2bfe298d 100644 --- a/tests/dissect-sysfile.c +++ b/tests/dissect-sysfile.c @@ -1,5 +1,5 @@ /* PSPP - a program for statistical analysis. - Copyright (C) 2007, 2008, 2009 Free Software Foundation, Inc. + Copyright (C) 2007, 2008, 2009, 2010 Free Software Foundation, Inc. 
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by @@ -74,6 +74,7 @@ static void read_datafile_attributes (struct sfm_reader *r, size_t size, size_t count); static void read_variable_attributes (struct sfm_reader *r, size_t size, size_t count); +static void read_ncases64 (struct sfm_reader *, size_t size, size_t count); static void read_character_encoding (struct sfm_reader *r, size_t size, size_t count); static void read_long_string_value_labels (struct sfm_reader *r, @@ -97,6 +98,7 @@ static void sys_error (struct sfm_reader *, const char *, ...) static void read_bytes (struct sfm_reader *, void *, size_t); static int read_int (struct sfm_reader *); +static int64_t read_int64 (struct sfm_reader *); static double read_float (struct sfm_reader *); static void read_string (struct sfm_reader *, char *, size_t); static void skip_bytes (struct sfm_reader *, size_t); @@ -535,8 +537,8 @@ read_extension_record (struct sfm_reader *r) return; case 16: - /* New in SPSS v14? Unknown purpose. */ - break; + read_ncases64 (r, size, count); + return; case 17: read_datafile_attributes (r, size, count); @@ -750,6 +752,31 @@ read_attributes (struct sfm_reader *r, struct text_record *text, } } +/* Read extended number of cases record. 
*/ +static void +read_ncases64 (struct sfm_reader *r, size_t size, size_t count) +{ + int64_t unknown, ncases64; + + if (size != 8) + { + sys_warn (r, _("Bad size %zu for extended number of cases."), size); + skip_bytes (r, size * count); + return; + } + if (count != 2) + { + sys_warn (r, _("Bad count %zu for extended number of cases."), count); + skip_bytes (r, size * count); + return; + } + unknown = read_int64 (r); + ncases64 = read_int64 (r); + printf ("%08lx: extended number of cases: " + "unknown=%"PRId64", ncases64=%"PRId64"\n", + ftell (r->file), unknown, ncases64); +} + static void read_datafile_attributes (struct sfm_reader *r, size_t size, size_t count) { @@ -1087,6 +1114,16 @@ read_int (struct sfm_reader *r) return integer_get (r->integer_format, integer, sizeof integer); } +/* Reads a 64-bit signed integer from R and returns its value in + host format. */ +static int64_t +read_int64 (struct sfm_reader *r) +{ + uint8_t integer[8]; + read_bytes (r, integer, sizeof integer); + return integer_get (r->integer_format, integer, sizeof integer); +} + /* Reads a 64-bit floating-point number from R and returns its value in host format. */ static double
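
The record layout documented in this patch can be illustrated with a minimal, standalone sketch (not PSPP code) that decodes a type 7, subtype 16 record from a byte buffer. The names `parse_ncases64_record`, `struct ncases64_record`, `get_u32_le`, and `get_u64_le` are hypothetical, and the sketch assumes little-endian integers; a real reader would choose the byte order from the file's integer format, as `read_int64` does through `integer_get`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decodes a 32-bit little-endian integer at P. */
static uint32_t
get_u32_le (const uint8_t *p)
{
  return (uint32_t) p[0] | (uint32_t) p[1] << 8
         | (uint32_t) p[2] << 16 | (uint32_t) p[3] << 24;
}

/* Hypothetical helper: decodes a 64-bit little-endian integer at P. */
static uint64_t
get_u64_le (const uint8_t *p)
{
  return (uint64_t) get_u32_le (p) | (uint64_t) get_u32_le (p + 4) << 32;
}

/* Decoded data part of the extended number of cases record. */
struct ncases64_record
  {
    int64_t unknown;    /* Meaning unknown; observed as 1. */
    int64_t ncases64;   /* Number of cases; -1 may mean "unknown". */
  };

/* Parses the 32-byte record (four int32 fields followed by two int64
   fields) that starts at BUF.  Assumes little-endian byte order.
   Returns true and fills OUT only if rec_type, subtype, size, and
   count have the documented values 7, 16, 8, and 2. */
static bool
parse_ncases64_record (const uint8_t *buf, size_t len,
                       struct ncases64_record *out)
{
  assert (buf != NULL && out != NULL);

  if (len < 32)
    return false;
  if (get_u32_le (buf) != 7 || get_u32_le (buf + 4) != 16)
    return false;
  if (get_u32_le (buf + 8) != 8 || get_u32_le (buf + 12) != 2)
    return false;

  /* The conversion to int64_t maps an all-ones pattern to -1 on the
     usual two's-complement platforms. */
  out->unknown = (int64_t) get_u64_le (buf + 16);
  out->ncases64 = (int64_t) get_u64_le (buf + 24);
  return true;
}
```

Rejecting the record when `size` or `count` differs from 8 or 2 mirrors the checks in `read_ncases64` above, which warns and skips the data rather than guessing at an unexpected layout.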