X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?p=pspp-builds.git;a=blobdiff_plain;f=TODO;h=4889f5d0cbfa02952501e880e9a7f5baae6048c9;hp=198bf210d1283b712d27739bd5f077066167c56d;hb=HEAD;hpb=998c6bac5f1d781505591ac6b3e78df25e566282

diff --git a/TODO b/TODO
index 198bf210..4889f5d0 100644
--- a/TODO
+++ b/TODO
@@ -1,79 +1,22 @@
-Time-stamp: <2006-02-17 22:06:31 blp>
+Time-stamp: <2006-12-17 18:45:35 blp>
 
 Get rid of need for GNU diff in `make check'.
 
-Get rid of need for file name canonicalization.
-
-Format specifier code needs to be rewritten for lowered crappiness.
-
 CROSSTABS needs to be re-examined.
 
-RANK, which is needed for the Wilcoxon signed-rank statistic, Mann-Whitney U,
-Kruskal-Wallis on NPAR TESTS and for Spearman and the Johnkheere trend test (in
-other procedures).
-
-lex_token_representation() should take a buffer to fill.
-
-Make valgrind --leak-check=yes --show-reachable=yes work.
-
-Add NOT_REACHED() macro.
-
-Add compression to casefiles.
-
 Scratch variables should not be available for use following TEMPORARY.
 
 Check our results against the NIST StRD benchmark results at
 strd.itl.nist.gov/div898/strd
 
-In debug mode hash table code should verify that collisions are reasonably low.
-
-Use AFM files instead of Groff font files, and include AFMs for our default
-fonts with the distribution.
-
 Storage of value labels on disk is inefficient. Invent new data structure.
 
-Add an output flag which would cause a page break if a table segment could fit
-vertically on a page but it just happens to be positioned such that it won't.
-
 Fix spanned joint cells, i.e., EDLEVEL on crosstabs.stat.
 
-Cell footnotes.
-
-PostScript driver should emit thin lines, then thick lines, to optimize time
-and space.
-
-Should be able to bottom-justify cells. It'll be expensive, though, by
-requiring an extra metrics call.
-
-Perhaps instead of the current lines we should define the following line types:
-null, thin, thick, double. It might look pretty classy.
-
-Perhaps thick table borders that are cut off by a page break should decay to
-thin borders. (i.e., on a thick bordered table that's longer than one page,
-but narrow, the bottom border would be thin on the first page, and the top and
-bottom borders on middle pages.)
-
-Support multi-line titles on tables. (For the first page only, presumably.)
-
-In addition to searching the source directory, we should search the current
-directory (for data files). (Yuck!)
-
-Fix line-too-long problems in PostScript code, instead of covering them up.
-setlinecap is *not* a proper solution.
-
-Fix som_columns().
-
-Improve interactivity of output by allowing a `commit' function for a page.
-This will also allow for infinite-length pages.
-
-Implement thin single lines, should be pretty easy now.
-
 SELECT IF should be moved before other transformations whenever possible. It
 should only be impossible when one of the variables referred to in SELECT IF is
 created or modified by a previous transformation.
 
-The manual: add text, add index entries, add examples.
-
 Figure out a stylesheet for messages displayed by PSPP: i.e., what quotation
 marks around filenames, etc.
 
@@ -95,9 +38,6 @@ From Zvi Grauer and :
 
 6. Categorical data analsys ?
 
-MORE NOTES/IDEAS/BUGS
----------------------
-
 Sometimes very wide (or very tall) columns can occur in tables. What is a good
 way to truncate them? It doesn't seem to cause problems for the ascii or
 postscript drivers, but it's not good in the general case. Should they be
@@ -105,12 +45,6 @@ split somehow? (One way that wide columns can occur is through user request,
 for instance through a wide PRINT request--try time-date.stat with a narrow
 ascii page or with the postscript driver on letter size paper.)
 
-NULs in input files break the products we're replacing: although it will input
-them properly and display them properly as AHEX format, it truncates them in A
-format. Also, string-manipulation functions such as CONCAT truncate their
-results after the first NUL. This should simplify the result of PSPP design.
-Perhaps those ugly a_string, b_string, ..., can all be eliminated.
-
 From Moshe Braner : An idea regarding MATCH FILES, again getting
 BEYOND the state of SPSS: it always bothered me that if I have a
 large data file and I want to match it to a small lookup table, via
@@ -124,122 +58,6 @@ whatever) for it. Then read the /FILE and use the index to match to
 each case. OTOH, if the /TABLE is too large, then do it the old
 way, complaining if either file is not sorted on key.
 
-----------------------------------------------------------------------
-Statistical procedures:
-
-For each case we read from the input program:
-
-1. Execute permanent transformations. If these drop the case, stop.
-2. N OF CASES. If we have already written N cases, stop.
-3. Write case to replacement active file.
-4. Execute temporary transformations. If these drop the case, stop.
-5. Post-TEMPORARY N OF CASES. If we have already analyzed N cases, stop.
-6. FILTER, PROCESS IF. If these drop the case, stop.
-7. Pass case to procedure.
-
-Ugly cases:
-
-LAG records cases in step 3.
-
-AGGREGATE: When output goes to an external file, this is just an ordinary
-procedure. When output goes to the active file, step 3 should be skipped,
-because AGGREGATE creates its own case sink and writes to it in step 7. Also,
-TEMPORARY has no effect and we just cancel it. Regardless of direction of
-output, we should not implement AGGREGATE through a transformation because that
-will fail to honor FILTER, PROCESS IF, N OF CASES.
-
-ADD FILES: Essentially an input program. It silently cancels unclosed LOOPs
-and DO IFs. If the active file is used for input, then runs EXECUTE (if there
-are any transformations) and then steals vfm_source and encapsulates it. If
-the active file is not used for input, then it cancels all the transformations
-and deletes the original active file.
-
-CASESTOVARS: ???
-
-FLIP:
-
-MATCH FILES: Similar to AGGREGATE. This is a procedure. When the active file
-is used for input, it reads the active file; otherwise, it just cancels all the
-transformations and deletes the original active file. Step 3 should be
-skipped, because MATCH FILES creates its own case sink and writes to it in step
-7. TEMPORARY is not allowed.
-
-MODIFY VARS:
-
-REPEATING DATA:
-
-SORT CASES:
-
-UPDATE: same as ADD FILES.
-
-VARSTOCASES: ???
-----------------------------------------------------------------------
-N OF CASES
-
-  * Before TEMPORARY, limits number of cases sent to the sink.
-
-  * After TEMPORARY, limits number of cases sent to the procedure.
-
-  * Without TEMPORARY, those are the same cases, so it limits both.
-
-SAMPLE
-
-  * Sample is just a transformation. It has no special properties.
-
-FILTER
-
-  * Always selects cases sent to the procedure.
-
-  * No effect on cases sent to sink.
-
-  * Before TEMPORARY, selection is permanent. After TEMPORARY,
-    selection stops after a procedure.
-
-PROCESS IF
-
-  * Always selects cases sent to the procedure.
-
-  * No effect on cases sent to sink.
-
-  * Always stops after a procedure.
-
-SPLIT FILE
-
-  * Ignored by AGGREGATE. Used when procedures write matrices.
-
-  * Always applies to the procedure.
-
-  * Before TEMPORARY, splitting is permanent. After TEMPORARY,
-    splitting stops after a procedure.
-
-TEMPORARY
-
-  * TEMPORARY has no effect on AGGREGATE when output goes to the active file.
-
-  * SORT CASES, ADD FILES, RENAME VARIABLES, CASESTOVARS, VARSTOCASES,
-    COMPUTE with a lag function cannot be used after TEMPORARY.
-
-  * Cannot be used in DO IF...END IF or LOOP...END LOOP.
-
-  * FLIP ignores TEMPORARY. All transformations become permanent.
-
-  * MATCH FILES and UPDATE cannot be used after TEMPORARY if active
-    file is an input source.
-
-  * RENAME VARIABLES is invalid after TEMPORARY.
-
-  * WEIGHT, SPLIT FILE, N OF CASES, FILTER, PROCESS IF apply only to
-    the next procedure when used after TEMPORARY.
-
-WEIGHT
-
-  * Always applies to the procedure.
-
-  * Before TEMPORARY, weighting is permanent. After TEMPORARY,
-    weighting stops after a procedure.
-
-
--------------------------------------------------------------------------------
 Local Variables:
 mode: text
 fill-column: 79