X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?p=pspp-builds.git;a=blobdiff_plain;f=TODO;h=4889f5d0cbfa02952501e880e9a7f5baae6048c9;hp=8a798cf2aa38be74f995ad4044ab9c8e9cc424c7;hb=HEAD;hpb=897a260ef7a8b954d56698cc40241a3197127505

diff --git a/TODO b/TODO
index 8a798cf2..4889f5d0 100644
--- a/TODO
+++ b/TODO
@@ -1,107 +1,25 @@
-Time-stamp: <2005-08-02 10:24:25 blp>
+Time-stamp: <2006-12-17 18:45:35 blp>
 
 Get rid of need for GNU diff in `make check'.
 
-Get rid of need for file name canonicalization.
-
-Use getsubopt()?
-
-Format specifier and missing values code needs to be rewritten for lowered
-crappiness.
-
 CROSSTABS needs to be re-examined.
 
-RANK, which is needed for the Wilcoxon signed-rank statistic, Mann-Whitney U,
-Kruskal-Wallis on NPAR TESTS and for Spearman and the Johnkheere trend test (in
-other procedures).
-
-lex_token_representation() should take a buffer to fill.
-
-Make valgrind --leak-check=yes --show-reachable=yes work.
-
-Add NOT_REACHED() macro.
-
-Add compression to casefiles.
-
-There needs to be another layer onto the lexer, which should probably be
-entirely rewritten anyway. The lexer needs to read entire *commands* at a
-time, not just a *line* at a time. It also needs to support arbitrary putback,
-probably by just backing up the "current position" in the command buffer.
-
 Scratch variables should not be available for use following TEMPORARY.
 
-Details of N OF CASES, SAMPLE, FILTER, PROCESS IF, TEMPORARY, etc., need to be
-checked against the documentation. See notes on these at end of file for a
-start.
-
 Check our results against the NIST StRD benchmark results at
 strd.itl.nist.gov/div898/strd
 
-In debug mode hash table code should verify that collisions are reasonably low.
-
-Use AFM files instead of Groff font files, and include AFMs for our default
-fonts with the distribution.
-
 Storage of value labels on disk is inefficient. Invent new data structure.
 
-Add an output flag which would cause a page break if a table segment could fit
-vertically on a page but it just happens to be positioned such that it won't.
-
 Fix spanned joint cells, i.e., EDLEVEL on crosstabs.stat.
 
-Cell footnotes.
-
-PostScript driver should emit thin lines, then thick lines, to optimize time
-and space.
-
-New functions? var_name_or_label(), tab_value_or_label()
-
-Should be able to bottom-justify cells. It'll be expensive, though, by
-requiring an extra metrics call.
-
-Perhaps instead of the current lines we should define the following line types:
-null, thin, thick, double. It might look pretty classy.
-
-Perhaps thick table borders that are cut off by a page break should decay to
-thin borders. (i.e., on a thick bordered table that's longer than one page,
-but narrow, the bottom border would be thin on the first page, and the top and
-bottom borders on middle pages.)
-
-Support multi-line titles on tables. (For the first page only, presumably.)
-
-Rewrite the convert_F() function in data-out.c to be nicer code.
-
-In addition to searching the source directory, we should search the current
-directory (for data files). (Yuck!)
-
-Fix line-too-long problems in PostScript code, instead of covering them up.
-setlinecap is *not* a proper solution.
-
-Fix som_columns().
-
-Has glob.c been pared down enough?
-
-Improve interactivity of output by allowing a `commit' function for a page.
-This will also allow for infinite-length pages.
-
-Implement thin single lines, should be pretty easy now.
-
 SELECT IF should be moved before other transformations whenever possible. It
 should only be impossible when one of the variables referred to in SELECT IF is
 created or modified by a previous transformation.
 
-The manual: add text, add index entries, add examples.
-
-The inline file should be improved: There should be *real* detection of whether
-it is used (in dfm.c:cmd_begin_data), not after-the-fact detection.
-
 Figure out a stylesheet for messages displayed by PSPP: i.e., what quotation
 marks around filenames, etc.
 
-New SET subcommand: OUTPUT. i.e., SET OUTPUT="filename" to send output to that
-file; SET OUTPUT="filename"(APPEND) to append to that file; SET OUTPUT=DEFAULT
-to reset everything. There might be a better approach, though--think about it.
-
 From Zvi Grauer and :
 
 1. design of experiments software, specifically Factorial, response surface
@@ -120,47 +38,6 @@
 6. Categorical data analsys ?
 
 
-IDEAS
------
-
-In addition to an "infinite journal", we should keep a number of
-individual-session journals, pspp.jnl-1 through pspp.jnl-X, renaming and
-deleting as needed. All of the journals should have date/time comments.
-
-Qualifiers for variables giving type--categorical, ordinal, ...
-
-Analysis Wizard
-
-Consider consequences of xmalloc(), fail(), hcf() in interactive
-use:
-a. Can we safely just use setjmp()/longjmp()?
-b. Will that leak memory?
-i. I don't think so: all procedure-created memory is either
-garbage-collected or globally-accessible.
-ii. But you never know... esp. w/o Checker.
-c. Is this too early to worry? too late?
-
-Need to implement a shared buffer for funny functions that require relatively
-large permanent transient buffers (1024 bytes or so), that is, buffers that are
-permanent in the sense that they probably shouldn't be deallocated but are only
-used from time to time, buffers that can't be allocated on the stack because
-they are of variable and unpredictable but usually relatively small (usually
-line buffers). There are too many of these lurking around; can save a sizeable
-amount of space at very little overhead and with very little effort by merging
-them.
-
-Clever multiplatform GUI idea (due partly to John Williams): write a GUI in
-Java where each statistical procedure dialog box could be downloaded from the
-server independently. The statistical procedures would run on (the/a) server
-and results would be reported through HTML tables viewed with the user's choice
-of web browsers. Help could be implemented through the browser as well.
-
-HOWTOs
------
-
-MORE NOTES/IDEAS/BUGS
----------------------
-
 Sometimes very wide (or very tall) columns can occur in tables. What is a
 good way to truncate them? It doesn't seem to cause problems for the ascii or
 postscript drivers, but it's not good in the general case. Should they be
@@ -168,12 +45,6 @@ split somehow? (One way that wide columns can occur is through user request,
 for instance through a wide PRINT request--try time-date.stat with a narrow
 ascii page or with the postscript driver on letter size paper.)
 
-NULs in input files break the products we're replacing: although it will input
-them properly and display them properly as AHEX format, it truncates them in A
-format. Also, string-manipulation functions such as CONCAT truncate their
-results after the first NUL. This should simplify the result of PSPP design.
-Perhaps those ugly a_string, b_string, ..., can all be eliminated.
-
 From Moshe Braner : An idea regarding MATCH FILES, again getting
 BEYOND the state of SPSS: it always bothered me that if I have a large data
 file and I want to match it to a small lookup table, via
@@ -187,122 +58,6 @@ whatever) for it. Then read the /FILE and use the index to match to each
 case. OTOH, if the /TABLE is too large, then do it the old way, complaining
 if either file is not sorted on key.
 
------------------------------------------------------------------------
-Statistical procedures:
-
-For each case we read from the input program:
-
-1. Execute permanent transformations. If these drop the case, stop.
-2. N OF CASES. If we have already written N cases, stop.
-3. Write case to replacement active file.
-4. Execute temporary transformations. If these drop the case, stop.
-5. Post-TEMPORARY N OF CASES. If we have already analyzed N cases, stop.
-6. FILTER, PROCESS IF. If these drop the case, stop.
-7. Pass case to procedure.
-
-Ugly cases:
-
-LAG records cases in step 3.
-
-AGGREGATE: When output goes to an external file, this is just an ordinary
-procedure. When output goes to the active file, step 3 should be skipped,
-because AGGREGATE creates its own case sink and writes to it in step 7. Also,
-TEMPORARY has no effect and we just cancel it. Regardless of direction of
-output, we should not implement AGGREGATE through a transformation because that
-will fail to honor FILTER, PROCESS IF, N OF CASES.
-
-ADD FILES: Essentially an input program. It silently cancels unclosed LOOPs
-and DO IFs. If the active file is used for input, then runs EXECUTE (if there
-are any transformations) and then steals vfm_source and encapsulates it. If
-the active file is not used for input, then it cancels all the transformations
-and deletes the original active file.
-
-CASESTOVARS: ???
-
-FLIP:
-
-MATCH FILES: Similar to AGGREGATE. This is a procedure. When the active file
-is used for input, it reads the active file; otherwise, it just cancels all the
-transformations and deletes the original active file. Step 3 should be
-skipped, because MATCH FILES creates its own case sink and writes to it in step
-7. TEMPORARY is not allowed.
-
-MODIFY VARS:
-
-REPEATING DATA:
-
-SORT CASES:
-
-UPDATE: same as ADD FILES.
-
-VARSTOCASES: ???
------------------------------------------------------------------------
-N OF CASES
-
-  * Before TEMPORARY, limits number of cases sent to the sink.
-
-  * After TEMPORARY, limits number of cases sent to the procedure.
-
-  * Without TEMPORARY, those are the same cases, so it limits both.
-
-SAMPLE
-
-  * Sample is just a transformation. It has no special properties.
-
-FILTER
-
-  * Always selects cases sent to the procedure.
-
-  * No effect on cases sent to sink.
-
-  * Before TEMPORARY, selection is permanent. After TEMPORARY,
-    selection stops after a procedure.
-
-PROCESS IF
-
-  * Always selects cases sent to the procedure.
-
-  * No effect on cases sent to sink.
-
-  * Always stops after a procedure.
-
-SPLIT FILE
-
-  * Ignored by AGGREGATE. Used when procedures write matrices.
-
-  * Always applies to the procedure.
-
-  * Before TEMPORARY, splitting is permanent. After TEMPORARY,
-    splitting stops after a procedure.
-
-TEMPORARY
-
-  * TEMPORARY has no effect on AGGREGATE when output goes to the active file.
-
-  * SORT CASES, ADD FILES, RENAME VARIABLES, CASESTOVARS, VARSTOCASES,
-    COMPUTE with a lag function cannot be used after TEMPORARY.
-
-  * Cannot be used in DO IF...END IF or LOOP...END LOOP.
-
-  * FLIP ignores TEMPORARY. All transformations become permanent.
-
-  * MATCH FILES and UPDATE cannot be used after TEMPORARY if active
-    file is an input source.
-
-  * RENAME VARIABLES is invalid after TEMPORARY.
-
-  * WEIGHT, SPLIT FILE, N OF CASES, FILTER, PROCESS IF apply only to
-    the next procedure when used after TEMPORARY.
-
-WEIGHT
-
-  * Always applies to the procedure.
-
-  * Before TEMPORARY, weighting is permanent. After TEMPORARY,
-    weighting stops after a procedure.
-
-
---------------------------------------------------------------------------------
 Local Variables:
 mode: text
 fill-column: 79
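The removed "Statistical procedures" notes give a seven-step order for each case read from the input program. A minimal C sketch of that order follows, assuming hypothetical names (`struct ccase`, `trns_func`, `struct pipeline`, `process_case()` and the three example transformations) that are not taken from PSPP's source. It models every "stop" as dropping the current case, folds FILTER and PROCESS IF into one optional predicate, and records the sink and the procedure's input in fixed-size arrays; it is an illustration of the notes, not PSPP's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical case record holding a single numeric variable. */
struct ccase
  {
    double x;
  };

/* Hypothetical transformation: edits C in place and returns false to
   drop the case, as SELECT IF would. */
typedef bool trns_func (struct ccase *c);

#define MAX_CASES 64

/* Hypothetical per-procedure state. */
struct pipeline
  {
    trns_func **permanent;      /* Transformations before TEMPORARY. */
    size_t n_permanent;
    trns_func **temporary;      /* Transformations after TEMPORARY. */
    size_t n_temporary;
    long n_of_cases_perm;       /* N OF CASES before TEMPORARY, or -1. */
    long n_of_cases_temp;       /* N OF CASES after TEMPORARY, or -1. */
    bool (*filter) (const struct ccase *);  /* FILTER/PROCESS IF, or NULL. */

    struct ccase sink[MAX_CASES];     /* Replacement active file. */
    long n_written;
    struct ccase analyzed[MAX_CASES]; /* Cases passed to the procedure. */
    long n_analyzed;
  };

/* Runs N transformations from T on C, stopping at the first drop. */
static bool
run_trns (trns_func **t, size_t n, struct ccase *c)
{
  for (size_t i = 0; i < n; i++)
    if (!t[i] (c))
      return false;
  return true;
}

/* Sends one input case through steps 1-7.  C is passed by value, so
   the temporary transformations in step 4 never touch the copy that
   step 3 saved in the replacement active file. */
static void
process_case (struct pipeline *p, struct ccase c)
{
  /* 1. Permanent transformations. */
  if (!run_trns (p->permanent, p->n_permanent, &c))
    return;
  /* 2. N OF CASES: drop once N cases have been written. */
  if (p->n_of_cases_perm >= 0 && p->n_written >= p->n_of_cases_perm)
    return;
  /* 3. Write the case to the replacement active file. */
  assert (p->n_written < MAX_CASES);
  p->sink[p->n_written++] = c;
  /* 4. Temporary transformations. */
  if (!run_trns (p->temporary, p->n_temporary, &c))
    return;
  /* 5. Post-TEMPORARY N OF CASES: drop once N cases were analyzed. */
  if (p->n_of_cases_temp >= 0 && p->n_analyzed >= p->n_of_cases_temp)
    return;
  /* 6. FILTER, PROCESS IF: selects cases for the procedure only. */
  if (p->filter != NULL && !p->filter (&c))
    return;
  /* 7. Pass the case to the procedure (here: just record it). */
  p->analyzed[p->n_analyzed++] = c;
}

/* Example permanent transformation: drop negative values. */
static bool drop_negative (struct ccase *c) { return c->x >= 0; }

/* Example temporary transformation: scale by 10. */
static bool times_ten (struct ccase *c) { c->x *= 10; return true; }

/* Example FILTER predicate. */
static bool over_fifteen (const struct ccase *c) { return c->x > 15; }
```

With `drop_negative` as the only permanent transformation, `times_ten` as the only temporary one, a pre-TEMPORARY N OF CASES of 3 and `over_fifteen` as the filter, the inputs -1, 1, 2, 3, 4 leave 1, 2 and 3 in the sink while the procedure sees only 20 and 30. The case with value 1 shows FILTER having no effect on the sink, and the unscaled values in the sink show TEMPORARY transformations not becoming permanent.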