X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?a=blobdiff_plain;f=TODO;h=302175eb347d08c7e24ed88fae47f4b2f683f3b1;hb=2876dcbf55e4ed59b17b7a164b0157aeeafd5f3b;hp=b5ae842d7f8ab18e60a73c2817fdbaa98a7ef2df;hpb=ed119c70c9af58568acf4d09e1f52abacc7cd05c;p=pspp-builds.git

diff --git a/TODO b/TODO
index b5ae842d..302175eb 100644
--- a/TODO
+++ b/TODO
@@ -1,12 +1,11 @@
-Time-stamp: <2004-03-26 01:12:33 blp>
+Time-stamp: <2004-11-30 22:59:24 blp>
 
 What Ben's working on now.
 --------------------------
 
-Procedures need to be able to make multiple passes.
+Workspace exhaustion heuristics.
 
-Write a better descriptive stats evaluator based on NR two-pass technique,
-revise all existing code to use it.
+Does SET work correctly?
 
 Update q2c input format description.
 
@@ -14,9 +13,21 @@ Rewrite output subsystem, break into multiple processes.
 
 CROSSTABS needs to be re-examined.
 
+RANK, which is needed for the Wilcoxon signed-rank statistic, Mann-Whitney U,
+Kruskal-Wallis on NPAR TESTS and for Spearman and the Johnkheere trend test (in
+other procedures).
+
 TODO
 ----
 
+Make valgrind --leak-check=yes --show-reachable=yes work.
+
+Add Boolean type.
+
+Add NOT_REACHED() macro.
+
+Add compression to casefiles.
+
 Expressions need to be able to abbreviate function names. XDATE.QUARTER
 abbreviates to XDA.QUA, etc.
 
@@ -41,16 +52,9 @@ strd.itl.nist.gov/div898/strd
 
 In debug mode hash table code should verify that collisions are reasonably low.
 
-Use posix_fadvise(POSIX_FADV_SEQUENTIAL) where available.
-
 Use AFM files instead of Groff font files, and include AFMs for our default
 fonts with the distribution.
 
-Add libplot output driver. Suggested by Robert S. Maier
-: "it produces output in idraw-editable PS format, PCL5
-format, xfig-editable format, Illustrator format,..., and can draw vector
-graphics on X11 displays also".
-
 Storage of value labels on disk is inefficient. Invent new data structure.
 
 Add an output flag which would cause a page break if a table segment could fit
@@ -86,16 +90,6 @@ directory (for data files). (Yuck!)
 Fix line-too-long problems in PostScript code, instead of covering them up.
 setlinecap is *not* a proper solution.
 
-Need a better way than MAX_WORKSPACE to detect low-memory conditions.
-
-When malloc() returns 0, page to disk and free() unnecessary data.
-
-Remove ccase * argument from procfunc argument to procedure().
-
-See if process_active_file() has wider applicability.
-
-Eliminate private data in struct variable through use of pointers.
-
 Fix som_columns().
 
 Has glob.c been pared down enough?
@@ -103,9 +97,6 @@ Has glob.c been pared down enough?
 Improve interactivity of output by allowing a `commit' function for a page.
 This will also allow for infinite-length pages.
 
-All the tests need to be looked over. Some of the SET calls don't make sense
-any more.
-
 Implement thin single lines, should be pretty easy now.
 
 SELECT IF should be moved before other transformations whenever possible. It
@@ -120,21 +111,10 @@ it is used (in dfm.c:cmd_begin_data), not after-the-fact detection.
 Figure out a stylesheet for messages displayed by PSPP: i.e., what quotation
 marks around filenames, etc.
 
-Data input and data output are currently arranged in reciprocal pairs: input is
-done directly, with write_record() or whatever; output is done on a callback
-event-driven basis. It would definitely be easier if both could be done on a
-direct basis, with read_record() and write_record() routines, with a coroutine
-implementation (see Knuth). But I'm not sure that coroutines can be
-implemented in ANSI C. This will require some thought. Perhaps 0.4.0 can do
-this.
-
 New SET subcommand: OUTPUT. i.e., SET OUTPUT="filename" to send output to that
 file; SET OUTPUT="filename"(APPEND) to append to that file; SET OUTPUT=DEFAULT
 to reset everything. There might be a better approach, though--think about it.
 
-HDF export capabilities (http://hdf.ncsa.uiuc.edu). Suggested by Marcus
-G. Daniels .
-
 From Zvi Grauer and :
 
 1. design of experiments software, specifically Factorial, response surface
@@ -301,36 +281,9 @@ string, 1-char strings, and 255-char strings.
 
 g. Test the code. Write some test syntax files. Examine the output carefully.
 
-NOTES ON SEARCH ALGORITHMS
---------------------------
-
-1. Trees are nicer when you want a sorted table. However, you can always
-sort a hash table after you're done adding values.
-
-2. Brent's variation of Algorithm D is best when the table is fixed: it's
-memory-efficient, having small, fixed overhead. It's easier to use
-when you know in advance how many entries the table will contain.
-
-3. Algorithm L is rather slow for a hash algorithm, however it's easy.
-
-4. Chaining is best in terms of speed; ordered/self-ordering is even
-better.
-
-5. Rehashing is slow.
-
-6. Might want to decide on an algorithm empirically since there are no
-clear mathematical winners in some cases.
-
-7. gprof? Hey, it works!
-
 MORE NOTES/IDEAS/BUGS
 ---------------------
 
-The behavior of converting a floating point to an integer when the value of the
-float is out of range of the integer type is UNDEFINED! See ANSI 6.2.1.3.
-
-What should we do for *negative* times in expressions?
-
 Sometimes very wide (or very tall) columns can occur in tables. What is a good
 way to truncate them? It doesn't seem to cause problems for the ascii or
 postscript drivers, but it's not good in the general case. Should they be
@@ -367,7 +320,7 @@ For each case we read from the input program:
 3. Write case to replacement active file.
 4. Execute temporary transformations. If these drop the case, stop.
 5. Post-TEMPORARY N OF CASES. If we have already analyzed N cases, stop.
-6. FILTER, PROCESS IF. If these drop the case, go to 5.
+6. FILTER, PROCESS IF. If these drop the case, stop.
 7. Pass case to procedure.
 
 Ugly cases: