Update TODO.

diff --git a/TODO b/TODO
index 2b06c1262544bbf276326f4fb43eb014ad7721b3..302175eb347d08c7e24ed88fae47f4b2f683f3b1 100644
--- a/TODO
+++ b/TODO
@@ -1,8 +1,46 @@
-Time-stamp: <2004-03-14 21:37:40 blp>
+Time-stamp: <2004-11-30 22:59:24 blp>
+
+What Ben's working on now.
+--------------------------
+
+Workspace exhaustion heuristics.
+
+Does SET work correctly?
+
+Update q2c input format description.
+
+Rewrite output subsystem, break into multiple processes.
+
+CROSSTABS needs to be re-examined.
+
+RANK, which is needed for the Wilcoxon signed-rank statistic, Mann-Whitney U,
+Kruskal-Wallis on NPAR TESTS and for Spearman and the Jonckheere trend test (in
+other procedures).
  
  TODO
  ----
  
+Make valgrind --leak-check=yes --show-reachable=yes work.
+
+Add Boolean type.
+
+Add NOT_REACHED() macro.
+
+Add compression to casefiles.
+
+Expressions need to be able to abbreviate function names.  XDATE.QUARTER
+abbreviates to XDA.QUA, etc.
+
+The expression tests need tests for XDATE and a few others, see
+tests/xforms/expressions.sh comments for details.
+
+Expressions need random distribution functions.
+
+There needs to be another layer onto the lexer, which should probably be
+entirely rewritten anyway.  The lexer needs to read entire *commands* at a
+time, not just a *line* at a time.  It also needs to support arbitrary putback,
+probably by just backing up the "current position" in the command buffer.
+
  Scratch variables should not be available for use following TEMPORARY.
  
  Details of N OF CASES, SAMPLE, FILTER, PROCESS IF, TEMPORARY, etc., need to be
@@ -14,18 +52,9 @@ strd.itl.nist.gov/div898/strd
  
  In debug mode hash table code should verify that collisions are reasonably low.
  
-Use posix_fadvise(POSIX_FADV_SEQUENTIAL) where available.
-
-random.c should not know about set_seed.
-
  Use AFM files instead of Groff font files, and include AFMs for our default
  fonts with the distribution.
  
-Add libplot output driver.  Suggested by Robert S. Maier
-<rsm@math.arizona.edu>: "it produces output in idraw-editable PS format, PCL5
-format, xfig-editable format, Illustrator format,..., and can draw vector
-graphics on X11 displays also".
-
  Storage of value labels on disk is inefficient.  Invent new data structure.
  
  Add an output flag which would cause a page break if a table segment could fit
@@ -61,31 +90,13 @@ directory (for data files).  (Yuck!)
  Fix line-too-long problems in PostScript code, instead of covering them up.
  setlinecap is *not* a proper solution.
  
-Need a better way than MAX_WORKSPACE to detect low-memory conditions.
-
-When malloc() returns 0, page to disk and free() unnecessary data.
-
-Remove ccase * argument from procfunc argument to procedure().
-
-See if process_active_file() has wider applicability.
-
-Eliminate private data in struct variable through use of pointers.
-
  Fix som_columns().
  
-There needs to be another layer onto the lexer, which should probably be
-entirely rewritten anyway.  The lexer needs to read entire *commands* at a
-time, not just a *line* at a time.  This would vastly simplify the
-(yet-to-be-implemented) logging mechanism and other stuff as well.
-
  Has glob.c been pared down enough?
  
  Improve interactivity of output by allowing a `commit' function for a page.
  This will also allow for infinite-length pages.
  
-All the tests need to be looked over.  Some of the SET calls don't make sense
-any more.
-
  Implement thin single lines, should be pretty easy now.
  
  SELECT IF should be moved before other transformations whenever possible.  It
@@ -100,21 +111,10 @@ it is used (in dfm.c:cmd_begin_data), not after-the-fact detection.
  Figure out a stylesheet for messages displayed by PSPP: i.e., what quotation
  marks around filenames, etc.
  
-Data input and data output are currently arranged in reciprocal pairs: input is
-done directly, with write_record() or whatever; output is done on a callback
-event-driven basis.  It would definitely be easier if both could be done on a
-direct basis, with read_record() and write_record() routines, with a coroutine
-implementation (see Knuth).  But I'm not sure that coroutines can be
-implemented in ANSI C.  This will require some thought.  Perhaps 0.4.0 can do
-this.
-
  New SET subcommand: OUTPUT.  i.e., SET OUTPUT="filename" to send output to that
  file; SET OUTPUT="filename"(APPEND) to append to that file; SET OUTPUT=DEFAULT
  to reset everything.  There might be a better approach, though--think about it.
  
-HDF export capabilities (http://hdf.ncsa.uiuc.edu).  Suggested by Marcus
-G. Daniels <mgd@santafe.edu>.
-
  From Zvi Grauer <z.grauer@csuohio.edu> and <zvi@mail.ohio.net>:
  
     1. design of experiments software, specifically Factorial, response surface
@@ -281,36 +281,9 @@ string, 1-char strings, and 255-char strings.
  
  g. Test the code.  Write some test syntax files.  Examine the output carefully.
  
-NOTES ON SEARCH ALGORITHMS
---------------------------
-
-1. Trees are nicer when you want a sorted table.  However, you can always
-sort a hash table after you're done adding values.
-
-2. Brent's variation of Algorithm D is best when the table is fixed: it's
-memory-efficient, having small, fixed overhead.  It's easier to use
-when you know in advance how many entries the table will contain.
-
-3. Algorithm L is rather slow for a hash algorithm, however it's easy.
-
-4. Chaining is best in terms of speed; ordered/self-ordering is even
-better.
-
-5. Rehashing is slow.
-
-6. Might want to decide on an algorithm empirically since there are no
-clear mathematical winners in some cases.
-
-7. gprof?  Hey, it works!
-
  MORE NOTES/IDEAS/BUGS
  ---------------------
  
-The behavior of converting a floating point to an integer when the value of the
-float is out of range of the integer type is UNDEFINED!  See ANSI 6.2.1.3.
-
-What should we do for *negative* times in expressions?
-
  Sometimes very wide (or very tall) columns can occur in tables.  What is a good
  way to truncate them?  It doesn't seem to cause problems for the ascii or
  postscript drivers, but it's not good in the general case.  Should they be
@@ -347,15 +320,15 @@ For each case we read from the input program:
  3. Write case to replacement active file.
  4. Execute temporary transformations.  If these drop the case, stop.
  5. Post-TEMPORARY N OF CASES.  If we have already analyzed N cases, stop.
-6. FILTER, PROCESS IF.  If these drop the case, go to 5.
+6. FILTER, PROCESS IF.  If these drop the case, stop.
  7. Pass case to procedure.
  
  Ugly cases:
  
-LAG records cases in step 4.
+LAG records cases in step 3.
  
  AGGREGATE: When output goes to an external file, this is just an ordinary
-procedure.  When output goes to the active file, step 4 should be skipped,
+procedure.  When output goes to the active file, step 3 should be skipped,
  because AGGREGATE creates its own case sink and writes to it in step 7.  Also,
  TEMPORARY has no effect and we just cancel it.  Regardless of direction of
  output, we should not implement AGGREGATE through a transformation because that
@@ -373,7 +346,7 @@ FLIP:
  
  MATCH FILES: Similar to AGGREGATE.  This is a procedure.  When the active file
  is used for input, it reads the active file; otherwise, it just cancels all the
-transformations and deletes the original active file.  Step 4 should be
+transformations and deletes the original active file.  Step 3 should be
  skipped, because MATCH FILES creates its own case sink and writes to it in step
  7.  TEMPORARY is not allowed.