@c PSPP - a program for statistical analysis.
@c Copyright (C) 2019 Free Software Foundation, Inc.
@c Permission is granted to copy, distribute and/or modify this document
@c under the terms of the GNU Free Documentation License, Version 1.3
@c or any later version published by the Free Software Foundation;
@c with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
@c A copy of the license is included in the section entitled "GNU
@c Free Documentation License".
@c

@node Processing Data
@chapter Processing Data

Developer's Guide

Proposed outline:

@example
* Introduction
* Basic concepts
** Data sets
** Variables
** Dictionaries
** Coding conventions
** Pools
* Syntax parsing
* Data processing
** Reading data
*** Casereaders generally
*** Casereaders from data files
*** Casereaders from the active dataset
*** Other casereaders
** Writing data
*** Casewriters generally
*** Casewriters to data files
*** Modifying the active dataset
**** Modifying cases obtained from active dataset casereaders has no real effect
**** Transformations; procedures that transform
** Transforming data
*** Sorting and merging
*** Filtering
*** Grouping
**** Ordering and interaction of filtering and grouping
*** Multiple passes over data
*** Counting cases and case weights
** Best practices
*** Multiple passes with filters versus single pass with loops
*** Sequential versus random access
*** Managing memory
*** Passing cases around
*** Renaming casereaders
*** Avoiding excessive buffering
*** Propagating errors
*** Avoid static/global data
*** Don't worry about null filters, groups, etc.
*** Be aware of reference counting semantics for cases
@end example