pintos-os.org Git - pintos-anon/blob - ta-advice/README

   1                         PINTOS GRADING ADVICE
   2                         =====================
   3
   4 This directory contains advice for TAs regarding Pintos grading.  This
   5 file contains overall advice for grading all the projects, and each
   6 project has a file with additional, more specific advice for grading
   7 that particular project.
   8
   9 Be familiar with the Grading subsection within the Introduction
  10 chapter in the Pintos manual.  The principles stated there should
  11 guide your grading decisions.  You should also carefully read the
  12 Coding Standards chapter and, of course, the assignments themselves.
  13
  14 Grading is inherently subjective.  The most important principle is to
  15 be fair.  Try to be patient with students, in the same way that you
  16 would appreciate a TA being patient with you when you take classes
  17 yourself.  In my experience, this takes practice: many TAs tend to be
  18 fairly impatient in their first quarter of TAing, and then improve in
  19 their second quarter.  I have noticed this pattern in myself and
  20 others.
  21
  22 Submission Structure
  23 ====================
  24
  25 At Stanford, each project submission puts files into a directory named
  26 /usr/class/cs140/submissions/hw<number>/<username>, where <number> is
  27 the project number (between 1 and 4) and <username> is the user name
  28 of the team member who did the project submission.
  29
  30 Each submission directory contains a tarball that actually contains
  31 the submission.  The tarball contains pintos/src and the files and
  32 directories underneath it.  If a student group submits more than once,
  33 then there will be multiple tarballs, one for each submission.
  34
  35 Each submission directory also contains a file named grade.txt that
  36 describes the group, giving first and last name and email address for
  37 each student.  There is only a single copy of this file, regardless of
  38 the number of submissions.
  39
  40 If two different students from a single group both submit the project,
  41 then you can end up with almost-identical submissions in two different
  42 directories.  It's best to check for this before beginning grading, to
  43 avoid duplicating effort.  The check-duplicates script in this
  44 directory can help you with this (there should be a copy of it in
  45 /usr/class/cs140/submissions).
  46
  47 Grading Test Results
  48 ====================
  49
  50 Obtaining test results should be the easier half of the grading
  51 process.  The procedure for obtaining test results for one submitted
  52 project is roughly this:
  53
  54         1. Extract the student code from its tarball into "pintos/src":
  55
  56                 tar xzf <file>.tar.gz
  57
  58         2. Delete the existing pintos/src/tests directory and replace
  59            it by a pristine copy:
  60
  61                 rm -rf pintos/src/tests
  62                 cp -R /usr/class/cs140/winter13/pintos/src/tests pintos/src/tests
  63
  64     (make sure you are using the correct version of Pintos for the current
  65     offering of the course; it may be in a different location than the
  66     example above)
  67
  68         3. Run "make clean" in the top-level directory, to get rid of
  69            any binaries or objects mistakenly included in the
  70            submission:
  71
  72                 (cd pintos/src && make clean)
  73
  74         4. Run "make grade" in the project-specific directory,
  75            e.g. threads:
  76
  77                 (cd pintos/src/threads && make grade)
  78
  79         5. Make a copy of the "grade" file that this produces, which
  80            is in the "build" directory.
  81
  82                 cp pintos/src/threads/build/grade tests.out
  83
  84         6. Compare the grade report that you produced against the one
  85            submitted by the group.  You can use "diff -u" or just
  86            compare the final grades:
  87
  88                 diff -u tests.out pintos/src/threads/GRADE
  89
  90            If there are major discrepancies (e.g. all their tests
  91            passed, but all yours failed) then you should contact the
  92            group.  Otherwise, use the grade report that you produced.
  93
  94            Grade reports can vary for a number of reasons: QEMU is not
  95            fully reproducible, Bochs sometimes has reproducibility
  96            bugs, the compilers used on different machines may produce
  97            code with different behavior, and so on.  Finally, it's
  98            possible that the group submitted a grade report that goes
  99            with an earlier version of their code.
 100
 101         7. Run "make clean" in pintos/src again:
 102
 103                 (cd pintos/src && make clean)
 104
 105            You don't have to do this immediately, but if you try to
 106            grade too many projects without doing so, then you will
 107            probably run out of quota.
 108
 109            An alternative is to do the builds in temporary storage,
 110            e.g. in /tmp.  This will probably be a lot faster than
 111            doing it in AFS, but it is slightly less convenient.
 112
 113 There is a script called run-tests in this directory (and in
 114 /usr/class/cs140/submissions) that can do most of this work for you.
 115 Run "run-tests --help" for instructions.
 116
 117 You can automate running the tests in several directories using a
 118 command like (in the default C shell)
 119         foreach d (*)
 120                 cd $d && run-tests threads
 121         end
 122 or in the Bourne shell:
 123         for d in *; do cd $d && run-tests threads; done
 124
 125 Grading Design
 126 ==============
 127
 128 There are two parts to grading students' designs: their design
 129 documents and their code.  Both are lumped into a single grade, taken
 130 out of 100 points.
 131
 132 The form to use for grading each project is in hw<N>.txt in this
 133 directory.  You should copy this file into each submission directory
 134 and delete the lines that do not apply.  The grading form is divided
 135 into sections, one for each problem, and an OVERALL section.  Each
 136 section has its own set of deductions, with a cap on the total
 137 deductions for that section. To compute the overall design score,
 138 first compute the (capped) deductions within each section, then
 139 subtract the section deductions from 100. If the final score is less
 140 than zero, round it up to zero.
 141
 142 IMPORTANT: Be critical in grading designs.  Most submissions will pass
 143 most of the tests, which means that they get almost 50% of the grade
 144 for "free".  When TAs only take off a few points in the design grade,
 145 then total project scores can average 90% or even higher.  This may
 146 not sound like a bad thing, but it is, because students who receive
 147 numerically high grades think that they did well relative to the other
 148 students.  At the end of the quarter when the curve is applied, these
 149 students are then understandably disappointed or angry that their
 150 final grades are so low when their intermediate grades seemed so high.
 151 It is better to take off lots of points on projects and thereby give
 152 students more realistic expectations about their final course grades.
 153
 154 At the same time, don't be unfair.  You should only deduct points in
 155 situations where students should have known what was expected of them.
 156 For example, don't invent your own standards for coding style based on
 157 what you think is "right": stick to what is documented, or what any
 158 reasonable person might assume.
 159
 160 Grading Design Documents
 161 ------------------------
 162
 163 Be familiar with the Design Document subsection of the Introduction
 164 chapter in the Pintos manual.
 165
 166 Deduct all the points for a given question in these cases:
 167
 168         - Missing: The question is not answered at all.
 169
 170         - Non-responsive: The response does not actually answer what
 171     is being asked.  (If the question does not reasonably apply
 172     to the solution chosen by the group, then the answer should
 173     explain why it does not.)
 174
 175         - Too long: e.g. a "25 words or less" response takes a whole
 176     page.  These qualifiers aim to save the group's time and
 177     your time, so don't waste your time in these cases.
 178
 179         - Too short: The response is evasive or excessively terse to
 180     the point that you don't feel confident in the answer.
 181
 182         - Grossly inaccurate: When you examine the code, you find that
 183     it has no resemblance to the description.
 184
 185         - Not implemented: The functionality described in the answer
 186     was not implemented.  This often happens when a group runs
 187     out of time before implementing the entire project.  Don't
 188     give credit for a design without an implementation.
 189
 190 Take off some points (use your judgment) for:
 191
 192         - Conceptual errors: Statements, assumptions, or strong
 193     implications made in the design document are incorrect,
 194     e.g. assuming that unblocking a thread immediately schedules
 195     it.
 196
 197         - Minor inaccuracies: Some aspects of the code do not match
 198
 199         - Partial response: Multiple questions are asked, but only
 200     some of them are answered.
 201
 202 Minor issues (take off only a few points, and only if the problem
 203 is severe):
 204
 205         - Capitalization, punctuation, spelling, or grammar: An
 206     occasional mistake is tolerable, but repeated or frequent
 207     errors should be penalized.  Try to distinguish grammar
 208     errors made by non-native speakers of English, which are
 209     understandable, from those made by others, which are less
 210     so.
 211
 212           In Emacs, it is easy to check the spelling of a word: put
 213           the cursor on or just after it, then type M-$.  You can also
 214           make it highlight misspelled words with M-x flyspell-buffer.
 215
 216         - Excessive redundancy: The answer restates much of what is
 217     specified in the assignment.
 218
 219 Instructions for recurring questions:
 220
 221     ---- DATA STRUCTURES ----
 222
 223     Copy here the declaration of each new or changed struct or
 224     struct member, global or static variable, typedef, or
 225     enumeration.  Identify the purpose of each in 25 words or
 226     less.
 227
 228         - Deduct points if the required comment on each declaration is
 229     missing.  (The Introduction states "Add a brief comment on
 230     every structure, structure member, global or static
 231     variable, and function definition.")
 232
 233         - Deduct points if the response does not describe *purpose*.
 234     We can see the type and the name of each entity from their
 235     declarations.  But why are they needed?  If the comments
 236     themselves adequately explain purpose, then that is
 237     sufficient. Comments should provide information that is not
 238     obvious from the code.  A common mistake is for a comment to
 239     duplicate information in the variable or function name. For
 240     example, a variable "pageFaultCount" might have the comment
 241     "count of page faults"; that's not much help!
 242
 243     ---- RATIONALE ----
 244
 245     Why did you choose this design?  In what ways is it superior to
 246     another design you considered?
 247
 248         - Deduct points for failing to compare their design to another
 249           *correct* possibility.
 250
 251 Grading Code
 252 ------------
 253
 254 You should start by quickly scanning all the submitted code by eye.
 255 Usually the easiest way to do this is with a command like
 256         diff -urpbN -X /usr/class/cs140/submissions/diff.ignore \
 257                  /usr/class/cs140/pintos/pintos/src pintos/src | less
 258 in a group's top-level directory.  The options to "diff" here are
 259 useful:
 260         -u: produces a "unified" diff that is easier to read than the
 261             default.
 262         -r: recurses on directories.
 263         -p: prints the name of the function in which each difference
 264             lies.
 265         -b: ignores differences in white space.
 266         -N: includes added files in the diff.
 267         -X .../diff.ignore: ignore files that match patterns listed in
 268             diff.ignore, which lists files that you don't really want
 269             to look at.  You can add to the list when you notice files
 270             that should be.
 271
 272 You can page through the "diff" output fairly quickly, perhaps a few
 273 seconds per page.  Nevertheless, you should be able to notice some
 274 important flaws:
 275
 276         - Inconsistent style: indentation changing randomly between 4
 277     spaces and 8 spaces per level, between BSD and GNU brace
 278     placement, and so on.  (The Introduction states "In new
 279     source files, adopt the existing Pintos style by preference,
 280     but make your code self-consistent at the very least. There
 281     should not be a patchwork of different styles that makes it
 282     obvious that three different people wrote the code.")
 283
 284         - Bad style: such as no indentation at all or cramming many statements
 285     onto a single line.
 286
 287         - Many very long source code lines (over 100 columns wide).
 288
 289         - Lack of white space: consistent lack of spaces after commas
 290     or around binary operators that makes code difficult to read.
 291
 292         - Use of static or file scope ("global") variables instead of
 293     automatic, block scope ("local") variables: one student
 294     submission actually declared 12 (!) different global
 295     variables "just so we don't have to make a new var in each
 296     function".  This is unacceptable.
 297
 298         - Use of struct thread members instead of automatic, block
 299     scope ("local") variables: sometimes it's not obvious
 300     whether this is the case, but subtract points when it is.
 301
 302         - Code copied into multiple places that should be abstracted
 303     into a function.
 304
 305         - Gratuitous use of dynamic allocation: e.g. a struct that
 306     contains a pointer to a semaphore instead of a semaphore
 307     itself.