   1                         PINTOS GRADING ADVICE
   2                         =====================
   3
   4 This directory contains advice for TAs regarding Pintos grading.  This
   5 file contains overall advice for grading all the projects, and each
   6 project has a file with additional, more specific advice for grading
   7 that particular project.
   8
   9 Be familiar with the Grading subsection within the Introduction
  10 chapter in the Pintos manual.  The principles stated there should
  11 guide your grading decisions.  You should also carefully read the
  12 Coding Standards chapter and, of course, the assignments themselves.
  13
  14 Grading is inherently subjective.  The most important principle is to
  15 be fair.  Try to be patient with students, in the same way that you
  16 would appreciate a TA being patient with you when you take classes
  17 yourself.  In my experience, this takes practice: many TAs tend to be
  18 fairly impatient in their first quarter of TAing, and then improve in
  19 their second quarter.  I have noticed this pattern in myself and
  20 others.
  21
  22 Submission Structure
  23 ====================
  24
  25 At Stanford, each project submission puts files into a directory named
  26 /usr/class/cs140/submissions/hw<number>/<username>, where <number> is
  27 the project number (between 1 and 4) and <username> is the user name
  28 of the team member who did the project submission.
  29
  30 Each submission directory contains a tarball that actually contains
  31 the submission.  The tarball contains pintos/src and the files and
  32 directories underneath it.  If a student group submits more than once,
  33 then there will be multiple tarballs, one for each submission.
  34
  35 Each submission directory also contains a file named grade.txt that
  36 describes the group, giving first and last name and email address for
  37 each student.  There is only a single copy of this file, regardless of
  38 the number of submissions.
  39
  40 If two different students from a single group both submit the project,
  41 then you can end up with almost-identical submissions in two different
  42 directories.  It's best to check for this before beginning grading, to
  43 avoid duplicating effort.  The check-duplicates script in this
  44 directory can help you with this (there should be a copy of it in
  45 /usr/class/cs140/submissions).
  46
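To look for candidate duplicates by hand, one approach is to find email
addresses that appear in more than one grade.txt.  The following is only
a sketch, not the check-duplicates script itself.  The function name
find_shared_emails is made up for illustration, and the sketch assumes
that grade.txt writes each address in the usual user@host form, which
this file does not specify.

```shell
# find_shared_emails DIR
# Prints each email address that appears in the grade.txt of more than
# one submission directory directly under DIR.  (Hypothetical helper;
# the address format in grade.txt is an assumption.)
find_shared_emails() {
    for f in "$1"/*/grade.txt; do
        [ -f "$f" ] || continue
        # Emit each distinct address once per submission directory.
        grep -Eo '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+' "$f" | sort -u
    done | sort | uniq -d
}
```

For example, "find_shared_emails /usr/class/cs140/submissions/hw1"
prints each shared address once.  Running "grep -l <address>
*/grade.txt" in that directory then shows which submissions to compare.
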
  47 Grading Test Results
  48 ====================
  49
  50 Obtaining test results should be the easier half of the grading
  51 process.  The procedure for obtaining test results for one submitted
  52 project is roughly this:
  53
  54         1. Extract the student code from its tarball into "pintos/src":
  55
  56                 tar xzf <file>.tar.gz
  57
  58         2. Delete the existing pintos/src/tests directory and replace
  59            it with a pristine copy:
  60
  61                 rm -rf pintos/src/tests
  62                 cp -R /usr/class/cs140/pintos/pintos/src/tests pintos/src/tests
  63
  64         3. Run "make clean" in the top-level directory, to get rid of
  65            any binaries or objects mistakenly included in the
  66            submission:
  67
  68                 (cd pintos/src && make clean)
  69
  70         4. Run "make grade" in the project-specific directory,
  71            e.g. threads:
  72
  73                 (cd pintos/src/threads && make grade)
  74
  75         5. Make a copy of the "grade" file that this produces, which
  76            is in the "build" directory.
  77
  78                 cp pintos/src/threads/build/grade tests.out
  79
  80         6. Compare the grade report that you produced against the one
  81            submitted by the group.  You can use "diff -u" or just
  82            compare the final grades:
  83
  84                 diff -u tests.out pintos/src/threads/GRADE
  85
  86            If there are major discrepancies (e.g. all their tests
  87            passed, but all yours failed) then you should contact the
  88            group.  Otherwise, use the grade report that you produced.
  89
  90            Grade reports can vary for a number of reasons: QEMU is not
  91            fully reproducible, Bochs sometimes has reproducibility
  92            bugs, the compilers used on different machines may produce
  93            code with different behavior, and so on.  Finally, it's
  94            possible that the group submitted a grade report that goes
  95            with an earlier version of their code.
  96
  97         7. Run "make clean" in pintos/src again:
  98
  99                 (cd pintos/src && make clean)
 100
 101            You don't have to do this immediately, but if you try to
 102            grade too many projects without doing so, then you will
 103            probably run out of quota.
 104
 105            An alternative is to do the builds in temporary storage,
 106            e.g. in /tmp.  This will probably be a lot faster than
 107            doing it in AFS, but it is slightly less convenient.
 108
 109 There is a script called run-tests in this directory (and in
 110 /usr/class/cs140/submissions) that can do most of this work for you.
 111 Run "run-tests --help" for instructions.
 112
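If you want to see what steps 1 through 7 amount to when strung
together, the following is a minimal sketch.  It is not the run-tests
script and does none of its option handling.  The function name
grade_submission and the PRISTINE_TESTS variable are made up for
illustration; the paths are the ones given in the steps above.

```shell
# Pristine tests directory; the default is the path given in step 2.
: "${PRISTINE_TESTS:=/usr/class/cs140/pintos/pintos/src/tests}"

# grade_submission TARBALL PROJECT
# Runs steps 1-7 in the current directory and leaves the grade report
# in tests.out.  (Hypothetical helper, not part of Pintos.)
grade_submission() {
    tarball=$1
    project=$2
    tar xzf "$tarball" || return 1                          # step 1
    rm -rf pintos/src/tests                                 # step 2
    cp -R "$PRISTINE_TESTS" pintos/src/tests || return 1
    (cd pintos/src && make clean) || return 1               # step 3
    (cd "pintos/src/$project" && make grade)                # step 4
    cp "pintos/src/$project/build/grade" tests.out || return 1  # step 5
    # Step 6: any output here means the two reports differ.
    diff -u tests.out "pintos/src/$project/GRADE"
    (cd pintos/src && make clean)                           # step 7
}
```

For example, "grade_submission <file>.tar.gz threads" in a submission
directory.  You still have to read the "diff" output yourself and decide
whether a discrepancy is large enough to contact the group.
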
 113 You can automate running the tests in several directories using a
 114 command like this (in the default C shell):
 115         foreach d (*)
 116                 (cd $d && run-tests threads)
 117         end
 118 or in the Bourne shell (the parentheses keep each "cd" local):
 119         for d in *; do (cd "$d" && run-tests threads); done
 120
 121 Grading Design
 122 ==============
 123
 124 There are two parts to grading students' designs: their design
 125 documents and their code.  Both are lumped into a single grade, taken
 126 out of 100 points.
 127
 128 A suggested form to use for grading each project is in hw<N>.txt in
 129 this directory.  You should copy this file into each submission
 130 directory and delete the lines that do not apply.
 131
 132 The subtractions for each kind of problem with a submission are
 133 suggestions.  You are free to modify them.  You can also add your own
 134 subtractions for problems that are not listed.
 135
 136 When you add up the subtractions for a project, those for the OVERALL
 137 section are not capped at any particular maximum.  Those for
 138 individual problems are capped at the value of the problem.
 139
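As a worked example of the capping rule, with made-up numbers: suppose
the OVERALL subtractions total 10, one problem worth 20 points has 25
points of subtractions, and another worth 30 points has 5.  The first
problem's subtractions are capped at 20, so the design grade is
100 - 10 - 20 - 5 = 65.  The sketch below does the same arithmetic.  The
function name design_score and its VALUE:SUBTRACTION argument format are
invented for illustration, and it does not clamp the result at zero,
since this file does not say what to do in that case.

```shell
# design_score OVERALL VALUE:SUBTRACTION...
# Prints 100 minus the OVERALL subtractions (uncapped) minus each
# problem's subtractions (capped at that problem's value).
# (Hypothetical helper; all numbers are supplied by the caller.)
design_score() {
    score=$((100 - $1))
    shift
    for pair in "$@"; do
        value=${pair%%:*}
        subtraction=${pair#*:}
        # Cap the subtraction at the value of the problem.
        if [ "$subtraction" -gt "$value" ]; then
            subtraction=$value
        fi
        score=$((score - subtraction))
    done
    echo "$score"
}
```

With the numbers above, "design_score 10 20:25 30:5" prints 65.
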
 140 IMPORTANT: Be critical in grading designs.  Most submissions will pass
 141 most of the tests, which means that they get almost 50% of the grade
 142 for "free".  When TAs only take off a few points in the design grade,
 143 then total project scores can average 90% or even higher.  This may
 144 not sound like a bad thing, but it is, because students who receive
 145 numerically high grades think that they did well relative to the other
 146 students.  At the end of the quarter when the curve is applied, these
 147 students are then understandably disappointed or angry that their
 148 final grades are so low when their intermediate grades seemed so high.
 149 It is better to take off lots of points on projects and thereby give
 150 students more realistic expectations about their final course grades.
 151
 152 Grading Design Documents
 153 ------------------------
 154
 155 Be familiar with the Design Document subsection of the Introduction
 156 chapter in the Pintos manual.
 157
 158 Deduct all the points for a given question in these cases:
 159
 160         - Missing: The question is not answered at all.
 161
 162         - Non-responsive: The response does not actually answer what
 163           is being asked.  (If the question does not reasonably apply
 164           to the solution chosen by the group, then the answer should
 165           explain why it does not.)
 166
 167         - Too long: e.g. a "25 words or less" response takes a whole
 168           page.  These qualifiers aim to save the group's time and
 169           your time, so don't waste your time in these cases.
 170
 171         - Too short: The response is evasive or excessively terse to
 172           the point that you don't feel confident in the answer.
 173
 174         - Grossly inaccurate: When you examine the code, you find that
 175           it has no resemblance to the description.
 176
 177         - Not implemented: The functionality described in the answer
 178           was not implemented.  This often happens when a group runs
 179           out of time before implementing the entire project.  Don't
 180           give credit for a design without an implementation.
 181
 182 Take off some points (use your judgment) for:
 183
 184         - Capitalization, punctuation, spelling, or grammar: An
 185           occasional mistake is tolerable, but repeated or frequent
 186           errors should be penalized.  Try to distinguish grammar
 187           errors made by non-native speakers of English, which are
 188           understandable, from those made by others, which are less
 189           so.
 190
 191           In Emacs, it is easy to check the spelling of a word: put
 192           the cursor on or just after it, then type M-$.  You can also
 193           make it highlight misspelled words with M-x flyspell-buffer.
 194
 195         - Minor inaccuracies: Some aspects of the code do not match
 196           its description.
 197
 198         - Conceptual errors: Statements, assumptions, or strong
 199           implications made in the design document are incorrect,
 200           e.g. assuming that unblocking a thread immediately schedules
 201           it.
 202
 203         - Partial response: Multiple questions are asked, but only
 204           some of them are answered.
 205
 206         - Excessive redundancy: The answer restates much of what is
 207           specified in the assignment.
 208
 209 Instructions for recurring questions:
 210
 211     ---- DATA STRUCTURES ----
 212
 213     Copy here the declaration of each new or changed struct or
 214     struct member, global or static variable, typedef, or
 215     enumeration.  Identify the purpose of each in 25 words or
 216     less.
 217
 218         - Deduct points if the required comment on each declaration is
 219           missing.  (The Introduction states "Add a brief comment on
 220           every structure, structure member, global or static
 221           variable, and function definition.")
 222
 223         - Deduct points if the response does not describe *purpose*.
 224           We can see the type and the name of each entity from its
 225           declaration.  But why is it needed?  If the comments
 226           themselves adequately explain purpose, then that is
 227           sufficient.
 228
 229     ---- RATIONALE ----
 230
 231     Why did you choose this design?  In what ways is it superior to
 232     another design you considered?
 233
 234         - Deduct points for failing to compare their design to another
 235           *correct* possibility.
 236
 237 Grading Code
 238 ------------
 239
 240 You should start by quickly scanning all the submitted code by eye.
 241 Usually the easiest way to do this is with a command like
 242         diff -urpbN -X /usr/class/cs140/submissions/diff.ignore \
 243                  /usr/class/cs140/pintos/pintos/src pintos/src | less
 244 in a group's top-level directory.  The options to "diff" here are
 245 useful:
 246         -u: produces a "unified" diff that is easier to read than the
 247             default.
 248         -r: recurses on directories.
 249         -p: prints the name of the function in which each difference
 250             lies.
 251         -b: ignores differences in white space.
 252         -N: includes added files in the diff.
 253         -X .../diff.ignore: ignore files that match patterns listed in
 254             diff.ignore, which lists files that you don't really want
 255             to look at.  You can add to the list when you notice files
 256             that should be on it.
 257
 258 You can page through the "diff" output fairly quickly, perhaps a few
 259 seconds per page.  Nevertheless, you should be able to notice some
 260 important flaws:
 261
 262         - Inconsistent style: indentation changing randomly between 4
 263           spaces and 8 spaces per level, between BSD and GNU brace
 264           placement, and so on.  (The Introduction states "In new
 265           source files, adopt the existing Pintos style by preference,
 266           but make your code self-consistent at the very least. There
 267           should not be a patchwork of different styles that makes it
 268           obvious that three different people wrote the code.")
 269
 270         - Bad style: e.g. no indentation at all, or many statements
 271           crammed onto a single line.
 272
 273         - Many very long source code lines (over 100 columns wide).
 274
 275         - Lack of white space: consistent lack of spaces after commas
 276           or around binary operators that makes code difficult to read.
 277
 278         - Use of static or file scope ("global") variables instead of
 279           automatic, block scope ("local") variables: one student
 280           submission actually declared 12 (!) different global
 281           variables "just so we don't have to make a new var in each
 282           function".  This is unacceptable.
 283
 284         - Use of struct thread members instead of automatic, block
 285           scope ("local") variables: sometimes it's not obvious
 286           whether this is the case, but subtract points when it is.
 287
 288         - Code copied into multiple places that should be abstracted
 289           into a function.
 290
 291         - Gratuitous use of dynamic allocation: e.g. a struct that
 292           contains a pointer to a semaphore instead of a semaphore
 293           itself.
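
Of the flaws above, overlong lines are the easiest to check
mechanically rather than by eye.  The following is a small sketch; the
function name long_lines is made up for illustration, and the
100-column threshold is the one mentioned in the list above.

```shell
# long_lines FILE...
# Prints "FILE:LINE: N columns" for each line wider than 100 columns.
# (Hypothetical helper, not part of Pintos.)
long_lines() {
    for f in "$@"; do
        # expand turns tabs into spaces so that length() counts columns
        # rather than characters.
        expand "$f" |
            awk -v f="$f" 'length($0) > 100 {
                printf "%s:%d: %d columns\n", f, NR, length($0)
            }'
    done
}
```

For example, "long_lines pintos/src/threads/*.c" in a group's top-level
directory.  This also reports long lines in unmodified Pintos files, so
compare its output against the "diff" before subtracting points.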