This directory contains advice for TAs regarding Pintos grading.  This
file contains overall advice for grading all the projects, and each
project has a file with additional, more specific advice for grading
that particular project.

Be familiar with the Grading subsection within the Introduction
chapter in the Pintos manual.  The principles stated there should
guide your grading decisions.  You should also carefully read the
Coding Standards chapter and, of course, the assignments themselves.
Grading is inherently subjective.  The most important principle is to
be fair.  Try to be patient with students, in the same way that you
would appreciate a TA being patient with you when you take classes
yourself.  In my experience, this takes practice: many TAs tend to be
fairly impatient in their first quarter of TAing, and then improve in
their second quarter.  I have noticed this pattern in myself and in
others.

Submissions
-----------
At Stanford, each project submission puts files into a directory named
/usr/class/cs140/submissions/hw<number>/<username>, where <number> is
the project number (between 1 and 4) and <username> is the user name
of the team member who did the project submission.
Each submission directory contains a tarball that actually contains
the submission.  The tarball contains pintos/src and the files and
directories underneath it.  If a student group submits more than once,
then there will be multiple tarballs, one for each submission.

Each submission directory also contains a file named grade.txt that
describes the group, giving first and last name and email address for
each student.  There is only a single copy of this file, regardless of
the number of submissions.
If two different students from a single group both submit the project,
then you can end up with almost-identical submissions in two different
directories.  It's best to check for this before beginning grading, to
avoid duplicating effort.  The check-duplicates script in this
directory can help you with this (there should be a copy of it in
/usr/class/cs140/submissions).
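
If you want to double-check by hand, here is a rough sketch of one way
to spot byte-identical tarballs across submission directories.  It
assumes GNU coreutils and one or more *.tar.gz files per submission
directory; it is not what check-duplicates necessarily does.

```shell
# List tarballs whose contents are byte-identical, so that double
# submissions by two members of the same group stand out.  Lines
# sharing the same md5 (first 32 characters) are printed together.
md5sum */*.tar.gz | sort | uniq --all-repeated --check-chars=32
```

Any output at all means two directories hold the same tarball and only
one of them needs to be graded.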
Obtaining Test Results
----------------------

Obtaining test results should be the easier half of the grading
process.  The procedure for obtaining test results for one submitted
project is roughly this:
1. Extract the student code from its tarball into "pintos/src":

       tar xzf <tarball>
2. Delete the existing pintos/src/tests directory and replace
   it by a pristine copy:

       rm -rf pintos/src/tests
       cp -R /usr/class/cs140/pintos/pintos/src/tests pintos/src/tests
3. Run "make clean" in the top-level directory, to get rid of
   any binaries or objects mistakenly included in the
   submission:

       (cd pintos/src && make clean)
4. Run "make grade" in the project-specific directory,
   e.g. for project 1:

       (cd pintos/src/threads && make grade)
5. Make a copy of the "grade" file that this produces, which
   is in the "build" directory:

       cp pintos/src/threads/build/grade tests.out
6. Compare the grade report that you produced against the one
   submitted by the group.  You can use "diff -u" or just
   compare the final grades:

       diff -u tests.out pintos/src/threads/GRADE

   If there are major discrepancies (e.g. all their tests
   passed, but all yours failed) then you should contact the
   group.  Otherwise, use the grade report that you produced.
   Grade reports can vary for a number of reasons: QEMU is not
   fully reproducible, Bochs sometimes has reproducibility
   bugs, the compilers used on different machines may produce
   code with different behavior, and so on.  Finally, it's
   possible that the group submitted a grade report that goes
   with an earlier version of their code.
7. Run "make clean" in pintos/src again:

       (cd pintos/src && make clean)

   You don't have to do this immediately, but if you try to
   grade too many projects without doing so, then you will
   probably run out of quota.

   An alternative is to do the builds in temporary storage,
   e.g. in /tmp.  This will probably be a lot faster than
   doing it in AFS, but it is slightly less convenient.
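
A rough sketch of the /tmp approach for one submission, assuming the
tarball has already been extracted into ./pintos in the current
directory (the grading.XXXXXX name is arbitrary):

```shell
# Copy the extracted tree to local /tmp, build and test there, then
# keep only the grade report so AFS quota is untouched.
tmp=$(mktemp -d /tmp/grading.XXXXXX)
cp -R pintos "$tmp"
(cd "$tmp/pintos/src/threads" && make grade)
cp "$tmp/pintos/src/threads/build/grade" tests.out
rm -rf "$tmp"
```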
There is a script called run-tests in this directory (and in
/usr/class/cs140/submissions) that can do most of this work for you.
Run "run-tests --help" for instructions.
You can automate running the tests in several directories using a
command like (in the default C shell)

    foreach d (*)
    (cd $d && run-tests threads)
    end

or in the Bourne shell:

    for d in *; do (cd $d && run-tests threads); done
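
If run-tests is on your PATH, a slightly more defensive Bourne-shell
variant runs each submission in a subshell (so the outer working
directory is never changed) and records problem directories; the
failures.log name here is my own, not part of the course setup:

```shell
# Iterate over submission directories, logging any directory where
# run-tests itself exited with a failure, for later follow-up.
for d in */; do
  (cd "$d" && run-tests threads) || echo "$d: run-tests failed" >> failures.log
done
```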
Grading Designs
---------------

There are two parts to grading students' designs: their design
documents and their code.  Both are lumped into a single grade.
A suggested form to use for grading each project is in hw<N>.txt in
this directory.  You should copy this file into each submission
directory and delete the lines that do not apply.

The subtractions for each kind of problem with a submission are
suggestions.  You are free to modify them.  You can also add your own
subtractions for problems that are not listed.
When you add up the subtractions for a project, those for the OVERALL
section are not capped at any particular maximum.  Those for
individual problems are capped at the value of the problem.
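
As a concrete illustration of the capping rule, with point values that
are purely made up:

```shell
# Hypothetical numbers: a problem worth 4 points accumulated 6 points
# of suggested subtractions, and OVERALL accumulated 7.  The problem's
# subtractions are capped at its 4-point value; OVERALL's apply in full.
problem_value=4
problem_subs=6
overall_subs=7
capped=$problem_subs
[ "$capped" -gt "$problem_value" ] && capped=$problem_value
echo $((capped + overall_subs))    # total points off: 11, not 13
```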
IMPORTANT: Be critical in grading designs.  Most submissions will pass
most of the tests, which means that they get almost 50% of the grade
for "free".  When TAs only take off a few points in the design grade,
then total project scores can average 90% or even higher.  This may
not sound like a bad thing, but it is, because students who receive
numerically high grades think that they did well relative to the other
students.  At the end of the quarter when the curve is applied, these
students are then understandably disappointed or angry that their
final grades are so low when their intermediate grades seemed so high.
It is better to take off lots of points on projects and thereby give
students more realistic expectations about their final course grades.
Grading Design Documents
------------------------

Be familiar with the Design Document subsection of the Introduction
chapter in the Pintos manual.

Deduct all the points for a given question in these cases:
- Missing: The question is not answered at all.

- Non-responsive: The response does not actually answer what
  is being asked.  (If the question does not reasonably apply
  to the solution chosen by the group, then the answer should
  explain why it does not.)

- Too long: e.g. a "25 words or less" response takes a whole
  page.  These qualifiers aim to save the group's time and
  your time, so don't waste your time in these cases.

- Too short: The response is evasive or excessively terse to
  the point that you don't feel confident in the answer.

- Grossly inaccurate: When you examine the code, you find that
  it has no resemblance to the description.

- Not implemented: The functionality described in the answer
  was not implemented.  This often happens when a group runs
  out of time before implementing the entire project.  Don't
  give credit for a design without an implementation.

Take off some points (use your judgment) for:
- Capitalization, punctuation, spelling, or grammar: An
  occasional mistake is tolerable, but repeated or frequent
  errors should be penalized.  Try to distinguish grammar
  errors made by non-native speakers of English, which are
  understandable, from those made by others, which are less
  so.

  In Emacs, it is easy to check the spelling of a word: put
  the cursor on or just after it, then type M-$.  You can also
  make it highlight misspelled words with M-x flyspell-buffer.
- Minor inaccuracies: Some aspects of the code do not match
  the description in the design document.
- Conceptual errors: Statements, assumptions, or strong
  implications made in the design document are incorrect,
  e.g. assuming that unblocking a thread immediately schedules
  it.
- Partial response: Multiple questions are asked, but only
  some of them are answered.

- Excessive redundancy: The answer restates much of what is
  specified in the assignment.
Instructions for recurring questions:

---- DATA STRUCTURES ----
Copy here the declaration of each new or changed struct or
struct member, global or static variable, typedef, or
enumeration.  Identify the purpose of each in 25 words or
less.
- Deduct points if the required comment on each declaration is
  missing.  (The Introduction states "Add a brief comment on
  every structure, structure member, global or static
  variable, and function definition.")

- Deduct points if the response does not describe *purpose*.
  We can see the type and the name of each entity from their
  declarations.  But why are they needed?  If the comments
  themselves adequately explain purpose, then that is
  sufficient.
---- RATIONALE ----

Why did you choose this design?  In what ways is it superior to
another design you considered?
- Deduct points for failing to compare their design to another
  *correct* possibility.
Grading Code
------------

You should start by quickly scanning all the submitted code by eye.
Usually the easiest way to do this is with a command like

    diff -urpbN -X /usr/class/cs140/submissions/diff.ignore \
         /usr/class/cs140/pintos/pintos/src pintos/src | less
in a group's top-level directory.  The options to "diff" here are:
-u: produces a "unified" diff that is easier to read than the
    default format.

-r: recurses on directories.

-p: prints the name of the function in which each difference
    occurs.

-b: ignores differences in white space.

-N: includes added files in the diff.

-X .../diff.ignore: ignore files that match patterns listed in
    diff.ignore, which lists files that you don't really want
    to look at.  You can add to the list when you notice files
    that you don't want to review.
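
The format of diff.ignore is one glob pattern per line, which is how
"diff -X" interprets an exclude file.  For instance, hypothetical
entries to skip common build artifacts might look like:

```text
*.o
*.d
bochsout.txt
```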
You can page through the "diff" output fairly quickly, perhaps a few
seconds per page.  Nevertheless, you should be able to notice some
of the following problems:
- Inconsistent style: indentation changing randomly between 4
  spaces and 8 spaces per level, between BSD and GNU brace
  placement, and so on.  (The Introduction states "In new
  source files, adopt the existing Pintos style by preference,
  but make your code self-consistent at the very least.  There
  should not be a patchwork of different styles that makes it
  obvious that three different people wrote the code.")
- Bad style: such as no indentation at all or cramming many statements
  onto a single line.
- Many very long source code lines (over 100 columns wide).

- Lack of white space: consistent lack of spaces after commas
  or around binary operators that makes code difficult to read.
- Use of static or file scope ("global") variables instead of
  automatic, block scope ("local") variables: one student
  submission actually declared 12 (!) different global
  variables "just so we don't have to make a new var in each
  function".  This is unacceptable.

- Use of struct thread members instead of automatic, block
  scope ("local") variables: sometimes it's not obvious
  whether this is the case, but subtract points when it is.
- Code copied into multiple places that should be abstracted
  into a single function.
- Gratuitous use of dynamic allocation: e.g. a struct that
  contains a pointer to a semaphore instead of the semaphore
  itself as a member.
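
Some of the mechanical checks above can be scripted rather than
eyeballed.  For instance, a quick scan for over-long source lines
(assuming GNU grep for -r and --include) might look like:

```shell
# Report any C source or header lines in the submission longer than
# 100 columns, with file names and line numbers.
grep -rn '.\{101,\}' --include='*.c' --include='*.h' pintos/src
```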