4 This directory contains advice for TAs regarding Pintos grading. This
5 file contains overall advice for grading all the projects, and each
6 project has a file with additional, more specific advice for grading
7 that particular project.
9 Be familiar with the Grading subsection within the Introduction
10 chapter in the Pintos manual. The principles stated there should
11 guide your grading decisions. You should also carefully read the
12 Coding Standards chapter and, of course, the assignments themselves.
14 Grading is inherently subjective. The most important principle is to
15 be fair. Try to be patient with students, in the same way that you
16 would appreciate a TA being patient with you when you take classes
17 yourself. In my experience, this takes practice: many TAs tend to be
18 fairly impatient in their first quarter of TAing, and then improve in
their second quarter. I have noticed this pattern in myself and in
other TAs.
Submissions
-----------

At Stanford, each project submission puts files into a directory named
26 /usr/class/cs140/submissions/hw<number>/<username>, where <number> is
27 the project number (between 1 and 4) and <username> is the user name
28 of the team member who did the project submission.
30 Each submission directory contains a tarball that actually contains
31 the submission. The tarball contains pintos/src and the files and
32 directories underneath it. If a student group submits more than once,
33 then there will be multiple tarballs, one for each submission.
35 Each submission directory also contains a file named grade.txt that
36 describes the group, giving first and last name and email address for
37 each student. There is only a single copy of this file, regardless of
38 the number of submissions.
40 If two different students from a single group both submit the project,
41 then you can end up with almost-identical submissions in two different
42 directories. It's best to check for this before beginning grading, to
43 avoid duplicating effort. The check-duplicates script in this
44 directory can help you with this (there should be a copy of it in
45 /usr/class/cs140/submissions).
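If you would rather spot-check by hand, "diff -qr" between two
submission directories will list any files that differ (the user
names here are made up):

    diff -qr /usr/class/cs140/submissions/hw1/alice \
             /usr/class/cs140/submissions/hw1/bob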
Obtaining Test Results
----------------------

Obtaining test results should be the easier half of the grading
51 process. The procedure for obtaining test results for one submitted
52 project is roughly this:
54 1. Extract the student code from its tarball into "pintos/src":
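   The tarball's name varies from group to group; with <tarball>
   standing in for it (GNU tar detects the compression type
   automatically):

    tar xf <tarball>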
58 2. Delete the existing pintos/src/tests directory and replace
59 it by a pristine copy:
61 rm -rf pintos/src/tests
62 cp -R /usr/class/cs140/winter13/pintos/src/tests pintos/src/tests
64 (make sure you are using the correct version of Pintos for the current
offering of the course; it may be in a different location than the
one shown here.)
68 3. Run "make clean" in the top-level directory, to get rid of
any binaries or objects mistakenly included in the
submission:
72 (cd pintos/src && make clean)
74 4. Run "make grade" in the project-specific directory,
77 (cd pintos/src/threads && make grade)
79 5. Make a copy of the "grade" file that this produces, which
80 is in the "build" directory.
82 cp pintos/src/threads/build/grade tests.out
84 6. Compare the grade report that you produced against the one
85 submitted by the group. You can use "diff -u" or just
86 compare the final grades:
88 diff -u tests.out pintos/src/threads/GRADE
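   To compare just the bottom lines instead, something like this
   works (assuming both reports contain the usual "TOTAL TESTING
   SCORE" summary line):

    grep -i 'total testing score' tests.out pintos/src/threads/GRADE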
90 If there are major discrepancies (e.g. all their tests
91 passed, but all yours failed) then you should contact the
92 group. Otherwise, use the grade report that you produced.
94 Grade reports can vary for a number of reasons: QEMU is not
95 fully reproducible, Bochs sometimes has reproducibility
96 bugs, the compilers used on different machines may produce
97 code with different behavior, and so on. Finally, it's
98 possible that the group submitted a grade report that goes
99 with an earlier version of their code.
101 7. Run "make clean" in pintos/src again:
103 (cd pintos/src && make clean)
105 You don't have to do this immediately, but if you try to
106 grade too many projects without doing so, then you will
107 probably run out of quota.
109 An alternative is to do the builds in temporary storage,
110 e.g. in /tmp. This will probably be a lot faster than
111 doing it in AFS, but it is slightly less convenient.
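A minimal sketch of that approach (the destination name here is
arbitrary):

    cp -R pintos /tmp/$USER-pintos
    (cd /tmp/$USER-pintos/src/threads && make grade)
    cp /tmp/$USER-pintos/src/threads/build/grade tests.out
    rm -rf /tmp/$USER-pintos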
113 There is a script called run-tests in this directory (and in
114 /usr/class/cs140/submissions) that can do most of this work for you.
115 Run "run-tests --help" for instructions.
117 You can automate running the tests in several directories using a
loop like the following (in the default C shell):

    foreach d (*)
      (cd $d && run-tests threads)
    end
122 or in the Bourne shell:
    for d in *; do (cd $d && run-tests threads); done

(In both loops, the parentheses run each "cd" in a subshell, so one
iteration's directory change does not break the next.)
Grading Designs
---------------

There are two parts to grading students' designs: their design
documents and their code. Both are lumped into a single grade, taken
out of 100 possible points.
132 The form to use for grading each project is in hw<N>.txt in this
133 directory. You should copy this file into each submission directory
134 and delete the lines that do not apply. The grading form is divided
135 into sections, one for each problem, and an OVERALL section. Each
136 section has its own set of deductions, with a cap on the total
137 deductions for that section. To compute the overall design score,
138 first compute the (capped) deductions within each section, then
139 subtract the section deductions from 100. If the final score is less
140 than zero, round it up to zero.
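For example, if one section's deductions sum to 30 points but that
section's cap is 20, and a second section has 15 points of
deductions, the design score is 100 - (20 + 15) = 65. (These numbers
are invented for illustration.)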
142 IMPORTANT: Be critical in grading designs. Most submissions will pass
143 most of the tests, which means that they get almost 50% of the grade
144 for "free". When TAs only take off a few points in the design grade,
145 then total project scores can average 90% or even higher. This may
146 not sound like a bad thing, but it is, because students who receive
147 numerically high grades think that they did well relative to the other
148 students. At the end of the quarter when the curve is applied, these
149 students are then understandably disappointed or angry that their
150 final grades are so low when their intermediate grades seemed so high.
151 It is better to take off lots of points on projects and thereby give
152 students more realistic expectations about their final course grades.
154 At the same time, don't be unfair. You should only deduct points in
155 situations where students should have known what was expected of them.
156 For example, don't invent your own standards for coding style based on
157 what you think is "right": stick to what is documented, or what any
158 reasonable person might assume.
160 Grading Design Documents
161 ------------------------
163 Be familiar with the Design Document subsection of the Introduction
164 chapter in the Pintos manual.
166 Deduct all the points for a given question in these cases:
168 - Missing: The question is not answered at all.
170 - Non-responsive: The response does not actually answer what
171 is being asked. (If the question does not reasonably apply
172 to the solution chosen by the group, then the answer should
173 explain why it does not.)
175 - Too long: e.g. a "25 words or less" response takes a whole
page. These word limits exist to save the group's time and yours,
so don't spend extra time on responses that ignore them.
179 - Too short: The response is evasive or excessively terse to
180 the point that you don't feel confident in the answer.
182 - Grossly inaccurate: When you examine the code, you find that
183 it has no resemblance to the description.
185 - Not implemented: The functionality described in the answer
186 was not implemented. This often happens when a group runs
187 out of time before implementing the entire project. Don't
188 give credit for a design without an implementation.
190 Take off some points (use your judgment) for:
192 - Conceptual errors: Statements, assumptions, or strong
193 implications made in the design document are incorrect,
e.g. assuming that unblocking a thread immediately schedules
it.
- Minor inaccuracies: Some aspects of the code do not match
the description in the design document.
199 - Partial response: Multiple questions are asked, but only
200 some of them are answered.
Minor issues (take off only a few points, and only if the problem
is pervasive):
205 - Capitalization, punctuation, spelling, or grammar: An
206 occasional mistake is tolerable, but repeated or frequent
207 errors should be penalized. Try to distinguish grammar
208 errors made by non-native speakers of English, which are
understandable, from those made by others, which are less
excusable.
212 In Emacs, it is easy to check the spelling of a word: put
213 the cursor on or just after it, then type M-$. You can also
214 make it highlight misspelled words with M-x flyspell-buffer.
216 - Excessive redundancy: The answer restates much of what is
217 specified in the assignment.
219 Instructions for recurring questions:
221 ---- DATA STRUCTURES ----
223 Copy here the declaration of each new or changed struct or
224 struct member, global or static variable, typedef, or
enumeration. Identify the purpose of each in 25 words or
less.
228 - Deduct points if the required comment on each declaration is
229 missing. (The Introduction states "Add a brief comment on
230 every structure, structure member, global or static
231 variable, and function definition.")
233 - Deduct points if the response does not describe *purpose*.
234 We can see the type and the name of each entity from their
235 declarations. But why are they needed? If the comments
236 themselves adequately explain purpose, then that is
237 sufficient. Comments should provide information that is not
238 obvious from the code. A common mistake is for a comment to
239 duplicate information in the variable or function name. For
240 example, a variable "pageFaultCount" might have the comment
241 "count of page faults"; that's not much help!
---- RATIONALE ----

Why did you choose this design? In what ways is it superior to
246 another design you considered?
248 - Deduct points for failing to compare their design to another
249 *correct* possibility.
Grading Code
------------

You should start by quickly scanning all the submitted code by eye.
255 Usually the easiest way to do this is with a command like
256 diff -urpbN -X /usr/class/cs140/submissions/diff.ignore \
257 /usr/class/cs140/pintos/pintos/src pintos/src | less
in a group's top-level directory. The options to "diff" here are:
260 -u: produces a "unified" diff that is easier to read than the
262 -r: recurses on directories.
-p: prints the name of the function in which each difference
occurs.
265 -b: ignores differences in white space.
266 -N: includes added files in the diff.
267 -X .../diff.ignore: ignore files that match patterns listed in
268 diff.ignore, which lists files that you don't really want
to look at. You can add to the list when you notice files
that are not worth reading.
272 You can page through the "diff" output fairly quickly, perhaps a few
seconds per page. Nevertheless, you should be able to notice some
of the following kinds of problems:
276 - Inconsistent style: indentation changing randomly between 4
277 spaces and 8 spaces per level, between BSD and GNU brace
278 placement, and so on. (The Introduction states "In new
279 source files, adopt the existing Pintos style by preference,
280 but make your code self-consistent at the very least. There
281 should not be a patchwork of different styles that makes it
282 obvious that three different people wrote the code.")
- Bad style: such as no indentation at all or cramming many statements
onto a single line.
287 - Many very long source code lines (over 100 columns wide).
289 - Lack of white space: consistent lack of spaces after commas
290 or around binary operators that makes code difficult to read.
292 - Use of static or file scope ("global") variables instead of
293 automatic, block scope ("local") variables: one student
294 submission actually declared 12 (!) different global
295 variables "just so we don't have to make a new var in each
296 function". This is unacceptable.
298 - Use of struct thread members instead of automatic, block
299 scope ("local") variables: sometimes it's not obvious
300 whether this is the case, but subtract points when it is.
- Code copied into multiple places that should be abstracted
into a single function.
305 - Gratuitous use of dynamic allocation: e.g. a struct that
contains a pointer to a semaphore instead of a semaphore
embedded directly as a member (see the sketch below).
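A hypothetical sketch of that last point in Pintos style ("struct
frame" and its member are invented; struct semaphore and sema_init()
come from threads/synch.h):

    /* Gratuitous: the semaphore must be allocated and freed
       separately, and a failed malloc() adds an error path. */
    struct frame
      {
        struct semaphore *lock;
      };

    /* Better: embed the semaphore and just call sema_init(). */
    struct frame
      {
        struct semaphore lock;
      };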