PINTOS GRADING ADVICE
=====================

This directory contains advice for TAs regarding Pintos grading.
This file contains overall advice for grading all the projects, and
each project has a file with additional, more specific advice for
grading that particular project.

Be familiar with the Grading subsection within the Introduction
chapter in the Pintos manual.  The principles stated there should
guide your grading decisions.  You should also carefully read the
Coding Standards chapter and, of course, the assignments themselves.

Grading is inherently subjective.  The most important principle is to
be fair.  Try to be patient with students, in the same way that you
would appreciate a TA being patient with you when you take classes
yourself.  In my experience, this takes practice: many TAs tend to be
fairly impatient in their first quarter of TAing and then improve in
their second.  I have noticed this pattern in myself and others.

Submission Structure
====================

At Stanford, each project submission puts files into a directory
named /usr/class/cs140/submissions/hw<N>/<username>, where <N> is the
project number (between 1 and 4) and <username> is the user name of
the team member who submitted the project.  Each submission directory
contains a tarball that actually contains the submission.  The
tarball contains pintos/src and the files and directories underneath
it.  If a student group submits more than once, then there will be
multiple tarballs, one for each submission.

Each submission directory also contains a file named grade.txt that
describes the group, giving the first name, last name, and email
address of each student.  There is only a single copy of this file,
regardless of the number of submissions.

If two different students from a single group both submit the
project, then you can end up with almost-identical submissions in two
different directories.  It's best to check for this before beginning
grading, to avoid duplicating effort.  The check-duplicates script in
this directory can help you with this (there should be a copy of it
in /usr/class/cs140/submissions).

Grading Test Results
====================

Obtaining test results should be the easier half of the grading
process.  The procedure for obtaining test results for one submitted
project is roughly this:

1. Extract the student code from its tarball into "pintos/src":

        tar xzf <tarball>.tar.gz

2. Delete the existing pintos/src/tests directory and replace it by a
   pristine copy:

        rm -rf pintos/src/tests
        cp -R /usr/class/cs140/winter13/pintos/src/tests pintos/src/tests

   (Make sure you are using the correct version of Pintos for the
   current offering of the course; it may be in a different location
   than the example above.)

3. Run "make clean" in the top-level directory, to get rid of any
   binaries or objects mistakenly included in the submission:

        (cd pintos/src && make clean)

4. Run "make grade" in the project-specific directory, e.g. threads:

        (cd pintos/src/threads && make grade)

5. Make a copy of the "grade" file that this produces, which is in
   the "build" directory:

        cp pintos/src/threads/build/grade tests.out

6. Compare the grade report that you produced against the one
   submitted by the group.  You can use "diff -u" or just compare the
   final grades:

        diff -u tests.out pintos/src/threads/GRADE

   If there are major discrepancies (e.g. all their tests passed, but
   all yours failed), then you should contact the group.  Otherwise,
   use the grade report that you produced.
   Grade reports can vary for a number of reasons: QEMU is not fully
   reproducible, Bochs sometimes has reproducibility bugs, the
   compilers used on different machines may produce code with
   different behavior, and so on.  Finally, it's possible that the
   group submitted a grade report that goes with an earlier version
   of their code.

7. Run "make clean" in pintos/src again:

        (cd pintos/src && make clean)

   You don't have to do this immediately, but if you try to grade too
   many projects without doing so, then you will probably run out of
   quota.

An alternative is to do the builds in temporary storage, e.g. in
/tmp.  This will probably be a lot faster than doing it in AFS, but
it is slightly less convenient.

There is a script called run-tests in this directory (and in
/usr/class/cs140/submissions) that can do most of this work for you.
Run "run-tests --help" for instructions.

You can automate running the tests in several directories using a
command like the following (in the default C shell):

        foreach d (*)
            (cd $d && run-tests threads)
        end

or in the Bourne shell:

        for d in *; do (cd $d && run-tests threads); done

Grading Design
==============

There are two parts to grading students' designs: their design
documents and their code.  Both are lumped into a single grade, taken
out of 100 points.

The form to use for grading each project is in hw<N>.txt in this
directory.  You should copy this file into each submission directory
and delete the lines that do not apply.  The grading form is divided
into sections, one for each problem, and an OVERALL section.  Each
section has its own set of deductions, with a cap on the total
deductions for that section.  To compute the overall design score,
first compute the (capped) deductions within each section, then
subtract the section deductions from 100.  If the final score is less
than zero, round it up to zero.

IMPORTANT: Be critical in grading designs.  Most submissions will
pass most of the tests, which means that they get almost 50% of the
grade for "free".  When TAs take off only a few points in the design
grade, total project scores can average 90% or even higher.  This may
not sound like a bad thing, but it is, because students who receive
numerically high grades think that they did well relative to the
other students.  At the end of the quarter, when the curve is
applied, these students are then understandably disappointed or angry
that their final grades are so low when their intermediate grades
seemed so high.  It is better to take off lots of points on projects
and thereby give students more realistic expectations about their
final course grades.

At the same time, don't be unfair.  You should only deduct points in
situations where students should have known what was expected of
them.  For example, don't invent your own standards for coding style
based on what you think is "right": stick to what is documented, or
to what any reasonable person might assume.

Grading Design Documents
------------------------

Be familiar with the Design Document subsection of the Introduction
chapter in the Pintos manual.

Deduct all the points for a given question in these cases:

  - Missing: The question is not answered at all.

  - Non-responsive: The response does not actually answer what is
    being asked.  (If the question does not reasonably apply to the
    solution chosen by the group, then the answer should explain why
    it does not.)

  - Too long: e.g. a "25 words or less" response takes a whole page.
    These limits are meant to save the group's time and yours, so
    don't waste time on such responses.
  - Too short: The response is evasive or excessively terse, to the
    point that you don't feel confident in the answer.

  - Grossly inaccurate: When you examine the code, you find that it
    bears no resemblance to the description.

  - Not implemented: The functionality described in the answer was
    not implemented.  This often happens when a group runs out of
    time before implementing the entire project.  Don't give credit
    for a design without an implementation.

Take off some points (use your judgment) for:

  - Conceptual errors: Statements, assumptions, or strong
    implications made in the design document are incorrect, e.g.
    assuming that unblocking a thread immediately schedules it.

  - Minor inaccuracies: Some aspects of the code do not match the
    design document's description.

  - Partial response: Multiple questions are asked, but only some of
    them are answered.

Minor issues (take off only a few points, and only if the problem is
severe):

  - Capitalization, punctuation, spelling, or grammar: An occasional
    mistake is tolerable, but repeated or frequent errors should be
    penalized.  Try to distinguish grammar errors made by non-native
    speakers of English, which are understandable, from those made by
    others, which are less so.

    In Emacs, it is easy to check the spelling of a word: put the
    cursor on or just after it, then type M-$.  You can also make
    Emacs highlight misspelled words with M-x flyspell-buffer.

  - Excessive redundancy: The answer restates much of what is
    specified in the assignment.

Instructions for recurring questions:

---- DATA STRUCTURES ----

Copy here the declaration of each new or changed struct or struct
member, global or static variable, typedef, or enumeration.
Identify the purpose of each in 25 words or less.

  - Deduct points if the required comment on each declaration is
    missing.  (The Introduction states, "Add a brief comment on every
    structure, structure member, global or static variable, and
    function definition.")

  - Deduct points if the response does not describe *purpose*.  We
    can see the type and the name of each entity from their
    declarations.  But why are they needed?  If the comments
    themselves adequately explain purpose, then that is sufficient.

    Comments should provide information that is not obvious from the
    code.  A common mistake is for a comment to duplicate information
    in the variable or function name.  For example, a variable
    "pageFaultCount" might have the comment "count of page faults";
    that's not much help!  (A sketch contrasting such a comment with
    a purpose-describing one appears at the end of this subsection.)

---- RATIONALE ----

Why did you choose this design?  In what ways is it superior to
another design you considered?

  - Deduct points for failing to compare their design to another
    *correct* possibility.
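As promised above, here is a minimal sketch of the difference between
a name-restating comment and a purpose-describing one.  The variable
and its stated purpose are invented for illustration; they are not
taken from Pintos or from any submission.

    /* Hypothetical declaration, shown twice with different comments. */

    /* Comment that merely restates the name -- tells the grader nothing: */
    static unsigned page_fault_cnt;   /* Count of page faults. */

    /* Comment that states the purpose -- this is what to look for: */
    static unsigned page_fault_cnt;   /* Page faults taken by the current
                                         process; consulted by the (here
                                         hypothetical) eviction policy to
                                         choose a victim frame. */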
Grading Code
------------

You should start by quickly scanning all the submitted code by eye.
Usually the easiest way to do this is with a command like

        diff -urpbN -X /usr/class/cs140/submissions/diff.ignore \
            /usr/class/cs140/pintos/pintos/src pintos/src | less

in a group's top-level directory.  The options to "diff" here are
useful:

  -u: produces a "unified" diff that is easier to read than the
      default format.

  -r: recurses on directories.

  -p: prints the name of the function in which each difference lies.

  -b: ignores differences in white space.

  -N: includes added files in the diff.

  -X .../diff.ignore: ignores files that match patterns listed in
      diff.ignore, which lists files that you don't really want to
      look at.  You can add to the list when you notice other files
      that should be ignored.

You can page through the "diff" output fairly quickly, perhaps a few
seconds per page.  Nevertheless, you should be able to notice some
important flaws:

  - Inconsistent style: indentation changing randomly between 4
    spaces and 8 spaces per level, between BSD and GNU brace
    placement, and so on.  (The Introduction states, "In new source
    files, adopt the existing Pintos style by preference, but make
    your code self-consistent at the very least.  There should not be
    a patchwork of different styles that makes it obvious that three
    different people wrote the code.")

  - Bad style: such as no indentation at all, or cramming many
    statements onto a single line.

  - Many very long source code lines (over 100 columns wide).

  - Lack of white space: a consistent lack of spaces after commas or
    around binary operators that makes code difficult to read.

  - Use of static or file-scope ("global") variables instead of
    automatic, block-scope ("local") variables: one student
    submission actually declared 12 (!) different global variables
    "just so we don't have to make a new var in each function".  This
    is unacceptable.

  - Use of struct thread members instead of automatic, block-scope
    ("local") variables: sometimes it's not obvious whether this is
    the case, but subtract points when it is.

  - Code copied into multiple places that should be abstracted into a
    function.

  - Gratuitous use of dynamic allocation: e.g. a struct that contains
    a pointer to a semaphore instead of a semaphore itself (a sketch
    of this case follows the list).
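To make the last point concrete, here is a minimal sketch.  The
structs are invented for illustration (they are not from Pintos or
from any submission), and the two versions use different tags only so
that they can appear side by side in a compilable example.

    #include "threads/synch.h"

    /* Gratuitous dynamic allocation: every user of this struct must
       allocate the semaphore separately, check for allocation failure,
       and remember to free it later. */
    struct child_wait_ptr
      {
        struct semaphore *done;   /* Must be allocated and freed separately. */
      };

    /* Preferred: embed the semaphore so that it lives and dies with the
       struct; a single sema_init() call sets it up, and nothing extra
       needs to be freed. */
    struct child_wait_embedded
      {
        struct semaphore done;    /* Initialized in place with sema_init(). */
      };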