PROJECT 2 GRADING ADVICE
========================

This file contains advice for grading project 2.  General advice for
Pintos grading is in README, which you should read first.

ARGUMENT PASSING
================

A1:

Most correct solutions don't introduce any new data structures for
this part.  If any global variables are introduced, be suspicious
about the possibility of race conditions.

A2:

The "Program Startup Details" section in the User Programs chapter of
the Pintos manual describes what needs to be done.

There are two main ways to put argv[] in the correct order:

- Many solutions use strtok_r() to divide the arguments into
  space-separated strings, storing a pointer to each argument in a
  temporary storage area, then push each pointer on the stack in
  reverse order.

- Another approach is to push the arguments obtained by strtok_r() on
  the user stack directly, then reverse their order when finished.
  This is what the sample solution does.

Students must specify how they avoid overflow:

- One acceptable approach is to check for overflow just before pushing
  each item on the stack.

- Another acceptable approach is to make two passes.  In the first
  pass, calculate how many bytes of stack space are required.  Abort
  at this point if the stack would overflow.  Then, in the second
  pass, actually set up the stack.

- A third acceptable approach is to allocate additional stack pages as
  the currently allocated page or pages are exhausted.  If this is
  done, verify that the case where no more memory is available is
  correctly handled (by returning an error and ensuring that the pages
  already allocated are freed).  Few groups choose this approach.

- Some students assume that any command line under 4,096 bytes long
  will fit.  This is wrong: some space is required for the pointers in
  argv[], as well as a little extra for other data that goes on the
  stack and some space for the user process's use as well.
- Generalizing the previous approach, some students set some other
  fixed limit on command-line length in bytes.  Usually they don't
  properly consider the consequences.  Every two bytes in a command
  line can require four bytes for a pointer in argv[], as in a command
  line that consists of alternating space and non-space characters.
  Thus, N bytes of command-line arguments can require N + 4(N / 2) =
  3N bytes of stack space.  Setting 3N = 4,096 we obtain N = 1,365.
  Thus, any solution that sets a fixed cap on command-line length
  greater than 1,365 is incorrect.  Preferably, any fixed limit
  should be significantly less than that.

- Some students believe that there is a 128-byte limit on the length
  of a command line.  This is wrong: the 128-byte limit only applies
  to the "pintos" utility's command line.  It does not apply to the
  command line passed to the "exec" system call.

- Some groups limit the number of command line arguments to a fixed
  value.  There's nothing wrong with that as long as it's sufficiently
  large, e.g. 512.

One common fallacy is an assumption that each string must be
word-aligned for performance.  This is not true.  It's okay if they
align strings, because it's harmless, but stating that it improves
performance should incur a small deduction.

A3:

strtok() uses a static object to hold the starting position for the
next token.  Thus, it isn't safe for concurrent use in different
threads (the problem here) or even for nested use within a single
thread.  strtok_r() solves the problem by requiring the caller to
provide storage for the starting position for the next token.

Wrong answers:

- Some students believe that strtok_r() does not modify the string it
  tokenizes whereas strtok() does.  This is wrong: both functions
  modify the string that they tokenize.

- Some students think strtok() uses a "static buffer" for parsing.
  This is wrong: it modifies the string provided and stores a pointer
  into it statically.
However, the manpage for strtok on GNU/Linux (at least) says that
strtok uses a static buffer.  That's wrong.  So don't penalize
students who repeat that claim.  Instead, please add a note, such as:

    Although some manpages for strtok do claim that strtok uses a
    static buffer, this is incorrect.  strtok uses a user-provided
    buffer and statically stores a pointer into that buffer.

A4:

Some correct answers:

- Increased flexibility: a variety of shells can accept varying
  syntaxes without any need to change the kernel.

- Simplifies the kernel and improves its security: a simpler kernel is
  a kernel less likely to have important bugs or security holes.  This
  is especially true given the complicated parsing rules of real-world
  shells.

Some wrong answers:

- Saves time and disk space: doing parsing in user space or kernel
  space doesn't have a strong time or space trade-off, but some
  students think so.

- Relative paths: some students think that relative paths are
  implemented by the shell, but this is wrong.

- Fewer context switches: where parsing is done has no effect on the
  number of context switches required to start a process.  (It might
  be possible to justify this claim by giving specific details.)

- Policy enforcement: some students think that the shell is in charge
  of enforcing system security policies, but this is not normally
  true.

Check the submitted code, by hand, for the following:

1. That the code used for argument passing is readable.  This is
   probably near the end of userprog/process.c, in or near
   setup_stack().  In particular, many stack "push" operations are
   required, so the code to do this should either be simple
   (e.g. "*--p = something;") or, if it is more complicated,
   abstracted into a function or set of functions.

SYSTEM CALLS
============

B1:

This question has a pretty big answer, because system calls need quite
a few data structures.  The guidelines for B1 are therefore broken
down into B1(i) through B1(iv), below, to make them easier to follow.
B1(i):

A global lock or semaphore for file system synchronization is needed,
e.g.

    /* Serializes file system operations. */
    static struct lock fs_lock;

B1(ii):

Some mechanism for tracking file descriptors is needed.  There are
three common variants (see B2 for more on this topic):

- Per-thread list: adds a list to struct thread, like so:

      struct list fds;            /* List of file descriptors. */

  Each member of the list includes at least a file and a fd number,
  e.g.:

      /* A file descriptor, for binding a file handle to a file. */
      struct file_descriptor
        {
          struct list_elem elem;  /* List element. */
          struct file *file;      /* File. */
          int handle;             /* File handle. */
        };

  This case also requires a way to come up with unique file
  descriptors.  The simplest and best way is to add a counter to
  struct thread, e.g.:

      int next_handle;            /* Next handle value. */

  Another alternative is to use a global counter, but this requires
  adding a lock as well.  (It's possible to use the global file system
  lock or some other lock for this, but I haven't seen that done in
  practice.)  Deduct points if synchronization is omitted:

      static int next_fd = 2;     /* Next file descriptor. */
      static struct lock fd_lock; /* Protects next_fd. */

- Per-thread array: adds an array of 128 elements to struct thread
  (128 because the assignment states that as a minimum number):

      struct file *fds[128];

  Some solutions include some extra members to speed up finding an
  empty slot.

- Global hash table: adds a global hash table that maps from a
  descriptor number to a "struct file *".  Check that there is also a
  new hash_elem in struct thread in this case.  Some way to generate
  unique file descriptors is needed, usually a global counter.  A lock
  or other synchronization is also required.
  The global data for this case looks something like this:

      static struct hash fd_hash;
      static int fd_next = 2;
      static struct lock fd_lock;

  The hash table elements look something like this:

      struct file_descriptor
        {
          struct hash_elem hash_elem;
          struct file *file;
          int handle;
          pid_t owner;
        };

  Notice the inclusion of the "owner" member, which ensures that a fd
  can only be used by the process that opened it.  If there's no
  mechanism to ensure this, deduct points.

B1(iii):

Data structures for implementing the "wait" system call are needed
(see B5 and B8 for more on this topic).

Fundamentally, "wait" requires communication between a process and its
children, usually implemented through shared data.  The shared data
might be added to struct thread, but many solutions separate it into a
separate structure.  At least the following must be shared between a
parent and each of its children:

- Child's exit status, so that "wait" can return it.

- Child's thread id, for "wait" to compare against its argument.

- A way for the parent to block until the child dies (usually a
  semaphore).

- A way for the parent and child to tell whether the other is already
  dead, in a race-free fashion (to ensure that their shared data can
  be freed).

The sample solution uses a new "struct wait_status" for this purpose:

    /* Tracks the completion of a process.
       Reference held by both the parent, in its `children' list,
       and by the child, in its `wait_status' pointer. */
    struct wait_status
      {
        struct list_elem elem;  /* `children' list element. */
        struct lock lock;       /* Protects ref_cnt. */
        int ref_cnt;            /* 2=child and parent both alive,
                                   1=either child or parent alive,
                                   0=child and parent both dead. */
        tid_t tid;              /* Child thread id. */
        int exit_code;          /* Child exit code, if dead. */
        struct semaphore dead;  /* 1=child alive, 0=child dead. */
      };

If the shared data is integrated into struct thread, then one of the
cleanest ways to do it is to use a pair of semaphores, both initially
set to 0, like so (see B5/B8 below for more information on the usage
of these semaphores):

    int exit_code;              /* Process's exit status. */
    struct semaphore exiting;   /* Up'd when process ready to exit. */
    struct semaphore may_exit;  /* Up'd after parent reads exit code. */

Of course, each thread needs a way to find the shared data for each of
its children and, if shared data is not stored in struct thread, its
own shared data as well.  The sample solution adds a pair of members
to struct thread:

    struct wait_status *wait_status; /* This process's completion state. */
    struct list children;            /* Completion status of children. */

If the shared data is integrated into struct thread, then a list of
children and a corresponding list element is needed:

    struct list children;        /* List of child processes. */
    struct list_elem child_elem; /* Used for `children' list. */

Most solutions add an exit code member to struct thread, even if there
is an exit code member in a separate shared data structure.  That's
fine.

B1(iv):

Some data is needed to allow a process that calls "exec" to wait until
the new process completes loading.  The sample solution, and some
student submissions, just add a new structure.  A pointer to the new
structure is passed to the new thread as the "void *" pointer argument
to thread_create().  Here is the sample solution's structure:

    /* Data structure shared between process_execute() in the
       invoking thread and start_process() in the newly invoked
       thread. */
    struct exec_info
      {
        const char *file_name;           /* Program to load. */
        struct semaphore load_done;      /* "Up"ed when loading complete. */
        struct wait_status *wait_status; /* Child process. */
        bool success;                    /* Program successfully loaded? */
      };

Many student submissions instead add members to struct thread to do
the same things, which is fine.
B2:

See B1(ii) above for information on the basic data structures used for
file descriptors.  Part of the answer to this question might be given
in B1.

Globally unique and per-process unique file descriptors are both
acceptable.

If a global counter is used to generate file descriptors, there must
be some mention of synchronization.

If an array totalling 512 bytes or more (e.g. 128 elements of "int" or
pointer type) is added to struct thread, then there must be some
mention that this is a large amount of data to add to struct thread.
This can be mentioned here, or in B1(ii), or in B10 below.  See the
"struct thread" section in the Reference Guide chapter of the Pintos
manual for explanation.

If the fd counter skips the first 3 fd values, with mention that this
is for stdin, stdout, and stderr, then it must also mention that
Pintos does not have stderr.

There seems to be some confusion about locking here among students.
No locking is necessary for per-thread data structures, such as a
thread-specific list of file descriptors.  Locking is required for
data that might be concurrently accessed by multiple threads.

There is no need for submissions to handle overflow of the counter
used for generating fds.

B3:

This question was badly phrased in previous quarters, so there's no
pool of good student answers to compare against.

The answer should say which of the techniques described in "Accessing
User Memory" in the assignment is used.  It should describe a few of
the implementation details, such as interfaces of functions used to
read or write user memory.

A few students believe that writing to the "eax" member of struct
intr_frame is a form of writing to user memory.  This is wrong: struct
intr_frame is in the kernel stack.  Modifying its "eax" member changes
the value of the EAX register seen by the process when a system call
returns, not any kind of user memory.

Occasionally, a group extends the page fault-based method described in
the Pintos manual.
The method described there has a little bit of overhead in that it
requires the EAX register to be set up with the address of a recovery
handler before accessing memory.  This is necessary for the page fault
handler to know what to do.  An alternative without this overhead is
to maintain a table that associates EIP values of potentially faulting
instructions with their recovery handlers.  This technique is used in
the Linux kernel and elsewhere to good effect.

B4:

In the first part of this question, students should address their
implementation.  In the second part, they should address an ideal
implementation.  They need to answer both parts, although the answers
may be very brief as long as they are correct.

An ideal implementation would only inspect as many page table entries
as there are aligned virtual pages spanned by the buffer.  Both 2-byte
and 4,096-byte buffers can span either 1 or 2 pages, so they would
require 1 or 2 checks.

Student implementations often check page table entries more often than
this.  That's acceptable, within reason.  For example, one common
student implementation checks the first byte and last byte of the user
memory block to be accessed, plus one byte spaced every 4,096 bytes
within the block.  That's fine, although it often makes a few more
checks than absolutely necessary.

Implementations that always inspect every byte read or written are not
acceptable.  Take off points.

Bizarrely, some students believe that checking a page table entry
every 4 bytes within a block is correct.  It's not: it will fail, for
example, in the case of a 2-byte block spanning a pair of virtual
pages.  If they check each *aligned* 4-byte region, then it's correct,
but then they're still doing 1,024 checks per page, which is still
unacceptable.

An implementation that uses page faults to detect bad user memory
accesses requires 0 page table inspections to copy any number of
bytes.  An answer from this point of view is acceptable too.
Some students suggest that further improvements could be made by
page-aligning the buffers passed to the kernel.  That's a reasonable
suggestion.

Some very confused students think that each of the bytes in a
4,096-byte user memory region can be on a separate page.  Penalize
them.

B5 and B8:

These questions really have to be graded together, because they are so
closely related.

See B1(iii) above for information on the basic data structures used
for "wait".

A call to "wait" should not require disabling interrupts or calling
thread_block() by hand.  Take off points.

Global locks, semaphores, etc. should not generally be used for
implementing "wait".  Their use forces serialization of process exit,
and they are unnecessary.  They are also, often, an indication of an
incorrect implementation.

Some students indicate that they actually implement the wrong
semantics: if C exits before P, then "wait" by P on C should return
the process's exit code, not -1 as these students say.  Take off
points for this.  (But it's strange that they would think so, seeing
as the tests for "wait" are pretty thorough.)

The "exit" and "wait" system calls should not have to search a global
list of all processes, because it is inefficient and, more
importantly, unnecessary given that it is so easy to keep a per-thread
list of children.  Take off points.

The procedures for "wait" and "exit" vary broadly based on the data
structures used.

If shared data separate from struct thread are used:

  The procedure for "wait" is roughly this:

  - Find the child in the list of shared data structures.  (If none is
    found, return -1.)

  - Wait for the child to die, by downing a semaphore in the shared
    data.

  - Obtain the child's exit code from the shared data.

  - Destroy the shared data structure and remove it from the list.

  - Return the exit code.

  The procedure for "exit" is then:

  - Save the exit code in the shared data.

  - Up the semaphore in the data shared with our parent process (if
    any).
  - In some kind of race-free way (such as using a lock and a
    reference count or pair of boolean variables in the shared data
    area), mark the shared data as unused by us and free it if the
    parent is also dead.

  - Iterate the list of children and, as in the previous step, mark
    them as no longer used by us and free them if the child is also
    dead.

  - Terminate the thread.

  A common variation on the above "wait" procedure is to mark the
  shared data structure as "used" instead of removing it from the list
  and destroying it, then on subsequent "wait"s noticing that it is
  already used, skipping the down of the semaphore, and returning -1.

If the shared data is integrated into struct thread:

  The procedure for "wait" is roughly this:

  - Find the child in the list of child processes.  (If none is found,
    return -1.)

  - Wait for the child to die, by downing a semaphore in the child
    thread struct (this is the "exiting" semaphore in the example in
    B1(iii)).

  - Obtain the child's exit code from its thread struct.

  - Remove the child from the list of children.

  - Give the child permission to finish exiting, by "up"ing its
    may_exit semaphore.  (After this action, the child thread struct
    must no longer be accessed, because the child could wake up at any
    time and exit.  In particular this means that the previous two
    steps *must* come before this one.)

  - Return the exit code.

  The procedure for "exit" is then:

  - Save the exit code in the exiting process's thread struct.

  - Up the "exiting" semaphore, to allow a waiting parent to retrieve
    the exit code.

  - Iterate the list of children and up each child's may_exit
    semaphore, to allow them to exit when they're ready.

  - Down the "may_exit" semaphore, to wait for our parent to retrieve
    our exit code.

  - Terminate the thread.

  A common variation on the above "wait" procedure is to mark a child
  as "used" instead of allowing it to terminate, then on subsequent
  "wait"s noticing that it is already used, skipping the down of the
  semaphore, and returning -1.
  Then the "exit" procedure actually gives each child permission to
  terminate.

The main "trick" in this kind of implementation is that an exiting
process must not terminate until its parent process has terminated or
retrieved its exit status.  (If the exiting process were allowed to
terminate immediately, then its thread struct would be freed and the
exit status could not be retrieved at all.)  If the students do not
explain how they do this, deduct points.

(Students are not required to explain the "exit" procedure as part of
B5, but they usually end up doing so.)

Some groups come up with extraordinarily complicated, confusing
schemes for implementing wait/exit.  If you can't understand their
description, take off points.

You should expect to see an answer to each of the 4 parts in B8.
Here's an example answer that corresponds to the sample solution
(which uses shared data separate from the thread struct):

- If P calls wait(C) before C exits, P will wait for C to die by
  "down"ing the semaphore in C's wait_status struct, which C will "up"
  when it dies.

- If P calls wait(C) after C exits, P will "down" the semaphore
  similarly but no waiting will be required.

- If P terminates without waiting, before C exits, then C will "up"
  the semaphore when it terminates, but P will never "down" it, which
  is fine.

- If P terminates without waiting, after C exits, then the same thing
  happens, except that C "up"s the semaphore before P dies instead of
  after.

Students should also say something about freeing resources and special
cases, e.g.:

- When either C or P terminates, it decrements their shared
  wait_status's reference count (under the wait_status lock).  When
  the reference count reaches 0, the wait_status is no longer
  referenced so it is freed.

- There are no special cases.

B6:

There are three common types of answers:

- Pre-checking: System call handlers verify that all the data they
  need to access can be accessed before allocating any resources.
  Thus, there's never anything to free on error.  (Once project 3
  rolls around, this will no longer be possible, so these students
  haven't done a good job of looking ahead.)

- Manual freeing: If an error occurs, the system call handler manually
  frees everything allocated up to that point.  Often these answers
  focus on "abstraction", making the point that it's easier to free
  resources if the resources are correctly abstracted.

- Resource pools: All the resources allocated by a thread (memory,
  locks, etc.) are put on a list.  If an error occurs, a handler
  iterates through the list and releases all of the resources.  (This
  is a clever solution, but rare.)

Any of these are fine.

B7:

This depends on the data structures defined in B1(iv) above.  The
usual solution is to use a semaphore initialized to 0.  The process
calling "exec" downs it.  The new process "up"s it once the load
success or failure is known.  A shared Boolean variable is used to
indicate success or failure.

Some students use thread_block() instead of a semaphore.  Take off
points.

Some students store the success indicator variable in the child
process.  This requires additional synchronization so that the child
process can't terminate before the parent reads the status.
Typically, an additional semaphore is used to give the process calling
"exec" a chance to read the shared Boolean variable without a race on
the new process's termination.  This is not a good approach, because
it is so easy to do without the additional synchronization by storing
the success indicator in the parent's thread struct or in a local
variable (e.g. as in the sample solution).  Take off points.

B8:

See "B5 and B8", above.

B9 and B10:

Just make sure of a few things here:

- They actually answered "why", not "how" or "what".

- Their answers do not contain any factual inaccuracies (e.g. do not
  claim that Pintos processes inherit file descriptors on exec, which
  contradicts what the assignment says).

- Their answers seem "reasonable".
B11:

A typical answer is just "we didn't change anything".  Otherwise, see
"B9 and B10" above.

Take off points if they come up with some bizarre or stupid scheme and
don't properly justify it, e.g. one group decided that pid_t for a
process should be the sum of its tid_t and its parent's tid_t.

Take off points if an answer assumes that there can be more than one
thread per process and doesn't mention that this would be an extension
to the Pintos design.

Check the submitted code, by hand, for the following:

1. That the system call handler, which is probably in
   userprog/syscall.c in syscall_handler(), is reasonably abstracted.
   It should not all be in one function as one huge switch statement,
   for example.  The code that accesses user memory should be
   separated into specialized functions.  If either of these is not
   the case, or if the code is otherwise difficult to read, take off
   points.

2. The "Source Code" section in the Introduction to the Pintos manual
   says, "Update existing comments as you modify code."  Most students
   ignore this.  Check whether the comment on process_wait() in
   process.c has been updated.  At a minimum, the two final sentences
   ("This function will be implemented in problem 2-2.  For now, it
   does nothing.") should have been deleted.

3. Find the implementation of the "open" system call.  Verify that all
   resources are released in error cases and that errors are properly
   handled.  One particular problem can be malloc() failure after a
   file is successfully opened--is the return value checked?  If so,
   is the file closed before returning an error?  (Failure to check
   malloc() is in the OVERALL section of the grade report, under
   DESIGN.)