"file system" not "filesystem"

diff --git a/doc/userprog.texi b/doc/userprog.texi

index 8ac03dacfdda7ddd3831437460b833d925319465..db1ad33bbf3e458ea2657a3b03e3f0eab270790d 100644
--- a/doc/userprog.texi
+++ b/doc/userprog.texi
@@ -14,8 +14,10 @@ assignment.  However, you will also be interacting with almost every
  other part of the code for this assignment. We will describe the
  relevant parts below. If you are confident in your HW1 code, you can
  build on top of it.  However, if you wish you can start with a fresh
-copy of the code and re-implement @code{thread_join()}, which is the
-only part of project #1 required for this assignment.
+copy of the code and re-implement @func{thread_join}, which is the
+only part of project #1 required for this assignment.  Your submission
+should define @code{THREAD_JOIN_IMPLEMENTED} in @file{constants.h}
+(@pxref{Conditional Compilation}).
  
  Up to now, all of the code you have written for Pintos has been part
  of the operating system kernel.  This means, for example, that all the
@@ -32,13 +34,12 @@ this illusion.
  
  Before we delve into the details of the new code that you'll be
  working with, you should probably undo the test cases from project 1.
-All you need to do is make sure the original @file{threads/test.c} is
-in place.  This will stop the tests from being run.
  
  @menu
  * Project 2 Code::              
  * Using the File System::       
  * How User Programs Work::      
+* Virtual Memory Layout::       
  * Global Requirements::         
  * Problem 2-1 Argument Passing::  
  * Problem 2-2 System Calls::    
@@ -56,22 +57,16 @@ doing is to simply go over each part you'll be working with.  In
  where the bulk of your work will be:
  
  @table @file
-@item addrspace.c
-@itemx addrspace.h
-An address space keeps track of all the data necessary to execute a
-user program.  Address space data is stored in @code{struct thread},
-but manipulated only by @file{addrspace.c}.  Address spaces need to
-keep track of things like paging information for the process (so that
-it knows which memory the process is using).  Address spaces also
-handle loading the program into memory and starting up the process's
-execution.
+@item process.c
+@itemx process.h
+Loads ELF binaries and starts processes.
  
  @item pagedir.c
  @itemx pagedir.h
  A simple manager for 80@var{x}86 page directories and page tables.
  Although you probably won't want to modify this code for this project,
  you may want to call some of its functions.  In particular,
-@code{pagedir_get_page()} may be helpful for accessing user memory.
+@func{pagedir_get_page} may be helpful for accessing user memory.
  
  @item syscall.c
  @itemx syscall.h
@@ -90,7 +85,7 @@ distinction between them, although the Intel processor manuals define
  them slightly differently on 80@var{x}86.}  These files handle
  exceptions.  Currently all exceptions simply print a message and
  terminate the process.  Some, but not all, solutions to project 2
-require modifying @code{page_fault()} in this file.
+require modifying @func{page_fault} in this file.
  
  @item gdt.c
  @itemx gdt.h
@@ -106,7 +101,7 @@ The Task-State Segment (TSS) is used for 80@var{x}86 architectural
  task switching.  Pintos uses the TSS only for switching stacks when a
  user process enters an interrupt handler, as does Linux.  @strong{You
  should not need to modify these files for any of the projects.}
-However, you can read the code if you're interested in how the GDT
+However, you can read the code if you're interested in how the TSS
  works.
  @end table
  
@@ -130,11 +125,52 @@ interfaces to understand how to use the file system, and especially
  its many limitations.  @strong{You should not modify the file system
  code for this project}.  Proper use of the file system routines now
  will make life much easier for project 4, when you improve the file
-system implementation.
+system implementation.  Until then, you will have to put up with the
+following limitations:
+
+@itemize @bullet
+@item
+No synchronization.  Concurrent accesses will interfere with one
+another, so external synchronization is needed.  @xref{Synchronizing
+File Access}, for more details.
+
+@item
+File size is fixed at creation time.  Because the root directory is
+represented as a file, the number of files that may be created is also
+limited.
+
+@item
+File data is allocated as a single extent, that is, data in a single
+file must occupy a contiguous range of sectors on disk.  External
+fragmentation can therefore become a serious problem as a file system is
+used over time.
+
+@item
+No subdirectories.
+
+@item
+File names are limited to 14 characters.
+
+@item
+A system crash mid-operation may corrupt the disk in a way
+that cannot be repaired automatically.  No @command{fsck} tool is
+provided in any case.
+@end itemize
+
+However, one important feature is included:
+
+@itemize @bullet
+@item
+Unix-like semantics for @func{filesys_remove} are implemented.
+That is, if a file is open when it is removed, its blocks
+are not deallocated and it may still be accessed by the
+threads that have it open until the last one closes it.  @xref{Removing
+an Open File}, for more information.
+@end itemize
  
  You need to be able to create and format simulated disks.  The
  @command{pintos} program provides this functionality with its
-@option{make-disk} command.  From the @file{filesys/build} directory,
+@option{make-disk} command.  From the @file{userprog/build} directory,
  execute @code{pintos make-disk fs.dsk 2}.  This command creates a 2 MB
  simulated disk named @file{fs.dsk}.  (It does not actually start
  Pintos.)  Then format the disk by passing the @option{-f} option to
@@ -156,6 +192,19 @@ you're very curious, you can look at the @command{pintos} program as
  well as @file{filesys/fsutil.c} to learn the implementation details,
  but it's really not relevant for this project.
  
+Here's a summary of how you would create and format a disk, copy the
+@command{echo} program into the new disk, and then run @command{echo}.
+It assumes that you've already built the tests in
+@file{tests/userprog} and that the current directory is
+@file{userprog/build}:
+
+@example
+pintos make-disk fs.dsk 2
+pintos run -f
+pintos put ../../tests/userprog/echo echo
+pintos run -ex echo
+@end example
+
  You can delete a file from the Pintos file system using the @option{-r
  @var{file}} kernel option, e.g.@: @code{pintos run -r @var{file}}.
  Also, @option{-ls} lists the files in the file system and @option{-p
@@ -166,31 +215,30 @@ Also, @option{-ls} lists the files in the file system and @option{-p
  
  Pintos can run normal C programs.  In fact, it can run any program you
  want, provided it's compiled into the proper file format, and uses
-only the system calls you implement.  (For example, @code{malloc()}
+only the system calls you implement.  (For example, @func{malloc}
  makes use of functionality that isn't provided by any of the syscalls
  we require you to support.)  The only other limitation is that Pintos
  can't run programs using floating point operations, since it doesn't
  include the necessary kernel functionality to save and restore the
  processor's floating-point unit when switching threads.  You can look
-in @file{test} directory for some examples.
+in the @file{tests/userprog} directory for some examples.
  
  Pintos loads ELF executables, where ELF is an executable format used
  by Linux, Solaris, and many other Unix and Unix-like systems.
  Therefore, you can use any compiler and linker that produce
  80@var{x}86 ELF executables to produce programs for Pintos.  We
-recommend using the tools we provide in the @file{tests} directory.  By
-default, the @file{Makefile} in this directory will compile the test
-programs we provide.  You can edit the @file{Makefile} to compile your
-own test programs as well.
-
-One thing you should realize immediately is that, until you use the
-above operation to copy a test program to the emulated disk, Pintos
-will be unable to do very much useful work.  You will also find that
-you won't be able to do interesting things until you copy a variety of
-programs to the disk.  A useful technique is to create a clean
-reference disk and copy that over whenever you trash your
-@file{fs.dsk} beyond a useful state, which may happen occasionally
-while debugging.
+recommend using the tools we provide in the @file{tests/userprog}
+directory.  By default, the @file{Makefile} in this directory will
+compile the test programs we provide.  You can edit the
+@file{Makefile} to compile your own test programs as well.
+
+One thing you should realize immediately is that, until you copy a
+test program to the emulated disk, Pintos will be unable to do very
+much useful work.  You will also find that you won't be able to do
+interesting things until you copy a variety of programs to the disk.
+A useful technique is to create a clean reference disk and copy that
+over whenever you trash your @file{fs.dsk} beyond a useful state,
+which may happen occasionally while debugging.
  
  @node Virtual Memory Layout
  @section Virtual Memory Layout
@@ -206,8 +254,9 @@ User virtual memory is per-process.  Conceptually, each process is
  free to use the entire space of user virtual memory however it
  chooses.  When the kernel switches from one process to another, it
  also switches user virtual address spaces by switching the processor's
-page directory base register (see @code{pagedir_activate() in
-@file{userprog/pagedir.c}}.
+page directory base register (see @func{pagedir_activate} in
+@file{userprog/pagedir.c}).  @struct{thread} contains a pointer to a
+process's page directory.
  
  Kernel virtual memory is global.  It is always mapped the same way,
  regardless of what user process or kernel thread is running.  In
@@ -219,42 +268,104 @@ physical memory.
  
  User programs can only access user virtual memory.  An attempt to
  access kernel virtual memory will cause a page fault, handled by
-@code{page_fault()} in @file{userprog/exception.c}, and the process
+@func{page_fault} in @file{userprog/exception.c}, and the process
  will be terminated.  Kernel threads can access both kernel virtual
  memory and, if a user process is running, the user virtual memory of
-the running process.  However, an attempt to access memory at a user
-virtual address that doesn't have a page mapped into it will also
-cause a page fault.
+the running process.  However, even in the kernel, an attempt to
+access memory at a user virtual address that doesn't have a page
+mapped into it will cause a page fault.
+
+You must handle memory fragmentation gracefully, that is, a process
+that needs @var{N} pages of memory must not require that all @var{N}
+be contiguous.  In fact, it must not require that any of the pages be
+contiguous.
  
  @node Global Requirements
  @section Global Requirements
  
-For testing and grading purposes, we have some simple requirements for
-your output.  The kernel should print out the program's name and exit
-status whenever a process exits, e.g.@: @code{shell: exit(-1)}.  Aside
-from this, it should print out no other messages.  You may understand
-all those debug messages, but we won't, and it just clutters our
-ability to see the stuff we care about.
+For testing and grading purposes, we have some simple overall
+requirements:
+
+@itemize @bullet
+@item
+The kernel should print out the program's name and exit status whenever
+a process terminates, whether termination is caused by the @code{exit}
+system call or for another reason.
+
+@itemize @minus
+@item
+The message must be formatted exactly as if it were printed with
+@code{printf ("%s: exit(%d)\n", @dots{});} given appropriate arguments.
+
+@item
+The name printed should be the full name passed to
+@func{process_execute}, except that it is acceptable to truncate it to
+15 characters to allow for the limited space in @struct{thread}.  The
+name printed need not include arguments.
+
+@item
+Do not print a message when a kernel thread that is not a process
+terminates.
+
+@item
+Do not print messages about process termination for the @code{halt}
+system call.
+
+@item
+No message need be printed when a process fails to load.
+@end itemize
+
+@item
+Aside from this, the kernel should print out no other messages that
+Pintos as provided doesn't already print.  You
+may understand all those debug messages, but we won't, and it just
+clutters our ability to see the stuff we care about.
  
+@item
  Additionally, while it may be useful to hard-code which process will
  run at startup while debugging, before you submit your code you must
  make sure that it takes the start-up process name and arguments from
  the @samp{-ex} argument.  For example, running @code{pintos run -ex
  "testprogram 1 2 3 4"} will spawn @samp{testprogram 1 2 3 4} as the
  first process.
+@end itemize
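
As an illustration of the message format required above, here is a minimal sketch (not part of Pintos) that builds the termination line in a buffer.  The helper name @code{format_exit_message} and the constant @code{NAME_MAX_LEN} are hypothetical, and the sketch uses hosted C's @func{snprintf} so that it can be tried outside the kernel; inside the kernel you would more likely call @func{printf} directly.  It drops the arguments and truncates the name to 15 characters, both of which the requirements above permit.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical limit: the longest program name we keep. */
#define NAME_MAX_LEN 15

/* Writes the termination message for a process into BUF.
   CMD_LINE is the string originally passed to process_execute();
   only its first word is used, truncated to NAME_MAX_LEN characters.
   Returns the number of characters snprintf() would write,
   excluding the terminating null. */
int
format_exit_message (char *buf, size_t buf_size,
                     const char *cmd_line, int status)
{
  size_t name_len;

  /* Skip leading spaces, then measure the first word. */
  cmd_line += strspn (cmd_line, " ");
  name_len = strcspn (cmd_line, " ");
  if (name_len > NAME_MAX_LEN)
    name_len = NAME_MAX_LEN;

  /* "%.*s" prints at most NAME_LEN characters of CMD_LINE. */
  return snprintf (buf, buf_size, "%.*s: exit(%d)\n",
                   (int) name_len, cmd_line, status);
}
```

For example, passing the command line @samp{testprogram 1 2 3 4} with status 0 produces the line @samp{testprogram: exit(0)}.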
  
  @node Problem 2-1 Argument Passing
  @section Problem 2-1: Argument Passing
  
-Currently, @code{thread_execute()} does not support passing arguments
+Currently, @func{process_execute} does not support passing arguments
  to new processes.  UNIX and other operating systems do allow passing
  command line arguments to a program, which accesses them via the argc,
  argv arguments to main.  You must implement this functionality by
-extending @code{thread_execute()} so that instead of simply taking a
-program file name, it can take a program name with arguments as a
-single string.  That is, @code{thread_execute("grep foo *.c")} should
-be a legal call.  @xref{80x86 Calling Convention}, for information on
-exactly how this works.
+extending @func{process_execute} so that instead of simply taking a
+program file name as its argument, it divides the string into words at spaces.
+The first word is the program name, the second word is the first
+argument, and so on.  That is, @code{process_execute("grep foo bar")}
+should run @command{grep} passing two arguments @code{foo} and
+@code{bar}.  A few details:
+
+@itemize
+@item
+Multiple spaces are considered the same as a single space, so that
+@code{process_execute("grep @ foo @ @ bar")} would be equivalent to our
+original example.
+
+@item
+You can impose a reasonable limit on the length of the command line
+arguments.  For example, you could limit the arguments to those that
+will fit in a single page (4 kB).
+
+@item
+You can parse the argument strings any way you like.  If you're lost,
+look at @func{strtok_r}, prototyped in @file{lib/string.h} and
+implemented with thorough comments in @file{lib/string.c}.  You can
+find more about it by looking at the man page (run @code{man strtok_r}
+at the prompt).
+
+@item
+@xref{80x86 Calling Convention}, for information on exactly how you
+need to set up the stack.
+@end itemize
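
The word splitting described above can be sketched with @func{strtok_r}.  This is only one possible approach, written as ordinary hosted C so that it can be tried outside Pintos; the function name @code{split_command_line} and the limit @code{MAX_ARGS} are hypothetical, and the sketch does not show how to copy the words onto the user stack.

```c
#define _POSIX_C_SOURCE 200809L  /* for strtok_r() on a hosted system */
#include <stddef.h>
#include <string.h>

/* Hypothetical upper bound on the number of words we accept. */
#define MAX_ARGS 64

/* Splits CMD_LINE in place into words separated by one or more
   spaces.  Stores a pointer to each word in ARGV, followed by a
   null pointer, and returns the number of words (argc).  At most
   MAX_ARGV - 1 words are stored; any further words are ignored.
   CMD_LINE must be writable, because strtok_r() overwrites each
   separator with a null byte. */
int
split_command_line (char *cmd_line, char *argv[], int max_argv)
{
  int argc = 0;
  char *save_ptr;
  char *token;

  for (token = strtok_r (cmd_line, " ", &save_ptr);
       token != NULL && argc < max_argv - 1;
       token = strtok_r (NULL, " ", &save_ptr))
    argv[argc++] = token;

  argv[argc] = NULL;
  return argc;
}
```

Because @func{strtok_r} treats a run of delimiters as a single separator, extra spaces between words need no special handling.  Note that the function modifies its input, so it should be given a private copy of the command line rather than the caller's string.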
  
  @strong{This functionality is extremely important.}  Almost all our
  test cases rely on being able to pass arguments, so if you don't get
@@ -278,21 +389,25 @@ called by user programs are prototyped in @file{lib/user/syscall.h}:
  @table @code
  @item SYS_halt
  @itemx void halt (void)
-Stops Pintos and prints out performance statistics.  Note that this
-should be seldom used, since then you lose some information about
-possible deadlock situations, etc.
+Stops Pintos by calling @func{power_off} (declared in
+@file{threads/init.h}).  Note that this should be seldom used, since
+then you lose some information about possible deadlock situations,
+etc.
  
  @item SYS_exit
  @itemx void exit (int @var{status})
  Terminates the current user program, returning @var{status} to the
-kernel.  A @var{status} of 0 indicates a successful exit.  Other
-values may be used to indicate user-defined error conditions.
+kernel.  If the process's parent @func{join}s it, this is the status
+that will be returned.  Conventionally, a @var{status} of 0 indicates
+a successful exit.  Other values may be used to indicate user-defined
+conditions (usually errors).
  
  @item SYS_exec
-@itemx pid_t exec (const char *@var{file})
-Run the executable in @var{file} and return the new process's program
-id (pid).  If there is an error loading this program, returns pid -1,
-which otherwise should not be a valid id number.
+@itemx pid_t exec (const char *@var{cmd_line})
+Runs the executable whose name is given in @var{cmd_line}, passing any
+given arguments, and returns the new process's process id (pid).  Must
+return pid -1, which otherwise should not be a valid process id, if
+there is an error loading this program.
  
  @item SYS_join
  @itemx int join (pid_t @var{pid})
@@ -304,37 +419,50 @@ process, the return value is undefined (but kernel operation must not
  be disrupted).
  
  @item SYS_create
-@itemx bool create (const char *@var{file})
-Create a new file called @var{file}.  Returns -1 if failed, 0 if OK.
+@itemx bool create (const char *@var{file}, unsigned @var{initial_size})
+Create a new file called @var{file}, initially @var{initial_size} bytes
+in size.  Returns true if successful, false otherwise.
  
  @item SYS_remove
  @itemx bool remove (const char *@var{file})
-Delete the file called @var{file}.  Returns -1 if failed, 0 if OK.
+Delete the file called @var{file}.  Returns true if successful, false
+otherwise.
  
  @item SYS_open
  @itemx int open (const char *@var{file})
  Open the file called @var{file}.  Returns a nonnegative integer handle
  called a ``file descriptor'' (fd), or -1 if the file could not be
-opened.  File descriptors numbered 0 and 1 are reserved for the
-console.  All open files associated with a process should be closed
+opened.  All open files associated with a process should be closed
  when the process exits or is terminated.
  
+File descriptors numbered 0 and 1 are reserved for the console: fd 0
+is standard input (@code{stdin}), fd 1 is standard output
+(@code{stdout}).  These special file descriptors are valid as system
+call arguments only as explicitly described below.
+
  @item SYS_filesize
  @itemx int filesize (int @var{fd})
-Returns the size, in bytes, of the file open as @var{fd}, or -1 if the
-file is invalid.
+Returns the size, in bytes, of the file open as @var{fd}.
  
  @item SYS_read
  @itemx int read (int @var{fd}, void *@var{buffer}, unsigned @var{size})
  Read @var{size} bytes from the file open as @var{fd} into
-@var{buffer}.  Returns the number of bytes actually read, or -1 if the
-file could not be read.
+@var{buffer}.  Returns the number of bytes actually read (0 at end of
+file), or -1 if the file could not be read (due to a condition other
+than end of file).  Fd 0 reads from the keyboard using
+@func{kbd_getc}.
  
  @item SYS_write
  @itemx int write (int @var{fd}, const void *@var{buffer}, unsigned @var{size})
  Write @var{size} bytes from @var{buffer} to the open file @var{fd}.
  Returns the number of bytes actually written, or -1 if the file could
-not be written.
+not be written.
+
+Fd 1 writes to the console.  Your code to write to the console should
+write all of @var{buffer} in one call to @func{putbuf}, at least as
+long as @var{size} is not bigger than a few hundred bytes.  Otherwise,
+lines of text output by different processes may end up interleaved on
+the console, confusing both human readers and our grading scripts.
  
  @item SYS_seek
  @itemx void seek (int @var{fd}, unsigned @var{position})
@@ -342,6 +470,14 @@ Changes the next byte to be read or written in open file @var{fd} to
  @var{position}, expressed in bytes from the beginning of the file.
  (Thus, a @var{position} of 0 is the file's start.)
  
+A seek past the current end of a file is not an error.  A later read
+obtains 0 bytes, indicating end of file.  A later write extends the
+file, filling any unwritten gap with zeros.  (However, in Pintos files
+have a fixed length until project 4 is complete, so writes past end of
+file will return an error.)  These semantics are implemented in the
+file system and do not require any special effort in system call
+implementation.
+
  @item SYS_tell
  @itemx unsigned tell (int @var{fd})
  Returns the position of the next byte to be read or written in open
@@ -367,43 +503,97 @@ on the user's stack in the user's virtual address space.  We recommend
  writing and testing this code before implementing any other system
  call functionality.
  
+@anchor{Synchronizing File Access}
  You must make sure that system calls are properly synchronized so that
  any number of user processes can make them at once.  In particular, it
-is not safe to call into the filesystem code provided in the
+is not safe to call into the file system code provided in the
  @file{filesys} directory from multiple threads at once.  For now, we
-recommend adding a single lock that controls access to the filesystem
+recommend adding a single lock that controls access to the file system
  code.  You should acquire this lock before calling any functions in
  the @file{filesys} directory, and release it afterward.  Don't forget
-that @file{addrspace_load()} also accesses files.  @strong{For now, we
+that @func{process_execute} also accesses files.  @strong{For now, we
  recommend against modifying code in the @file{filesys} directory.}
  
-We have provided you a function for each system call in
+We have provided you a user-level function for each system call in
  @file{lib/user/syscall.c}.  These provide a way for user processes to
-invoke each system call from a C program.  Each of them calls an
-assembly language routine in @file{lib/user/syscall-stub.S}, which in
-turn invokes the system call interrupt and returns.
+invoke each system call from a C program.  Each uses a little inline
+assembly code to invoke the system call and (if appropriate) returns the
+system call's return value.
  
  When you're done with this part, and forevermore, Pintos should be
  bulletproof.  Nothing that a user program can do should ever cause the
-OS to crash, halt, assert fail, or otherwise stop running.  The sole
-exception is a call to the @code{halt} system call.
+OS to crash, halt, assert fail, or otherwise stop running.  It is
+important to emphasize this point: our tests will try to break your
+system calls in many, many ways.  You need to think of all the corner
+cases and handle them.  The sole way a user program should be able to
+cause the OS to halt is by invoking the @code{halt} system call.
+
+If a system call is passed an invalid argument, acceptable options
+include returning an error value (for those calls that return a
+value), returning an undefined value, or terminating the process.
  
  @xref{System Calls}, for more information on how syscalls work.
  
  @node User Programs FAQ
  @section FAQ
  
-@enumerate 1
-@item General FAQs
-
  @enumerate 1
  @item
  @b{Do we need a working project 1 to implement project 2?}
  
-You may find the code for @code{thread_join()} to be useful in
+You may find the code for @func{thread_join} to be useful in
  implementing the join syscall, but besides that, you can use
  the original code provided for project 1.
  
+@item
+@b{@samp{pintos put} always panics.}
+
+Here are the most common causes:
+
+@itemize @bullet
+@item
+The disk hasn't yet been formatted (with @samp{pintos run -f}).
+
+@item
+The file name specified is too long.  The file system limits file names
+to 14 characters.  If you're using a command like @samp{pintos put
+../../tests/userprog/echo}, that overflows the limit.  Use
+@samp{pintos put ../../tests/userprog/echo echo} to put the file under
+the name @file{echo} instead.
+
+@item
+The file system is full.
+
+@item
+The file system already contains 10 files.  (There's a 10-file limit for
+the base Pintos file system.)
+
+@item
+The file system is so fragmented that there's not enough contiguous
+space for your file.
+@end itemize
+
+@item
+@b{All my user programs die with page faults.}
+
+This will generally happen if you haven't implemented problem 2-1
+yet.  The reason is that the basic C library for user programs tries
+to read @var{argc} and @var{argv} off the stack.  Because the stack
+isn't properly set up yet, this causes a page fault.
+
+@item
+@b{I implemented 2-1 and now all my user programs die with
+@samp{system call!}.}
+
+Every reasonable program tries to make at least one system call
+(@func{exit}) and most programs make more than that.  Notably,
+@func{printf} invokes the @code{write} system call.  The default
+system call handler just prints @samp{system call!} and terminates the
+program.  You'll have to implement 2-2 before you see anything more
+interesting.  Until then, you can use @func{hex_dump} to convince
+yourself that 2-1 is implemented correctly (@pxref{Argument Passing to
+main}).
+
  @item
  @b{Is there a way I can disassemble user programs?}
  
@@ -421,11 +611,18 @@ the features that are expected of a real operating system's C library.
  The C library must be built specifically for the operating system (and
  architecture), since it must make system calls for I/O and memory
  allocation.  (Not all functions do, of course, but usually the library
-is compiled as a unit.)  If you wish to port libraries to Pintos, feel
-free.
+is compiled as a unit.)
+
+@item
+@b{Can I use lib@var{foo} in my Pintos programs?}
+
+The chances are good that lib@var{foo} uses parts of the C library
+that Pintos doesn't implement.  It will probably take at least some
+porting effort to make it work under Pintos.  Notably, the Pintos
+userland C library does not have a @func{malloc} implementation.
  
  @item
-@b{How do I compile new user programs? How do I make 'echo' compile?}
+@b{How do I compile new user programs?}
  
  You need to modify @file{tests/Makefile}.
  
@@ -433,8 +630,8 @@ You need to modify @file{tests/Makefile}.
  @b{What's the difference between @code{tid_t} and @code{pid_t}?}
  
  A @code{tid_t} identifies a kernel thread, which may have a user
-process running in it (if created with @code{thread_execute()}) or not
-(if created with @code{thread_create()}).  It is a data type used only
+process running in it (if created with @func{process_execute}) or not
+(if created with @func{thread_create}).  It is a data type used only
  in the kernel.
  
  A @code{pid_t} identifies a user process.  It is used by user
@@ -450,19 +647,64 @@ same process, or you can use a more complex mapping.  It's up to you.
  @b{I can't seem to figure out how to read from and write to user
  memory. What should I do?}
  
-The kernel must treat user memory delicately.  The user can pass a
-null pointer or an invalid pointer (one that doesn't point to any
-memory at all), or a kernel pointer (above @code{PHYS_BASE}).  All of
-these must be rejected without harm to the kernel or other running
-processes.
-
-There are at least two reasonable ways to access user memory.  First,
-you can translate user addresses (below @code{PHYS_BASE}) into kernel
-addresses (above @code{PHYS_BASE}) using the functions in
-@file{pagedir.c}, and then access kernel memory.  Second, you can
-dereference user pointers directly and handle page faults by
-terminating the process.  In either case, you'll need to reject kernel
-pointers as a special case.
+The kernel must treat user memory delicately.  As part of a system
+call, the user can pass to the kernel a null pointer, a pointer to
+unmapped virtual memory, or a pointer to kernel virtual address space
+(above @code{PHYS_BASE}).  All of these types of invalid pointers must
+be rejected without harm to the kernel or other running processes.  At
+your option, the kernel may handle invalid pointers by terminating the
+process or returning from the system call with an error.
+
+There are at least two reasonable ways to do this correctly.  The
+first method is to ``verify then access'':@footnote{These terms are
+made up for this document.  They are not standard terminology.} verify
+the validity of a user-provided pointer, then dereference it.  If you
+choose this route, you'll want to look at the functions in
+@file{userprog/pagedir.c} and in @file{threads/mmu.h}.  This is the
+simplest way to handle user memory access.
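
To make ``verify then access'' concrete, here is a rough sketch of the checks involved, written as stand-alone C rather than kernel code.  Everything named here is an assumption made for illustration: @code{PHYS_BASE_ADDR} stands in for @code{PHYS_BASE} (taken to be @t{0xc0000000}), @code{PAGE_SIZE} is taken to be 4 kB, and the @code{page_mapped_fn} callback stands in for a real page table lookup such as one built on @func{pagedir_get_page}.  The demo callback exists only so that the sketch can be exercised.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PHYS_BASE_ADDR 0xc0000000u  /* assumed start of kernel memory */
#define PAGE_SIZE 4096u             /* assumed page size */

/* Hypothetical stand-in for a page table lookup: returns true if
   the page starting at PAGE_ADDR is mapped for the process. */
typedef bool page_mapped_fn (uint32_t page_addr);

/* Returns true only if every byte of the SIZE-byte range starting
   at user address UADDR is non-null, below PHYS_BASE_ADDR, and on
   a mapped page. */
bool
user_range_is_valid (uint32_t uaddr, uint32_t size,
                     page_mapped_fn *is_mapped)
{
  uint32_t page, last;

  if (uaddr == 0 || uaddr >= PHYS_BASE_ADDR)
    return false;
  if (size == 0)
    return true;
  /* Reject ranges that would extend into kernel space.  Written
     this way so the arithmetic cannot wrap around. */
  if (size > PHYS_BASE_ADDR - uaddr)
    return false;

  /* Check each page that the range touches. */
  page = uaddr & ~(PAGE_SIZE - 1);
  last = (uaddr + size - 1) & ~(PAGE_SIZE - 1);
  for (;;)
    {
      if (!is_mapped (page))
        return false;
      if (page == last)
        break;
      page += PAGE_SIZE;
    }
  return true;
}

/* Demo callback: pretends exactly one page is mapped. */
bool
demo_is_mapped (uint32_t page_addr)
{
  return page_addr == 0x08048000u;
}
```

A buffer that starts on a mapped page can still run onto an unmapped one, which is why the sketch checks every page in the range and not just the first address.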
+
+The second method is to ``assume and react'': directly dereference
+user pointers, after checking that they point below @code{PHYS_BASE}.
+Invalid user pointers will then cause a ``page fault'' that you can
+handle by modifying the code for @func{page_fault} in
+@file{userprog/exception.c}.  This technique is normally faster
+because it takes advantage of the processor's MMU, so it tends to be
+used in real kernels (including Linux).
+
+In either case, you need to make sure not to ``leak'' resources.  For
+example, suppose that your system call has acquired a lock or
+allocated a page of memory.  If you encounter an invalid user pointer
+afterward, you must still be sure to release the lock or free the page
+of memory.  If you choose to ``verify then access,'' then this should
+be straightforward, but for ``assume and react'' it's more difficult,
+because there's no way to return an error code from a memory access.
+Therefore, for those who want to try the latter technique, we'll
+provide a little bit of helpful code:
+
+@verbatim
+/* Tries to copy a byte from user address USRC to kernel address DST.
+   Returns true if successful, false if USRC is invalid. */
+static inline bool get_user (uint8_t *dst, const uint8_t *usrc) {
+  int eax;
+  asm ("mov %%eax, offset 1f; mov %%al, %2; mov %0, %%al; 1:"
+       : "=m" (*dst), "=&a" (eax) : "m" (*usrc));
+  return eax != 0;
+}
+
+/* Tries to write BYTE to user address UDST.
+   Returns true if successful, false if UDST is invalid. */
+static inline bool put_user (uint8_t *udst, uint8_t byte) {
+  int eax;
+  asm ("mov %%eax, offset 1f; mov %0, %b2; 1:"
+       : "=m" (*udst), "=&a" (eax) : "r" (byte));
+  return eax != 0;
+}
+@end verbatim
+
+Each of these functions assumes that the user address has already been
+verified to be below @code{PHYS_BASE}.  They also assume that you've
+modified @func{page_fault} so that a page fault in the kernel causes
+@code{eax} to be set to 0 and its former value copied into @code{eip}.
  
  @item
  @b{I'm also confused about reading from and writing to the stack. Can
@@ -485,32 +727,23 @@ the location.
  @item
  Each character is 1 byte.
  @end itemize
-@end enumerate
-
-@item Argument Passing FAQs
  
-@enumerate 1
  @item
-@b{What will be the format of command line arguments?}
-
-You should assume that command line arguments are delimited by white
-space.
+@b{Why doesn't keyboard input work with @samp{pintos -v}?}
  
-@item
-@b{What is the maximum length of the command line arguments?}
-
-You can impose some reasonable maximum as long as you're prepared to
-defend it in your @file{DESIGNDOC}.
+Serial input isn't implemented.  Don't use @samp{pintos -v} if you
+want to use the shell or otherwise provide keyboard input.
+@end enumerate
  
-@item
-@b{How do I parse all these argument strings?}
+@menu
+* Problem 2-1 Argument Passing FAQ::  
+* Problem 2-2 System Calls FAQ::  
+@end menu
  
-You're welcome to use any technique you please, as long as it works.
-If you're lost, look at @code{strtok_r()}, prototyped in
-@file{lib/string.h} and implemented with thorough comments in
-@file{lib/string.c}.  You can find more about it by looking at the man
-page (run @code{man strtok_r} at the prompt).
+@node Problem 2-1 Argument Passing FAQ
+@subsection Problem 2-1: Argument Passing FAQ
  
+@enumerate 1
  @item
  @b{Why is the top of the stack at @t{0xc0000000}?  Isn't that off the
  top of user virtual memory?  Shouldn't it be @t{0xbfffffff}?}
@@ -530,17 +763,12 @@ any multiple of @t{0x10000000} from @t{0x80000000} to @t{0xc0000000},
  simply via recompilation.
  @end enumerate
  
-@item System Calls FAQs
+@node Problem 2-2 System Calls FAQ
+@subsection Problem 2-2: System Calls FAQ
  
  @enumerate 1
  @item
-@b{What should I do with the parameter passed to @code{exit()}?}
-
-This value, the exit status of the process, must be returned to the
-thread's parent when @code{join()} is called.
-
-@item
-@b{Can I just cast a pointer to a @code{struct file} object to get a
+@b{Can I just cast a pointer to a @struct{file} object to get a
  unique file descriptor?  Can I just cast a @code{struct thread *} to a
  @code{pid_t}?  It's so much simpler that way!}
  
@@ -558,6 +786,7 @@ maximum.  That said, if your design calls for it, you may impose a
  limit of 128 open files per process (as the Solaris machines here do).
  
  @item
+@anchor{Removing an Open File}
  @b{What happens when two (or more) processes have a file open and one of
  them removes it?}
  
@@ -569,28 +798,12 @@ and no other processes will be able to open it, but it will continue
  to exist until all file descriptors referring to the file are closed
  or the machine shuts down.
  
-@item
-@b{What happens if a system call is passed an invalid argument, such
-as Open being called with an invalid filename?}
-
-Pintos should not crash.  Acceptable options include returning an
-error value (for those calls that return a value), returning an
-undefined value, or terminating the process.
-
  @item
  @b{I've discovered that some of my user programs need more than one 4
  kB page of stack space.  What should I do?}
  
  You may modify the stack setup code to allocate more than one page of
  stack space for each process.
-
-@item
-@b{What do I need to print on thread completion?}
-
-You should print the complete thread name (as specified in the
-@code{SYS_exec} call) followed by the exit status code,
-e.g.@: @samp{example 1 2 3 4: 0}.
-@end enumerate
  @end enumerate
  
  @node 80x86 Calling Convention
@@ -602,7 +815,7 @@ calling convention.  Some of the basics should be familiar from CS
  have seen even more of it.  I've omitted some of the complexity, since
  this isn't a class in how function calls work, so don't expect this to
  be exactly correct in full, gory detail.  If you do want all the
-details, you can refer to @cite{[SysV-i386]}.
+details, you can refer to @bibref{SysV-i386}.
  
  Whenever a function call happens, you need to put the arguments on the
  call stack for that function, before the code for that function
@@ -638,18 +851,18 @@ some of your caches.  This is why inlining code can be much faster.
  @node Argument Passing to main
  @subsection Argument Passing to @code{main()}
  
-In @code{main()}'s case, there is no caller to prepare the stack
+In @func{main}'s case, there is no caller to prepare the stack
  before it runs.  Therefore, the kernel needs to do it.  Fortunately,
  since there's no caller, there are no registers to save, no return
  address to deal with, etc.  The only difficult detail to take care of,
-after loading the code, is putting the arguments to @code{main()} on
+after loading the code, is putting the arguments to @func{main} on
  the stack.
  
  (The above is a small lie: most compilers will emit code where main
  isn't strictly speaking the first function.  This isn't an important
  detail.  If you want to look into it more, try disassembling a program
  and looking around a bit.  However, you can just act as if
-@code{main()} is the very first function called.)
+@func{main} is the very first function called.)
  
  Pintos is written for the 80@var{x}86 architecture.  Therefore, we
  need to adhere to the 80@var{x}86 calling convention.  Basically, you
@@ -660,16 +873,16 @@ have a caller, its stack frame must have the same layout as any other
  function's.  The program will assume that the stack has been laid out
  this way when it begins running.
  
-So, what are the arguments to @code{main()}? Just two: an @samp{int}
+So, what are the arguments to @func{main}? Just two: an @samp{int}
  (@code{argc}) and a @samp{char **} (@code{argv}).  @code{argv} is an
  array of strings, and @code{argc} is the number of strings in that
  array.  However, the hard part isn't these two things.  The hard part
  is getting all the individual strings in the right place.  As we go
  through the procedure, let us consider the following example command:
-@samp{/bin/ls -l *.h *.c}.
+@samp{/bin/ls -l foo bar}.
  
  The first thing to do is to break the command line into individual
-strings: @samp{/bin/ls}, @samp{-l}, @samp{*.h}, and @samp{*.c}.  These
+strings: @samp{/bin/ls}, @samp{-l}, @samp{foo}, and @samp{bar}.  These
  constitute the arguments of the command, including the program name
  itself (which belongs in @code{argv[0]}).
  
@@ -685,7 +898,7 @@ After we push all of the strings onto the stack, we adjust the stack
  pointer so that it is word-aligned: that is, we move it down to the
  next 4-byte boundary.  This is required because we will next be
  placing several words of data on the stack, and they must be aligned
-in order to be read correctly.  In our example, as you'll see below,
+to be read correctly.  In our example, as you'll see below,
  the strings start at address @t{0xffed}.  One word below that would be
  at @t{0xffe9}, so we could in theory put the next word on the stack
  there.  However, since the stack pointer should always be
@@ -693,7 +906,7 @@ word-aligned, we instead leave the stack pointer at @t{0xffe8}.
  
  Once we align the stack pointer, we then push the elements of the
  argument vector, that is, a null pointer, then the addresses of the
-strings @samp{/bin/ls}, @samp{-l}, @samp{*.h}, and @samp{*.c}) onto
+strings @samp{/bin/ls}, @samp{-l}, @samp{foo}, and @samp{bar} onto
  the stack.  This must be done in reverse order, such that
  @code{argv[0]} is at the lowest virtual address, again because the
  stack is growing downward.  (The null pointer pushed first is because
@@ -710,7 +923,7 @@ Then we push @code{argv} (that is, the address of the first element of
  the @code{argv} array) onto the stack, along with the length of the
  argument vector (@code{argc}, 4 in this example).  This must also be
  done in this order, since @code{argc} is the first argument to
-@code{main()} and therefore is on first (smaller address) on the
+@func{main} and therefore comes first (at the smaller address) on the
  stack.  Finally, we push a fake ``return address'' and leave the stack
  pointer to point to its location.
  
@@ -725,8 +938,8 @@ user program (assuming for this example that the stack bottom is
  @end html
  @multitable {@t{0xbfffffff}} {``return address''} {@t{/bin/ls\0}}
  @item Address @tab Name @tab Data
-@item @t{0xbffffffc} @tab @code{*argv[3]} @tab @samp{*.c\0}
-@item @t{0xbffffff8} @tab @code{*argv[2]} @tab @samp{*.h\0}
+@item @t{0xbffffffc} @tab @code{*argv[3]} @tab @samp{bar\0}
+@item @t{0xbffffff8} @tab @code{*argv[2]} @tab @samp{foo\0}
  @item @t{0xbffffff5} @tab @code{*argv[1]} @tab @samp{-l\0}
  @item @t{0xbfffffed} @tab @code{*argv[0]} @tab @samp{/bin/ls\0}
  @item @t{0xbfffffec} @tab word-align @tab @samp{\0}
@@ -735,7 +948,7 @@ user program (assuming for this example that the stack bottom is
  @item @t{0xbfffffe0} @tab @code{argv[2]} @tab @t{0xbffffff8}
  @item @t{0xbfffffdc} @tab @code{argv[1]} @tab @t{0xbffffff5}
  @item @t{0xbfffffd8} @tab @code{argv[0]} @tab @t{0xbfffffed}
-@item @t{0xbfffffd4} @tab @code{argv} @tab @t{0xbffffffd8}
+@item @t{0xbfffffd4} @tab @code{argv} @tab @t{0xbfffffd8}
  @item @t{0xbfffffd0} @tab @code{argc} @tab 4
  @item @t{0xbfffffcc} @tab ``return address'' @tab 0
  @end multitable
@@ -750,17 +963,17 @@ As shown above, your code should start the stack at the very top of
  the user virtual address space, in the page just below virtual address
  @code{PHYS_BASE} (defined in @file{threads/mmu.h}).
  
-You may find the non-standard @code{hex_dump()} function, declared in
+You may find the non-standard @func{hex_dump} function, declared in
  @file{<stdio.h>}, useful for debugging your argument passing code.
  Here's what it would show in the above example, given that
  @code{PHYS_BASE} is @t{0xc0000000}:
  
-@example
+@verbatim
  bfffffc0                                      00 00 00 00 |            ....|
  bfffffd0  04 00 00 00 d8 ff ff bf-ed ff ff bf f5 ff ff bf |................|
  bfffffe0  f8 ff ff bf fc ff ff bf-00 00 00 00 00 2f 62 69 |............./bi|
-bffffff0  6e 2f 6c 73 00 2d 6c 00-2a 2e 68 00 2a 2e 63 00 |n/ls.-l.*.h.*.c.|
-@end example
+bffffff0  6e 2f 6c 73 00 2d 6c 00-66 6f 6f 00 62 61 72 00 |n/ls.-l.foo.bar.|
+@end verbatim
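
The procedure described above (copy the strings, word-align, push the
@code{argv} pointers, then @code{argv}, @code{argc}, and a fake return
address) could be sketched roughly as follows.  This is only an
illustration, not the Pintos implementation: @code{setup_args},
@code{push_word}, @code{PAGE_SIZE}, and @code{MAX_ARGS} are hypothetical
names, and an ordinary byte buffer stands in for the user stack page.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096u   /* Assumed size of the single stack page. */
#define MAX_ARGS 32       /* Arbitrary limit chosen for this sketch. */

/* Hypothetical helper: pushes one 32-bit word, least significant byte
   first (as the 80x86 stores it), and returns the new stack pointer. */
static uint32_t
push_word (uint8_t *page, uint32_t base, uint32_t sp, uint32_t value)
{
  int i;

  assert (sp - base >= 4);
  sp -= 4;
  for (i = 0; i < 4; i++)
    page[sp - base + i] = (uint8_t) (value >> (8 * i));
  return sp;
}

/* Hypothetical helper: lays out argc, argv, and the argument strings in
   PAGE, which stands in for the page just below user virtual address
   STACK_TOP.  Returns the initial stack pointer as a user address. */
static uint32_t
setup_args (uint8_t *page, uint32_t stack_top, int argc, char *const argv[])
{
  uint32_t addr[MAX_ARGS];
  uint32_t base = stack_top - PAGE_SIZE;
  uint32_t sp = stack_top;
  uint32_t argv_addr;
  int i;

  assert (argc >= 0 && argc <= MAX_ARGS);

  /* Copy the strings, last argument first, so that argv[0] ends up at
     the lowest address. */
  for (i = argc - 1; i >= 0; i--)
    {
      size_t len = strlen (argv[i]) + 1;
      assert (sp - base >= len);
      sp -= len;
      memcpy (page + (sp - base), argv[i], len);
      addr[i] = sp;
    }

  /* Move down to the next 4-byte boundary, zeroing the padding. */
  while (sp & 3)
    {
      assert (sp > base);
      sp--;
      page[sp - base] = 0;
    }

  /* Null terminator for argv, then the string addresses in reverse. */
  sp = push_word (page, base, sp, 0);
  for (i = argc - 1; i >= 0; i--)
    sp = push_word (page, base, sp, addr[i]);
  argv_addr = sp;

  /* argv, then argc, then the fake return address. */
  sp = push_word (page, base, sp, argv_addr);
  sp = push_word (page, base, sp, (uint32_t) argc);
  sp = push_word (page, base, sp, 0);
  return sp;
}
```

Called with a @code{stack_top} of @t{0xc0000000} and the four strings
from the example, this sketch returns @t{0xbfffffcc} and leaves the
buffer's last bytes in the same layout as the table and hex dump above.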
  
  @node System Calls
  @section System Calls
@@ -786,11 +999,11 @@ interrupt.
  
  The normal calling convention pushes function arguments on the stack
  from right to left and the stack grows downward.  Thus, when the
-system call handler @code{syscall_handler()} gets control, the system
+system call handler @func{syscall_handler} gets control, the system
  call number is in the 32-bit word at the caller's stack pointer, the
  first argument is in the 32-bit word at the next higher address, and
  so on.  The caller's stack pointer is accessible to
-@code{syscall_handler()} as the @samp{esp} member of the @code{struct
+@func{syscall_handler} as the @samp{esp} member of the @code{struct
  intr_frame} passed to it.
  
  Here's an example stack frame for calling a system call numbered 10
@@ -800,7 +1013,7 @@ arbitrary:
  @html
  <CENTER>
  @end html
-@multitable {Address} {Value}
+@multitable {@t{0xbffffe7c}} {Value}
  @item Address @tab Value
  @item @t{0xbffffe7c} @tab 3
  @item @t{0xbffffe78} @tab 2
@@ -816,4 +1029,10 @@ In this example, the caller's stack pointer would be at
  
  The 80@var{x}86 convention for function return values is to place them
  in the @samp{EAX} register.  System calls that return a value can do
-so by modifying the @samp{eax} member of @code{struct intr_frame}.
+so by modifying the @samp{eax} member of @struct{intr_frame}.
+
+You should try to avoid writing large amounts of repetitive code for
+implementing system calls.  Each system call argument, whether an
+integer or a pointer, takes up 4 bytes on the stack.  You can take
+advantage of this to retrieve every system call's arguments from the
+stack with one shared piece of code instead of many near-identical ones.
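
One way to exploit the uniform 4-byte argument size is a single helper
that reads the Nth word above the caller's stack pointer.  The sketch
below is only an illustration: @code{syscall_word} and
@code{syscall_args} are hypothetical names, and a real handler would
also have to validate each user address before reading it, which is
omitted here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns the Nth 32-bit word at or above ESP.
   Word 0 is the system call number; words 1, 2, ... are its arguments.
   No user-pointer validation is done in this sketch. */
static uint32_t
syscall_word (const void *esp, int n)
{
  uint32_t word;

  assert (n >= 0);
  memcpy (&word, (const uint8_t *) esp + 4 * n, sizeof word);
  return word;
}

/* Hypothetical helper: copies the first ARGC arguments into ARGS, so
   every system call can share the same retrieval code. */
static void
syscall_args (const void *esp, int argc, uint32_t args[])
{
  int i;

  for (i = 0; i < argc; i++)
    args[i] = syscall_word (esp, i + 1);
}
```

With a frame like the one in the example above, @code{syscall_word (esp, 0)}
would yield the system call number and @code{syscall_args (esp, 3, args)}
would fill @code{args} with the three argument words.  Pointer arguments
come back as 32-bit values that the handler would cast and check itself.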