Clarifications.

diff --git a/doc/userprog.texi b/doc/userprog.texi
index a729418fa18fa2b1c6756d027fd598563bca5a2e..9277b087f5d9d9ba30e5a90d20ab8951f9b1284d 100644
--- a/doc/userprog.texi
+++ b/doc/userprog.texi
@@ -14,7 +14,7 @@ assignment.  However, you will also be interacting with almost every
  other part of the code for this assignment. We will describe the
  relevant parts below. If you are confident in your HW1 code, you can
  build on top of it.  However, if you wish you can start with a fresh
-copy of the code and re-implement @code{thread_join()}, which is the
+copy of the code and re-implement @func{thread_join}, which is the
  only part of project #1 required for this assignment.  Your submission
  should define @code{THREAD_JOIN_IMPLEMENTED} in @file{constants.h}
  (@pxref{Conditional Compilation}).
@@ -34,8 +34,6 @@ this illusion.
  
  Before we delve into the details of the new code that you'll be
  working with, you should probably undo the test cases from project 1.
-All you need to do is make sure the original @file{threads/test.c} is
-in place.  This will stop the tests from being run.
  
  @menu
  * Project 2 Code::              
@@ -68,7 +66,7 @@ Loads ELF binaries and starts processes.
  A simple manager for 80@var{x} page directories and page tables.
  Although you probably won't want to modify this code for this project,
  you may want to call some of its functions.  In particular,
-@code{pagedir_get_page()} may be helpful for accessing user memory.
+@func{pagedir_get_page} may be helpful for accessing user memory.
  
  @item syscall.c
  @itemx syscall.h
@@ -87,7 +85,7 @@ distinction between them, although the Intel processor manuals define
  them slightly differently on 80@var{x}86.}  These files handle
  exceptions.  Currently all exceptions simply print a message and
  terminate the process.  Some, but not all, solutions to project 2
-require modifying @code{page_fault()} in this file.
+require modifying @func{page_fault} in this file.
  
  @item gdt.c
  @itemx gdt.c
@@ -103,7 +101,7 @@ The Task-State Segment (TSS) is used for 80@var{x}86 architectural
  task switching.  Pintos uses the TSS only for switching stacks when a
  user process enters an interrupt handler, as does Linux.  @strong{You
  should not need to modify these files for any of the projects.}
-However, you can read the code if you're interested in how the GDT
+However, you can read the code if you're interested in how the TSS
  works.
  @end table
  
@@ -131,7 +129,7 @@ system implementation.
  
  You need to be able to create and format simulated disks.  The
  @command{pintos} program provides this functionality with its
-@option{make-disk} command.  From the @file{filesys/build} directory,
+@option{make-disk} command.  From the @file{userprog/build} directory,
  execute @code{pintos make-disk fs.dsk 2}.  This command creates a 2 MB
  simulated disk named @file{fs.dsk}.  (It does not actually start
  Pintos.)  Then format the disk by passing the @option{-f} option to
@@ -163,7 +161,7 @@ Also, @option{-ls} lists the files in the file system and @option{-p
  
  Pintos can run normal C programs.  In fact, it can run any program you
  want, provided it's compiled into the proper file format, and uses
-only the system calls you implement.  (For example, @code{malloc()}
+only the system calls you implement.  (For example, @func{malloc}
  makes use of functionality that isn't provided by any of the syscalls
  we require you to support.)  The only other limitation is that Pintos
  can't run programs using floating point operations, since it doesn't
@@ -203,7 +201,7 @@ User virtual memory is per-process.  Conceptually, each process is
  free to use the entire space of user virtual memory however it
  chooses.  When the kernel switches from one process to another, it
  also switches user virtual address spaces by switching the processor's
-page directory base register (see @code{pagedir_activate() in
-@file{userprog/pagedir.c}}.
+page directory base register (see @func{pagedir_activate} in
+@file{userprog/pagedir.c}).
  
  Kernel virtual memory is global.  It is always mapped the same way,
@@ -216,7 +214,7 @@ physical memory.
  
  User programs can only access user virtual memory.  An attempt to
  access kernel virtual memory will cause a page fault, handled by
-@code{page_fault()} in @file{userprog/exception.c}, and the process
+@func{page_fault} in @file{userprog/exception.c}, and the process
  will be terminated.  Kernel threads can access both kernel virtual
  memory and, if a user process is running, the user virtual memory of
  the running process.  However, even in the kernel, an attempt to
@@ -243,11 +241,11 @@ first process.
  @node Problem 2-1 Argument Passing
  @section Problem 2-1: Argument Passing
  
-Currently, @code{process_execute()} does not support passing arguments
+Currently, @func{process_execute} does not support passing arguments
  to new processes.  UNIX and other operating systems do allow passing
  command line arguments to a program, which accesses them via the argc,
  argv arguments to main.  You must implement this functionality by
-extending @code{process_execute()} so that instead of simply taking a
+extending @func{process_execute} so that instead of simply taking a
  program file name, it can take a program name with arguments as a
  single string.  That is, @code{process_execute("grep foo *.c")} should
  be a legal call.  @xref{80x86 Calling Convention}, for information on
@@ -275,9 +273,10 @@ called by user programs are prototyped in @file{lib/user/syscall.h}:
  @table @code
  @item SYS_halt
  @itemx void halt (void)
-Stops Pintos and prints out performance statistics.  Note that this
-should be seldom used, since then you lose some information about
-possible deadlock situations, etc.
+Stops Pintos by calling @func{power_off} (declared in
+@file{threads/init.h}).  Note that this should be seldom used, since
+then you lose some information about possible deadlock situations,
+etc.
  
  @item SYS_exit
  @itemx void exit (int @var{status})
@@ -301,8 +300,9 @@ process, the return value is undefined (but kernel operation must not
  be disrupted).
  
  @item SYS_create
-@itemx bool create (const char *@var{file})
-Create a new file called @var{file}.  Returns -1 if failed, 0 if OK.
+@itemx bool create (const char *@var{file}, unsigned @var{initial_size})
+Create a new file called @var{file} initially @var{initial_size} bytes
+in size.  Returns -1 if failed, 0 if OK.
  
  @item SYS_remove
  @itemx bool remove (const char *@var{file})
@@ -371,7 +371,7 @@ is not safe to call into the filesystem code provided in the
  recommend adding a single lock that controls access to the filesystem
  code.  You should acquire this lock before calling any functions in
  the @file{filesys} directory, and release it afterward.  Don't forget
-that @file{process_execute()} also accesses files.  @strong{For now, we
+that @func{process_execute} also accesses files.  @strong{For now, we
  recommend against modifying code in the @file{filesys} directory.}
  
  We have provided you a function for each system call in
@@ -390,17 +390,22 @@ exception is a call to the @code{halt} system call.
  @node User Programs FAQ
  @section FAQ
  
-@enumerate 1
-@item General FAQs
-
  @enumerate 1
  @item
  @b{Do we need a working project 1 to implement project 2?}
  
-You may find the code for @code{thread_join()} to be useful in
+You may find the code for @func{thread_join} to be useful in
  implementing the join syscall, but besides that, you can use
  the original code provided for project 1.
  
+@item
+@b{All my user programs die with page faults.}
+
+This will generally happen if you haven't implemented problem 2-1
+yet.  The reason is that the basic C library for user programs tries
+to read @var{argc} and @var{argv} off the stack.  Because the stack
+isn't properly set up yet, this causes a page fault.
+
  @item
  @b{Is there a way I can disassemble user programs?}
  
@@ -422,7 +427,7 @@ is compiled as a unit.)  If you wish to port libraries to Pintos, feel
  free.
  
  @item
-@b{How do I compile new user programs? How do I make 'echo' compile?}
+@b{How do I compile new user programs?}
  
  You need to modify @file{tests/Makefile}.
  
@@ -430,8 +435,8 @@ You need to modify @file{tests/Makefile}.
  @b{What's the difference between @code{tid_t} and @code{pid_t}?}
  
  A @code{tid_t} identifies a kernel thread, which may have a user
-process running in it (if created with @code{process_execute()}) or not
-(if created with @code{thread_create()}).  It is a data type used only
+process running in it (if created with @func{process_execute}) or not
+(if created with @func{thread_create}).  It is a data type used only
  in the kernel.
  
  A @code{pid_t} identifies a user process.  It is used by user
@@ -466,7 +471,7 @@ simplest way to handle user memory access.
  The second method is to ``assume and react'': directly dereference
  user pointers, after checking that they point below @code{PHYS_BASE}.
  Invalid user pointers will then cause a ``page fault'' that you can
-handle by modifying the code for @code{page_fault()} in
+handle by modifying the code for @func{page_fault} in
  @file{userprog/exception.cc}.  This technique is normally faster
  because it takes advantage of the processor's MMU, so it tends to be
  used in real kernels (including Linux).
@@ -481,29 +486,29 @@ because there's no way to return an error code from a memory access.
  Therefore, for those who want to try the latter technique, we'll
  provide a little bit of helpful code:
  
-@example
+@verbatim
  /* Tries to copy a byte from user address USRC to kernel address DST.
     Returns true if successful, false if USRC is invalid. */
-static inline bool get_user (uint8_t *dst, const uint8_t *usrc) @{
+static inline bool get_user (uint8_t *dst, const uint8_t *usrc) {
    int eax;
    asm ("movl $1f, %%eax; movb %2, %%al; movb %%al, %0; 1:"
         : "=m" (*dst), "=&a" (eax) : "m" (*usrc));
    return eax != 0;
-@}
+}
  
  /* Tries write BYTE to user address UDST.
     Returns true if successful, false if UDST is invalid. */
-static inline bool put_user (uint8_t *udst, uint8_t byte) @{
+static inline bool put_user (uint8_t *udst, uint8_t byte) {
    int eax;
    asm ("movl $1f, %%eax; movb %b2, %0; 1:"
         : "=m" (*udst), "=&a" (eax) : "r" (byte));
    return eax != 0;
-@}
-@end example
+}
+@end verbatim
  
  Each of these functions assumes that the user address has already been
  verified to be below @code{PHYS_BASE}.  They also assume that you've
-modified @code{page_fault()} so that a page fault in the kernel causes
+modified @func{page_fault} so that a page fault in the kernel causes
  @code{eax} to be set to 0 and its former value copied into @code{eip}.
  
  @item
@@ -529,13 +534,19 @@ Each character is 1 byte.
  @end itemize
  
  @item
-@b{Why doesn't keyboard input work with @option{-nv}?}
+@b{Why doesn't keyboard input work with @option{-v}?}
  
-Serial input isn't implemented.  Don't use @option{-nv} if you want to
+Serial input isn't implemented.  Don't use @option{-v} if you want to
  use the shell or otherwise type at the keyboard.
  @end enumerate
  
-@item Argument Passing FAQs
+@menu
+* Problem 2-1 Argument Passing FAQ::  
+* Problem 2-2 System Calls FAQ::  
+@end menu
+
+@node Problem 2-1 Argument Passing FAQ
+@subsection Problem 2-1: Argument Passing FAQ
  
  @enumerate 1
  @item
@@ -554,7 +565,7 @@ defend it in your @file{DESIGNDOC}.
  @b{How do I parse all these argument strings?}
  
  You're welcome to use any technique you please, as long as it works.
-If you're lost, look at @code{strtok_r()}, prototyped in
+If you're lost, look at @func{strtok_r}, prototyped in
  @file{lib/string.h} and implemented with thorough comments in
  @file{lib/string.c}.  You can find more about it by looking at the man
  page (run @code{man strtok_r} at the prompt).
@@ -578,17 +589,18 @@ any multiple of @t{0x10000000} from @t{0x80000000} to @t{0xc0000000},
  simply via recompilation.
  @end enumerate
  
-@item System Calls FAQs
+@node Problem 2-2 System Calls FAQ
+@subsection Problem 2-2: System Calls FAQ
  
  @enumerate 1
  @item
-@b{What should I do with the parameter passed to @code{exit()}?}
+@b{What should I do with the parameter passed to @func{exit}?}
  
  This value, the exit status of the process, must be returned to the
-thread's parent when @code{join()} is called.
+thread's parent when @func{join} is called.
  
  @item
-@b{Can I just cast a pointer to a @code{struct file} object to get a
+@b{Can I just cast a pointer to a @struct{file} object to get a
  unique file descriptor?  Can I just cast a @code{struct thread *} to a
  @code{pid_t}?  It's so much simpler that way!}
  
@@ -606,6 +618,7 @@ maximum.  That said, if your design calls for it, you may impose a
  limit of 128 open files per process (as the Solaris machines here do).
  
  @item
+@anchor{Removing an Open File}
  @b{What happens when two (or more) processes have a file open and one of
  them removes it?}
  
@@ -639,7 +652,6 @@ You should print the complete thread name (as specified in the
  @code{SYS_exec} call) followed by the exit status code,
  e.g.@: @samp{example 1 2 3 4: 0}.
  @end enumerate
-@end enumerate
  
  @node 80x86 Calling Convention
  @section 80@var{x}86 Calling Convention
@@ -650,7 +662,7 @@ calling convention.  Some of the basics should be familiar from CS
  have seen even more of it.  I've omitted some of the complexity, since
  this isn't a class in how function calls work, so don't expect this to
  be exactly correct in full, gory detail.  If you do want all the
-details, you can refer to @cite{[SysV-i386]}.
+details, you can refer to @bibref{SysV-i386}.
  
  Whenever a function call happens, you need to put the arguments on the
  call stack for that function, before the code for that function
@@ -686,18 +698,18 @@ some of your caches.  This is why inlining code can be much faster.
  @node Argument Passing to main
  @subsection Argument Passing to @code{main()}
  
-In @code{main()}'s case, there is no caller to prepare the stack
+In @func{main}'s case, there is no caller to prepare the stack
  before it runs.  Therefore, the kernel needs to do it.  Fortunately,
  since there's no caller, there are no registers to save, no return
  address to deal with, etc.  The only difficult detail to take care of,
-after loading the code, is putting the arguments to @code{main()} on
+after loading the code, is putting the arguments to @func{main} on
  the stack.
  
  (The above is a small lie: most compilers will emit code where main
  isn't strictly speaking the first function.  This isn't an important
  detail.  If you want to look into it more, try disassembling a program
  and looking around a bit.  However, you can just act as if
-@code{main()} is the very first function called.)
+@func{main} is the very first function called.)
  
  Pintos is written for the 80@var{x}86 architecture.  Therefore, we
  need to adhere to the 80@var{x}86 calling convention.  Basically, you
@@ -708,7 +720,7 @@ have a caller, its stack frame must have the same layout as any other
  function's.  The program will assume that the stack has been laid out
  this way when it begins running.
  
-So, what are the arguments to @code{main()}? Just two: an @samp{int}
+So, what are the arguments to @func{main}? Just two: an @samp{int}
  (@code{argc}) and a @samp{char **} (@code{argv}).  @code{argv} is an
  array of strings, and @code{argc} is the number of strings in that
  array.  However, the hard part isn't these two things.  The hard part
@@ -758,7 +770,7 @@ Then we push @code{argv} (that is, the address of the first element of
  the @code{argv} array) onto the stack, along with the length of the
  argument vector (@code{argc}, 4 in this example).  This must also be
  done in this order, since @code{argc} is the first argument to
-@code{main()} and therefore is on first (smaller address) on the
+@func{main} and therefore is on first (smaller address) on the
  stack.  Finally, we push a fake ``return address'' and leave the stack
  pointer to point to its location.
  
@@ -783,7 +795,7 @@ user program (assuming for this example that the stack bottom is
  @item @t{0xbfffffe0} @tab @code{argv[2]} @tab @t{0xbffffff8}
  @item @t{0xbfffffdc} @tab @code{argv[1]} @tab @t{0xbffffff5}
  @item @t{0xbfffffd8} @tab @code{argv[0]} @tab @t{0xbfffffed}
-@item @t{0xbfffffd4} @tab @code{argv} @tab @t{0xbffffffd8}
+@item @t{0xbfffffd4} @tab @code{argv} @tab @t{0xbfffffd8}
  @item @t{0xbfffffd0} @tab @code{argc} @tab 4
  @item @t{0xbfffffcc} @tab ``return address'' @tab 0
  @end multitable
@@ -798,17 +810,17 @@ As shown above, your code should start the stack at the very top of
  the user virtual address space, in the page just below virtual address
  @code{PHYS_BASE} (defined in @file{threads/mmu.h}).
  
-You may find the non-standard @code{hex_dump()} function, declared in
+You may find the non-standard @func{hex_dump} function, declared in
  @file{<stdio.h>}, useful for debugging your argument passing code.
  Here's what it would show in the above example, given that
  @code{PHYS_BASE} is @t{0xc0000000}:
  
-@example
+@verbatim
  bfffffc0                                      00 00 00 00 |            ....|
  bfffffd0  04 00 00 00 d8 ff ff bf-ed ff ff bf f5 ff ff bf |................|
  bfffffe0  f8 ff ff bf fc ff ff bf-00 00 00 00 00 2f 62 69 |............./bi|
  bffffff0  6e 2f 6c 73 00 2d 6c 00-2a 2e 68 00 2a 2e 63 00 |n/ls.-l.*.h.*.c.|
-@end example
+@end verbatim
  
  @node System Calls
  @section System Calls
@@ -834,11 +846,11 @@ interrupt.
  
  The normal calling convention pushes function arguments on the stack
  from right to left and the stack grows downward.  Thus, when the
-system call handler @code{syscall_handler()} gets control, the system
+system call handler @func{syscall_handler} gets control, the system
  call number is in the 32-bit word at the caller's stack pointer, the
  first argument is in the 32-bit word at the next higher address, and
  so on.  The caller's stack pointer is accessible to
-@code{syscall_handler()} as the @samp{esp} member of the @code{struct
+@func{syscall_handler} as the @samp{esp} member of the @code{struct
  intr_frame} passed to it.
  
  Here's an example stack frame for calling a system call numbered 10
@@ -848,7 +860,7 @@ arbitrary:
  @html
  <CENTER>
  @end html
-@multitable {Address} {Value}
+@multitable {@t{0xbffffe7c}} {Value}
  @item Address @tab Value
  @item @t{0xbffffe7c} @tab 3
  @item @t{0xbffffe78} @tab 2
@@ -864,4 +876,4 @@ In this example, the caller's stack pointer would be at
  
  The 80@var{x}86 convention for function return values is to place them
  in the @samp{EAX} register.  System calls that return a value can do
-so by modifying the @samp{eax} member of @code{struct intr_frame}.
+so by modifying the @samp{eax} member of @struct{intr_frame}.
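
Note on the System Calls text touched by the last hunks: the layout it describes (system call number in the 32-bit word at the caller's stack pointer, first argument in the next higher word, and so on) can be sketched in C as below. This is a minimal host-side illustration, not Pintos code. `struct fake_stack`, `read_word`, `write_word`, `syscall_number`, `syscall_arg`, and `build_example_frame` are hypothetical names invented for the example, and a byte buffer stands in for user memory. The excerpt shows only the values 3 and 2 in the example frame, so the first argument (1) and the resulting stack pointer (0xbffffe70) are assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a slice of user stack memory.
   BASE is the simulated user virtual address of bytes[0]. */
struct fake_stack {
  uint32_t base;
  uint8_t bytes[64];
};

/* Reads the 32-bit word stored at simulated user address UADDR. */
static uint32_t read_word(const struct fake_stack *s, uint32_t uaddr) {
  uint32_t word;
  assert(uaddr >= s->base
         && uaddr - s->base + sizeof word <= sizeof s->bytes);
  memcpy(&word, s->bytes + (uaddr - s->base), sizeof word);
  return word;
}

/* Stores VALUE as a 32-bit word at simulated user address UADDR. */
static void write_word(struct fake_stack *s, uint32_t uaddr, uint32_t value) {
  assert(uaddr >= s->base
         && uaddr - s->base + sizeof value <= sizeof s->bytes);
  memcpy(s->bytes + (uaddr - s->base), &value, sizeof value);
}

/* The system call number is the word at the caller's stack pointer. */
static uint32_t syscall_number(const struct fake_stack *s, uint32_t esp) {
  return read_word(s, esp);
}

/* Argument I (counting from 0) sits I + 1 words above the stack pointer,
   because arguments are pushed right to left and the stack grows down. */
static uint32_t syscall_arg(const struct fake_stack *s, uint32_t esp,
                            unsigned i) {
  return read_word(s, esp + 4 * (i + 1));
}

/* Fills S with the example frame for system call 10 and returns the
   caller's stack pointer.  The value 1 at 0xbffffe74 and the stack
   pointer itself are assumptions; only 3 and 2 appear in the excerpt. */
static uint32_t build_example_frame(struct fake_stack *s) {
  s->base = 0xbffffe70;
  memset(s->bytes, 0, sizeof s->bytes);
  write_word(s, 0xbffffe7c, 3);   /* third argument */
  write_word(s, 0xbffffe78, 2);   /* second argument */
  write_word(s, 0xbffffe74, 1);   /* first argument (assumed) */
  write_word(s, 0xbffffe70, 10);  /* system call number */
  return 0xbffffe70;
}
```

In a real handler the stack pointer would come from the `esp` member of the `struct intr_frame`, and every word read from it would first have to pass one of the user-memory checks discussed in the FAQ, since the pointer is supplied by the user program.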