X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?a=blobdiff_plain;f=doc%2Fuserprog.texi;h=903d9c53311dcd856860b10a5749ffdbd234d3ab;hb=db1421e7321efc7bb3aa3b232fbde94ce154dd23;hp=a729418fa18fa2b1c6756d027fd598563bca5a2e;hpb=dcc3b1bc434d3c91e9a7d4728d120797b522b59d;p=pintos-anon diff --git a/doc/userprog.texi b/doc/userprog.texi index a729418..903d9c5 100644 --- a/doc/userprog.texi +++ b/doc/userprog.texi @@ -14,7 +14,7 @@ assignment. However, you will also be interacting with almost every other part of the code for this assignment. We will describe the relevant parts below. If you are confident in your HW1 code, you can build on top of it. However, if you wish you can start with a fresh -copy of the code and re-implement @code{thread_join()}, which is the +copy of the code and re-implement @func{thread_join}, which is the only part of project #1 required for this assignment. Your submission should define @code{THREAD_JOIN_IMPLEMENTED} in @file{constants.h} (@pxref{Conditional Compilation}). @@ -34,8 +34,6 @@ this illusion. Before we delve into the details of the new code that you'll be working with, you should probably undo the test cases from project 1. -All you need to do is make sure the original @file{threads/test.c} is -in place. This will stop the tests from being run. @menu * Project 2 Code:: @@ -68,7 +66,7 @@ Loads ELF binaries and starts processes. A simple manager for 80@var{x} page directories and page tables. Although you probably won't want to modify this code for this project, you may want to call some of its functions. In particular, -@code{pagedir_get_page()} may be helpful for accessing user memory. +@func{pagedir_get_page} may be helpful for accessing user memory. @item syscall.c @itemx syscall.h @@ -87,7 +85,7 @@ distinction between them, although the Intel processor manuals define them slightly differently on 80@var{x}86.} These files handle exceptions. Currently all exceptions simply print a message and terminate the process. 
Some, but not all, solutions to project 2 -require modifying @code{page_fault()} in this file. +require modifying @func{page_fault} in this file. @item gdt.c @itemx gdt.c @@ -103,7 +101,7 @@ The Task-State Segment (TSS) is used for 80@var{x}86 architectural task switching. Pintos uses the TSS only for switching stacks when a user process enters an interrupt handler, as does Linux. @strong{You should not need to modify these files for any of the projects.} -However, you can read the code if you're interested in how the GDT +However, you can read the code if you're interested in how the TSS works. @end table @@ -163,7 +161,7 @@ Also, @option{-ls} lists the files in the file system and @option{-p Pintos can run normal C programs. In fact, it can run any program you want, provided it's compiled into the proper file format, and uses -only the system calls you implement. (For example, @code{malloc()} +only the system calls you implement. (For example, @func{malloc} makes use of functionality that isn't provided by any of the syscalls we require you to support.) The only other limitation is that Pintos can't run programs using floating point operations, since it doesn't @@ -203,7 +201,7 @@ User virtual memory is per-process. Conceptually, each process is free to use the entire space of user virtual memory however it chooses. When the kernel switches from one process to another, it also switches user virtual address spaces by switching the processor's -page directory base register (see @code{pagedir_activate() in +page directory base register (see @func{pagedir_activate in @file{userprog/pagedir.c}}. Kernel virtual memory is global. It is always mapped the same way, @@ -216,7 +214,7 @@ physical memory. User programs can only access user virtual memory. 
An attempt to access kernel virtual memory will cause a page fault, handled by -@code{page_fault()} in @file{userprog/exception.c}, and the process +@func{page_fault} in @file{userprog/exception.c}, and the process will be terminated. Kernel threads can access both kernel virtual memory and, if a user process is running, the user virtual memory of the running process. However, even in the kernel, an attempt to @@ -243,11 +241,11 @@ first process. @node Problem 2-1 Argument Passing @section Problem 2-1: Argument Passing -Currently, @code{process_execute()} does not support passing arguments +Currently, @func{process_execute} does not support passing arguments to new processes. UNIX and other operating systems do allow passing command line arguments to a program, which accesses them via the argc, argv arguments to main. You must implement this functionality by -extending @code{process_execute()} so that instead of simply taking a +extending @func{process_execute} so that instead of simply taking a program file name, it can take a program name with arguments as a single string. That is, @code{process_execute("grep foo *.c")} should be a legal call. @xref{80x86 Calling Convention}, for information on @@ -371,7 +369,7 @@ is not safe to call into the filesystem code provided in the recommend adding a single lock that controls access to the filesystem code. You should acquire this lock before calling any functions in the @file{filesys} directory, and release it afterward. Don't forget -that @file{process_execute()} also accesses files. @strong{For now, we +that @func{process_execute} also accesses files. @strong{For now, we recommend against modifying code in the @file{filesys} directory.} We have provided you a function for each system call in @@ -390,14 +388,11 @@ exception is a call to the @code{halt} system call. 
@node User Programs FAQ @section FAQ -@enumerate 1 -@item General FAQs - @enumerate 1 @item @b{Do we need a working project 1 to implement project 2?} -You may find the code for @code{thread_join()} to be useful in +You may find the code for @func{thread_join} to be useful in implementing the join syscall, but besides that, you can use the original code provided for project 1. @@ -430,8 +425,8 @@ You need to modify @file{tests/Makefile}. @b{What's the difference between @code{tid_t} and @code{pid_t}?} A @code{tid_t} identifies a kernel thread, which may have a user -process running in it (if created with @code{process_execute()}) or not -(if created with @code{thread_create()}). It is a data type used only +process running in it (if created with @func{process_execute}) or not +(if created with @func{thread_create}). It is a data type used only in the kernel. A @code{pid_t} identifies a user process. It is used by user @@ -466,7 +461,7 @@ simplest way to handle user memory access. The second method is to ``assume and react'': directly dereference user pointers, after checking that they point below @code{PHYS_BASE}. Invalid user pointers will then cause a ``page fault'' that you can -handle by modifying the code for @code{page_fault()} in +handle by modifying the code for @func{page_fault} in @file{userprog/exception.cc}. This technique is normally faster because it takes advantage of the processor's MMU, so it tends to be used in real kernels (including Linux). @@ -481,29 +476,29 @@ because there's no way to return an error code from a memory access. Therefore, for those who want to try the latter technique, we'll provide a little bit of helpful code: -@example +@verbatim /* Tries to copy a byte from user address USRC to kernel address DST. Returns true if successful, false if USRC is invalid. 
*/ -static inline bool get_user (uint8_t *dst, const uint8_t *usrc) @{ +static inline bool get_user (uint8_t *dst, const uint8_t *usrc) { int eax; asm ("movl $1f, %%eax; movb %2, %%al; movb %%al, %0; 1:" : "=m" (*dst), "=&a" (eax) : "m" (*usrc)); return eax != 0; -@} +} /* Tries write BYTE to user address UDST. Returns true if successful, false if UDST is invalid. */ -static inline bool put_user (uint8_t *udst, uint8_t byte) @{ +static inline bool put_user (uint8_t *udst, uint8_t byte) { int eax; asm ("movl $1f, %%eax; movb %b2, %0; 1:" : "=m" (*udst), "=&a" (eax) : "r" (byte)); return eax != 0; -@} -@end example +} +@end verbatim Each of these functions assumes that the user address has already been verified to be below @code{PHYS_BASE}. They also assume that you've -modified @code{page_fault()} so that a page fault in the kernel causes +modified @func{page_fault} so that a page fault in the kernel causes @code{eax} to be set to 0 and its former value copied into @code{eip}. @item @@ -529,13 +524,19 @@ Each character is 1 byte. @end itemize @item -@b{Why doesn't keyboard input work with @option{-nv}?} +@b{Why doesn't keyboard input work with @option{-v}?} -Serial input isn't implemented. Don't use @option{-nv} if you want to +Serial input isn't implemented. Don't use @option{-v} if you want to use the shell or otherwise type at the keyboard. @end enumerate -@item Argument Passing FAQs +@menu +* Problem 2-1 Argument Passing FAQ:: +* Problem 2-2 System Calls FAQ:: +@end menu + +@node Problem 2-1 Argument Passing FAQ +@subsection Problem 2-1: Argument Passing FAQ @enumerate 1 @item @@ -554,7 +555,7 @@ defend it in your @file{DESIGNDOC}. @b{How do I parse all these argument strings?} You're welcome to use any technique you please, as long as it works. -If you're lost, look at @code{strtok_r()}, prototyped in +If you're lost, look at @func{strtok_r}, prototyped in @file{lib/string.h} and implemented with thorough comments in @file{lib/string.c}. 
You can find more about it by looking at the man page (run @code{man strtok_r} at the prompt). @@ -578,17 +579,18 @@ any multiple of @t{0x10000000} from @t{0x80000000} to @t{0xc0000000}, simply via recompilation. @end enumerate -@item System Calls FAQs +@node Problem 2-2 System Calls FAQ +@subsection Problem 2-2: System Calls FAQ @enumerate 1 @item -@b{What should I do with the parameter passed to @code{exit()}?} +@b{What should I do with the parameter passed to @func{exit}?} This value, the exit status of the process, must be returned to the -thread's parent when @code{join()} is called. +thread's parent when @func{join} is called. @item -@b{Can I just cast a pointer to a @code{struct file} object to get a +@b{Can I just cast a pointer to a @struct{file} object to get a unique file descriptor? Can I just cast a @code{struct thread *} to a @code{pid_t}? It's so much simpler that way!} @@ -606,6 +608,7 @@ maximum. That said, if your design calls for it, you may impose a limit of 128 open files per process (as the Solaris machines here do). @item +@anchor{Removing an Open File} @b{What happens when two (or more) processes have a file open and one of them removes it?} @@ -639,7 +642,6 @@ You should print the complete thread name (as specified in the @code{SYS_exec} call) followed by the exit status code, e.g.@: @samp{example 1 2 3 4: 0}. @end enumerate -@end enumerate @node 80x86 Calling Convention @section 80@var{x}86 Calling Convention @@ -650,7 +652,7 @@ calling convention. Some of the basics should be familiar from CS have seen even more of it. I've omitted some of the complexity, since this isn't a class in how function calls work, so don't expect this to be exactly correct in full, gory detail. If you do want all the -details, you can refer to @cite{[SysV-i386]}. +details, you can refer to @bibref{SysV-i386}. 
Whenever a function call happens, you need to put the arguments on the call stack for that function, before the code for that function @@ -686,18 +688,18 @@ some of your caches. This is why inlining code can be much faster. @node Argument Passing to main @subsection Argument Passing to @code{main()} -In @code{main()}'s case, there is no caller to prepare the stack +In @func{main}'s case, there is no caller to prepare the stack before it runs. Therefore, the kernel needs to do it. Fortunately, since there's no caller, there are no registers to save, no return address to deal with, etc. The only difficult detail to take care of, -after loading the code, is putting the arguments to @code{main()} on +after loading the code, is putting the arguments to @func{main} on the stack. (The above is a small lie: most compilers will emit code where main isn't strictly speaking the first function. This isn't an important detail. If you want to look into it more, try disassembling a program and looking around a bit. However, you can just act as if -@code{main()} is the very first function called.) +@func{main} is the very first function called.) Pintos is written for the 80@var{x}86 architecture. Therefore, we need to adhere to the 80@var{x}86 calling convention. Basically, you @@ -708,7 +710,7 @@ have a caller, its stack frame must have the same layout as any other function's. The program will assume that the stack has been laid out this way when it begins running. -So, what are the arguments to @code{main()}? Just two: an @samp{int} +So, what are the arguments to @func{main}? Just two: an @samp{int} (@code{argc}) and a @samp{char **} (@code{argv}). @code{argv} is an array of strings, and @code{argc} is the number of strings in that array. However, the hard part isn't these two things. 
The hard part @@ -758,7 +760,7 @@ Then we push @code{argv} (that is, the address of the first element of the @code{argv} array) onto the stack, along with the length of the argument vector (@code{argc}, 4 in this example). This must also be done in this order, since @code{argc} is the first argument to -@code{main()} and therefore is on first (smaller address) on the +@func{main} and therefore is on first (smaller address) on the stack. Finally, we push a fake ``return address'' and leave the stack pointer to point to its location. @@ -783,7 +785,7 @@ user program (assuming for this example that the stack bottom is @item @t{0xbfffffe0} @tab @code{argv[2]} @tab @t{0xbffffff8} @item @t{0xbfffffdc} @tab @code{argv[1]} @tab @t{0xbffffff5} @item @t{0xbfffffd8} @tab @code{argv[0]} @tab @t{0xbfffffed} -@item @t{0xbfffffd4} @tab @code{argv} @tab @t{0xbffffffd8} +@item @t{0xbfffffd4} @tab @code{argv} @tab @t{0xbfffffd8} @item @t{0xbfffffd0} @tab @code{argc} @tab 4 @item @t{0xbfffffcc} @tab ``return address'' @tab 0 @end multitable @@ -798,17 +800,17 @@ As shown above, your code should start the stack at the very top of the user virtual address space, in the page just below virtual address @code{PHYS_BASE} (defined in @file{threads/mmu.h}). -You may find the non-standard @code{hex_dump()} function, declared in +You may find the non-standard @func{hex_dump} function, declared in @file{}, useful for debugging your argument passing code. Here's what it would show in the above example, given that @code{PHYS_BASE} is @t{0xc0000000}: -@example +@verbatim bfffffc0 00 00 00 00 | ....| bfffffd0 04 00 00 00 d8 ff ff bf-ed ff ff bf f5 ff ff bf |................| bfffffe0 f8 ff ff bf fc ff ff bf-00 00 00 00 00 2f 62 69 |............./bi| bffffff0 6e 2f 6c 73 00 2d 6c 00-2a 2e 68 00 2a 2e 63 00 |n/ls.-l.*.h.*.c.| -@end example +@end verbatim @node System Calls @section System Calls @@ -834,11 +836,11 @@ interrupt. 
The normal calling convention pushes function arguments on the stack from right to left and the stack grows downward. Thus, when the -system call handler @code{syscall_handler()} gets control, the system +system call handler @func{syscall_handler} gets control, the system call number is in the 32-bit word at the caller's stack pointer, the first argument is in the 32-bit word at the next higher address, and so on. The caller's stack pointer is accessible to -@code{syscall_handler()} as the @samp{esp} member of the @code{struct +@func{syscall_handler} as the @samp{esp} member of the @code{struct intr_frame} passed to it. Here's an example stack frame for calling a system call numbered 10 @@ -848,7 +850,7 @@ arbitrary: @html
 @end html
-@multitable {Address} {Value}
+@multitable {@t{0xbffffe7c}} {Value}
 @item Address @tab Value
 @item @t{0xbffffe7c} @tab 3
 @item @t{0xbffffe78} @tab 2
@@ -864,4 +866,4 @@ In this example, the caller's stack pointer would be at
 
 The 80@var{x}86 convention for function return values is to place them
 in the @samp{EAX} register. System calls that return a value can do
-so by modifying the @samp{eax} member of @code{struct intr_frame}.
+so by modifying the @samp{eax} member of @struct{intr_frame}.