X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?a=blobdiff_plain;f=doc%2Fuserprog.texi;h=8ac03dacfdda7ddd3831437460b833d925319465;hb=f7a42afa278e736fe835f328570749d38a0bcbd1;hp=64d429fd1b38dd3d89596dff86dc1f406a6066b1;hpb=98c2fc1ab7d395bb92cf4a57233fe432539d26a9;p=pintos-anon diff --git a/doc/userprog.texi b/doc/userprog.texi index 64d429f..8ac03da 100644 --- a/doc/userprog.texi +++ b/doc/userprog.texi @@ -32,12 +32,12 @@ this illusion. Before we delve into the details of the new code that you'll be working with, you should probably undo the test cases from project 1. -All you need to do is make sure the original -@file{threads/pintostest.c} is in place. This will stop the tests -from being run. +All you need to do is make sure the original @file{threads/test.c} is +in place. This will stop the tests from being run. @menu -* Project 2 Code to Hack:: +* Project 2 Code:: +* Using the File System:: * How User Programs Work:: * Global Requirements:: * Problem 2-1 Argument Passing:: @@ -47,8 +47,8 @@ from being run. * System Calls:: @end menu -@node Project 2 Code to Hack -@section Code to Hack +@node Project 2 Code +@section Code The easiest way to get an overview of the programming you will be doing is to simply go over each part you'll be working with. In @@ -66,6 +66,13 @@ it knows which memory the process is using). Address spaces also handle loading the program into memory and starting up the process's execution. +@item pagedir.c +@itemx pagedir.h +A simple manager for 80@var{x}86 page directories and page tables. +Although you probably won't want to modify this code for this project, +you may want to call some of its functions. In particular, +@code{pagedir_get_page()} may be helpful for accessing user memory. + @item syscall.c @itemx syscall.h Whenever a user process wants to access some kernel functionality, it @@ -82,8 +89,8 @@ will treat these terms as synonymous. 
There is no standard distinction between them, although the Intel processor manuals define them slightly differently on 80@var{x}86.} These files handle exceptions. Currently all exceptions simply print a message and -terminate the process. @strong{You should not need to modify this -file for project 2.} +terminate the process. Some, but not all, solutions to project 2 +require modifying @code{page_fault()} in this file. @item gdt.c @itemx gdt.c @@ -103,17 +110,6 @@ However, you can read the code if you're interested in how the GDT works. @end table -Elsewhere in the kernel, you will need to use some file system code. -You will not actually write a file system until the end of the -quarter, but since user programs need files to do anything -interesting, we have provided a simple file system in the -@file{filesys} directory. You will want to look over the -@file{filesys.h} and @file{file.h} interfaces to understand how to use -the file system. However, @strong{you should not modify the file -system code for this project}. Proper use of the file system routines -now will make life much easier for project 4, when you improve the -file system implementation. - Finally, in @file{lib/kernel}, you might want to use @file{bitmap.[ch]}. A bitmap is basically an array of bits, each of which can be true or false. Bitmaps are typically used to keep track @@ -121,6 +117,50 @@ of the usage of a large array of (identical) resources: if resource @var{n} is in use, then bit @var{n} of the bitmap is true. You might find it useful for tracking memory pages, for example. +@node Using the File System +@section Using the File System + +You will need to use some file system code for this project. First, +user programs are loaded from the file system. Second, many of the +system calls you must implement deal with the file system. However, +the focus of this project is not on the file system code, so we have +provided a simple file system in the @file{filesys} directory. 
You +will want to look over the @file{filesys.h} and @file{file.h} +interfaces to understand how to use the file system, and especially +its many limitations. @strong{You should not modify the file system +code for this project}. Proper use of the file system routines now +will make life much easier for project 4, when you improve the file +system implementation. + +You need to be able to create and format simulated disks. The +@command{pintos} program provides this functionality with its +@option{make-disk} command. From the @file{filesys/build} directory, +execute @code{pintos make-disk fs.dsk 2}. This command creates a 2 MB +simulated disk named @file{fs.dsk}. (It does not actually start +Pintos.) Then format the disk by passing the @option{-f} option to +Pintos on the kernel's command line: @code{pintos run -f}. + +You'll need a way to get files in and out of the simulated file +system. The @code{pintos} @option{put} and @option{get} commands are +designed for this. To copy @file{@var{file}} into the Pintos file +system, use the command @file{pintos put @var{file}}. To copy it to +the Pintos file system under the name @file{@var{newname}}, add the +new name to the end of the command: @file{pintos put @var{file} +@var{newname}}. The commands for copying files out of a VM are +similar, but substitute @option{get} for @option{put}. + +Incidentally, these commands work by passing special options +@option{-ci} and @option{-co} on the kernel's command line and copying +to and from a special simulated disk named @file{scratch.dsk}. If +you're very curious, you can look at the @command{pintos} program as +well as @file{filesys/fsutil.c} to learn the implementation details, +but it's really not relevant for this project. + +You can delete a file from the Pintos file system using the @option{-r +@var{file}} kernel option, e.g.@: @code{pintos run -r @var{file}}. 
+Also, @option{-ls} lists the files in the file system and @option{-p +@var{file}} prints a file's contents to the display. + @node How User Programs Work @section How User Programs Work @@ -138,26 +178,70 @@ Pintos loads ELF executables, where ELF is an executable format used by Linux, Solaris, and many other Unix and Unix-like systems. Therefore, you can use any compiler and linker that produce 80@var{x}86 ELF executables to produce programs for Pintos. We -recommend using the tools we provide in the @file{test} directory. By +recommend using the tools we provide in the @file{tests} directory. By default, the @file{Makefile} in this directory will compile the test programs we provide. You can edit the @file{Makefile} to compile your own test programs as well. +One thing you should realize immediately is that, until you use the +above operation to copy a test program to the emulated disk, Pintos +will be unable to do very much useful work. You will also find that +you won't be able to do interesting things until you copy a variety of +programs to the disk. A useful technique is to create a clean +reference disk and copy that over whenever you trash your +@file{fs.dsk} beyond a useful state, which may happen occasionally +while debugging. + +@node Virtual Memory Layout +@section Virtual Memory Layout + +Virtual memory in Pintos is divided into two regions: user virtual +memory and kernel virtual memory. User virtual memory ranges from +virtual address 0 up to @code{PHYS_BASE}, which is defined in +@file{threads/mmu.h} and defaults to @t{0xc0000000} (3 GB). Kernel +virtual memory occupies the rest of the virtual address space, from +@code{PHYS_BASE} up to 4 GB. + +User virtual memory is per-process. Conceptually, each process is +free to use the entire space of user virtual memory however it +chooses. 
When the kernel switches from one process to another, it +also switches user virtual address spaces by switching the processor's +page directory base register (see @code{pagedir_activate()} in +@file{userprog/pagedir.c}). + +Kernel virtual memory is global. It is always mapped the same way, +regardless of what user process or kernel thread is running. In +Pintos, kernel virtual memory is mapped one-to-one to physical +memory. That is, virtual address @code{PHYS_BASE} accesses physical +address 0, virtual address @code{PHYS_BASE} + @t{0x1234} accesses +physical address @t{0x1234}, and so on up to the size of the machine's +physical memory. + +User programs can only access user virtual memory. An attempt to +access kernel virtual memory will cause a page fault, handled by +@code{page_fault()} in @file{userprog/exception.c}, and the process +will be terminated. Kernel threads can access both kernel virtual +memory and, if a user process is running, the user virtual memory of +the running process. However, an attempt to access memory at a user +virtual address that doesn't have a page mapped into it will also +cause a page fault. + @node Global Requirements @section Global Requirements For testing and grading purposes, we have some simple requirements for your output. The kernel should print out the program's name and exit -status whenever a process exits. Aside from this, it should print out -no other messages. You may understand all those debug messages, but -we won't, and it just clutters our ability to see the stuff we care -about. Additionally, while it may be useful to hard-code which -process will run at startup while debugging, before you submit your -code you must make sure that it takes the start-up process name and -arguments from the @samp{-ex} argument. The infrastructure for this -is already there---you just need to make sure you enable it! 
For -example, running @code{pintos -ex "testprogram 1 2 3 4"} will spawn -@samp{testprogram 1 2 3 4} as the first process. +status whenever a process exits, e.g.@: @code{shell: exit(-1)}. Aside +from this, it should print out no other messages. You may understand +all those debug messages, but we won't, and it just clutters our +ability to see the stuff we care about. + +Additionally, while it may be useful to hard-code which process will +run at startup while debugging, before you submit your code you must +make sure that it takes the start-up process name and arguments from +the @samp{-ex} argument. For example, running @code{pintos run -ex +"testprogram 1 2 3 4"} will spawn @samp{testprogram 1 2 3 4} as the +first process. @node Problem 2-1 Argument Passing @section Problem 2-1: Argument Passing @@ -187,10 +271,9 @@ it ``handles'' system calls by terminating the process. You will need to decipher system call arguments and take the appropriate action for each. -In addition, implement system calls and system call handling. You are -required to support the following system calls, whose syscall numbers -are defined in @file{lib/syscall-nr.h} and whose C functions called by -user programs are prototyped in @file{lib/user/syscall.h}: +You are required to support the following system calls, whose syscall +numbers are defined in @file{lib/syscall-nr.h} and whose C functions +called by user programs are prototyped in @file{lib/user/syscall.h}: @table @code @item SYS_halt @@ -216,9 +299,9 @@ which otherwise should not be a valid id number. Joins the process @var{pid}, using the join rules from the last assignment, and returns the process's exit status. If the process was terminated by the kernel (i.e.@: killed due to an exception), the exit -status should be -1. If the process was not a child process, the -return value is undefined (but kernel operation must not be -disrupted). +status should be -1. 
If the process was not a child of the calling +process, the return value is undefined (but kernel operation must not +be disrupted). @item SYS_create @itemx bool create (const char *@var{file}) @@ -253,6 +336,17 @@ Write @var{size} bytes from @var{buffer} to the open file @var{fd}. Returns the number of bytes actually written, or -1 if the file could not be written. +@item SYS_seek +@itemx void seek (int @var{fd}, unsigned @var{position}) +Changes the next byte to be read or written in open file @var{fd} to +@var{position}, expressed in bytes from the beginning of the file. +(Thus, a @var{position} of 0 is the file's start.) + +@item SYS_tell +@itemx unsigned tell (int @var{fd}) +Returns the position of the next byte to be read or written in open +file @var{fd}, expressed in bytes from the beginning of the file. + @item SYS_close @itemx void close (int @var{fd}) Close file descriptor @var{fd}. @@ -279,9 +373,8 @@ is not safe to call into the filesystem code provided in the @file{filesys} directory from multiple threads at once. For now, we recommend adding a single lock that controls access to the filesystem code. You should acquire this lock before calling any functions in -the @file{filesys} directory, and release it afterward. Because it -calls into @file{filesys} functions, you will have to modify -@file{addrspace_load()} in the same way. @strong{For now, we +the @file{filesys} directory, and release it afterward. Don't forget +that @file{addrspace_load()} also accesses files. @strong{For now, we recommend against modifying code in the @file{filesys} directory.} We have provided you a function for each system call in @@ -314,11 +407,11 @@ the original code provided for project 1. @item @b{Is there a way I can disassemble user programs?} -@c FIXME -The @command{objdump} utility can disassemble entire user programs or -object files. Invoke it as @code{objdump -d @var{file}}. 
You can -also use @code{gdb}'s @command{disassemble} command to disassemble -individual functions in object files compiled with debug information. +The @command{i386-elf-objdump} utility can disassemble entire user +programs or object files. Invoke it as @code{i386-elf-objdump -d +@var{file}}. You can also use @code{i386-elf-gdb}'s +@command{disassemble} command to disassemble individual functions in +object files compiled with debug information. @item @b{Why can't I use many C include files in my Pintos programs?} @@ -337,35 +430,44 @@ free. You need to modify @file{tests/Makefile}. @item -@b{Help, Solaris only allows 128 open files at once!} +@b{What's the difference between @code{tid_t} and @code{pid_t}?} -Solaris limits the number of file descriptors a process may keep open -at any given time. The default limit is 128 open file descriptors. +A @code{tid_t} identifies a kernel thread, which may have a user +process running in it (if created with @code{thread_execute()}) or not +(if created with @code{thread_create()}). It is a data type used only +in the kernel. -To see the current limit for all new processes type @samp{limit} at -the shell prompt and look at the line titled ``descriptors''. To -increase this limit to the maximum allowed type @code{ulimit -descriptors} in a @command{csh} derived shell or @code{unlimit -descriptors} in a @command{sh} derived shell. This will increase the -number of open file descriptors your Pintos process can use, but it -will still be limited. +A @code{pid_t} identifies a user process. It is used by user +processes and the kernel in the @code{exec} and @code{join} system +calls. -Refer to the @command{limit(1)} man page for more information. +You can choose whatever suitable types you like for @code{tid_t} and +@code{pid_t}. By default, they're both @code{int}. You can make them +a one-to-one mapping, so that the same values in both identify the +same process, or you can use a more complex mapping. It's up to you. 
@item -@b{I can't seem to figure out how to read from and write to +@b{I can't seem to figure out how to read from and write to user memory. What should I do?} -Here are some pointers: +The kernel must treat user memory delicately. The user can pass a +null pointer or an invalid pointer (one that doesn't point to any +memory at all), or a kernel pointer (above @code{PHYS_BASE}). All of +these must be rejected without harm to the kernel or other running +processes. -FIXME +There are at least two reasonable ways to access user memory. First, +you can translate user addresses (below @code{PHYS_BASE}) into kernel +addresses (above @code{PHYS_BASE}) using the functions in +@file{pagedir.c}, and then access kernel memory. Second, you can +dereference user pointers directly and handle page faults by +terminating the process. In either case, you'll need to reject kernel +pointers as a special case. @item @b{I'm also confused about reading from and writing to the stack. Can you help?} -FIXME: relevant? - @itemize @bullet @item Only non-@samp{char} values will have issues when writing them to @@ -394,10 +496,17 @@ Each character is 1 byte. You should assume that command line arguments are delimited by white space. +@item +@b{What is the maximum length of the command line arguments?} + +You can impose some reasonable maximum as long as you're prepared to +defend it in your @file{DESIGNDOC}. + @item @b{How do I parse all these argument strings?} -We recommend you look at @code{strtok_r()}, prototyped in +You're welcome to use any technique you please, as long as it works. +If you're lost, look at @code{strtok_r()}, prototyped in @file{lib/string.h} and implemented with thorough comments in @file{lib/string.c}. You can find more about it by looking at the man page (run @code{man strtok_r} at the prompt). @@ -412,6 +521,13 @@ will be at address @t{0xbffffffc}. Also, the stack should always be aligned to a 4-byte boundary, but @t{0xbfffffff} isn't. 
+ +@item +@b{Is @code{PHYS_BASE} fixed?} + +No. You should be able to support @code{PHYS_BASE} values that are +any multiple of @t{0x10000000} from @t{0x80000000} to @t{0xc0000000}, +simply via recompilation. @end enumerate @item System Calls FAQs @@ -457,8 +573,9 @@ or the machine shuts down. @b{What happens if a system call is passed an invalid argument, such as Open being called with an invalid filename?} -Pintos should not crash. You should have your system calls check for -invalid arguments and return error codes. +Pintos should not crash. Acceptable options include returning an +error value (for those calls that return a value), returning an +undefined value, or terminating the process. @item @b{I've discovered that some of my user programs need more than one 4 @@ -535,16 +652,19 @@ and looking around a bit. However, you can just act as if @code{main()} is the very first function called.) Pintos is written for the 80@var{x}86 architecture. Therefore, we -need to adhere to the 80@var{x}86 calling convention, which is -detailed in the FAQ. Basically, you put all the arguments on the -stack and move the stack pointer appropriately. The program will -assume that this has been done when it begins running. +need to adhere to the 80@var{x}86 calling convention. Basically, you +put all the arguments on the stack and move the stack pointer +appropriately. You also need to insert space for the function's +``return address'': even though the initial function doesn't really +have a caller, its stack frame must have the same layout as any other +function's. The program will assume that the stack has been laid out +this way when it begins running. So, what are the arguments to @code{main()}? Just two: an @samp{int} (@code{argc}) and a @samp{char **} (@code{argv}). @code{argv} is an array of strings, and @code{argc} is the number of strings in that -array. However, the hard part isn't these two things. The hard part is -getting all the individual strings in the right place. 
As we go +array. However, the hard part isn't these two things. The hard part +is getting all the individual strings in the right place. As we go through the procedure, let us consider the following example command: @samp{/bin/ls -l *.h *.c}. @@ -572,59 +692,75 @@ there. However, since the stack pointer should always be word-aligned, we instead leave the stack pointer at @t{0xffe8}. Once we align the stack pointer, we then push the elements of the -argument vector (that is, the addresses of the strings @samp{/bin/ls}, -@samp{-l}, @samp{*.h}, and @samp{*.c}) onto the stack. This must be -done in reverse order, such that @code{argv[0]} is at the lowest -virtual address (again, because the stack is growing downward). This -is because we are now writing the actual array of strings; if we write -them in the wrong order, then the strings will be in the wrong order -in the array. This is also why, strictly speaking, it doesn't matter -what order the strings themselves are placed on the stack: as long as -the pointers are in the right order, the strings themselves can really -be anywhere. After we finish, we note the stack address of the first -element of the argument vector, which is @code{argv} itself. - -Finally, we push @code{argv} (that is, the address of the first -element of the @code{argv} array) onto the stack, along with the -length of the argument vector (@code{argc}, 4 in this example). This -must also be done in this order, since @code{argc} is the first -argument to main and therefore is on first (smaller address) on the -stack. We leave the stack pointer to point to the location where -@code{argc} is, because it is at the top of the stack, the location -directly below @code{argc}. - -All of which may sound very confusing, so here's a picture which will +argument vector, that is, a null pointer, then the addresses of the +strings @samp{/bin/ls}, @samp{-l}, @samp{*.h}, and @samp{*.c}) onto +the stack. 
This must be done in reverse order, such that +@code{argv[0]} is at the lowest virtual address, again because the +stack is growing downward. (The null pointer is pushed first because +@code{argv[argc]} must be a null pointer.) The order matters because we are now +writing the actual array of strings; if we write them in the wrong +order, then the strings will be in the wrong order in the array. This +is also why, strictly speaking, it doesn't matter what order the +strings themselves are placed on the stack: as long as the pointers +are in the right order, the strings themselves can really be anywhere. +After we finish, we note the stack address of the first element of the +argument vector, which is @code{argv} itself. + +Then we push @code{argv} (that is, the address of the first element of +the @code{argv} array) onto the stack, along with the length of the +argument vector (@code{argc}, 4 in this example). This must also be +done in this order, since @code{argc} is the first argument to +@code{main()} and therefore comes first (at the smaller address) on the +stack. Finally, we push a fake ``return address'' and leave the stack +pointer pointing to its location. + +All this may sound very confusing, so here's a picture which will hopefully clarify what's going on. This represents the state of the stack and the relevant registers right before the beginning of the -user program (assuming for this example a 16-bit virtual address space -with addresses from @t{0x0000} to @t{0xffff}): +user program (assuming for this example that the stack bottom is +@t{0xc0000000}): @html
@end html -@multitable {@t{0xffff}} {word-align} {@t{/bin/ls\0}} +@multitable {@t{0xbfffffff}} {``return address''} {@t{/bin/ls\0}} @item Address @tab Name @tab Data -@item @t{0xfffc} @tab @code{*argv[3]} @tab @samp{*.c\0} -@item @t{0xfff8} @tab @code{*argv[2]} @tab @samp{*.h\0} -@item @t{0xfff5} @tab @code{*argv[1]} @tab @samp{-l\0} -@item @t{0xffed} @tab @code{*argv[0]} @tab @samp{/bin/ls\0} -@item @t{0xffec} @tab word-align @tab @samp{\0} -@item @t{0xffe8} @tab @code{argv[3]} @tab @t{0xfffc} -@item @t{0xffe4} @tab @code{argv[2]} @tab @t{0xfff8} -@item @t{0xffe0} @tab @code{argv[1]} @tab @t{0xfff5} -@item @t{0xffdc} @tab @code{argv[0]} @tab @t{0xffed} -@item @t{0xffd8} @tab @code{argv} @tab @t{0xffdc} -@item @t{0xffd4} @tab @code{argc} @tab 4 +@item @t{0xbffffffc} @tab @code{*argv[3]} @tab @samp{*.c\0} +@item @t{0xbffffff8} @tab @code{*argv[2]} @tab @samp{*.h\0} +@item @t{0xbffffff5} @tab @code{*argv[1]} @tab @samp{-l\0} +@item @t{0xbfffffed} @tab @code{*argv[0]} @tab @samp{/bin/ls\0} +@item @t{0xbfffffec} @tab word-align @tab @samp{\0} +@item @t{0xbfffffe8} @tab @code{argv[4]} @tab @t{0} +@item @t{0xbfffffe4} @tab @code{argv[3]} @tab @t{0xbffffffc} +@item @t{0xbfffffe0} @tab @code{argv[2]} @tab @t{0xbffffff8} +@item @t{0xbfffffdc} @tab @code{argv[1]} @tab @t{0xbffffff5} +@item @t{0xbfffffd8} @tab @code{argv[0]} @tab @t{0xbfffffed} +@item @t{0xbfffffd4} @tab @code{argv} @tab @t{0xbfffffd8} +@item @t{0xbfffffd0} @tab @code{argc} @tab 4 +@item @t{0xbfffffcc} @tab ``return address'' @tab 0 @end multitable @html
@end html -In this example, the stack pointer would be initialized to @t{0xffd4}. +In this example, the stack pointer would be initialized to +@t{0xbfffffcc}. + +As shown above, your code should start the stack at the very top of +the user virtual address space, in the page just below virtual address +@code{PHYS_BASE} (defined in @file{threads/mmu.h}). -Your code should start the stack at the very top of the user virtual -address space, in the page just below virtual address @code{PHYS_BASE} -(defined in @file{threads/mmu.h}). +You may find the non-standard @code{hex_dump()} function, declared in +@file{}, useful for debugging your argument passing code. +Here's what it would show in the above example, given that +@code{PHYS_BASE} is @t{0xc0000000}: + +@example +bfffffc0 00 00 00 00 | ....| +bfffffd0 04 00 00 00 d8 ff ff bf-ed ff ff bf f5 ff ff bf |................| +bfffffe0 f8 ff ff bf fc ff ff bf-00 00 00 00 00 2f 62 69 |............./bi| +bffffff0 6e 2f 6c 73 00 2d 6c 00-2a 2e 68 00 2a 2e 63 00 |n/ls.-l.*.h.*.c.| +@end example @node System Calls @section System Calls @@ -640,19 +776,10 @@ errors such as a page fault or division by zero. However, exceptions are also the means by which a user program can request services (``system calls'') from the operating system. -Some exceptions are ``restartable'': the condition that caused the -exception can be fixed and the instruction retried. For example, page -faults call the operating system, but the user code should re-start on -the load or store that caused the exception (not the next one) so that -the memory access actually occurs. On the 80@var{x}86, restartable -exceptions are called ``faults,'' whereas most non-restartable -exceptions are classed as ``traps.'' Other architectures may define -these terms differently. - In the 80@var{x}86 architecture, the @samp{int} instruction is the most commonly used means for invoking system calls. This instruction -is handled in the same way that other software exceptions. 
In Pintos, -user program invoke @samp{int $0x30} to make a system call. The +is handled in the same way as other software exceptions. In Pintos, +user programs invoke @samp{int $0x30} to make a system call. The system call number and any additional arguments are expected to be pushed on the stack in the normal fashion before invoking the interrupt. @@ -675,16 +802,17 @@ arbitrary: @end html @multitable {Address} {Value} @item Address @tab Value -@item @t{0xfe7c} @tab 3 -@item @t{0xfe78} @tab 2 -@item @t{0xfe74} @tab 1 -@item @t{0xfe70} @tab 10 +@item @t{0xbffffe7c} @tab 3 +@item @t{0xbffffe78} @tab 2 +@item @t{0xbffffe74} @tab 1 +@item @t{0xbffffe70} @tab 10 @end multitable @html @end html -In this example, the caller's stack pointer would be at @t{0xfe70}. +In this example, the caller's stack pointer would be at +@t{0xbffffe70}. The 80@var{x}86 convention for function return values is to place them in the @samp{EAX} register. System calls that return a value can do