X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?a=blobdiff_plain;f=doc%2Ffilesys.texi;h=6ebc4d9707e1cb4da43a44058beb2d9fdd353974;hb=7a3dff52c8a44deeadd071ea93f19b9cee2a67fa;hp=df1b25c4d1d4a115d338e16371bb958e0b81add4;hpb=a1fd1c678b39fb719fb129852b05fe983682453a;p=pintos-anon diff --git a/doc/filesys.texi b/doc/filesys.texi index df1b25c..6ebc4d9 100644 --- a/doc/filesys.texi +++ b/doc/filesys.texi @@ -21,17 +21,15 @@ parts work together so that you can run VM and your filesystem at the same time. Plus, keeping VM is a great way to stress-test your filesystem implementation. -Your submission should define @code{THREAD_JOIN_IMPLEMENTED} in -@file{constants.h} (@pxref{Conditional Compilation}). - @menu * File System New Code:: -* Problem 4-1 Large Files:: +* File System Synchronization:: +* Problem 4-1 Indexed Files:: * Problem 4-2 File Growth:: * Problem 4-3 Subdirectories:: * Problem 4-4 Buffer Cache:: * File System Design Document Requirements:: -* File System FAQ:: +* File System FAQ:: @end menu @node File System New Code @@ -47,7 +45,9 @@ kernel command line. @item filesys.h @itemx filesys.c -Top-level interface to the file system. +Top-level interface to the file system. Please read the long comment +near the top of @file{filesys.c}, which introduces some details of the +file system code as provided. @item directory.h @itemx directory.c @@ -84,19 +84,59 @@ which you will remove. While most of your work will be in @file{filesys}, you should be prepared for interactions with all previous parts (as usual). -@node Problem 4-1 Large Files -@section Problem 4-1: Large Files +@node File System Synchronization +@section Synchronization + +The provided file system requires external synchronization, that is, +callers must ensure that only one thread can be running in the file +system code at once. Your submission should use a finer-grained +synchronization strategy. You will need to consider synchronization +issues for each type of file system object. The provided code uses the +following strategies: + +@itemize @bullet +@item +The free map and root directory are read each time they are needed for +an operation, and if they are modified, they are written back before the +operation completes. Thus, the free map is always consistent from an +external viewpoint. -Modify the file system to allow the maximum size of a file to be as -large as the disk. You can assume that the disk will not be larger -than 8 MB. In the basic file system, each file is limited to a file -size of just under 64 kB. Each file has a header called an index node -or @dfn{inode} (represented by @struct{inode}) that is a table of -direct pointers to the disk blocks for that file. Since the inode is -stored in one disk sector, the maximum size of a file is limited by -the number of pointers that will fit in one disk sector. Increasing -the limit to 8 MB will require you to implement doubly-indirect -blocks. +@item +Inodes are immutable in the provided file system, that is, their content +never changes between creation and deletion. Furthermore, only one copy +of an inode's data is maintained in memory at once, even if the file is +open in multiple contexts. + +@item +File data doesn't have to be consistent because it's just not part of +the model. In Unix and many other operating systems, a read of a file +by one process when the file is being written by another process can +show inconsistent results: it can show that none, all, or part of the +write has completed. (However, after the write system call returns to +its caller, all subsequent readers must see the change.) Similarly, +when two threads write to the same part of a file at the same time, +their data may be interleaved. External synchronization of the provided +file system ensures that reads and writes are fully serialized, but your +file system doesn't have to maintain full serialization as long as it +follows the rules above. +@end itemize + +@node Problem 4-1 Indexed Files +@section Problem 4-1: Indexed Files + +The basic file system allocates files as a single extent, making it +vulnerable to external fragmentation. Eliminate this problem by +modifying the inode structure. In practice, this probably means using +an index structure with direct, indirect, and doubly indirect blocks. +(You are welcome to choose a different scheme as long as you explain the +rationale for it in your design documentation, and as long as it does +not suffer from external fragmentation.) + +You can assume that the disk will not be larger than 8 MB. You must +support files as large as the disk (minus metadata). Each inode is +stored in one disk sector, limiting the number of block pointers that it +can contain. Supporting 8 MB files will require you to implement +doubly-indirect blocks. @node Problem 4-2 File Growth @section Problem 4-2: File Growth @@ -106,31 +146,65 @@ is specified when the file is created. One advantage of this is that the inode data structure, once created, never changes. In UNIX and most other file systems, a file is initially created with size 0 and is then expanded every time a write is made off the end of the file. -Modify the file system to allow this. As one test case, allow the -root directory file to expand beyond its current limit of ten files. +Modify the file system to allow this. Make sure that concurrent accesses to the inode remain properly synchronized. +There should be no predetermined limit on the size of a file, except +that a disk cannot exceed the size of the disk (minus metadata). This +also applies to the root directory file, which should now be allowed +to expand beyond its initial limit of ten files. + +The user is allowed to seek beyond the current end-of-file (EOF). The +seek itself does not extend the file. Writing at a position past EOF +extends the file to the position being written, and any gap between the +previous EOF and the start of the write must be filled with zeros. A +read starting from a position past EOF returns no bytes. + +Writing far beyond EOF can cause many blocks to be entirely zero. Some +file systems allocate and write real data blocks for these implicitly +zeroed blocks. Other file systems do not allocate these blocks at all +until they are explicitly written. The latter file systems are said to +support ``sparse files.'' You may adopt either allocation strategy in +your file system. + @node Problem 4-3 Subdirectories @section Problem 4-3: Subdirectories Implement a hierarchical name space. In the basic file system, all -files live in a single directory. Modify this to allow directories to -point to either files or other directories. To do this, you will need -to implement routines that parse path names into a sequence of -directories, as well as routines that change the current working -directory and that list the contents of the current directory. For -performance, allow concurrent updates to different directories, but -use mutual exclusion to ensure that updates to the same directory are -performed atomically (for example, to ensure that a file is deleted -only once). +files live in a single directory. Modify this to allow directory +entries to point to files or to other directories. You will need +routines to parse a path name into a sequence of directories, to +change the current working directory, and to list the contents of the +current directory. For performance, allow concurrent updates to +different directories, but use mutual exclusion to ensure that updates +to the same directory are performed atomically (for example, to ensure +that a file is deleted only once). Make sure that directories can expand beyond their original size just -as any other file can. +as any other file can. + +The basic file system has a 14-character limit on file names. You may +retain this limit for individual file name components, or may extend +it, at your option. In any case you must allow full path names to be +much longer than 14 characters. + +The current directory is maintained separately for each process. At +startup, the initial process has the root directory as its current +directory. When one process starts another with the @code{exec} +system call, the child process inherits its parent's current +directory. After that, the two processes' current directories are +independent, so that either changing its own current directory has no +effect on the other. Update the existing system calls so that, anywhere a file name is provided by the caller, an absolute or relative path name may used. -Also, implement the following new system calls: + +Update the @code{remove} system call so that it can delete empty +directories in addition to regular files. Directories can only be +deleted if they do not contain files or subdirectories. + +Implement the following new system calls: @table @code @item SYS_chdir @@ -143,18 +217,27 @@ successful, false on failure. @itemx bool mkdir (const char *dir) Attempts to create the directory named @var{dir}, which may be either relative or absolute. Returns true if successful, false on failure. +Fails if @var{dir} already exists or if any directory name in +@var{dir}, besides the last, does not already exist. That is, +@code{mkdir("/a/b/c")} succeeds only if @file{/a/b} already exists and +@file{/a/b/c} does not. @item SYS_lsdir @itemx void lsdir (void) Prints a list of files in the current directory to @code{stdout}, one -per line. +per line, in no particular order. @end table -Also write the @command{ls} and @command{mkdir} user programs. This -is straightforward once the above syscalls are implemented. In Unix, +We have provided @command{ls} and @command{mkdir} user programs, which +are straightforward once the above syscalls are implemented. In Unix, these are programs rather than built-in shell commands, but @command{cd} is a shell command. (Why?) +The @code{pintos} @option{put} and @option{get} commands should now +accept full path names, assuming that the directories used in the +paths have already been created. This should not require any extra +effort on your part. + @node Problem 4-4 Buffer Cache @section Problem 4-4: Buffer Cache @@ -207,7 +290,11 @@ Note that write-behind makes your filesystem more fragile in the face of crashes. Therefore, you should periodically write all cached blocks to disk. If you have @func{timer_sleep} from the first project working, this is an -excellent application for it. +excellent application for it. (If you're still using the base +implementation of @func{timer_sleep}, be aware that it busy-waits, which +is not an acceptable solution.) If @func{timer_sleep}'s delays seem too +short or too long, reread the explanation of the @option{-r} option to +@command{pintos} (@pxref{Debugging versus Testing}). Likewise, read-ahead is only really useful when done asynchronously. That is, if a process wants disk block 1 from the file, it needs to @@ -229,6 +316,9 @@ As always, submit a design document file summarizing your design. Be sure to cover the following points: @itemize @bullet +@item +How did you choose to synchronize file system operations? + @item How did you structure your inodes? How many blocks did you access directly, via single-indirection, and/or via double-indirection? Why? @@ -294,9 +384,9 @@ pintos -ci shell 12345 pintos -ex "shell" @end example -If you don't change the filesystem interface, then this should already -be implemented properly in @file{threads/init.c} and -@file{filesys/fsutil.c}. +If you don't change the filesystem interface, none of this should +require any special effort on your part. They are already implemented +properly in @file{threads/init.c} and @file{filesys/fsutil.c}. You must also implement the @option{-q} option and make sure that data gets flushed out to disk properly when it is used. @@ -354,17 +444,6 @@ easier to change the current working directory of the shell process if user process. In fact, Unix-like systems don't provide any way for one process to change another process's current working directory. -@item -@b{When the spec states that directories should be able to grow beyond -ten files, does this mean that there can still be a set maximum number -of files per directory that is greater than ten, or should directories -now support unlimited growth (bounded by the maximum supported file -size)?} - -We're looking for directories that can support arbitrarily large -numbers of files. Now that directories can grow, we want you to -remove the concept of a preset maximum file limit. - @item @b{When should the @code{lsdir} system call return?} @@ -382,7 +461,7 @@ This code should create the following output: @example Start of directory -... directory contents ... +@dots{}directory contents@dots{} End of directory @end example @@ -390,13 +469,6 @@ End of directory @b{Do we have to implement both absolute and relative pathnames?} Yes. Implementing @file{.} and @file{..} is extra credit, though. - -@item -@b{Should @func{remove} also be able to remove directories?} - -Yes. The @code{remove} system call should handle removal of both -regular files and directories. You may assume that directories can -only be deleted if they are empty, as in Unix. @end enumerate @node Problem 4-4 Buffer Cache FAQ @@ -404,42 +476,37 @@ only be deleted if they are empty, as in Unix. @enumerate 1 @item -@b{We're limited to a 64-block cache, but can we also keep a copy of -each @struct{inode} for an open file inside @struct{file}, -the way the stub code does?} - -No, you shouldn't keep any disk sectors stored anywhere outside the -cache. That means you'll have to change the way the file -implementation accesses its corresponding inode right now, since it -currently just creates a new @struct{inode} in its constructor -and reads the corresponding sector in from disk when it's created. - -There are two reasons for not storing inodes in @struct{file}. -First, keeping extra copies of inodes would be cheating the 64-block -limitation that we place on your cache. Second, if two processes have -the same file open, you will create a huge synchronization headache -for yourself if each @struct{file} has its own copy of the inode. - -Note that you can store pointers to inodes in @struct{file} if -you want, and you can store some other small amount of information to -help you find the inode when you need it. - -Similarly, if you want to store one block of data plus some small -amount of metadata for each of your 64 cache entries, that's fine. - -@item -@b{But why can't we store copies of inodes in @struct{file}? We -don't understand the answer to the previous question.} - -The issue regarding storing @struct{inode}s has to do with -implementation of the buffer cache. Basically, you can't store a -@code{struct inode *} in @struct{inode}. Each time you need -to read a @struct{inode}, you'll have to get it either from the -buffer cache or from disk. - -If you look at @func{file_read_at}, it uses the inode directly -without having first read in that sector from wherever it was in the -storage hierarchy. You are no longer allowed to do this. You will -need to change @code{file_read_at} (and similar functions) so that it -reads the inode from the storage hierarchy before using it. +@b{We're limited to a 64-block cache, but can we also keep an +@struct{inode_disk} inside @struct{inode}, the way the provided code +does?} + +The goal of the 64-block limit is to bound the amount of cached file +system data. If you keep a block of disk data---whether file data or +metadata---anywhere in kernel memory then you have to count it against +the 64-block limit. The same rule applies to anything that's +``similar'' to a block of disk data, such as a @struct{inode_disk} +without the @code{length} or @code{sector_cnt} members. + +You can keep a cached copy of the free map in memory permanently if you +like. It doesn't have to count against the cache size. + +That means you'll have to change the way the inode implementation +accesses its corresponding on-disk inode right now, since it currently +just embeds a @struct{inode_disk} in @struct{inode} and reads the +corresponding sector in from disk when it's created. Keeping extra +copies of inodes would be cheating the 64-block limitation that we place +on your cache. + +You can store pointers to inode data in @struct{inode}, if you want, and +you can store some other small amount of information to help you find +the inode when you need it. Similarly, if you want to store one block +of data plus some small amount of metadata for each of your 64 cache +entries, that's fine. + +If you look at @func{inode_byte_to_sector}, it uses the +@struct{inode_disk} directly without having first read in that sector +from wherever it was in the storage hierarchy. This will no longer +work. You will need to change @func{inode_byte_to_sector} so that it +reads the @struct{inode_disk} from the storage hierarchy before using +it. @end enumerate