X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?a=blobdiff_plain;f=doc%2Ffilesys.texi;h=02808bb821c74902187a7f6b51886383860a5d71;hb=6070611faac84bdf95c4405b3970c6928202f26b;hp=cc00b402db7dd038e4f59fea49c2e0bb159a22b0;hpb=e08b0009571075c76b710f3051aac1b2f204ae5d;p=pintos-anon diff --git a/doc/filesys.texi b/doc/filesys.texi index cc00b40..02808bb 100644 --- a/doc/filesys.texi +++ b/doc/filesys.texi @@ -21,17 +21,15 @@ parts work together so that you can run VM and your filesystem at the same time. Plus, keeping VM is a great way to stress-test your filesystem implementation. -Your submission should define @code{THREAD_JOIN_IMPLEMENTED} in -@file{constants.h} (@pxref{Conditional Compilation}). - @menu * File System New Code:: -* Problem 4-1 Large Files:: +* File System Synchronization:: +* Problem 4-1 Indexed Files:: * Problem 4-2 File Growth:: * Problem 4-3 Subdirectories:: * Problem 4-4 Buffer Cache:: * File System Design Document Requirements:: -* File System FAQ:: +* File System FAQ:: @end menu @node File System New Code @@ -47,7 +45,9 @@ kernel command line. @item filesys.h @itemx filesys.c -Top-level interface to the file system. +Top-level interface to the file system. Please read the long comment +near the top of @file{filesys.c}, which introduces some details of the +file system code as provided. @item directory.h @itemx directory.c @@ -84,19 +84,59 @@ which you will remove. While most of your work will be in @file{filesys}, you should be prepared for interactions with all previous parts (as usual). -@node Problem 4-1 Large Files -@section Problem 4-1: Large Files +@node File System Synchronization +@section Synchronization + +The provided file system requires external synchronization, that is, +callers must ensure that only one thread can be running in the file +system code at once. Your submission should use a finer-grained +synchronization strategy. You will need to consider synchronization +issues for each type of file system object. The provided code uses the +following strategies: + +@itemize @bullet +@item +The free map and root directory are read each time they are needed for +an operation, and if they are modified, they are written back before the +operation completes. Thus, the free map is always consistent from an +external viewpoint. + +@item +Inodes are immutable in the provided file system, that is, their content +never changes between creation and deletion. Furthermore, only one copy +of an inode's data is maintained in memory at once, even if the file is +open in multiple contexts. + +@item +File data doesn't have to be consistent because it's just not part of +the model. In Unix and many other operating systems, a read of a file +by one process when the file is being written by another process can +show inconsistent results: it can show that none, all, or part of the +write has completed. (However, after the write system call returns to +its caller, all subsequent readers must see the change.) Similarly, +when two threads write to the same part of a file at the same time, +their data may be interleaved. External synchronization of the provided +file system ensures that reads and writes are fully serialized, but your +file system doesn't have to maintain full serialization as long as it +follows the rules above. +@end itemize + +@node Problem 4-1 Indexed Files +@section Problem 4-1: Indexed Files + +The basic file system allocates files as a single extent, making it +vulnerable to external fragmentation. Eliminate this problem by +modifying the inode structure. In practice, this probably means using +an index structure with direct, indirect, and doubly indirect blocks. +(You are welcome to choose a different scheme as long as you explain the +rationale for it in your design documentation, and as long as it does +not suffer from external fragmentation.) -Modify the file system to allow the maximum size of a file to be as -large as the disk. You can assume that the disk will not be larger -than 8 MB. In the basic file system, each file is limited to a file -size of just under 64 kB. Each file has a header called an index node -or @dfn{inode} (represented by @struct{inode}) that is a table of -direct pointers to the disk blocks for that file. Since the inode is -stored in one disk sector, the maximum size of a file is limited by -the number of pointers that will fit in one disk sector. Increasing -the limit to 8 MB will require you to implement doubly-indirect -blocks. +You can assume that the disk will not be larger than 8 MB. You must +support files as large as the disk (minus metadata). Each inode is +stored in one disk sector, limiting the number of block pointers that it +can contain. Supporting 8 MB files will require you to implement +doubly-indirect blocks. @node Problem 4-2 File Growth @section Problem 4-2: File Growth @@ -111,6 +151,19 @@ root directory file to expand beyond its current limit of ten files. Make sure that concurrent accesses to the inode remain properly synchronized. +The user is allowed to seek beyond the current end-of-file (EOF). The +seek itself does not extend the file. Writing at a position past EOF +extends the file to the position being written, and any gap between the +previous EOF and the start of the write must be filled with zeros. A +read past EOF returns zero bytes. + +Writing far beyond EOF can cause many blocks to be entirely zero. Some +file systems allocate and write real data blocks for these implicitly +zeroed blocks. Other file systems do not allocate these blocks at all +until they are explicitly written. The latter file systems are said to +support ``sparse files.'' You may adopt either allocation strategy in +your file system. + @node Problem 4-3 Subdirectories @section Problem 4-3: Subdirectories @@ -128,7 +181,7 @@ only once). Make sure that directories can expand beyond their original size just as any other file can. -Each program has its own current directory. When one process starts +Each process has its own current directory. When one process starts another with the @code{exec} system call, the child process inherits its parent's current directory. After that, the two processes' current directories are independent, so that either changing its own current @@ -213,7 +266,11 @@ Note that write-behind makes your filesystem more fragile in the face of crashes. Therefore, you should periodically write all cached blocks to disk. If you have @func{timer_sleep} from the first project working, this is an -excellent application for it. +excellent application for it. (If you're still using the base +implementation of @func{timer_sleep}, be aware that it busy-waits, which +is not an acceptable solution.) If @func{timer_sleep}'s delays seem too +short or too long, reread the explanation of the @option{-r} option to +@command{pintos} (@pxref{Debugging versus Testing}). Likewise, read-ahead is only really useful when done asynchronously. That is, if a process wants disk block 1 from the file, it needs to @@ -235,6 +292,9 @@ As always, submit a design document file summarizing your design. Be sure to cover the following points: @itemize @bullet +@item +How did you choose to synchronize file system operations? + @item How did you structure your inodes? How many blocks did you access directly, via single-indirection, and/or via double-indirection? Why? @@ -411,37 +471,36 @@ only be deleted if they are empty, as in Unix. @enumerate 1 @item @b{We're limited to a 64-block cache, but can we also keep an -@struct{inode_disk} inside @struct{file}, the way the provided code +@struct{inode_disk} inside @struct{inode}, the way the provided code does?} The goal of the 64-block limit is to bound the amount of cached file -system data. If you keep a block of disk data anywhere in kernel -memory, whether it's file data or metadata, then you have to count it -against the 64-block limit. The same rule applies to anything that's +system data. If you keep a block of disk data---whether file data or +metadata---anywhere in kernel memory then you have to count it against +the 64-block limit. The same rule applies to anything that's ``similar'' to a block of disk data, such as a @struct{inode_disk} without the @code{length} or @code{sector_cnt} members. -That means you'll have to change the way the file implementation -accesses its corresponding inode right now, since it currently just -creates a new @struct{inode} containing an @struct{inode_disk} in -@func{inode_open} and reads the corresponding sector in from disk when -it's created. - -There are two reasons for not storing inode data in @struct{inode}. -First, keeping extra copies of inodes would be cheating the 64-block -limitation that we place on your cache. Second, if two processes have -the same file open, you will create a huge synchronization headache for -yourself if each @struct{inode} has its own copy of the on-disk inode. - -You can store pointers to inode data in @struct{inode_disk}, if you -want, and you can store some other small amount of information to help -you find the inode when you need it. Similarly, if you want to store -one block of data plus some small amount of metadata for each of your 64 -cache entries, that's fine. - -If you look at @func{file_read_at}, it uses the inode directly without -having first read in that sector from wherever it was in the storage -hierarchy. This will no longer work. You will need to change -@code{file_read_at} (and similar functions) so that it reads the inode -from the storage hierarchy before using it. +You can keep a cached copy of the free map in memory permanently if you +like. It doesn't have to count against the cache size. + +That means you'll have to change the way the inode implementation +accesses its corresponding on-disk inode right now, since it currently +just embeds a @struct{inode_disk} in @struct{inode} and reads the +corresponding sector in from disk when it's created. Keeping extra +copies of inodes would be cheating the 64-block limitation that we place +on your cache. + +You can store pointers to inode data in @struct{inode}, if you want, and +you can store some other small amount of information to help you find +the inode when you need it. Similarly, if you want to store one block +of data plus some small amount of metadata for each of your 64 cache +entries, that's fine. + +If you look at @func{inode_byte_to_sector}, it uses the +@struct{inode_disk} directly without having first read in that sector +from wherever it was in the storage hierarchy. This will no longer +work. You will need to change @func{inode_byte_to_sector} so that it +reads the @struct{inode_disk} from the storage hierarchy before using +it. @end enumerate