Comment.

diff --git a/doc/filesys.texi b/doc/filesys.texi

index 8b7bdb46f1d753032bd2892bcd19e35b2b2f0d3f..3624aa28141f016abbfbada74399c0461967c137 100644
--- a/doc/filesys.texi
+++ b/doc/filesys.texi
@@ -47,7 +47,9 @@ kernel command line.
  
  @item filesys.h
  @itemx filesys.c
-Top-level interface to the file system.
+Top-level interface to the file system.  Please read the long comment
+near the top of @file{filesys.c}, which introduces some details of the
+file system code as provided.
  
  @item directory.h
  @itemx directory.c
@@ -84,6 +86,44 @@ which you will remove.
  While most of your work will be in @file{filesys}, you should be
  prepared for interactions with all previous parts (as usual).
  
+@node File System Synchronization
+@section Synchronization
+
+The file system as provided requires external synchronization, that is,
+callers must ensure that only one thread can be running in the file
+system code at once.  Your submission should use a finer-grained
+synchronization strategy.  You will need to consider synchronization for
+each type of file system object.  The provided code uses the following
+strategies:
+
+@itemize @bullet
+@item
+The free map and root directory are read each time they are needed for
+an operation, and if they are modified, they are written back before the
+operation completes.  Thus, the free map is always consistent from an
+external viewpoint.
+
+@item
+Inodes are immutable in the provided file system, that is, their content
+never changes between creation and deletion, and furthermore only one
+copy of an inode's data is maintained in memory at once, even if the
+file is open in multiple contexts.
+
+@item
+File data need not be consistent under concurrent access; the model has
+no such rule.  In Unix and many other operating systems, a read of a file
+by one process when the file is being written by another process can
+show inconsistent results: it can show that none, all, or part of the
+write has completed.  (However, after the write system call returns to
+its caller, all subsequent readers must see the change.)  Similarly,
+when two threads write to the same part of a file at the same time,
+their data may be arbitrarily interleaved.
+@end itemize
+
+External synchronization of the provided file system ensures that reads
+and writes are fully serialized, but your file system doesn't have to
+maintain full serialization as long as it follows the rules above.
+
  @node Problem 4-1 Large Files
  @section Problem 4-1: Large Files
  
@@ -111,6 +151,19 @@ root directory file to expand beyond its current limit of ten files.
  Make sure that concurrent accesses to the inode remain properly
  synchronized.
  
+The user is allowed to seek beyond the current end-of-file (EOF).  The
+seek itself does not extend the file.  Writing at a position past EOF
+extends the file to the position being written, and any gap between the
+previous EOF and the start of the write must be filled with zeros.  A
+read starting at or past EOF returns no bytes.
+
+Writing far beyond EOF can cause many blocks to be entirely zero.  Some
+file systems allocate and write real data blocks for these implicitly
+zeroed blocks.  Other file systems do not allocate these blocks at all
+until they are explicitly written.  The latter file systems are said to
+support ``sparse files.''  You may adopt either allocation strategy in
+your file system.
+
  @node Problem 4-3 Subdirectories
  @section Problem 4-3: Subdirectories
  
@@ -128,8 +181,15 @@ only once).
  Make sure that directories can expand beyond their original size just
  as any other file can.
  
-To take advantage of hierarchical name spaces in user programs,
-provide the following syscalls:
+Each process has its own current directory.  When one process starts
+another with the @code{exec} system call, the child process inherits its
+parent's current directory.  After that, the two processes' current
+directories are independent, so that either process changing its own
+current directory has no effect on the other.
+
+Update the existing system calls so that, anywhere a file name is
+provided by the caller, an absolute or relative path name may be used.
+Also, implement the following new system calls:
  
  @table @code
  @item SYS_chdir
@@ -169,6 +229,18 @@ measured by the number of disk accesses.  (For example, metadata is
  generally more valuable to cache than data.)  Document your
  replacement algorithm in your design document.
  
+The provided file system code uses a ``bounce buffer'' in @struct{file}
+to translate the disk's sector-by-sector interface into the system call
+interface's byte-by-byte interface.  It needs per-file buffers because,
+without them, there's no other good place to put sector
+data.@footnote{The stack is not a good place because large objects
+should not be allocated on the stack.  A 512-byte sector is pushing the
+limit there.}  As part of implementing the buffer cache, you should get
+rid of these bounce buffers.  Instead, copy data into and out of sectors
+in the buffer cache directly.  You will probably need some
+synchronization to prevent sectors from being evicted from the cache
+while you are using them.
+
  In addition to the basic file caching scheme, your implementation
  should also include the following enhancements:
  
@@ -216,6 +288,9 @@ As always, submit a design document file summarizing your design.  Be
  sure to cover the following points:
  
  @itemize @bullet
+@item
+How did you choose to synchronize file system operations?
+
  @item
  How did you structure your inodes? How many blocks did you access
  directly, via single-indirection, and/or via double-indirection?  Why?
@@ -391,42 +466,34 @@ only be deleted if they are empty, as in Unix.
  
  @enumerate 1
  @item
-@b{We're limited to a 64-block cache, but can we also keep a copy of
-each @struct{inode} for an open file inside @struct{file},
-the way the stub code does?}
-
-No, you shouldn't keep any disk sectors stored anywhere outside the
-cache.  That means you'll have to change the way the file
-implementation accesses its corresponding inode right now, since it
-currently just creates a new @struct{inode} in its constructor
-and reads the corresponding sector in from disk when it's created.
-
-There are two reasons for not storing inodes in @struct{file}.
-First, keeping extra copies of inodes would be cheating the 64-block
-limitation that we place on your cache.  Second, if two processes have
-the same file open, you will create a huge synchronization headache
-for yourself if each @struct{file} has its own copy of the inode.
-
-Note that you can store pointers to inodes in @struct{file} if
-you want, and you can store some other small amount of information to
-help you find the inode when you need it.
-
-Similarly, if you want to store one block of data plus some small
-amount of metadata for each of your 64 cache entries, that's fine.
-
-@item
-@b{But why can't we store copies of inodes in @struct{file}? We
-don't understand the answer to the previous question.}
-
-The issue regarding storing @struct{inode}s has to do with
-implementation of the buffer cache.  Basically, you can't store a
-@code{struct inode *} in @struct{inode}.  Each time you need
-to read a @struct{inode}, you'll have to get it either from the
-buffer cache or from disk.
-
-If you look at @func{file_read_at}, it uses the inode directly
-without having first read in that sector from wherever it was in the
-storage hierarchy.  You are no longer allowed to do this.  You will
-need to change @code{file_read_at} (and similar functions) so that it
-reads the inode from the storage hierarchy before using it.
+@b{We're limited to a 64-block cache, but can we also keep an
+@struct{inode_disk} inside @struct{inode}, the way the provided code
+does?}
+
+The goal of the 64-block limit is to bound the amount of cached file
+system data.  If you keep a block of disk data---whether file data or
+metadata---anywhere in kernel memory, then you have to count it against
+the 64-block limit.  The same rule applies to anything that's
+``similar'' to a block of disk data, such as a @struct{inode_disk}
+without the @code{length} or @code{sector_cnt} members.
+
+That means you'll have to change the way the inode implementation
+accesses its corresponding on-disk inode, since it currently
+just embeds a @struct{inode_disk} in @struct{inode} and reads the
+corresponding sector in from disk when it's created.  Keeping extra
+copies of inodes would be cheating the 64-block limitation that we place
+on your cache.
+
+You can store pointers to inode data in @struct{inode}, if you want, and
+you can store some other small amount of information to help you find
+the inode when you need it.  Similarly, if you want to store one block
+of data plus some small amount of metadata for each of your 64 cache
+entries, that's fine.
+
+If you look at @func{inode_byte_to_sector}, it uses the
+@struct{inode_disk} directly without having first read in that sector
+from wherever it was in the storage hierarchy.  This will no longer
+work.  You will need to change @func{inode_byte_to_sector} so that it
+reads the @struct{inode_disk} from the storage hierarchy before using
+it.
  @end enumerate
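
Note on the seek-past-EOF rules added in the Problem 4-1 hunk: the sketch below models them on a small in-memory file, which may help when writing tests for the real implementation. Everything in it is hypothetical (the `toy_file` type, `TOY_CAPACITY`, `toy_init`, `toy_write_at`, `toy_read_at`); none of these names come from the Pintos sources. It also assumes that a zero-byte write leaves the file length unchanged, which the text does not state. The backing store is pre-filled with garbage so that the zero fill of the gap is visibly the write path's job.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size backing store standing in for disk blocks. */
#define TOY_CAPACITY 4096

struct toy_file {
    unsigned char data[TOY_CAPACITY];
    size_t length;                      /* current end-of-file (EOF) */
};

/* Start with an empty file whose storage holds garbage, not zeros. */
void toy_init(struct toy_file *f)
{
    assert(f != NULL);
    memset(f->data, 0xAA, sizeof f->data);
    f->length = 0;
}

/* Write SIZE bytes from BUF at OFFSET and return the number written.
   Writing past EOF extends the file; any gap between the old EOF and
   OFFSET is filled with zeros.  Assumption: a zero-byte write does not
   change the length. */
size_t toy_write_at(struct toy_file *f, const void *buf,
                    size_t size, size_t offset)
{
    assert(f != NULL && buf != NULL);
    if (offset >= TOY_CAPACITY)
        return 0;
    if (size > TOY_CAPACITY - offset)
        size = TOY_CAPACITY - offset;   /* short write at capacity */
    if (size == 0)
        return 0;
    if (offset > f->length)
        memset(f->data + f->length, 0, offset - f->length);
    memcpy(f->data + offset, buf, size);
    if (offset + size > f->length)
        f->length = offset + size;
    return size;
}

/* Read up to SIZE bytes at OFFSET into BUF and return the number read.
   A read starting at or past EOF returns no bytes and never extends
   the file. */
size_t toy_read_at(const struct toy_file *f, void *buf,
                   size_t size, size_t offset)
{
    assert(f != NULL && buf != NULL);
    if (offset >= f->length)
        return 0;
    if (size > f->length - offset)
        size = f->length - offset;      /* stop at EOF */
    memcpy(buf, f->data + offset, size);
    return size;
}
```

A sparse-file implementation would return the same bytes to readers while skipping allocation of the all-zero blocks, so tests written against this model should apply to either allocation strategy.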