pintos-os.org Git - pintos-anon/blob - doc/filesys.texi

   1 @node Project 4--File Systems, Multilevel Feedback Scheduling, Project 3--Virtual Memory, Top
   2 @chapter Project 4: File Systems
   3
   4 In the previous two assignments, you made extensive use of a
   5 filesystem without actually worrying about how it was implemented
   6 underneath.  For this last assignment, you will fill in the
   7 implementation of the filesystem.  You will be working primarily in
   8 the @file{filesys} directory.
   9
  10 You should build on the code you wrote for the previous assignments.
  11 However, if you wish, you may turn off your VM features, as they are
  12 not vital to making the filesystem work.  (You will need to edit
  13 @file{filesys/Makefile.vars} to fully disable VM.)  All of the
  14 functionality needed for project 2 (argument passing, syscalls and
  15 multiprogramming) must work in your filesys submission.
  16
  17 On the other hand, one of the particular charms of working on
  18 operating systems is being able to use what you build, and building
  19 full-featured systems.  Therefore, you should strive to make all the
  20 parts work together so that you can run VM and your filesystem at the
  21 same time.  Plus, keeping VM is a great way to stress-test your
  22 filesystem implementation.
  23
  24 Your submission should define @code{THREAD_JOIN_IMPLEMENTED} in
  25 @file{constants.h} (@pxref{Conditional Compilation}).
  26
  27 @menu
  28 * File System New Code::
  29 * Problem 4-1 Large Files::
  30 * Problem 4-2 File Growth::
  31 * Problem 4-3 Subdirectories::
  32 * Problem 4-4 Buffer Cache::
  33 * File System Design Document Requirements::
  34 * File System FAQs::
  35 @end menu
  36
  37 @node File System New Code
  38 @section New Code
  39
  40 Here are some files that are probably new to you.  These are in the
  41 @file{filesys} directory except where indicated:
  42
  43 @table @file
  44 @item fsutil.c
  45 Simple utilities for the filesystem that are accessible from the
  46 kernel command line.
  47
  48 @item filesys.h
  49 @itemx filesys.c
  50 Top-level interface to the file system.
  51
  52 @item directory.h
  53 @itemx directory.c
  54 Translates file names to disk file headers; the
  55 directory data structure is stored as a file.
  56
  57 @item filehdr.h
  58 @itemx filehdr.c
  59 Manages the data structure representing the layout of a
  60 file's data on disk.
  61
  62 @item file.h
  63 @itemx file.c
  64 Translates file reads and writes to disk sector reads
  65 and writes.
  66
  67 @item devices/disk.h
  68 @itemx devices/disk.c
  69 Provides access to the physical disk, abstracting away the rather
  70 awful IDE interface.
  71
  72 @item lib/kernel/bitmap.h
  73 @itemx lib/kernel/bitmap.c
  74 A bitmap data structure along with routines for reading and writing
  75 the bitmap to disk files.
  76 @end table
  77
  78 Our file system has a Unix-like interface, so you may also wish to
  79 read the Unix man pages for @code{creat}, @code{open}, @code{close},
  80 @code{read}, @code{write}, @code{lseek}, and @code{unlink}.  Our file
  81 system has calls that are similar, but not identical, to these.  The
  82 file system translates these calls into physical disk operations.
  83
  84 All the basic functionality is there in the code above, so that the
  85 filesystem is usable right off the bat.  In fact, you've been using it
  86 in the previous two projects.  However, it has severe limitations
  87 which you will remove.
  88
  89 While most of your work will be in @file{filesys}, you should be
  90 prepared for interactions with all previous parts (as usual).
  91
  92 @node Problem 4-1 Large Files
  93 @section Problem 4-1: Large Files
  94
  95 Modify the file system to allow the maximum size of a file to be as
  96 large as the disk.  You can assume that the disk will not be larger
  97 than 8 MB.  In the basic file system, each file is limited to a file
  98 size of just under 64 kB.  Each file has a header (@code{struct
  99 filehdr}) that is a table of direct pointers to the disk blocks for
 100 that file.  Since the header is stored in one disk sector, the maximum
 101 size of a file is limited by the number of pointers that will fit in
 102 one disk sector.  Increasing the limit to 8 MB will require you to
 103 implement doubly-indirect blocks.
 104
 105 @node Problem 4-2 File Growth
 106 @section Problem 4-2: File Growth
 107
 108 Implement extensible files.  In the basic file system, the file size
 109 is specified when the file is created.  One advantage of this is that
 110 the FileHeader data structure, once created, never changes.  In UNIX
 111 and most other file systems, a file is initially created with size 0
 112 and is then expanded every time a write is made off the end of the
 113 file.  Modify the file system to allow this.  As one test case, allow
 114 the root directory file to expand beyond its current limit of ten
 115 files.  Make sure that concurrent accesses to the file header remain
 116 properly synchronized.
 117
 118 @node Problem 4-3 Subdirectories
 119 @section Problem 4-3: Subdirectories
 120
 121 Implement a hierarchical name space.  In the basic file system, all
 122 files live in a single directory.  Modify this to allow directories to
 123 point to either files or other directories.  To do this, you will need
 124 to implement routines that parse path names into a sequence of
 125 directories, as well as routines that change the current working
 126 directory and that list the contents of the current directory.  For
 127 performance, allow concurrent updates to different directories, but
 128 use mutual exclusion to ensure that updates to the same directory are
 129 performed atomically (for example, to ensure that a file is deleted
 130 only once).
 131
 132 Make sure that directories can expand beyond their original size just
 133 as any other file can.
 134
 135 To take advantage of hierarchical name spaces in user programs,
 136 provide the following syscalls:
 137
 138 @table @code
 139 @item SYS_chdir
 140 @itemx bool chdir (const char *@var{dir})
 141 Attempts to change the current working directory of the process to
 142 @var{dir}, which may be either relative or absolute.  Returns true if
 143 successful, false on failure.
 144
 145 @item SYS_mkdir
 146 @itemx bool mkdir (const char *dir)
 147 Attempts to create the directory named @var{dir}, which may be either
 148 relative or absolute.  Returns true if successful, false on failure.
 149
 150 @item SYS_lsdir
 151 @itemx void lsdir (void)
 152 Prints a list of files in the current directory to @code{stdout}, one
 153 per line.
 154 @end table
 155
 156 Also write the @command{ls} and @command{mkdir} user programs.  This
 157 is straightforward once the above syscalls are implemented.  If Unix,
 158 these are programs rather than built-in shell commands, but
 159 @command{cd} is a shell command.  (Why?)
 160
 161 @node Problem 4-4 Buffer Cache
 162 @section Problem 4-4: Buffer Cache
 163
 164 Modify the file system to keep a cache of file blocks.  When a request
 165 is made to read or write a block, check to see if it is stored in the
 166 cache, and if so, fetch it immediately from the cache without going to
 167 disk.  (Otherwise, fetch the block from disk into cache, evicting an
 168 older entry if necessary.)  You are limited to a cache no greater than
 169 64 sectors in size.  Be sure to choose an intelligent cache
 170 replacement algorithm.  Experiment to see what combination of use,
 171 dirty, and other information results in the best performance, as
 172 measured by the number of disk accesses.  (For example, metadata is
 173 generally more valuable to cache than data.)  Document your
 174 replacement algoritm in your design document.
 175
 176 In addition to the basic file caching scheme, your implementation
 177 should also include the following enhancements:
 178
 179 @table @b
 180 @item write-behind:
 181 Instead of always immediately writing modified data to disk, dirty
 182 blocks can be kept in the cache and written out sometime later.  Your
 183 buffer cache should write behind whenever a block is evicted from the
 184 cache.
 185
 186 @item read-ahead:
 187 Your buffer cache should automatically fetch the next block of a file
 188 into the cache when one block of a file is read, in case that block is
 189 about to be read.
 190 @end table
 191
 192 For each of these three optimizations, design a file I/O workload that
 193 is likely to benefit from the enhancement, explain why you expect it
 194 to perform better than on the original file system implementation, and
 195 demonstrate the performance improvement.
 196
 197 Note that write-behind makes your filesystem more fragile in the face
 198 of crashes.  Therefore, you should implement some manner to
 199 periodically write all cached blocks to disk.  If you have
 200 @code{timer_sleep()} from the first project working, this is an
 201 excellent application for it.
 202
 203 Likewise, read-ahead is only really useful when done asynchronously.
 204 That is, if a process wants disk block 1 from the file, it needs to
 205 block until disk block 1 is read in, but once that read is complete,
 206 control should return to the process immediately while the read
 207 request for disk block 2 is handled asynchronously.  In other words,
 208 the process will block to wait for disk block 1, but should not block
 209 waiting for disk block 2.
 210
 211 When you're implementing this, please make sure you have a scheme for
 212 making any read-ahead and write-behind threads halt when Pintos is
 213 ``done'' (when the user program has completed, etc), so that Pintos
 214 will halt normally and the disk contents will be consistent.
 215
 216 @node File System Design Document Requirements
 217 @section Design Document Requirements
 218
 219 As always, submit a design document file summarizing your design.  Be
 220 sure to cover the following points:
 221
 222 @itemize @bullet
 223 @item
 224 How did you structure your inodes? How many blocks did you access
 225 directly, via single-indirection, and/or via double-indirection?  Why?
 226
 227 @item
 228 How did you structure your buffer cache? How did you perform a lookup
 229 in the cache? How did you choose elements to evict from the cache?
 230
 231 @item
 232 How and when did you flush the cache?
 233 @end itemize
 234
 235 @node File System FAQs
 236 @section FAQ
 237
 238 @enumerate 1
 239 @item
 240 @b{What extra credit opportunities are available for this assignment?}
 241
 242 @itemize @bullet
 243 @item
 244 We'll give out extra credit to groups that implement Unix-style
 245 support for @file{.} and @file{..} in relative paths in their projects.
 246
 247 @item
 248 We'll give some extra credit if you submit with VM enabled.  If you do
 249 this, make sure you show us that you can run multiple programs
 250 concurrently.  A particularly good demonstration is running
 251 @file{capitalize} (with a reduced words file that fits comfortably on
 252 your disk, of course).  So submit a file system disk that contains a
 253 VM-heavy program like @file{capitalize}, so we can try it out.  And also
 254 include the results in your test case file.
 255
 256 We feel that you will be much more satisfied with your cs140 ``final
 257 product'' if you can get your VM working with your file system.  It's
 258 also a great stress test for your FS, but obviously you have to be
 259 pretty confident with your VM if you're going to submit this extra
 260 credit, since you'll still lose points for failing FS-related tests,
 261 even if the problem is in your VM code.
 262
 263 @item
 264 A point of extra credit can be assigned if a user can recursively
 265 remove directories from the shell command prompt.  Note that the
 266 typical semantic is to just fail if a directory is not empty.
 267 @end itemize
 268
 269 Make sure that you discuss any extra credit in your @file{README}
 270 file.  We're likely to miss it if it gets buried in your design
 271 document.
 272
 273 @item
 274 @b{What exec modes for running Pintos do I absolutely need to
 275 support?}
 276
 277 You also need to support the @option{-f}, @option{-ci}, and
 278 @option{-ex} flags individually, and you need to handle them when
 279 they're combined, like this: @samp{pintos -f -ci shell 12345 -ex
 280 "shell"}.  Thus, you should be able to treat the above as equivalent
 281 to:
 282
 283 @example
 284 pintos -f
 285 pintos -ci shell 12345
 286 pintos -ex "shell"
 287 @end example
 288
 289 If you don't change the filesystem interface, then this should already
 290 be implemented properly in @file{threads/init.c} and
 291 @file{filesys/fsutil.c}.
 292
 293 @item
 294 @b{Will you test our file system with a different @code{DISK_SECTOR_SIZE}?}
 295
 296 No, @code{DISK_SECTOR_SIZE} will not change.
 297
 298 @item
 299 @b{Will the @code{struct filehdr} take up space on the disk too?}
 300
 301 Yes.  Anything stored in @code{struct filehdr} takes up space on disk,
 302 so you must include this in your calculation of how many entires will
 303 fit in a single disk sector.
 304
 305 @item
 306 File Growth FAQs
 307
 308 @enumerate 1
 309 @item
 310 @b{What is the largest file size that we are supposed to support?}
 311
 312 The disk we create will be 8 MB or smaller.  However, individual files
 313 will have to be smaller than the disk to accommodate the metadata.
 314 You'll need to consider this when deciding your @code{struct filehdr}
 315 (inode) organization.
 316 @end enumerate
 317
 318 @item
 319 Subdirectory FAQs
 320
 321 @enumerate 1
 322 @item
 323 @b{What's the answer to the question in the spec about why
 324 @command{ls} and @command{mkdir} are user programs, while @command{cd}
 325 is a shell command?}
 326
 327 Each process maintains its own current working directory, so it's much
 328 easier to change the current working directory of the shell process if
 329 @command{cd} is implemented as a shell command rather than as another
 330 user process.  In fact, Unix-like systems don't provide any way for
 331 one process to change another process's current working directory.
 332
 333 @item
 334 @b{When the spec states that directories should be able to grow beyond
 335 ten files, does this mean that there can still be a set maximum number
 336 of files per directory that is greater than ten, or should directories
 337 now support unlimited growth (bounded by the maximum supported file
 338 size)?}
 339
 340 We're looking for directories that can support arbitrarily large
 341 numbers of files.  Now that directories can grow, we want you to
 342 remove the concept of a preset maximum file limit.
 343
 344 @item
 345 @b{When should the @code{lsdir} system call return?}
 346
 347 The @code{lsdir} system call should not return until after the
 348 directory has been printed.  Here's a code fragment, and the desired
 349 output:
 350
 351 @example
 352 printf ("Start of directory\n");
 353 lsdir ();
 354 printf ("End of directory\n");
 355 @end example
 356
 357 This code should create the following output:
 358
 359 @example
 360 Start of directory
 361 ...  directory contents ...
 362 End of directory
 363 @end example
 364
 365 @item
 366 @b{Do we have to implement both absolute and relative pathnames?}
 367
 368 Yes.  Implementing @file{.} and @file{..} is extra credit, though.
 369
 370 @item
 371 @b{Should @code{remove()} also be able to remove directories?}
 372
 373 Yes.  The @code{remove} system call should handle removal of both
 374 regular files and directories.  You may assume that directories can
 375 only be deleted if they are empty, as in Unix.
 376 @end enumerate
 377
 378 @item
 379 Buffer Cache FAQs
 380
 381 @enumerate 1
 382 @item
 383 @b{We're limited to a 64-block cache, but can we also keep a copy of
 384 each @code{struct filehdr} for an open file inside @code{struct file},
 385 the way the stub code does?}
 386
 387 No, you shouldn't keep any disk sectors stored anywhere outside the
 388 cache.  That means you'll have to change the way the file
 389 implementation accesses its corresponding inode right now, since it
 390 currently just creates a new @code{struct filehdr} in its constructor
 391 and reads the corresponding sector in from disk when it's created.
 392
 393 There are two reasons for not storing inodes in @code{struct file}.
 394 First, keeping extra copies of inodes would be cheating the 64-block
 395 limitation that we place on your cache.  Second, if two processes have
 396 the same file open, you will create a huge synchronization headache
 397 for yourself if each @code{struct file} has its own copy of the inode.
 398
 399 Note that you can store pointers to inodes in @code{struct file} if
 400 you want, and you can store some other small amount of information to
 401 help you find the inode when you need it.
 402
 403 Similarly, if you want to store one block of data plus some small
 404 amount of metadata for each of your 64 cache entries, that's fine.
 405
 406 @item
 407 @b{But why can't we store copies of inodes in @code{struct file}? We
 408 don't understand the answer to the previous question.}
 409
 410 The issue regarding storing @code{struct filehdr}s has to do with
 411 implementation of the buffer cache.  Basically, you can't store a
 412 @code{struct filehdr *} in @code{struct filehdr}.  Each time you need
 413 to read a @code{struct filehdr}, you'll have to get it either from the
 414 buffer cache or from disk.
 415
 416 If you look at @code{file_read_at()}, it uses @code{hdr} directly
 417 without having first read in that sector from wherever it was in the
 418 storage hierarchy.  You are no longer allowed to do this.  You will
 419 need to change @code{file_read_at} (and similar functions) so that it
 420 reads @code{hdr} from the storage hierarchy before using it.
 421 @end enumerate
 422 @end enumerate