@node Project 4--File Systems, References, Project 3--Virtual Memory, Top
@chapter Project 4: File Systems

In the previous two assignments, you made extensive use of a
filesystem without actually worrying about how it was implemented
underneath.  For this last assignment, you will fill in the
implementation of the filesystem.  You will be working primarily in
the @file{filesys} directory.

You should build on the code you wrote for the previous assignments.
However, if you wish, you may turn off your VM features, as they are
not vital to making the filesystem work.  (You will need to edit
@file{filesys/Makefile.vars} to fully disable VM.)  All of the
functionality needed for project 2 (argument passing, syscalls and
multiprogramming) must work in your filesys submission.

On the other hand, one of the particular charms of working on
operating systems is being able to use what you build, and building
full-featured systems.  Therefore, you should strive to make all the
parts work together so that you can run VM and your filesystem at the
same time.  Plus, keeping VM is a great way to stress-test your
filesystem implementation.

Your submission should define @code{THREAD_JOIN_IMPLEMENTED} in
@file{constants.h} (@pxref{Conditional Compilation}).

@menu
* File System New Code::
* Problem 4-1 Large Files::
* Problem 4-2 File Growth::
* Problem 4-3 Subdirectories::
* Problem 4-4 Buffer Cache::
* File System Design Document Requirements::
* File System FAQ::
@end menu

@node File System New Code
@section New Code

Here are some files that are probably new to you.  These are in the
@file{filesys} directory except where indicated:

@table @file
@item fsutil.c
Simple utilities for the filesystem that are accessible from the
kernel command line.

@item filesys.h
@itemx filesys.c
Top-level interface to the file system.

@item directory.h
@itemx directory.c
Translates file names to inodes.  The directory data structure is
stored as a file.

@item inode.h
@itemx inode.c
Manages the data structure representing the layout of a
file's data on disk.

@item file.h
@itemx file.c
Translates file reads and writes to disk sector reads
and writes.

@item lib/kernel/bitmap.h
@itemx lib/kernel/bitmap.c
A bitmap data structure along with routines for reading and writing
the bitmap to disk files.
@end table

Our file system has a Unix-like interface, so you may also wish to
read the Unix man pages for @code{creat}, @code{open}, @code{close},
@code{read}, @code{write}, @code{lseek}, and @code{unlink}.  Our file
system has calls that are similar, but not identical, to these.  The
file system translates these calls into physical disk operations.

All the basic functionality is there in the code above, so that the
filesystem is usable right off the bat.  In fact, you've been using it
in the previous two projects.  However, it has severe limitations
which you will remove.

While most of your work will be in @file{filesys}, you should be
prepared for interactions with all previous parts (as usual).

@node Problem 4-1 Large Files
@section Problem 4-1: Large Files

Modify the file system to allow the maximum size of a file to be as
large as the disk.  You can assume that the disk will not be larger
than 8 MB.  In the basic file system, each file is limited to a file
size of just under 64 kB.  Each file has a header called an index node
or @dfn{inode} (represented by @struct{inode}) that is a table of
direct pointers to the disk blocks for that file.  Since the inode is
stored in one disk sector, the maximum size of a file is limited by
the number of pointers that will fit in one disk sector.  Increasing
the limit to 8 MB will require you to implement doubly-indirect
blocks.

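
The arithmetic behind that last claim can be sketched in C.  The sketch
assumes 4-byte sector numbers, so that 128 pointers fit in a 512-byte
sector, and a hypothetical inode layout with some direct pointers, one
indirect pointer, and one doubly-indirect pointer.  The names
@code{level_for_offset} and @code{max_file_size} are illustrative only;
they are not part of the provided code, and your layout may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DISK_SECTOR_SIZE 512
/* Assumption: sector numbers are 4 bytes wide, giving 128 per sector. */
#define PTRS_PER_SECTOR (DISK_SECTOR_SIZE / sizeof (uint32_t))

/* Hypothetical index levels for an inode with direct pointers, one
   indirect pointer, and one doubly-indirect pointer. */
enum index_level
  {
    LEVEL_DIRECT,
    LEVEL_INDIRECT,
    LEVEL_DOUBLY_INDIRECT,
    LEVEL_TOO_BIG
  };

/* Returns the index level that maps the sector containing byte
   offset POS, given N_DIRECT direct pointers in the inode. */
enum index_level
level_for_offset (size_t pos, size_t n_direct)
{
  size_t sector = pos / DISK_SECTOR_SIZE;

  /* The direct pointers must themselves fit in the inode's sector. */
  assert (n_direct < PTRS_PER_SECTOR);

  if (sector < n_direct)
    return LEVEL_DIRECT;
  sector -= n_direct;
  if (sector < PTRS_PER_SECTOR)
    return LEVEL_INDIRECT;
  sector -= PTRS_PER_SECTOR;
  if (sector < PTRS_PER_SECTOR * PTRS_PER_SECTOR)
    return LEVEL_DOUBLY_INDIRECT;
  return LEVEL_TOO_BIG;
}

/* Largest file, in bytes, that this hypothetical layout can map. */
size_t
max_file_size (size_t n_direct)
{
  return (n_direct + PTRS_PER_SECTOR + PTRS_PER_SECTOR * PTRS_PER_SECTOR)
         * DISK_SECTOR_SIZE;
}
```

Under these assumptions one indirect block covers 128 * 512 bytes = 64 kB,
while one doubly-indirect block covers 128 * 128 * 512 bytes = 8 MB, which
is enough for the largest disk you need to support.
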
@node Problem 4-2 File Growth
@section Problem 4-2: File Growth

Implement extensible files.  In the basic file system, the file size
is specified when the file is created.  One advantage of this is that
the inode data structure, once created, never changes.  In UNIX and
most other file systems, a file is initially created with size 0 and
is then expanded every time a write is made off the end of the file.
Modify the file system to allow this.  As one test case, allow the
root directory file to expand beyond its current limit of ten files.
Make sure that concurrent accesses to the inode remain properly
synchronized.

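
As a rough illustration of the bookkeeping involved in growing a file,
the following sketch computes how many additional data sectors a write
needs.  It counts data sectors only and ignores any index blocks that
your inode layout may also have to allocate.  @code{sectors_to_grow} is
a hypothetical helper introduced for this example, not part of the
provided code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DISK_SECTOR_SIZE 512

/* Number of sectors needed to hold SIZE bytes, rounding up. */
static size_t
bytes_to_sectors (size_t size)
{
  return (size + DISK_SECTOR_SIZE - 1) / DISK_SECTOR_SIZE;
}

/* Returns the number of extra data sectors needed when SIZE bytes
   are written at byte OFFSET of a file that is currently OLD_LENGTH
   bytes long.  Returns 0 if the write does not extend the file or
   if the extension still fits in the file's last sector. */
size_t
sectors_to_grow (size_t old_length, size_t offset, size_t size)
{
  size_t new_length;

  assert (offset <= SIZE_MAX - size);   /* No arithmetic overflow. */
  new_length = offset + size;

  if (size == 0 || new_length <= old_length)
    return 0;
  return bytes_to_sectors (new_length) - bytes_to_sectors (old_length);
}
```

For example, writing one byte at offset 1024 of an empty file needs
three sectors in this model, because the gap before the written byte
has to be backed by sectors as well.
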
@node Problem 4-3 Subdirectories
@section Problem 4-3: Subdirectories

Implement a hierarchical name space.  In the basic file system, all
files live in a single directory.  Modify this to allow directories to
point to either files or other directories.  To do this, you will need
to implement routines that parse path names into a sequence of
directories, as well as routines that change the current working
directory and that list the contents of the current directory.  For
performance, allow concurrent updates to different directories, but
use mutual exclusion to ensure that updates to the same directory are
performed atomically (for example, to ensure that a file is deleted
only once).

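
One possible way to split a path name into its components is sketched
below, using only standard C string functions.  The helper names
@code{next_component} and @code{path_is_absolute}, and the 14-character
component limit @code{NAME_MAX_LEN}, are assumptions made for this
example.  Walking the directories named by each component is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Assumed maximum length of one path component. */
#define NAME_MAX_LEN 14

/* Returns true if PATH starts at the root directory. */
bool
path_is_absolute (const char *path)
{
  return path[0] == '/';
}

/* Copies the next component of *PATHP into COMPONENT and advances
   *PATHP past it.  Runs of slashes count as a single separator.
   Returns 1 if a component was extracted, 0 at the end of the path,
   or -1 if the component is longer than NAME_MAX_LEN. */
int
next_component (const char **pathp, char component[NAME_MAX_LEN + 1])
{
  const char *p;
  size_t len;

  assert (pathp != NULL && *pathp != NULL);

  p = *pathp;
  p += strspn (p, "/");        /* Skip leading separators. */
  len = strcspn (p, "/");      /* Length up to next separator. */
  if (len == 0)
    return 0;
  if (len > NAME_MAX_LEN)
    return -1;

  memcpy (component, p, len);
  component[len] = '\0';
  *pathp = p + len;
  return 1;
}
```

A caller would typically pick the starting directory with
@code{path_is_absolute}, then call @code{next_component} in a loop,
looking up each component in the directory found so far.
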
Make sure that directories can expand beyond their original size just
as any other file can.

To take advantage of hierarchical name spaces in user programs,
provide the following syscalls:

@table @code
@item SYS_chdir
@itemx bool chdir (const char *@var{dir})
Attempts to change the current working directory of the process to
@var{dir}, which may be either relative or absolute.  Returns true if
successful, false on failure.

@item SYS_mkdir
@itemx bool mkdir (const char *@var{dir})
Attempts to create the directory named @var{dir}, which may be either
relative or absolute.  Returns true if successful, false on failure.

@item SYS_lsdir
@itemx void lsdir (void)
Prints a list of files in the current directory to @code{stdout}, one
per line.
@end table

Also write the @command{ls} and @command{mkdir} user programs.  This
is straightforward once the above syscalls are implemented.  In Unix,
these are programs rather than built-in shell commands, but
@command{cd} is a shell command.  (Why?)

@node Problem 4-4 Buffer Cache
@section Problem 4-4: Buffer Cache

Modify the file system to keep a cache of file blocks.  When a request
is made to read or write a block, check to see if it is stored in the
cache, and if so, fetch it immediately from the cache without going to
disk.  (Otherwise, fetch the block from disk into cache, evicting an
older entry if necessary.)  You are limited to a cache no greater than
64 sectors in size.  Be sure to choose an intelligent cache
replacement algorithm.  Experiment to see what combination of accessed,
dirty, and other information results in the best performance, as
measured by the number of disk accesses.  (For example, metadata is
generally more valuable to cache than data.)  Document your
replacement algorithm in your design document.

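
As a starting point for such experiments, here is a minimal sketch of
the clock (second-chance) algorithm, which uses only an accessed bit.
It is one candidate policy among many, not a recommended or required
design.  The structure @code{struct cache_entry} and the function
@code{cache_pick_victim} are hypothetical names, and the sketch omits
synchronization and the actual disk I/O.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CACHE_SIZE 64           /* Limit from the assignment. */

/* Hypothetical per-slot metadata; the sector data itself is omitted. */
struct cache_entry
  {
    bool in_use;                /* Slot holds a valid sector. */
    bool accessed;              /* Referenced since last sweep. */
    bool dirty;                 /* Must be written back before reuse. */
    unsigned sector;            /* Disk sector cached in this slot. */
  };

static struct cache_entry cache[CACHE_SIZE];
static size_t clock_hand;

/* Picks the slot to reuse.  A free slot or a slot whose accessed bit
   is clear is chosen at once.  A slot whose accessed bit is set has
   the bit cleared and is passed over, so it survives one more sweep.
   The caller must write the chosen slot back first if it is dirty. */
size_t
cache_pick_victim (void)
{
  for (;;)
    {
      size_t idx = clock_hand;
      struct cache_entry *e = &cache[idx];

      assert (idx < CACHE_SIZE);
      clock_hand = (clock_hand + 1) % CACHE_SIZE;

      if (!e->in_use || !e->accessed)
        return idx;
      e->accessed = false;      /* Give it a second chance. */
    }
}
```

The loop always terminates within two sweeps, because the first sweep
clears every accessed bit it passes.  A policy that also weighs the
dirty bit or favors metadata sectors could be built on the same scan.
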
The provided file system code uses a ``bounce buffer'' in @struct{file}
to translate the disk's sector-by-sector interface into the system call
interface's byte-by-byte interface.  It needs per-file buffers because,
without them, there's no other good place to put sector
data.@footnote{The stack is not a good place because large objects
should not be allocated on the stack.  A 512-byte sector is pushing the
limit there.}  As part of implementing the buffer cache, you should get
rid of these bounce buffers.  Instead, copy data into and out of sectors
in the buffer cache directly.  You will probably need some
synchronization to prevent sectors from being evicted from the cache
while you are using them.

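
The byte-to-sector translation itself can be sketched as follows.  Here
a small in-memory array, @code{fake_cache}, stands in for sectors held
in the buffer cache, and @code{cached_read_at} is a hypothetical
function name.  A real implementation would look each sector up in the
cache and keep it from being evicted during the copy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DISK_SECTOR_SIZE 512
#define N_SECTORS 4             /* Size of the stand-in "file". */

/* Stand-in for cached sectors; not how a real cache is organized. */
static unsigned char fake_cache[N_SECTORS][DISK_SECTOR_SIZE];

/* Copies up to SIZE bytes starting at byte OFFSET into BUFFER_,
   one per-sector chunk at a time, with no intermediate bounce
   buffer.  Returns the number of bytes copied, which is less than
   SIZE if the end of the stand-in file is reached. */
size_t
cached_read_at (void *buffer_, size_t size, size_t offset)
{
  unsigned char *buffer = buffer_;
  size_t done = 0;

  assert (buffer != NULL || size == 0);

  while (done < size)
    {
      size_t sector_idx = offset / DISK_SECTOR_SIZE;
      size_t sector_ofs = offset % DISK_SECTOR_SIZE;
      size_t sector_left = DISK_SECTOR_SIZE - sector_ofs;
      size_t chunk = size - done < sector_left ? size - done : sector_left;

      if (sector_idx >= N_SECTORS)
        break;                  /* Past the end of the file. */

      /* Copy straight out of the cached sector. */
      memcpy (buffer + done, fake_cache[sector_idx] + sector_ofs, chunk);
      done += chunk;
      offset += chunk;
    }
  return done;
}
```

A read that straddles a sector boundary is simply split into two
copies, one from the tail of the first sector and one from the head of
the next.
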
In addition to the basic file caching scheme, your implementation
should also include the following enhancements:

@table @b
@item write-behind:
Instead of always immediately writing modified data to disk, dirty
blocks can be kept in the cache and written out sometime later.  Your
buffer cache should write behind whenever a block is evicted from the
cache.

@item read-ahead:
Your buffer cache should automatically fetch the next block of a file
into the cache when one block of a file is read, in case that block is
about to be read.
@end table

For each of the three optimizations (the basic buffer cache,
write-behind, and read-ahead), design a file I/O workload that is
likely to benefit from the enhancement, explain why you expect it to
perform better than on the original file system implementation, and
demonstrate the performance improvement.

Note that write-behind makes your filesystem more fragile in the face
of crashes.  Therefore, you should periodically write all cached
blocks to disk.  If you have @func{timer_sleep} from the first project
working, this is an excellent application for it.

Likewise, read-ahead is only really useful when done asynchronously.
That is, if a process wants disk block 1 from the file, it needs to
block until disk block 1 is read in, but once that read is complete,
control should return to the process immediately while the read
request for disk block 2 is handled asynchronously.  In other words,
the process will block to wait for disk block 1, but should not block
waiting for disk block 2.

When you're implementing this, please make sure you have a scheme for
making any read-ahead and write-behind threads halt when Pintos is
``done'' (when the user program has completed, etc.), so that Pintos
will halt normally and the disk contents will be consistent.

@node File System Design Document Requirements
@section Design Document Requirements

As always, submit a design document file summarizing your design.  Be
sure to cover the following points:

@itemize @bullet
@item
How did you structure your inodes? How many blocks did you access
directly, via single-indirection, and/or via double-indirection?  Why?

@item
How did you structure your buffer cache? How did you perform a lookup
in the cache? How did you choose elements to evict from the cache?

@item
How and when did you flush the cache?
@end itemize

@node File System FAQ
@section FAQ

@enumerate 1
@item
@b{What extra credit opportunities are available for this assignment?}

@itemize @bullet
@item
We'll give out extra credit to groups that implement Unix-style
support for @file{.} and @file{..} in relative paths in their projects.

@item
We'll give some extra credit if you submit with VM enabled.  If you do
this, make sure you show us that you can run multiple programs
concurrently.  A particularly good demonstration is running
@file{capitalize} (with a reduced words file that fits comfortably on
your disk, of course).  Submit a file system disk that contains a
VM-heavy program like @file{capitalize} so we can try it out, and
include the results in your test case file.

We feel that you will be much more satisfied with your cs140 ``final
product'' if you can get your VM working with your file system.  It's
also a great stress test for your FS, but obviously you have to be
pretty confident with your VM if you're going to submit this extra
credit, since you'll still lose points for failing FS-related tests,
even if the problem is in your VM code.

@item
A point of extra credit can be assigned if a user can recursively
remove directories from the shell command prompt.  Note that the
typical semantics is to just fail if a directory is not empty.
@end itemize

Make sure that you discuss any extra credit in your @file{README}
file.  We're likely to miss it if it gets buried in your design
document.

@item
@b{What exec modes for running Pintos do I absolutely need to
support?}

You need to support the @option{-f}, @option{-ci}, @option{-co},
and @option{-ex} flags individually, and you need to handle them when
they're combined, like this: @samp{pintos -f -ci shell 12345 -ex
"shell"}.  Thus, you should be able to treat the above as equivalent to:

@example
pintos -f
pintos -ci shell 12345
pintos -ex "shell"
@end example

If you don't change the filesystem interface, then this should already
be implemented properly in @file{threads/init.c} and
@file{filesys/fsutil.c}.

You must also implement the @option{-q} option and make sure that data
gets flushed out to disk properly when it is used.

@item
@b{Will you test our file system with a different @code{DISK_SECTOR_SIZE}?}

No, @code{DISK_SECTOR_SIZE} is fixed at 512.  This is a fixed property
of IDE disk hardware.

@item
@b{Will the @struct{inode} take up space on the disk too?}

Yes.  Anything stored in @struct{inode} takes up space on disk,
so you must include this in your calculation of how many entries will
fit in a single disk sector.

@item
@b{What's the directory separator character?}

Forward slash (@samp{/}).
@end enumerate

@menu
* Problem 4-2 File Growth FAQ::
* Problem 4-3 Subdirectory FAQ::
* Problem 4-4 Buffer Cache FAQ::
@end menu

@node Problem 4-2 File Growth FAQ
@subsection Problem 4-2: File Growth FAQ

@enumerate 1
@item
@b{What is the largest file size that we are supposed to support?}

The disk we create will be 8 MB or smaller.  However, individual files
will have to be smaller than the disk to accommodate the metadata.
You'll need to consider this when deciding your @struct{inode}
organization.
@end enumerate

@node Problem 4-3 Subdirectory FAQ
@subsection Problem 4-3: Subdirectory FAQ

@enumerate 1
@item
@b{What's the answer to the question in the spec about why
@command{ls} and @command{mkdir} are user programs, while @command{cd}
is a shell command?}

Each process maintains its own current working directory, so it's much
easier to change the current working directory of the shell process if
@command{cd} is implemented as a shell command rather than as another
user process.  In fact, Unix-like systems don't provide any way for
one process to change another process's current working directory.

@item
@b{When the spec states that directories should be able to grow beyond
ten files, does this mean that there can still be a set maximum number
of files per directory that is greater than ten, or should directories
now support unlimited growth (bounded by the maximum supported file
size)?}

We're looking for directories that can support arbitrarily large
numbers of files.  Now that directories can grow, we want you to
remove the concept of a preset maximum file limit.

@item
@b{When should the @code{lsdir} system call return?}

The @code{lsdir} system call should not return until after the
directory has been printed.  Here's a code fragment, and the desired
output:

@example
printf ("Start of directory\n");
lsdir ();
printf ("End of directory\n");
@end example

This code should create the following output:

@example
Start of directory
...  directory contents ...
End of directory
@end example

@item
@b{Do we have to implement both absolute and relative pathnames?}

Yes.  Implementing @file{.} and @file{..} is extra credit, though.

@item
@b{Should @func{remove} also be able to remove directories?}

Yes.  The @code{remove} system call should handle removal of both
regular files and directories.  You may assume that directories can
only be deleted if they are empty, as in Unix.
@end enumerate

@node Problem 4-4 Buffer Cache FAQ
@subsection Problem 4-4: Buffer Cache FAQ

@enumerate 1
@item
@b{We're limited to a 64-block cache, but can we also keep a copy of
each @struct{inode} for an open file inside @struct{file},
the way the stub code does?}

No, you shouldn't keep any disk sectors stored anywhere outside the
cache.  That means you'll have to change the way the file
implementation accesses its corresponding inode right now, since it
currently just creates a new @struct{inode} in its constructor
and reads the corresponding sector in from disk when it's created.

There are two reasons for not storing inodes in @struct{file}.
First, keeping extra copies of inodes would be cheating the 64-block
limitation that we place on your cache.  Second, if two processes have
the same file open, you will create a huge synchronization headache
for yourself if each @struct{file} has its own copy of the inode.

Note that you can store pointers to inodes in @struct{file} if
you want, and you can store some other small amount of information to
help you find the inode when you need it.

Similarly, if you want to store one block of data plus some small
amount of metadata for each of your 64 cache entries, that's fine.

@item
@b{But why can't we store copies of inodes in @struct{file}? We
don't understand the answer to the previous question.}

The issue regarding storing @struct{inode}s has to do with the
implementation of the buffer cache.  Basically, you can't store a
copy of a @struct{inode} in @struct{file}.  Each time you need
to read a @struct{inode}, you'll have to get it either from the
buffer cache or from disk.

If you look at @func{file_read_at}, it uses the inode directly
without having first read in that sector from wherever it was in the
storage hierarchy.  You are no longer allowed to do this.  You will
need to change @func{file_read_at} (and similar functions) so that it
reads the inode from the storage hierarchy before using it.
@end enumerate