pintos-os.org Git - pintos-anon/blob - doc/debug.texi

   1 @node Debugging Tools, Development Tools, Project Documentation, Top
   2 @appendix Debugging Tools
   3
   4 Many tools lie at your disposal for debugging Pintos.  This appendix
   5 introduces you to a few of them.
   6
   7 @menu
   8 * printf::
   9 * ASSERT::
  10 * Function and Parameter Attributes::
  11 * Backtraces::
  12 * gdb::
  13 * Debugging by Infinite Loop::
  14 * Modifying Bochs::
  15 * Debugging Tips::
  16 @end menu
  17
  18 @node printf
  19 @section @code{printf()}
  20
  21 Don't underestimate the value of @func{printf}.  The way
  22 @func{printf} is implemented in Pintos, you can call it from
  23 practically anywhere in the kernel, whether it's in a kernel thread or
  24 an interrupt handler, almost regardless of what locks are held (but see
  25 @ref{printf Reboots} for a counterexample).
  26
  27 @func{printf} is useful for more than just examining data.
  28 It can also help figure out when and where something goes wrong, even
  29 when the kernel crashes or panics without a useful error message.  The
  30 strategy is to sprinkle calls to @func{print} with different strings
  31 (e.g.@: @code{"<1>"}, @code{"<2>"}, @dots{}) throughout the pieces of
  32 code you suspect are failing.  If you don't even see @code{<1>} printed,
  33 then something bad happened before that point, if you see @code{<1>}
  34 but not @code{<2>}, then something bad happened between those two
  35 points, and so on.  Based on what you learn, you can then insert more
  36 @func{printf} calls in the new, smaller region of code you suspect.
  37 Eventually you can narrow the problem down to a single statement.
  38 @xref{Debugging by Infinite Loop}, for a related technique.
  39
  40 @node ASSERT
  41 @section @code{ASSERT}
  42
  43 Assertions are useful because they can catch problems early, before
  44 they'd otherwise be noticed.  Pintos provides the
  45 @code{ASSERT}, defined in @file{<debug.h>}, for assertions.
  46 Ideally, each function should begin with a set of
  47 assertions that check its arguments for validity.  (Initializers for
  48 functions' local variables are evaluated before assertions are
  49 checked, so be careful not to assume that an argument is valid in an
  50 initializer.)  You can also sprinkle assertions throughout the body of
  51 functions in places where you suspect things are likely to go wrong.
  52 They are especially useful for checking loop invariants.
  53
  54 When an assertion proves untrue, the kernel panics.  The panic message
  55 should help you to find the problem.  See the description of
  56 backtraces below for more information.
  57
  58 @node Function and Parameter Attributes
  59 @section Function and Parameter Attributes
  60
  61 These macros defined in @file{<debug.h>} tell the compiler special
  62 attributes of a function or function parameter.  Their expansions are
  63 GCC-specific.
  64
  65 @defmac UNUSED
  66 Appended to a function parameter to tell the compiler that the
  67 parameter might not be used within the function.  It suppresses the
  68 warning that would otherwise appear.
  69 @end defmac
  70
  71 @defmac NO_RETURN
  72 Appended to a function prototype to tell the compiler that the
  73 function never returns.  It allows the compiler to fine-tune its
  74 warnings and its code generation.
  75 @end defmac
  76
  77 @defmac NO_INLINE
  78 Appended to a function prototype to tell the compiler to never emit
  79 the function in-line.  Occasionally useful to improve the quality of
  80 backtraces (see below).
  81 @end defmac
  82
  83 @defmac PRINTF_FORMAT (@var{format}, @var{first})
  84 Appended to a function prototype to tell the compiler that the function
  85 takes a @func{printf}-like format string as the argument numbered
  86 @var{format} (starting from 1) and that the corresponding value
  87 arguments start at the argument numbered @var{first}.  This lets the
  88 compiler tell you if you pass the wrong argument types.
  89 @end defmac
  90
  91 @node Backtraces
  92 @section Backtraces
  93
  94 When the kernel panics, it prints a ``backtrace,'' that is, a summary
  95 of how your program got where it is, as a list of addresses inside the
  96 functions that were running at the time of the panic.  You can also
  97 insert a call to @func{debug_backtrace}, prototyped in
  98 @file{<debug.h>}, to print a backtrace at any point in your code.
  99
 100 The addresses in a backtrace are listed as raw hexadecimal numbers,
 101 which are meaningless by themselves.  You can translate them into
 102 function names and source file line numbers using a tool called
 103 @command{addr2line} (80@var{x}86) or @command{i386-elf-addr2line}
 104 (SPARC).
 105
 106 The output format of @command{addr2line} is not ideal, so
 107 we've supplied a wrapper for it simply called @command{backtrace}.
 108 Give it the name of your @file{kernel.o} as the first argument and the
 109 hexadecimal numbers composing the backtrace (including the @samp{0x}
 110 prefixes) as the remaining arguments.  It outputs the function name
 111 and source file line numbers that correspond to each address.
 112
 113 If the translated form of a backtrace is garbled, or doesn't make
 114 sense (e.g.@: function A is listed above function B, but B doesn't
 115 call A), then it's a good sign that you're corrupting a kernel
 116 thread's stack, because the backtrace is extracted from the stack.
 117 Alternatively, it could be that the @file{kernel.o} you passed to
 118 @command{backtrace} does not correspond to the kernel that produced
 119 the backtrace.
 120
 121 Sometimes backtraces can be confusing without implying corruption.
 122 Compiler optimizations can cause surprising behavior.  When a function
 123 has called another function as its final action (a @dfn{tail call}), the
 124 calling function may not appear in a backtrace at all.  Similarly, when
 125 function A calls another function B that never returns, the compiler may
 126 optimize such that an unrelated function C appears in the backtrace
 127 instead of A.  Function C is simply the function that happens to be in
 128 memory just after A.  In the threads project, this is commonly seen in
 129 backtraces for test failures; see @ref{The pass function fails, ,
 130 @func{pass} Fails}, for more information.
 131
 132 @menu
 133 * Backtrace Example::
 134 @end menu
 135
 136 @node Backtrace Example
 137 @subsection Example
 138
 139 Here's an example.  Suppose that Pintos printed out this following call
 140 stack, which is taken from an actual Pintos submission for the file
 141 system project:
 142
 143 @example
 144 Call stack: 0xc0106eff 0xc01102fb 0xc010dc22 0xc010cf67 0xc0102319
 145 0xc010325a 0x804812c 0x8048a96 0x8048ac8.
 146 @end example
 147
 148 You would then invoke the @command{backtrace} utility like shown below,
 149 cutting and pasting the backtrace information into the command line.
 150 This assumes that @file{kernel.o} is in the current directory.  You
 151 would of course enter all of the following on a single shell command
 152 line, even though that would overflow our margins here:
 153
 154 @example
 155 backtrace kernel.o 0xc0106eff 0xc01102fb 0xc010dc22 0xc010cf67
 156 0xc0102319 0xc010325a 0x804812c 0x8048a96 0x8048ac8
 157 @end example
 158
 159 The backtrace output would then look something like this:
 160
 161 @example
 162 0xc0106eff: debug_panic (../../lib/debug.c:86)
 163 0xc01102fb: file_seek (../../filesys/file.c:405)
 164 0xc010dc22: seek (../../userprog/syscall.c:744)
 165 0xc010cf67: syscall_handler (../../userprog/syscall.c:444)
 166 0xc0102319: intr_handler (../../threads/interrupt.c:334)
 167 0xc010325a: ?? (threads/intr-stubs.S:1554)
 168 0x804812c: ?? (??:0)
 169 0x8048a96: ?? (??:0)
 170 0x8048ac8: ?? (??:0)
 171 @end example
 172
 173 (You will probably not get the same results if you run the command above
 174 on your own kernel binary, because the source code you compiled from is
 175 different from the source code that panicked.)
 176
 177 The first line in the backtrace refers to @func{debug_panic}, the
 178 function that implements kernel panics.  Because backtraces commonly
 179 result from kernel panics, @func{debug_panic} will often be the first
 180 function shown in a backtrace.
 181
 182 The second line shows @func{file_seek} as the function that panicked,
 183 in this case as the result of an assertion failure.  In the source code
 184 tree used for this example, line 405 of @file{filesys/file.c} is the
 185 assertion
 186
 187 @example
 188 ASSERT (file_ofs >= 0);
 189 @end example
 190
 191 @noindent
 192 (This line was also cited in the assertion failure message.)
 193 Thus, @func{file_seek} panicked because it passed a negative file offset
 194 argument.
 195
 196 The third line indicates that @func{seek} called @func{file_seek},
 197 presumably without validating the offset argument.  In this submission,
 198 @func{seek} implements the @code{seek} system call.
 199
 200 The fourth line shows that @func{syscall_handler}, the system call
 201 handler, invoked @func{seek}.
 202
 203 The fifth and sixth lines are the interrupt handler entry path.
 204
 205 The remaining lines are for addresses below @code{PHYS_BASE}.  This
 206 means that they refer to addresses in the user program, not in the
 207 kernel.  If you know what user program was running when the kernel
 208 panicked, you can re-run @command{backtrace} on the user program, like
 209 so: (typing the command on a single line, of course):
 210
 211 @example
 212 backtrace grow-too-big 0xc0106eff 0xc01102fb 0xc010dc22 0xc010cf67
 213 0xc0102319 0xc010325a 0x804812c 0x8048a96 0x8048ac8
 214 @end example
 215
 216 The results look like this:
 217
 218 @example
 219 0xc0106eff: ?? (??:0)
 220 0xc01102fb: ?? (??:0)
 221 0xc010dc22: ?? (??:0)
 222 0xc010cf67: ?? (??:0)
 223 0xc0102319: ?? (??:0)
 224 0xc010325a: ?? (??:0)
 225 0x804812c: test_main (../../tests/filesys/extended/grow-too-big.c:20)
 226 0x8048a96: main (../../tests/main.c:10)
 227 0x8048ac8: _start (../../lib/user/entry.c:9)
 228 @end example
 229
 230 Here's an extra tip for anyone who read this far: @command{backtrace}
 231 is smart enough to strip the @code{Call stack:} header and @samp{.}
 232 trailer from the command line if you include them.  This can save you
 233 a little bit of trouble in cutting and pasting.  Thus, the following
 234 command prints the same output as the first one we used:
 235
 236 @example
 237 backtrace kernel.o Call stack: 0xc0106eff 0xc01102fb 0xc010dc22
 238 0xc010cf67 0xc0102319 0xc010325a 0x804812c 0x8048a96 0x8048ac8.
 239 @end example
 240
 241 @node gdb
 242 @section @command{gdb}
 243
 244 You can run the Pintos kernel under the supervision of the
 245 @command{gdb} (80@var{x}86) or @command{i386-elf-gdb} (SPARC)
 246 debugger.  First,
 247 start Pintos with the @option{--gdb} option, e.g.@: @command{pintos
 248 --gdb -- run mytest}.  Second, in a separate terminal, invoke @command{gdb} (or
 249 @command{i386-elf-gdb}) on
 250 @file{kernel.o}:
 251 @example
 252 gdb kernel.o
 253 @end example
 254 @noindent and issue the following @command{gdb} command:
 255 @example
 256 target remote localhost:1234
 257 @end example
 258
 259 (If the @command{target remote} command fails, then make sure that both
 260 @command{gdb} and @command{pintos} are running on the same machine by
 261 running @command{hostname} in each terminal.  If the names printed
 262 differ, then you need to open a new terminal for @command{gdb} on the
 263 machine running @command{pintos}.)
 264
 265 Now @command{gdb} is connected to the simulator over a local
 266 network connection.  You can now issue any normal @command{gdb}
 267 commands.  If you issue the @samp{c} command, the simulated BIOS will take
 268 control, load Pintos, and then Pintos will run in the usual way.  You
 269 can pause the process at any point with @key{Ctrl+C}.  If you want
 270 @command{gdb} to stop when Pintos starts running, set a breakpoint on
 271 @func{main} with the command @code{break main} before @samp{c}.
 272
 273 You can read the @command{gdb} manual by typing @code{info gdb} at a
 274 terminal command prompt, or you can view it in Emacs with the command
 275 @kbd{C-h i}.  Here's a few commonly useful @command{gdb} commands:
 276
 277 @table @code
 278 @item c
 279 Continues execution until @key{Ctrl+C} or the next breakpoint.
 280
 281 @item break @var{function}
 282 @itemx break @var{filename}:@var{linenum}
 283 @itemx break *@var{address}
 284 Sets a breakpoint at the given function, line number, or address.
 285 (Use a @samp{0x} prefix to specify an address in hex.)
 286
 287 @item p @var{expression}
 288 Evaluates the given C expression and prints its value.
 289 If the expression contains a function call, that function will actually
 290 be executed.
 291
 292 @item l *@var{address}
 293 Lists a few lines of code around the given address.
 294 (Use a @samp{0x} prefix to specify an address in hex.)
 295
 296 @item bt
 297 Prints a stack backtrace similar to that output by the
 298 @command{backtrace} program described above.
 299
 300 @item p/a @var{address}
 301 Prints the name of the function or variable that occupies the given
 302 address.
 303 (Use a @samp{0x} prefix to specify an address in hex.)
 304
 305 @item diassemble @var{function}
 306 Disassembles the specified @var{function}.
 307 @end table
 308
 309 If you notice other strange behavior while using @command{gdb}, there
 310 are three possibilities: a bug in your
 311 modified Pintos, a bug in Bochs's
 312 interface to @command{gdb} or in @command{gdb} itself, or
 313 a bug in the original Pintos code.  The first and second
 314 are quite likely, and you should seriously consider both.  We hope
 315 that the third is less likely, but it is also possible.
 316
 317 @menu
 318 * Debugging User Programs::
 319 @end menu
 320
 321 @node Debugging User Programs
 322 @subsection Debugging User Programs
 323
 324 You can also use @command{gdb} to debug a user program running under
 325 Pintos.  Start by issuing this @command{gdb} command to load the
 326 program's symbol table:
 327 @example
 328 add-symbol-file @var{program}
 329 @end example
 330 @noindent
 331 where @var{program} is the name of the program's executable (in the host
 332 file system, not in the Pintos file system).  After this, you should be
 333 able to debug the user program the same way you would the kernel, by
 334 placing breakpoints, inspecting data, etc.  Your actions apply to every
 335 user program running in Pintos, not just to the one you want to debug,
 336 so be careful in interpreting the results.  Also, a name that appears in
 337 both the kernel and the user program will actually refer to the kernel
 338 name.  (The latter problem can be avoided by giving the user executable
 339 name on the @command{gdb} command line, instead of @file{kernel.o}.)
 340
 341 @node Debugging by Infinite Loop
 342 @section Debugging by Infinite Loop
 343
 344 If you get yourself into a situation where the machine reboots in a
 345 loop, that's probably a ``triple fault.''  In such a situation you
 346 might not be able to use @func{printf} for debugging, because the
 347 reboots might be happening even before everything needed for
 348 @func{printf} is initialized.  In such a situation, you might want to
 349 try what I call ``debugging by infinite loop.''
 350
 351 What you do is pick a place in the Pintos code, insert the statement
 352 @code{for (;;);} there, and recompile and run.  There are two likely
 353 possibilities:
 354
 355 @itemize @bullet
 356 @item
 357 The machine hangs without rebooting.  If this happens, you know that
 358 the infinite loop is running.  That means that whatever caused the
 359 reboot must be @emph{after} the place you inserted the infinite loop.
 360 Now move the infinite loop later in the code sequence.
 361
 362 @item
 363 The machine reboots in a loop.  If this happens, you know that the
 364 machine didn't make it to the infinite loop.  Thus, whatever caused the
 365 reboot must be @emph{before} the place you inserted the infinite loop.
 366 Now move the infinite loop earlier in the code sequence.
 367 @end itemize
 368
 369 If you move around the infinite loop in a ``binary search'' fashion, you
 370 can use this technique to pin down the exact spot that everything goes
 371 wrong.  It should only take a few minutes at most.
 372
 373 @node Modifying Bochs
 374 @section Modifying Bochs
 375
 376 An advanced debugging technique is to modify and recompile the
 377 simulator.  This proves useful when the simulated hardware has more
 378 information than it makes available to the OS.  For example, page
 379 faults have a long list of potential causes, but the hardware does not
 380 report to the OS exactly which one is the particular cause.
 381 Furthermore, a bug in the kernel's handling of page faults can easily
 382 lead to recursive faults, but a ``triple fault'' will cause the CPU to
 383 reset itself, which is hardly conducive to debugging.
 384
 385 In a case like this, you might appreciate being able to make Bochs
 386 print out more debug information, such as the exact type of fault that
 387 occurred.  It's not very hard.  You start by retrieving the source
 388 code for Bochs 2.1.1 from @uref{http://bochs.sourceforge.net} and
 389 extracting it into a directory.  Then read
 390 @file{pintos/src/misc/bochs-2.1.1.patch} and apply the patches needed.
 391 Then run @file{./configure}, supplying the options you want (some
 392 suggestions are in the patch file).  Finally, run @command{make}.
 393 This will compile Bochs and eventually produce a new binary
 394 @file{bochs}.  To use your @file{bochs} binary with @command{pintos},
 395 put it in your @env{PATH}, and make sure that it is earlier than
 396 @file{/usr/class/cs140/`uname -m`/bochs}.
 397
 398 Of course, to get any good out of this you'll have to actually modify
 399 Bochs.  Instructions for doing this are firmly out of the scope of
 400 this document.  However, if you want to debug page faults as suggested
 401 above, a good place to start adding @func{printf}s is
 402 @func{BX_CPU_C::dtranslate_linear} in @file{cpu/paging.cc}.
 403
 404 @node Debugging Tips
 405 @section Tips
 406
 407 The page allocator in @file{threads/palloc.c} and the block allocator in
 408 @file{threads/malloc.c} both clear all the bytes in pages and blocks to
 409 @t{0xcc} when they are freed.  Thus, if you see an attempt to
 410 dereference a pointer like @t{0xcccccccc}, or some other reference to
 411 @t{0xcc}, there's a good chance you're trying to reuse a page that's
 412 already been freed.  Also, byte @t{0xcc} is the CPU opcode for ``invoke
 413 interrupt 3,'' so if you see an error like @code{Interrupt 0x03 (#BP
 414 Breakpoint Exception)}, Pintos tried to execute code in a freed page or
 415 block.
 416
 417 An assertion failure on the expression @code{sec_no < d->capacity}
 418 indicates that Pintos tried to access a file through an inode that has
 419 been closed and freed.  Freeing an inode clears its starting sector
 420 number to @t{0xcccccccc}, which is not a valid sector number for disks
 421 smaller than about 1.6 TB.