pintos-os.org Git - pintos-anon/blob - doc/debug.texi

   1 @node Debugging Tools, Development Tools, Project Documentation, Top
   2 @appendix Debugging Tools
   3
   4 Many tools lie at your disposal for debugging Pintos.  This appendix
   5 introduces you to a few of them.
   6
   7 @menu
   8 * printf::
   9 * ASSERT::
  10 * UNUSED NO_RETURN NO_INLINE PRINTF_FORMAT::
  11 * Backtraces::
  12 * i386-elf-gdb::
  13 * Debugging by Infinite Loop::
  14 * Modifying Bochs::
  15 * Debugging Tips::
  16 @end menu
  17
  18 @node printf
  19 @section @code{@code{printf()}}
  20
  21 Don't underestimate the value of @func{printf}.  The way
  22 @func{printf} is implemented in Pintos, you can call it from
  23 practically anywhere in the kernel, whether it's in a kernel thread or
  24 an interrupt handler, almost regardless of what locks are held.
  25
  26 @func{printf} isn't useful just because it can print data members.
  27 It can also help figure out when and where something goes wrong, even
  28 when the kernel crashes or panics without a useful error message.  The
  29 strategy is to sprinkle calls to @func{print} with different strings
  30 (e.g.@: @code{"1\n"}, @code{"2\n"}, @dots{}) throughout the pieces of
  31 code you suspect are failing.  If you don't even see @code{1} printed,
  32 then something bad happened before that point, if you see @code{1}
  33 but not @code{2}, then something bad happened between those two
  34 points, and so on.  Based on what you learn, you can then insert more
  35 @func{printf} calls in the new, smaller region of code you suspect.
  36 Eventually you can narrow the problem down to a single statement.
  37
  38 @node ASSERT
  39 @section @code{ASSERT}
  40
  41 Assertions are useful because they can catch problems early, before
  42 they'd otherwise be notices.  Pintos provides a macro for assertions
  43 named @code{ASSERT}, defined in @file{<debug.h>}, that you can use for
  44 this purpose.  Ideally, each function should begin with a set of
  45 assertions that check its arguments for validity.  (Initializers for
  46 functions' local variables are evaluated before assertions are
  47 checked, so be careful not to assume that an argument is valid in an
  48 initializer.)  You can also sprinkle assertions throughout the body of
  49 functions in places where you suspect things are likely to go wrong.
  50
  51 When an assertion proves untrue, the kernel panics.  The panic message
  52 should help you to find the problem.  See the description of
  53 backtraces below for more information.
  54
  55 @node UNUSED NO_RETURN NO_INLINE PRINTF_FORMAT
  56 @section UNUSED, NO_RETURN, NO_INLINE, and PRINTF_FORMAT
  57
  58 These macros defined in @file{<debug.h>} tell the compiler special
  59 attributes of a function or function parameter.  Their expansions are
  60 GCC-specific.
  61
  62 @defmac UNUSED
  63 Appended to a function parameter to tell the compiler that the
  64 parameter might not be used within the function.  It suppresses the
  65 warning that would otherwise appear.
  66 @end defmac
  67
  68 @defmac NO_RETURN
  69 Appended to a function prototype to tell the compiler that the
  70 function never returns.  It allows the compiler to fine-tune its
  71 warnings and its code generation.
  72 @end defmac
  73
  74 @defmac NO_INLINE
  75 Appended to a function prototype to tell the compiler to never emit
  76 the function in-line.  Occasionally useful to improve the quality of
  77 backtraces (see below).
  78 @end defmac
  79
  80 @defmac PRINTF_FORMAT (@var{format}, @var{first})
  81 Appended to a function prototype to tell the compiler that the
  82 function takes a @func{printf}-like format string as its
  83 @var{format}th argument and that the corresponding value arguments
  84 start at the @var{first}th argument.  This lets the compiler tell you
  85 if you pass the wrong argument types.
  86 @end defmac
  87
  88 @node Backtraces
  89 @section Backtraces
  90
  91 When the kernel panics, it prints a ``backtrace,'' that is, a summary
  92 of how your program got where it is, as a list of addresses inside the
  93 functions that were running at the time of the panic.  You can also
  94 insert a call to @func{debug_backtrace}, prototyped in
  95 @file{<debug.h>}, at any point in your code.
  96
  97 The addresses in a backtrace are listed as raw hexadecimal numbers,
  98 which are meaningless in themselves.  You can translate them into
  99 function names and source file line numbers using a tool called
 100 @command{i386-elf-addr2line}.@footnote{If you're using an 80@var{x}86
 101 system for development, it's probably just called
 102 @command{addr2line}.}
 103
 104 The output format of @command{i386-elf-addr2line} is not ideal, so
 105 we've supplied a wrapper for it simply called @command{backtrace}.
 106 Give it the name of your @file{kernel.o} as the first argument and the
 107 hexadecimal numbers composing the backtrace (including the @samp{0x}
 108 prefixes) as the remaining arguments.  It outputs the function name
 109 and source file line numbers that correspond to each address.
 110
 111 If the translated form of a backtrace is garbled, or doesn't make
 112 sense (e.g.@: function A is listed above function B, but B doesn't
 113 call A), then it's a good sign that you're corrupting a kernel
 114 thread's stack, because the backtrace is extracted from the stack.
 115 Alternatively, it could be that the @file{kernel.o} you passed to
 116 @command{backtrace} does not correspond to the kernel that produced
 117 the backtrace.
 118
 119 @menu
 120 * Backtrace Example::
 121 @end menu
 122
 123 @node Backtrace Example
 124 @subsection Example
 125
 126 Here's an example.  Suppose that Pintos printed out this following call
 127 stack, which is taken from an actual Pintos submission for the file
 128 system project:
 129
 130 @example
 131 Call stack: 0xc0106eff 0xc01102fb 0xc010dc22 0xc010cf67 0xc0102319
 132 0xc010325a 0x804812c 0x8048a96 0x8048ac8.
 133 @end example
 134
 135 You would then invoke the @command{backtrace} utility like shown below,
 136 cutting and pasting the backtrace information into the command line.
 137 This assumes that @file{kernel.o} is in the current directory.  You
 138 would of course enter all of the following on a single shell command
 139 line:
 140
 141 @example
 142 backtrace kernel.o 0xc0106eff 0xc01102fb 0xc010dc22 0xc010cf67 0xc0102319
 143 0xc010325a 0x804812c 0x8048a96 0x8048ac8
 144 @end example
 145
 146 The backtrace output would then look something like this:
 147
 148 @example
 149 0xc0106eff: debug_panic (../../lib/debug.c:86)
 150 0xc01102fb: file_seek (../../filesys/file.c:405)
 151 0xc010dc22: seek (../../userprog/syscall.c:744)
 152 0xc010cf67: syscall_handler (../../userprog/syscall.c:444)
 153 0xc0102319: intr_handler (../../threads/interrupt.c:334)
 154 0xc010325a: ?? (threads/intr-stubs.S:1554)
 155 0x804812c: ?? (??:0)
 156 0x8048a96: ?? (??:0)
 157 0x8048ac8: ?? (??:0)
 158 @end example
 159
 160 (You will probably not get the same results if you run the command above
 161 on your own kernel binary, because the source code you compiled from is
 162 different from the source code that emitted the panic message.)
 163
 164 The first line in the backtrace refers to @func{debug_panic}, the
 165 function that implements kernel panics.  Because backtraces commonly
 166 result from kernel panics, @func{debug_panic} will often be the first
 167 function shown in a backtrace.
 168
 169 The second line shows @func{file_seek} to be the function that panicked,
 170 in this case as the result of an assertion failure.  In the source code
 171 tree used for this example, line 405 of @file{filesys/file.c} is the
 172 assertion
 173
 174 @example
 175 ASSERT (file_ofs >= 0);
 176 @end example
 177
 178 @noindent
 179 Thus, @func{file_seek} panicked because it passed a negative file offset
 180 argument.
 181
 182 The third line indicates that @func{seek} called @func{file_seek},
 183 presumably without validating the offset argument.  In this submission,
 184 @func{seek} implements the @code{seek} system call.
 185
 186 The fourth line shows that @func{syscall_handler}, the system call
 187 handler, invoked @func{seek}.
 188
 189 The fifth and sixth lines are the interrupt handler entry path.
 190
 191 The remaining lines are for addresses below @code{PHYS_BASE}.  This
 192 means that they refer to addresses in the user program, not in the
 193 kernel.  If you know what user program was running when the kernel
 194 panicked, you can re-run @command{backtrace} on the user program, like
 195 so: (typing the command on a single line, of course):
 196
 197 @example
 198 backtrace grow-too-big 0xc0106eff 0xc01102fb 0xc010dc22 0xc010cf67
 199 0xc0102319 0xc010325a 0x804812c 0x8048a96 0x8048ac8
 200 @end example
 201
 202 The results look like this:
 203
 204 @example
 205 0xc0106eff: ?? (??:0)
 206 0xc01102fb: ?? (??:0)
 207 0xc010dc22: ?? (??:0)
 208 0xc010cf67: ?? (??:0)
 209 0xc0102319: ?? (??:0)
 210 0xc010325a: ?? (??:0)
 211 0x804812c: test_main (/home/blp/cs140/pintos/grading/filesys/grow-too-big.c:20)
 212 0x8048a96: main (/home/blp/cs140/pintos/grading/filesys/fsmain.c:10)
 213 0x8048ac8: _start (../../src/lib/user/entry.c:9)
 214 @end example
 215
 216 Here's an extra tip for anyone who read this far: @command{backtrace}
 217 is smart enough to strip the @code{Call stack:} header and @samp{.}
 218 trailer from the command line if you include them.  This can save you
 219 a little bit of trouble in cutting and pasting.  Thus, the following
 220 command prints the same output as the first one we used:
 221
 222 @example
 223 backtrace kernel.o Call stack: 0xc0106eff 0xc01102fb 0xc010dc22
 224 0xc010cf67 0xc0102319 0xc010325a 0x804812c 0x8048a96 0x8048ac8.
 225 @end example
 226
 227 @node i386-elf-gdb
 228 @section @command{i386-elf-gdb}
 229
 230 You can run the Pintos kernel under the supervision of the
 231 @command{i386-elf-gdb} debugger.@footnote{If you're using an
 232 80@var{x}86 system for development, it's probably just called
 233 @command{gdb}.}  There are two steps in the process.  First,
 234 start Pintos with the @option{--gdb} option, e.g.@: @command{pintos
 235 --gdb run}.  Second, in a second terminal, invoke @command{gdb} on
 236 @file{kernel.o}:
 237 @example
 238 i386-elf-gdb kernel.o
 239 @end example
 240 @noindent and issue the following @command{gdb} command:
 241 @example
 242 target remote localhost:1234
 243 @end example
 244
 245 At this point, @command{gdb} is connected to Bochs over a local
 246 network connection.  You can now issue any normal @command{gdb}
 247 commands.  If you issue the @samp{c} command, the Bochs BIOS will take
 248 control, load Pintos, and then Pintos will run in the usual way.  You
 249 can pause the process at any point with @key{Ctrl+C}.  If you want
 250 @command{gdb} to stop when Pintos starts running, set a breakpoint on
 251 @func{main} with the command @code{break main} before @samp{c}.
 252
 253 You can read the @command{gdb} manual by typing @code{info gdb} at a
 254 terminal command prompt, or you can view it in Emacs with the command
 255 @kbd{C-h i}.  Here's a few commonly useful @command{gdb} commands:
 256
 257 @table @code
 258 @item c
 259 Continue execution until the next breakpoint or until @key{Ctrl+C} is
 260 typed.
 261
 262 @item break @var{function}
 263 @itemx break @var{filename}:@var{linenum}
 264 @itemx break *@var{address}
 265 Sets a breakpoint at the given function, line number, or address.
 266 (Use a @samp{0x} prefix to specify an address in hex.)
 267
 268 @item p @var{expression}
 269 Evaluates the given C expression and prints its value.
 270 If the expression contains a function call, the function will actually
 271 be executed, so be careful.
 272
 273 @item l *@var{address}
 274 Lists a few lines of code around the given address.
 275 (Use a @samp{0x} prefix to specify an address in hex.)
 276
 277 @item bt
 278 Prints a stack backtrace similar to that output by the
 279 @command{backtrace} program described above.
 280
 281 @item p/a @var{address}
 282 Prints the name of the function or variable that occupies the given
 283 address.
 284 (Use a @samp{0x} prefix to specify an address in hex.)
 285 @end table
 286
 287 If you notice other strange behavior while using @command{gdb}, there
 288 are three possibilities.  The first is that there is a bug in your
 289 modified Pintos.  The second is that there is a bug in Bochs's
 290 interface to @command{gdb} or in @command{gdb} itself.  The third is
 291 that there is a bug in the original Pintos code.  The first and second
 292 are quite likely, and you should seriously consider both.  We hope
 293 that the third is less likely, but it is also possible.
 294
 295 @node Debugging by Infinite Loop
 296 @section Debugging by Infinite Loop
 297
 298 If you get yourself into a situation where the machine reboots in a
 299 loop, you've probably hit a ``triple fault.''  In such a situation you
 300 might not be able to use @func{printf} for debugging, because the
 301 reboots might be happening even before everything needed for
 302 @func{printf} is initialized.  In such a situation, you might want to
 303 try what I call ``debugging by infinite loop.''
 304
 305 What you do is pick a place in the Pintos code, insert the statement
 306 @code{for (;;);} there, and recompile and run.  There are two likely
 307 possibilities:
 308
 309 @itemize @bullet
 310 @item
 311 The machine hangs without rebooting.  If this happens, you know that
 312 the infinite loop is running.  That means that whatever caused the
 313 problem must be @emph{after} the place you inserted the infinite loop.
 314 Now move the infinite loop later in the code sequence.
 315
 316 @item
 317 The machine reboots in a loop.  If this happens, you know that the
 318 machine didn't make it to the infinite loop.  Thus, whatever caused the
 319 reboot must be @emph{before} the place you inserted the infinite loop.
 320 Now move the infinite loop earlier in the code sequence.
 321 @end itemize
 322
 323 If you move around the infinite loop in a ``binary search'' fashion, you
 324 can use this technique to pin down the exact spot that everything goes
 325 wrong.  It should only take a few minutes at most.
 326
 327 @node Modifying Bochs
 328 @section Modifying Bochs
 329
 330 An advanced debugging technique is to modify and recompile the
 331 simulator.  This proves useful when the simulated hardware has more
 332 information than it makes available to the OS.  For example, page
 333 faults have a long list of potential causes, but the hardware does not
 334 report to the OS exactly which one is the particular cause.
 335 Furthermore, a bug in the kernel's handling of page faults can easily
 336 lead to recursive faults, but a ``triple fault'' will cause the CPU to
 337 reset itself, which is hardly conducive to debugging.
 338
 339 In a case like this, you might appreciate being able to make Bochs
 340 print out more debug information, such as the exact type of fault that
 341 occurred.  It's not very hard.  You start by retrieving the source
 342 code for Bochs 2.1.1 from @uref{http://bochs.sourceforge.net} and
 343 extracting it into a directory.  Then read
 344 @file{pintos/src/misc/bochs-2.1.1.patch} and apply the patches needed.
 345 Then run @file{./configure}, supplying the options you want (some
 346 suggestions are in the patch file).  Finally, run @command{make}.
 347 This will compile Bochs and eventually produce a new binary
 348 @file{bochs}.  To use your @file{bochs} binary with @command{pintos},
 349 put it in your @env{PATH}, and make sure that it is earlier than
 350 @file{/usr/class/cs140/i386/bochs}.
 351
 352 Of course, to get any good out of this you'll have to actually modify
 353 Bochs.  Instructions for doing this are firmly out of the scope of
 354 this document.  However, if you want to debug page faults as suggested
 355 above, a good place to start adding @func{printf}s is
 356 @func{BX_CPU_C::dtranslate_linear} in @file{cpu/paging.cc}.
 357
 358 @node Debugging Tips
 359 @section Tips
 360
 361 The page allocator in @file{threads/palloc.c} and the block allocator in
 362 @file{threads/malloc.c} both clear all the bytes in pages and blocks to
 363 @t{0xcc} when they are freed.  Thus, if you see an attempt to
 364 dereference a pointer like @t{0xcccccccc}, or some other reference to
 365 @t{0xcc}, there's a good chance you're trying to reuse a page that's
 366 already been freed.  Also, byte @t{0xcc} is the CPU opcode for ``invoke
 367 interrupt 3,'' so if you see an error like @code{Interrupt 0x03 (#BP
 368 Breakpoint Exception)}, Pintos tried to execute code in a freed page or
 369 block.
 370
 371 An assertion failure on the expression @code{sec_no < d->capacity}
 372 indicates that Pintos tried to access a file through an inode that has
 373 been closed and freed.  Freeing an inode clears its starting sector
 374 number to @t{0xcccccccc}, which is not a valid sector number for disks
 375 smaller than about 1.6 TB.