pintos-os.org Git - pintos-anon/blob - doc/debug.texi

   1 @node Debugging Tools, Development Tools, Project Documentation, Top
   2 @appendix Debugging Tools
   3
   4 Many tools lie at your disposal for debugging Pintos.  This appendix
   5 introduces you to a few of them.
   6
   7 @menu
   8 * printf::
   9 * ASSERT::
  10 * DEBUG::
  11 * UNUSED NO_RETURN NO_INLINE PRINTF_FORMAT::
  12 * Backtraces::
  13 * i386-elf-gdb::
  14 * Debugging by Infinite Loop::
  15 * Modifying Bochs::
  16 * Debugging Tips::
  17 @end menu
  18
  19 @node printf
  20 @section @code{@code{printf()}}
  21
  22 Don't underestimate the value of @func{printf}.  The way
  23 @func{printf} is implemented in Pintos, you can call it from
  24 practically anywhere in the kernel, whether it's in a kernel thread or
  25 an interrupt handler, almost regardless of what locks are held.
  26
  27 @func{printf} isn't useful just because it can print data members.
  28 It can also help figure out when and where something goes wrong, even
  29 when the kernel crashes or panics without a useful error message.  The
  30 strategy is to sprinkle calls to @func{print} with different strings
  31 (e.g.@: @code{"1\n"}, @code{"2\n"}, @dots{}) throughout the pieces of
  32 code you suspect are failing.  If you don't even see @code{1} printed,
  33 then something bad happened before that point, if you see @code{1}
  34 but not @code{2}, then something bad happened between those two
  35 points, and so on.  Based on what you learn, you can then insert more
  36 @func{printf} calls in the new, smaller region of code you suspect.
  37 Eventually you can narrow the problem down to a single statement.
  38
  39 @node ASSERT
  40 @section @code{ASSERT}
  41
  42 Assertions are useful because they can catch problems early, before
  43 they'd otherwise be notices.  Pintos provides a macro for assertions
  44 named @code{ASSERT}, defined in @file{<debug.h>}, that you can use for
  45 this purpose.  Ideally, each function should begin with a set of
  46 assertions that check its arguments for validity.  (Initializers for
  47 functions' local variables are evaluated before assertions are
  48 checked, so be careful not to assume that an argument is valid in an
  49 initializer.)  You can also sprinkle assertions throughout the body of
  50 functions in places where you suspect things are likely to go wrong.
  51
  52 When an assertion proves untrue, the kernel panics.  The panic message
  53 should help you to find the problem.  See the description of
  54 backtraces below for more information.
  55
  56 @node DEBUG
  57 @section @code{DEBUG}
  58
  59 The @code{DEBUG} macro, also defined in @file{<debug.h>}, is a sort of
  60 conditional @func{printf}.  It takes as its arguments the name of a
  61 ``message class'' and a @func{printf}-like format string and
  62 arguments.  The message class is used to filter the messages that are
  63 actually displayed.  You select the messages to display on the Pintos
  64 command line using the @option{-d} option.  This allows you to easily
  65 turn different types of messages on and off while you debug, without
  66 the need to recompile.
  67
  68 For example, suppose you want to output thread debugging messages.  To
  69 use a class named @code{thread}, you could invoke @code{DEBUG} like
  70 this:
  71 @example
  72 DEBUG(thread, "thread id: %d\n", id);
  73 @end example
  74 @noindent
  75 and then to start Pintos with @code{thread} messages enable you'd use
  76 a command line like this:
  77 @example
  78 pintos run -d thread
  79 @end example
  80
  81 @node UNUSED NO_RETURN NO_INLINE PRINTF_FORMAT
  82 @section UNUSED, NO_RETURN, NO_INLINE, and PRINTF_FORMAT
  83
  84 These macros defined in @file{<debug.h>} tell the compiler special
  85 attributes of a function or function parameter.  Their expansions are
  86 GCC-specific.
  87
  88 @defmac UNUSED
  89 Appended to a function parameter to tell the compiler that the
  90 parameter might not be used within the function.  It suppresses the
  91 warning that would otherwise appear.
  92 @end defmac
  93
  94 @defmac NO_RETURN
  95 Appended to a function prototype to tell the compiler that the
  96 function never returns.  It allows the compiler to fine-tune its
  97 warnings and its code generation.
  98 @end defmac
  99
 100 @defmac NO_INLINE
 101 Appended to a function prototype to tell the compiler to never emit
 102 the function in-line.  Occasionally useful to improve the quality of
 103 backtraces (see below).
 104 @end defmac
 105
 106 @defmac PRINTF_FORMAT (@var{format}, @var{first})
 107 Appended to a function prototype to tell the compiler that the
 108 function takes a @func{printf}-like format string as its
 109 @var{format}th argument and that the corresponding value arguments
 110 start at the @var{first}th argument.  This lets the compiler tell you
 111 if you pass the wrong argument types.
 112 @end defmac
 113
 114 @node Backtraces
 115 @section Backtraces
 116
 117 When the kernel panics, it prints a ``backtrace,'' that is, a summary
 118 of how your program got where it is, as a list of addresses inside the
 119 functions that were running at the time of the panic.  You can also
 120 insert a call to @func{debug_backtrace}, prototyped in
 121 @file{<debug.h>}, at any point in your code.
 122
 123 The addresses in a backtrace are listed as raw hexadecimal numbers,
 124 which are meaningless in themselves.  You can translate them into
 125 function names and source file line numbers using a tool called
 126 @command{i386-elf-addr2line}.@footnote{If you're using an 80@var{x}86
 127 system for development, it's probably just called
 128 @command{addr2line}.}
 129
 130 The output format of @command{i386-elf-addr2line} is not ideal, so
 131 we've supplied a wrapper for it simply called @command{backtrace}.
 132 Give it the name of your @file{kernel.o} as the first argument and the
 133 hexadecimal numbers composing the backtrace (including the @samp{0x}
 134 prefixes) as the remaining arguments.  It outputs the function name
 135 and source file line numbers that correspond to each address.
 136
 137 If the translated form of a backtrace is garbled, or doesn't make
 138 sense (e.g.@: function A is listed above function B, but B doesn't
 139 call A), then it's a good sign that you're corrupting a kernel
 140 thread's stack, because the backtrace is extracted from the stack.
 141 Alternatively, it could be that the @file{kernel.o} you passed to
 142 @command{backtrace} does not correspond to the kernel that produced
 143 the backtrace.
 144
 145 @menu
 146 * Backtrace Example::
 147 @end menu
 148
 149 @node Backtrace Example
 150 @subsection Example
 151
 152 Here's an example.  Suppose that Pintos printed out this following call
 153 stack, which is taken from an actual Pintos submission for the file
 154 system project:
 155
 156 @example
 157 Call stack: 0xc0106eff 0xc01102fb 0xc010dc22 0xc010cf67 0xc0102319
 158 0xc010325a 0x804812c 0x8048a96 0x8048ac8.
 159 @end example
 160
 161 You would then invoke the @command{backtrace} utility like shown below,
 162 cutting and pasting the backtrace information into the command line.
 163 This assumes that @file{kernel.o} is in the current directory.  You
 164 would of course enter all of the following on a single shell command
 165 line:
 166
 167 @example
 168 backtrace kernel.o 0xc0106eff 0xc01102fb 0xc010dc22 0xc010cf67 0xc0102319
 169 0xc010325a 0x804812c 0x8048a96 0x8048ac8
 170 @end example
 171
 172 The backtrace output would then look something like this:
 173
 174 @example
 175 0xc0106eff: debug_panic (../../lib/debug.c:86)
 176 0xc01102fb: file_seek (../../filesys/file.c:405)
 177 0xc010dc22: seek (../../userprog/syscall.c:744)
 178 0xc010cf67: syscall_handler (../../userprog/syscall.c:444)
 179 0xc0102319: intr_handler (../../threads/interrupt.c:334)
 180 0xc010325a: ?? (threads/intr-stubs.S:1554)
 181 0x804812c: ?? (??:0)
 182 0x8048a96: ?? (??:0)
 183 0x8048ac8: ?? (??:0)
 184 @end example
 185
 186 (You will probably not get the same results if you run the command above
 187 on your own kernel binary, because the source code you compiled from is
 188 different from the source code that emitted the panic message.)
 189
 190 The first line in the backtrace refers to @func{debug_panic}, the
 191 function that implements kernel panics.  Because backtraces commonly
 192 result from kernel panics, @func{debug_panic} will often be the first
 193 function shown in a backtrace.
 194
 195 The second line shows @func{file_seek} to be the function that panicked,
 196 in this case as the result of an assertion failure.  In the source code
 197 tree used for this example, line 405 of @file{filesys/file.c} is the
 198 assertion
 199
 200 @example
 201 ASSERT (file_ofs >= )0;
 202 @end example
 203
 204 @noindent
 205 Thus, @func{file_seek} panicked because it passed a negative file offset
 206 argument.
 207
 208 The third line indicates that @func{seek} called @func{file_seek},
 209 presumably without validating the offset argument.  In this submission,
 210 @func{seek} implements the @code{seek} system call.
 211
 212 The fourth line shows that @func{syscall_handler}, the system call
 213 handler, invoked @func{seek}.
 214
 215 The fifth and sixth lines are the interrupt handler entry path.
 216
 217 The remaining lines are for addresses below @code{PHYS_BASE}.  This
 218 means that they refer to addresses in the user program, not in the
 219 kernel.  If you know what user program was running when the kernel
 220 panicked, you can re-run @command{backtrace} on the user program.  You
 221 need to have compiled the user program with debug symbols enabled for
 222 this to be useful.  Instructions for doing so are included in
 223 @file{pintos/src/Makefile.userprog}.
 224
 225 In this case, we rerun @command{backtrace} like so (typing the command
 226 on a single line, of course):
 227
 228 @example
 229 ~/cs140/pintos/src/utils/backtrace grow-too-big 0xc0106eff 0xc01102fb
 230 0xc010dc22 0xc010cf67 0xc0102319 0xc010325a 0x804812c 0x8048a96
 231 0x8048ac8
 232 @end example
 233
 234 The results look like this:
 235
 236 @example
 237 0xc0106eff: ?? (??:0)
 238 0xc01102fb: ?? (??:0)
 239 0xc010dc22: ?? (??:0)
 240 0xc010cf67: ?? (??:0)
 241 0xc0102319: ?? (??:0)
 242 0xc010325a: ?? (??:0)
 243 0x804812c: test_main (/home/blp/cs140/pintos/grading/filesys/grow-too-big.c:20)
 244 0x8048a96: main (/home/blp/cs140/pintos/grading/filesys/fsmain.c:10)
 245 0x8048ac8: _start (../../src/lib/user/entry.c:9)
 246 @end example
 247
 248 @node i386-elf-gdb
 249 @section @command{i386-elf-gdb}
 250
 251 You can run the Pintos kernel under the supervision of the
 252 @command{i386-elf-gdb} debugger.@footnote{If you're using an
 253 80@var{x}86 system for development, it's probably just called
 254 @command{addr2line}.}  There are two steps in the process.  First,
 255 start Pintos with the @option{--gdb} option, e.g.@: @command{pintos
 256 --gdb run}.  Second, in a second terminal, invoke @command{gdb} on
 257 @file{kernel.o}:
 258 @example
 259 i386-elf-gdb kernel.o
 260 @end example
 261 @noindent and issue the following @command{gdb} command:
 262 @example
 263 target remote localhost:1234
 264 @end example
 265
 266 At this point, @command{gdb} is connected to Bochs over a local
 267 network connection.  You can now issue any normal @command{gdb}
 268 commands.  If you issue the @samp{c} command, the Bochs BIOS will take
 269 control, load Pintos, and then Pintos will run in the usual way.  You
 270 can pause the process at any point with @key{Ctrl+C}.  If you want
 271 @command{gdb} to stop when Pintos starts running, set a breakpoint on
 272 @func{main} with the command @code{break main} before @samp{c}.
 273
 274 You can read the @command{gdb} manual by typing @code{info gdb} at a
 275 terminal command prompt, or you can view it in Emacs with the command
 276 @kbd{C-h i}.  Here's a few commonly useful @command{gdb} commands:
 277
 278 @table @code
 279 @item c
 280 Continue execution until the next breakpoint or until @key{Ctrl+C} is
 281 typed.
 282
 283 @item break @var{function}
 284 @itemx break @var{filename}:@var{linenum}
 285 @itemx break *@var{address}
 286 Sets a breakpoint at the given function, line number, or address.
 287 (Use a @samp{0x} prefix to specify an address in hex.)
 288
 289 @item p @var{expression}
 290 Evaluates the given C expression and prints its value.
 291 If the expression contains a function call, the function will actually
 292 be executed, so be careful.
 293
 294 @item l *@var{address}
 295 Lists a few lines of code around the given address.
 296 (Use a @samp{0x} prefix to specify an address in hex.)
 297
 298 @item bt
 299 Prints a stack backtrace similar to that output by the
 300 @command{backtrace} program described above.
 301
 302 @item p/a @var{address}
 303 Prints the name of the function or variable that occupies the given
 304 address.
 305 (Use a @samp{0x} prefix to specify an address in hex.)
 306 @end table
 307
 308 You might notice that @command{gdb} tends to show code being executed
 309 in an order different from the order in the source.  That is, the
 310 current statement jumps around seemingly randomly.  This is due to
 311 GCC's optimizer, which does tend to reorder code.  If it bothers you,
 312 you can turn off optimization by editing
 313 @file{pintos/src/Make.config}, removing @option{-O3} from the
 314 @code{CFLAGS} definition.
 315
 316 If you notice other strange behavior while using @command{gdb}, there
 317 are three possibilities.  The first is that there is a bug in your
 318 modified Pintos.  The second is that there is a bug in Bochs's
 319 interface to @command{gdb} or in @command{gdb} itself.  The third is
 320 that there is a bug in the original Pintos code.  The first and second
 321 are quite likely, and you should seriously consider both.  We hope
 322 that the third is less likely, but it is also possible.
 323
 324 @node Debugging by Infinite Loop
 325 @section Debugging by Infinite Loop
 326
 327 If you get yourself into a situation where the machine reboots in a
 328 loop, you've probably hit a ``triple fault.''  In such a situation you
 329 might not be able to use @func{printf} for debugging, because the
 330 reboots might be happening even before everything needed for
 331 @func{printf} is initialized.  In such a situation, you might want to
 332 try what I call ``debugging by infinite loop.''
 333
 334 What you do is pick a place in the Pintos code, insert the statement
 335 @code{for (;;);} there, and recompile and run.  There are two likely
 336 possibilities:
 337
 338 @itemize @bullet
 339 @item
 340 The machine hangs without rebooting.  If this happens, you know that
 341 the infinite loop is running.  That means that whatever caused the
 342 problem must be @emph{after} the place you inserted the infinite loop.
 343 Now move the infinite loop later in the code sequence.
 344
 345 @item
 346 The machine reboots in a loop.  If this happens, you know that the
 347 machine didn't make it to the infinite loop.  Thus, whatever caused the
 348 reboot must be @emph{before} the place you inserted the infinite loop.
 349 Now move the infinite loop earlier in the code sequence.
 350 @end itemize
 351
 352 If you move around the infinite loop in a ``binary search'' fashion, you
 353 can use this technique to pin down the exact spot that everything goes
 354 wrong.  It should only take a few minutes at most.
 355
 356 @node Modifying Bochs
 357 @section Modifying Bochs
 358
 359 An advanced debugging technique is to modify and recompile the
 360 simulator.  This proves useful when the simulated hardware has more
 361 information than it makes available to the OS.  For example, page
 362 faults have a long list of potential causes, but the hardware does not
 363 report to the OS exactly which one is the particular cause.
 364 Furthermore, a bug in the kernel's handling of page faults can easily
 365 lead to recursive faults, but a ``triple fault'' will cause the CPU to
 366 reset itself, which is hardly conducive to debugging.
 367
 368 In a case like this, you might appreciate being able to make Bochs
 369 print out more debug information, such as the exact type of fault that
 370 occurred.  It's not very hard.  You start by retrieving the source
 371 code for Bochs 2.1.1 from @uref{http://bochs.sourceforge.net} and
 372 extracting it into a directory.  Then read
 373 @file{pintos/src/misc/bochs-2.1.1.patch} and apply the patches needed.
 374 Then run @file{./configure}, supplying the options you want (some
 375 suggestions are in the patch file).  Finally, run @command{make}.
 376 This will compile Bochs and eventually produce a new binary
 377 @file{bochs}.  To use your @file{bochs} binary with @command{pintos},
 378 put it in your @env{PATH}, and make sure that it is earlier than
 379 @file{/usr/class/cs140/i386/bochs}.
 380
 381 Of course, to get any good out of this you'll have to actually modify
 382 Bochs.  Instructions for doing this are firmly out of the scope of
 383 this document.  However, if you want to debug page faults as suggested
 384 above, a good place to start adding @func{printf}s is
 385 @func{BX_CPU_C::dtranslate_linear} in @file{cpu/paging.cc}.
 386
 387 @node Debugging Tips
 388 @section Tips
 389
 390 The page allocator in @file{threads/palloc.c} clears all the bytes in
 391 pages to @t{0xcc} when they are freed.  Thus, if you see an attempt to
 392 dereference a pointer like @t{0xcccccccc}, or some other reference to
 393 @t{0xcc}, there's a good chance you're trying to reuse a page that's
 394 already been freed.  Also, byte @t{0xcc} is the CPU opcode for
 395 ``invoke interrupt 3,'' so if you see an error like @code{Interrupt
 396 0x03 (#BP Breakpoint Exception)}, Pintos tried to execute code in a
 397 freed page.
 398
 399 Similarly, the block allocator in @file{threads/malloc.c} clears all
 400 the bytes in freed blocks to @t{0xcd}.  The two bytes @t{0xcdcd} are
 401 a CPU opcode for ``invoke interrupt @t{0xcd},'' so @code{Interrupt
 402 0xcd (unknown)} is a good sign that you tried to execute code in a
 403 block freed with @func{free}.