@node Multilevel Feedback Scheduling, Coding Standards, References, Top
@appendix Multilevel Feedback Scheduling

This section gives a brief overview of the behavior of the Solaris 2.6
Time-Sharing (TS) scheduler, an example of a multilevel feedback queue
scheduler.  The information in this handout, in conjunction with that
given in lecture, should be used to answer Problem 1-4.  The end of
this document specifies in more detail which aspects of the Solaris
scheduler you should implement.

The goal of a multilevel feedback queue scheduler is to fairly and
efficiently schedule a mix of processes with a variety of execution
characteristics.  By controlling how a process moves between priority
levels, processes with different characteristics can be scheduled as
appropriate.  Priority-based schedulers attempt to provide a
compromise between several desirable metrics (e.g.@: response time for
interactive jobs, throughput for compute-intensive jobs, and fair
allocations for all jobs).
The queues in the system are ranked according to priority.  Processes
waiting in higher priority queues are always scheduled over those in
lower priority queues.  Processes at the same priority are usually
scheduled in a round-robin fashion.

Such schedulers tend to be preemptive in order to support interactive
processes: a higher priority process is immediately scheduled if a
lower priority process is running on the CPU.
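To make this mechanism concrete, here is a minimal C sketch (with
hypothetical names; this is not Solaris source) of selecting the next
process from an array of priority queues, highest priority first and
round-robin within a level:

@example
#include <stddef.h>

#define NPRI 60                 /* priority levels 0-59, as in TS */

struct proc @{
    struct proc *next;          /* link in a singly-linked run queue */
    int pri;                    /* current priority */
@};

static struct proc *runq[NPRI]; /* runq[59] is the highest priority */

/* Return the next process to run: the head of the highest-priority
   nonempty queue.  The caller re-enqueues the process at the tail of
   a queue when its time-slice ends, yielding round-robin behavior
   among processes at the same priority. */
struct proc *pick_next(void)
@{
    for (int pri = NPRI - 1; pri >= 0; pri--) @{
        struct proc *p = runq[pri];
        if (p != NULL) @{
            runq[pri] = p->next;    /* dequeue the head */
            p->next = NULL;
            return p;
        @}
    @}
    return NULL;                    /* no runnable process: idle */
@}
@end example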
@menu
* Scheduling in Solaris::
* Class Independent Functionality::
* Time-Sharing Scheduling Class::
* Dispatch Table::
* Implementation::
* Fairness::
* Project Requirements::
@end menu
@node Scheduling in Solaris
@section Scheduling in Solaris

The Solaris operating system is based on Unix System V Release 4
(SVR4).  Scheduling in Solaris, as in all SVR4-based schedulers, is
performed at two levels: class-independent routines and
class-dependent routines.  Class-independent routines are responsible
for dispatching and preempting processes (the low-level mechanisms).
Class-dependent routines are responsible for setting the priority of
each of their processes (the high-level policies).
By default, Solaris supports three scheduling classes: time-sharing
(TS), real-time (RT), and system (SYS).  Users with root privileges
can easily implement and add new scheduling classes by adhering to a
predefined interface.  Each scheduling class gives each of its
processes a priority, the range of which is shown below:

@multitable {Scheduling Class} {Priorities}
@item Scheduling Class @tab Priorities
@item Real-Time @tab 100-159
@item System @tab 60-99
@item Time-Sharing @tab 0-59
@end multitable
As long as a user has the correct privileges, he or she can submit
jobs to any scheduling class.  By default, jobs are executed in the
same scheduling class as the parent process that forked them.  Since
your shell runs in the time-sharing class, all of your jobs run by
default in the time-sharing class.
See the man pages for @command{priocntl} on any machine running
Solaris for information on how to submit jobs to different scheduling
classes.  However, since you probably don't have root privileges on
your machine, you won't be able to do much with it.
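For reference, an invocation has the following general shape:
@samp{priocntl -l} lists the scheduling classes configured on the
machine, and @samp{-e -c RT} executes a command in the real-time class
(check the man page for the exact options on your system):

@example
$ priocntl -l
$ priocntl -e -c RT @var{command}
@end example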
To see the scheduling class of each process in the system, run
@samp{ps -edaflc}.  (@samp{-c} is the flag that shows the scheduling
class.)  The fourth column shows the scheduling class of the running
process.  Most jobs will be running in the TS class, with a few (owned
by root) running in the SYS class.
@example
 UID   PID  PPID CLS PRI    STIME TTY     TIME CMD
root     0     0 SYS  96   Aug 01 ?       0:00 sched
root     1     0  TS  58   Aug 01 ?       1:06 /etc/init -
root     2     0 SYS  98   Aug 01 ?       0:02 pageout
root     3     0 SYS  60   Aug 01 ?      15:22 fsflush
root   245   239  TS  59   Aug 01 ?       0:00 ttymon
root   181     1  TS  48   Aug 01 ?       0:00 sendmail -q15m
root   239     1  TS  59   Aug 01 ?       0:00 sac -t 300
root    96     1  TS  58   Aug 01 ?       0:00 rpcbind
root   125     1  TS  59   Aug 01 ?       0:32 syslogd
@end example

In this document, we only discuss the Solaris time-sharing (TS)
class.  Note the priorities of each of the processes, as listed in the
@samp{PRI} column.
@node Class Independent Functionality
@section Class Independent Functionality

The class-independent routines arbitrate across the scheduling
classes.  This involves three basic responsibilities.

@enumerate
@item
The process with the highest priority must be dispatched, and the
state of the preempted process saved.

@item
The class-independent routines must notify the class-dependent
routines when the state of their processes changes (for example, at
creation and termination, when a process changes from blocked to
runnable or from runnable to blocked, and when a 10@dmn{ms} timer
expires).

@item
Processes must be moved between priority queues in the
class-independent data structures, as directed by their scheduling
classes, and must be moved between blocked and ready queues.
@end enumerate
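One way to picture the boundary between the two levels is as a
per-class table of callbacks that the class-independent code invokes
on these events.  The sketch below is illustrative only; the names are
hypothetical and this is not the actual SVR4 interface:

@example
struct thread;                  /* defined by the dispatcher */

/* Hypothetical callbacks a scheduling class registers with the
   class-independent dispatcher. */
struct sched_class_ops @{
    void (*enterclass)(struct thread *t); /* thread joins the class */
    void (*exitclass)(struct thread *t);  /* thread leaves the class */
    void (*tick)(struct thread *t);       /* 10ms timer expired */
    void (*sleep)(struct thread *t);      /* thread blocked */
    void (*wakeup)(struct thread *t);     /* thread became runnable */
    void (*preempt)(struct thread *t);    /* thread was preempted */
@};
@end example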
@node Time-Sharing Scheduling Class
@section Time-Sharing Scheduling Class

The time-sharing scheduler in Solaris is an example of a multilevel
feedback queue scheduler.  A job begins at priority 29.  Compute-bound
jobs then filter down to the lower priorities, where they are
scheduled less frequently but for longer time-slices.  Interactive
jobs propagate to the higher priorities, where they are scheduled
whenever they have work to perform, on the assumption that they will
soon relinquish the processor again.  In the TS scheduler, the
priority of a process is lowered after it consumes its allocated
time-slice.  Its priority is raised if it has not consumed its
time-slice before a starvation interval expires.
@node Dispatch Table
@section Dispatch Table

The durations of the time-slices, the changes in priorities, and the
starvation interval are specified in a user-tunable dispatch table.
The system administrator (or anyone with root privileges) can change
the values in this table, thus configuring how the time-sharing
scheduler manages its jobs.  While this has the noble intention of
allowing different systems to tune the scheduler to better handle
their workloads, in reality no one really knows how to configure these
tables well.  Therefore, we will focus on the default dispatch table.

To see how this table is configured on your system, run
@samp{dispadmin -c TS -g}.  You should see something like the table
shown below.  Looking at the man pages for @command{dispadmin} and
@command{ts_dptbl} may also be helpful.
@multitable {@code{ts_quantum}} {@code{ts_tqexp}} {@code{ts_slpret}} {@code{ts_maxwait}} {@code{ts_lwait}} {priority}
@item @code{ts_quantum} @tab @code{ts_tqexp} @tab @code{ts_slpret} @tab @code{ts_maxwait} @tab @code{ts_lwait} @tab priority
@item 200 @tab 0 @tab 50 @tab 0 @tab 50 @tab 0
@item 200 @tab 0 @tab 50 @tab 0 @tab 50 @tab 1
@item 200 @tab 0 @tab 50 @tab 0 @tab 50 @tab 2
@item 200 @tab 0 @tab 50 @tab 0 @tab 50 @tab 3
@item 200 @tab 0 @tab 50 @tab 0 @tab 50 @tab 4
@item 200 @tab 0 @tab 50 @tab 0 @tab 50 @tab 5
@item 200 @tab 0 @tab 50 @tab 0 @tab 50 @tab 6
@item 200 @tab 0 @tab 50 @tab 0 @tab 50 @tab 7
@item 200 @tab 0 @tab 50 @tab 0 @tab 50 @tab 8
@item 200 @tab 0 @tab 50 @tab 0 @tab 50 @tab 9
@item 160 @tab 0 @tab 51 @tab 0 @tab 51 @tab 10
@item 160 @tab 1 @tab 51 @tab 0 @tab 51 @tab 11
@item 160 @tab 2 @tab 51 @tab 0 @tab 51 @tab 12
@item 160 @tab 3 @tab 51 @tab 0 @tab 51 @tab 13
@item 160 @tab 4 @tab 51 @tab 0 @tab 51 @tab 14
@item 160 @tab 5 @tab 51 @tab 0 @tab 51 @tab 15
@item 160 @tab 6 @tab 51 @tab 0 @tab 51 @tab 16
@item 160 @tab 7 @tab 51 @tab 0 @tab 51 @tab 17
@item 160 @tab 8 @tab 51 @tab 0 @tab 51 @tab 18
@item 160 @tab 9 @tab 51 @tab 0 @tab 51 @tab 19
@item 120 @tab 10 @tab 52 @tab 0 @tab 52 @tab 20
@item 120 @tab 11 @tab 52 @tab 0 @tab 52 @tab 21
@item 120 @tab 12 @tab 52 @tab 0 @tab 52 @tab 22
@item 120 @tab 13 @tab 52 @tab 0 @tab 52 @tab 23
@item 120 @tab 14 @tab 52 @tab 0 @tab 52 @tab 24
@item 120 @tab 15 @tab 52 @tab 0 @tab 52 @tab 25
@item 120 @tab 16 @tab 52 @tab 0 @tab 52 @tab 26
@item 120 @tab 17 @tab 52 @tab 0 @tab 52 @tab 27
@item 120 @tab 18 @tab 52 @tab 0 @tab 52 @tab 28
@item 120 @tab 19 @tab 52 @tab 0 @tab 52 @tab 29
@item 80 @tab 20 @tab 53 @tab 0 @tab 53 @tab 30
@item 80 @tab 21 @tab 53 @tab 0 @tab 53 @tab 31
@item 80 @tab 22 @tab 53 @tab 0 @tab 53 @tab 32
@item 80 @tab 23 @tab 53 @tab 0 @tab 53 @tab 33
@item 80 @tab 24 @tab 53 @tab 0 @tab 53 @tab 34
@item 80 @tab 25 @tab 54 @tab 0 @tab 54 @tab 35
@item 80 @tab 26 @tab 54 @tab 0 @tab 54 @tab 36
@item 80 @tab 27 @tab 54 @tab 0 @tab 54 @tab 37
@item 80 @tab 28 @tab 54 @tab 0 @tab 54 @tab 38
@item 80 @tab 29 @tab 54 @tab 0 @tab 54 @tab 39
@item 40 @tab 30 @tab 55 @tab 0 @tab 55 @tab 40
@item 40 @tab 31 @tab 55 @tab 0 @tab 55 @tab 41
@item 40 @tab 32 @tab 55 @tab 0 @tab 55 @tab 42
@item 40 @tab 33 @tab 55 @tab 0 @tab 55 @tab 43
@item 40 @tab 34 @tab 55 @tab 0 @tab 55 @tab 44
@item 40 @tab 35 @tab 56 @tab 0 @tab 56 @tab 45
@item 40 @tab 36 @tab 57 @tab 0 @tab 57 @tab 46
@item 40 @tab 37 @tab 58 @tab 0 @tab 58 @tab 47
@item 40 @tab 38 @tab 58 @tab 0 @tab 58 @tab 48
@item 40 @tab 39 @tab 58 @tab 0 @tab 59 @tab 49
@item 40 @tab 40 @tab 58 @tab 0 @tab 59 @tab 50
@item 40 @tab 41 @tab 58 @tab 0 @tab 59 @tab 51
@item 40 @tab 42 @tab 58 @tab 0 @tab 59 @tab 52
@item 40 @tab 43 @tab 58 @tab 0 @tab 59 @tab 53
@item 40 @tab 44 @tab 58 @tab 0 @tab 59 @tab 54
@item 40 @tab 45 @tab 58 @tab 0 @tab 59 @tab 55
@item 40 @tab 46 @tab 58 @tab 0 @tab 59 @tab 56
@item 40 @tab 47 @tab 58 @tab 0 @tab 59 @tab 57
@item 40 @tab 48 @tab 58 @tab 0 @tab 59 @tab 58
@item 20 @tab 49 @tab 59 @tab 32000 @tab 59 @tab 59
@end multitable
You will see one row for every priority in the scheduling class, from
0 to 59.  For each priority, there are five columns:

@table @code
@item ts_quantum
Length of the time-slice.  In the actual table, this value is
specified in 10@dmn{ms} clock ticks, but in the output from running
@command{dispadmin}, the value is specified in units of 1@dmn{ms}.

@item ts_tqexp
Priority level of the new queue on which to place a process if it
exceeds its time quantum.  Normally this field links to a lower
priority time-sharing level.

@item ts_slpret
The new, generally increased, priority to adopt when the job returns
from sleeping (i.e.@: from the blocked queue) if @code{ts_dispwait}
exceeds @code{ts_maxwait}.

@item ts_maxwait
Each time a process is placed back on the dispatcher queue after its
time quantum expires, or when it is awakened, but not when it is
preempted by a higher-priority process, a per-process counter named
@code{ts_dispwait} is zeroed.  This counter is incremented once per
second.  If a process's @code{ts_dispwait} exceeds its priority's
@code{ts_maxwait}, then the process's priority is changed to
@code{ts_lwait}.  This prevents starvation.

@item ts_lwait
The new, generally increased, priority to adopt if the starvation
timer expires before the job consumes its time-slice (i.e.@: if
@code{ts_dispwait} exceeds @code{ts_maxwait}).
@end table
In this table, the priority of jobs ranges from a high of 59 down to
0.  Time-slices begin at 20@dmn{ms} at the highest priority and
gradually increase in duration up to 200@dmn{ms} at the lowest
priorities.  Generally, the priority of a process decreases by 10
levels after it consumes its time-slice.  The priority of a process is
increased to 50 or above when the starvation timer expires.
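To make the table-driven design concrete, the following C sketch shows
one way to represent and consult such a table; the row type and helper
are our own illustrative names, not the actual Solaris
@code{ts_dptbl} declaration:

@example
/* One row of a TS-style dispatch table; field names follow the
   dispadmin output above, but this declaration is illustrative. */
struct ts_dispatch_row @{
    int ts_quantum;   /* time-slice length (clock ticks) */
    int ts_tqexp;     /* new priority after the quantum expires */
    int ts_slpret;    /* new priority on return from sleep */
    int ts_maxwait;   /* seconds to wait before a boost */
    int ts_lwait;     /* new priority after waiting too long */
@};

/* A few of the default rows, indexed by priority; the full table
   appears above. */
static const struct ts_dispatch_row ts_dptbl[60] = @{
    [0]  = @{ 200,  0, 50,     0, 50 @},
    [29] = @{ 120, 19, 52,     0, 52 @},
    [59] = @{  20, 49, 59, 32000, 59 @},
    /* ... remaining rows as in the table above ... */
@};

/* New priority for a job at priority pri that used up its slice. */
int priority_after_quantum(int pri)
@{
    return ts_dptbl[pri].ts_tqexp;
@}
@end example

Under the defaults, for example, a job at priority 29 that consumes
its 120@dmn{ms} time-slice drops to priority 19, where it receives a
160@dmn{ms} slice.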
@node Implementation
@section Implementation

For each job in the TS class, the following data structure is
maintained (we've removed a few of the fields for simplicity):

@example
/*
 * time-sharing class specific thread structure
 */
typedef struct tsproc @{
    long ts_timeleft;            /* time remaining in quantum */
    long ts_dispwait;            /* number of seconds since */
                                 /* start of quantum (not reset */
                                 /* upon preemption) */
    pri_t ts_pri;                /* priority (0-59) */
    kthread_t *ts_tp;            /* pointer to thread */
    struct tsproc *ts_next;      /* link to next tsproc on list */
    struct tsproc *ts_prev;      /* link to previous tsproc */
@} tsproc_t;
@end example

The @code{kthread_t} structure tracks the necessary information to
context-switch to and from this process.  This structure is kept
separate from the time-sharing class in order to separate the
mechanisms of the dispatcher from the policies of the scheduler.
There are seven interesting routines in the TS class:

@table @code
@item ts_enterclass(thread *@var{t})
Called when a new thread is added to the TS class.  It initializes a
@code{tsproc} structure for this process and adds it to the list of
processes.

@item ts_exitclass(thread *@var{t})
Called when the thread terminates or exits the class.  The
@code{tsproc} structure is removed from the list of processes.

@item ts_tick(thread *@var{t})
Called once every 10@dmn{ms} with a pointer to the currently running
thread.  The @code{ts_timeleft} variable of the running thread is
decremented by one.  If @code{ts_timeleft} reaches zero, then its new
priority becomes its old priority's @code{ts_tqexp}, its time-slice is
reset, and @code{ts_dispwait} is zeroed.  The thread is then added to
the back of the appropriate priority queue and a new job is scheduled.
(A C sketch of this routine appears after this list.)
@item ts_update()
Called once a second to check the starvation qualities of each job.
The routine increments the @code{ts_dispwait} counter of every process
in the class (even those that are blocked) by one.  If the job is on
the ready queue (i.e.@: the job is neither running nor blocked) and
its @code{ts_dispwait} exceeds @code{ts_maxwait}, then its priority
and @code{ts_dispwait} (but not @code{ts_timeleft}) are reset.  This
may involve rearranging the priority queues.
@item ts_sleep(thread *@var{t})
Called when the thread blocks (e.g.@: due to I/O or synchronization).
The TS routine does not need to do anything in these circumstances,
but the dispatcher, or class-independent routines, must add the thread
to the blocked queue and schedule a new thread.

@item ts_wakeup(thread *@var{t})
Called when the blocked thread becomes ready.  If @code{ts_dispwait}
for the process is greater than its priority's @code{ts_maxwait}, then
its priority is set to @code{ts_slpret}, its time-slice
(@code{ts_timeleft}) is reset, and @code{ts_dispwait} is zeroed.  If
the priority of this job is higher than that of the running job, it
preempts the currently running job.  Otherwise the newly awoken job is
added to the back of its priority queue.
@item ts_preempt(thread *@var{t})
Called when the thread is preempted by a higher priority thread.  The
preempted thread is added to the front of its priority queue.
@end table
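As an illustration of how these callbacks fit together, here is a
minimal C sketch of @code{ts_tick} as described above.  It reuses the
illustrative @code{ts_dispatch_row} table from the Dispatch Table
section and assumes hypothetical helpers @code{enqueue_tail()} and
@code{schedule()}; it is not the Solaris implementation.

@example
extern const struct ts_dispatch_row ts_dptbl[60];

void enqueue_tail(tsproc_t *ts); /* back of its priority queue */
void schedule(void);             /* dispatch the next runnable job */

/* Called every 10ms for the currently running thread. */
void ts_tick(tsproc_t *ts)
@{
    if (--ts->ts_timeleft > 0)
        return;                  /* quantum not yet consumed */

    /* The quantum expired: demote the thread, refresh its slice
       from the new row, restart the starvation clock, and requeue
       it at the back of its new level. */
    ts->ts_pri = ts_dptbl[ts->ts_pri].ts_tqexp;
    ts->ts_timeleft = ts_dptbl[ts->ts_pri].ts_quantum;
    ts->ts_dispwait = 0;
    enqueue_tail(ts);
    schedule();
@}
@end example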
@node Fairness
@section Fairness

The Solaris time-sharing scheduler approximates fair allocations by
decreasing the priority of a job the more that it is scheduled.
Therefore, a job that is runnable relatively infrequently remains at a
higher priority and is scheduled over lower priority jobs.  However,
due to the configuration of the default dispatch table (in which the
starvation interval is set to zero), the priority of every process is
raised once a second, regardless of whether or not it is actually
starving.  Thus, the allocation history of each process is erased
every second, and compute-bound processes tend to acquire more than
their fair share of the resources.
This behavior is illustrated in the graph below for three competing
jobs that relinquish the processor at different rates while waiting
for I/O to complete: a coarse job that rarely relinquishes the CPU, a
medium job that does so more frequently, and a fine job that often
relinquishes the CPU.  The graph shows a typical snapshot over a
five-second execution interval.  As described, each second the
priority of all three jobs is raised to level 50 or higher.  As a job
executes and consumes its time-slice, its priority is lowered about
ten levels.  Since the coarse job runs more frequently, it drops in
priority at a faster rate than the other two jobs.
The impact of this policy on the relative execution times of the three
applications is shown in the next graph below.  Because the coarse
application acquires more CPU time, it finishes its work earlier than
the other applications, even though all three jobs require the same
amount of time in a dedicated environment.
@node Project Requirements
@section Project Requirements

For your project, you need to implement code that is similar in
functionality to the Solaris TS scheduler, but your code does not have
to be structured in the same way.  Implement your code in whatever
manner you find easiest when interfacing with the already existing
code.

Specifically, you are not required to separate the class-independent
and class-dependent scheduling functionality.  You do not need to
support multiple scheduling classes.  You can implement any routines
that you feel are necessary, not just the seven functions we
specifically listed.  You can pass any parameters that you find
helpful.
However, it is important that your approach be table-driven, with a
user-configurable dispatch table.  Your table should be initialized to
the default values in Solaris 2.6, but you may also want to experiment
with different configurations.  To demonstrate the functionality of
your scheduler, you may want to print out the change in priorities of
several different competing processes, as in the first graph above.
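As one possible starting point for a user-configurable table, the
sketch below reads a plain-text format of our own invention (five
integers per row, one row per priority) into the illustrative
@code{ts_dispatch_row} array from the Dispatch Table section:

@example
#include <stdio.h>

/* Fill in all 60 rows of tbl from a text file containing, per row:
   ts_quantum ts_tqexp ts_slpret ts_maxwait ts_lwait.
   Returns 0 on success, -1 on error.  The file format is our own
   invention, chosen only for this sketch. */
int load_dispatch_table(const char *path,
                        struct ts_dispatch_row tbl[60])
@{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    for (int pri = 0; pri < 60; pri++) @{
        if (fscanf(f, "%d %d %d %d %d",
                   &tbl[pri].ts_quantum, &tbl[pri].ts_tqexp,
                   &tbl[pri].ts_slpret, &tbl[pri].ts_maxwait,
                   &tbl[pri].ts_lwait) != 5) @{
            fclose(f);
            return -1;
        @}
    @}
    fclose(f);
    return 0;
@}
@end example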