@node Multilevel Feedback Scheduling, Coding Standards, References, Top
@appendix Multilevel Feedback Scheduling

This section gives a brief overview of the behavior of the Solaris 2.6
Time-Sharing (TS) scheduler, an example of a Multilevel Feedback Queue
scheduler. The information in this handout, in conjunction with that
given in lecture, should be used to answer Problem 1-3. The end of
this document specifies in more detail which aspects of the Solaris
scheduler you should implement.
The goal of a multilevel feedback queue scheduler is to fairly and
efficiently schedule a mix of processes with a variety of execution
characteristics. By controlling how a process moves between priority
levels, processes with different characteristics can be scheduled as
appropriate. Priority-based schedulers attempt to provide a
compromise between several desirable metrics (e.g.@: response time for
interactive jobs, throughput for compute-intensive jobs, and fair
allocations for all jobs).
The queues in the system are ranked according to priority. Processes
waiting in higher priority queues are always scheduled over those in
lower priority queues. Processes at the same priority are usually
scheduled in a round-robin fashion.
Such schedulers tend to be preemptive in order to support interactive
processes. That is, when a higher priority process becomes runnable,
it is immediately scheduled, preempting any lower priority process
running on the CPU.
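
To make the queue discipline concrete, here is a minimal sketch in C
of how a dispatcher might select the next job from such ranked
queues. The names (@code{NPRI}, @code{runq}, @code{pick_next}) are
hypothetical, not taken from any particular kernel:

@example
#include <stddef.h>

#define NPRI 60                   /* number of priority levels */

struct proc @{
    struct proc *next;            /* link within a priority queue */
    /* ... */
@};

/* One FIFO ready queue per priority; runq[NPRI - 1] is highest. */
static struct proc *runq[NPRI];

/* Scan from the highest priority downward and dequeue the head of
   the first non-empty queue; re-enqueueing jobs at the tail when
   they exhaust their time-slices yields round-robin service within
   each level. */
struct proc *pick_next(void) @{
    for (int pri = NPRI - 1; pri >= 0; pri--) @{
        if (runq[pri] != NULL) @{
            struct proc *p = runq[pri];
            runq[pri] = p->next;
            return p;
        @}
    @}
    return NULL;                  /* no runnable process */
@}
@end example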
@menu
* Scheduling in Solaris::
* Class Independent Functionality::
* Time-Sharing Scheduling Class::
* Dispatch Table::
* Implementation::
* Fairness::
* Project Requirements::
@end menu
@node Scheduling in Solaris
@section Scheduling in Solaris

The Solaris operating system is based on Unix System V Release 4
(SVR4). Scheduling in Solaris, as in all SVR4-based schedulers, is
performed at two levels: class-independent routines and
class-dependent routines. Class-independent routines are those that
are responsible for dispatching and preempting processes (the
low-level mechanisms). Class-dependent routines are those that are
responsible for setting the priority of each of its processes (the
high-level policies).
By default, Solaris supports three scheduling classes: time-sharing
(TS), real-time (RT), and system (SYS). Users with root privileges
can easily implement and add new scheduling classes by adhering to a
predefined interface. Each scheduling class gives each of its
processes a priority, the range of which is shown below:
@multitable {Scheduling Class} {Priorities}
@item Scheduling Class @tab Priorities
@item Real-Time @tab 100-159
@item System @tab 60-99
@item Time-Sharing @tab 0-59
@end multitable
As long as a user has the correct privileges, he or she can submit
jobs to any scheduling class. By default, jobs are executed in the same
scheduling class as the parent process that forked the job. Since
your shell is running in the time-sharing class, all of your jobs run
by default in the time-sharing class.
See the man pages for @command{priocntl} on any machine running
Solaris for information on how to submit jobs to different scheduling
classes. However, since you probably don't have root privileges on
your machine, you won't be able to do much.
To see the scheduling class of each process in the system, run
@samp{ps -edaflc}. (@samp{-c} is the flag that shows the scheduling
class.) The fourth column shows the scheduling class of the running
process. Most jobs will be running in the TS class, with a few (owned
by root) running in the SYS class.
@example
UID    PID  PPID  CLS PRI    STIME TTY      TIME CMD
root     0     0  SYS  96   Aug 01 ?        0:00 sched
root     1     0   TS  58   Aug 01 ?        1:06 /etc/init -
root     2     0  SYS  98   Aug 01 ?        0:02 pageout
root     3     0  SYS  60   Aug 01 ?       15:22 fsflush
root   245   239   TS  59   Aug 01 ?        0:00 ttymon
root   181     1   TS  48   Aug 01 ?        0:00 sendmail -q15m
root   239     1   TS  59   Aug 01 ?        0:00 sac -t 300
root    96     1   TS  58   Aug 01 ?        0:00 rpcbind
root   125     1   TS  59   Aug 01 ?        0:32 syslogd
@end example
In this document, we only discuss the Solaris time-sharing (TS)
class. Note the priorities of each of the processes, as listed in the
@samp{PRI} column.
@node Class Independent Functionality
@section Class Independent Functionality

The class-independent routines arbitrate across the scheduling
classes. This involves three basic responsibilities.
@enumerate
@item
The process with the highest priority must be dispatched, and the
state of the preempted process saved.

@item
The class-independent functions must notify the class-dependent
routines when the state of their processes changes (for example, at
creation and termination, when a process changes from blocked to
runnable or runnable to blocked, and when a 10@dmn{ms} timer expires).

@item
Processes must be moved between priority queues in the
class-independent data structures, as directed by their scheduling
class, and must be moved between blocked and ready queues.
@end enumerate
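
As an illustration of this division of labor, the class-independent
layer might invoke class-dependent policy through a table of
callbacks, one per event. The sketch below is ours, in C; neither the
structure nor the field names are the actual SVR4 interface:

@example
struct thread;                        /* forward declaration */

/* Hypothetical per-class operations vector; one instance per
   scheduling class (TS, RT, SYS). */
struct sched_ops @{
    void (*enterclass)(struct thread *t); /* process created */
    void (*exitclass)(struct thread *t);  /* process terminated */
    void (*tick)(struct thread *t);       /* 10 ms timer expired */
    void (*sleep)(struct thread *t);      /* runnable -> blocked */
    void (*wakeup)(struct thread *t);     /* blocked -> runnable */
@};

struct thread @{
    const struct sched_ops *ops;  /* this thread's scheduling class */
    /* ... class-independent state: registers, queue links, ... */
@};

/* Class-independent 10 ms timer handler: forward the event to the
   scheduling class of the currently running thread. */
void clock_tick(struct thread *curr) @{
    curr->ops->tick(curr);
@}
@end example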
@node Time-Sharing Scheduling Class
@section Time-Sharing Scheduling Class

The time-sharing scheduler in Solaris is an example of a multi-level
feedback queue scheduler. A job begins at priority 29. Compute-bound
jobs then filter down to the lower priorities, where they are
scheduled less frequently, but for longer time-slices. Interactive
jobs propagate to the higher priorities, where they are scheduled
whenever they have work to perform, on the assumption that they will
soon relinquish the processor again. In the TS scheduler, the
priority of a process is lowered after it consumes its allocated
time-slice. Its priority is raised if it has not consumed its
time-slice before a starvation interval expires.
@node Dispatch Table
@section Dispatch Table
The durations of the time-slices, the changes in priorities, and the
starvation interval are specified in a user-tunable dispatch table.
The system administrator (or anyone with root privileges) can change
the values in this table, thus configuring how the time-sharing
scheduler manages its jobs. While this has the noble intention of
allowing different systems to tune the scheduler to better handle
their workloads, in reality no one really knows how to configure these
tables well. Therefore, we will focus on the default dispatch table.
To see how this table is configured in your system, run
@samp{dispadmin -c TS -g}. You should see something like the table
shown below. Looking at the man pages on @command{dispadmin} and
@command{ts_dptbl} may also be helpful.
@multitable {@code{ts_quantum}} {@code{ts_tqexp}} {@code{ts_slpret}} {@code{ts_maxwait}} {@code{ts_lwait}} {priority}
@item @code{ts_quantum} @tab @code{ts_tqexp} @tab @code{ts_slpret} @tab @code{ts_maxwait} @tab @code{ts_lwait} @tab priority
@item 200 @tab 0 @tab 50 @tab 0 @tab 50 @tab 0
@item 200 @tab 0 @tab 50 @tab 0 @tab 50 @tab 1
@item 200 @tab 0 @tab 50 @tab 0 @tab 50 @tab 2
@item 200 @tab 0 @tab 50 @tab 0 @tab 50 @tab 3
@item 200 @tab 0 @tab 50 @tab 0 @tab 50 @tab 4
@item 200 @tab 0 @tab 50 @tab 0 @tab 50 @tab 5
@item 200 @tab 0 @tab 50 @tab 0 @tab 50 @tab 6
@item 200 @tab 0 @tab 50 @tab 0 @tab 50 @tab 7
@item 200 @tab 0 @tab 50 @tab 0 @tab 50 @tab 8
@item 200 @tab 0 @tab 50 @tab 0 @tab 50 @tab 9
@item 160 @tab 0 @tab 51 @tab 0 @tab 51 @tab 10
@item 160 @tab 1 @tab 51 @tab 0 @tab 51 @tab 11
@item 160 @tab 2 @tab 51 @tab 0 @tab 51 @tab 12
@item 160 @tab 3 @tab 51 @tab 0 @tab 51 @tab 13
@item 160 @tab 4 @tab 51 @tab 0 @tab 51 @tab 14
@item 160 @tab 5 @tab 51 @tab 0 @tab 51 @tab 15
@item 160 @tab 6 @tab 51 @tab 0 @tab 51 @tab 16
@item 160 @tab 7 @tab 51 @tab 0 @tab 51 @tab 17
@item 160 @tab 8 @tab 51 @tab 0 @tab 51 @tab 18
@item 160 @tab 9 @tab 51 @tab 0 @tab 51 @tab 19
@item 120 @tab 10 @tab 52 @tab 0 @tab 52 @tab 20
@item 120 @tab 11 @tab 52 @tab 0 @tab 52 @tab 21
@item 120 @tab 12 @tab 52 @tab 0 @tab 52 @tab 22
@item 120 @tab 13 @tab 52 @tab 0 @tab 52 @tab 23
@item 120 @tab 14 @tab 52 @tab 0 @tab 52 @tab 24
@item 120 @tab 15 @tab 52 @tab 0 @tab 52 @tab 25
@item 120 @tab 16 @tab 52 @tab 0 @tab 52 @tab 26
@item 120 @tab 17 @tab 52 @tab 0 @tab 52 @tab 27
@item 120 @tab 18 @tab 52 @tab 0 @tab 52 @tab 28
@item 120 @tab 19 @tab 52 @tab 0 @tab 52 @tab 29
@item 80 @tab 20 @tab 53 @tab 0 @tab 53 @tab 30
@item 80 @tab 21 @tab 53 @tab 0 @tab 53 @tab 31
@item 80 @tab 22 @tab 53 @tab 0 @tab 53 @tab 32
@item 80 @tab 23 @tab 53 @tab 0 @tab 53 @tab 33
@item 80 @tab 24 @tab 53 @tab 0 @tab 53 @tab 34
@item 80 @tab 25 @tab 54 @tab 0 @tab 54 @tab 35
@item 80 @tab 26 @tab 54 @tab 0 @tab 54 @tab 36
@item 80 @tab 27 @tab 54 @tab 0 @tab 54 @tab 37
@item 80 @tab 28 @tab 54 @tab 0 @tab 54 @tab 38
@item 80 @tab 29 @tab 54 @tab 0 @tab 54 @tab 39
@item 40 @tab 30 @tab 55 @tab 0 @tab 55 @tab 40
@item 40 @tab 31 @tab 55 @tab 0 @tab 55 @tab 41
@item 40 @tab 32 @tab 55 @tab 0 @tab 55 @tab 42
@item 40 @tab 33 @tab 55 @tab 0 @tab 55 @tab 43
@item 40 @tab 34 @tab 55 @tab 0 @tab 55 @tab 44
@item 40 @tab 35 @tab 56 @tab 0 @tab 56 @tab 45
@item 40 @tab 36 @tab 57 @tab 0 @tab 57 @tab 46
@item 40 @tab 37 @tab 58 @tab 0 @tab 58 @tab 47
@item 40 @tab 38 @tab 58 @tab 0 @tab 58 @tab 48
@item 40 @tab 39 @tab 58 @tab 0 @tab 59 @tab 49
@item 40 @tab 40 @tab 58 @tab 0 @tab 59 @tab 50
@item 40 @tab 41 @tab 58 @tab 0 @tab 59 @tab 51
@item 40 @tab 42 @tab 58 @tab 0 @tab 59 @tab 52
@item 40 @tab 43 @tab 58 @tab 0 @tab 59 @tab 53
@item 40 @tab 44 @tab 58 @tab 0 @tab 59 @tab 54
@item 40 @tab 45 @tab 58 @tab 0 @tab 59 @tab 55
@item 40 @tab 46 @tab 58 @tab 0 @tab 59 @tab 56
@item 40 @tab 47 @tab 58 @tab 0 @tab 59 @tab 57
@item 40 @tab 48 @tab 58 @tab 0 @tab 59 @tab 58
@item 20 @tab 49 @tab 59 @tab 32000 @tab 59 @tab 59
@end multitable
You will see one row for every priority in the scheduling class, from
0 to 59. For each priority, there are five columns:

@table @code
@item ts_quantum
Length of the time-slice. In the actual table, this value is specified
in 10@dmn{ms} clock ticks, but in the output from running
@command{dispadmin}, the value is specified in units of 1@dmn{ms}.
@item ts_tqexp
Priority level of the new queue on which to place a process if it
exceeds its time quantum. Normally this field links to a lower
priority time-sharing level.
@item ts_slpret
The new, generally increased, priority to adopt when the job returns
from sleeping (i.e.@: from the blocked queue) if @code{ts_dispwait}
exceeds @code{ts_maxwait}.
@item ts_maxwait
Each time a process is placed back on the dispatcher queue after its
time quantum expires or when it is awakened, but not when it is
preempted by a higher-priority process, a per-process counter named
@code{ts_dispwait} is zeroed. This counter is incremented once per
second. If a process's @code{ts_dispwait} exceeds its priority's
@code{ts_maxwait}, then the process's priority is changed to
@code{ts_lwait}. This prevents starvation.
@item ts_lwait
The new, generally increased, priority to adopt if the starvation
timer expires before the job consumes its time-slice (i.e.@: if
@code{ts_dispwait} exceeds @code{ts_maxwait}).
@end table
In this table, the priority of jobs ranges from a high of 59 down to
0. Time-slices begin at 20@dmn{ms} at the highest priority and
gradually increase in duration up to 200@dmn{ms} at the lowest
priorities. Generally, the priority of a process decreases by 10
levels after it consumes its time-slice; for example, a job at
priority 29 that consumes its full 120@dmn{ms} time-slice drops to
priority 19 (its @code{ts_tqexp}). The priority of a process is
increased to 50 or above when the starvation timer expires.
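
Such a table maps naturally onto a C array indexed by priority. The
sketch below shows how a few of the default rows might be encoded,
with quanta converted back into 10@dmn{ms} clock ticks; the type name
@code{ts_dispent_t} and the layout are our own assumptions, although
we borrow the kernel's @code{ts_dptbl} name:

@example
/* One row of the time-sharing dispatch table, indexed by priority.
   Quanta are in 10 ms clock ticks, as in the in-kernel table. */
typedef struct @{
    int ts_quantum;   /* time-slice, in ticks */
    int ts_tqexp;     /* new priority when the quantum expires */
    int ts_slpret;    /* new priority on return from sleep */
    int ts_maxwait;   /* seconds of waiting before ts_lwait applies */
    int ts_lwait;     /* new priority when starving */
@} ts_dispent_t;

/* Selected rows of the default table; a full table fills in all
   60 entries. */
ts_dispent_t ts_dptbl[60] = @{
    /*       quantum  tqexp  slpret  maxwait  lwait */
    [0]  = @{     20,     0,     50,       0,    50 @},
    [29] = @{     12,    19,     52,       0,    52 @},
    [59] = @{      2,    49,     59,   32000,    59 @},
@};
@end example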
@node Implementation
@section Implementation
For each job in the TS class, the following data structure is
maintained (we've removed a few of the fields for simplicity):
@example
/*
 * time-sharing class specific thread structure
 */
typedef struct tsproc @{
    long        ts_timeleft;   /* time remaining in quantum */
    long        ts_dispwait;   /* number of seconds since */
                               /* start of quantum (not reset */
                               /* upon preemption) */
    pri_t       ts_pri;        /* priority (0-59) */
    kthread_t   *ts_tp;        /* pointer to thread */
    struct tsproc *ts_next;    /* link to next tsproc on list */
    struct tsproc *ts_prev;    /* link to previous tsproc */
@} tsproc_t;
@end example
The @code{kthread_t} structure tracks the information necessary to
context-switch to and from this process. This structure is kept
apart from the time-sharing class in order to separate the
mechanisms of the dispatcher from the policies of the scheduler.
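
As a small illustration of those list links, here is one way a new
@code{tsproc} might be spliced into the class's doubly-linked process
list; the sentinel name @code{tsproclist} is hypothetical:

@example
/* Circular doubly-linked list of all tsproc structures in the
   class, anchored by a sentinel node. */
static tsproc_t tsproclist = @{ .ts_next = &tsproclist,
                               .ts_prev = &tsproclist @};

/* Splice a new tsproc in at the front of the list. */
void tsproc_insert(tsproc_t *ts) @{
    ts->ts_next = tsproclist.ts_next;
    ts->ts_prev = &tsproclist;
    tsproclist.ts_next->ts_prev = ts;
    tsproclist.ts_next = ts;
@}
@end example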
There are seven interesting routines in the TS class:

@table @code
@item ts_enterclass(thread *@var{t})
Called when a new thread is added to the TS class. It initializes a
@code{tsproc} structure for this process and adds it to the list of
processes in the class.
@item ts_exitclass(thread *@var{t})
Called when the thread terminates or exits the class. The
@code{tsproc} structure is removed from the list of processes.
@item ts_tick(thread *@var{t})
Called once every 10@dmn{ms} with a pointer to the currently running
thread. The @code{ts_timeleft} variable of the running thread is
decremented by one. If @code{ts_timeleft} reaches zero, then its new
priority becomes its old priority's @code{ts_tqexp}, its timeslice is
reset, and @code{ts_dispwait} is zeroed. The thread is then added to
the back of the appropriate priority queue and a new job is scheduled.
(A sketch of this routine appears after this list.)
@item ts_update()
Called once a second to check the starvation qualities of each job.
The routine increments the @code{ts_dispwait} counter of every process
in the class (even those that are blocked) by one. If the job is on
the ready queue (i.e.@: the job is neither running nor blocked) and
its @code{ts_dispwait} exceeds @code{ts_maxwait}, then its priority
and @code{ts_dispwait} (but not @code{ts_timeleft}) are reset. This
may involve rearranging the priority queues. (See the sketch after
this list.)
@item ts_sleep(thread *@var{t})
Called when the thread blocks (e.g.@: due to I/O or synchronization).
The TS routine does not need to do anything in this circumstance, but
the dispatcher, or class-independent routines, must add the thread to
the blocked queue and schedule a new thread.
@item ts_wakeup(thread *@var{t})
Called when the blocked thread becomes ready. If @code{ts_dispwait}
for the process is greater than its priority's @code{ts_maxwait}, then
its priority is set to @code{ts_slpret}, its timeslice
(@code{ts_timeleft}) is reset, and @code{ts_dispwait} is zeroed. If
the priority of this job is higher than that of the running job, it
preempts the currently running job. Otherwise the newly awoken job is
added to the back of its priority queue.
@item ts_preempt(thread *@var{t})
Called when the thread is preempted by a higher priority thread. The
preempted thread is added to the front of its priority queue.
@end table
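
To connect the dispatch table to these routines, here is a hedged
sketch of @code{ts_tick} and @code{ts_update} in C. It assumes the
@code{ts_dptbl} array sketched in the Dispatch Table section; the
helper functions declared at the top are hypothetical stand-ins for
the class-independent layer:

@example
/* Hypothetical helpers provided by the class-independent layer. */
void setbackdq(tsproc_t *ts);      /* enqueue at back of its queue */
void swtch(void);                  /* dispatch a new thread */
int  on_ready_queue(tsproc_t *ts); /* neither running nor blocked? */
void requeue(tsproc_t *ts);        /* move to its new priority queue */

/* Called every 10 ms for the currently running thread. */
void ts_tick(tsproc_t *ts) @{
    if (--ts->ts_timeleft <= 0) @{
        /* Quantum consumed: drop to the old priority's ts_tqexp,
           recharge the quantum from the new row, clear the
           starvation counter, and requeue at the back. */
        ts->ts_pri = ts_dptbl[ts->ts_pri].ts_tqexp;
        ts->ts_timeleft = ts_dptbl[ts->ts_pri].ts_quantum;
        ts->ts_dispwait = 0;
        setbackdq(ts);
        swtch();
    @}
@}

/* Called once a second for every thread in the class. */
void ts_update(tsproc_t *ts) @{
    ts->ts_dispwait++;
    if (on_ready_queue(ts)
        && ts->ts_dispwait > ts_dptbl[ts->ts_pri].ts_maxwait) @{
        /* Starving: raise the priority to ts_lwait and reset
           ts_dispwait, but leave ts_timeleft alone. */
        ts->ts_pri = ts_dptbl[ts->ts_pri].ts_lwait;
        ts->ts_dispwait = 0;
        requeue(ts);
    @}
@}
@end example

Note that @code{ts_tick} recharges the quantum from the @emph{new}
priority's row, matching the description above.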
@node Fairness
@section Fairness

The Solaris time-sharing scheduler approximates fair allocations by
decreasing the priority of a job the more that it is scheduled.
Therefore, a job that is runnable relatively infrequently remains at a
higher priority and is scheduled over lower priority jobs. However,
due to the configuration of the default dispatch table (in which the
starvation interval is set to zero), the priority of every process
is raised once a second, regardless of whether or not it is actually
starving. Thus, the allocation history of each process is erased
every second and compute-bound processes tend to acquire more than
their fair share of the resources.
This behavior is illustrated in the graph below for three competing
jobs that relinquish the processor at different rates while waiting
for I/O to complete: a coarse job that rarely relinquishes the CPU, a
medium job that does so more frequently, and a fine job that often
relinquishes the CPU. The graph shows a typical snapshot over a five
second execution interval. As described, each second the priority of
all three jobs is raised to level 50 or higher. As a job executes and
consumes its timeslice, its priority is lowered about ten levels.
Since the coarse job runs more frequently, it drops in priority at a
faster rate than the other two jobs.
The impact of this policy on the relative execution times of the three
applications is shown in the next graph below. Because the coarse
application acquires more CPU time, it finishes its work earlier than
the other applications, even though all three jobs require the same
amount of time in a dedicated environment.
@node Project Requirements
@section Project Requirements

For your project, you need to implement code that is similar in
functionality to the Solaris TS scheduler, but your code does not have
to be structured in the same way. Implement your code in whatever
manner you find easiest when interfacing with the already existing
code.
Specifically, you are not required to separate the class-independent
and class-dependent scheduling functionality. You do not need to
support multiple scheduling classes. You can implement any routines
that you feel are necessary, not just the seven functions we
specifically listed. You can pass any parameters that you find
convenient.
However, it is important that your approach be table-driven, with a
user-configurable dispatch table. Your table should be initialized to
the default values in Solaris 2.6, but you may also want to experiment
with different configurations. To demonstrate the functionality of
your scheduler, you may want to print out the change in priorities of
several different competing processes, as in the first graph above.
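
For instance, if the dispatch table lives in a single global array
like the @code{ts_dptbl} sketched earlier, making it
user-configurable can be as simple as the following hypothetical
helper (@code{dispatch_table_set} is our own name, not part of any
existing interface):

@example
#include <string.h>

/* Install a new 60-entry dispatch table at run time, e.g. one
   parsed from a configuration file, before starting the test
   workload. */
void dispatch_table_set(const ts_dispent_t new_tbl[60]) @{
    memcpy(ts_dptbl, new_tbl, 60 * sizeof (ts_dispent_t));
@}
@end example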