cs341 Lecture Notes Spring 2002 Week 3, Thursday For today you should have 1) Finished hw02. It's due at the beginning of class. 2) Read Sections 7.1 and 7.2. Skimmed 7.3 and 7.4. For next time you should 1) Solve the next semaphore puzzle and prepare to turn in an explanation of what's wrong with the non-solution on page 39, and another attempt at a reusable barrier. Please write "Semaphore Homework #5" on your solution. Quick semaphore note: when a turnstile is 1 it is unlocked; 0 is locked. 2) Meet with your partner and make a plan for Homework 3. Mushtaque Wan Pollard Wada Muenchow Stadler Train Siyam Golder Choy Schwenck Masiello Cameron Chang Carlin Connor Ahmad 3) Read 7.3 and 7.4 and the paper by Waldspurger and Weihl. Preparation questions follow. 4) Prepare for a quiz on the topics we covered this week and the reading for next time. Random announcements -------------------- Please staple multiple-page assignments. Handouts now available from rocky.wellesley.edu/cs341/handouts/ chart.pdf quiz02b.pdf quiz02b.ps Hints about printing postscript and PDF... check out acroread, ps2pdf and pdf2ps Don't lpr PDF files!!! I sent in a request for a class conference. I feel dirty. Queueing model -------------- Processes arrive in the ready state from two sources 1) new submissions, whose arrival times are unpredictable 2) I/O completions, where we might have some information about impending arrivals or the distribution of interarrival times The CPU scheduler makes decisions at the following times 1) whenever an interrupt occurs (including timer expirations), we have to decide which process to run next (possibly including the current one) 2) whenever a process arrives in the ready state, we might choose to preempt the running procees (how, other than during an interrupt, might a process enter the ready state?) 3) when the running process terminates 4) when the running process blocks Other schedulers are also involved and affect the overall performance of the system. For example, the disk scheduler controls the order in which disk requests are handled. Goals ----- What would we like the scheduler to do? 1) minimize the average turnaround time for all processes (turnaround time = wall clock time from beginning to completion) 2) satisfy real-time constraints, like requirements for interactive behavior 3) maintain high utilization of the CPU and the other hardware by overlapping the CPU of one process with the I/O of another (this last is more like a heuristic than a goal -- we have a vague notion that doing this is generally a good idea, although it is not necessarily something users care about) 4) behave predictably and fairly predictability is good for programmers who need to have a model of program behavior fairness is hard to define, but one aspect of it is that every process should have _some_ access to the CPU; none should "starve" Strategies (heuristics) ----------------------- FIFO: first in first out processes are moved from ready into running in the order they arrived, and they run until they complete or block pro: simple, and meets some definition of fairness con: bad average turnaround when short waits for long no priority mechanism: hard to meet real-time constraints example: a 1 ms process and a 10 ms process SJF: shortest (time to completion or blocking) first this has some intuitive appeal, if you think about relative costs: a 1 ms delay isn't bad for a 30 ms process, but it is catastrophic for a 0.1 ms process SJF meets a different definition of fairness pro: provably optimal (minimal) average turnaround time for a set of processes all available at t0 con: we generally don't know the duration of each process's next CPU burst, and also the model assumed in the proof does not consider the dependency of successive bursts Round Robin: give each process a little bit of time this is an approximation of SJF in the case where we don't know the durations example: 1 ms and 10 ms again, but we don't know which is which, quantum = 1 ms result: almost as good as SJF, and much better than FIFO pathology: can be even worse than FIFO example: two 10 ms processes, quantum = 1 ms pro: generally good when there is large variability in duration con: overhead becomes signficant as quantum gets smaller pessimal when jobs are the same length External priorities: users or the the system might attach priorities to different processes example: interactive jobs get highest priority because response time is important and they tend to have short bursts long-running computationally-intensive processes might get lower priority (but maybe longer quanta) pro: satisfies real time requirements con: burden on users/programmers starvation / bidding wars Multi-level feedback queue: use past behavior to predict future behavior two kinds of learning: 1) predict the duration of the next burst based on past bursts 2) classify process based on long-term behavior Adjust quantum length automatically to reduce overhead for long-running computationally-intensive processes (but why bother?) Tradeoff against the expense of keeping track of accounting information. Non-deterministic scheduling: Example: lottery scheduling. Problem: for any deterministic system you can contruct pathological cases that yield bad performance Solution: a little randomness goes a long way Con: unpredictable execution model for programmers Any idea what Linux is doing? Evaluation techniques for scheduling strategies ----------------------------------------------- 1) Analysis: only works for very simple strategies/models 2) Simulation: useful for checking out wide range of possibilities. depends on workload model or traces 3) Implementation: most reliable, but also depends on workload Which do Waldspurger and Weihl do? What is their workload or workload model? What is their performance metric? Dependencies ------------ Complication: there are often dependencies among tasks. 1) the sooner you issue an I/O command, the sooner it completes 2) users might not submit the next process until this one completes These dependencies limit the usefulness of simple queueing models. Instead, designers use heuristics like: 1) give processes priority if you expect them to do I/O soon (and why might you suspect them?) 2) degrade the priority of things that have been running for a long time a) they are less likely to be interactive b) the relative cost of delays is declining Real-time scheduling -------------------- When processes arrive, we generally need to know 1) their resource requirements 2) their deadlines 3) sometimes a cost function (how bad would it be if we missed the deadline) The goal of the real time scheduler is either 1) find any schedule that misses no deadlines (or maybe the best such schedule) 2) if not possible, minimize the total cost of all misses Some real-time schedulers involve recurring tasks, in which case they sometimes break the time line into cycles that are the LCM of the cycle times of recurring tasks. In the case where there is guaranteed to be a schedule that misses no deadlines, and the cost of a context switch is zero, there is a simple algorithm that is guaranteed to find a successful schedule: Work on the Process with the Nearest Deadline First Sound familiar? Problem: if the schedule is untenable, this algorithm is often pessimal.