Software Systems Spring 2008 For today, you should have: 1) worked on the next semaphore puzzle (reusable barrier) 2) read the Cow book and done Exercise 8-1 3) done the reading from notes05 Outline: 1) homework 2 discussion 2) reusable barrier solution 3) Exercise 8-1 solution 4) thread experiment 5) scheduling heuristics 6) Lottery Scheduling paper For next time you should: 1) start Homework 3 (due Thursday 21 February) Randomly-generated pairing for this homework are: samuel.freilich jeffrey.dezso Benjamin.Fisher kiuk.gwak andrew.tsang kent.munson Jonathan.Inman Mary.Germino Christina.Nguyen Ilari.Shafer James.Switzer allison.weis 2) Read the Cow book, pages 209-211, to see how file input works, then read Chapter 9 and write a program that reads a file and counts the number of words (be sure to define carefully what a word is), and then count the number of words in War and Peace, which you can download from gutenberg.net Compare your results with the output from wc. 3) Read Waldspurger and Weihl and answer the reading questions below. http://wb/ss/handouts/waldspurger94lottery.pdf 4) Read the next section of the LBoS Pthreads -------- Quoth the Wikipedia: "POSIX Threads is a POSIX standard for threads. The standard defines an API for creating and manipulating threads. "Libraries implementing the POSIX Threads standard are often named Pthreads. Pthreads are most commonly used on POSIX systems such as Linux and Unix, but Microsoft Windows implementations also exist." wget wb/ss/code/thread.tgz tar -xzf thread.tgz cd thread make ./thread 1) read thread.c 2) design an experiment you can run to determine where in the address space the child thread's stack is. 3) modify the code to create two child threads, and run your experiment again. Where is the second child's stack? Scheduling ---------- Processes arrive in the ready state from three sources 1) new submissions, whose arrival times are unpredictable 2) a running process that has been interrupted 3) a blocked process that has been serviced In the latter two cases, we have some historical information about the process. In the first case we _might_ have some information about the program. The CPU scheduler makes decisions at the following times 1) whenever an interrupt occurs (including timer expirations), we have to decide which process to run next (possibly including the current one) 2) while executing a system call on behalf of a running process, we might create a new process or unblock an existing one, and we might invoke the scheduler, which might choose to preempt the running procees 3) when the running process terminates 4) when the running process blocks Other resource schedulers are also involved and affect the overall performance of the system. For example, the disk scheduler controls the order in which disk requests are handled. Goals ----- What would we like the scheduler to do? 1) minimize the average turnaround time for all processes (turnaround time = wall clock time from beginning to completion) 2) satisfy real-time constraints, like requirements for interactive behavior 3) maintain high utilization of the CPU and the other hardware by overlapping the CPU of one process with the I/O of another (this last is more like a heuristic than a goal -- we have a vague notion that doing this is generally a good idea, although it is not necessarily something users care about) 4) behave predictably and fairly predictability is good for programmers who need to have a model of program behavior fairness is hard to define, but one aspect of it is that every process should have _some_ access to the CPU; none should "starve" FUNDAMENTAL PROBLEM OF SCHEDULING: Goals are high-level and aggregate, but the scheduler has to make a series of low-level decisions, quickly, with limited global information. Strategies (heuristics) ----------------------- FIFO: first in first out processes are moved from ready into running in the order they arrived, and continue until they complete or block pro: simple, and meets some definition of fairness con: bad average turnaround when short waits for long bad utilization when I/O bound wait for CPU bound no priority mechanism: hard to meet real-time constraints example: a 1 unit process and a 10 unit process SJF: shortest (time to completion or blocking) first this has some intuitive appeal, if you think about relative costs: a 1 unit delay isn't bad for a 30 unit process, but it is catastrophic for a 0.1 unit process SJF meets a different definition of fairness pro: provably optimal (minimal) average turnaround time for a set of processes all available at t0 con: we generally don't know the duration of each process's next CPU burst, and also the model assumed in the proof does not consider the dependency of successive bursts Round Robin: give each process a little bit of time this is an approximation of SJF in the case where we don't know the durations example: 1 unit and 10 unit, but we don't know which is which, quantum = 1 unit result: almost as good as SJF, and much better than FIFO pathology: can be worse than FIFO example: two 10 ms processes, quantum = 1 ms pro: generally good when there is large variability in duration con: overhead becomes signficant as quantum gets smaller pessimal when jobs are the same length External priorities: users or the the system might attach priorities to different processes example: interactive jobs get highest priority because response time is important and they tend to have short bursts long-running computationally-intensive processes might get lower priority (but maybe longer quanta) pro: satisfies real time requirements con: burden on users/programmers starvation / bidding wars Multi-level feedback queue: use past behavior to predict future behavior two kinds of learning: 1) predict the duration of the next burst based on past bursts 2) classify process based on long-term behavior Tradeoff performance against the expense of keeping track of long-term information. Non-deterministic scheduling: Example: lottery scheduling. Problem: for any deterministic system you can contruct pathological cases that yield bad performance Solution: a little randomness goes a long way Con: unpredictable execution model for programmers (maybe) Lottery Scheduling paper ------------------------ The goal of reading a technical paper is to get to the point where you can answer the following questions: 1) what is the new idea/contribution of this paper? 2) what did the authors do to evaluate their idea? 3) what are the results of the evaluation? 4) are the conclusions supported by the evidence? The first two are usually easy to extract quickly. The third requires more careful reading. The last requires _very_ careful reading, and sometimes a lot of background. So we will start with the first three! Read the following questions, then read the Lottery Scheduling paper and write answers. 1) Make several passes through the paper. 2) Do not read the entire paper. It is not all equally important! 3) Budget a reasonable amount of time for this exercise and then stop. What is Lottery Scheduling? What other kind of scheduling is most similar, and how is Lottery Scheduling different? Name three features or advantages of Lottery Scheduling. What are ticket transfers, and why might they be useful? Anything interesting about the implementation? What experiments do the authors perform to test their implementation? For each experiment, what are the authors trying to prove? (hint: the fastest way to answer this question is often to look at the figures and their captions) Do the authors acknowledge any limitations or disadvantages of their technique? Can you think of any?