Software Systems Fall 2006 For today, you should have: 1) worked on the next semaphore puzzle (reusable barrier) 2) read the Cow book and done Exercise 8-1 3) done the reading from notes05 Outline: 1) quiz 2 / homework 2 discussion 2) reusable barrier solution 3) Exercise 8-1 solution 4) thread experiment 5) scheduling heuristics 6) Lottery Scheduling paper For next time you should: 1) start Homework 3 (due Tuesday 2 October) jonathan.cass bradford.westgate alex.dorsk james.whong brian.shih ben.hayden robin.nix sean.mcbride thomas.michon robert.quimby george.harris vito.ruiz matthew.crawford andrew.kalcic connor.riley jeffrey.decew daniel.gallagher nicholas.hays 2) Read the Cow book, pages 209-211, to see how file input works, then read Chapter 9 and write a program that reads a file and counts the number of words (be sure to define carefully what a word is), and then count the number of words in War and Peace, which you can download from gutenberg.net Compare your results with the output from wc. 3) Read Waldspurger and Weihl and answer the reading questions below. 4) Read the next section of the LBoS WARNING: the Queue problem is not super well posed. Quiz 2 ------ Part 1 was good. It was important to note that the experiment went to some trouble to ensure that the processes ran concurrently. Part 2 was mostly good. There was some confusion about the right level of detail. Some people said things like "Process A will see that the block from disk is available..." This is true at the high level of abstraction, where we imagine that multiple processes run at the same time. But at the OS level, only one process (or the OS) is running. Since A is blocked, it can't do anything until... 1) something happens to move it from blocked to ready 2) the scheduler moves it from ready to running. Most of the time, we will adopt the OS point-of-view. Homework 2 ---------- I checked your semaphore solutions and marked errors if I found them. I looked over the C programs for style, but did not check correctness (you don't really need me for that). Things looked pretty good. Some confusion about semaphores, but I think that is cleared up. The C programming is coming along nicely. It is (still) fairly standard to format code to fit in 80 columns. I'm not a stickler for any particular code formatting convention. There are several: http://pantransit.reptiles.org/prog/CodingStyle.html http://www.gnu.org/prep/standards/standards.html If you join an existing project, you will have to adopt their standard. If they don't have one, think about finding another project. It's probably a symptom. Pthreads -------- Quoth the Wikipedia: "POSIX Threads is a POSIX standard for threads. The standard defines an API for creating and manipulating threads. "Libraries implementing the POSIX Threads standard are often named Pthreads. Pthreads are most commonly used on POSIX systems such as Linux and Unix, but Microsoft Windows implementations also exist." wget wb/ss/code/thread.tgz tar -xzf thread.tgz cd thread make ./thread 1) read thread.c 2) design an experiment you can run to determine where in the address space the child thread's stack is. 3) modify the code to create two child threads, and run your experiment again. Where is the second child's stack? Scheduling ---------- Processes arrive in the ready state from three sources 1) new submissions, whose arrival times are unpredictable 2) a running process that has been interrupted 3) a blocked process that has been serviced In the latter two cases, we have some historical information about the process. In the first case we _might_ have some information about the program. The CPU scheduler makes decisions at the following times 1) whenever an interrupt occurs (including timer expirations), we have to decide which process to run next (possibly including the current one) 2) while executing a system call on behalf of a running process, we might create a new process or unblock an existing one, and we might invoke the scheduler, which might choose to preempt the running procees 3) when the running process terminates 4) when the running process blocks Other resource schedulers are also involved and affect the overall performance of the system. For example, the disk scheduler controls the order in which disk requests are handled. Goals ----- What would we like the scheduler to do? 1) minimize the average turnaround time for all processes (turnaround time = wall clock time from beginning to completion) 2) satisfy real-time constraints, like requirements for interactive behavior 3) maintain high utilization of the CPU and the other hardware by overlapping the CPU of one process with the I/O of another (this last is more like a heuristic than a goal -- we have a vague notion that doing this is generally a good idea, although it is not necessarily something users care about) 4) behave predictably and fairly predictability is good for programmers who need to have a model of program behavior fairness is hard to define, but one aspect of it is that every process should have _some_ access to the CPU; none should "starve" FUNDAMENTAL PROBLEM OF SCHEDULING: Goals are high-level and aggregate, but the scheduler has to make a series of low-level decisions, quickly, with limited global information. Strategies (heuristics) ----------------------- FIFO: first in first out processes are moved from ready into running in the order they arrived, and continue until they complete or block pro: simple, and meets some definition of fairness con: bad average turnaround when short waits for long bad utilization when I/O bound wait for CPU bound no priority mechanism: hard to meet real-time constraints example: a 1 s process and a 10 s process SJF: shortest (time to completion or blocking) first this has some intuitive appeal, if you think about relative costs: a 1 s delay isn't bad for a 30 s process, but it is catastrophic for a 0.1 s process SJF meets a different definition of fairness pro: provably optimal (minimal) average turnaround time for a set of processes all available at t0 con: we generally don't know the duration of each process's next CPU burst, and also the model assumed in the proof does not consider the dependency of successive bursts Round Robin: give each process a little bit of time this is an approximation of SJF in the case where we don't know the durations example: 1 ms and 10 ms, but we don't know which is which, quantum = 1 ms result: almost as good as SJF, and much better than FIFO pathology: can be worse than FIFO example: two 10 ms processes, quantum = 1 ms pro: generally good when there is large variability in duration con: overhead becomes signficant as quantum gets smaller pessimal when jobs are the same length External priorities: users or the the system might attach priorities to different processes example: interactive jobs get highest priority because response time is important and they tend to have short bursts long-running computationally-intensive processes might get lower priority (but maybe longer quanta) pro: satisfies real time requirements con: burden on users/programmers starvation / bidding wars Multi-level feedback queue: use past behavior to predict future behavior two kinds of learning: 1) predict the duration of the next burst based on past bursts 2) classify process based on long-term behavior Tradeoff performance against the expense of keeping track of long-term information. Non-deterministic scheduling: Example: lottery scheduling. Problem: for any deterministic system you can contruct pathological cases that yield bad performance Solution: a little randomness goes a long way Con: unpredictable execution model for programmers Lottery Scheduling paper ------------------------ The goal of reading a technical paper is to get to the point where you can answer the following questions: 1) what is the new idea/contribution of this paper? 2) what did the authors do to evaluate their idea? 3) what are the results of the evaluation? 4) are the conclusions supported by the evidence? The first two are usually easy to extract quickly. The third requires more careful reading. The last requires _very_ careful reading, and sometimes a lot of background. So we will start with the first three! Read the following questions, then read the Lottery Scheduling paper and write answers. 1) Make several passes through the paper. 2) Do not read the entire paper. It is not all equally important! 3) Budget a reasonable amount of time for this exercise and then stop. What is Lottery Scheduling? What other kind of scheduling is most similar, and how is Lottery Scheduling different? Name three features or advantages of Lottery Scheduling. What are ticket transfers, and why might they be useful? Anything interesting about the implementation? What experiments do the authors perform to test their implementation? For each experiment, what are the authors trying to prove? (hint: the fastest way to answer this question is often to look at the figures and their captions) Do the authors acknowledge any limitations or disadvantages of their technique? Can you think of any?