Software Systems Fall 2006 For today, you should have: 1) prepared a project progress report 2) prepared for a quiz 3) read the second half of Chapter 18 of the Cow book 4) started Homework 5 with the partner of your choice (due 6 November) 5) read Tanenbaum pages 379-399 6) read LBoS and done the Dining Savages problem Today: 1) quiz 2) border tags (leftover from notes15.txt) 3) homework 4 debrief 4) homework 5 early returns 5) file system performance For next time you should: 1) read from Tanenbaum (questions below) 2) finish Homework 5 3) work on your project 4) read this article on the convergence of web apps and desktop apps: http://www.readwriteweb.com/archives/elephants_and_evolution.php Homework 4 ---------- Format: follow the instructions from Homework 3. In general, 1) Emulate the style of the technical papers we have been reading. 2) Write for a target audience that has taken this class but has not read this particular assignment. A graph is worth a thousand numbers. Tables are best when it is important to see absolute values (as opposed to relative ones) with more than one digit of precision. Otherwise graphs are best for seeing 1) trends 2) comparisons 3) large datasets. Regarding warmup, it is relatively easy to deal with (how?) So it is much better to deal with it than to use it as an all-purpose explanation for any anomaly. Many groups reported greater discrepancies for large rho. Observed stats are noisier for large rho, so it is important to make multiple runs and either 1) average multiple runs (but don't forget to be clear about what you are averaging over -- the notation can help) 2) make a scatter plot of the data. A good graphical idiom is to use lines for analytic predictions and points for measured (or simulated) values. Also, get the data off the axes! Distinguish noisy errors from consistent ones. On Experiment 4, the problem statement was underspecified, which allowed me to do some meta-analysis. 1) in choosing the parameters of the lognormal distribution, some of you matched one moment (the mean). In this case the variance was usually higher. And the result was the L and W were often much higher, especially for large rho. 2) others matched two moments (mean and variance) In this case, the analytic predictions were still pretty good. These results indicate two things: 1) the M/M/1 model is reasonably robust in the sense that it can tolerate some deviation from the assumptions 2) W and L are sensitive to variance in service times. Why? The M/M/1 model can be extended to deal with a general distribution of service times, in which case it is called M/G/1 (see Chapter 9 of Taylor and Karlin, equations 9.35 and 9.36) Given the known mean and std dev of service times, nu and tau: mu = 1/nu rho = lambda * nu L = rho + (lambda^2 tau^2 + rho^2) / (2 (1-rho)) W = nu + (lamdba (tau^2 + nu^2)) / (2 (1-rho)) So for fixed (lambda, nu), W and L increase linearly with tau^2! (Or quadratically with std dev or CV) I/O performance --------------- wget http://wb/ss/code/mmap.tgz tar -xzf mmap.tgz cd mmap Edit mycat.c and change BUFFSIZE as instructed in class. Read the code an make sure you understand it. make time ./mycat < biographies.data > temp.txt Run the previous line a couple of times and see what you see. What is the hardware path of the bytes being copied? (review the system diagram from way back when) How many times are the bytes copied? What is the achieved bandwidth of the bytes along this path? Where do you think the bottleneck is in this path? Now read mcopy.c and the handout from Stevens (stevens93mmap.pdf) Then run time ./mcopy biographies.data temp.txt How does the runtime compare? Which program is faster, and why? Finally, run time cp biographies.data temp.txt How do you think cp works? A couple of C notes: 1) C defines a bunch of system types, like caddr_t that are meant to allow a little bit of abstraction and little bit of portability. 2) Flags are often specified by #defined constants that you combine with the logical OR operator. From /usr/include/bits/mman.h... /* Protections are chosen from these bits, OR'd together. The implementation does not necessarily support PROT_EXEC or PROT_WRITE without PROT_READ. The only guarantees are that no writing will be allowed without PROT_WRITE and no access will be allowed for PROT_NONE. */ #define PROT_READ 0x1 /* Page can be read. */ #define PROT_WRITE 0x2 /* Page can be written. */ #define PROT_EXEC 0x4 /* Page can be executed. */ #define PROT_NONE 0x0 /* Page can not be accessed. */ Reading ------- Read Tanenbaum pages 399-428. 1) What are the biggest advantage and biggest drawback of contiguous allocation? 2) What are the biggest advantage and biggest drawback of linked allocation? When you get to page 404 (inodes), you should jump ahead and read pages 445--447 on UNIX inodes. 4) If a block is 4KB and a block address is 4B, what is the biggest file we can index using only a single indirect block? And now back to page 405... The diagrams on page 406 are really more detail than we need. When you get to page 408, you should also read the man page for ln, which creates links. Don't get bogged down in Tanenbaum's discussion of different kinds of linking. 5) What is the take-home message of Figure 6-20? 6) What is the median file size? 7) Name one advantage of a free list over a bitmap, and vice versa. Skip pages 415--423. Read Section 6.3.7 8) Why is LRU a potentially dangerous algorithm for a block cache (which I will call a disk cache)? 9) Why would we expect block read-ahead to be a good idea? Skim Section 6.3.8. We will be reading about LFS soon. Dining savages -------------- In a previous edition of LBoS, I had the following solution: # COOK emptyPot.wait() putServingsInPot(M) fullPot.signal() # SAVAGES mutex.wait() if servings == 0: emptyPot.signal() fullPot.wait() servings = M servings -= 1 mutex.signal() getServingFromPot() eat() The difference is that getServingFromPot() was outside the mutex. But I got an email that convinced me that there is a problem here. What is it?