Software Systems
Fall 2006

For today, you should have:

1) prepared a project progress report

2) prepared for a quiz

3) read the second half of Chapter 18 of the Cow book

4) started Homework 5 with the partner of your choice (due 6 November)

5) read Tanenbaum pages 379-399

6) read LBoS and done the Dining Savages problem


Today:

1) quiz

2) border tags (leftover from notes15.txt)

3) homework 4 debrief

4) homework 5 early returns

5) file system performance


For next time you should:

1) read from Tanenbaum (questions below) 

2) finish Homework 5

3) work on your project

4) read this article on the convergence of web apps and 
   desktop apps:

http://www.readwriteweb.com/archives/elephants_and_evolution.php


Homework 4
----------

Format: follow the instructions from Homework 3.

In general,

1) Emulate the style of the technical papers we have been reading.

2) Write for a target audience that has taken this class but has
   not read this particular assignment.

A graph is worth a thousand numbers.

Tables are best when it is important to see absolute values (as
opposed to relative ones) with more than one digit of precision.

Otherwise graphs are best for seeing

1) trends

2) comparisons

3) large datasets.

Regarding warmup, it is relatively easy to deal with (how?)

So it is much better to deal with it than to use it as an
all-purpose explanation for any anomaly.

Many groups reported greater discrepancies for large rho.

Observed stats are noisier for large rho, so it is important to
make multiple runs and either 

1) average multiple runs (but don't forget to be clear about
what you are averaging over -- the notation can help)

2) make a scatter plot of the data.

A good graphical idiom is to use lines for analytic predictions
and points for measured (or simulated) values.

Also, get the data off the axes!

Distinguish noisy errors from consistent ones.

On Experiment 4, the problem statement was underspecified, which
allowed me to do some meta-analysis.

1) in choosing the parameters of the lognormal distribution, some
   of you matched one moment (the mean).

   In this case the variance was usually higher.

   And the result was the L and W were often much higher, especially
   for large rho.

2) others matched two moments (mean and variance)

   In this case, the analytic predictions were still pretty good.

These results indicate two things:

1) the M/M/1 model is reasonably robust in the sense that it can
   tolerate some deviation from the assumptions

2) W and L are sensitive to variance in service times.  Why?

The M/M/1 model can be extended to deal with a general distribution
of service times, in which case it is called M/G/1

(see Chapter 9 of Taylor and Karlin, equations 9.35 and 9.36)

Given the known mean and std dev of service times, nu and tau:

mu = 1/nu

rho = lambda * nu

L = rho + (lambda^2 tau^2 + rho^2) / (2 (1-rho))

W = nu + (lamdba (tau^2 + nu^2)) / (2 (1-rho))

So for fixed (lambda, nu), W and L increase linearly with tau^2!

(Or quadratically with std dev or CV)


I/O performance
---------------

wget http://wb/ss/code/mmap.tgz
tar -xzf mmap.tgz
cd mmap


Edit mycat.c and change BUFFSIZE as instructed in class.

Read the code an make sure you understand it.

make
time ./mycat < biographies.data > temp.txt

Run the previous line a couple of times and see what you see.

What is the hardware path of the bytes being copied?

     (review the system diagram from way back when)

How many times are the bytes copied?

What is the achieved bandwidth of the bytes along this path?

Where do you think the bottleneck is in this path?


Now read mcopy.c and the handout from Stevens (stevens93mmap.pdf)

Then run

time ./mcopy biographies.data temp.txt

How does the runtime compare?  Which program is faster, and why?


Finally, run

time cp biographies.data temp.txt

How do you think cp works?


A couple of C notes:

1) C defines a bunch of system types, like caddr_t that are meant
   to allow a little bit of abstraction and little bit of portability.

2) Flags are often specified by #defined constants that you combine
   with the logical OR operator.

From /usr/include/bits/mman.h...

/* Protections are chosen from these bits, OR'd together.  The
   implementation does not necessarily support PROT_EXEC or PROT_WRITE
   without PROT_READ.  The only guarantees are that no writing will be
   allowed without PROT_WRITE and no access will be allowed for PROT_NONE. */

#define PROT_READ	0x1		/* Page can be read.  */
#define PROT_WRITE	0x2		/* Page can be written.  */
#define PROT_EXEC	0x4		/* Page can be executed.  */
#define PROT_NONE	0x0		/* Page can not be accessed.  */


Reading
-------

Read Tanenbaum pages 399-428.

1) What are the biggest advantage and biggest drawback of contiguous
   allocation?


2) What are the biggest advantage and biggest drawback of linked
   allocation?


When you get to page 404 (inodes), you should jump ahead and read
pages 445--447 on UNIX inodes.


4) If a block is 4KB and a block address is 4B, what is the biggest
file we can index using only a single indirect block?


And now back to page 405...

The diagrams on page 406 are really more detail than we need.

When you get to page 408, you should also read the man page for ln,
which creates links.

Don't get bogged down in Tanenbaum's discussion of different kinds
of linking.

5) What is the take-home message of Figure 6-20?


6) What is the median file size?


7) Name one advantage of a free list over a bitmap, and vice versa.


Skip pages 415--423.  Read Section 6.3.7


8) Why is LRU a potentially dangerous algorithm for a block cache
   (which I will call a disk cache)?


9) Why would we expect block read-ahead to be a good idea?


Skim Section 6.3.8.  We will be reading about LFS soon.


Dining savages
--------------

In a previous edition of LBoS, I had the following solution:

# COOK

    emptyPot.wait()
    putServingsInPot(M)
    fullPot.signal()

# SAVAGES

    mutex.wait()
        if servings == 0:
            emptyPot.signal()
            fullPot.wait()
            servings = M
        servings -= 1
    mutex.signal()

    getServingFromPot()
    eat()

The difference is that getServingFromPot() was outside the mutex.

But I got an email that convinced me that there is a problem
here.  What is it?