Software Systems
Fall 2006

For today, you should have:

1) worked on your project -- progress report due today

2) read Chapter 23 of the Cow Book

3) read the handout from Raskin, "The Humane Interface"

4) optional: read the next section of LBoS and do the H2O problem


Outline:

1) Raskin and Archy

2) homework 5 debrief

3) project meeting time

4) practice exam?


For next time you should:

1) prepare for the exam

2) read the handout from Database System Implementation and
   answer the questions below.


Homework 5
----------

In OOP it is common to have multiple implementations of an
interface.

1) One file specifies the interface.

2) Any number of clients can be written using only the
   public parts of the interface.

3) Any number of implementations can be written.

When do you specify which impementation you actually want to use?

1) Compile time: in Java, for example, you cannot instantiate an
   object without specifying which implementation you want.

2) Link time: alternatively, we can write client code that is
   completely implementation independent, and provide the
   implementation at link time or run time.

In my solution to Homework 5, I demonstrate a way to do (2) in C.


Step 1
------

First, I put the interface definition in lock.h

typedef struct lock Lock;

Lock *make_lock ();
void lock_acquire (Lock *lock);
void lock_release (Lock *lock);

Notice that the type definition only says that Lock is a structure;
it doesn't say what's in it.

This is an anonymous structure.  The client code is not allowed to
access the elements.

Step 2
------

Then I put an implementation in lock.c:

#include "lock.h"

struct lock {
  int value;    % here is the actual specification of the lock structure
};

Lock *make_lock ()
{
  Lock *lock = (Lock *) malloc (sizeof(Lock));
  lock->value = 0;
  return lock;
}

void lock_acquire (Lock *lock)
{
  while (lock->value == 1);
  lock->value = 1;
}

void lock_release (Lock *lock)
{
  lock->value = 0;
}


Step 3
------

And I can put another implementation in pmutex.c:

#include "lock.h"

struct lock {
  pthread_mutex_t mutex[1];    % a completely different structure
};

Lock *make_lock ()
{
  Lock *lock = (Lock *) malloc (sizeof(Lock));
  pthread_mutex_init (lock->mutex, NULL);
  return lock;
}

void lock_acquire (Lock *lock)
{
  pthread_mutex_lock (lock->mutex);
}

void lock_release (Lock *lock)
{
  pthread_mutex_unlock (lock->mutex);
}

Notice that the structure definition contains an array with
a single pthread_mutex_t in it.

This is a quick and dirty way of allocating space and getting
a pointer to it.

We saw another example in get_seconds:

double get_seconds () {
  struct timeval tv[1];

  gettimeofday (tv, NULL);
  return tv->tv_sec + tv->tv_usec / 1e6;
}


Step 4
------

Then I can write the client code using _only_ the interface:

#include "lock.h"

typedef struct {
  int next_id;
  int array[SIZE];
  Lock *lock;
} Environment;

Environment *make_environment ()
{
  int i;

  Environment *env = (Environment *) malloc (sizeof (Environment));
  if (env == NULL) {
    perror ("malloc failed");
    exit (-1);
  }
  env->next_id = 0;

  for (i=0; i<SIZE; i++) {
    env->array[i] = 0;
  }

  env->lock = make_lock ();
  return env;
}

void loop_and_count (pthread_t self, Environment *env)
{
  int id;

  while (1) {
    lock_acquire (env->lock);
    id = get_next_id (env);
    lock_release (env->lock);

    // printf ("%d got %d\n", self, id);

    if (id >= SIZE) return;
    env->array[id]++;
  }
}

Step 5
------

Then I can specify at link time which implementation I want.
Here's the Makefile:

array1:
	gcc -g array.c lock.c -o array1 -lpthread

array2:
	gcc -g array.c lock.s -o array2 -lpthread

array3:
	gcc -g array.c pmutex.c -o array3 -lpthread

array4:
	gcc -g array.c semlock.c semaphore.c cond.c -o array4 -lpthread


The last one uses 3 files:

cond.c:       a wrapper for pthread_cond_t

semaphore.c:  an implementation of semaphores using pthread condition
	      variables and mutexes

semlock.c:    an adapter that implements the lock.h interface using
              a semaphore.


See the handout from LBoS for details about my implementation of
semaphores.

Puzzle: what are the two problems with this implementation of
semaphores?

void semaphore_wait (Semaphore *semaphore)
{
  acquire (semaphore->lock);
  semaphore->value--;
  if (semaphore->value < 0) {
    cond_wait (semaphore->cond, semaphore->lock);
  }
  release (semaphore->lock);
}

void semaphore_signal (Semaphore *semaphore)
{
  acquire (semaphore->lock);
  semaphore->value++;
  if (semaphore->value <= 0) {
    cond_signal (semaphore->cond);
  }
  release (semaphore->lock);
}


What is a spurious wakeup?  
See http://en.wikipedia.org/wiki/Spurious_wakeup


Reading Questions
-----------------

Database System Implementation, pp 40-62

http://wb/ss/handouts/garcia00database.pdf

There are more page in this reading than usual, but at least
on the first pass, you can skip the running Example and just
read the main text.

On a second pass, you might want to read the examples to test
your understanding.

There is a little database-specific vocabulary, so here are
some definitions that might help.

A "relation" is like a dictionary or hash table that maps keys
  to values.  The keys are often unique identifiers of some
  kind, the values are often records.

A "tuple" is a value stored in a relation, also called a record.

For example, in Software Design we sometimes do a homework that
involves processing a database on the reproductive histories of
a large number of respondents.

The database contains two relations, one for respondents and one for
intervals.  Each relation mapped a unique identifier to a tuple, or
record, that stored all the information about a given respondent or
interval.


1) What is the primary performance assumption that underlies
   the design of database algorithms?


2) What work is done during the first phase of a multiway
   merge sort?


3) What work is done during the second phase?


4) During the second phase, why aren't the sublists merged pairwise?


5) Page 47 makes it sound like the two phases take the same amount
   of time, but in practice, which one do you think takes longer?


6) Page 48 makes an argument that two phases is enough to handle the
   biggest conceivable database.  Do you buy it?


The bulleted list on page 51 is an outline of the next few sections.
Come back to this list at the end to make sure you understand each
point.

7) In Section 2.4.2, why is it important that the disks have
   independent heads?


8) In Section 2.4.3, would it make any sense to have two copies
   of the same data on the same disk?


9) What is the primary advantage of the elevator algorithm over
   FIFO?


10) What is the primary drawback of the elevator algorithm?