Software Systems Fall 2006 For today, you should have: 1) worked on your project -- progress report due today 2) read Chapter 23 of the Cow Book 3) read the handout from Raskin, "The Humane Interface" 4) optional: read the next section of LBoS and do the H2O problem Outline: 1) Raskin and Archy 2) homework 5 debrief 3) project meeting time 4) practice exam? For next time you should: 1) prepare for the exam 2) read the handout from Database System Implementation and answer the questions below. Homework 5 ---------- In OOP it is common to have multiple implementations of an interface. 1) One file specifies the interface. 2) Any number of clients can be written using only the public parts of the interface. 3) Any number of implementations can be written. When do you specify which impementation you actually want to use? 1) Compile time: in Java, for example, you cannot instantiate an object without specifying which implementation you want. 2) Link time: alternatively, we can write client code that is completely implementation independent, and provide the implementation at link time or run time. In my solution to Homework 5, I demonstrate a way to do (2) in C. Step 1 ------ First, I put the interface definition in lock.h typedef struct lock Lock; Lock *make_lock (); void lock_acquire (Lock *lock); void lock_release (Lock *lock); Notice that the type definition only says that Lock is a structure; it doesn't say what's in it. This is an anonymous structure. The client code is not allowed to access the elements. Step 2 ------ Then I put an implementation in lock.c: #include "lock.h" struct lock { int value; % here is the actual specification of the lock structure }; Lock *make_lock () { Lock *lock = (Lock *) malloc (sizeof(Lock)); lock->value = 0; return lock; } void lock_acquire (Lock *lock) { while (lock->value == 1); lock->value = 1; } void lock_release (Lock *lock) { lock->value = 0; } Step 3 ------ And I can put another implementation in pmutex.c: #include "lock.h" struct lock { pthread_mutex_t mutex[1]; % a completely different structure }; Lock *make_lock () { Lock *lock = (Lock *) malloc (sizeof(Lock)); pthread_mutex_init (lock->mutex, NULL); return lock; } void lock_acquire (Lock *lock) { pthread_mutex_lock (lock->mutex); } void lock_release (Lock *lock) { pthread_mutex_unlock (lock->mutex); } Notice that the structure definition contains an array with a single pthread_mutex_t in it. This is a quick and dirty way of allocating space and getting a pointer to it. We saw another example in get_seconds: double get_seconds () { struct timeval tv[1]; gettimeofday (tv, NULL); return tv->tv_sec + tv->tv_usec / 1e6; } Step 4 ------ Then I can write the client code using _only_ the interface: #include "lock.h" typedef struct { int next_id; int array[SIZE]; Lock *lock; } Environment; Environment *make_environment () { int i; Environment *env = (Environment *) malloc (sizeof (Environment)); if (env == NULL) { perror ("malloc failed"); exit (-1); } env->next_id = 0; for (i=0; iarray[i] = 0; } env->lock = make_lock (); return env; } void loop_and_count (pthread_t self, Environment *env) { int id; while (1) { lock_acquire (env->lock); id = get_next_id (env); lock_release (env->lock); // printf ("%d got %d\n", self, id); if (id >= SIZE) return; env->array[id]++; } } Step 5 ------ Then I can specify at link time which implementation I want. Here's the Makefile: array1: gcc -g array.c lock.c -o array1 -lpthread array2: gcc -g array.c lock.s -o array2 -lpthread array3: gcc -g array.c pmutex.c -o array3 -lpthread array4: gcc -g array.c semlock.c semaphore.c cond.c -o array4 -lpthread The last one uses 3 files: cond.c: a wrapper for pthread_cond_t semaphore.c: an implementation of semaphores using pthread condition variables and mutexes semlock.c: an adapter that implements the lock.h interface using a semaphore. See the handout from LBoS for details about my implementation of semaphores. Puzzle: what are the two problems with this implementation of semaphores? void semaphore_wait (Semaphore *semaphore) { acquire (semaphore->lock); semaphore->value--; if (semaphore->value < 0) { cond_wait (semaphore->cond, semaphore->lock); } release (semaphore->lock); } void semaphore_signal (Semaphore *semaphore) { acquire (semaphore->lock); semaphore->value++; if (semaphore->value <= 0) { cond_signal (semaphore->cond); } release (semaphore->lock); } What is a spurious wakeup? See http://en.wikipedia.org/wiki/Spurious_wakeup Reading Questions ----------------- Database System Implementation, pp 40-62 http://wb/ss/handouts/garcia00database.pdf There are more page in this reading than usual, but at least on the first pass, you can skip the running Example and just read the main text. On a second pass, you might want to read the examples to test your understanding. There is a little database-specific vocabulary, so here are some definitions that might help. A "relation" is like a dictionary or hash table that maps keys to values. The keys are often unique identifiers of some kind, the values are often records. A "tuple" is a value stored in a relation, also called a record. For example, in Software Design we sometimes do a homework that involves processing a database on the reproductive histories of a large number of respondents. The database contains two relations, one for respondents and one for intervals. Each relation mapped a unique identifier to a tuple, or record, that stored all the information about a given respondent or interval. 1) What is the primary performance assumption that underlies the design of database algorithms? 2) What work is done during the first phase of a multiway merge sort? 3) What work is done during the second phase? 4) During the second phase, why aren't the sublists merged pairwise? 5) Page 47 makes it sound like the two phases take the same amount of time, but in practice, which one do you think takes longer? 6) Page 48 makes an argument that two phases is enough to handle the biggest conceivable database. Do you buy it? The bulleted list on page 51 is an outline of the next few sections. Come back to this list at the end to make sure you understand each point. 7) In Section 2.4.2, why is it important that the disks have independent heads? 8) In Section 2.4.3, would it make any sense to have two copies of the same data on the same disk? 9) What is the primary advantage of the elevator algorithm over FIFO? 10) What is the primary drawback of the elevator algorithm?