CS151 lecture notes, Fall 1999 Week 13, Wednesday Mergesort --------- Simple mergesort 1) split the deck in half 2) sort the halves 3) merge the halves Is this faster than the traversal sorting we did before? We'll analyze it next time and see. Recursive mergesort 0) if the deck contains a single card, return it 1) split the deck in half 2) sort the halves (using mergesort) 3) merge the halves Is this faster still? We'll see. The hardest part of implementing mergesort is merge. Make sure you test merge rigorously before you write the recursive version, or you will be unhappy. Analysis of algorithms ---------------------- Analysis of algorithms is an area of computer science on the border between CS and mathematics. It uses many of the tools of mathematics 1) abstract algebra 2) proof techniques But it has a very practical aspect -- the design and analysis of fast algorithms. By now you should have an idea what an algorithm is. Analysis of bisection search ---------------------------- How many comparisons do you have to do in a linear search? One for each element of the array. If array size is n, there are n comparisons. How many comparisons do you have to do in a bisection search? (Don't forget that the array has to be sorted. Can we prove that you can't search an unsorted array in fewer than n comparisons?) One each time we divide the array in half. How many times can we divide the array (size n) before we get to one? How many times do we have to double 1 before we get to n? What power of 2 is n? What mathematical function answers that question? Logarithm! The amount of work in a bisection search is proportional to the logarithm of the number of elements in the (sorted) array. What is the base of the logarithm? In this case, 2, but in general it doesn't matter much. We are more interested in the order of growth than the exact run time. Order of Growth --------------- Often it is useful to characterize how a function behaves as n gets large. Remember what n is -- it's the number of things getting searched, or sorted, or whatever. It's the "problem size". We are mostly interested in large values of n because when n is small, any algorithm will be fast enough. As n gets big, the choice of algorithm becomes more important. How different are n and log n? n log n - ----- 10 1 100 2 1000 3 10000 4 Clearly n is getting big much faster than log n. What if the log n term is multiplied by a constant factor? Very often, it doesn't matter, because there will still be a value of n where (1000 log n) is less than n, no matter what the leading constant is. Analysis of Mergesort --------------------- How many comparisons do you have to do using the simple sort algorithm? Have to traverse the array (n iterations). At each iteration we have to find the largest among n-i elements (where i is the iteration number). Total comparisons = n + (n-1) + (n-2) + ... + 1 Draw a triangle diagram to motivate the solution n/2 (n-1) The leading order term is n^2, which means that as n gets big, the behavior of this curve is like n^2. How many comparisons do you have to do in a mergesort? At each level of recursion, the number of comparisons is n-1. How many levels of recursion are there?