Welcome to CS166, a course in the design, analysis, and implementation of data structures. We've got an exciting quarter ahead of us - the data structures we'll investigate are some of the most beautiful constructs I've ever come across - and I hope you're able to join us.
CS166 has two prerequisites - CS107 and CS161. From CS107, we'll assume that you're comfortable working from the command-line; designing, testing, and debugging nontrivial programs; manipulating pointers and arrays; using bitwise operators; and reasoning about the memory hierarchy. From CS161, we'll assume you're comfortable designing and analyzing nontrivial algorithms; using O, o, Θ, Ω, and ω notation; solving recurrences; working through standard graph and sequence algorithms; and structuring proofs of correctness.
We'll update this site with more information as we get closer to the start of the quarter. In the meantime, feel free to email me at htiek@cs.stanford.edu if you have any questions about the class!
This syllabus is still under construction and is subject to change as we fine-tune the course. Stay tuned for more information and updates!
The lecture schedule below runs most-recent-week first; within each week, Tuesday's lecture is listed before Thursday's.
Tuesday, May 14: Linear Probing

Linear probing is one of the oldest and simplest strategies for building a hash table. It's easy to implement and, for a number of reasons, is extremely fast in practice. The classical analysis of linear probing is due to Knuth and assumes truly random hash functions. But what happens if we relax that assumption and say that the hash functions aren't truly random but are, instead, just k-independent for some fixed k? Then the analysis gets a lot more challenging, but also a lot more interesting. In exploring how the math works out, we'll get a richer understanding of how to analyze randomized data structures.

Slides: Readings:
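To make the starting point concrete, here's a minimal sketch of a linear-probing table supporting insert and lookup only (no deletion or resizing, and it assumes the table never fills). LinearProbingMap and its methods are illustrative names, not a standard library type:

```cpp
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch of a linear-probing hash table: insert and lookup
// only, no deletion or resizing, and it assumes the table never fills.
class LinearProbingMap {
public:
    explicit LinearProbingMap(std::size_t capacity) : slots_(capacity) {}

    // Scan forward from the hashed slot until we find an empty slot or
    // the key itself, wrapping around at the end of the table.
    void insert(const std::string& key, int value) {
        std::size_t i = std::hash<std::string>{}(key) % slots_.size();
        while (slots_[i] && slots_[i]->first != key) {
            i = (i + 1) % slots_.size();
        }
        slots_[i] = std::make_pair(key, value);
    }

    // Lookups retrace the same probe sequence; hitting an empty slot
    // means the key was never inserted.
    std::optional<int> find(const std::string& key) const {
        std::size_t i = std::hash<std::string>{}(key) % slots_.size();
        while (slots_[i]) {
            if (slots_[i]->first == key) return slots_[i]->second;
            i = (i + 1) % slots_.size();
        }
        return std::nullopt;
    }

private:
    std::vector<std::optional<std::pair<std::string, int>>> slots_;
};
```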
Tuesday, May 7: Splay Trees

Balanced binary search trees give worst-case O(log n) time per tree operation. If we're trying to guarantee worst-case efficiency, this is as good as it gets. But worst-case efficiency doesn't capture everything: there are many other properties we'd like our data structures to have. Sometimes, we get those properties by directly designing for them. Sometimes, we get those properties by having almost no structural guarantees at all.

Slides: Readings: Handouts:

Thursday, May 9: Frequency Estimators

How can Google keep track of frequent search queries without storing all the queries it gets in memory? How can you estimate frequently-occurring tweets without storing every tweet in RAM? As long as you're willing to trade off accuracy for space, you can get excellent approximations.

Slides: Readings:
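One classic structure in this space is the count-min sketch; the code below is a rough illustration of the accuracy-for-space trade-off, not necessarily the exact structure we'll cover in lecture. CountMinSketch and its methods are illustrative names:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <limits>
#include <string>
#include <vector>

// Rough sketch of a count-min sketch: a small grid of counters whose size
// is independent of the number of distinct items seen.
class CountMinSketch {
public:
    CountMinSketch(std::size_t rows, std::size_t cols)
        : counts_(rows, std::vector<std::uint64_t>(cols, 0)) {}

    // Each row hashes the item into one counter and increments it.
    void record(const std::string& item) {
        for (std::size_t r = 0; r < counts_.size(); r++) {
            counts_[r][slot(item, r)]++;
        }
    }

    // Collisions can only inflate counters, so the smallest counter the
    // item touches is the tightest estimate; it never underestimates.
    std::uint64_t estimate(const std::string& item) const {
        std::uint64_t best = std::numeric_limits<std::uint64_t>::max();
        for (std::size_t r = 0; r < counts_.size(); r++) {
            best = std::min(best, counts_[r][slot(item, r)]);
        }
        return best;
    }

private:
    // Derive one hash function per row by tagging the item with the row
    // index; a real sketch would use an independent hash family instead.
    std::size_t slot(const std::string& item, std::size_t row) const {
        std::size_t h = std::hash<std::string>{}(item + '#' + std::to_string(row));
        return h % counts_[row].size();
    }

    std::vector<std::vector<std::uint64_t>> counts_;
};
```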
Tuesday, April 30: Binomial Heaps

Binomial heaps are a simple and flexible priority queue structure that supports efficient melding of priority queues. The intuition behind binomial heaps is particularly elegant, and they'll serve as a building block toward the more complex Fibonacci heap data structure that we'll talk about on Thursday.

Slides: Readings:
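If you'd like a preview of why melding is efficient, here's a rough sketch of the core link-and-carry idea, under the assumption that each heap is stored as a rank-indexed array of tree roots. Node, link, and meld are illustrative names; a full binomial heap would also track parent pointers and support insert, extract-min, and so on:

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

struct Node {
    int key;
    int rank = 0;                 // number of children = order of the tree
    std::vector<Node*> children;
};

// Link two binomial trees of equal rank into one tree of the next rank,
// keeping the heap property: the larger root becomes a child of the smaller.
Node* link(Node* a, Node* b) {
    if (b->key < a->key) std::swap(a, b);
    a->children.push_back(b);
    a->rank++;
    return a;
}

// Meld two heaps, each stored as a rank-indexed array of tree roots
// (slot i holds the heap's tree of rank i, or nullptr). The loop mirrors
// binary addition, with link() playing the role of the carry.
std::vector<Node*> meld(std::vector<Node*> h1, std::vector<Node*> h2) {
    std::size_t n = std::max(h1.size(), h2.size()) + 1;
    h1.resize(n, nullptr);
    h2.resize(n, nullptr);
    std::vector<Node*> out(n, nullptr);
    Node* carry = nullptr;
    for (std::size_t i = 0; i < n; i++) {
        Node* trees[] = {h1[i], h2[i], carry};
        carry = nullptr;
        Node* cur = nullptr;
        for (Node* t : trees) {
            if (!t) continue;
            if (!cur) cur = t;                       // first tree of this rank
            else { carry = link(cur, t); cur = nullptr; }  // two trees: carry
        }
        out[i] = cur;
    }
    return out;
}
```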
Thursday, May 2: Fibonacci Heaps

Fibonacci heaps are a type of priority queue that efficiently supports decrease-key, an operation used as a subroutine in many graph algorithms (Dijkstra's algorithm, Prim's algorithm, the Stoer-Wagner min cut algorithm, etc.). They're formed by a clever transformation of a lazy binomial heap. Although Fibonacci heaps have a reputation for being ferociously complicated, they're a lot less scary than they might seem!

Slides: Readings:
Tuesday, April 23: Balanced Trees, Part II

We've spent a lot of time trying to figure out how to build nice balanced trees. Now that we've got them, what else can we do with them? In this lecture, we'll see two more advanced techniques - tree augmentation and the split/join operations - that will make it possible to build significantly more complex data structures in the future.

Slides: Readings:
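As a taste of tree augmentation, here's a sketch of the classic subtree-size augmentation, which turns a BST into an order-statistic tree. It's shown on a plain (unbalanced) BST for brevity; a balanced tree would additionally update the sizes during rotations:

```cpp
#include <cstddef>

// Each node caches the size of its subtree (the augmentation), which lets
// us answer "what is the k-th smallest key?" in one root-to-leaf walk.
struct Node {
    int key;
    std::size_t size = 1;   // number of nodes in this subtree
    Node* left = nullptr;
    Node* right = nullptr;
};

std::size_t sizeOf(const Node* n) { return n ? n->size : 0; }

// Ordinary BST insertion, plus one line to keep the sizes correct on the
// way back up the insertion path.
Node* insert(Node* root, int key) {
    if (!root) return new Node{key};
    if (key < root->key) root->left = insert(root->left, key);
    else                 root->right = insert(root->right, key);
    root->size = 1 + sizeOf(root->left) + sizeOf(root->right);
    return root;
}

// Returns the k-th smallest key (0-indexed); assumes 0 <= k < tree size.
int kthSmallest(const Node* root, std::size_t k) {
    std::size_t leftSize = sizeOf(root->left);
    if (k < leftSize) return kthSmallest(root->left, k);
    if (k == leftSize) return root->key;
    return kthSmallest(root->right, k - leftSize - 1);
}
```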
Thursday, April 25: Amortized Analysis

In many cases, we only care about the total time required to process a sequence of operations. In those cases, we can design data structures that make some individual operations more expensive in order to lower the aggregate cost of the entire sequence. How do you analyze these structures?

Slides: Readings: Handouts:
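The classic warm-up example is a dynamic array that doubles its capacity when it fills: a single push may copy every element, yet any sequence of n pushes does O(n) total work, so each push is O(1) amortized. A minimal sketch of the idea (not a replacement for std::vector):

```cpp
#include <cstddef>

class DynamicArray {
public:
    DynamicArray() = default;
    DynamicArray(const DynamicArray&) = delete;             // copying omitted
    DynamicArray& operator=(const DynamicArray&) = delete;  // for brevity
    ~DynamicArray() { delete[] data_; }

    void push(int x) {
        if (size_ == capacity_) grow();  // the occasional expensive step
        data_[size_++] = x;
    }

    int operator[](std::size_t i) const { return data_[i]; }
    std::size_t size() const { return size_; }

private:
    // Doubling (rather than growing by a constant) is what makes the total
    // copying cost across all grow() calls telescope to O(n) overall.
    void grow() {
        std::size_t newCap = capacity_ == 0 ? 1 : capacity_ * 2;
        int* bigger = new int[newCap];
        for (std::size_t i = 0; i < size_; i++) bigger[i] = data_[i];
        delete[] data_;
        data_ = bigger;
        capacity_ = newCap;
    }

    int* data_ = nullptr;
    std::size_t size_ = 0;
    std::size_t capacity_ = 0;
};
```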
Tuesday, April 16: Constructing Suffix Arrays

Suffix trees and suffix arrays are amazing structures, but they'd be much less useful if it weren't possible to construct them quickly. Fortunately, there are some great techniques for building suffix arrays and suffix trees. By using the fact that suffixes overlap and simulating what a multiway merge algorithm would do in certain circumstances, we can rapidly build these beautiful structures.

Slides: Readings: Handouts:
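For contrast with those fast constructions, here's the deliberately naive baseline: sort the suffix start positions by comparing suffixes directly. It shows what object is being built, but ignores the overlap between suffixes that the fast algorithms exploit:

```cpp
#include <algorithm>
#include <numeric>
#include <string>
#include <vector>

// Naive suffix array construction: O(n^2 log n) in the worst case, since
// each of the O(n log n) comparisons may scan O(n) characters.
std::vector<int> buildSuffixArray(const std::string& text) {
    std::vector<int> sa(text.size());
    std::iota(sa.begin(), sa.end(), 0);  // suffix starting positions 0..n-1
    std::sort(sa.begin(), sa.end(), [&](int a, int b) {
        return text.substr(a) < text.substr(b);  // compare the two suffixes
    });
    return sa;
}
```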
Thursday, April 18: Balanced Trees, Part I

Balanced search trees are among the most versatile and flexible data structures. They're used extensively in theory and in practice. What sorts of balanced trees exist? How would you design them? And what can you do with them?

Slides: Readings:
Tuesday, April 9: Suffix Trees

To kick off our discussion of string data structures, we'll be exploring tries, Patricia tries, and, most importantly, suffix trees. These data structures provide fast solutions to a number of algorithmic problems and are much more versatile than they might initially seem. What makes them so useful? What properties of strings do they capture? And what intuitions can we build from them?

Slides: Handouts:

Thursday, April 11: Suffix Arrays

What makes suffix trees so useful as a data structure? Surprisingly, much of their utility and flexibility can be attributed purely to two facts: they keep the suffixes sorted, and they expose the branching words in the string. By representing this information in a different way, we can get much of the benefit of suffix trees without the huge space cost.

Slides: Readings:
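As one example of what sortedness buys you: every suffix starting with a pattern P occupies a contiguous block of the suffix array, so binary search finds all occurrences in O(|P| log n) time. A sketch, reusing the buildSuffixArray baseline above (occurrences is an illustrative name):

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Find every position where `pattern` occurs in `text`, given the suffix
// array `sa` of `text`. Results come back in suffix-array (lexicographic)
// order, not left-to-right order.
std::vector<int> occurrences(const std::string& text,
                             const std::vector<int>& sa,
                             const std::string& pattern) {
    // Compare only the first |pattern| characters of each suffix, so a
    // suffix counts as "equal" exactly when it starts with the pattern.
    auto lo = std::lower_bound(sa.begin(), sa.end(), pattern,
        [&](int suffix, const std::string& p) {
            return text.compare(suffix, p.size(), p) < 0;
        });
    auto hi = std::upper_bound(sa.begin(), sa.end(), pattern,
        [&](const std::string& p, int suffix) {
            return text.compare(suffix, p.size(), p) > 0;
        });
    return std::vector<int>(lo, hi);
}
```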
Tuesday, April 2: Range Minimum Queries, Part One

The range minimum query problem is the following: given an array, preprocess it so that you can efficiently determine the smallest value in a variety of subranges. RMQ has tons of applications throughout computer science and is an excellent proving ground for a number of advanced algorithmic techniques.

Slides: Readings:

Thursday, April 4: Range Minimum Queries, Part Two

Our last lecture took us very, very close to a ⟨O(n), O(1)⟩-time solution to RMQ. Using a new data structure called a Cartesian tree in conjunction with a technique called the Method of Four Russians, we can adapt our approach to end up with a linear-preprocessing-time, constant-query-time solution to RMQ. In doing so, we'll see a number of clever techniques that will appear time and time again in data structure design.

Slides: Readings:
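For a taste of the preprocessing/query trade-off, here's a sketch of a sparse table, one standard stepping stone toward the ⟨O(n), O(1)⟩ solution (the lectures may use a variant): precompute the minimum of every range whose length is a power of two, then answer any query as the min of two overlapping power-of-two ranges. SparseTable is an illustrative name:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// ⟨O(n log n), O(1)⟩ RMQ: table_[k][i] holds the min of a[i..i + 2^k - 1].
class SparseTable {
public:
    explicit SparseTable(const std::vector<int>& a) {
        std::size_t n = a.size();
        table_.push_back(a);  // level 0: ranges of length 1
        for (std::size_t k = 1; (std::size_t(1) << k) <= n; k++) {
            std::size_t len = std::size_t(1) << k;
            std::vector<int> level(n - len + 1);
            for (std::size_t i = 0; i + len <= n; i++) {
                // A length-2^k range is the union of two length-2^(k-1) ranges.
                level[i] = std::min(table_[k - 1][i], table_[k - 1][i + len / 2]);
            }
            table_.push_back(std::move(level));
        }
    }

    // Minimum of a[lo..hi], inclusive; assumes lo <= hi < n.
    int query(std::size_t lo, std::size_t hi) const {
        std::size_t k = floorLog2(hi - lo + 1);
        std::size_t len = std::size_t(1) << k;
        // Two length-2^k ranges that together cover [lo, hi]; they overlap
        // in the middle, and taking a min twice is harmless.
        return std::min(table_[k][lo], table_[k][hi + 1 - len]);
    }

private:
    static std::size_t floorLog2(std::size_t x) {
        std::size_t k = 0;
        while (x > 1) { x >>= 1; k++; }
        return k;
    }

    std::vector<std::vector<int>> table_;
};
```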