EECS 4101/5101, Winter 2019
EECS 4101/5101: Advanced Data Structures
Winter 2019
Web page contents:
General Information
Announcements
Important Dates
Resources
Reading
Course Handouts
General Information
Instructor: Eric Ruppert
Office: Room 1003H of the Lassonde Building (enter through Lassonde 1012 hallway)
Telephone: (416) 7362100 ext. 33993
Lectures: Tuesdays and Thursdays, 11:3013:00 in room 0005 of the Dahdaleh Building
Email: [my last name]@cse.yorku.ca (Please use a York email account when sending me email, and start your subject line with "[4101]" or "[5101]".)
Office Hours: Tuesdays 14:0015:00, Thursdays 15:0016:00.
Learning Outcomes
In this course, you will be invited to develop your ability to think clearly and carefully about data structures and the algorithms that operate on them, and to improve your skills in expressing those thoughts about data structures in a precise way.
Data structures are a crucial component of most computer applications. Understanding them is a key ingredient for writing correct and efficient computer programmes.
By the end of this course, you will be able to do the following things.

Describe important classes of data structures (dictionaries, priority queues, disjoint set union) and explain how they work.
Before trying to design your own data structures, it is important to be aware of a good collection of classical ones. They are useful in their own right, but the ideas that underlie them are also useful for new data structures.

Augment data structures to expand their functionality.
If one of the standard data structures doesn't work for the task at hand, it is useful to be able to tweak an existing one to fit your purpose.

Analyze the efficiency of data structures, including the use of amortized analysis and online competitive analysis.
In many cases, there are multiple data structures that could be used to solve a problem. In some cases, it is important to be able to analyze the different options and choose the one that will solve the problem most efficiently. Amortized and competitive analysis are two special types of analysis that are useful in some settings.

Demonstrate knowledge of important themes in data structure design such as persistence, selfadjustment, concurrency.
This course is not just about going through a catalogue of data structures. Certain ideas can be applied in the construction of many different data structures, and we want to understand those concepts too. Persistence deals with maintaining multiple versions of stored data so that old versions can be accessed when needed. Selfadjustment is used instead of carefully rebalancing data structures to improve the shape of the data structure in response to the sequence of operations performed on it. Concurrent data structures allow multiple threads to access the data simultaneously.

Select or design appropriate data structures to help solve problems for algorithmic applications.
The ultimate goal is to ensure that you can find (or build) the right data structure for whatever task you want to solve. You should also be able to justify your design choices and explain why the algorithms that operate on the data structures are correct.
How to Learn This Material
Much of the material in this course is in the same vein as the material of EECS3101. To develop your understanding of the material, it is important to do more than just read the textbook and attend lectures: you should work through exercises.
You can often learn by struggling with problems. However, if you get too stuck or don't know how to begin, help is available. Talk to your classmates (however; see the notes below about academic honesty regarding discussing assignment problems with others). Go to office hours; the instructor is there to help you! You also learn by making mistakes and getting feedback about those mistakes. Just make sure that you use the feedback to improve your understanding.
Groups of students can learn a lot by explaining their solutions to the suggested exercises from the textbook to one another and critiquing the solutions of others. After all, learning how to explain solutions clearly is one of the goals of this course. Seeing where other students' solutions are unclear to you helps you make your own explanations clearer. Be aware that a problem may have many different correct solutions; just because someone's solution is different from yours doesn't necessarily mean that one of them is wrong.
It takes time to build new skills, so it helps if you work on exercises regularly: don't leave all the work to the days right before a test. Similarly, some of the homework assignments will be difficult to finish if you leave them to the last minute.
Academic Honesty
The key to academic honesty for this course is simply this: Solutions that you submit should be your own work.
Although you may discuss the general approach to solving a homework problem
with other people, you should not discuss the solution in detail.
You must not take any written notes away from such a discussion.
Also, you must list on the cover page of your solutions any
people with whom you have discussed the problems.
The solutions you hand in should be your own work. While
writing them, you may look at the course textbook and your own
lecture notes but no other outside sources.
It is not acceptable to try to find the answer to a homework question on the web, put it in your own words and submit it. You may learn a little by doing this, but you will learn much, much more by working on the problem yourself, and the purpose of this course is to help you learn how to design and analyze data structures on your own.
Furthermore, the web will not be available during your exam (or during your job interview at Google), so you should learn to solve problems yourself, instead of relying on others to do your thinking for you.
As time runs out, students are sometimes tempted to get help from other students on assignments in a way that would violate the preceding policy on academic honesty. DO NOT DO THIS! If you do, I will refer the case to the Dean's Office, which will be very unpleasant for you. The assignments are worth very little, so it is not worth risking a sizable punishment. (Furthermore, I have noticed that the students who cheat on the homework assignments almost always fail the tests and exams, so even if I do not catch you cheating, you will likely fail the course if you do not do your own work on the homework assignments.)
It is important that you look at the departmental
guidelines
on academic honesty.
Marking Scheme
 EECS4101  EECS5101 
Homework exercises  15%  15% 
Test #1  22.5%  15% 
Test #2  22.5%  15% 
Project (EECS5101 only)  0%  20% 
Exam  40%  35% 
It's a very good idea to type your solutions to homework assignments, since it allows you to edit and polish the answers, but handwritten solutions are also acceptable, as long as they are legible. If you want to type your solutions, LaTeX produces elegantly typeset documents, is available for free, and was built to handle even the most complicated mathematical notation. It can take a while to learn how to use it, but once you do, you will probably not want to type documents any other way.
You should make every effort to make your answers as brief as possible, while still being thorough. Brevity requires careful thought and editing. (Pascal once excused himself for writing a long letter, saying that he did not have enough time to write a shorter one.) Students who write copious amounts usually do not know what they want to say, or are saying it in a very disorganized way.
Usually, an answer to a homework question should fit on one sheet of paper. If you are writing much more than that, you probably have not found the best way to solve it. On tests, your answer should usually fit into the space provided for it.
Announcements
 (May 2) You can check your grades on all pieces of work by logging into a departmental unix machine and typing "courseInfo 4101 201819 W". Come see me if you would like to view your exam or pick up assignments.
 (Apr 1) Clarification for Assignment 8: the set up for part (c) is the same as for parts (a) and (b), except for anonymity. Thus, the counter implementation is restricted to only using reads and writes of shared memory.
 (Mar 28) As mentioned in class today, there will be a bonus quiz in class on Tuesday. Any material from the whole course may be included. For the quiz, you may bring one 8.5by11inch sheet of paper with handwritten notes on both sides.
 (Mar 28) For the exam, you will also be allowed one 8.5by11inch sheet of paper with handwritten notes on both sides.
 (Mar 27) The Mar 13 announcement below about the bug in the Delete code has been updated to also recognize that the problem could occur on line 72.
 (Mar 21) Please fill out a course evaluation online at this link. They are available from March 21 to April 4.
 (Mar 21) Typo in solutions to assignment 6. The upper bound is correct, but we could make it tighter: In 4(d), the rightmost column for line 74 should be 2a and 22a (instead of 3a and 32a). This means that we could choose smaller constants a=1 and b=4 in part (e), yielding an amortized bound of 13 per insert in (f).
 (Mar 14) I will have a "review session" on Friday, April 5 in Room 105 of Founders College, starting at 2pm. This optional session is a chance for you to ask questions prior to the exam and listen to other students' questions.
 (Mar 13) Due to popular demand, the deadline for Assignment 6 will be extended to 11:30 on Tuesday, March 19.
 (Mar 13) A student found the bug I inserted into the code for DELETE(k) of Assignment 6. The problem occurs when line 72 or 74 moves key k from current into the current's child next. If this happens, found will no longer point to the node containing k. To repair the bug, we should update found to be next if the key moved in line 72 or 74 is k.
 (Mar 11) Alumni series: Nima Shahbazi will give a talk on March 25 at 1pm in BRG125 on his use of machine learning to win a million dollars. For more details, see this link.
 (Mar 11) Bug contest update: A student has identified a bug in the code for Assignment 6: on line 71, "more than" should be replaced by "at least".
 (Mar 8) There was a (big) mistake in my solution to question 2(a) of Assignment 5 that I handed out yesterday. Revised solutions will be handed out in the next class.
 (Mar 7) I will have an extra office hour on Mar 18 at 13:0014:00 since there is a test on Mar 19. (The office hour on Mar 19 will be cancelled because I have to go to a meeting.)
 (Mar 1) For assignment 5, question 2, intepret "between ymin and ymax" in the inclusive sense: i.e., an object whose y value is equal to ymin or ymax is a candidate to be returned for a QUERY operation.
 (Mar 1) For the test on Mar 19, you are permitted to bring one 8.5by11inch sheet of paper with handwritten notes on one side of the paper.
 (Feb 21) I'm cancelling today's office hour too, because I'm sick this week.
 (Feb 19) Today's office hour is cancelled. Sorry for the short notice.
 (Feb 12) The university is closed on Feb 12 due to the snowstorm. The test that was scheduled for Feb 12 will instead be held in class on Thursday, Feb 14.
 (Feb 9) For assignment 4, question 2, assume that the graph representing the initial highway network is provided in an adjacency list representation. In other words, the nodes are labelled 1..n and for each node i, there is a list E[i] of edges of the form (i,j), sorted by j.
 (Jan 25) For the test on Feb 12, you are permitted to bring one 8.5by11inch sheet of paper with handwritten notes on one side of the paper.
 (Jan 24) The deadline for assignment 3 has been changed to 11:30 am on Thursday, Jan 31.
 (Jan 21) If you are interested in research opportunities this summer, see this link. Some of the deadlines are coming up soon.
 (Jan 10) Typo on assignment 1, question 1(f): "heap" should be "priority queue".
 (Jan 10) For assignment 1, question 1(e), you may assume as a precondition for the {\sc Insert} operation that the item being inserted is not already in the priority queue.
 (Jan 8) There will be no office hour on Thursday, January 17.
 (Jan 8) The date of Test 1 has been moved to February 12 so that you can attend Richard Stallman's colloquium
Important Dates
First class  Thursday, January 3 
Test 1  Thursday, February 14 (revised date) 
Reading week (no classes)  February 1822 
Last date to drop course without receiving a grade  Friday, March 8 
Test 2  Tuesday, March 19 
Last class  Tuesday, April 2 
Last date to withdraw from course (receiving W on transcript)  Wednesday, April 3 
Exam period  April 520 
Resources
Textbook
 Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and
Clifford Stein, Introduction to Algorithms, 3rd edition,
MIT Press & McGrawHill, 2009. For solutions to some of the textbook's exercises and known bugs in the text, see the textbook's web page.
Copies of the text are on reserve at the Steacie Library. You can also access an electronic version of the textbook through the York Library website.
Other References
 Gonzalo Navarro, Compact Data Structures, Cambridge University Press, 2016.
 Peter Brass, Advanced Data Structures, Cambridge University Press, 2008.
 Ronald L. Graham, Donald E. Knuth, and Oren Patashnik, Concrete
Mathematics: A Foundation for Computer Science (2nd edition),
AddisonWesley, 1994.
 Pat Morin, Open data structures.
 Hanan Samet, Foundations of Multidimensional and Metric Data Structures, Elsevier/Morgan Kaufmann, 2006.
 Clifford A. Shaffer, Data Structures and Algorithm Analysis, edition 3.2.0.10, 2013.
 Daniel Solow, How to Read and Do Proofs (6th edition), Wiley, 2013.
Reading
This section will be filled in as the term progresses.
Don't fall behind with your reading.
Date  Topics  Reading in CLRS (3rd ed)  Suggested questions, mostly from CLRS (3rd ed) 
January 3 
Introduction 
partial notes 

January 8 
Aggregate Analysis 
17.1 
17.11 to 17.13, 172 
January 10 
Accounting Method 
17.2 
17.21 to 17.23 
January 15 
Potential Method 
17.3 
17.31 to 17.37 
January 17 
Dynamic Tables, including eager doubling/halving (not in text); Binomial Heaps 
17.4 (and handout for binomial heaps) 
17.43, 192 
Jan 22 
Fibonacci Heaps 
19 
19.21, 19.32, 19.41, 19.42, 193 
Jan 29 
UnionFind Data Structures 
21 (see handout in place of Section 21.4) 
21.21, 21.31, 21.32, 21.33, 21.34, 21.35, 211, 212, 213 
Feb 5 
Binary Search Trees 
12 
12.13, 12.15, 12.24, 12.27, 12.33, 12.42, 12.43, 122, 123 
Feb 14 
Red Black Trees 
13 
13.14, 13.16, 13.17, 13.24, 13.32, 13.43, 13.46, 13.47, 133, 134, 174 
Feb 26 
Augmenting Data Structures 
14 
14.13, 14.15, 14.21, 14.22, 14.23, 14.24, 14.36, 142, 173 
Mar 5 
BTrees 
18 
18.12, 18.13, 18.14, 18.21, 18.26, 181 
Mar 7 
Hashing 
review 11.1 and 11.2; read 11.3, 11.5 
11.14, 11.31, 114 
Mar 14 
Dynamic perfect hashing and cuckoo hashing 
Paper on dynamic perfect hashing, Notes on cuckoo hashing by Rasmus Pagh 

Mar 26 
Lockfree data structures 
Some slides (omit p.2934) 

Course Handouts
 Notes from first class on ADT specification.
 Assignment 1 (the corrections mentioned in the announcements above are now included in this handout)
 Assignment 2
 Assignment 3
 Test 1 from a previous year for practice
 Assignment 4
 Assignment 5
 Assignment 6 Note: I have inserted one bug in the pseudocode of Question 2 on the assignment. (This bug shouldn't really affect your answer.) There could be other, unintentional bugs.
If you find a bug in the pseudocode (and I agree that it really is a bug that will cause incorrect behaviour), send me email explaining how the bug you found can cause incorrect behaviour.
The first person to report each bug will get a 2% bonus on their final grade for the course. Any bugs found by your classmates will be posted as announcements on the course web page. However, if you report a bug that turns out not to be a bug, you may not submit any further bug reports.
If nobody finds the intentional bug by the end of Monday, March 11, I will announce what it is.
 Test 2 from a previous year for practice
 Exam from a previous year for practice
 For graduate students in EECS5101 only: Wikipedia assignment
 Assignment 7
 Assignment 8
Solutions to assignments will be handed out in class on the day they are due.
Updated May 2, 2019