COSC6117, Winter 2008

COSC6117 Theory of Distributed Computing Winter 2008

Instructor: Eric Ruppert
Office: Computer Science Building, room 3042
Email: [my last name] @cs.yorku.ca
Telephone: (416) 736-2100 ext. 33979
Facsimile: (416) 736-5872
Lectures: Tuesdays 11:30-13:00 in room 122 of the Chemistry Building and Thursdays 11:30-13:00 in room 2009 of Vari Hall
Office Hours: Tuesdays and Thursdays, 13:00-14:00, or by appointment, or just drop by when I am in my office.

The best way to contact me is probably by email. Please use your cs account when sending me email, and start your subject line with "[6117]". Send messages in plain text, without attachments.

Announcements

• (Feb 25) In exercise 6, I accidentally used the variable m in part (b) before it is defined (in part (c)). Both uses of m are the same, namely the size of the domain from which input values are chosen.
• (Feb 9) Since I'm going to be away until Feb 18 and I won't be able to answer questions that arise during that time, I've decided to extend the deadline for your next exercise until Feb 21.
• (Feb 7) I will be away Feb 10-18 for reading week and probably not reading email during that time.
• (Feb 7) For exercise 5, you can assume the network graph is complete and processes have unique ids from {1,2, ..., n}. When I said you can use multiple copies of the black box, there is no bound on the number of copies your algorithm uses.
• (Feb 4) For exercise 4, you should consider only deterministic algorithms.
• (Jan 27) For exercise 2c, notice that the output is not a single bit as in the other parts, but rather a natural number between 0 and n.
• (Jan 14) For exercise 1, you may assume processes have unique ids.

Course Description

Can a given problem be solved in a distributed system? If so, how efficiently can it be solved? We investigate how the answers to these questions depend on aspects of the underlying distributed system including synchrony, fault-tolerance and the means of communication between processes. A tentative list of topics:
• shared-memory and message-passing models of distributed systems,
• mutual exclusion,
• agreement problems (consensus, leader-election, Byzantine agreement, approximate agreement),
• impossibility results and lower bounds,
• the consensus hierarchy,
• implementing shared data structures,
• randomization in distributed computing,
• self-stabilization, and
• a theoretical model for mobile computing.

Marking scheme

 Homework exercises 80% Class presentation 20%

The presentation will be a 25-minute talk summarizing the results of a research paper on the theory of distributed computing that you find in the literature. Before you start working on this, you should check with me that the paper you choose is appropriate. You will also have to hand in a short (~2 pages) written summary of what you are presenting. Good places to look for a paper include PODC or DISC conference proceedings or the journal Distributed Computing. This survey paper has a bibliography containing lots of papers that would be suitable to choose. If you have a topic in mind, I might be able to help you find a good paper if you come talk to me.

Lectures

These will be filled in as the term progresses.
The references below are intended for students who want to read more about the topics discussed in class. Sometimes the readings might be helpful for the assignments. Sometimes they will extend the ideas covered in lectures.

• Jan 3: Introduction. Two Generals. (See these notes from 2006.)
• Jan 8: Dijkstra's mutual exclusions algorithm [Dij65]. (See notes.)
• Jan 10: Asynchronous failure-free systems with atomic registers and message-passing: how they can simulate each other. Linearizability [HW90]. Measuring quality of distributed algorithms.
• Jan 15: Broadcast. [AW04, Chap 2].
• Jan 17: Distributed MST. Gallager, Humblet and Spira's algorithm (synchronous version) [GHS83], [Lyn96, Sec. 4.4] based on Borůvka's algorithm.
• Jan 22: Analysis of GHS MST algorithm. Leader election in a ring: impossibility in anonymous ring [Ang80],[AW04, Sec 3.1], asynchronous algorithms using ids [AW04, Sec 3.3].
• Jan 24: Leader election in a ring: synchronous algorithm using ids [AW04, Sec 3.4.1], randomized anonymous algorithm using expected O(nlogn) messages [AW04, Sec 14.1].
• Jan 29: Omega(nlogn) lower bound on number of messages for leader election in asynchronous ring (if n is unknown) [AW04, Sec 3.3.3]. Synchronous message-passing consensus algorithm [AW04, 5.1].
• Jan 31: Synchronous consensus in message-passing system with Byzantine failures [AW04, 5.2.5].
• Feb 5: Complete characterization of when it is possible to solve synchronous consensus in message-passing system with Byzantine failures [Lyn94, Sec 6.4,6.5].
• Feb 7: Lower bound on number of rounds needed for synchronous consensus with f halting failures [AW04, Sec 5.1], or [Lyn96, Sec 6.7.]. (The original proof of the lower bound was in [DM90].) Notes on this proof were also posted at this page in the Jan 28 and 30 lectures.
• Feb 12, 14: Reading week (no classes).
• Feb 19: The model of shared-memory systems, sequential specifications, linearizability. A 2-process consensus algorithm using stacks. [HW90,Her91]
• Feb 21: Impossibility of 2-process consensus using registers [FLP85,AW04 Sec 5.3.1] and impossibility of 3-process consensus using stacks and registers [Her91].
• Feb 26: Consensus hierarchy [Her91]. Impossibility of consensus using registers when 1 halting failure can occur [FLP85, AW04 5.3.2].
• Feb 28: Implementation of swmr register in message passing systems with a majority of correct processes [ABD95, or this encyclopedia entry]. Non-blocking implementation of sw snapshot from swmr registers [AADGMS93].
• Mar 4: more implementations: wait-free sw snapshot from sw registers [AADGMS93], counter from sw registers, mw registers from sw snapshot [AW04, in Ch.10].
• Mar 6: mw snapshots from mw registers [AADGMS93], universality of consensus [Her91].
• Mar 11: renaming [AW04, Sec 16.3 (17.3 in old edition)], [MA94].
• Mar 13: go to the seminar.
• Mar 18 and 20: population protocols [AADFP06] and this survey.
• Mar 25: presentations. Przemyslaw Pawluk (Software Transactional Memory), Anna Topol (Robust rendezvous for mobile autonomous agents via proximity graphs in arbitrary dimensions), George Spanogiannopoulos (Consensus and collision detectors in wireless ad hoc networks).
• Mar 27: presentations. Slawomir Kmiec (Parallel Scheduling of Complex Dags under Uncertainty), Hooman Baradaran (Optimally efficient multi-valued byzantine agreement).
• Apr 1: presentations. Hussain Tinwala (A tree-based algorithm for distributed mutual exclusion), Nastaran Shafiei (Unreliable failure detectors for reliable distributed systems), Marcin Kwietniewski (On the power of anonymous one-way communication), Jie Chen (MST Construction in O(log log n) Communication Rounds).

References

There is no required textbook for the course. However, I shall sometimes recommend readings from books or papers. These references will be listed here, and the list will grow during the term. Accessing some of the links below may require you to be logged into a machine at York, so that you can access the ACM Digital Library, etc.

Books

• [AW04] Hagit Attiya and Jennifer Welch. Distributed Computing: Fundamentals, Simulations and Advanced Topics, 2nd edition. Wiley, 2004.
• [GR06] Rachid Guerraoui and Luís Rodrigues. Introduction to Reliable Distributed Programming. Springer, 2006.
• [HS08] Maurice Herlihy and Nir Shavit. The Art of Multiprocessor Programming. Morgan Kaufmann, 2008
• [Lyn96] Nancy A. Lynch. Distributed Algorithms. Morgan Kaufmann, 1996.
• [San06] Nicola Santoro. Design and Analysis of Distributed Algorithms. Wiley, 2006.
• [Tau06] Gadi Taubenfeld. Synchronization Algorithms and Concurrent Programming. Pearson Education, 2006.

Papers

This list is from last year's version of the course, but it gives you an idea of the kinds of topics covered. I'll add and subtract items from the list during the term.

Web Pages

Previous versions of this course: Winter 2002, Fall 2003, Fall 2004, Winter 2006.

Exercises

Updated March 31, 2008