COSC6117, Winter 2008
COSC6117
Theory of Distributed Computing
Winter 2008
Instructor: Eric Ruppert
Office: Computer Science Building, room 3042
Email: [my last name] @cs.yorku.ca
Telephone: (416) 736-2100 ext. 33979
Facsimile: (416) 736-5872
Lectures: Tuesdays 11:30-13:00 in room 122 of the Chemistry Building and Thursdays 11:30-13:00 in room 2009 of Vari Hall
Office Hours: Tuesdays and Thursdays, 13:00-14:00, or by appointment, or just drop by when I am in my office.
The best way to contact me is probably by email. Please use your cs account when sending me email, and start your subject line with "[6117]". Send messages in plain text, without attachments.
Announcements
- (Feb 25) In exercise 6, I accidentally used the variable m in part (b) before it is defined (in part (c)). Both uses of m are the same, namely the size of the domain from which input values are chosen.
- (Feb 9) Since I'm going to be away until Feb 18 and I won't be able to answer questions that arise during that time, I've decided to extend the deadline for your next exercise until Feb 21.
- (Feb 7) I will be away Feb 10-18 for reading week and probably not reading email during that time.
- (Feb 7) For exercise 5, you can assume the network graph is complete and processes have unique ids from {1,2, ..., n}. When I said you can use multiple copies of the black box, there is no bound on the number of copies your algorithm uses.
- (Feb 4) For exercise 4, you should consider only deterministic algorithms.
- (Jan 27) For exercise 2c, notice that the output is not a single bit as in the other parts, but rather a natural number between 0 and n.
- (Jan 14) For exercise 1, you may assume processes have unique ids.
Course Description
Can a given problem be solved in a distributed system? If so, how
efficiently can it be solved? We investigate how the answers to these
questions depend on aspects of the underlying distributed system
including synchrony, fault-tolerance and the means of communication
between processes.
A tentative list of topics:
- shared-memory and message-passing models of distributed systems,
- mutual exclusion,
- agreement problems (consensus, leader-election, Byzantine agreement, approximate agreement),
- broadcast and multicast algorithms,
- impossibility results and lower bounds,
- the consensus hierarchy,
- implementing shared data structures,
- randomization in distributed computing,
- self-stabilization, and
- a theoretical model for mobile computing.
Marking scheme
Homework exercises | 80% |
Class presentation | 20% |
The presentation will be a 25-minute talk summarizing the results of a
research paper on the theory of distributed computing that you find in
the literature. Before you start working on this, you should check
with me that the paper you choose is appropriate. You will also have
to hand in a short (~2 pages) written
summary of what you are presenting. Good places to look for
a paper include
PODC
or
DISC
conference proceedings or the journal
Distributed Computing.
This survey paper has a bibliography containing lots of papers that would be suitable to choose.
If you have a topic in mind, I might be able to help you find a
good paper if you come talk to me.
Lectures
These will be filled in as the term progresses.
The references below are intended for students who want to read more about the topics discussed in class. Sometimes the readings might be helpful for the assignments. Sometimes they will extend the ideas covered in lectures.
- Jan 3: Introduction. Two Generals. (See these notes from 2006.)
- Jan 8: Dijkstra's mutual exclusions algorithm [Dij65]. (See notes.)
- Jan 10: Asynchronous failure-free systems with atomic registers and message-passing: how they can simulate each other. Linearizability [HW90]. Measuring quality of distributed algorithms.
- Jan 15: Broadcast. [AW04, Chap 2].
- Jan 17: Distributed MST. Gallager, Humblet and Spira's algorithm (synchronous version) [GHS83], [Lyn96, Sec. 4.4] based on Borůvka's algorithm.
- Jan 22: Analysis of GHS MST algorithm. Leader election in a ring: impossibility in anonymous ring [Ang80],[AW04, Sec 3.1], asynchronous algorithms using ids [AW04, Sec 3.3].
- Jan 24: Leader election in a ring: synchronous algorithm using ids [AW04, Sec 3.4.1], randomized anonymous algorithm using expected O(nlogn) messages [AW04, Sec 14.1].
- Jan 29: Omega(nlogn) lower bound on number of messages for leader election in asynchronous ring (if n is unknown) [AW04, Sec 3.3.3]. Synchronous message-passing consensus algorithm [AW04, 5.1].
- Jan 31: Synchronous consensus in message-passing system with Byzantine failures [AW04, 5.2.5].
- Feb 5: Complete characterization of when it is possible to solve synchronous consensus in message-passing system with Byzantine failures [Lyn94, Sec 6.4,6.5].
- Feb 7: Lower bound on number of rounds needed for synchronous consensus with f halting failures [AW04, Sec 5.1], or [Lyn96, Sec 6.7.]. (The original proof of the lower bound was in [DM90].) Notes on this proof were also posted at this page in the Jan 28 and 30 lectures.
- Feb 12, 14: Reading week (no classes).
- Feb 19: The model of shared-memory systems, sequential specifications, linearizability. A 2-process consensus algorithm using stacks. [HW90,Her91]
- Feb 21: Impossibility of 2-process consensus using registers [FLP85,AW04 Sec 5.3.1] and impossibility of 3-process consensus using stacks and registers [Her91].
- Feb 26: Consensus hierarchy [Her91]. Impossibility of consensus using registers when 1 halting failure can occur [FLP85, AW04 5.3.2].
- Feb 28: Implementation of swmr register in message passing systems with a majority of correct processes [ABD95, or this encyclopedia entry]. Non-blocking implementation of sw snapshot from swmr registers [AADGMS93].
- Mar 4: more implementations: wait-free sw snapshot from sw registers [AADGMS93], counter from sw registers, mw registers from sw snapshot [AW04, in Ch.10].
- Mar 6: mw snapshots from mw registers [AADGMS93], universality of consensus [Her91].
- Mar 11: renaming [AW04, Sec 16.3 (17.3 in old edition)], [MA94].
- Mar 13: go to the seminar.
- Mar 18 and 20: population protocols [AADFP06] and this survey.
- Mar 25: presentations. Przemyslaw Pawluk (Software Transactional Memory), Anna Topol (Robust rendezvous for mobile autonomous agents via proximity graphs in
arbitrary dimensions), George Spanogiannopoulos (Consensus and collision detectors in wireless ad hoc networks).
- Mar 27: presentations. Slawomir Kmiec (Parallel Scheduling of Complex Dags under Uncertainty), Hooman Baradaran (Optimally efficient multi-valued byzantine agreement).
- Apr 1: presentations. Hussain Tinwala (A tree-based algorithm for distributed mutual exclusion), Nastaran Shafiei (Unreliable failure detectors for reliable distributed systems),
Marcin Kwietniewski (On the power of anonymous one-way communication), Jie Chen (MST Construction in O(log log n)
Communication Rounds).
References
There is no required textbook for the course. However, I shall sometimes recommend readings from books or papers. These references will be listed here, and the list will grow during the term. Accessing some of the links below may require you to be logged into a machine at York, so that you can access the ACM Digital Library, etc.
Books
- [AW04] Hagit Attiya and Jennifer Welch. Distributed Computing:
Fundamentals, Simulations and Advanced Topics, 2nd edition. Wiley, 2004.
- [GR06] Rachid Guerraoui and Luís Rodrigues. Introduction to Reliable Distributed Programming. Springer, 2006.
- [HS08] Maurice Herlihy and Nir Shavit. The Art of Multiprocessor Programming. Morgan Kaufmann, 2008
- [Lyn96] Nancy A. Lynch. Distributed Algorithms. Morgan Kaufmann, 1996.
- [San06] Nicola Santoro. Design and Analysis of Distributed Algorithms. Wiley, 2006.
- [Tau06] Gadi Taubenfeld. Synchronization Algorithms and Concurrent Programming. Pearson Education, 2006.
Papers
This list is from last year's version of the course, but it gives you an idea of the kinds of topics covered. I'll add and subtract items from the list during the term.
- [AADGMS93] Yehuda Afek, Hagit Attiya, Danny Dolev, Eli Gafni, Michael Merritt and Nir Shavit. Atomic snapshots of shared memory. J. of the ACM, 40(4), pages 873-890, September 1993.
- [Ang80] D. Angluin. Local and global properties in networks of processors. In Proc. 12th ACM Symposium on Theory of Computing, pages 82-93, 1980.
- [AADFP06] Dana Angluin, James Aspnes, Zoë Diamadi, Michael J. Fischer and René Peralta. Computation in networks of passively mobile finite-state sensors. Distributed Computing, 18(4), pages 235-253, 2006.
- [ABD95] Hagit Attiya, Amotz Bar-Noy and Danny Dolev, Sharing memory robustly in message-passing systems. J. of the ACM, 42, p124-142, 1995.
- [ABDPR90] Hagit Attiya, Amotz Bar-Noy, Danny Dolev, David Peleg and Rüdiger Reischuk. Renaming in an asynchronous environment. J. of the ACM, 37(3), pages 524-548, July 1990.
- [AGPV90] Baruch Awerbuch, Oded Goldreich, David Peleg and Ronen Vainish. A trade-off between information and communication in broadcast protocols. J. of the ACM, 37(2), pages 238-256, April 1990.
- [BG93] P. Berman and J. Garay. Cloture votes: n/4-resilient distributed consensus in t+1 rounds. Mathematical Systems Theory 26(1), pages 3-19, 1993.
- [BW01] Saâd Biaz and Jennifer L. Welch. Closed form bounds for clock synchronization under simple uncertainty assumptions. Information Processing Letters, 80, pages 151-157, 2001.
- [Dij65] E. W. Dijkstra. Solution of a problem in concurrent programming control . Communications of the ACM, 8(9), page 569, September, 1965. A very early paper on mutual exclusion.
- [Dij74] Edsger W. Dijkstra. Self-stabilizing systems in spite of distributed control. Communications of the ACM, 17(4), pages 643-644, November 1974.
- [DM90] Cynthia Dwork and Yoram Moses. Knowledge and common knowledge in a Byzantine environment: Crash failures. Information and Computation, 88(2), pages 156-186, October 1990.
- [FR03] Faith Fich and Eric Ruppert. Hundreds of impossibility results for distributed computing. In Distributed Computing, 16(2-3), pages 121-163, 2003.
- [FLP85] Michael J. Fischer, Nancy A. Lynch and Michael S. Paterson. Impossibility of distributed consensus with one faulty process. Journal of the ACM, 32(2), pages 374-382, April 1985.
- [GHS83] R. G. Gallager, P. A. Humblet and P. M. Spira. A distributed algorithm for minimum-weight spanning trees. ACM Transactions on Programming Languages and System Sciences, 5(1), pages 66-77, 1983.
- [HMM85] Joseph Y. Halpern, Nimrod Megiddo and Ashfaq A. Munshi. Optimal precision in the presence of uncertainty. Journal of Complexity, 1, pages 170-196, 1985.
- [Her91] Maurice Herlihy. Wait-free synchronization. ACM Transactions on Programming Languages and Systems, 13(1), pages 124-149, January 1991.
- [HR95] Maurice Herlihy and Sergio Rajsbaum. Algebraic topology and distributed computing: A primer. In Computer Science Today: Recent Trends and Developments, pages 203-217. Springer, 1995.
- [HS99] Maurice Herlihy and Nir Shavit. The topological structure of asynchronous computability. Journal of the ACM, 46(6), pages 858-923, November 1999.
- [HW90] Maurice P. Herlihy and Jeannette M. Wing. Linearizability: A correctness condition for concurrent objects. ACM Transactions on Programming Languages and Systems, 12(3), pages 463-492, July 1990.
- [Lam04] Leslie Lamport. A new solution of Dijkstra's concurrent programming problem. Communications of the ACM, 18(8), pages 453-455, August 1974.
- [LL90] Leslie Lamport and Nancy Lynch. Distributed Computing: Models and methods. In Handbook of Theoretical Computer Science, Volume B, Chapter 18, Elsevier, 1990. (On reserve in Steacie Library.) A good, brief introduction to the area. It describes aspects of distributed models and surveys some important early results.
- [LM85] Leslie Lamport and P. M. Melliar-Smith. Synchronizing clocks in the presence of faults. J. of the ACM, 32(1), pages 52-78, 1985.
- [MA94] Mark Moir and James H. Anderson. Fast, Long-Lived Renaming. In Proc. of the 8th International Workshop on Distributed Algorithms, pages 141-155, 1994. (There have been several better versions of adaptive renaming algorithms published afterwards, but I just want to cover the most basic version of the splitter algorithm. See Mark Moir's page for some of the improvements.)
- [Tau04] Gadi Taubenfeld. The black-white bakery algorithm and related bounded-space, adaptive, local-spinning and FIFO algorithms. In Distributed Computing, 18th International Conference, pages 56-70, 2004.
Web Pages
Previous versions of this course: Winter 2002, Fall 2003, Fall 2004, Winter 2006.
Exercises

Updated March 31, 2008