COSC6117, Fall 2008
Instructor: Eric Ruppert
Theory of Distributed Computing
Office: Computer Science Building, room 3042
Email: [my last name] @cse.yorku.ca
Telephone: (416) 736-2100 ext. 33979
Facsimile: (416) 736-5872
Lectures: Mondays 13:00-14:30 in room 318 of Calumet College and Wednesdays 13:00-14:30 in room 230 of Bethune College
Office Hours: Tuesdays 14:00-15:00 (except Feb 10) and Fridays at various times (Feb 13: 1:00 p.m.). You can also try dropping by my office when I'm in or making an appointment by sending me email.
The best way to contact me is probably by email. Please use your cs account when sending me email, and start your subject line with "". Send messages in plain text, without attachments.
- (Feb 23) We will be in room 354 of the Lumbers building for the presentations on Wednesday (1pm to 3pm).
- (Feb 22) We will be in room 2002 of the CSE building for the presentations on Tuesday (10 am to noon).
- (Feb 10) I have to cancel my office hour today. I will be available tomorrow afternoon if you want to come ask questions. (Or you can send me email.)
- (Feb 2) The departmental town hall meeting (mentioned in the announcement below) has been changed to a pizza party on the first floor of the Computer Science and Engineering Building (near the computer labs) at noon on Thursday, Feb 5.
- (Feb 1)
You are invited to a departmental Town Hall Meeting to welcome you back and to answer questions you might have. It will be on Thursday, Feb 5 at 4:00p.m. in lecture hall B of the Computer Science and Engineering Building. Pizza and pop will be available.
- (Jan 30) Classes will resume on Monday, Feb 2.
- (Jan 26) Classes may be starting up again soon, if Bill 145 receives royal assent. Check the York website for details. The timetable for the return to classes is on this page.
- (Jan 21) I will be at a workshop out of town next week (Jan 26-30), so I won't be holding office hours during that time. I will be reachable by email, however.
- (Jan 5) My regular office hours have now resumed, so the first one of the new year will be Jan 6.
- (Dec 19) I hope you enjoy the holidays. Have a Merry Christmas if you celebrate it, and a happy new year.
- (Dec 19) I will have my usual Friday office hour on Dec 19 at 2p.m. I will also have one on Monday, Dec 22 at 2p.m. After that, the university will close for the holidays, so I will not have office hours until the new year.
- (Dec 10) I will be out of town for a few days. This means that my office hour on Fri, Dec 12 is cancelled and my office hour on Tue, Dec 16 will be postponed to Wednesday Dec 17 at 2p.m. (this time will change to 4:30 p.m. if classes have resumed by the 17th). If classes resume while I am out of town, we will reschedule the Dec 15 class for another day. I will try to continue to respond to email while I am away from Toronto.
- (Nov 6) TA's and contract faculty are on strike. The university administration has decided to suspend all classes. So there will not be any 6117 lectures until further notice. A lot of information about the strike is posted at York's "labour dispute toolkit". Check back here too for further announcements.
I am not on strike, so I will continue to hold office hours (although I may change the times of some of them, so check back here), and I am also reachable by email. I may work at home more often during the strike to avoid the hassle of crossing picket lines.
Some deadlines will be postponed as a result of the strike: Homework problem #5 will now be due in the first class when classes resume. Homework problem #6 will now be due in the second class when classes resume. Homework problem #7 will be now due in the fourth class when classes resume. If the strike is prolonged, classes may extend later than originally planned. I'll post further information here when I know it.
I recommend that you use this extra time to work on your assignments and on preparing your presentation.
- (Nov 3) I realized that exercise 5 is harder than I thought. To make it doable, you may assume, without proof, that there is an algorithm that solves binary consensus in the complete graph if n > f+3. I'll also extend the due date for this assignment to Nov 10. (Sorry for the confusion.)
- (Oct 24) Clarifications for exercise 4: In all parts, assume n>4f. In part (a), "correctness" refers to agreement and weak validity. In part (b), "in general" is to contrast (b) with (c). In other words, for part (b) there is no restriction on the inputs.
- (Oct 15) Correction to exercise 3: The wording is a little inconsistent: Instead of having to alert all 4 corners, it is sufficient to alert 1 corner. Thus the condition should be "At least 1 corner receives an alert if at least 1 process receives a request". Also, the output of the leader election algorithm should be that every participating process (i.e. every process that received a leader election request from its user) should output the id of the leader.
- (Oct 8) office hours for the next week: Friday Oct 10 at 1:00, Wednesday Oct 15 at 11:00. (No office hour next Tue Oct 14.)
- (Oct 3) There is such a thing as a free lunch. Our department has invited consultants to review our computer science and engineering programmes and make recommendations on how they can be improved. The consultants want to meet with students. Your input and comments would be valuable to the consultants. You can meet with them on Thursday, Oct 9 from 12:00 to 13:00 in 3033 CSEB, where lunch will be served.
Can a given problem be solved in a distributed system? If so, how
efficiently can it be solved? We investigate how the answers to these
questions depend on aspects of the underlying distributed system
including synchrony, fault-tolerance and the means of communication
A tentative list of topics:
- shared-memory and message-passing models of distributed systems,
- mutual exclusion,
- agreement problems (consensus, leader-election, Byzantine agreement, approximate agreement),
- broadcast and multicast algorithms,
- impossibility results and lower bounds,
- the consensus hierarchy,
- implementing shared data structures,
- randomization in distributed computing,
- self-stabilization, and
- a theoretical model for mobile computing.
|Homework exercises || 80% |
|Class presentation || 20% |
The presentation will be a 25-minute talk summarizing the results of a
research paper on the theory of distributed computing that you find in
the literature. Before you start working on this, you should check
with me that the paper you choose is appropriate. You will also have
to hand in a short (~2 pages) written
summary of what you are presenting. Good places to look for
a paper include
conference proceedings or the journal
This survey paper has a bibliography containing lots of papers that would be suitable to choose.
If you have a topic in mind, I might be able to help you find a
good paper if you come talk to me.
These will be filled in as the term progresses.
The references below are intended for students who want to read more about the topics discussed in class. Sometimes the readings might be helpful for the assignments. Sometimes they will extend the ideas covered in lectures.
- Sep 3: Introduction. Two Generals. (See these notes from 2006.)
- Sep 8: Dijkstra's mutual exclusion algorithm [Dij65].
- Sep 10: Modelling distributed systems, comparing their strength.
- Sep 15: Modelling distributed systems (incl. I/O automata)
- Sep 17: broadcasting [AW04, chapter 2]
- Sep 22: broadcasting (continued)
- Sep 24: distributed MST algorithm [GHS83],[Lyn96, Sec 4.4].
- Sep 29: finishing MST algorithm. Leader election in a ring. [Ang80]
- Oct 6: Leader election in a ring. [AW04, chapter 3].
- Oct 8: Synchronous Byzantine consensus (n=3, f=1). [AW04, Sec 5.2].
- Oct 15: Synchronous Byzantine consensus (impossible for n<=3f, algorithm for n>4f). [AW04, Sec 5.2]
- Oct 20: Synchronous Byzantine consensus (possible iff n>3f and conn>2f) [Lyn96, Sec 6.4, 6.5],
- Oct 22: Synchronous consensus tolerating halting failures (possible iff conn>f).
- Oct 27: Synchronous consensus with halting failures requires f+1 rounds [AW04, Sec 5.1] or [Lyn96, Sec 6.7].
- Oct 29: Linearizability [HW90]. Asynchronous consensus tolerating halting failures using registers is impossible for n=2 [FLP85, AW04 Sec 5.3.1].
There is no required textbook for the course. However, I shall sometimes recommend readings from books or papers. These references will be listed here, and the list will grow during the term. Accessing some of the links below may require you to be logged into a machine at York, so that you can access the ACM Digital Library, etc.
- [AW04] Hagit Attiya and Jennifer Welch. Distributed Computing:
Fundamentals, Simulations and Advanced Topics, 2nd edition. Wiley, 2004.
- [GR06] Rachid Guerraoui and Luís Rodrigues. Introduction to Reliable Distributed Programming. Springer, 2006.
- [HS08] Maurice Herlihy and Nir Shavit. The Art of Multiprocessor Programming. Morgan Kaufmann, 2008
- [Lyn96] Nancy A. Lynch. Distributed Algorithms. Morgan Kaufmann, 1996.
- [San06] Nicola Santoro. Design and Analysis of Distributed Algorithms. Wiley, 2006.
- [Tau06] Gadi Taubenfeld. Synchronization Algorithms and Concurrent Programming. Pearson Education, 2006.
This list is from last year's version of the course, but it gives you an idea of the kinds of topics covered. I'll add and subtract items from the list during the term.
- [AADGMS93] Yehuda Afek, Hagit Attiya, Danny Dolev, Eli Gafni, Michael Merritt and Nir Shavit. Atomic snapshots of shared memory. J. of the ACM, 40(4), pages 873-890, September 1993.
- [Ang80] D. Angluin. Local and global properties in networks of processors. In Proc. 12th ACM Symposium on Theory of Computing, pages 82-93, 1980.
- [AADFP06] Dana Angluin, James Aspnes, Zoë Diamadi, Michael J. Fischer and René Peralta. Computation in networks of passively mobile finite-state sensors. Distributed Computing, 18(4), pages 235-253, 2006.
- [ABD95] Hagit Attiya, Amotz Bar-Noy and Danny Dolev, Sharing memory robustly in message-passing systems. J. of the ACM, 42, p124-142, 1995.
- [ABDPR90] Hagit Attiya, Amotz Bar-Noy, Danny Dolev, David Peleg and Rüdiger Reischuk. Renaming in an asynchronous environment. J. of the ACM, 37(3), pages 524-548, July 1990.
- [AGPV90] Baruch Awerbuch, Oded Goldreich, David Peleg and Ronen Vainish. A trade-off between information and communication in broadcast protocols. J. of the ACM, 37(2), pages 238-256, April 1990.
- [BG93] P. Berman and J. Garay. Cloture votes: n/4-resilient distributed consensus in t+1 rounds. Mathematical Systems Theory 26(1), pages 3-19, 1993.
- [BW01] Saâd Biaz and Jennifer L. Welch. Closed form bounds for clock synchronization under simple uncertainty assumptions. Information Processing Letters, 80, pages 151-157, 2001.
- [Dij65] E. W. Dijkstra. Solution of a problem in concurrent programming control . Communications of the ACM, 8(9), page 569, September, 1965. A very early paper on mutual exclusion.
- [Dij74] Edsger W. Dijkstra. Self-stabilizing systems in spite of distributed control. Communications of the ACM, 17(4), pages 643-644, November 1974.
- [DM90] Cynthia Dwork and Yoram Moses. Knowledge and common knowledge in a Byzantine environment: Crash failures. Information and Computation, 88(2), pages 156-186, October 1990.
- [FR03] Faith Fich and Eric Ruppert. Hundreds of impossibility results for distributed computing. In Distributed Computing, 16(2-3), pages 121-163, 2003.
- [FLP85] Michael J. Fischer, Nancy A. Lynch and Michael S. Paterson. Impossibility of distributed consensus with one faulty process. Journal of the ACM, 32(2), pages 374-382, April 1985.
- [GHS83] R. G. Gallager, P. A. Humblet and P. M. Spira. A distributed algorithm for minimum-weight spanning trees. ACM Transactions on Programming Languages and System Sciences, 5(1), pages 66-77, 1983.
- [HMM85] Joseph Y. Halpern, Nimrod Megiddo and Ashfaq A. Munshi. Optimal precision in the presence of uncertainty. Journal of Complexity, 1, pages 170-196, 1985.
- [Her91] Maurice Herlihy. Wait-free synchronization. ACM Transactions on Programming Languages and Systems, 13(1), pages 124-149, January 1991.
- [HR95] Maurice Herlihy and Sergio Rajsbaum. Algebraic topology and distributed computing: A primer. In Computer Science Today: Recent Trends and Developments, pages 203-217. Springer, 1995.
- [HS99] Maurice Herlihy and Nir Shavit. The topological structure of asynchronous computability. Journal of the ACM, 46(6), pages 858-923, November 1999.
- [HW90] Maurice P. Herlihy and Jeannette M. Wing. Linearizability: A correctness condition for concurrent objects. ACM Transactions on Programming Languages and Systems, 12(3), pages 463-492, July 1990.
- [Lam04] Leslie Lamport. A new solution of Dijkstra's concurrent programming problem. Communications of the ACM, 18(8), pages 453-455, August 1974.
- [LL90] Leslie Lamport and Nancy Lynch. Distributed Computing: Models and methods. In Handbook of Theoretical Computer Science, Volume B, Chapter 18, Elsevier, 1990. (On reserve in Steacie Library.) A good, brief introduction to the area. It describes aspects of distributed models and surveys some important early results.
- [LM85] Leslie Lamport and P. M. Melliar-Smith. Synchronizing clocks in the presence of faults. J. of the ACM, 32(1), pages 52-78, 1985.
- [MA94] Mark Moir and James H. Anderson. Fast, Long-Lived Renaming. In Proc. of the 8th International Workshop on Distributed Algorithms, pages 141-155, 1994. (There have been several better versions of adaptive renaming algorithms published afterwards, but I just want to cover the most basic version of the splitter algorithm. See Mark Moir's page for some of the improvements.)
- [Tau04] Gadi Taubenfeld. The black-white bakery algorithm and related bounded-space, adaptive, local-spinning and FIFO algorithms. In Distributed Computing, 18th International Conference, pages 56-70, 2004.
Previous versions of this course:
Updated February 23, 2009