Faculty Research Interests: Sam Toueg

Sam Toueg

Professor
Ph.D., Princeton University, 1979

Research Interests

My research interests include distributed computing, fault tolerance, and real-time systems. I work on methodologies, paradigms, and algorithms for fault-tolerant distributed systems, in both message-passing and shared-memory settings. My long-term goal is to bridge the gap between theoretical results and the need for efficient, practical solutions. In collaboration with Tushar Chandra and Prasad Jayanti, two Ph.D. students in Computer Science, I have continued work on unreliable failure detectors for message-passing systems and on wait-free objects for shared-memory systems.

A fundamental result of fault-tolerant distributed computing states that the Consensus problem cannot be solved by a deterministic algorithm in asynchronous systems. This impossibility stems from the inherent difficulty of determining, in such a system, whether a process has crashed or is merely very slow. In our work, we determined exactly how much information about failures is necessary and sufficient to solve Consensus. We first showed that one can use W, an unreliable failure detector that can make an infinite number of mistakes, to solve Consensus in systems with a majority of correct processes. We then proved that, to solve Consensus, any failure detector has to provide at least as much information about failures as W does. Thus, W is the weakest failure detector for solving Consensus in asynchronous systems with a majority of correct processes. We are now exploring the practicality of implementing W, and of applications that rely on W for their correctness.
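To make the notion of an unreliable failure detector concrete, here is a minimal sketch of one common way such a module can be approximated in practice: heartbeats with adaptive timeouts. It is an illustration only, not the construction from the papers listed below. The class name `HeartbeatDetector`, its methods, and the timeout-doubling policy are all assumptions introduced for this example. The detector may wrongly suspect a slow process, and it retracts the suspicion when a late heartbeat arrives.

```python
class HeartbeatDetector:
    """Toy timeout-based failure detector (hypothetical names, illustrative only)."""

    def __init__(self, processes, initial_timeout=1.0):
        self.timeout = {p: initial_timeout for p in processes}
        self.last_heard = {p: 0.0 for p in processes}
        self.suspected = set()
        self.mistakes = 0  # number of suspicions later retracted

    def on_heartbeat(self, p, now):
        """Record a heartbeat from p received at logical time `now`."""
        self.last_heard[p] = now
        if p in self.suspected:
            # The earlier suspicion was a mistake: p was slow, not crashed.
            self.suspected.discard(p)
            self.timeout[p] *= 2  # be more patient with p from now on
            self.mistakes += 1

    def check(self, now):
        """Suspect every process that has been silent longer than its timeout."""
        for p, heard in self.last_heard.items():
            if now - heard > self.timeout[p]:
                self.suspected.add(p)
        return set(self.suspected)


fd = HeartbeatDetector(["p", "q"])
fd.on_heartbeat("p", 0.5)
fd.on_heartbeat("q", 0.5)
print(sorted(fd.check(2.0)))  # ['p', 'q']: both are silent for 1.5 > 1.0
fd.on_heartbeat("q", 2.1)     # q was only slow, so the suspicion is retracted
print(sorted(fd.check(3.0)))  # ['p']: q now has a timeout of 2.0
```

Time is passed in explicitly so that the example is deterministic. In a truly asynchronous system no timeout is ever guaranteed to be long enough, which is why a detector of this kind can keep making mistakes.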

A concurrent system consists of processes communicating via shared objects. A shared object is wait-free if each process that accesses it is guaranteed to get a response even if all the other processes crash. We are now exploring wait-free hierarchies of object types, where each object type is assigned a level that reflects its ability to implement other wait-free objects. In particular, Prasad Jayanti has shown that a well-known hierarchy (Herlihy's) is not robust: informally, in this hierarchy there is an object at level 2 that can be used to implement wait-free objects at any level. We are now exploring the question of whether robust wait-free hierarchies exist.
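The wait-free property described above can be illustrated with the textbook construction of consensus from a compare-and-swap object. This is a sketch under stated assumptions: `CasRegister` and `propose` are hypothetical names, each method call stands for a single atomic step of the model (Python itself does not enforce this atomicity), and proposed values are assumed never to be `None`.

```python
class CasRegister:
    """Models a shared compare-and-swap object; each call is one atomic step."""

    def __init__(self):
        self.value = None  # None means "no value installed yet"

    def compare_and_swap(self, expected, new):
        """Install `new` only if the register still holds `expected`."""
        if self.value == expected:
            self.value = new
            return True
        return False

    def read(self):
        return self.value


def propose(reg, my_value):
    """Decide a value in exactly two of the caller's own steps."""
    reg.compare_and_swap(None, my_value)  # only the first proposer succeeds
    return reg.read()                     # everyone adopts the installed value


reg = CasRegister()
# Suppose a third process has crashed and never takes a step.
# The remaining two still decide, and they agree.
d2 = propose(reg, "b")
d1 = propose(reg, "a")
print(d1, d2)  # b b
```

No call to `propose` loops or waits on another process, so a caller gets its response after a bounded number of its own steps regardless of how many other processes have crashed.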

Selected Publications

  • Bracha, G., and S. Toueg. Asynchronous consensus and broadcast protocols. Journal of the ACM, vol. 32, 4, 1985, 824-840.

  • Srikanth, T. K., and S. Toueg. Optimal clock synchronization. Journal of the ACM, vol. 34, 3, 1987, 626-645.

  • El Abbadi, A., and S. Toueg. Maintaining availability in partitioned replicated databases. ACM Transactions on Database Systems, vol. 14, 2, 1989, 264-290.

  • Neiger, G., and S. Toueg. Automatically increasing the fault-tolerance of distributed algorithms. Journal of Algorithms, vol. 11, 3, 1990, 374-419.

  • Chandra, T., and S. Toueg. Unreliable failure detectors for asynchronous systems. Proceedings 10th ACM Symposium on Principles of Distributed Computing, August 1991, Montreal, Canada, 257-272.

  • Chandra, T., V. Hadzilacos, and S. Toueg. The weakest failure detector for solving consensus. Proceedings 11th ACM Symposium on Principles of Distributed Computing, August 1992, Vancouver, Canada, 147-158.

  • Jayanti, P., T. Chandra, and S. Toueg. Fault-tolerant wait-free shared objects. Proceedings 33rd IEEE Symposium on Foundations of Computer Science, October 1992, Pittsburgh, Pennsylvania, 157-166.

  • Neiger, G., and S. Toueg. Simulating synchronized clocks and common knowledge in distributed systems. Journal of the ACM, vol. 40, 2, 1993, 334-367.