Skip Navigation
York U: Redefine the PossibleHOME | Current Students | Faculty & Staff | Research | International
Search »FacultiesLibrariesCampus MapsYork U OrganizationDirectorySite Index
Future Students, Alumni & Visitors
2002 Technical Reports

Audio-visual localization of multiple speakers in a video teleconferencing setting

Bill Kapralos, Michael R. M. Jenkin and Evangelos Milios

Technical Report CS-2002-02

York University

July 15, 2002


Attending to multiple speakers in a video teleconferencing setting is a complex task. From a visual point of view, multiple speakers can occur at different locations and present radically different appearances. From an audio point of view, multiple speakers may be speaking at the same time, and background noise may make it difficult to localize sound sources without some a priori estimate of the sound source locations. This paper presents a novel sensor and corresponding sensing algorithms to address the task of attending simultaneously to multiple speakers for video teleconferencing. A panoramic visual sensor is used to capture a 360o view of the speakers in the environment, and from this view potential speakers are identified via a color histogram approach. A directional audio system based on beamforming is then used to confirm potential speakers and attend to them. Experimental evaluation of the sensor and its algorithms are presented including sample performance of the entire system in a teleconferencing setting.

Download paper in PDF format.

The documents distributed by this server have been provided by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a noncommercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.