The AV16.3 corpus is an audio-visual corpus of 43 real indoor
multispeaker recordings, designed to test algorithms for audio-only,
video-only and audio-visual speaker localization and tracking. Real
human speakers were used. The variety of recordings was chosen to test
algorithms to their limits, and to cover a wide range of application
scenarios (meetings, surveillance). The emphasis is on overlapped
speech and multiple moving speakers. Recordings include mostly dynamic
scenarios, with single and multiple moving speakers. A few meeting
scenarios, with mostly seated speakers, are also included.
Jacek Dmochowski, Jacob Benesty and Sofiène Affes evaluated the Multichannel Cross-Correlation Coefficient (MCCC)-based acoustic localization (paper, journal paper).
Oscar Varela Serrano proposed and tested new Voice Activity Detection features (paper).
Javed Ahmed tested his Robust and Real-Time Visual Tracking Framework (PhD Thesis).
Recordings were made with two 8-microphone Uniform Circular Arrays (16
kHz sampling frequency) and three digital cameras (25 frames per
second) around the meeting room, hence the "AV16.3" name. Whenever
possible, lapel microphones were also worn by each speaker. All
sensors were synchronized. The three cameras were calibrated and
used to determine the ground-truth 3-D location of the mouth of each
speaker, with a maximum error of 1.2 cm. To the best of our knowledge,
this audio-visual annotated corpus was the first to be made publicly
available (recorded in fall 2003, published in June 2004 at the MLMI'04 workshop).
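Since the sensors were synchronized, the two rates quoted above give a fixed ratio of 16000 / 25 = 640 audio samples per video frame. The Python sketch below maps a video frame index to the matching audio sample range. It assumes that audio and video start at the same instant and that frame indices start at 0; neither is stated here, and the function name is purely illustrative.

```python
from typing import Tuple

AUDIO_RATE_HZ = 16_000  # microphone sampling frequency stated above
VIDEO_FPS = 25          # camera frame rate stated above

# 16000 / 25 divides exactly, so every frame spans a whole number of samples.
SAMPLES_PER_FRAME = AUDIO_RATE_HZ // VIDEO_FPS


def frame_to_sample_range(frame_index: int) -> Tuple[int, int]:
    """Return the half-open [start, stop) audio sample range of one frame.

    Assumption: audio sample 0 and video frame 0 share the same start time.
    """
    if frame_index < 0:
        raise ValueError("frame_index must be non-negative")
    start = frame_index * SAMPLES_PER_FRAME
    return start, start + SAMPLES_PER_FRAME


if __name__ == "__main__":
    print(frame_to_sample_range(0))   # first frame
    print(frame_to_sample_range(25))  # frame starting one second in
```

If a recording's audio and video turn out to have different start offsets, a constant sample offset would have to be added to `start`.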
How to access the corpus (data + annotation + tools)
Freely download it through the various links in the "Contents" section below.
If you need to extract PPM images from an AVI file, you can use mplayer, as in the following example:
    mplayer -ss 00:00:00 -vo pnm -vf scale=360:288 seq03-1p-0000_cam1_divx_audio.avi
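When many sequences need to be converted, the mplayer example above can be scripted. The following Python sketch only rebuilds that same command line and runs it. The helper names, the default arguments and the choice to run mplayer inside a separate output directory are assumptions made for illustration and are not part of the corpus tools.

```python
import os
import shutil
import subprocess
from typing import List


def build_mplayer_command(avi_path: str, start: str = "00:00:00",
                          width: int = 360, height: int = 288) -> List[str]:
    """Build the argument list matching the mplayer example above."""
    return [
        "mplayer",
        "-ss", start,                      # start position in the file
        "-vo", "pnm",                      # write frames as PNM/PPM images
        "-vf", f"scale={width}:{height}",  # rescale each frame
        avi_path,
    ]


def extract_frames(avi_path: str, output_dir: str = ".") -> None:
    """Run mplayer from output_dir (hypothetical helper).

    Assumption: mplayer is installed and writes its images to the
    current working directory.
    """
    if shutil.which("mplayer") is None:
        raise RuntimeError("mplayer was not found on PATH")
    os.makedirs(output_dir, exist_ok=True)
    command = build_mplayer_command(os.path.abspath(avi_path))
    subprocess.run(command, cwd=output_dir, check=True)


if __name__ == "__main__":
    print(" ".join(build_mplayer_command("seq03-1p-0000_cam1_divx_audio.avi")))
```

Passing the arguments as a list instead of a single shell string avoids quoting problems with unusual file names.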
Compatibility issues: there are a few binary data files
("*.mat", created with MATLAB 6.5.1). If your MATLAB version
cannot read them, use the MATLAB scripts that recreate
their content ("*_mat.m" ASCII files, in the same
directory as each "*.mat" file).
The same 3D calibration parameters as in session 08 were used for recording sessions 09, 10, 11 and 12; only a small 2D shift correction was applied: