[ README about the AV 16.3 corpus ]

CONTENTS:

( 1 ) General description.
( 2 ) Geometry.
( 3 ) File-by-file description.
( 4 ) Log of modifications.

NOTE:

A publication describing how the corpus was defined and recorded,
as well as the available 2D and 3D speaker location ground-truth
and examples of use can be found in:

  http://glat.info/ma/av16.3/AV163.pdf

==================================================
( 1 ) GENERAL DESCRIPTION
==================================================

This is a corpus of audio-visual recordings made in an indoor
environment with 16 microphones and 3 cameras - hence the name
"AV16.3". Lapel microphones were also used when possible. Signals
coming from all sensors were recorded in a fully synchronous manner.
In most (but not all) sequences, people are wearing a colored ball
marker on the top of their head, in order to facilitate 2D annotation
and/or 3D reconstruction.

For a detailed description of each recording, see CONTENTS_DETAILED
(annotated recordings) and CONTENTS_DETAILED_ALL (all recordings) in
this directory. These files describe the behavior of the actors and
give the timecode information needed to synchronize the various
audio/video streams.  IMPORTANT: note that audio always starts at
timecode 00:00:10.00
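
As an illustration of the synchronization step, here is a minimal
MATLAB/Octave sketch that converts a video start timecode into an
offset inside the audio stream. It rests on assumptions that this
README does not state: that timecodes read HH:MM:SS.FF with FF counted
in video frames, and that the video runs at 25 frames per second. The
video start value is hypothetical; take the real one from
CONTENTS_DETAILED. The 16000 Hz rate is inferred from the "16kHz"
directory name.

```matlab
% Sketch only: locate a video start inside the audio stream.
fps        = 25;             % ASSUMED video frame rate (not stated in this README)
fs         = 16000;          % audio sampling rate, inferred from the "16kHz" directory name
audioStart = '00:00:10.00';  % audio start timecode, as stated in this README
videoStart = '00:00:12.05';  % HYPOTHETICAL value; read the real one from CONTENTS_DETAILED

% Convert an HH:MM:SS.FF timecode string to seconds (FF assumed to be frames).
tc2sec = @(tc) [3600 60 1 1/fps] * sscanf(tc, '%d:%d:%d.%d');

% Offset of the video start relative to the audio start, in seconds.
offsetSec = tc2sec(videoStart) - tc2sec(audioStart);

% 1-based index of the audio sample that coincides with the first video frame.
firstSample = round(offsetSec * fs) + 1;
```

If the last timecode field turns out to be hundredths of a second
rather than frames, replacing 1/fps by 1/100 in tc2sec is enough.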

A short description of each directory is given below, in part ( 3 ).

Many different types of behaviours were recorded, including visual
occlusions, speech overlaps, sharp motions, etc. The purpose is
multidisciplinary: the data may be of interest for audio-only,
video-only, or joint audio-visual localization and tracking.

The main motivation for these recordings is systematic assessment of
research algorithms, using e.g. true 3D mouth location provided by
calibrated cameras and 2D measurements on the various images.

For more information, refer to the IDIAP Research Report
RR 04-28 (http://www.idiap.ch).

A selection of potentially interesting parts of the corpus, besides the data:

- MATLAB script to access interesting info such as array geometry, file names, etc.

  http://glat.info/ma/av16.3/seq_av163.m

- Video annotation interfaces (head, ball marker, mouth):

  http://glat.info/ma/av16.3/HAI       ( head box annotation )
  http://glat.info/ma/av16.3/BAI       ( ball marker box annotation )
  http://glat.info/ma/av16.3/MAI       ( mouth location annotation )
  http://glat.info/ma/av16.3/FORMATS   ( description of file formats )

- Examples of use:
  
  http://glat.info/ma/av16.3/EXAMPLES/AUDIO/README  ( single audio source )
  http://glat.info/ma/av16.3/EXAMPLES/VIDEO/README  ( multiple objects )
  http://glat.info/ma/av16.3/EXAMPLES/3D-RECONSTRUCTION/README ( 3D mouth location )


Guillaume Lathoud   -  glathoud@yahoo.fr
Jean-Marc Odobez    -  odobez@idiap.ch
Daniel Gatica-Perez -  gatica@idiap.ch


==================================================
( 2 ) GEOMETRY
==================================================

The room was 8.2m * 3.6m * 2.4m, with a long table in the middle. For
more details consult: ./com02-07.pdf

For all recordings named "seq*" there are two 8-microphone uniform
circular arrays (0.1m radius) placed 0.04m above the table.
The centers of the two arrays are separated by 0.8m.

The origin of the 3D reference frame used everywhere in this corpus is the
middle point between the 2 arrays.  The Z axis points upward.

 * microphone array plane : z = 0
 * table surface :          z = -0.04
 * room floor :             z = -0.84
 * other points (table, walls, panels) : see gt.mat in ./CAL_session08/

(x,y,z) microphone coordinates are given below the graph.


                 window
 
c1
                   
                   m7
                 m8  m6
               m1      m5
                 m2  m4
                   m3


                    Y
                    ^
                    |
                    |
                    O-->X                                 right
                                                          wall

                   m15
                m16  m14
               m9      m13
                m10  m12
                   m11


c2


                         c3


Microphone arrays: 2x8 microphones (m1 to m16 in the graph).
With MATLAB or Octave:

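    i = 1:16;                                % microphone index, m1 to m16
    % position of each microphone on its circle ("theta" avoids shadowing the built-in angle())
    theta = pi * (-1 + 2 * mod(i-1,8) / 8);
    % one row per microphone, (x,y,z) in meters; arrays centered at y = +0.4 and y = -0.4
    xyz = [ 0.1 * cos(theta); 0.1 * sin(theta) + 0.4 - 0.8 * (i>8); 0 * theta ].'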

xyz should look like this:

  -0.10000   0.40000  -0.00000
  -0.07071   0.32929  -0.00000
   0.00000   0.30000  -0.00000
   0.07071   0.32929  -0.00000
   0.10000   0.40000   0.00000
   0.07071   0.47071   0.00000
   0.00000   0.50000   0.00000
  -0.07071   0.47071   0.00000
  -0.10000  -0.40000  -0.00000
  -0.07071  -0.47071  -0.00000
   0.00000  -0.50000  -0.00000
   0.07071  -0.47071  -0.00000
   0.10000  -0.40000   0.00000
   0.07071  -0.32929   0.00000
   0.00000  -0.30000   0.00000
  -0.07071  -0.32929   0.00000

For more information, see also ./seq_av163.m
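
As an example of how the geometry can be used, the following
MATLAB/Octave sketch rebuilds the microphone coordinates with the same
formula as above, checks them against the description in this section
(0.1m radius, centers 0.8m apart), and computes the theoretical time
delays of a point source at each microphone relative to m1. The source
position and the speed of sound are assumptions chosen for
illustration; they do not come from the corpus.

```matlab
% Microphone coordinates in meters (same formula as in this README).
idx   = 1:16;
theta = pi * (-1 + 2 * mod(idx-1, 8) / 8);
xyz   = [0.1 * cos(theta); 0.1 * sin(theta) + 0.4 - 0.8 * (idx > 8); 0 * theta].';

% Sanity checks against the described geometry.
c1     = mean(xyz(1:8, :), 1);     % center of array 1, approximately [0  0.4 0]
c2     = mean(xyz(9:16, :), 1);    % center of array 2, approximately [0 -0.4 0]
radius = sqrt(sum((xyz(1:8, :) - repmat(c1, 8, 1)).^2, 2));   % 8 values, each about 0.1
sep    = norm(c1 - c2);            % distance between centers, about 0.8

% HYPOTHETICAL source position (meters) and ASSUMED speed of sound.
src = [1.0, 0.4, 0.5];
c   = 343;                         % m/s, assumption (air at about 20 degrees C)

% Distance from the source to each microphone, then delay relative to m1.
dist = sqrt(sum((xyz - repmat(src, 16, 1)).^2, 2));
tdoa = (dist - dist(1)) / c;       % seconds; negative means "arrives before m1"

% Azimuth of the source in the XY plane, seen from the center of array 1.
az = atan2(src(2) - c1(2), src(1) - c1(1));
```

Multiplying tdoa by the sampling rate gives the delays in samples,
which is the form usually compared with cross-correlation peaks.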


==================================================
( 3 ) FILE-BY-FILE DESCRIPTION
==================================================

( 3.1 ) Files:

AV163.pdf              -> publication motivating and describing the corpus,
                          including the camera calibration process,
                          the available 2D and 3D ground-truth,
                          and examples of use.

CONTENTS_DETAILED      -> details about each ANNOTATED recording, including a
                          description of actor(s)' behaviour, timecodes where
                          each video starts, etc.

CONTENTS_DETAILED_ALL  -> details about each recording, including a
                          description of actor(s)' behaviour, timecodes where
                          each video starts, etc.

FORMATS                -> describes the format of the annotation files,
                          i.e. files with extensions ".headgt", ".mouthgt",
                          ".ballgt", ".3dmouthgt", ".3dballgt".

README                 -> this file.

readradfile.m          -> MATLAB function to read radial distortion parameters
                          from a file.

seq_av163.m            -> useful MATLAB function to get audio and video file
                          pointers, camera calibration parameters, etc. for a
                          given sequence.

static_gt_2_angle_gt.m -> MATLAB function to extract static speaker annotation
                          ( 3D mouth location + speech/silence segmentation ).
                          It is only applicable to "seq01-1p-0000" and
                          "seq37-3p-0001".
                          See EXAMPLES/AUDIO for an example of use.


( 3.2 ) Data directories:

For example the name "seq37-3p-0001" contains three parts:
- "seq37" is the unique identifier of this recording: sequence #37.
- "3p"    means that 3 persons were recorded overall - but not necessarily
          all visible simultaneously.
- "0001"  is a set of four binary flags giving a quick overview of the
          contents of this recording. From left to right:

  bit 1:   0 means "very constrained",  1 means "mostly unconstrained"
           (general behavior: although most recordings follow some sort of scenario,
           some include very strong constraints such as the speaker 
           facing the microphone arrays at all times)
  
  bit 2:   0 means "static",        1 means "dynamic"       
           (static = sporadic motion (e.g. mostly seated), dynamic = continuous motion)

  bit 3:   0 means "minor occlusion(s)",  1 means "at least one major occlusion"     
           (for at least one array or camera: whenever somebody passes in front of 
           or behind somebody else)

  bit 4:   0 means "little overlap",    1 means "significant overlap"       
           (audio: indicates whether there is a significant proportion of overlap
           between speakers and/or noise sources)
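
The naming scheme above can be decoded mechanically. Here is a minimal
MATLAB/Octave sketch; the field names of the "info" structure are
hypothetical and chosen only for readability, they are not defined
anywhere in the corpus.

```matlab
name = 'seq37-3p-0001';

% Split "seqNN-Kp-BBBB" into its three parts.
tok = regexp(name, '^seq(\d+)-(\d+)p-([01]{4})$', 'tokens', 'once');
assert(~isempty(tok), 'unexpected sequence name: %s', name);

info.id         = str2double(tok{1});   % sequence number, here 37
info.numPersons = str2double(tok{2});   % persons recorded overall, here 3

% The four flags, read from left to right as described above.
flags               = (tok{3} == '1');  % 1x4 logical vector
info.unconstrained  = flags(1);         % bit 1: 0 = very constrained, 1 = mostly unconstrained
info.dynamic        = flags(2);         % bit 2: 0 = static,           1 = dynamic
info.majorOcclusion = flags(3);         % bit 3: 0 = minor occlusion,  1 = major occlusion
info.overlap        = flags(4);         % bit 4: 0 = little overlap,   1 = significant overlap
```

For "seq37-3p-0001" only the last flag is set, which matches the
description of that sequence as three static speakers with significant
speech overlap.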

Except for seq37, all data directories mentioned here contain two subdirectories:
seq??-?p-????/16kHz:          the audio waveforms.
seq??-?p-????/annotation:     the annotation files.
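
As a sketch of this layout, the following MATLAB/Octave lines build the
two subdirectory paths for one sequence and count the annotation files
of each type. The location "session11/seq15-1p-0100" is taken from the
log in part ( 4 ) and the extensions from the FORMATS entry in ( 3.1 );
the variable names are hypothetical, and the paths are assumed to be
relative to the corpus root.

```matlab
% Sequence directory, relative to the corpus root (ASSUMED working directory).
seqDir   = fullfile('session11', 'seq15-1p-0100');
audioDir = fullfile(seqDir, '16kHz');        % the audio waveforms
annotDir = fullfile(seqDir, 'annotation');   % the annotation files

% Annotation file extensions listed in the FORMATS entry of this README.
exts   = {'headgt', 'mouthgt', 'ballgt', '3dmouthgt', '3dballgt'};
counts = zeros(1, numel(exts));

for k = 1:numel(exts)
    % dir() returns an empty list when nothing matches (Octave may also warn).
    files     = dir(fullfile(annotDir, ['*.' exts{k}]));
    counts(k) = numel(files);
    fprintf('%-10s : %d file(s)\n', exts{k}, counts(k));
end
```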


seq01-1p-0000/  -> Single static speaker.
		-> Continuous 2D and 3D mouth location annotation complete.
                -> Sparse 2D head annotation complete.
		-> Precise speech/silence segmentation complete.
		-> Example of use available in the EXAMPLES/AUDIO directory.

seq11-1p-0100/  -> Single moving speaker sequence.
		-> Sparse 2D mouth location annotation complete.
                -> Sparse 2D head annotation complete.

seq15-1p-0100/  -> Single moving speaker sequence (no ball marker).
		-> Sparse 2D mouth location annotation complete.
                -> Sparse 2D head annotation complete.

seq18-2p-0101/  -> Two moving speakers, getting very close to each other.
		-> Sparse 2D mouth location annotation complete.
                -> Sparse 2D head annotation complete.

seq24-2p-0111/  -> Two moving, walking speakers.
		-> Sparse 2D mouth location annotation complete.
                -> Sparse 2D head annotation complete.

seq37-3p-0001/  -> Three static speakers.
		-> Continuous 2D and 3D mouth location annotation complete.
		-> Rough speech/silence segmentation complete.

seq40-3p-0111/  -> Two static speakers + a third moving speaker.
		-> Sparse 2D mouth location annotation complete.
                -> Sparse 2D head annotation complete.

seq45-3p-1111/  -> Three moving speakers with many occlusion cases.
		-> Sparse 2D mouth location annotation complete.
                -> Sparse 2D head annotation complete.


( 3.3 ) Other data directories 

These include the sequences listed in ( 3.2 ) plus all other
non-annotated sequences. The only difference between sessions is a
minor horizontal shift in the image plane. If you need to do 3D
reconstruction, go to the corresponding "CAL_sessionXX" directory or
use the "seq_av163.m" MATLAB function to obtain complete camera
calibration information.

session08/
session09/
session10/
session11/
session12/


The following directory contains an additional group of three audio-only
recordings made with loudspeakers at various locations (3D locations
and speech/silence segmentation known by construction). It includes a
README file and the original WAV files played by the loudspeakers:

synthmultisource/


( 3.4 ) Remaining directories:

BAI/		-> Ball Annotation Interface (includes a tracker).

HAI/		-> Head Annotation Interface.

MAI/		-> Mouth Annotation Interface.

CAL_session08/  -> Camera calibration parameters for sequences in session08.
		   Text files and Matlab files.

CAL_session09/  -> Minor image plane shift parameters (deltax,deltay) rel. to session08.
		   Text files and Matlab files.

CAL_session10/  -> Minor image plane shift parameters (deltax,deltay) rel. to session08.
		   Text files and Matlab files.

CAL_session11/  -> Minor image plane shift parameters (deltax,deltay) rel. to session08.
		   Text files and Matlab files.

CAL_session12/  -> Minor image plane shift parameters (deltax,deltay) rel. to session08.
		   Text files and Matlab files.

EXAMPLES/AUDIO/ -> Single audio source localization + comparison with ground-truth.
		   Contains one README file.
		   Contains all the MATLAB code necessary to run the example.

EXAMPLES/VIDEO/ -> Multi-object tracking example.
		   Contains one README file, and tracking results.

EXAMPLES/3D-RECONSTRUCTION/  -> How to produce continuous 3D mouth location annotation.
                   Contains one README file, an example, and all necessary MATLAB code.


==================================================
( 4 ) LOG OF MODIFICATIONS
==================================================

October 9th, 2006:

Added the 2D & 3D mouth annotation for seq02 and seq03, in both ASCII and
MATLAB 6.5.1 format.  Note that each MATLAB file also contains a rough
speech/silence segmentation.

session09/seq02-1p-0000/annotation/*.mouthgt
session09/seq02-1p-0000/seq02-1p-0000_gt.mat

session08/seq03-1p-0000/annotation/*.mouthgt
session08/seq03-1p-0000/seq03-1p-0000_gt.mat

----------

February 21st, 2006:

Added the annotation of ball marker and mouth, including interpolated
3D measures, for seq15, seq40 and seq45.

session10/seq40-3p-0111/annotation/*gt
session10/seq45-3p-1111/annotation/*gt
session11/seq15-1p-0100/annotation/*gt

----------

December 16th, 2004:

Head annotation (bounding box) complete, and "*.headgt" files uploaded 
for the following sequences:

seq01-1p-0000
seq11-1p-0100
seq15-1p-0100
seq18-2p-0101
seq24-2p-0111
seq40-3p-0111
seq45-3p-1111

Thanks to all involved partners from the AMI project (TNO, Sheffield, BRNO and IDIAP).

----------

August 30th, 2004: 
Entire corpus uploaded.
Audio and video examples uploaded.
2D mouth annotation files uploaded.
3D reconstruction example uploaded.

----------

August 8th, 2004:  filenames were changed. Namely:

"seq1_jitendra"                   became  "seq01-1p-0000"
"seq3_dig"                        became  "seq37-3p-0001"
"seq6_iain2"                      became  "seq11-1p-0100"
"seq9_daniel_guillaume"           became  "seq24-2p-0111"
"seq10_fabien_viktoria"           became  "seq18-2p-0101"
"seq11_d_jm_g"                    became  "seq40-3p-0111"
"seq12_d_jm_g"                    became  "seq45-3p-1111"
"seq19_guillaume2"                became  "seq15-1p-0100"

The new names all follow the same coding scheme, described in ( 3.2 ).