General context: originally speech enhancement for human listeners, but also for Automatic Speech Recognition (ASR), e.g. as in [1].
Below: explanations of some of the parameters in Wiener_iter_mel.m:

niter
With niter > 1, the resulting "cleaned" speech may sound distorted ("muffled").

noise_est_factor
(real number > 0.0, default 3.0): The noise estimate En is computed as the geometric mean of the spectra of "silent" frames, then simply multiplied by noise_est_factor.
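
The estimate described above can be sketched as follows. This is only an illustration in Python, not the code of Wiener_iter_mel.m: the function name estimate_noise, the list-of-frames layout, and the assumption that the "silent" spectra are strictly positive values are all hypothetical.

```python
import math

def estimate_noise(silent_frames, noise_est_factor=3.0):
    """Per-bin geometric mean over "silent" frames, scaled by noise_est_factor.

    silent_frames: list of frames, each a list of strictly positive
    spectral values (one per frequency bin). This layout is an assumption.
    """
    if noise_est_factor <= 0.0:
        raise ValueError("noise_est_factor must be > 0.0")
    if not silent_frames:
        raise ValueError("need at least one silent frame")
    n_frames = len(silent_frames)
    n_bins = len(silent_frames[0])
    noise = []
    for k in range(n_bins):
        # Geometric mean computed as exp(mean(log(x))).
        mean_log = sum(math.log(frame[k]) for frame in silent_frames) / n_frames
        noise.append(noise_est_factor * math.exp(mean_log))
    return noise

# Two made-up "silent" frames with three frequency bins each.
silent = [[1.0, 4.0, 2.0],
          [4.0, 1.0, 2.0]]
noise_unit = estimate_noise(silent, 1.0)   # plain geometric mean
noise_default = estimate_noise(silent)     # scaled by the default factor 3.0
```

In this toy input the geometric mean of every bin is 2, so the default factor of 3.0 raises the estimate to about 6 in each bin.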
With a high noise_est_factor (e.g. 10), a lot of noise will be removed, but too much speech may be removed as well, leading to speech distortion.
With a low noise_est_factor (e.g. 0.1), not enough noise may be removed, but the speech will be less distorted.

T_smooth_frame
(integer number of time frames >= 0, default 1): smoothing of H, to avoid losing weak speech close to strong speech (in time-frequency space). Implementation: a dilation followed by an erosion [2]. If T_smooth_frame == 0, no smoothing is applied.

new_M
(real number between 0.0 and 1.0, default 0.9):

[2] http://en.wikipedia.org/wiki/Mathematical_morphology
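
The smoothing described for T_smooth_frame, a dilation followed by an erosion (a morphological closing [2]), can be sketched as below. This is a Python illustration under stated assumptions, not the code of Wiener_iter_mel.m: it assumes that H is stored as a list of time frames, that the operation runs along the time axis only, and that the window covers T_smooth_frame frames on each side. The names close_along_time and _sliding are hypothetical.

```python
def _sliding(values, radius, op):
    """Apply op (max or min) over a window of +/- radius samples, clipped at the edges."""
    n = len(values)
    return [op(values[max(0, t - radius):min(n, t + radius + 1)]) for t in range(n)]

def close_along_time(H, T_smooth_frame=1):
    """Dilation followed by erosion along time, independently for each frequency bin.

    H: list of time frames, each a list of values per frequency bin
    (assumed layout). T_smooth_frame == 0 returns an unchanged copy.
    """
    if T_smooth_frame < 0:
        raise ValueError("T_smooth_frame must be >= 0")
    if not H or T_smooth_frame == 0:
        return [list(frame) for frame in H]
    n_bins = len(H[0])
    out = [[0.0] * n_bins for _ in H]
    for k in range(n_bins):
        track = [frame[k] for frame in H]
        dilated = _sliding(track, T_smooth_frame, max)   # dilation: local maximum
        closed = _sliding(dilated, T_smooth_frame, min)  # erosion: local minimum
        for t, value in enumerate(closed):
            out[t][k] = value
    return out

# One frequency bin, seven time frames: a one-frame gap and a three-frame gap.
H = [[1.0], [0.0], [1.0], [0.0], [0.0], [0.0], [1.0]]
smoothed = close_along_time(H, 1)
unchanged = close_along_time(H, 0)
```

With a radius of one frame, the one-frame gap between two strong frames is filled while the three-frame gap is left alone, which matches the stated goal of keeping weak speech that lies close to strong speech.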