Binary Mask Estimation
Binary masks have long been proposed for speech separation tasks. Typical binary mask estimation techniques rely on low-level features: simple noise estimates from the mixture and general statistical models of speech are commonly used. Because oracle masks are typically defined by local SNR, incorporating higher-level, linguistically motivated information is not straightforward. We propose an alternative masking criterion that forces the use of this higher-level information; we term the resulting mask the ASR-driven binary mask. A discriminatively trained linear sequence model provides the framework for estimating the binary mask.
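For readers unfamiliar with the conventional criterion mentioned above, the following is a minimal sketch of an oracle mask defined by local SNR. It is not the ASR-driven mask proposed here. It assumes the clean speech and noise energies are known separately for every time-frequency unit (which is what makes the mask an oracle), and the function names, the small `eps` constant, and the 0 dB default threshold are illustrative assumptions rather than anything taken from the publications below.

```python
import math


def local_snr_db(speech_energy, noise_energy, eps=1e-12):
    """Local SNR in dB for one time-frequency unit.

    eps is an assumed small constant that avoids log(0) and division by zero.
    """
    return 10.0 * math.log10((speech_energy + eps) / (noise_energy + eps))


def oracle_binary_mask(speech, noise, threshold_db=0.0):
    """Return a 0/1 mask with the same shape as the inputs.

    speech, noise: lists of rows, each row holding the energies of the
    time-frequency units (hypothetical layout: one row per frame).
    A unit is kept (1) when its local SNR exceeds threshold_db, else dropped (0).
    """
    return [
        [1 if local_snr_db(s, n) > threshold_db else 0 for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(speech, noise)
    ]


# Toy energies, invented purely for illustration.
speech = [[4.0, 0.1], [0.5, 2.0]]
noise = [[1.0, 1.0], [1.0, 0.5]]

mask = oracle_binary_mask(speech, noise)
print(mask)
```

Because the decision for each unit depends only on that unit's own speech and noise energies, a criterion of this kind has no natural place for word-level or phonetic knowledge, which is the limitation the paragraph above describes.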
Related Publications
- “Improved model selection for the ASR-driven binary mask”, William Hartmann, Eric Fosler-Lussier, in Proceedings of Interspeech, pp. 1203-1206, 2012. [publication] [poster]
- “ASR-Driven Binary Mask Estimation using Spectral Priors”, William Hartmann, Eric Fosler-Lussier, in Proceedings of IEEE ICASSP, pp. 4685-4688, 2012. [publication] [poster] [post]