
Sequence-to-Sequence Models for Low-Resource ASR

Sequence-to-sequence (seq2seq) models of the autoregressive variety have proven very powerful over the last few years. Starting with work like Listen, Attend and Spell, these models quickly surpassed the performance of hybrid models on large industry datasets. Since then, performance has steadily improved on smaller datasets like LibriSpeech and Switchboard, though those are still large by academic standards.

Despite claims to the contrary, these models still struggle on low-resource languages with limited amounts of transcribed data, especially in more difficult domains. Note that while CTC-based models are sometimes referred to as sequence-to-sequence models, I am talking specifically about autoregressive models. When a model is not conditioned on its previous predictions, as in CTC, it is relatively easy to add information through lexicons and language models, just as in traditional hybrid models.
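To make the distinction concrete, here is a minimal toy sketch in NumPy (not any of the models discussed here): with framewise, conditionally independent outputs, an external lexicon or language-model score can simply be added to each frame's scores before picking a label, while an autoregressive decoder must feed its previous prediction back in at every step. The decoder_step function and the external LM scores below are hypothetical stand-ins.

```python
import numpy as np

np.random.seed(0)

# Toy acoustic scores: T frames over a vocabulary of V labels (log-probs).
T, V = 6, 5
acoustic_logp = np.log(np.random.dirichlet(np.ones(V), size=T))

# CTC-style: outputs are conditionally independent given the audio, so an
# external lexicon/LM score can be combined by simple per-label addition.
# (Real CTC decoding would also collapse repeats and remove blanks.)
external_lm_logp = np.log(np.array([0.4, 0.3, 0.15, 0.1, 0.05]))
framewise = (acoustic_logp + 0.5 * external_lm_logp).argmax(axis=1)

# Autoregressive seq2seq: each step is conditioned on the previous output
# token, so the model carries its own implicit language model.
def decoder_step(prev_token, frame_logp):
    """Hypothetical decoder step: biases the scores using the last token."""
    bias = np.zeros(V)
    bias[prev_token] += 0.1
    return frame_logp + bias

tokens = [0]                                     # 0 plays the role of <sos>
for t in range(T):
    tokens.append(int(decoder_step(tokens[-1], acoustic_logp[t]).argmax()))

print("framewise (CTC-style) labels:", framewise)
print("autoregressive labels:       ", tokens[1:])
```

The point is structural: in the framewise case the external score enters through a simple addition, whereas combining an external LM with an autoregressive decoder typically requires techniques like shallow fusion and is complicated by the model's implicit internal language model.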

We have been working to close the gap between hybrid and seq2seq models on more difficult, low-resource tasks. Our first approach used hybrid models to provide supervision to the seq2seq models. More recent work removes the need for an initial hybrid model by leveraging unsupervised representation learning for pretraining.
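One generic way a hybrid system can supervise a seq2seq model is pseudo-labeling: the hybrid system transcribes untranscribed audio and its hypotheses become training targets. The sketch below only illustrates that pattern; hybrid_decode, Seq2SeqModel, and train_step are hypothetical stand-ins, not the systems or recipe behind the work mentioned above.

```python
import random

random.seed(0)

def hybrid_decode(audio_path):
    """Stand-in for a trained hybrid ASR decoder returning a 1-best transcript."""
    return " ".join(random.choice(["the", "cat", "sat"]) for _ in range(3))

class Seq2SeqModel:
    """Stand-in autoregressive seq2seq model; a real one would compute a
    cross-entropy loss with teacher forcing and update its parameters."""
    def __init__(self):
        self.num_updates = 0

    def train_step(self, audio_path, transcript):
        self.num_updates += 1

# Untranscribed audio for the low-resource condition.
unlabeled_audio = [f"utt{i:03d}.wav" for i in range(5)]

# 1) Let the hybrid system transcribe the audio (hypotheses could also be
#    filtered by confidence before being used as targets).
pseudo_corpus = [(path, hybrid_decode(path)) for path in unlabeled_audio]

# 2) Train the seq2seq model on the pseudo-labeled pairs.
model = Seq2SeqModel()
for path, transcript in pseudo_corpus:
    model.train_step(path, transcript)

print(f"trained on {model.num_updates} pseudo-labeled utterances")
```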

Comments? Send me an email.