Unsupervised Representation Learning
Unsupervised representation learning has recently become a popular approach to pretraining deep neural network models. In image classification, contrastive methods such as SimCLR and MoCo have demonstrated strong performance, and similar approaches have been applied to speech with models like wav2vec 2.0.
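As a rough illustration of the idea shared by these methods, the sketch below implements an NT-Xent-style contrastive loss of the kind used by SimCLR (wav2vec 2.0 optimizes a closely related InfoNCE objective over masked time steps): two augmented "views" of each input are pulled together while all other pairs in the batch are pushed apart. The shapes, temperature, and function name here are illustrative assumptions, not code from any of the cited systems.

```python
# Minimal sketch of an NT-Xent contrastive loss; all values illustrative.
import torch
import torch.nn.functional as F

def nt_xent_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """z1, z2: (batch, dim) embeddings of two augmented views of the same inputs."""
    batch = z1.shape[0]
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # (2B, D), unit-length rows
    sim = z @ z.t() / temperature                       # pairwise cosine similarities
    sim.fill_diagonal_(float("-inf"))                   # exclude each sample's self-pair
    # The positive for sample i is its other view: indices are offset by the batch size.
    targets = torch.cat([torch.arange(batch) + batch, torch.arange(batch)])
    return F.cross_entropy(sim, targets)

# Toy usage: random embeddings standing in for two augmented views of a batch.
z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
print(nt_xent_loss(z1, z2).item())
```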
We are also exploring similar ideas for speech and other time-domain signals. Our first success adapted the HuBERT approach for unsupervised domain adaptation.
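For context, HuBERT itself pretrains an encoder to predict offline cluster assignments (e.g., from k-means over acoustic features) at masked time steps. The sketch below is a minimal, hypothetical rendering of that masked cluster-prediction objective; the module name, dimensions, and masking rate are assumptions for illustration, and it is not the adaptation described in the publications below.

```python
# Minimal sketch of a HuBERT-style masked cluster-prediction objective.
import torch
import torch.nn as nn

class MaskedClusterPredictor(nn.Module):  # hypothetical name, not from any paper's code
    def __init__(self, feat_dim: int = 80, n_clusters: int = 100):
        super().__init__()
        self.mask_emb = nn.Parameter(torch.zeros(feat_dim))  # learned mask embedding
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=feat_dim, nhead=8, batch_first=True),
            num_layers=2)
        self.head = nn.Linear(feat_dim, n_clusters)

    def forward(self, feats: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # feats: (B, T, F) acoustic features; mask: (B, T) bool, True = masked frame.
        x = torch.where(mask.unsqueeze(-1), self.mask_emb, feats)
        return self.head(self.encoder(x))  # (B, T, n_clusters) logits

# Toy usage: predict pseudo-label cluster IDs only at the masked frames.
B, T = 4, 50
feats = torch.randn(B, T, 80)
mask = torch.rand(B, T) < 0.15           # mask ~15% of frames
targets = torch.randint(0, 100, (B, T))  # stand-in for offline k-means cluster IDs
model = MaskedClusterPredictor()
loss = nn.functional.cross_entropy(model(feats, mask)[mask], targets[mask])
print(loss.item())
```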
Related Publications
- “Training Autoregressive Speech Recognition Models with Limited in-domain Supervision”, Chak-Fai Li, Francis Keith, William Hartmann, and Matthew Snover, arXiv preprint arXiv:2210.15135, 2022. [arxiv] [bib] [post]
- “Combining Unsupervised and Text Augmented Semi-Supervised Learning for Low Resourced Autoregressive Speech Recognition”, Chak-Fai Li, Francis Keith, William Hartmann, and Matthew Snover, in Proceedings of IEEE ICASSP, 2022. [publication] [arxiv] [bib] [post]