
Using Heterogeneity in Semi-Supervised Transcription Hypotheses to Improve Code-Switched Speech Recognition


This work focused on building ASR systems for code-switched speech without any transcribed code-switched data. For investigation and analysis we used the SEAME corpus of code-switched Mandarin and English, but treated the data as if it were untranscribed.

We entered this project assuming the lack of transcribed code-switched data would be a critical hindrance, especially for the language model. Surprisingly, removing code-switched data from the language model had a comparatively minor effect on performance; domain mismatch proved to be a far more critical factor.

When starting from monolingual corpora, it is unlikely that both corpora are in-domain, and the two languages interact: improving performance on one language can hurt performance on the other. We observed this most strongly when one of the languages was in-domain, which could drive the accuracy of the other language down to zero.

This has obvious implications for semi-supervised training: it is easy to produce a degenerate model that, although trained on two languages, only works for one. We found that by combining multiple systems we could generate transcriptions good enough to train a single, final system for code-switched speech. A rough illustration of this idea is sketched below.
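
To make the combination idea concrete, here is a minimal Python sketch, not the actual selection procedure from the paper: for each utterance, it picks the hypothesis that agrees most with the hypotheses from the other systems (lowest average word edit distance), using cross-system agreement as a crude confidence signal for choosing pseudo-transcriptions. The system names and example hypotheses are hypothetical.

```python
from typing import Dict, List

def edit_distance(a: List[str], b: List[str]) -> int:
    """Word-level Levenshtein distance between two token sequences."""
    dp = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, len(b) + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                         # deletion
                        dp[j - 1] + 1,                     # insertion
                        prev + (a[i - 1] != b[j - 1]))     # substitution
            prev = cur
    return dp[-1]

def select_pseudo_label(hypotheses: Dict[str, str]) -> str:
    """Choose the hypothesis with the lowest average edit distance to the
    other systems' hypotheses, i.e. the one the ensemble agrees on most."""
    systems = list(hypotheses)
    best, best_score = None, float("inf")
    for s in systems:
        tokens = hypotheses[s].split()
        others = [hypotheses[o].split() for o in systems if o != s]
        score = sum(edit_distance(tokens, o) for o in others) / max(len(others), 1)
        if score < best_score:
            best, best_score = hypotheses[s], score
    return best

# Hypothetical example: three systems decode the same code-switched utterance.
utt_hyps = {
    "english_biased_system":  "i think 我们 should go tomorrow",
    "mandarin_biased_system": "i think 我们 可以 go tomorrow",
    "merged_lexicon_system":  "i think 我们 should go tomorrow",
}
print(select_pseudo_label(utt_hyps))  # -> "i think 我们 should go tomorrow"
```

The selected pseudo-transcriptions would then serve as training targets for the final code-switched system, with the heterogeneity of the contributing systems guarding against any one system's language bias dominating the labels.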

Comments? Send me an email.