IARPA MATERIAL Program
The IARPA MATERIAL program ran from 2018 to 2021. I participated as part of the Raytheon BBN Technologies (FLAIR) team. The program can be seen as a spiritual successor to the IARPA BABEL Program.
The primary goal of the program was to help analysts triage large volumes of audio and text in languages they did not understand. An analyst would interact with the system by searching for keywords and phrases in English. The system would then return potentially relevant documents, translate them, and present a summary to the analyst. While such a system might have been feasible in Spanish or German at the start of the program, MATERIAL focused on languages with limited training data, and the data that was available was typically out-of-domain.
A successful system required advances in automatic speech recognition, machine translation, information retrieval, and summarization.
Related Publications
- “Combining Unsupervised and Text Augmented Semi-Supervised Learning for Low Resourced Autoregressive Speech Recognition”, Chak-Fai Li, Francis Keith, William Hartmann, and Matthew Snover, in Proceedings of IEEE ICASSP, 2022. [publication] [arxiv] [bib] [post]
- “Overcoming Domain Mismatch in Low Resource Sequence-to-Sequence ASR Models using Hybrid Generated Pseudotranscripts”, Chak-Fai Li, Francis Keith, William Hartmann, Matthew Snover, and Owen Kimball, arXiv preprint arXiv:2106.07716, 2021. [arxiv] [bib] [post]
- “Reformulating Information Retrieval from Speech and Text as a Detection Problem”, Damianos Karakos, Rabih Zbib, William Hartmann, Richard Schwartz, John Makhoul, in Proceedings of the Workshop on Cross-Language Search and Summarization of Text and Speech, pp. 38-43, 2020. [publication]
- “The 2019 BBN Cross-Lingual Information Retrieval System”, Le Zhang, Damianos Karakos, William Hartmann, Manaj Srivastava, Lee Tarlin, David Akodes, Sanjay Krishna Gouda, Numra Bathool, Lingjun Zhao, Zhuolin Jiang, Richard Schwartz, John Makhoul, in Proceedings of the Workshop on Cross-Language Search and Summarization of Text and Speech, pp. 44-51, 2020. [publication]
- “Cross-Lingual Information Retrieval with BERT”, Zhuolin Jiang, Amro El-Jaroudi, William Hartmann, Damianos Karakos, Lingjun Zhao, arXiv preprint arXiv:2004.13005, 2020. [arxiv]
- “Neural-Network Lexical Translation for Cross-Lingual IR from Text and Speech”, Rabih Zbib, Lingjun Zhao, Damianos Karakos, William Hartmann, Jay DeYoung, Zhongqiang Huang, Zhuolin Jiang, Noah Rivkin, Le Zhang, Richard Schwartz, John Makhoul, in Proceedings of ACM SIGIR, pp. 645-654, 2019. [publication]