
A Script for Combining the Outputs of Keyword Spotting Systems

The Keyword Spotting (KWS) task seems to be quite popular lately. For some work I am doing, I needed a way to quickly merge the outputs of multiple systems (and sometimes multiple outputs of the same system). I have written a simple Python script to take care of this.

KWS is the task of detecting specific words in audio. The typical output of a KWS system is an XML file (at least in the case of NIST-style evaluations). The basic format is

<kwslist system_id="ID1" language="Unknown" kwlist_filename="hitlist.xml">
    <detected_kwlist kwid="KW001" search_time="1" oov_count="0">
        <kw tbeg="195.23" dur="0.27" file="file1" score="0.482474" channel="1" decision="NO" />
        <kw tbeg="314.55" dur="0.83" file="file2" score="0.470213" channel="1" decision="NO" />
    </detected_kwlist>
</kwslist>
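
To make the format concrete, here is a minimal sketch of reading such a file with Python's standard xml.etree.ElementTree module. The function name read_hits and the dictionary layout are assumptions made for illustration; they are not taken from merge_hitlists.py.

```python
import xml.etree.ElementTree as ET

# Sample input in the format shown above.
SAMPLE = """<kwslist system_id="ID1" language="Unknown" kwlist_filename="hitlist.xml">
    <detected_kwlist kwid="KW001" search_time="1" oov_count="0">
        <kw tbeg="195.23" dur="0.27" file="file1" score="0.482474" channel="1" decision="NO" />
        <kw tbeg="314.55" dur="0.83" file="file2" score="0.470213" channel="1" decision="NO" />
    </detected_kwlist>
</kwslist>"""


def read_hits(xml_text):
    """Return {kwid: [hit, ...]}; each hit is a dict (hypothetical layout)."""
    root = ET.fromstring(xml_text)
    hits = {}
    for kwlist in root.iter("detected_kwlist"):
        entries = hits.setdefault(kwlist.get("kwid"), [])
        for kw in kwlist.iter("kw"):
            entries.append({
                "file": kw.get("file"),
                "channel": kw.get("channel"),
                "tbeg": float(kw.get("tbeg")),    # start time
                "dur": float(kw.get("dur")),      # duration
                "score": float(kw.get("score")),
                "decision": kw.get("decision"),
            })
    return hits


hits = read_hits(SAMPLE)
for kwid, entries in hits.items():
    for hit in entries:
        print(kwid, hit["file"], hit["tbeg"], hit["score"])
```

To read from disk instead, ET.parse(path).getroot() can replace the ET.fromstring call.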

My script takes a set of these XML files and combines them in one of several ways; run it with the -h option to get the full help. I refer to each of these files as a hitlist and to each individual entry as a hit.

If we can assume that the XML files contain no overlapping entries, then we can use the fast option (-m 'fast'). With this option, merging is very fast because the script does not have to search the previous entries when adding a new one. I often perform KWS with a separate process for each audio file and then merge the resulting hitlists using this script.
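
The idea behind the fast option can be sketched as plain concatenation: since no hits overlap, each hitlist's entries are appended to the running result without any comparison against earlier entries. This is only an illustration under that assumption, not the script's actual code; fast_merge and the tuple layout (file, tbeg, dur, score) are hypothetical.

```python
from collections import defaultdict


def fast_merge(hitlists):
    """Concatenate hits per keyword ID, with no overlap search.

    Each hitlist is a dict {kwid: [hit, ...]} (hypothetical layout).
    """
    merged = defaultdict(list)
    for hitlist in hitlists:
        for kwid, hits in hitlist.items():
            merged[kwid].extend(hits)  # append only; earlier hits are never inspected
    return dict(merged)


# One hitlist per audio file, as when each file is searched in its own process.
per_file = [
    {"KW001": [("file1", 195.23, 0.27, 0.482474)]},
    {"KW001": [("file2", 314.55, 0.83, 0.470213)],
     "KW002": [("file2", 10.0, 0.5, 0.9)]},
]
merged = fast_merge(per_file)
print(sorted(merged))
print(len(merged["KW001"]))
```

The cost is linear in the total number of hits, which is why skipping the overlap search pays off when the inputs come from disjoint audio files.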

In the case of overlapping hits, a decision must be made on how to combine the probabilities. The script gives five options: min, max, mean, gmean, and rank. Min, max, and mean take the minimum, maximum, and arithmetic mean of the overlapping probabilities. Gmean is similar to mean, but uses the geometric mean instead of the arithmetic mean.
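
The four numeric options can be sketched as follows for the scores of one group of overlapping hits. The function name combine is an assumption made for illustration, and how the script decides that two hits overlap is not covered here. The sketch uses math.prod, which requires Python 3.8 or later.

```python
import math


def combine(scores, method):
    """Combine the scores of overlapping hits (hypothetical helper)."""
    if method == "min":
        return min(scores)
    if method == "max":
        return max(scores)
    if method == "mean":
        # Arithmetic mean: sum divided by count.
        return sum(scores) / len(scores)
    if method == "gmean":
        # Geometric mean: n-th root of the product of n scores.
        return math.prod(scores) ** (1.0 / len(scores))
    raise ValueError("unknown method: " + method)


overlapping = [0.2, 0.8]
for method in ("min", "max", "mean", "gmean"):
    print(method, combine(overlapping, method))
```

For the scores 0.2 and 0.8, the arithmetic mean is 0.5 while the geometric mean is about 0.4, so gmean is pulled more strongly toward the lower score.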

The final option is rank. Rank simply considers the XML files in order. If there are overlapping entries, it takes the probability of the entry seen first.
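
As a sketch of the rank option, suppose the scores for one detection are listed in the same order as the XML files, with None marking a file that has no overlapping entry. Both that representation and the name rank_combine are assumptions for illustration.

```python
def rank_combine(scores_in_file_order):
    """Return the score from the first file that contains the detection."""
    for score in scores_in_file_order:
        if score is not None:
            return score
    raise ValueError("no file contains this detection")


# The first file has no entry, so the second file's score is used.
print(rank_combine([None, 0.3, 0.9]))
# The first file has an entry, so later scores are ignored.
print(rank_combine([0.7, 0.3]))
```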

There is one other option I wanted to mention, --unscored. When this option is given, any XML file that has no overlapping entry for a detection seen in another file is treated as having assigned that detection a probability of 0.
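
The effect of --unscored is easiest to see with the mean. The sketch below assumes that, without the option, only the files that contain the detection contribute to the average; that default, the None convention for a missing entry, and the name mean_score are assumptions rather than details taken from the script.

```python
def mean_score(scores, unscored=False):
    """Average one detection's scores across files (hypothetical helper).

    scores has one entry per XML file; None means the file has no
    overlapping entry for this detection.
    """
    present = [s for s in scores if s is not None]
    if unscored:
        # Missing entries count as probability 0, so divide by the
        # total number of files.
        return sum(present) / len(scores)
    # Assumed default: average only over files that have the detection.
    return sum(present) / len(present)


scores = [0.6, None, 0.3]
print(mean_score(scores))                 # about 0.45
print(mean_score(scores, unscored=True))  # about 0.3
```

Under these assumptions, the option penalizes detections that only some of the systems found.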

To be honest, I only use --merge fast when merging files from the same system, and --merge mean when merging files from separate systems. I have also never had a use for the --unscored option. In all of my experiments, --merge mean gives the best results out of all the merging options. However, others may find a use for the other options.

The script is called merge_hitlists.py.

Comments? Send me an email.
© 2023 William Hartmann