
Citing Older Empirical Literature

When working on a problem, I like to read some of the older literature commonly cited in more recent articles. Often I am underwhelmed by the evidence presented in these older papers. This is not an attack on the quality of the papers—when they were published, the authors likely did not have access to the kinds of data and computational resources available today—but their conclusions may hold less well as time passes.

As an example, a classic paper, which will remain nameless, was published about 30 years ago demonstrating that feature type A was superior to feature type B for speech recognition. This paper has over 2000 citations and continues to be cited today as proof of the superiority of feature type A. Yet the experiments used a dataset consisting of isolated words spoken by just two speakers. If a paper were submitted today with an identical experimental setup, it would never make it past the review process.

Given that the paper could not be published today, should we believe its conclusions? A refresh of the paper would be nice. My guess is that several people have confirmed the results on more modern tasks, but those results are unpublished. Confirmations of assumed beliefs are not the kind of paper anyone is excited to read or write.

It is surprising how quickly techniques can be dismissed because they perform poorly in one very particular setting. While I honestly do not know much about publishing in other scientific fields, I get the impression that confirmation studies are more accepted there. What is the point of publishing reproducible results if the studies that reproduce them cannot themselves be published?

There are legitimate reasons confirmation studies are rarely seen. Many techniques are complicated and would take a great deal of work to reproduce; some may even be impossible to reproduce from the description in a conference paper. If the results are modest or inconclusive, there may be little benefit in reproducing them. Simpler techniques, or papers that also publish implementations, tend to be used more often. When choosing comparison techniques, I know I am much more likely to choose one if code is already available.

In many cases, the number of studies reproducing or extending a result may be proportional to the strength of the original paper's claims. If an author states his technique works just as well as a dozen others, I will take his word for it. If he states his technique makes all other approaches obsolete, many people will want to confirm that claim. Deep neural networks are a good example: great results were originally reported on one small dataset, and in the past few years similar results have been shown across a range of datasets and fields.

While in most cases it may be reasonable to assume the older results of popular classic papers have been confirmed by other researchers—even when no published confirmations exist—this will not always be true. I know many researchers will occasionally see a claim and test it because it is relatively simple to do in a framework they already have. I do the same thing, usually think, “that's interesting,” and move on. I would like to see researchers make those kinds of results available in some format—on their website, on a blog, or as a footnote in a larger study. In the future, I will try to take my own advice.

Comments? Send me an email.