Progress in speech recognition over the last decade has been monitored
by a series of annual evaluations run by the US National Institute of
Standards and Technology and the US Defense Advanced Research Projects
Agency. In the early 1990s, the best recognisers in the world produced
around 15% errors on a 20,000-word speaker-independent dictation task.
In 1994, the HTK system developed at CUED recorded just 7% errors on a
much harder unrestricted-vocabulary task, beating 14 competing systems,
including ones from IBM, AT&T, Dragon, BBN Systems and various other universities.
Although this was still not perfect, speaker-specific enrolment was known
to improve accuracy further. The HTK system therefore effectively demonstrated
that desktop dictation was possible, and companies like Dragon and IBM
subsequently converted the ideas demonstrated in these research systems
into commercial reality. In the meantime, the CUED team continued to improve
their system, tackling harder tasks such as dictation in noise and, most
recently, transcription of broadcast news material. The latter is particularly
difficult because the recogniser must cope with a sequence of unknown
speakers, speaking over different channels with varying degrees of background
noise including music and sound effects. Despite their limited resources,
the CUED team have continued to stay ahead of the competition. For example,
in the 1997 broadcast news transcription evaluation, the HTK system had
a word error rate of 16%, which was the lowest recorded and better by a
statistically significant margin than that of its nearest rival, IBM.