A Weekend Encounter with the Codecs
It might strike some as peculiar that
I would choose to spend one of the most beautiful weekends of the summer in a darkened
room in the basement of a government building near Ottawa. But several years ago, I was
invited to participate in a research project being conducted by the Radio Broadcast
Systems Group of the Canadian federal government's Communications Research Centre (CRC),
comparing the audible effects of a number of digital audio encoding systems, and I was
happy to give up a few days of sun to do so.
Canada has long been at the forefront of scientifically
based audio listening tests. For years the National Research Council, also in Ottawa, led
the world in developing blind testing of speakers and other audio components, and CRC has
been instrumental in the evaluation of data-reduction schemes for digital audio.
Over a decade ago, they conducted an in-depth comparison of
a number of such systems in order to select the one that would be used in a future digital
audio broadcasting standard. Most of the listeners in that series of tests were students,
but a number of audio journalists, of whom I was one, took part as well.
Data reduction -- or perceptual encoding, as it's also
called -- was very new then, at least outside the labs where it was created. However, we
were struck by the fact that, although some of the systems we heard were awful, at least
one seemed to be virtually transparent. That, as it turned out, was Musicam: the system
that was eventually chosen for the Eureka 147 standard that will be used for digital radio
in many of the world's countries, including Canada.
The United States, after some initial enthusiasm for Eureka
147, decided to investigate other systems, of which seven were proposed. The electronics
of these were extensively tested in the States, but CRC was selected by the US Electronics
Industries Association to perform the subjective listening tests, in preference to any
American facility. The report, issued in 1995, showed the Musicam system way ahead of
anything else.
The later CRC tests, which were completed in 1997, were
pure research, and the results are available to anyone who is interested in them. CRC
rounded up some 17 data-reduction systems -- called "codecs" (for coder/decoder)
-- and established a method for conducting very detailed comparisons of their effects.
The whole notion of data reduction is based on the
knowledge that, in a complex audio waveform, a large amount of the information is masked
by other parts of the sound and thus not heard. A system that can analyze a signal,
identify the parts we won't actually hear, and remove them can theoretically reduce the
overall amount of information needed for a given sound, and thus the bandwidth needed to
reproduce it.
Discussions of this subject tend to concentrate on
"bit rates," the number of binary digits used in a given time period. Bit rates
are stated in kilobits per second, or kbps, and a full, non-compressed audio signal needs
about 1500kbps. The Musicam system we liked so well back in 1992 uses approximately
250kbps, for a data reduction of about 6:1; more than 80 percent of the information is
removed with virtually no audible degradation.
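The arithmetic behind these figures can be checked in a few lines. The sketch below uses the round numbers quoted above (1500kbps uncompressed, 250kbps for Musicam). The function names are made up for illustration, and the CD parameters (44.1kHz sampling, 16 bits, two channels) are an assumption added here to show where a figure near 1500kbps comes from.

```python
def reduction_ratio(full_kbps: float, coded_kbps: float) -> float:
    """How many times smaller the coded stream is than the full one."""
    return full_kbps / coded_kbps


def fraction_removed(full_kbps: float, coded_kbps: float) -> float:
    """Share of the original data that the codec discards (0.0 to 1.0)."""
    return 1.0 - coded_kbps / full_kbps


FULL_KBPS = 1500     # round figure quoted in the text
MUSICAM_KBPS = 250   # approximate Musicam rate quoted in the text

# Assumption: CD-format stereo, 44,100 samples/s x 16 bits x 2 channels.
cd_kbps = 44_100 * 16 * 2 / 1000

if __name__ == "__main__":
    print(f"ratio   : {reduction_ratio(FULL_KBPS, MUSICAM_KBPS):.1f}:1")
    print(f"removed : {fraction_removed(FULL_KBPS, MUSICAM_KBPS):.0%}")
    print(f"CD rate : {cd_kbps:.1f} kbps")
```

The same two functions apply to any of the lower bit rates mentioned later; only the coded rate changes.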
A major trend since those earlier tests was toward ever-lower
bit rates, according to Dr. Ted Grusec, whose project I was taking part in. In the 1997
tests, the top system used 192kbps, the lowest about 64kbps, for a reduction of about 24:1. The
codecs used in the tests were not all totally unrelated to one another; systems from five
companies were selected, but many of those were available in several different bit rates,
for a total of 17.
Comparing them was by no means a simple process, especially
when it came to the ones that approach transparency. It was hardly a matter of sitting
back, throwing on a CD and declaring that this one sounds good and that one sounds bad.
Instead, the designers of the test selected eight short
audio samples, each ranging from ten to about 30 seconds in length. These included such
things as an arpeggio played on a bass clarinet, a snippet from Mussorgsky's Pictures
at an Exhibition on solo trumpet, the opening passage from a Dire Straits cut, a sound
effects clip from one of the Indiana Jones movies, and so forth.
Each sample was recorded on a computer's hard drive in
unencoded form and also processed through each of the 17 codecs. Thus, to hear all the
comparisons, each listener -- I was the 21st to take part, and several were scheduled
after me -- had to make 136 trials over three days.
Each day started with a training session conducted by
research assistant Darcy Boucher, designed to familiarize listeners with the recorded
samples, and to sensitize them to the sorts of artifacts, subtle and otherwise, the codecs
might produce. The afternoons were devoted to the tests themselves, in which the
sample/codec combinations were randomly ordered and arranged in groups of 15 trials. Three
such groups took place every day.
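The schedule described above (every sample paired with every codec, shuffled, then dealt out in groups of 15) can be sketched as follows. This is only an illustration: the sample and codec labels are placeholders, the fixed seed is there for repeatability, and nothing here is taken from CRC's actual software.

```python
import itertools
import random

N_SAMPLES = 8     # audio samples described in the text
N_CODECS = 17     # codec/bit-rate combinations described in the text
GROUP_SIZE = 15   # trials per listening group


def build_schedule(seed: int = 0) -> list[list[tuple[str, str]]]:
    """Pair every sample with every codec, shuffle, and split into groups."""
    samples = [f"sample-{i + 1}" for i in range(N_SAMPLES)]   # placeholder names
    codecs = [f"codec-{i + 1:02d}" for i in range(N_CODECS)]  # placeholder names
    trials = list(itertools.product(samples, codecs))
    random.Random(seed).shuffle(trials)  # seeded so the order is repeatable
    return [trials[i:i + GROUP_SIZE] for i in range(0, len(trials), GROUP_SIZE)]


if __name__ == "__main__":
    groups = build_schedule()
    print(sum(len(g) for g in groups), "trials in", len(groups), "groups")
```

With 136 combinations and groups of 15, nine full groups leave one trial over. The sketch simply puts it in a short tenth group; how the real schedule accommodated it is not stated.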
In the specially designed listening room, each participant
sat in a comfortable chair at the appropriate distance from a pair of high-quality monitor
speakers, with a computer monitor on a table in front and a mouse to one side. Headphones
were also available.
On screen was the Play button, used to start each sample,
which would then repeat continuously until the Pause button was activated. A looping
feature let listeners focus on short bits within the sample -- single notes, even -- to
isolate some perceived anomaly. The screen was dominated by three buttons marked A, B, and
C, which could be selected by the mouse.
In each trial, A was the unencoded signal, and either B or
C was a duplicate of that reference; the other was processed through the codec. The coded
and unencoded signals were perfectly aligned so participants could switch seamlessly
between them.
The task was first to identify which of B or C was the
coded signal. Sometimes that was immediately obvious, sometimes it took very careful
listening (and lots of time), and sometimes -- for me anyway -- there was no audible
difference at all. Then when the coded channel was identified, its sound had to be rated
on an annoyance scale from 1 to 5 in decimal increments.
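The logic of a single trial, with A as the known reference and the coded version hidden behind either B or C, can be sketched like this. The class and function names are hypothetical, and the code makes no assumption about which end of the 1-to-5 scale is the more annoying one.

```python
import random
from dataclasses import dataclass


@dataclass
class Trial:
    sample: str
    codec: str
    coded_button: str  # "B" or "C"; the other button duplicates reference A


def make_trial(sample: str, codec: str, rng: random.Random) -> Trial:
    """Hide the coded signal behind B or C at random."""
    return Trial(sample, codec, rng.choice(["B", "C"]))


def record_response(trial: Trial, picked: str, rating: float) -> dict:
    """Store which button the listener picked and the 1.0-5.0 rating given."""
    if picked not in ("B", "C"):
        raise ValueError("picked must be 'B' or 'C'")
    if not 1.0 <= rating <= 5.0:
        raise ValueError("rating must lie between 1.0 and 5.0")
    return {
        "sample": trial.sample,
        "codec": trial.codec,
        "correct": picked == trial.coded_button,
        "rating": round(rating, 1),  # decimal increments
    }


if __name__ == "__main__":
    trial = make_trial("bass clarinet", "codec-01", random.Random(0))
    print(record_response(trial, "B", 4.2))
```

Recording whether the pick was correct alongside the rating lets a later analysis separate real detections from guesses on the near-transparent codecs.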
It's an exhausting process, and very far from a pleasurable
listening experience. But it was fascinating to focus so minutely on what has become a
major factor in audio, and to find out what sorts of distortions digital processing can
inject; they're very different from what we'd been used to in analog times.
All we knew as subjects was that there were a number of
different systems in the test (but not what they were), and that they all had bit rates
much lower than anything currently being used commercially -- they involved much more
severe reduction than things like Dolby Digital or DTS surround sound. We also knew that
the snippets of audio used in the tests had been chosen specifically for their ability to
make the systems misbehave.
When I first wrote about this experience, the results of
the tests had not yet been published, so all I could say for sure was that there were at
least some systems that appeared to be completely transparent. Some months later, the
paper detailing the results was published (unfortunately coinciding with the death of its
main author, Dr. Ted Grusec), and it turned out that several of the systems with the
lowest bit rates had performed best.
With that perspective, the amount of data retained in the
reduction systems now actually in use seems positively wasteful.
...Ian G. Masters
ian@mastersonaudio.com