MASTERS ON AUDIO AND VIDEO

July 1, 2004

 

A Weekend Encounter with the Codecs

It might strike some as peculiar that I would choose to spend one of the most beautiful weekends of the summer in a darkened room in the basement of a government building near Ottawa. But several years ago, I was invited to participate in a research project being conducted by the Radio Broadcast Systems Group of the Canadian federal government's Communications Research Centre (CRC), comparing the audible effects of a number of digital audio encoding systems, and I was happy to give up a few days of sun to do so.

Canada has long been at the forefront of scientifically based audio listening tests. For years the National Research Council, also in Ottawa, led the world in developing blind testing of speakers and other audio components, and CRC has been instrumental in the evaluation of data-reduction schemes for digital audio.

Over a decade ago, CRC conducted an in-depth comparison of a number of such systems in order to select the one that would be used in a future digital audio broadcasting standard. Most of the listeners in that series of tests were students, but a number of audio journalists took part as well, and I was one of them.

Data reduction -- or perceptual encoding, as it's also called -- was very new then, at least outside the labs where it was created. However, we were struck by the fact that, although some of the systems we heard were awful, at least one seemed to be virtually transparent. That, as it turned out, was Musicam: the system that was eventually chosen for the Eureka 147 standard that will be used for digital radio in many of the world's countries, including Canada.

The United States, after some initial enthusiasm for Eureka 147, decided to investigate other systems, of which seven were proposed. The electronics of these were extensively tested in the States, but CRC was selected by the US Electronic Industries Association to perform the subjective listening tests, in preference to any American facility. The report, issued in 1995, showed the Musicam system way ahead of anything else.

The later CRC tests, which were completed in 1997, were pure research, and the results are available to anyone who is interested in them. CRC rounded up some 17 data-reduction systems -- called "codecs" (for coder/decoder) -- and established a method for conducting very detailed comparisons of their effects.

The whole notion of data reduction is based on the knowledge that, in a complex audio waveform, a large amount of the information is masked by other parts of the sound and thus not heard. A system that can analyze a signal, identify the parts we won't actually hear and remove them can theoretically reduce the overall amount of information needed for a given sound, and thus the bandwidth needed to reproduce it.

Discussions of this subject tend to concentrate on "bit rates," the number of binary digits used in a given time period. Bit rates are stated in kilobits per second, or kbps, and a full, uncompressed audio signal needs about 1500kbps. The Musicam system we liked so well back in 1992 uses approximately 250kbps, for a data reduction of about 6:1; more than 80 percent of the information is removed with virtually no audible degradation.

A major trend since those earlier tests was toward ever-lower bit rates, according to Dr. Ted Grusec, whose project I was taking part in. In the 1997 tests, the highest-rate system used 192kbps and the lowest about 64kbps, the latter a reduction of roughly 24:1. The codecs used in the tests were not all unrelated to one another; systems from five companies were selected, but many of those were available at several different bit rates, for a total of 17.
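The ratios quoted above come from simple division, and a rough check is easy to sketch. The snippet below recomputes them from the round figures in the text (about 1500kbps uncompressed; 250, 192 and 64kbps coded). The function names are made up for illustration, and the compact-disc parameters in `pcm_kbps` (44.1kHz, 16 bits, two channels) are an added assumption about what a "full" signal means, not something stated in this column.

```python
def reduction(uncompressed_kbps: float, coded_kbps: float) -> tuple[float, float]:
    """Return (compression ratio, percent of data removed)."""
    ratio = uncompressed_kbps / coded_kbps
    percent_removed = 100.0 * (1.0 - coded_kbps / uncompressed_kbps)
    return ratio, percent_removed


def pcm_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int) -> float:
    """Bit rate of plain (uncompressed) PCM audio in kilobits per second."""
    return sample_rate_hz * bits_per_sample * channels / 1000


if __name__ == "__main__":
    FULL_KBPS = 1500  # the column's round figure for uncompressed audio
    for coded in (250, 192, 64):  # bit rates mentioned in the text
        ratio, removed = reduction(FULL_KBPS, coded)
        print(f"{coded:>3} kbps: {ratio:.1f}:1, {removed:.1f}% removed")

    # Assumption: CD-style stereo PCM, shown only for comparison with "about 1500".
    print(f"CD-style PCM: {pcm_kbps(44_100, 16, 2):.1f} kbps")
```

With the round 1500kbps figure, 250kbps works out to exactly 6:1, and 64kbps to a little over 23:1, which the text rounds up to 24:1.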

Comparing them was by no means a simple process, especially when it came to the ones that approach transparency. It was hardly a matter of sitting back, throwing on a CD and declaring that this one sounds good and that one sounds bad.

Instead, the designers of the test selected eight short audio samples, each ranging from about 10 to 30 seconds in length. These included such things as an arpeggio played on a bass clarinet, a snippet from Mussorgsky's Pictures at an Exhibition on solo trumpet, the opening passage from a Dire Straits cut, a sound effects clip from one of the Indiana Jones movies, and so forth.

Each sample was recorded on a computer's hard drive in unencoded form and also processed through each of the 17 codecs. Thus, to hear all the comparisons, each listener -- I was the 21st to take part, and several were scheduled after me -- had to complete 136 trials (eight samples times 17 codecs) over three days.

Each day started with a training session conducted by research assistant Darcy Boucher, designed to familiarize listeners with the recorded samples, and to sensitize them to the sorts of artifacts, subtle and otherwise, the codecs might produce. The afternoons were devoted to the tests themselves, in which the sample/codec combinations were randomly ordered and arranged in groups of about 15 trials. Three such groups took place every day.
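For readers who like to see the bookkeeping, here is a minimal sketch of how such a schedule could be laid out: every sample paired with every codec, shuffled, then dealt into three groups a day for three days. It is not CRC's actual software. The function name, the seed, and the rule for where the leftover trial goes are all assumptions made for illustration; 136 trials do not divide evenly into nine groups, so this version simply gives the first group one extra.

```python
import random
from itertools import product


def build_schedule(n_samples: int = 8, n_codecs: int = 17,
                   days: int = 3, groups_per_day: int = 3,
                   seed: int = 0) -> list[list[tuple[int, int]]]:
    """Shuffle every (sample, codec) pairing and split it into listening groups."""
    trials = list(product(range(n_samples), range(n_codecs)))  # 8 x 17 = 136
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    rng.shuffle(trials)

    n_groups = days * groups_per_day
    base, extra = divmod(len(trials), n_groups)  # 136 over 9 groups: 15, remainder 1

    schedule = []
    start = 0
    for g in range(n_groups):
        size = base + (1 if g < extra else 0)  # early groups absorb the remainder
        schedule.append(trials[start:start + size])
        start += size
    return schedule


if __name__ == "__main__":
    schedule = build_schedule()
    print("groups:", len(schedule))
    print("trials per group:", [len(group) for group in schedule])
    print("total trials:", sum(len(group) for group in schedule))
```

Each entry in a group is a (sample index, codec index) pair, so no combination is heard twice and none is skipped.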

In the specially designed listening room, each participant sat in a comfortable chair the appropriate distance from a pair of high-quality monitor speakers, with a computer monitor on a table in front and a mouse to one side. Headphones were also available.

On screen was the Play button, used to start each sample, which would then repeat continuously until the Pause button was activated. A looping feature let listeners focus on short bits within the sample -- single notes, even -- to isolate some perceived anomaly. The screen was dominated by three buttons marked A, B, and C, which could be selected by the mouse.

In each trial, A was the unencoded signal, and either B or C was a duplicate of that reference; the other was processed through the codec. The coded and unencoded signals were perfectly aligned so participants could switch seamlessly between them.

The task was first to identify which of B or C was the coded signal. Sometimes that was immediately obvious, sometimes it took very careful listening (and lots of time), and sometimes -- for me anyway -- there was no audible difference at all. Once the coded channel was identified, its sound had to be rated on an annoyance scale from 1 to 5 in decimal increments.
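The structure of a single trial can be put in a few lines of code. This is only a sketch of the procedure as described here, not the program we actually used: the class and function names are hypothetical, and "decimal increments" is read as steps of 0.1, which is an assumption. The direction of the scale is deliberately left out, since the description above does not say which end means "most annoying."

```python
import random
from dataclasses import dataclass


@dataclass
class Trial:
    sample: str
    codec: str
    coded_button: str  # "B" or "C"; the other button hides a copy of reference A


def make_trial(sample: str, codec: str, rng: random.Random) -> Trial:
    """Randomly hide the coded version behind button B or button C."""
    return Trial(sample, codec, rng.choice(["B", "C"]))


def score(trial: Trial, chosen_button: str, grade: float) -> dict:
    """Record whether the listener found the coded signal, plus the grade given."""
    if chosen_button not in ("B", "C"):
        raise ValueError("choice must be 'B' or 'C'")
    tenths = round(grade * 10)
    # Assumed reading of "decimal increments": 1.0 to 5.0 in steps of 0.1.
    if not 10 <= tenths <= 50 or abs(grade * 10 - tenths) > 1e-9:
        raise ValueError("grade must be between 1.0 and 5.0 in steps of 0.1")
    return {"correct": chosen_button == trial.coded_button, "grade": tenths / 10}


if __name__ == "__main__":
    rng = random.Random(1)
    trial = make_trial("bass clarinet arpeggio", "codec 7", rng)  # placeholder names
    print(score(trial, "B", 3.4))
```

A listener who genuinely hears no difference can only guess between B and C, so over many such trials the "correct" flags should come out near half and half.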

It's an exhausting process, and very far from a pleasurable listening experience. But it was fascinating to focus so minutely on what has become a major factor in audio, and to find out what sorts of distortions digital processing can inject; they're very different from what we'd been used to in analog times.

All we knew as subjects was that there were a number of different systems in the test (but not what they were), and that they all had bit rates much lower than anything currently being used commercially -- they involved much more severe reduction than things like Dolby Digital or DTS surround sound. We also knew that the snippets of audio used in the tests had been chosen specifically for their ability to make the systems misbehave.

When I first wrote about this experience, the results of the tests had not yet been published, so all I could say for sure was that at least some of the systems appeared to be completely transparent. Some months later, the paper detailing the results was published (unfortunately coinciding with the death of its main author, Dr. Ted Grusec), and it turned out that several of the systems with the lowest bit rates had performed best.

With that perspective, the amount of data retained in the reduction systems now actually in use seems positively wasteful.

...Ian G. Masters
ian@mastersonaudio.com


All Contents Copyright © 2004 Schneider Publishing Inc., All Rights Reserved.
Any reproduction of content on this site without permission is strictly forbidden.