In this paper I will examine the psychoacoustic theories which guide the design of lossy sound compression and the technological underpinnings of lossy bitrate reduction, with specific attention paid to the MP3 standard. I will also present the methods and results of two experiments that attempt to judge just how effective MP3 technology is at compressing audio without perceptible quality loss. First, however, I offer an introduction to digital audio and an explanation of why audio compression is worthwhile.
Introduction to bitrate reduction of sound data
In its most directly encoded form (voltage levels from a microphone digitized and stored without any further encoding), digital audio is a high bandwidth medium. Many thousands of times a second a voltage value (or two for stereo) is converted (sampled) to a digital representation of its distance (negative or positive) from zero voltage. This is known as PCM (Pulse Code Modulation) digital audio. Red Book audio CDs (the standard format for CDs) sample voltage values 44.1 thousand times a second, each time encoding a pair of voltage values (one for the left channel and one for the right channel) into two 16-bit values (Erickson, 1994). Therefore, CD audio consumes 44,100 * 16 * 2 = 1,411,200 bits (172 kilobytes) per second. One minute of CD quality PCM requires over 10 megabytes of storage.
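The arithmetic above can be reproduced with a few lines of Python. This is only a sketch of the calculation; the function name is hypothetical, and "kilobytes" and "megabytes" are assumed to mean 1024 and 1024² bytes, which is what the figures in the text imply.

```python
# CD-quality PCM parameters as given in the text (Red Book audio).
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2

def pcm_bits_per_second(sample_rate_hz, bits_per_sample, channels):
    """Raw PCM bitrate: every sample of every channel is stored in full."""
    return sample_rate_hz * bits_per_sample * channels

bits_per_second = pcm_bits_per_second(SAMPLE_RATE_HZ, BITS_PER_SAMPLE, CHANNELS)
bytes_per_second = bits_per_second // 8
kib_per_second = bytes_per_second / 1024           # kilobytes of 1024 bytes (assumption)
mib_per_minute = bytes_per_second * 60 / 1024**2   # megabytes of 1024**2 bytes (assumption)

print(bits_per_second)            # 1411200
print(round(kib_per_second))      # 172
print(round(mib_per_minute, 1))   # 10.1
```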
For
some purposes PCM is a good choice.
Because of its simplicity, very low cost hardware can both encode and
decode PCM in real time. PCM is the
most direct representation of the digitized source of any encoding scheme and
it discards the least amount of auditory data from the signal possible. This does not mean that all PCM encoded
audio is high fidelity. A lot of noise can be introduced into the signal (as
well as higher frequencies lost) when it’s digitized at low sampling rates with
few bits used to encode each voltage value. However, with bit rates at, or
above CD audio, the quality rivals and exceeds analog technologies. It is ideal for professional uses because
encoding audio at high bitrates introduces very
little noise. In digital audio editing,
every time a modification is made to the signal, the same amount of redigitizing
noise is added to the result. A more
compact representation might sound as good as PCM on the first encoding, but with each modification the encoding noise
is compounded and may significantly reduce the overall quality
of the signal after several digitization cycles. When the amount of storage
space and/or bandwidth available is plentiful, PCM is clearly the best format
for storing audio for further editing.
When only needed for playback, however, PCM becomes less appealing
because of its storage requirements.
One
alternative is to utilize the mathematical properties of the PCM bitstream,
applying a transform that produces an exact representation of the same bits,
but in a more compact form. Compression
software (PK-ZIP, StuffIt) designed for compressing text files and other
computer data files can also be used on
PCM audio. When used on text these systems reduce file size by 60%-80%. For CD quality PCM, however, the resulting
files are only reduced by 10%
(Gilchrist, 1999a & 1999b).
When
tuned to the characteristics of PCM audio, lossless compression improves by a
significant margin. In one cross comparison of several audio sources and
compression technologies (Whittle, 1998), some sources were reduced by as much
as 66%. Other sources did not fare as
well, only reducing by 25%, with the average across encoders and sources
tending towards 50% reduction in file size.
While 86 kilobytes per second is an improvement over 172, it is still
too large for many applications, most notably transferring audio over the
Internet. To enable further
compression, current solutions rely on encoding the PCM bitstream in a
representation which cannot exactly reproduce the original bitstream. Such technology is termed lossy compression
because between the encoding and the decoding cycle, some of the audio data is
lost. In its simplest form, this is
done by dropping out some of the audio samples in the bit stream, which removes
the higher frequency spectral content while maintaining the lower frequencies. For example, halving the number of samples
will cut off the top half of the frequencies a PCM stream can represent. Depending on the audio in question the
higher frequencies can play a small or large role; indeed if the original has
no frequencies in the range cut out by lowering the sample rate, no loss of
audio quality is perceived. Another
simple solution is encoding each sample with less resolution (a process known
as quantization). Whenever digital
audio is encoded, it has to be quantized to a certain level, since an analog
voltage amplitude can take an infinite number of values, whereas 16-bit
audio can only encode 65,536 discrete voltage levels. While it does reduce the audio quality,
dropping to 8 bits per sample (256 unique voltage values) still maintains a
recognizable signal, and reduces the total bitrate by half.
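Both naive reductions are easy to sketch. The example below is an illustration under stated assumptions rather than a production resampler: the test tone, the function names, and the choice to keep the top 8 bits of each sample are all hypothetical, and a real sample rate converter would low-pass filter before discarding samples to avoid aliasing.

```python
import math

def make_tone(freq_hz, sample_rate_hz, n_samples, amplitude=30000):
    """Generate a sine tone as signed 16-bit integer samples (test signal only)."""
    return [round(amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate_hz))
            for n in range(n_samples)]

def drop_every_other_sample(samples):
    """Halve the sample rate by keeping only the even-indexed samples."""
    return samples[::2]

def requantize_16_to_8(samples):
    """Keep the top 8 bits of each signed 16-bit sample (floor division by 256)."""
    return [s >> 8 for s in samples]

def expand_8_to_16(samples):
    """Scale 8-bit values back onto the 16-bit range for comparison."""
    return [s << 8 for s in samples]

pcm16 = make_tone(440, 44_100, 44_100)        # one second of a 440 Hz tone
half_rate = drop_every_other_sample(pcm16)    # half the samples, half the bitrate
pcm8 = requantize_16_to_8(pcm16)              # half the bits, half the bitrate
restored = expand_8_to_16(pcm8)
max_error = max(abs(a - b) for a, b in zip(pcm16, restored))

print(len(pcm16), len(half_rate))   # 44100 22050
print(max_error <= 255)             # True: the discarded low byte is the quantization error
```

The error left behind by the 8-bit version is what is heard as quantization noise: each sample can be off by up to 255 steps of the original 16-bit scale.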
Neither
of the above bitrate reduction methods is at all intelligent about which parts
of the audio are discarded; what is lost is a byproduct of how it was easiest to
store digitized voltage values. Better
lossy encoding schemes rely on removing parts of the audio which the mind/brain
does not and/or cannot attend to.
Primarily,
these lossy encoding schemes rely on frequency masking (sound at one frequency
and volume masking another at a different frequency and volume). Some systems also exploit the lack of
sensitivity to stereo for sounds at low frequencies. Based on these two theories of audio perception, popular
compression algorithms achieve 10 to 1 bitrate reductions with minimal perceived
quality loss.
The psychoacoustic theories of bitrate reduction, in depth
By
far the most important psychoacoustic theory used by perceptual audio coding is
that of masking. Most simply, this is
the result of hearing two sounds at different energy levels at nearby
frequencies, whereby one sound obscures the perception of the other. Most current theories state that the
incoming spectrum is split (filtered) into separate bands (known as critical
bands), the number and size of which vary by theory, although all agree that
the bands are arranged logarithmically, with most bands occurring at lower
frequencies. This logarithmic
arrangement corresponds to physical locations on the basilar membrane that each
respond to limited ranges of frequencies, with more space on the basilar
membrane devoted to the lower frequencies (Moore, 1997).
Frequencies within a critical band do not
interact with sounds in other critical bands, and the mind/brain can
selectively attend to all the bands at once, or just the ones of interest (say,
where the frequencies of speech are concentrated). Within a band, however, there is less ability to differentiate
between co-occurring sounds. In fact,
frequencies of significantly higher volume will partially, or even completely,
occlude (mask) the perception of other frequencies within the same band. In addition, lower frequency sounds mask
high frequency sound better than the reverse.
Finally, masking ability of a given sound also varies depending on its
tonality and noisiness (Brandenburg, 1996).
Tonal
sounds are simple, repeating sounds, which occupy a small range of frequencies
at any one time. Conversely, noisy
sounds are much more complex, the most extreme example being a random
distribution of energy spread over a wide range of frequencies. As a rule, noisy sounds are much better at
masking than tonal sounds. There are
two theories for this. From a neural
viewpoint, the noisy sound swamps the neurons that react to that critical band (CB), the
patterns of activation changing so frequently that they hide the static
activation caused by any tonal sounds (Moore, 1997). From a physical standpoint, sounds interfere with each other on
the basilar membrane; noisy sounds contain more frequencies than tonal sounds, and
therefore stimulate the same limited area on the membrane more than a tonal
sound does (note: this is a theory I have put together from many
sources; I do not have a single source which explicitly states it).
Another
psychoacoustic characteristic which is used in bitrate reduction coding is
stereo imaging. Due to the nature of most
stereo sound recordings, the sound in the left channel is correlated to the
sound in the right channel. This
relationship is not crucial to the sound; when the two channels are downmixed
into one center channel, the original content is still recognizable; however,
the information about the spatial locations of individual sounds is lost. Not all the differences between the two
channels are necessary, however, to
maintain our sense of location.
Particularly, the human auditory system is less sensitive to the details
of the higher frequency critical bands (above 2kHz), deriving most of the
localization information from time delay between signal onset and volume (Pan,
1995).
In depth discussion of technologies of bitrate reduction audio encoding
In
its traditional form MP3 encoding works on one channel at a time, so encoding a
stereo file is identical to encoding two mono files and storing the encoded
result in one file. Not only does
encoding stereo take twice as long, but it also results in a file twice as
large. The following explains how one
channel of audio is encoded.
At
its heart, MP3 encoding consists of two major steps. First, the uncompressed
PCM bitstream is filtered into 32 overlapping bands by a polyphase filter. These bands are
equal in width, so they only roughly approximate the log scaled critical bands (CBs). Then each band is further
subdivided into 18 sub-bands (for a total of 576 frequency bands) by a
Modified Discrete Cosine Transform filter (Brandenburg & Stoll, 1994).
Note, this combination of filters is used because it is both reasonably fast
and gives enough detail to determine masking accurately, not because it has any
direct correlation with how we think auditory perception works.
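To make the second filter stage more concrete, here is a minimal sketch of a Modified Discrete Cosine Transform with 18 outputs per block, using half-overlapping sine windows. It is an illustration of the transform itself, not the encoder's actual filter bank: in MP3 the MDCT operates on the output of each polyphase band, whereas this sketch runs directly on a made-up test signal, and all function names are hypothetical.

```python
import math

M = 18  # coefficients per block, matching the 18-way split described above

def sine_window(n):
    """Sine window of length 2*M; satisfies w[n]**2 + w[n + M]**2 == 1."""
    return math.sin(math.pi * (n + 0.5) / (2 * M))

def mdct(block):
    """Window 2*M input samples and transform them into M coefficients."""
    return [sum(sine_window(n) * block[n]
                * math.cos(math.pi / M * (n + 0.5 + M / 2) * (k + 0.5))
                for n in range(2 * M))
            for k in range(M)]

def imdct(coeffs):
    """Inverse transform: M coefficients back to 2*M windowed, aliased samples."""
    return [sine_window(n) * (2 / M)
            * sum(coeffs[k] * math.cos(math.pi / M * (n + 0.5 + M / 2) * (k + 0.5))
                  for k in range(M))
            for n in range(2 * M)]

def analysis_synthesis(signal):
    """Run half-overlapping blocks through MDCT and IMDCT, then overlap-add."""
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * M + 1, M):
        rebuilt_block = imdct(mdct(signal[start:start + 2 * M]))
        for n, value in enumerate(rebuilt_block):
            out[start + n] += value
    return out

signal = [math.sin(0.3 * n) + 0.5 * math.sin(1.7 * n) for n in range(6 * M)]
rebuilt = analysis_synthesis(signal)
# The first and last M samples are covered by only one block, so compare the interior.
worst = max(abs(a - b) for a, b in zip(signal[M:-M], rebuilt[M:-M]))
print(worst < 1e-9)  # True: overlapping blocks cancel each other's aliasing
```

Each block of 36 samples yields only 18 coefficients, yet the overlap-add step recovers the interior of the signal. This is why the transform adds no data before quantization: any loss comes from how coarsely the coefficients are later stored.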
In
the second step, the bands are categorized as either noisy or tonal, and based
on the perceptual coding model, judged for their ability to mask other bands
within the same CB. The bands which are
totally masked are discarded and the bands which are only partially masked are
set aside for further processing.
For
each of these remaining bands, the perceptual model calculates how much noise
can safely be inserted into the sample, such that the noise will still remain
masked by other signals. The source of
noise comes from encoding each sample within the band with a lower resolution
(i.e., quantization).
One
problem with this is that filtering audio into frequency bands always reduces
the time resolution of each band. While
this cannot be completely solved, the MP3 standard addresses this by using four
types of filter windows. The normal window is 1024 samples long, with ½ overlap
between successive windows and is shaped like a bell curve. The other windows
are: another bell curve 1/3 as large, a bell curve skewed towards the beginning
of the window, and another skewed towards the end. Whenever the encoder determines the normal sized window is
nearing a section of sound with a lot of transient energy, the encoding
switches to a window shape that allows for as small a part of the transient
energy to be captured at a time as possible (Brandenburg & Stoll, 1994).
Without this, whenever the encoded bitstream nears a transient, part of that signal starts to be heard before the actual event (this is known as pre-echo). The problem with switching window sizes, however, is that smaller windows require more encoded bits per second. To keep the total bits per second constant, the encoding system normally uses slightly less than the full bandwidth allowed, and then in times of transience spends the reserve bits on smaller window sizes.
When
music is not highly transitory, but rather filled with silence or constant
tones, traditional run length encoding and Huffman encoding can be used to
reduce the bitrate. MP3 layers these
two lossless compression algorithms on top of the lossy compression; however,
they only decrease the resulting file size by about 10 percent.
All
of the above applies to mono MP3 files and files stored in standard stereo
mode. Although more computationally
demanding to encode, two additional systems exist that can improve audio
quality for a given bitrate.
The
first is very simple. Usually the audio in both channels of a stereo file is
highly related. One way to take
advantage of this is to encode three
channels: Right, Left, and Center. The
center channel contains the left and right channels from the original file,
mixed together. The encoded left
channel specifies how the original left channel differs from the encoded center
channel. To retrieve playback data for
the left channel, subtract the encoded left channel from the encoded center
channel. The process for the right
channel is analogous. As long as the
two original channels are highly correlated, the amount of difference that must
be coded for the left and right channels is minimal, and can be encoded with
fewer bits than one normal channel would require, leaving more bits over to
represent the center channel
(Brandenburg, 1996).
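The idea can be sketched with the conventional sum and difference ("mid/side") formulation, which stores two channels rather than the three described above: the average of left and right, and half their difference. This is a minimal sketch with made-up sample values and hypothetical function names, not the encoder's actual bitstream layout.

```python
def encode_mid_side(left, right):
    """Replace left/right with their average (mid) and half-difference (side)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side

def decode_mid_side(mid, side):
    """Recover the original channels: left = mid + side, right = mid - side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right

# Two highly correlated channels (illustrative values only).
left = [1000, 2000, 3000, 2000]
right = [1010, 1990, 3020, 1980]

mid, side = encode_mid_side(left, right)
print(mid)    # [1005.0, 1995.0, 3010.0, 1990.0]
print(side)   # [-5.0, 5.0, -10.0, 10.0]
print(decode_mid_side(mid, side) == (left, right))  # True
```

When the channels are closely related, the side values stay small and need far fewer bits than a full channel, which is where the saving comes from.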
The
other method is more based on psychoacoustics. As I have stated before, the
human auditory system is less sensitive to the stereo content of the higher
frequency critical bands, deriving most of the localization information from
time delay between signal onset and volume.
For bands that fall into this category, the left and right channel
frequency values are mixed together. For later retrieval of the lost stereo
information, the two volume envelopes of the two channels are saved. Then on playback both bands are inserted
back into the left and right channels, with their volume over time controlled
by their respective volume envelopes (Brandenburg, 1996; Chen et al.,
1998).
Conclusions and thoughts about compression technology
The
MP3 standard was completed in 1992. Eight years is a long time for computer technology.
If nothing else, much more processing power exists on the desktop. Already, new technologies are starting to
gain recognition, such as AAC and TwinVQ.
Still, MP3s are widely used, and their penetration is increasing by the week, it
seems. It will be interesting to see
how much improvement a new technology will have to bring in terms of bitrate
and quality to usurp the current standard.
Now that we have the background of how MP3s work, the next task is to test just how effectively they reproduce the original signal when used at reasonable bitrates.
Introduction to the empirical test of compressed audio fidelity
Since
audio compression is based on perceptual quirks of the human auditory system,
quirks which are only partially understood, there is no mathematical proof
which describes the fidelity of a compressed audio stream. Only by compressing a bitstream and then
subjectively comparing it to the original can a sense of fidelity be achieved. Such evaluation is necessary during the
development of the algorithms, and frequently the developers of the system are
the ones who evaluate it.
Given
the subjective, and not entirely uniform hearing abilities of human subjects,
using just a few people to judge fidelity of a bitrate reduction system is not
necessarily enough to determine how the results compare to the common baseline
standard of RedBook CD audio. In the
case of MP3s, the technology is clearly good enough that with sufficiently low quality
reproduction equipment there is no discernible quality difference. Because the technology does a fairly good
job, suggestibility comes into play; knowing that the audio quality may be
compromised may result in more careful attention to defects in the sound, be
they from the reproduction equipment, the original source, or even from
the lossy nature of the compression
algorithm used. Only with blind testing
can the effect of the lossy compression be isolated and tested without being
confounded by these other factors.
Fidelity is somewhat hard to judge, and its measure
will vary not only with the type of sound, but also with the subject’s taste for
that kind of sound. If the subject does
not care for a type of music, their rating of its fidelity will very likely be
less accurate for cases where the amount of difference is minimal. Therefore, coming up with a single test
procedure which makes efficient use of all subjects is difficult.
Methods – Experiment # 1
Five
subjects were asked to judge fidelity of 24 sample pairs. Each pair contained
the exact same music sample, but one was compressed and the other was not.
Five
second long samples of music were chosen as test data from the following sources:
1. Peter Gabriel; In Your Eyes (drums, synthesizers, and vocals)
2. Peter Gabriel; In Your Eyes (drum solo)
3. David Lanz; Improvisations, adapted from Pachelbel’s Canon in D Major (solo piano)
4. Bizet; Carmen, Aragonaise (horns, drums, long, drawn-out cymbal crash)
5. Doug Coulter; Stereo Sample (drums, electric guitar and bass)
6. Robert Palmer; Simply Irresistible (distorted electric guitars, drums w/ dramatic silence between beats)
Each
sample was encoded at the following bit rates: 96kbps, 112kbps, 128kbps, and
160kbps. The L.A.M.E MP3 encoding engine V3.1.4 (retrieved from
http://www.sulaco.org/mp3) was used with the following settings: High Quality,
Joint Stereo.
A
fresh install of WinAMP 2.5C was used to play back all samples; no equalization
or output modifying plugins were installed.
An Ensoniq AudioPCI (fully 16-bit, 44.1 kHz capable) sound card converted
the output from digital to analog, and a Sony STR D315 stereo receiver provided
amplification to a pair of Beyerdynamic DT 831 closed (isolating) circumaural
headphones (claimed frequency response of 5 Hz – 32,000 Hz).
At
the beginning of the study, the concept of fidelity was described to the
subject: “Fidelity refers to the overall quality of a sound, where a higher
fidelity source will have less noise and distortion. For example, FM radio has considerably higher fidelity than AM
radio.”
Once
primed with a concept of fidelity, six sets of four sample pairs were run for
each subject. For every pair, two 5-second clips were played in succession, with one
second of white noise inserted between the clips. After each trial, the subject was given 3 seconds to mark on a
two column table which sample had the best fidelity. The white noise was inserted to distract the subject from
directly comparing the two samples, in an attempt to make them focus instead on
their subjective feeling of quality for each sample.
For
every pair, one of the clips was the original sample in 16 bit, 44.1 kHz stereo
PCM format, while the other clip was the same sample encoded at one of the MP3 bit
rates from above. Within each set of
four pairs, the order of compressed and uncompressed audio was randomized, with
an equal number of pairs starting with the compressed sample as uncompressed. The order of the pairs was also randomized,
within the requirement that at the end, all four MP3 bitrates had been
tested. The order of the sets was also
randomized, except that sample 1 (table 1) was always presented first, once to
show the subject how the test worked, and a second time to acclimate them to making
judgements.
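The randomization constraints described above can be sketched as follows. This is a hypothetical reconstruction, not the procedure actually used: it assumes six sets of four pairs (24 trials), puts sample 1 first, and does not model the repeated warm-up presentation of sample 1. The function and variable names are made up for illustration.

```python
import random

BITRATES_KBPS = [96, 112, 128, 160]
SAMPLES = [1, 2, 3, 4, 5, 6]  # sample numbers from the list in the Methods section

def build_schedule(rng):
    """Return a list of trials: (sample, bitrate_kbps, first_clip, second_clip)."""
    # Sample 1 always comes first; the remaining sets are shuffled.
    rest = SAMPLES[1:]
    rng.shuffle(rest)
    set_order = [SAMPLES[0]] + rest

    schedule = []
    for sample in set_order:
        bitrates = BITRATES_KBPS[:]
        rng.shuffle(bitrates)                   # random pair order within the set
        mp3_first = [True, True, False, False]  # half the pairs start with the MP3
        rng.shuffle(mp3_first)
        for bitrate, first_is_mp3 in zip(bitrates, mp3_first):
            order = ("mp3", "pcm") if first_is_mp3 else ("pcm", "mp3")
            schedule.append((sample, bitrate, order[0], order[1]))
    return schedule

# The seed is fixed only so that the sketch is repeatable.
schedule = build_schedule(random.Random(0))
print(len(schedule))  # 24
```

Shuffling a fixed list of two `True` and two `False` values, rather than flipping a coin per pair, is what guarantees the equal split between pairs that start with the compressed clip and pairs that start with the original.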
Results – Experiment # 1
Note,
as described in the methods section, the In
Your Eyes vocal sample was presented twice to the listeners, unlike the
other samples. The data collected, however, were roughly similar to those for the other
samples, so I have included them in the results.
Table 2: Individual Performance
Percent correct (uncompressed audio marked as sounding the best):
Experimenter (not included in any other calculations): 83%
Subject 1: 71%
Subject 2: 66%
Subject 4: 62%
Subject 5: 45%
Average correct among subjects: 69/120 = 57%
Graph 1 (correct responses by sample and compression rate)

This graph shows the number of correct judgements about which sample in each pair was compressed, across all subjects. The maximum possible score for any sample pair was 5, meaning that for that pair of samples, every subject determined the uncompressed sample sounded best. This was achieved only once, namely, on one of the pairings of the Improvisations piece. The comparison easiest to judge was that of PCM versus 96kbps; the hardest, 160kbps versus PCM.
Discussion of results
There
are several interpretations of these results.
Over 120 sample pairs, subjects only answered correctly 57% of the time,
not much better than chance. The most positive interpretation is that people
cannot, in fact, judge the difference between MP3s encoded at 96kbps (or higher)
and PCM. When I ran the test upon
myself, I easily identified 83% of the compressed samples. Most of my subjects, however, complained of
not being able to tell any difference, and having to guess on almost all choices. If they were picking up on any quality
difference at all, it would make most sense if they were more accurate on
samples where severe compression was used.
In graph one, such results would show up as stair-step lines, with the
top line of each group the shortest. If
such a pattern exists in the data, I cannot find it.
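Whether 57% really differs from chance can be checked with an exact binomial test. The sketch below is an illustration, not part of the original study. It treats all 120 judgements as independent coin flips under the null hypothesis (an assumption, since the same five subjects contributed 24 trials each) and takes the experimenter's 83% to mean 20 of 24 correct (also an assumption, based on rounding). The function name `binomial_tail` is hypothetical.

```python
from math import comb

def binomial_tail(successes, trials, p=0.5):
    """P(X >= successes) for X ~ Binomial(trials, p): a one-sided exact test."""
    return sum(comb(trials, k) * p**k * (1 - p)**(trials - k)
               for k in range(successes, trials + 1))

# Pooled subject data from Experiment 1: 69 correct out of 120 judgements.
subjects_p = binomial_tail(69, 120)

# Experimenter: 83% of 24 pairs, assumed here to be 20 correct.
experimenter_p = binomial_tail(20, 24)

print(f"subjects: p = {subjects_p:.3f}")
print(f"experimenter: p = {experimenter_p:.5f}")
```

Under these assumptions the pooled subject result gives a one-sided p-value of about 0.06, just short of the conventional 0.05 threshold, while a score like the experimenter's would be very unlikely to arise from guessing alone.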
The
data seems to indicate no significant perception of quality difference.
Perhaps, however, MP3s do sound worse in general, but my test just doesn’t
expose it to the unaccustomed. Or,
maybe I’ve just learned what the compressed samples sound like as I created
them, and although the difference is small, I can pick up on it, whereas the
average listener cannot. It might not
be a question of one sample truly sounding better, just that I can identify how
it sounds different, and associate different with worse.
The important question, however, is not
whether the experimenter can figure out how to crack the experiment, but
whether the average subject can determine a noticeable quality difference. If nothing else, this first study further
underlines the necessity for testing MP3 quality on subjects, rather than just
running blind tests on the person who designed the test, as running it on
myself strongly indicates that there is a difference, and running it on
subjects suggests that there is not.
Because
of the disparity between my own judgement and those of my subjects, I decided
to run a second experiment, which would give subjects more time and more chances to listen to
each sample. After all, I listened to
each sample many times during their creation; maybe that is the most
significant cause of my improved accuracy.
Subjects
auditioned 16 pairs of audio samples, where one sample was always compressed,
and the other uncompressed. To make
sure subjects understood their task, the first two samples were compressed at
56kbps and 80kbps, both of which had highly audible noise and compression
artifacts. If they did not correctly
identify the higher fidelity audio on these samples, they were asked to listen
again.
After
the first two sample pairs, the procedure followed was always the same. A random pair of samples was chosen. One of
the samples was PCM, and the other, 128kbps MP3. The samples were played in succession with a few seconds space
between the samples. Then the subjects were allowed to listen to the samples as
many times as they wanted to aid in deciding which had the highest
fidelity. Because so many subjects in
the first experiment had been disturbed by having to make guesses when they
felt like they had no idea which sample sounded best, I decided to let all the
subjects in the second experiment answer a/b if they couldn’t tell after
running the samples several times.
In
addition to all the samples used in the first experiment, I also added the
following new audio clips:
· Moody Blues; Nights in White Satin (Live from the Red Rocks version, at a point heavy with applause).
· Moody Blues; Nights in White Satin (Live from the Red Rocks version, at a point with lots of cymbal hits).
· Vangelis; Chariots of Fire – theme (piano, synthesizers, and light drums).
· Vangelis; Direct (layered, high tempo synthesizers).
· R.E.M; Drive (guitars, light drumming, and vocals).
· R.E.M; Drive (guitars and silence).
· R.E.M; Drive (guitars – distorted, vocals, drumming).
· Enya; Orinoco Flow (slow tempo layered synthesizers, vocals).
· Pink Floyd; Comfortably Numb (strings, bass, vocals).
Note that while there are several samples from the same song in this set, all of those samples are very different in character as far as number of instruments, and the intensity of playing (fast, loud, slow, melodic, noisy).
Since the subjects were corrected when they guessed incorrectly on the first two samples, that data is not included in the following table.
| Subject | # Correct | Correct/Guessed | Correct/Total | # Incorrect | # Unsure | MP3s not detected |
|---|---|---|---|---|---|---|
| 1 | 6 | 54% | 42% | 5 | 3 | 58% |
| 2 | 8 | 73% | 57% | 3 | 3 | 43% |
| 3 | 10 | 77% | 71% | 3 | 1 | 29% |
| 4 | 9 | 64% | 64% | 5 | 0 | 36% |
| 5 | 2 | 40% | 14% | 3 | 9 | 86% |
| 6 | 10 | 77% | 71% | 3 | 1 | 29% |
| 7 | 5 | 42% | 36% | 7 | 2 | 64% |
| Av: | 7.5 | 51% | 50% | 4 | 3 | 50% |
Discussion of Experiment #2
This
time I tried to answer a smaller question: did people judge PCM as sounding
better than MP3 compressed at one fixed bitrate (128kbps)? Since I knew the sample size would be small,
I decided to accept “Don’t Know” as an option to answer to the question, “Which
sounded better?”. Averaged across all
subjects, the number of MP3 clips undetected (sample pairs marked
either “don’t know”, or the compressed sample marked as sounding best) falls at
50% (exactly!), the same percentage as if people were guessing randomly. Not knowing statistics, I don’t know how
statistically valid this is; however, it suggests fairly strongly that people
cannot, at least for five second clips, tell the difference between MP3s at
128kbps and PCM. The big unanswered
question, of course, is how well that represents their ability to judge quality
for longer clips (or the full length of a song), for music they know and
love. I’m not really sure how to
address that, currently. At least,
however, these results show that MP3 compression works well enough that much
more intensive levels of testing will be needed to show if it really causes a
significant drop in perceived quality or not.
Brandenburg, K. & Stoll, G. (1994). ISO-MPEG-1 Audio: A Generic Standard for Coding of High-Quality Digital Audio. In Collected Papers on Digital Audio Bit Rate Reduction, edited by Gilchrist, N. and Grewin, C., pp. 23-30. (Audio Engineering Society, New York).
Brandenburg, K. (1996). Introduction to Perceptual Coding. In Collected Papers on Digital Audio Bit Rate Reduction, edited by Gilchrist, N. and Grewin, C., pp. 23-30. (Audio Engineering Society, New York).
Chen, Tsai, & Wu (1998). Fast Time-Frequency Transform Algorithms and Their Applications to Real-Time Software Implementations of AC-3 Audio Codec. IEEE Transactions on Consumer Electronics, 44 (#2), 413-423.
Erickson, G. (1994). A fundamental introduction to the Compact Disk Player. http://www.amtechdisc.com/file/amtech/CDPAPER.HTML
Fielder, L., Bosi, M., Davidson, G., Davis, M., Todd, C., & Vernon, S. (1995). AC-2 and AC-3: Low-Complexity Transform-Based Audio Coding. In Collected Papers on Digital Audio Bit Rate Reduction, edited by Gilchrist, N. and Grewin, C., pp. 54-61. (Audio Engineering Society, New York).
Gilchrist, J. (1999a). ACT 2.0 Text Test. http://web.act.by.net/~act/act-text.html
Gilchrist, J. (1999b). ACT 2.0 Sound (WAV) Test. http://web.act.by.net/~act/act-sound.html
Ikeda, K., Mori, T., Morlya, N. & Kaneko T. (1998). Audio Transfer System on Personal Handyphone System Using Error-Protected Stereo Twin VQ. IEEE Transactions on Consumer Electronics, 44 (#3), 1032-1037.
Ko, W., Yoo S., Park, S., Kim, J., Youn, D. (1998). A VLSI Implementations of Dual AC-3 and MPEG-2 Audio Decoder. IEEE Transactions on Consumer Electronics, 44 (#3), 872-877.
Liu, C. & Lee W. (1997). The Design of a Hybrid Filter Bank for the Psychoacoustic Model in ISO/MPEG Phases 1, 2 Audio Encoder. IEEE Transactions on Consumer Electronics, 43 (#3), 586-591.
Lui, C., Lee, C., Juang, S. (1998). Design of the Coupling Schemes for the AC-3 Coder in Stereo Coding. IEEE Transactions on Consumer Electronics, 44 (#3), 878-881.
Murphy, C., Anadakumar, K. (1992). Real-Time MPEG-1 Audio Coding and Decoding on a DSP Chip. IEEE Transactions on Consumer Electronics, 43 (#1), 30-7.
Moore, C. (1995). Masking in the Human Auditory System. In Collected Papers on Digital Audio Bit Rate Reduction, edited by Gilchrist, N. and Grewin, C., pp. 9-19. (Audio Engineering Society, New York).
Moore, C. (1997). An Introduction to the Psychology of Hearing (4th ed). New York: Academic Press.
Pan, D. (1995). A Tutorial on MPEG/Audio Compression. http://www.bok.net/~pan/index.html (First published in IEEE Multimedia Journal, Summer 1995 issue).
Stoll, G. (1996). ISO-MPEG-2 Audio: A Generic Standard for the Coding of Two-Channel and Multichannel Sound. In Collected Papers on Digital Audio Bit Rate Reduction, edited by Gilchrist, N. and Grewin, C., pp. 9-19. (Audio Engineering Society, New York).
I
found, read, and cataloged a great many papers on the web, some of which I
referenced in this paper and have included on the above list. Those I read but
did not use as my primary sources can all be found on http://hamp.hampshire.edu/~aer98/mp3links.html