
Human Hearing - How We Hear and Perceive Audio Quality Part 4

by Mark Sanfilipo — May 01, 2005

Hearing

The human sense of hearing is fantastically complex, both as a mechanism and as a process.

That it has evolved into an extremely sophisticated psycho-physiological function should come as no surprise when one considers that, in nature, the perceiver possessing the superior hearing mechanism/process is rewarded with survival. It's this same formidable sophistication, forged over time in the harsh crucible of simple survival, that allows us now to experience the sublime joys of music - live or reproduced.

The actual conscious awareness of sound takes place near the surface of the brain when the auditory cortex (stimulated by electrical signals fed to it by the hearing nerve) matches the incoming electrical patterns with patterns already stored in the auditory memory. Once a match is made, we consciously perceive (and recognize) the sound. At that instant, an evaluation process begins, whereby another nearby area of the brain assigns meaning to the sound. The hearing nerve I just mentioned - a bundle containing ~ 30,000 nerve fibers - streams the electrical patterns fed to it by the cochlea (or inner ear) to the auditory cortex all our lives. The auditory memory, largely empty at birth, voraciously accumulates & stores sound patterns all our lives, as well.

Owing to the degree of connectivity existing between the auditory, limbic and autonomic nervous systems, we can feel energized, excited, soothed or even frightened by the sound we perceive. That is to say, sound can evoke instantaneous, automatic responses outside the auditory system, for example when we hear our first name spoken or first recognize the opening bar of a favorite song.

Ironically, the very fact that the highly sophisticated human hearing mechanism/process does what it does so well appeared to me, on occasion, to confound or otherwise frustrate researchers who endeavored to nail down absolutes where performance parameters are concerned.

Indeed, the human hearing process is so sophisticated and adaptable that it's not at all unusual to see various studies using different test paradigms, signal sources, etc. coming up with quite different results and/or conclusions. This is not to say there aren't any human hearing absolutes known to science; there are. However, there exist many more variables and "unknowns" than clear "knowns" where it comes to performance parameters. Hence my point, made in an earlier part of this series, that the study of the human hearing process is a work in progress and there's clearly room for more research.

In this concluding section of the Human Hearing series, I'd like to do a summary-form overview of each of the preceding articles, adding along the way a few anecdotal items drawn from my own experiences.

I. Amplitude Sensitivity

From Human Hearing: Amplitude Sensitivity Part 1, we have the following table:

Study Authors	Year Published	Min. Detectable Fluctuation
Riesz	1928	~1 dB
Dimmick & Olson	1941	JND = 1.5 dB to 3 dB
Atal, et al.	1962	~1 dB
Jesteadt, et al.	1977	JND @ 80 dB = 0.5 dB; JND @ 5 dB = 1.5 dB
Toole and Olive	1988	0.25 dB for a 5 kHz resonance, Q = 1
Mark Sanfilipo	2005	0.75 dB to 1 dB, practical

In this particular article, I settled on a minimum discernible difference of 0.75 to 1.0 dB. My experience has shown that this is what the average listener, under average listening conditions, listening to music played back through typical consumer-grade audio gear, will be able to clearly identify - and do so repeatedly. It would appear that my conclusion contradicts a couple of the sources cited in the table.

Truth is, the conditions under which the two seemingly contradictory studies were undertaken are very different from those I have enumerated as the basis of my 0.75 to 1.0 dB conclusion. Therefore, direct comparisons can't really be made. The fact that the human hearing process, when tested with differing methodologies, source signals, etc., produced different outcomes is a testament to its flexibility, sophistication and adaptability. I'd be very suspicious if such widely divergent approaches resulted in identical outcomes!
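To put the dB figures in the table into more tangible terms, here is a minimal Python sketch that converts a level step in dB into the equivalent amplitude (sound pressure or voltage) and power ratios using the standard decibel definitions. The function names are my own, chosen for illustration, and the list of steps simply mirrors values that appear in this article.

```python
def db_to_amplitude_ratio(db):
    """Amplitude (sound pressure or voltage) ratio for a level change in dB."""
    return 10 ** (db / 20)


def db_to_power_ratio(db):
    """Power ratio for a level change in dB."""
    return 10 ** (db / 10)


if __name__ == "__main__":
    # Steps taken from the table and the anecdote in this article.
    for step_db in (0.2, 0.25, 0.5, 0.75, 1.0, 1.5, 3.0):
        amp = db_to_amplitude_ratio(step_db)
        pwr = db_to_power_ratio(step_db)
        print(f"{step_db:4.2f} dB: amplitude x{amp:.3f} "
              f"(+{(amp - 1) * 100:.1f}%), power x{pwr:.3f}")
```

Seen this way, a 1 dB step is roughly a 12% change in sound pressure, while a 0.25 dB step is only about 3%.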

Now for an anecdote from the world of live audio or "Sometimes the impossible takes no effort at all".

The client in this story is a world-renowned, globe-trotting ballet company that everyone you know - including your dentist's receptionist and Aunt Matilde's dog - has heard of.

The character in this company was the stage manager (SM), who prided himself on having the goldenest golden ears ever encountered in live theater audio. He's a nice enough guy and fun to work with, but possessed of some self-proclaimed peculiarities where it comes to hearing (which he judged, only half jokingly, to be "better than a jack rabbit's"). You meet all kinds in theater.

I was the FOH (front of house) Sound Engineer and we had just finished the tech. rehearsal. (For those readers without a theatre background, the tech. rehearsal is the rehearsal that includes all cast & crew, sound, lights, scene/set changes, pyro., etc. It's the closest a rehearsal gets to being an actual show.)

The SM (seated a few rows ahead of me, located as I was at the mix position), wanted to stay behind and "polish" a few of the sound cues. It went something like this:

SM: "Ok, for Cue 1, I want to warm up the bass line a bit. Punch it up 0.25 dB at 80 Hz".

Me: "Ok, rolling Cue 1… How's that?"

(I silently wondered if this guy even knows what a decibel really is).

SM: "Ummmmm …. too much. Can you cut it a bit"?

Me: "Ok. How's that"?

SM: "Sounds good. Mark that level. Let's go to Cue 5"

Me: "Ok…. Cue 5 is ready".

SM: "I need more gain for the entire cue. Can you increase it by 3 dB"?

Me: "Ok, 3 dB it is. Rolling Cue 5….. How's that?"

SM: "Hmmm … yeah, sounds good. Mark that level".

SM: "Queue up Cue 6. Let me know when it's ready."

Me: "Cue 6 ready".

SM: "Play it back, but let me hear it with, maybe, ummmmm, 0.2dB more gain than I heard when we were teching." (Meaning during the tech. rehearsal).

Me: "Cue 6 rolling…. How's that level"?

SM: "Ummmm … too much. Bring it down just a little bit".

Me: "Ok…. How's that"?

SM: "No, too quiet now. Bring it up just a bit …"

SM: "Perfect! Beautiful! Gorgeous! That's exactly what I needed to hear. Mark that level! A lotta guys have argued with me about that one (the 0.2dB thing) … But I know what I hear!"

I should mention in passing that, unlike the jack rabbit, this guy needed to engage only one of his goldenest golden ears as the whole "polish" session was conducted with him wearing his single-earpiece Clear-Com intercom headset.

As you might have guessed, for Cues 1 and 6 neither of my hands went anywhere near the board (a Yamaha PM4000) or any item in the effects rack next to me. Cue 5, the +3 dB request, was, however, accomplished effortlessly by a minor adjustment at the board.

As for Cues 1 & 6, the mighty placebo effect had indeed accomplished the impossible with no effort whatsoever. That night, along with all the others on which I mixed sound for this particular company, the talent, audience and most of all the stage manager were happy with the quality of sound.

Having earned respect for the ability of the placebo effect to skew people's judgment, I find it useful to keep a wary eye out for it when judging the merits of everything from scientific researchers' conclusions to a marketer's commercial claims.

II. Phase Distortion Audibility

In "Human Hearing - Phase Distortion Audibility Part 2", I came to the following conclusions:

1. Given the data provided by the above cited references [in the original article] we can conclude that phase distortion is indeed audible, though generally speaking, only very subtly so and only under certain specific test conditions and perception circumstances.

2. The degree of subtlety depends upon the nature of the test signal, the dB SPL level at which the signal is perceived, the acoustic environment in which the signal was recorded and/or played back as well as the Q (Quality factor; a unitless factor of merit) & fo (crossover frequency, Hz) of any filter networks in the signal stream. Certain combinations of conditions can render it utterly inaudible.

3. Room acoustics further mask whatever cues the hearing process may depend upon to detect the presence of phase distortion.

In "Perception of Phase Distortion in All-Pass Filters" (Deer, J. A., et al.: Journal of the Audio Engineering Society, Vol. 33, No. 10, October 1985), the researchers studied phase perception from the perspective of group delay (the negative of the rate of change of the total phase shift with respect to angular frequency).
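To make the idea of group delay concrete, here is a minimal Python sketch that evaluates it for an idealized second-order all-pass section. The transfer function, the center frequency f0 and the Q values are my own assumptions for illustration; they are not the filters used by the researchers. The closed-form expression is cross-checked against a numerical derivative of the phase.

```python
import math


def allpass2_group_delay(f_hz, f0_hz, q):
    """Group delay (seconds) of an assumed second-order all-pass section.

    Assumed transfer function, chosen only for illustration:
        H(s) = (s^2 - (w0/Q)s + w0^2) / (s^2 + (w0/Q)s + w0^2)
    Phase:        phi(w) = -2 * atan2((w0/Q) * w, w0^2 - w^2)
    Group delay:  tau(w) = -d(phi)/d(w), written in closed form below.
    """
    w = 2 * math.pi * f_hz
    w0 = 2 * math.pi * f0_hz
    a = w0 / q
    return 2 * a * (w0**2 + w**2) / ((w0**2 - w**2) ** 2 + (a * w) ** 2)


def numeric_group_delay(f_hz, f0_hz, q, df_hz=0.1):
    """Same quantity from a central difference of the phase, as a cross-check."""
    w0 = 2 * math.pi * f0_hz

    def phase(f):
        w = 2 * math.pi * f
        return -2 * math.atan2((w0 / q) * w, w0**2 - w**2)

    d_phi = phase(f_hz + df_hz) - phase(f_hz - df_hz)
    d_w = 2 * math.pi * (2 * df_hz)
    return -d_phi / d_w


if __name__ == "__main__":
    # Hypothetical Q values for a section centered at 2 kHz.
    for q in (0.5, 1.0, 2.0, 2 * math.pi):
        tau_ms = 1e3 * allpass2_group_delay(2000.0, 2000.0, q)
        print(f"Q = {q:5.2f}: group delay at 2 kHz = {tau_ms:.3f} ms")
```

For this assumed filter the group delay at the center frequency works out to 4Q/(2*pi*f0), so a section centered at 2 kHz contributes about 0.32 ms at Q = 1 and reaches 2 ms only at a Q of about 6.3.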

Their conclusion, "a statistically significant perceptual threshold is reached when peak group-delay distortion at 2 kHz is in the neighborhood of 2 ms (for diotic presentation via earphones)", indicates that phase distortion, when viewed from the perspective of group delay, is indeed audible, at least where it comes to listeners hearing the "click" test signals in their earphones. I point this study out as I think it's a great example of the "certain specific test conditions and perception circumstances" mentioned in Point 1 of my conclusion. Personally, I'd like to see more research linking findings such as these to how the hearing process reacts in the presence of actual music.

In the loudspeaker industry you'll see a spectrum of attitudes or philosophies regarding whether it is or isn't important to "get the phase correct". Which leads me to the anecdotes.

In the late 1990s, I was hired as a contract-based consultant by a very large Asian-based loudspeaker manufacturer which had set up an R & D facility in North America as a first step to establishing a presence in the North American market. As things turned out, the facility was used by the company to develop products for both the Asian and North American markets. While there, I actually spent more time working on products destined for release in the high-end segment of the Asian market than I did on products destined for the North American market.

Anyway, in this particular company, each new product began life as a concept study, faxed by headquarters to the head of R & D, who happened to be my boss. He'd then call together all the engineers and we'd work out a number of design particulars we needed to craft in order to attain the performance goals set by him.

In all the design meetings I attended, for all the products I played a role in creating, getting the phase correct was never considered. It just wasn't considered important by this particular manufacturer. The products certainly didn't suffer because phase response wasn't considered. Indeed, the products would go on to commercial and critical success, in some cases actually winning awards. This company's philosophy is clearly at the opposite end of the spectrum from that held by a number of other manufacturers, such as Focal-JMlab. From their website we read:

"the phase between the midrange and tweeter must be perfectly matched for the two to overlap and sum perfectly and create a balanced tonality. The difference in phase between the tweeter and midrange at the crossover point must be zero. Thus, there is total summation between the two emission sources at the crossover frequency".

Here we have two strongly opposing philosophies from two companies, each commercially successful in its own right. Given what I've seen in researching this article, along with my experience with all things loudspeaker, and taking into account just how subtle the presence of phase distortion can be, I agree with Dr. Toole's assessment, given in the Feedback section attached to this article:

"All things being equal, and if one has the option, of course get the phase correct ..."

III. Distortion Audibility

In "Human Hearing - Distortion Audibility Part 3", we find the following table:

Item	Study/Report Author & Date	Subject	Results/Conclusion	Misc. Notes
1	James Moir/1981	Just detectable distortion (JDD) levels	JDD "level can be no lower than 1%"	Assumed non-linear distortion was the primary type of distortion involved. JDD levels dropped as listeners learned to recognize distortion.
2	D. E. L. Shorter/1950	Sound quality of systems with known quantities of harmonic distortion	Just perceivable distortion values of 0.8% to 1.3%	Multiplying harmonic amplitudes by n^2/4 (n = harmonic order) before rms summing produced better correlation between objective measurement and subjective assessment.
3	P. A. Fryer/1979	Listening tests for intermodulation distortion	2% - 4% distortion detectable in piano music, 5% in other types of test signals	Distorted test material using 1st order IM distortion products
4	Von Braunmühl & Weber/1937	Distortion sensitivity at selected frequency bands	1% - 2% at frequencies > ~500 Hz	Noted that at lower frequencies JDD levels could go much higher
5	Harry F. Olson/1940	Just detectable distortion (JDD) levels	JDD level of 0.7% using 40 Hz to 14 kHz bandwidth test system	Restricting system to 4 kHz doubled the JDD level
6	M. E. Bryan & H. D. Parbrook/1960	Just detectable amplitude of 2nd - 8th harmonic in the presence of a 360 Hz fundamental (f1)	See the harmonic levels listed under this table	Levels are the just discernible amplitudes, in dB below the f1 level, for the 2nd, 3rd and 4th harmonics

Item 6 results (just discernible harmonic amplitude, in dB below f1):

f1 level (dB)	2nd	3rd	4th
52.5	44	52	52
60	52	57	61
70	47	62	67
76	-	54	59
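The weighting noted for Shorter's study (multiplying each harmonic amplitude by n^2/4 before rms summing) and the dB-below-fundamental figures from Bryan & Parbrook can both be sketched in a few lines of Python. The function names and the two example spectra are hypothetical, made up purely to show the arithmetic; they are not data from any of the studies.

```python
import math


def thd_percent(harmonics, fundamental=1.0):
    """Conventional THD: rms sum of harmonic amplitudes over the fundamental.

    `harmonics` maps harmonic order n (2, 3, ...) to its amplitude.
    """
    rms = math.sqrt(sum(a**2 for a in harmonics.values()))
    return 100 * rms / fundamental


def weighted_thd_percent(harmonics, fundamental=1.0):
    """Weighted figure: each amplitude is multiplied by n^2/4 before rms summing."""
    rms = math.sqrt(sum((a * n**2 / 4) ** 2 for n, a in harmonics.items()))
    return 100 * rms / fundamental


def db_below_to_percent(db_below):
    """Convert a harmonic level given in dB below the fundamental to percent."""
    return 100 * 10 ** (-db_below / 20)


if __name__ == "__main__":
    # Two hypothetical spectra with the same conventional THD of 1%.
    low_order = {2: 0.01}   # all distortion in the 2nd harmonic
    high_order = {5: 0.01}  # all distortion in the 5th harmonic
    for name, spectrum in (("2nd only", low_order), ("5th only", high_order)):
        print(f"{name}: THD = {thd_percent(spectrum):.2f}%, "
              f"weighted = {weighted_thd_percent(spectrum):.2f}%")
    print(f"44 dB below f1 = {db_below_to_percent(44):.2f}%")
```

With this weighting the 2nd harmonic passes through unchanged (2^2/4 = 1), while higher-order products are penalized progressively more heavily, which is one way of expressing the idea that equal THD figures need not sound equally distorted.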

Not long ago my wife and I were sitting in Copps Coliseum, a 19,000-seat arena located in Hamilton, Ontario, Canada. The occasion was a concert. It was a rare treat as I was in a venue, with live music, and I didn't have to work! Yeah!

While listening to the show, I noticed the all-too recognizable audible signs of a system pushed too hard for the sake of maximal dB SPL output. Later, I got to thinking about all the measurements I've done over the years, exposing linear and non-linear distortion artifacts in all sorts of drivers, from fragile fiberglass tweeters to 18" behemoths destined to move mammoth amounts of air in the quest for that extra bit of low-frequency extension.

Each transducer displayed linear and non-linear signatures that were as unique as a fingerprint. Using single or multi-tone test signals for assessing non-linear distortion (harmonic & intermodulation) and burst tones for linear distortion, it doesn't take long before you learn to recognize the presence of non-linear distortion. The same can be said for linear distortion as well.

Across all the drivers I've tested, under various conditions, using a large variety of test signals, I never encountered two with identical distortion signatures, linear or non-linear. This held true even in those cases where I tested several otherwise identical units drawn from the same production run. Though I've not put this through any sort of rigorous scientific study, I have found that those drivers exhibiting less-than-average distortion levels tended (though not always) to be subjectively judged by market & media as being of higher quality in terms of ability to cleanly & accurately reproduce in acoustic form the electric signal fed to them.

Making use of low-distortion drivers and using multiples of the same drivers (when warranted) are just a couple of examples of factors that can contribute to a clean, musically accurate sounding loudspeaker system. Of course, running a loudspeaker system at low levels keeps (as we've already seen) driver non-linearities minimized. Auditory masking plays a role here as well.

Recognizing audible distortion using test tones is one thing; predicting how a transducer's distortion will affect an orders-of-magnitude more complex signal, i.e. music, is quite another. Now that sounds like a nice research project. Any takers?

Conclusion

The time taken to gather & process the background material for this introductory-level overview of human hearing and the perception of phase, amplitude differences and distortion was time extremely well spent.

In my opinion, anyone, enthusiast or pro, who makes the effort to apprise themselves of the published research currently out there, gains. Be aware, though, you may find these gains are not easily won: the published material quite often presents a difficult read and typically requires at least a basic understanding of statistical analysis methods. (The latter is especially important in that without it, one can easily be led astray by the numbers.) As well, you may find your reading raises more questions than it answers.

So for those willing to take the time, what, specifically, are the benefits?

For the enthusiast, a broadened understanding of psychoacoustics, at the very least, makes for better informed purchase decisions. For the professional designer or engineer, a solid understanding of psychoacoustics provides, at the very least, one effective means by which to avoid the "brilliant-solution-to-irrelevant-question" pitfall.

The earliest reference I found to a scientific study of psychoacoustics came from the 1840s. (There are probably earlier scientific studies, but I did not encounter them in researching these articles.) So it would appear that psychoacoustics isn't a new scientific endeavor, yet its research frontiers remain wide open, very much like a "young" science.

One metric I find useful in judging the degree of maturity attained by a science or engineering endeavor is the mix proportion of synthesis to analysis. Across the published research works I read in preparation for writing these articles, I found far more analysis than synthesis, yet another indicator of a "young" science.

It's for these and other reasons that I referred earlier to psychoacoustics as a "work in progress".

Research into how the ear and all the post-processing mechanisms downstream of it work is a critical component needed to drive forward qualitative advancements in loudspeaker design. After all (to paraphrase Harman's Dr. Floyd Toole), the room may be the "final audio component", but the ear is the final judge.