Loudspeaker Double Blind Test and Demo Flaws

by Tom Andry — June 28, 2011

Loudspeaker DBT (double blind) testing

What is the point of a listening test? For the listener, the intent is easy to identify. They are (often) thinking of buying a product. They want to know if they like it enough to spend their hard-earned money. But for the salesman, the intent is not always so clear. Do they want a life-long customer? If so, they want you to be happy with your purchase and they'll make sure you buy the best product you can afford. Do they want a quick sale? Then they'll show you something that impresses at first blush, but maybe isn't something that you'll enjoy over the long term. Is the salesman working on commission or have they been instructed by their superiors to "push" a certain brand? Then who knows what product they'll try to convince you is the best for you. A tell-tale sign that you are going to be pushed is when the salesman tells you what you're going to hear before they press play.

Things get even murkier when the listening tests are run by a manufacturer. They have a vested interest in their product. They want their product to "win" or, at the very least, to not lose. They'll take a tie. Especially if it is a tie with a much more expensive speaker. But, as we'll see, a tie really isn't as great as it would seem at first.

So if you, the listener, aren't setting up the test, you have a lot to worry about. You have to worry about the potential bias and motivations of those who are setting up the test. You also have to worry about your own ability. If you aren't familiar with the room or the music/content, you're going to have a much harder time. Plus, you have to worry about whether the demo will test the speakers to their limits. Many speakers won't exhibit real problems until they are driven hard. If the demo doesn't drive the speakers to near their limits, you may never know of their problems until you get them home. So, many times, potential customers will have to rely on other things to help inform their buying decision. One thing that will draw a consumer is familiarity. If they've heard of a brand, they'll be predisposed to think that there is some value to it. Even more so if they've bought a brand in the past and haven't been disappointed. But this limits the pool of potential purchases to those with adequate advertising budgets or those that have drummed up enough of a following to have vocal forum support.

Buying Direct Means a Better Deal? Not Always!

If you aren't afraid of buying from a "new" (to you at least) company, you might end up looking at some of the online-only companies. They often have huge forum followings (where huge = a small number of very, VERY vocal forum members) who will, to no end, extol the virtues of the online business model. "Cut out the middlemen," they'll say. "You're saving money that they'd spend in paying distributors/marketers/etc." It doesn't take a business degree to see through that logic. If they can make a speaker for $100 that would ultimately sell in the store for $500, why sell it online for $110? Why not sell it for $300? They'd be making a lot more profit, while still offering a deal. This doesn't even take into account that the same speaker, from one of the larger manufacturers, could be made for much less. Why? Economies of scale. While the Internet-direct company is making hundreds of speakers a year (if they are lucky), the larger manufacturer is making thousands. And we all know that buying in bulk saves you money. This is just as true with the components of a speaker as it is with the coffee, shampoo, and soap that you find at your local warehouse store.

External Data

The obvious solution is third party validation. Reviews, shootouts, blind testing - any and all of it. Google any audio component and you'll find any number of pages of information on it. You'll likely come across prefabricated PR "reviews" written by marketing execs and posted blindly on tons of sites. These are easy to see through as they'll be posted word-for-word on many different websites. You'll also find consumer reviews on forums as well as retailer sites (if the product is old enough). If you are lucky, you'll find a professional review or two that don't stink of bias. Unfortunately, you'll also run into a lot of "tests" and "reviews" on the manufacturer's website and forum. These reviews are always glowing (why would they post anything on their site that wasn't?) and the tests, while claiming to be free from bias, often have issues. One thing that needs to be addressed closely is the use of the "Double Blind Test" (or "DBT" as it is often referred to online).

The Double Blind Test Fallacy

The fact is that Double Blind Testing (DBT) really has very little place in the audio world - regardless of the claims and beliefs of manufacturers and some forum members. Double Blind Testing is designed to eke out small differences between very similar stimuli. This is why it is used so heavily in the medical field. Participants, for example, are given either a new type of analgesic or a sugar pill. We all know that the brain can trick you into thinking something is happening when it isn't. The participants don't know which pill they got. That's part one of the "double" part of the test. The second part is that the persons distributing the pills don't know which is the real one and which is the placebo. In very rigorous testing, the identities of the pills won't be revealed until after the statistical analysis is run.

But, even after the DBT is run and the results are obtained, what do you know? You know that there either was or wasn't a difference. That's it. It doesn't tell you what the difference was, or whether that difference was even meaningful. If there is no difference, you've failed to reject the null (nothing) hypothesis. The null hypothesis is always that there is no difference. DBTs are set up so that very small differences can be revealed. They do this by controlling for bias (part of which is keeping everyone in the dark) and making it very hard to reject the null hypothesis (even in the least rigorous tests, the results have to be something that chance alone would produce less than 1 time in 20 before you can reject the null).
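To put a number on "very hard to reject," here is a minimal sketch of an exact binomial test. It assumes a forced-choice format that the article doesn't specify: on each trial the listener has to say which of two speakers is playing, so pure guessing is right half the time. The function names and the 20-trial session are illustrative assumptions, not anyone's published protocol.

```python
from math import comb


def p_value_at_least(correct: int, trials: int, chance: float = 0.5) -> float:
    """Probability of scoring `correct` or more out of `trials` by guessing alone."""
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(correct, trials + 1)
    )


def min_correct_to_reject(trials: int, alpha: float = 0.05, chance: float = 0.5) -> int:
    """Smallest score whose guessing-only probability drops below `alpha`."""
    for correct in range(trials + 1):
        if p_value_at_least(correct, trials, chance) < alpha:
            return correct
    return trials + 1  # the session is too short for any score to be significant


if __name__ == "__main__":
    trials = 20  # hypothetical session length
    needed = min_correct_to_reject(trials)
    print(f"Need at least {needed} of {trials} correct")
    print(f"Chance of guessing that well: {p_value_at_least(needed, trials):.4f}")
```

Under these assumptions a listener needs 15 of 20 correct before guessing becomes an implausible explanation. Anything less and you have failed to reject the null, which is not the same as showing the speakers sound alike.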

If you've ever taken a statistics course you know that you can never PROVE the null hypothesis (that there is no difference), only fail to reject it. When manufacturers say that their speakers sound just as good as other (usually much more expensive) speakers (the "tie" situation mentioned above), they are basically saying that they've proved the null. But why can't they say that? Here is a short list of reasons they could have erroneously failed to reject the null:

Speaker placement affected the test
Listeners were inexperienced
Participants didn't understand how to properly fill out the forms
The room affected the sound
The components affected the sound
Participants were tired/deaf/distracted/hungry
The test was run incorrectly
Alien mind control
Michael Bay (he ruined Transformers for me, why can't he ruin a DBT for you?)

And on and on. You can't prove the null because these sorts of tests do NOT tell you why. They don't tell you why the null was or wasn't rejected. Just that it was or wasn't.
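A rough way to see why a "tie" proves so little is to calculate how often a short test would miss a difference that really is there. The sketch below reuses the forced-choice assumption (guessing is right 50% of the time) and invents a listener who genuinely hears the difference on 65% of trials. That 65% figure, the trial counts, and the function names are all hypothetical.

```python
from math import comb


def tail_probability(min_correct: int, trials: int, p_correct: float) -> float:
    """Probability of at least `min_correct` right answers when each trial
    is answered correctly with probability `p_correct`."""
    return sum(
        comb(trials, k) * p_correct**k * (1 - p_correct) ** (trials - k)
        for k in range(min_correct, trials + 1)
    )


def power(trials: int, true_rate: float, alpha: float = 0.05, chance: float = 0.5) -> float:
    """Chance that the test rejects the null when the listener's real hit rate is `true_rate`."""
    # Passing score: the lowest score that guessing alone reaches less than `alpha` of the time.
    threshold = next(
        (c for c in range(trials + 1) if tail_probability(c, trials, chance) < alpha),
        None,
    )
    if threshold is None:
        return 0.0  # too few trials for any score to count as significant
    return tail_probability(threshold, trials, true_rate)


if __name__ == "__main__":
    for trials in (10, 20, 50, 200):  # hypothetical session lengths
        print(f"{trials:>3} trials: power = {power(trials, true_rate=0.65):.2f}")
```

With 20 trials, this hypothetical listener fails to reach significance roughly three times out of four even though the difference is audible to them. A "no difference found" result from a small test is therefore the expected outcome in many cases where a difference exists.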

We started off saying that Double Blind Tests have little place in audio testing. But it is the gold standard, isn't it? Sure. If you are trying to test things that are very, very similar. But that often isn't the case in audio. Especially with speakers. Do you need a double blind test to "prove" that the driving experience is different between a Honda and a Ferrari? No. Do you need a DBT to tell you that you don't like broccoli? No. You don't even need a Single Blind Test for any of these. The reason we say DBTs have little place in audio is because they are just too impracticable (yes, it's a word - look it up). If we have to wait for every piece of gear to be put through a rigorous DBT, we'll have a very limited amount of data on any gear. In almost all cases, a Single Blind Test (where the participants are in the dark about what they are testing but the experimenter knows) is more than enough. If a Single Blind Test isn't rejecting the null hypothesis (or is), and you think the problem may be the methodology, then you can go through the hassle (and expense) of a DBT.

Alternative Testing Methodologies

A while back we did a comparison of $1500-$2000 Floorstanding Loudspeakers. We urge you to re-read the method section of that report. That is about as good a Single Blind Test as you can do without specialized equipment. The participants reported after the test that they had no idea which speaker they were listening to at any time. Even the experts. That's what you want to hear. That way you're eliminating a huge source of bias. But will it always work?

Let's say that you're reading a report on a test run by a manufacturer. On the surface, it looks like they did everything right. The participants didn't know which speaker was which. The speakers were equalized. The room was treated. On and on. One potential pitfall with manufacturer-run tests is the participants. Even when they are run at a conference or some sort of audio event, the participants will create a problem. They will either be fans of the manufacturer's speakers (why else would they be there?) or they will be familiar with the sound of their gear (if there is a specific sound). Familiarity, in audio, correlates very highly with preference (we don't have stats on that but anecdotally, it seems to be true). If you have Manufacturer X's speakers in your home, you'll be familiar with that sound. In fact, you'll typically prefer it. So something that sounds different will immediately be classified as bad. Not always, but that tends to be the case. So when a manufacturer is setting up the test, the participants will often present a fairly significant source of bias.

An expert listener is an even harder nut to crack. An expert listener has the experience not to have (as much of) the familiarity bias. Instead, they have the ability to see through your test if you are not careful. It won't take them long to identify a speaker through its sonic signature. And if they do, your test is toast. If they've spent time with a speaker, they'll be able to pick it out regardless of what you do (sighted or blind). The way to combat that is to pick experts that aren't familiar with the brands in question. You also need to make sure that you switch the nomenclature used during the test (Speaker A isn't always the same speaker). Added protection would be to move the speakers during the test (switch locations if you are not using some sort of turntable arrangement for keeping all the speakers in the same location) and to lie about which speakers you are actually testing. The Audioholics team have been in blind tests of speakers known to us and we easily, and accurately, picked them out by how they sounded.
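The label-switching and position-swapping precautions described above can be sketched as a randomized trial schedule. This is only an illustration: the speaker names, the number of trials, and the `build_trial_schedule` helper are made up for the example, and the schedule is something the experimenter keeps hidden from the listeners.

```python
import random


def build_trial_schedule(speakers, num_trials, seed=None):
    """Return one dict per trial with a fresh label and position assignment.

    `speakers` is a list of up to 26 names. Listeners only ever see the
    letter labels, so "A" refers to a different speaker from trial to trial.
    """
    rng = random.Random(seed)  # a seed makes the schedule reproducible for the record
    labels = [chr(ord("A") + i) for i in range(len(speakers))]
    schedule = []
    for trial in range(1, num_trials + 1):
        shuffled = list(speakers)
        rng.shuffle(shuffled)  # which speaker hides behind each letter
        positions = list(range(1, len(speakers) + 1))
        rng.shuffle(positions)  # where each speaker physically sits
        schedule.append(
            {
                "trial": trial,
                "label_to_speaker": dict(zip(labels, shuffled)),
                "speaker_to_position": dict(zip(speakers, positions)),
            }
        )
    return schedule


if __name__ == "__main__":
    # Hypothetical speaker names, used only for illustration.
    for entry in build_trial_schedule(["Speaker X", "Speaker Y", "Speaker Z"], num_trials=4):
        print(entry)
```

Generating the schedule up front, rather than improvising during the session, also leaves a record that can be checked afterwards against the listeners' score sheets.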

Herein lies the rub for the manufacturer-run speaker tests. Even if we accept that they are doing everything they can to keep the test bias-free, the participants will always be a problem. If they use staff, they'll have familiarity bias and also run the risk that the staff can pick out the speakers. If they use lay people, they risk them self-selecting a speaker that they like. While this is not to imply that manufacturer-run tests are all biased and inaccurate, it is important to point out the possible limitations (something they are reluctant to do).

One More Time with the Null

We think it needs to be reiterated that you can't prove the null. All you can do is fail to reject it. So, when you hear, "Our speakers sound just as good as speakers 5x the price," or whatever, what they are saying is, "We proved the null! Yay us!" We could easily design a test where a particular speaker would always either win or tie. It isn't that hard. Just pick the right room. Let's say you have a speaker that is a little bright with a rolled off bass response. Well, if you conduct the test in a room that is well damped for the high end and is very active on the low, that speaker will sound much better (better, in this case, is defined as "more flat"). Any speaker without a similar sonic signature will sound, at best, very similar to the test speaker or, at worst, much, much worse than it actually does (or would in a flat room). So if you have a speaker with more bass and a flatter high end, the bass will sound bloated and boomy, and the high end will sound muted and lifeless. The same holds true for those that demo in an overly lively room. A room with reverberant characteristics will mask the good and bad qualities of a speaker, making it difficult or even impossible to single out the better performing one.
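A toy calculation shows how room tuning can flip the result. It assumes, as a first-order simplification, that a speaker's deviation from flat and the room's deviation from flat simply add in dB across three broad bands. Every number here is invented for illustration and comes from no measurement.

```python
# Hypothetical deviations from flat, in dB, for three broad frequency bands.
BANDS = ("bass", "midrange", "treble")

speakers = {
    "bright, thin speaker": [-4.0, 0.0, 3.0],    # rolled-off bass, hot treble
    "fuller, flatter speaker": [2.0, 0.0, 0.0],  # a bit more bass, flat treble
}
rooms = {
    "neutral room": [0.0, 0.0, 0.0],
    "tuned room": [4.0, 0.0, -3.0],  # active low end, heavily damped high end
}


def in_room_response(speaker_db, room_db):
    """Simplified model: speaker and room deviations add band by band."""
    return [s + r for s, r in zip(speaker_db, room_db)]


def spread_db(response_db):
    """Gap between the loudest and quietest band; 0 dB means perfectly flat."""
    return max(response_db) - min(response_db)


def rank_by_flatness(room_db):
    """Speaker names ordered from flattest to least flat in the given room."""
    return sorted(
        speakers,
        key=lambda name: spread_db(in_room_response(speakers[name], room_db)),
    )


if __name__ == "__main__":
    for room_name, room_db in rooms.items():
        print(room_name)
        for name in rank_by_flatness(room_db):
            response = in_room_response(speakers[name], room_db)
            detail = ", ".join(f"{b} {v:+.1f} dB" for b, v in zip(BANDS, response))
            print(f"  {name}: {detail} (spread {spread_db(response):.1f} dB)")
```

In the neutral room the fuller speaker is the flatter of the two. In the tuned room the bright speaker comes out flat while the other ends up 6 dB heavy in the bass and 3 dB down in the treble, which matches the "boomy and muted" outcome described above.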

And this is just by tuning the room. Add in participants that may have biases predisposing them to a particular sound and you've got a test that looks (on the surface) to be valid, but isn't.

Conclusion

Let's not forget that if you are buying speakers for yourself, the only opinion that counts is yours. You're the one that has to like the speakers. The point here is to show how a test, Double Blind or otherwise, can be skewed without looking skewed. It is also to remind you that finding no difference between two different speakers doesn't mean that the speakers are the same. It just means that they couldn't find a difference in that specific test on that specific day with those specific participants. In a separate test, they may very well find differences. But if it is the manufacturer that is running the test, you'll never hear about it. A bright or boomy speaker may, at first listen, sound impressive, but will tend to be fatiguing over extended listening sessions. Spending time with speakers in your own listening environment will be much more revealing than a few-minute demo comparison between a few competitors.
