Setting up a Speaker Shootout or Component Comparison the Right Way
Among the most popular articles we write on Audioholics are speaker and cable shootouts. Shootouts are really just a comparison of two or more products. This sounds like a fairly straightforward process where you place two competing products in the same room and take a listen/look. But the reality is that it is much more complicated. There are many ways that you can affect the outcome of a shootout: placement, accessories, even where you have your listeners sit. In this technical article, we'll explore many of these issues and give you some helpful hints on how to set up the most valid and fair comparison.
The first concept to address is the purpose of a comparison. Is the purpose of a comparison to determine absolute quality? No. A comparison is NOT about rendering judgment on absolute performance - it is about comparing two or more things. Through the comparison you ferret out differences. That is all. What changes a comparison into a shootout is that you ask one critical question at the end - "So, which did you like better?" That is an evaluation. That is a shootout.
When Audioholics does comparisons, we spend the vast majority of the write-up discussing differences. Basically this is because value judgments are really only useful to a handful of people. Our readers know their preferences. They can read the descriptions of a product (no matter how much we personally like a particular item) and decide if that is something that would interest them. Sure, at the end we give our evaluation but that isn't the meat of the article. The meat is the comparison - as it should be.
Picking the Products
The first step, as you might imagine, is to pick the items you'd like to compare. Are you interested in speakers, CD players, interconnects? You probably already have something in mind. Many enthusiasts have a particular product they'd like to compare (usually something they own or something they want to own) and a limited selection of comparison models. Maybe they want a new pair of speakers and want to know if the ones at the store are better (and more importantly are they a large enough improvement to justify the price) than the ones they currently own. In a more professional setting (like an Audioholics shootout), we decide what type of product and start to collect samples from manufacturers.
Once you've decided what you want to compare, you've got two different scenarios. If you are looking to upgrade (or just change) from one product to the other, MSRP doesn't matter. What matters is the amount of money the upgrade will cost you. This might just be the MSRP or it might be the MSRP minus the money you can get for your speakers on Craigslist or eBay. Sales, other discounts, taxes, and possibly shipping will also have to be considered in this calculation. Once you arrive at that number, you have a basis for determining if the performance increase is worth the expenditure.
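The arithmetic described above can be sketched in a few lines of Python. This is only a rough illustration: the function name, its parameters, and every dollar figure are hypothetical, and it assumes that sales tax applies to the discounted price and that shipping is untaxed, which may not hold where you live.

```python
def net_upgrade_cost(msrp, discount=0.0, tax_rate=0.0, shipping=0.0, resale_value=0.0):
    """Estimate the out-of-pocket cost of an upgrade (hypothetical helper).

    msrp          -- list price of the new product
    discount      -- sale or other discount taken off the MSRP
    tax_rate      -- sales tax as a fraction (0.07 means 7%), assumed to apply
                     to the discounted price only
    shipping      -- shipping charge, assumed untaxed
    resale_value  -- what you expect to get for your current gear
    """
    purchase_price = msrp - discount
    tax = purchase_price * tax_rate
    return purchase_price + tax + shipping - resale_value


# Made-up numbers: a $1000 pair on sale for $100 off, 7% tax, $40 shipping,
# and $300 recovered by selling the old speakers.
cost = net_upgrade_cost(msrp=1000.0, discount=100.0, tax_rate=0.07,
                        shipping=40.0, resale_value=300.0)
print(f"Net upgrade cost: ${cost:.2f}")
```

The resulting number, not the MSRP, is what you weigh against the improvement you hear.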
In a professional setting, MSRP is extremely important. Not street price, not b-stock price, not how much you can get them for on an auction or classified website. MSRP is the value the manufacturer places on their product. It tells the consumer what other products should perform similarly. Now, many manufacturers will set an MSRP on their product and immediately put it on "permanent sale." This is a marketing tactic designed to trick consumers into thinking they are getting a great deal on a high performing piece of equipment. Generally, the street price reflects the performance much more closely than the MSRP. In a comparison setting, you still want to compare it to products with similar MSRPs rather than similar street prices. Will the product compare unfavorably to others in its category? Probably. But there is nothing you can do about that. The big problem with using street prices is that they are malleable. What costs $100 on Black Friday costs $250 on sale the next day (MSRP $399). Do you use the Black Friday price or the regular sale price?
Some people love to throw "ringers" into their comparison. These usually take the form of a product that is either way above or way below the MSRP in question. I highly recommend that you don't do this. No one wins. If it is way above the MSRP and does well, no one is surprised and you've just wasted everyone's time. Not only that, but you've probably lowered the ratings of the other products because of the unfair comparison. On the flip side, if it is lower MSRP and does badly, you've artificially inflated the ratings of the rest and perhaps kept your observers from being more critical of the items that were lower performing than the rest but not nearly as bad as the ringer. Again, this is a waste of time. Now, if the low MSRP one does well or the high MSRP one does poorly, you've either just alienated the manufacturers of the rest (former) or the one (latter). No matter how you look at it, you lose. Don't do it.
Equalizing the products on price is only the first step. To reduce bias (for a more complete discussion please read my extensive discussion on the matter), you'll need to make as many of the variables the same as possible. Type, size, configuration… anything and everything that can be equalized should be. If you are comparing amps, you may want to select only based on MSRP. Of course, you could whittle that down and choose only digital (or Class A/B… etc.). For speakers, you could compare only bookshelf speakers, or you could limit by size, whether they are ported or not (or even location of the port), size of woofer… whatever you want. The danger of over-limiting is that you may limit yourself down to a very small sample size (i.e. there may only be a few products available that fit your requirements). The key is to limit as much as you can without making your comparison too narrow. Also, make sure that your limiting factors are "real." Is the shade of the button backlighting really going to make a difference in an amp's performance? If not, don't limit your comparison group based on that factor.
Consumers doing a comparison naturally do this, and they do it in a way that would be invalid in a professional setting but is perfectly valid for them. Their limiting factors are often price (not MSRP but what they have access to within their budget), availability, and looks, with the famous WAF (wife acceptance factor) often playing an overly large role in the decision. In a professional comparison, you need to be more systematic than that. Since a consumer only needs to identify the components they want to buy, it doesn't matter that they haven't equalized on type or MSRP. In a professional comparison, you are doing so for an audience. A consumer has an audience of one.
Validity is a touchy subject when choosing which items to include or exclude from your sample group. For a professional comparison, there are always people that will criticize your choices based on over or under limiting. The idea is to get the "big" things that will make the most difference. With speakers, no one would disagree that it is unfair to compare a bookshelf to a tower speaker. On the other hand, some would think it is valid to limit on woofer size or orientation. As a general rule, that'd be a much more "controversial" claim and could probably be excluded if you wish. Of course, if you have a bunch of speakers all with 6.5" drivers, perhaps you could. It's really up to you.
The next, and probably most important, thing to equalize is everything else. EVERYTHING. Whereas before I was suggesting that you make judgment calls based on the things that will make the most difference, here you need to be very careful. What you want to do is make sure that every other component of the comparison is as identical as possible. Use the same components, cables, cable lengths, similar placement, identical connection method, etc. Any deviation from an identical setup will bring down the wrath of the dissenters and, I might add, rightfully so. When you are doing a comparison, you want to ensure, as much as possible, that the differences can be attributed to the two items in question. If you hook up one pair of speakers with a mid-level receiver and the other with an external dedicated amp, people will cry foul. No matter how many times you reassure them that the amps weren't clipped and the speakers were played within their tolerances, there will always be the question of whether the differences heard were a function of the different amplification methods. It is just safer all the way around if you equalize everything.
On the other hand, if you are testing a component (like a CD player, amp, or receiver), you'll want to make sure you are using the same speakers, components, connection methods, etc. The idea is to isolate the items under consideration. Everything behind and in front of them in the chain should be the same. Even things like cable length (which no self-respecting engineer would suggest makes an audible difference) should be equalized as much as possible. Why give people fuel for the fire when you don't have to?
1) That's a good idea, but it brings up the question of how different-size speakers will interact and affect each other's sound in free space. Baffle step compensation is designed differently by different manufacturers.
I'd go with turntable. It's not hard to manufacture. Basically it's a large Lazy Susan with carpeted base.
Apparently the turntable method used at the labs is enough to avoid interference between the speakers, since the one that is off is turned 180 degrees to the rear.
2) Ehh, yes, if we want to be perfectly precise. How do we account for differences in off-axis FR between different speakers? It will affect SPL matching unless the reviewer is in an anechoic chamber.
Personally, I'd go with pink noise matched within 0.5 dB. Differences in FR between speakers are very obvious compared with those between amps, CD players and other electronics.
The point of level matching is not to make it harder to differentiate; it is to eliminate a volume difference that the listener may misconstrue as one speaker being better. You don't EQ the whole frequency band; you match the level, perhaps at one frequency, like 1 kHz. The other parameters are really what will tell one speaker apart from the other and make it stand out. That is the whole point: which is the better speaker.
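For anyone who wants to put numbers on that, here is a minimal Python sketch of the check. It assumes you have taken one SPL reading per speaker at the listening seat with the same test signal; the function names and the readings are made up for illustration, and the 0.5 dB window is simply the figure suggested in this thread. Note that the correction is a single broadband gain change, not EQ.

```python
TOLERANCE_DB = 0.5  # matching window suggested in this discussion


def is_level_matched(spl_a, spl_b, tolerance_db=TOLERANCE_DB):
    """True if two SPL readings (in dB) fall within the matching window."""
    return abs(spl_a - spl_b) <= tolerance_db


def gain_trim_db(spl_reference, spl_other):
    """Broadband gain change (dB) to apply to the other speaker's channel
    so that its reading lands on the reference speaker's reading."""
    return spl_reference - spl_other


# Hypothetical meter readings for two speakers playing the same signal.
speaker_a_spl = 78.3
speaker_b_spl = 76.9

if not is_level_matched(speaker_a_spl, speaker_b_spl):
    trim = gain_trim_db(speaker_a_spl, speaker_b_spl)
    print(f"Adjust speaker B by {trim:+.1f} dB, then measure again")
```

The trim only gets you close; measuring again after the adjustment is what confirms the match.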
3) I was being a bit sarcastic about it.
On another note, I think a reviewer should use real MLS measurements and not a Mickey Mouse analyzer.
What's wrong with Mickey Mouse?
They might be the best sounding speakers in the world.
But if they're ugly, they won't be going in my home. So yeah, there is a bias in sighted tests. But we are talking about comparisons, not strictly testing for performance.
Besides, if you do it right there will be a non-voting person controlling the a/b switch and not telling you which pair you're listening to.
Yes, but if you can see the speakers, I am sure you will have a good probability of knowing which one is on.
The visual impact should play a role, of course, but in my estimation only after your sonic evaluation; then see which has more weight for you.
Even with that switcher person not telling you, it still matters and can affect the bias responses. That is why DBT is used when it matters.
In this case though...
I've conducted a few comparos myself, and to be honest, the whole single-blind, double-blind stuff is totally unnecessary. When a test is set up properly, you'll find that speakers differ SO MUCH that you'll laugh at the notion of DBTs. ...
Go to town and be amazed how quickly and easily you can tell differences.
The issue is not whether there are sonic differences between speakers but where one's bias will take them: to the one that visually impresses and affects the sensory inputs or, in fact, to the one that one really prefers due only to the sound and nothing else.
If DBT were not necessary, even with speakers, research labs would not use it. After all, that is what they used to find out what people of all backgrounds prefer (flat frequency response, wide dispersion, etc.), and they demonstrated that without it, the other sensory inputs do a number on preferences.
You cannot, and I don't think the labs do, EQ the speakers so the levels are matched, but you do need to level match, and with speakers you perhaps need a very sensitive SPL meter capable of that resolution.
Come to think of it, matching voltage may not be enough with speakers, unlike the case where the speakers are the constant and the components are swapped; there, it is imperative to use a voltmeter, as the speaker will output the same level with the same input voltage.
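A rough Python sketch of why that is, under simplifying assumptions: it uses the standard 20·log10 voltage-to-dB relationship and treats each speaker's sensitivity as a single number referenced to 2.83 V, ignoring the room, off-axis response, and compression. The function names and the sensitivity figures are hypothetical.

```python
import math

REFERENCE_VOLTS = 2.83  # common reference voltage for sensitivity specs


def voltage_ratio_db(v_a, v_b):
    """Level difference in dB between two voltages across the same load."""
    return 20.0 * math.log10(v_a / v_b)


def estimated_spl(sensitivity_db, volts):
    """Estimated SPL for a speaker of the given sensitivity (dB at 2.83 V)."""
    return sensitivity_db + voltage_ratio_db(volts, REFERENCE_VOLTS)


def volts_for_target_spl(sensitivity_db, target_spl):
    """Drive voltage needed for a speaker to reach the target SPL."""
    return REFERENCE_VOLTS * 10 ** ((target_spl - sensitivity_db) / 20.0)


# Swapping electronics into the same speakers: matching the output voltage
# at the speaker terminals matches the level.
print(f"{voltage_ratio_db(2.83, 2.83):.2f} dB difference")

# Swapping speakers: the same voltage gives different levels when the
# sensitivities differ (made-up figures of 86 dB and 89 dB).
spl_a = estimated_spl(86.0, REFERENCE_VOLTS)
spl_b = estimated_spl(89.0, REFERENCE_VOLTS)
print(f"Same voltage, level difference: {spl_b - spl_a:.1f} dB")

# Voltage that would bring the more sensitive speaker down to match.
print(f"Drive speaker B at about {volts_for_target_spl(89.0, spl_a):.2f} V")
```

Since published sensitivity figures are only approximate, a reading with the SPL meter at the listening seat is still the final word for speakers.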
I made this post in another thread, but considering the discussion here I feel it is worthy of being copied:
They might be the best sounding speakers in the world.
But if they're ugly, they won't be going in my home.
So yeah, there is a bias in sighted tests. But we are talking about comparisons, not strictly testing for performance.
Besides, if you do it right there will be a non-voting person controlling the a/b switch and not telling you which pair you're listening to. The people listening just need to sit back, relax, close their eyes and listen. Once your ears pick out a particular pair you like, it will become very obvious when that pair is playing. It worked out quite well at my house last year when Gene ran the show. We came up with the top two bookshelves being the Status Acoustic Decimos and a pair of AV123 x-ls speakers. Definitely not in the same ballpark in appearance and price range. And for what it's worth, the worst performers were the SVS bookshelves.
Unfortunately the Decimos sounded so good I had to buy a pair. And at this point, I don't want to find anything that sounds better. My wallet couldn't stand it and my wife would kill me.
So what's the lesson to learn from this?
DON'T KEEP EXPENSIVE GEAR THAT YOU DON'T OWN IN YOUR HOME FOR VERY LONG.
YOU MIGHT END UP WANTING TO BUY IT.