Overview of Testing Methodologies
Reliability - More than what your girlfriend says you aren't
I briefly touched on reliability above. Why is reliability so important? When NASA engineers want to know the weight of the space shuttle so they can calculate how much fuel to load, it's pretty darn important that their scale is reliable. If it isn't, they might load too much fuel (a waste at best, a falling bomb at worst) or too little (nothing good can come of that). But what about audio?
Say you want to know if your new speakers are better than your old speakers. You set both pairs of speakers up on a switch, and fill the room with your family. You play the same material, and switch between the speakers, each time asking which set of speakers the family liked better. Systematic bias would be if one set of speakers were louder than the other (studies have shown that louder equates to better for most people). Random error would be if your house was close to an interstate, and on occasion, the noise from the vehicles interfered with the test.
But, you say, only the second scenario shows an unreliable test. As long as you always set the speakers up in the same way, the test would always be biased towards the louder speakers. True, and herein lies the true evil behind the unreliable test:
An unreliable test can either make differences harder to detect (random error), or make you think that differences exist when they don't, or don't exist when they do (systematic error).
That's right, true believers: a test fraught with systematic error will actually seem more reliable. You'll measure the same thing multiple times, and it'll come up the same every time. Reliable! Nope, because it's wrong. Give someone else the same ruler, and they'll get a different result, but theirs, too, will be the same every time. Give it to a third person, and they'll consistently get yet another result, different from the first two. Unreliable, but it appears reliable. Oooooh, evil!
But how does random error make differences harder to detect? Well, you've all seen survey results quoted as plus or minus some number of percentage points. That margin of error is random error: it's telling you that random error could shift the results by as many as X points in either direction. Basically, the larger the random error, the bigger that margin; the bigger the margin, the wider the range of plausible results; and the wider the range, the more likely it is that the range overlaps the "didn't detect a difference" realm. The only fix is to increase the number of measurements taken. By increasing the number of measurements, you slowly become more and more confident that the average of your measurements closely approximates the actual, true value. If an instrument were perfectly reliable, you could measure something once and be done with it (even carpenters measure twice, right?). As the reliability of the instrument decreases, the number of measurements you have to take to be confident you are close to the true value increases.
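You can see this averaging-out effect for yourself with a quick simulation. The sketch below (illustrative only; the "true weight" and noise figures are made up, not from any real instrument) models a scale whose readings wobble randomly around the true value, then shows that the typical error of the average shrinks as you take more readings:

```python
import random
import statistics

def avg_error(true_value, noise, n, trials=2000, seed=42):
    """Typical (average absolute) error of the mean of n noisy readings,
    estimated over many repeated trials."""
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        # Each reading = true value + random error (Gaussian noise).
        readings = [true_value + rng.gauss(0, noise) for _ in range(n)]
        errors.append(abs(statistics.mean(readings) - true_value))
    return statistics.mean(errors)

# Hypothetical scale: true weight 100.0, random error with std. dev. 5.0.
e1 = avg_error(100.0, 5.0, n=1)    # measure once
e25 = avg_error(100.0, 5.0, n=25)  # average 25 measurements
print(round(e1, 2), round(e25, 2))
```

Averaging 25 readings cuts the typical error to roughly a fifth of a single reading's (it shrinks about as one over the square root of the number of measurements), which is exactly why an unreliable instrument forces you to measure over and over.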
The Double-Blind Experiment - Finally!
Ahh, it's about time. What exactly is the double-blind method, and why is it so desirable? By now, you should be able to guess that the reason it is so desirable is that it is very good at controlling bias. The double-blind experiment is simply one in which both the participants and the researchers do not know who is in the experimental group. Aha! Clear as mud.
An experiment (in the truest sense) tests something (usually a theory or hypothesis). In a double-blind experiment, there are two groups of subjects (participants). One group gets the treatment, and the other doesn't. Generally, the subjects don't know whether they are receiving the treatment or not, but the researchers do: this is called a single-blind experiment. In the double-blind, the researchers don't know which group is which, either. This methodology is most often used in pharmaceutical research, so let's use that as an example:
Example 7: StickItToDaLittleGuy Inc. is a large pharmaceutical company that has developed a new drug to treat chronic headaches. The company recruits a large number of people with chronic headaches to be participants. A bunch of pills are created and put in either a blue or red bottle. One bottle contains the actual drug, while the other contains a sugar pill (called a placebo). Participants are randomly given either a red or blue bottle. They are instructed that, when they get a headache, they are to take two pills. If the pills haven't worked within 30 minutes, they are to take their normal medication. The company wants to know how many times the pills worked, how many times they didn't, and what, if any, side effects are experienced.
Basically, only one person, or a small group of people at the top, knows which bottles have the real medication. The people handing out the pills, collecting the data, analyzing the data, and taking the pills have no idea which is which. But why?
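The mechanics of that setup can be sketched in a few lines of Python. Everything here is illustrative (the names, codes, and structure are mine, not StickItToDaLittleGuy's actual procedure): participants get opaque bottle codes, the drug/placebo key is generated once and locked away, and nobody handling the data ever sees it:

```python
import random

def blind_assign(participants, seed=7):
    """Randomly assign coded bottles to participants.

    Hypothetical sketch of double-blind assignment: the returned `key`
    (code -> "drug" or "placebo") is held only by the people at the top;
    everyone else works purely from the opaque codes in `assignments`.
    """
    rng = random.Random(seed)
    key = {}          # bottle code -> contents; sealed until the study ends
    assignments = {}  # participant -> bottle code; all anyone else sees
    for i, person in enumerate(participants):
        code = f"bottle-{i:03d}"
        key[code] = rng.choice(["drug", "placebo"])
        assignments[person] = code
    return assignments, key

assignments, key = blind_assign(["Alice", "Bob", "Carol", "Dave"])
# Researchers record results against codes like this one...
print(assignments["Alice"])
# ...and the key is consulted only after all the data are collected.
```

The point of the design is that the decoding step happens last: results are tallied per bottle code first, and only then is the key opened to see which codes were the real drug.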
The placebo effect is a real phenomenon. When someone expects something, sometimes they will experience it, regardless of whether there was a real change or not.
Example 8: When I was in Junior College, waaaaaaaaay back in the day, I ran the light board for a production of King Lear. The lighting director loved to make little changes to the lights. Eventually, whenever he wanted me to bump the lights down or up a "hair", I'd just say, "How 'bout that?" a few times, and eventually he'd say, "Perfect." Of course, I hadn't touched the lights. But, he was happy and would have sworn on a stack that there was a change.
The double-blind test puts everyone in the same position: they don't know what to expect. Of course, after the participants take the drug for the first time, they form an opinion. The researchers and the people handing out the medication form opinions as they talk to the participants, or go through the data. But, you see the idea.
ABX is another term often bandied about the forums in the same breath as double-blind. ABX is not a methodology, per se, but a way of implementing the double-blind test for audio equipment (see http://home.provide.net/~djcarlst/abx.htm). The overview is that you plug the equipment into a box (or a computer, using the newest software version), and each person can switch between component one (A), component two (B), or a random selection of one of the two (X):
Example 9: So, say you are testing amps. Amp A is a $100 pro amp, while amp B is a $50,000 Krell. For each test, the participants can press: the A button, and hear amp A; the B button, and hear amp B; or, the X button to hear the randomly selected amp. The participants can switch between the buttons at will until they make their decision as to which amp X is. They write that down, then move on to the next test.
So, how is this double-blind? Well, the box (or program) decides which amp is played by the X button during each test, so neither the researchers nor participants know which amp is being played by button X until after the experiment.
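The logic of the box is simple enough to sketch in code. This is an illustrative simulation of the procedure, not the actual ABX comparator software: on each trial the program secretly picks X, the listener guesses, and the score is tallied at the end:

```python
import random

def run_abx_trials(n_trials, listener_guess, seed=1):
    """Simulate an ABX session.

    `listener_guess` is called with what the listener actually hears on
    X ("A" or "B") and returns their guess; a listener who can't tell the
    amps apart simply ignores it and guesses.
    """
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_trials):
        x = rng.choice(["A", "B"])  # the box picks X; no one in the room knows
        if listener_guess(x) == x:
            correct += 1
    return correct

# A listener who hears no difference just guesses at random:
guess_rng = random.Random(2)
score = run_abx_trials(16, lambda heard: guess_rng.choice(["A", "B"]))
```

Pure guessing lands near half right; a listener who genuinely hears a difference should score well above chance (for 16 trials, 12 or more correct has only about a 4% probability of happening by luck alone).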