Katz's Corner: The Great Headphone Shootout Part 2

The Headphone Slutz group meeting was fast approaching, and the demo cans would soon arrive from the Cable Company. I was excited! In a few weeks I would be able to hear and compare four of the world's best headphones, in my own room with the finest associated gear and music sources, and share that experience with seven friends.

I knew I would need a real good dynamic headphone amplifier, with a quality that could hold its own against my KGSS Stax amp. So after discussing headphone amps that I could afford with my good friend (and fellow columnist) Steve Guttenberg, I dropped my dimes on a new Burson Soloist headphone amplifier that I luckily found on eBay, sold by a private party at a discount. At the time the Burson arrived, the best headphone I could throw on it for auditioning was my Sennheiser HD 600, which I am sure is not in the same league as the phones I would soon be testing and hopefully buying. I listened briefly to the Sennheiser on the Burson, and reached no more conclusion than that it produced music. I longed for the robust quality of the Audezes that were coming.

The build quality of the all-discrete Burson Soloist is impeccable: it's solid, impressive looking, and its interior layout is very clean. Before measurement, as is my wont, I cleaned, preserved, and enhanced all internal and external contacts with Stabilant 22. The Soloist can be used as a three-input unbalanced line preamp with three output gain settings; when the phones are plugged in, the line output is defeated. Its custom-built switched analog attenuator has a solid feel. There's something about a stepped pure-analog attenuator that feels reassuring, and I believe it has less distortion than traditional potentiometers and digitally-controlled analog ICs (such as MDACs).

The published specs of the Burson are skimpy, to say the least. The only specification they give is the wattage capability with a "1 kHz line level input" (whatever that means) with "30 ohm output impedance" (whatever that means). Off the bat this is two mistakes: the first is not to specify the source voltage and the second is to confuse "output impedance" with "load impedance". Of course they meant "load impedance" but still they lose points for imprecision of language. Without knowing these, the purchaser is in the dark—all he has is a vague idea of the power output capability of the amplifier. Actual output impedance is not given, nor is the voltage gain. Three wattage levels are specified: 0.18 W, 0.7 W, and 2 W at the three different gain settings, but without giving a source voltage or the position of the volume control these specifications are specious and incomplete. A knowledgeable engineer could produce any of those wattages or any variation in between at any of the gain settings by using a different source level, assuming the amplifier doesn't go into clipping. So every amplifier manufacturer should specify the voltage gains, as well as the output capability, at a specified distortion.

So, I knew almost nothing about the technical capabilities of this amplifier except that it could probably produce 2 watts, probably into a 30 ohm load. While waiting for the Audeze phones to arrive, and to ensure that all was in order with this new amplifier, I performed measurements on the Burson into a 20 ohm load, a 110 ohm load, and a high impedance, representing a range of typical headphones. (Please let me know in the comments if you'd like to see even more comprehensive technical measurements in my column. I can perform them, but the demand from the public for equipment measurements seems to be very low.) In general I've noticed a lack of published specifications in audiophile gear, with some exceptions. Even though a specification such as THD does not seem to correlate with sound quality once it is below a certain percentage, at least we can observe the voltage level where the device clips or begins to produce a high THD, assess a headphone amp's output impedance, and measure frequency response to confirm the most basic performance parameters.

Burson Soloist Measurements
The first measurement was the steps and tracking of the 24-step attenuator (volume control). Burson has constructed a rugged rotary switch with carefully-selected metal-film resistors. The measurements show diligent resistor matching, with a remarkable 0.1 dB or less difference between channels at most levels of attenuation. The steps (changes from position to position) have been sensibly chosen, with the highest volume steps being 0.3 dB, 0.3, 0.55, 0.65, and gradually growing as volume reduces to between 2 and 3 dB per step. Larger steps are only employed at the lowest volume control positions. Voltage gain to the line outputs is the same as to the headphones. At the top of the volume control (0 dB attenuation), the maximum voltage gain is 11.55 dB, lowering to 7.3 dB and 1.3 dB at the other two gain positions (measured at 1 kHz). This is enough gain to drive any current dynamic headphone to deafening SPLs with any typical source. I advise listeners to stick with the lower gain unless necessary, and to start with the volume control turned down and carefully turn it up "enough to satisfy".
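To make the point about gain and source level concrete, here is a minimal sketch of the arithmetic. It uses the three measured gains quoted above, but the 2 V RMS source, the 30 ohm load, and the "high/medium/low" labels are assumptions chosen for illustration; the published specs do not state a source voltage.

```python
# Measured full-volume voltage gains quoted in the text, in dB.
# The labels "high", "medium", "low" are illustrative names only.
GAINS_DB = {"high": 11.55, "medium": 7.3, "low": 1.3}

# Assumptions for this sketch (not stated by the manufacturer):
SOURCE_V_RMS = 2.0   # hypothetical line-level source
LOAD_OHMS = 30.0     # load value taken from the spec sheet wording


def db_to_voltage_ratio(gain_db):
    """Convert a voltage gain in dB to a linear voltage ratio."""
    return 10 ** (gain_db / 20)


def output_power_watts(source_v_rms, gain_db, load_ohms):
    """Power into a resistive load, ignoring clipping and output impedance."""
    v_out = source_v_rms * db_to_voltage_ratio(gain_db)
    return v_out ** 2 / load_ohms


powers = {name: output_power_watts(SOURCE_V_RMS, gain, LOAD_OHMS)
          for name, gain in GAINS_DB.items()}

for name, watts in powers.items():
    print(f"{name} gain: {watts:.2f} W")
```

Under these assumed conditions the three results come out at roughly 1.9 W, 0.7 W, and 0.18 W. Change the assumed source voltage and every figure changes with it, which is why a wattage rating without a stated source level and gain says so little.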

Next, I measured the clipping point of the Burson at 1 kHz using an oscilloscope. Visually assessing clipping is a rough, subjective interpretation, but it gives a quick idea that the amplifier is working and roughly producing its specified output. Into an open circuit, the amp clips symmetrically at about 9.49 volts RMS, which demonstrates an excellent power supply voltage, no skimping there. Into 110 ohms (the load of an LCD-3) the clipping point is the same, equal to 819 mW or 0.8 watts. Into 20 ohms (the load of an LCD-X) it clips around 8.00 volts, which is a hefty 3.2 watts. On the scope the waveform seems to soft clip, which I think is a good sign. As you will see, I should have looked for clipping at low frequencies as well. Output impedance (calculated with a load drop) at 1 kHz is less than 1 ohm, another good sign.
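The power figures above follow from P = V²/R, and a load-drop output impedance estimate treats the amplifier as an ideal source in series with its output impedance. The sketch below reproduces the two power numbers from the text; the two voltages used for the load-drop example are hypothetical readings, not measurements from this article, and such readings would need to be taken well below clipping.

```python
def power_watts(v_rms, load_ohms):
    """Power dissipated in a resistive load for a given RMS voltage."""
    return v_rms ** 2 / load_ohms


def output_impedance_ohms(v_open, v_loaded, load_ohms):
    """Load-drop estimate: source impedance and load form a voltage divider."""
    return load_ohms * (v_open / v_loaded - 1)


# Clipping voltages quoted in the text.
p_110 = power_watts(9.49, 110.0)   # about 0.82 W
p_20 = power_watts(8.00, 20.0)     # 3.2 W

# Hypothetical small-signal readings, for illustration only.
z_out = output_impedance_ohms(v_open=1.000, v_loaded=0.960, load_ohms=20.0)

print(f"{p_110:.3f} W into 110 ohms, {p_20:.1f} W into 20 ohms")
print(f"Estimated output impedance: {z_out:.2f} ohms")
```

With the hypothetical readings shown, a 4% drop into 20 ohms corresponds to an output impedance a little under 1 ohm.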

Long before visual scope clipping, distortion components rise as the level approaches clipping. I like to calibrate the vertical scale of a harmonic distortion graph of a headphone amp in equivalent SPL, which gives us a tangible idea of the amplifier's real-world capabilities. Using the LCD-X as an example, its rated load impedance at 1 kHz (not measured) is 20 ohm, and its sensitivity is rated as 103 dB SPL at 1 mW, which is only 141 mV. I like my forte passages to be about 83 dB average SPL per channel; leaving 20 dB (or more) headroom for short term peaks (transients), that would come out to 103 dB on peaks. On occasion, with large orchestral or epic rock pieces I may try out a headphone circa 90 dB average on forte passages, but only for very brief periods, or I would go deaf. Again, leaving 20 dB amplifier headroom for occasional peaks, this means that to get 110 dB peak SPL, the maximum amplifier voltage needed for the LCD-X would be 317 mV, or only 5 mW into 20 ohms. So why this craze for super headphone amplifier power?
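The conversion from a target SPL to amplifier voltage and power can be sketched as follows. It assumes, as the text does, a purely resistive 20 ohm load and the rated sensitivity of 103 dB SPL at 1 mW; the function names are illustrative.

```python
import math

SENSITIVITY_DB_AT_1MW = 103.0  # LCD-X rating quoted in the text
LOAD_OHMS = 20.0               # rated impedance, treated as resistive


def power_mw_for_spl(spl_db):
    """Power in milliwatts needed to reach the given SPL."""
    return 10 ** ((spl_db - SENSITIVITY_DB_AT_1MW) / 10)


def voltage_rms_for_spl(spl_db):
    """RMS voltage across the load needed to reach the given SPL."""
    power_w = power_mw_for_spl(spl_db) / 1000.0
    return math.sqrt(power_w * LOAD_OHMS)


for spl in (103.0, 110.0):
    print(f"{spl:.0f} dB SPL: {power_mw_for_spl(spl):.2f} mW, "
          f"{voltage_rms_for_spl(spl) * 1000:.0f} mV RMS")
```

This gives about 141 mV for 103 dB SPL and about 317 mV (5 mW) for 110 dB SPL, in line with the figures above.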

The best answer I can give is that superior headroom seems to translate into better sound. In other words, although in reality we aren't requiring that amount of peak power, it seems that the capability to do it is necessary for good sound quality. There are other considerations, such as using well-designed Class A discrete amplifier stages as opposed to integrated circuits, power supply design, topology differences, but even then, as you will see, when my listening panel compared the Burson to other amplifiers which also use discrete components and have high headroom, sonically the Burson came out on top.

Back to the Burson measurements. I used Spectrafoo and Room EQ Wizard to generate the graphs, calibrating SPL to the published rating of an LCD-X at 1 kHz. All SPL readings below are the equivalent of the measured voltage, no headphones were driven or harmed for these amplifier measurements! My Prism USB interface with an external resistive attenuator permitted measuring up to a maximum voltage equivalent to 136.2 dB SPL (6.48 V, 2 W in 20 ohm) before overloading the Prism's ADC. Crazy high and unrealistic! Although I quickly checked the performance at 1 kHz at this extreme, I wasn't surprised to start seeing the harmonic content go to unrealistic levels into a tough load like 20 ohms. Anyway, let's consider 110 dB SPL to be the fair maximum level for harmonic distortion and frequency response measurements.
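For readers who want to check the "equivalent SPL" calibration, this is a minimal sketch of the reverse conversion, from a measured voltage to the SPL an LCD-X would nominally produce. It assumes the rated 103 dB SPL at 1 mW and a resistive 20 ohm load; the function name is illustrative.

```python
import math

SENSITIVITY_DB_AT_1MW = 103.0  # LCD-X rating quoted in the text
LOAD_OHMS = 20.0               # rated impedance, treated as resistive


def equivalent_spl_db(v_rms):
    """Equivalent SPL for an RMS voltage across the nominal load."""
    power_mw = v_rms ** 2 / LOAD_OHMS * 1000.0
    return SENSITIVITY_DB_AT_1MW + 10 * math.log10(power_mw)


spl_at_adc_limit = equivalent_spl_db(6.48)           # measurement ceiling
spl_at_1mw = equivalent_spl_db(math.sqrt(0.020))     # 1 mW reference point

print(f"6.48 V -> {spl_at_adc_limit:.1f} dB SPL equivalent")
print(f"1 mW   -> {spl_at_1mw:.1f} dB SPL equivalent")
```

The 6.48 V ceiling works out to about 136.2 dB SPL equivalent, matching the figure quoted for the measurement setup.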


Fig. 1) Graph of the harmonic components of a 1 kHz signal at the equivalent of 110 dB SPL.

Most of the visible spikes are the residual noise of the Burson, which are all components of the 60 Hz power line frequency. There is a very small trace of second harmonic at a level equal to 5 dB SPL, so it's surely inaudible, and an even smaller trace of the third harmonic. So for all intents and purposes there is no audible harmonic distortion at 110 dB SPL at 1 kHz, excellent performance.


Fig. 2) 20 Hz signal at 110 dB SPL.

With this signal there may be a trace of second harmonic, but in reality the only visible spikes are components of the power line. You may ask whether this measured noise could be a problem with my interface or interfacing (grounding). I doubt it, as Fig. 3 shows.


Fig. 3) Noise in Burson amp with power on (blue) and off (red).

With the sensitive Audeze LCD-X phones the Burson has a completely silent noise floor, with no hiss or hum to be heard. Measurements show, in red, the noise floor with the cables connected and the Burson switched off, and in blue with it switched on. There are only four spikes in red which could be due to interfacing or grounding issues in my measurement setup; the rest is the noise of the amplifier itself. But be aware the highest spike in blue is 120 Hz at only 14 dB SPL; that is the residual hum from the Burson's power supply, about 6 dB below the ear's threshold of hearing at that frequency. The 180 Hz spike in blue is caused by induction, probably from the power transformer, again close to the ear's threshold in a quiet room. During listening I never heard anything but a completely silent noise floor from the Burson, so if these noise components are there, they are way below normal hearing even in a quiet room. And headphone listening is far more sensitive than loudspeaker listening to any hum or noise because of the proximity of the drivers.

I did discover an anomaly that is a bit disconcerting: The Burson Soloist unit I purchased clips (severely distorts) at low frequencies (below 50 Hz) at levels above 126 dB SPL (above about 2 volts) into 20 ohms, or 200 mW, about 6 volts below its clipping point at higher frequencies. The low frequency response starts dropping radically at this level and the THD goes through the roof, a definite misbehavior. Other headphone amps I measured did not exhibit this anomaly, so my test gear was not the culprit. This issue manifested itself identically in both channels. It was also the case into higher impedance loads. I'll be happy to check another unit to confirm whether this is peculiar to mine. But to put this into perspective, 126 dB SPL is a deafening level, so I think we are safe to say that in the real world, no one will operate near this level and the Burson will probably perform very well!


Fig. 4) The performance is wonderful at 126 dB SPL.

At 126 dB SPL frequency response is ruler flat from 10 Hz to 20 kHz, with total harmonic distortion at 33 dB SPL for the majority of the spectrum, equivalent to 0.002%.
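Because the graphs are calibrated in equivalent SPL, a distortion level in dB SPL can be turned into a percentage by taking its distance below the fundamental. A minimal sketch, using the levels quoted in the text (the function name is illustrative):

```python
def distortion_percent(fundamental_db_spl, distortion_db_spl):
    """Distortion amplitude as a percentage of the fundamental."""
    relative_db = distortion_db_spl - fundamental_db_spl
    return 100 * 10 ** (relative_db / 20)


# THD at 33 dB SPL with the fundamental at 126 dB SPL (Fig. 4).
thd_at_126 = distortion_percent(126.0, 33.0)

# Second harmonic at 5 dB SPL with the fundamental at 110 dB SPL (Fig. 1).
h2_at_110 = distortion_percent(110.0, 5.0)

print(f"THD at 126 dB SPL: {thd_at_126:.4f}%")
print(f"2nd harmonic at 110 dB SPL: {h2_at_110:.5f}%")
```

A distortion product 93 dB below the fundamental is about 0.002%, and one 105 dB below is under 0.001%.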


Fig. 5) Low frequency anomaly suddenly appears when driven to 127 dB SPL.

So we can conclude that below its low frequency clipping point, the Burson performs impeccably, although, at least in the unit which I own, it exhibits measurable problems at high levels and low frequencies. This could be a power supply-related issue or a lack of negative feedback below a certain frequency; I cannot say for sure.

Benchmark and Prism Headphone Amplifier Measurements
I measured two other dynamic headphone amplifiers. First, the one built into a Benchmark DAC-1, which has always been a respectable, solid DAC for me. In fact, I was one of the first to discover the Benchmark DAC-1's virtues and wrote an early review in Pro Audio Review magazine which single-handedly provoked the Benchmark craze amongst audiophiles and professionals. Never having given its headphone amp much thought, I was quite surprised to find measurement-wise the Benchmark smokes the Burson.

At an equivalent 127 dB SPL I could not see any anomalies in the Benchmark's readings. Benchmark considers its HPA-2 discrete headphone driver to be a "premium" amplifier—in professional circles it's rare to find such an amp built into a DAC, though in the audiophile world we expect to find one in any DAC equipped with a headphone jack. I measured about 0.002% THD for the majority of the frequency spectrum at an equivalent of 125 dB SPL for the LCD-X with a 20 ohm load. I measured a near zero output impedance so it can drive most any headphone you can throw at it.

Surprisingly, I cannot say the same for the headphone jack of the best-sounding DAC that I own, the Prism Lyra-2, a rival for the best DAC in the world. But I would not consider its headphone amplifier to be "premium" based on output impedance alone. Its output impedance is about 80 ohms, so I doubt it will damp dynamic headphones as well as amplifiers with a lower drive impedance. At maximum headphone gain into 20 ohms with a -20 dBFS RMS 1 kHz sine wave the Prism can "only" produce 165 mV, equivalent to 1.3 mW or about 104 dB SPL, which is still very loud. Its THD measures below a very low 0.001% for the majority of the spectrum. We'll see how it performs in listening tests at the Headphone Slutz get-together, described in parts 3 and 4 of this series.
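As a rough illustration of why output impedance matters here, the sketch below computes the damping factor (load impedance divided by source impedance) and the level lost across the source impedance for an 80 ohm output driving a 20 ohm load. It treats the headphone as a pure resistance, which is a simplification, and the 1 ohm comparison value is only the upper bound quoted earlier for the Burson.

```python
import math


def damping_factor(load_ohms, source_ohms):
    """Ratio of load impedance to source (output) impedance."""
    return load_ohms / source_ohms


def divider_loss_db(load_ohms, source_ohms):
    """Level at the load relative to the unloaded source voltage, in dB."""
    return 20 * math.log10(load_ohms / (load_ohms + source_ohms))


LOAD_OHMS = 20.0

df_prism = damping_factor(LOAD_OHMS, 80.0)      # about 80 ohm output
df_one_ohm = damping_factor(LOAD_OHMS, 1.0)     # 1 ohm upper-bound case
loss_prism = divider_loss_db(LOAD_OHMS, 80.0)
loss_one_ohm = divider_loss_db(LOAD_OHMS, 1.0)

# Power from the measured 165 mV into 20 ohms, in milliwatts.
prism_power_mw = 0.165 ** 2 / LOAD_OHMS * 1000.0

print(f"Damping factor: {df_prism:.2f} vs {df_one_ohm:.0f}")
print(f"Divider loss: {loss_prism:.1f} dB vs {loss_one_ohm:.2f} dB")
print(f"165 mV into 20 ohms: {prism_power_mw:.2f} mW")
```

With an 80 ohm source, most of the unloaded voltage is dropped inside the amplifier rather than across a 20 ohm headphone, and the damping factor falls below 1.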

ashutoshp's picture

Hi Bob. We, and by we I imply portable device enthusiasts, NEED MORE MEASUREMENTS. Tyll and Changstar publish quite a few headphone measurements that are very useful but RARELY do I find any measurements of portable or desktop amps (Tyll has promised this but I'm still waiting ;). And surely there aren't that many headphones coming down the pike. Considering the ubiquity of headphone amps, I am certain the value of measurements will not go amiss. I hope Tyll realizes this because he has some of the best writers already in John Grandberg and co. Innerfidelity is a treasure and this will only serve to strengthen its status.
FYI, I actually decided to not buy the Burson because of the dodgy specs. But this behaviour is rampant and I find it annoying because while I do believe listening is the key, I don't have the time or money to experiment with several different amps before settling with one (assuming I was able to get them level-bloody matched). 2 or 3 general reviews with measurements is enough to seal the deal.
I have read almost everything you have written on your (very helpful) website as well as the (just?) 3 articles on Stereophile and I'm a fan, to say the least. Here's to more. One question: in your Audio FAQs under the title "Challenge from a Rap and Hip-Hop Engineer", what was the outcome of the bet you placed with the engineer?

Bob Katz's picture

Dear ashutoshp.... Now that's a hard name to pronounce! Thanks for your kind words. If Tyll permits me, I'll continue this fun blog. In that case if he loans me any headphone amplifiers that exceed his own ability to review, I'll give them a pretty good set of measurements. I do not have a simulated headphone load, however. I have a good 10 watt 20 ohm and 10 watt 100 ohm non-inductive load and at least that will tell us something. I can measure the basics (THD, frequency response), look at waveforms, and most important, listen. It's up to Tyll. He has a limited budget to run this site, and I have limited time, but I'm happy to help. I fully agree that Innerfidelity is lacking an amplifier measurements section. There are only so many hours in the day! I also now own only one high end/low impedance headphone, the LCD-X, and that's not enough to do a good subjective review. So what are we going to do about that? I can't afford to buy any more premium phones right now!

Bob Katz's picture

I'm impressed you read that post at digido.com! http://www.digido.com/audio-faq.html?option=com_fsf&Itemid=93&view=faq&c...

A lot of water has gone under the dam. We now have an objective way to measure micro and macro dynamics. Macro dynamics is measured using the BS.1770/EBU R128 measure called LRA. And a new proposed measure called LDR measures microdynamics. I think the short term attack of the cherry bomb sounds wimpy when it's been overcompressed. It's a matter of balancing transients to the macro power and thrust. Not enough of either and it doesn't sound as good. But at the stoplight only the low frequency power and punch of the cherry bomb will come through, the microdynamics will not. So microdynamics are more or less reserved for high fidelity listening. So the whole explanation I gave back then has been updated in the third edition of my book and it takes more to talk about than what I answered back in that FAQ! Does that help?

ashutoshp's picture

I'm actually concerned about the status of (properly) reviewing audio equipment once these two gentle folks decide to seek pastures anew. While I think/hope reviewers are consistent in the way they describe sound signatures, simple measurements to ensure that the amplifier meets basic engineering criteria are VERY USEFUL.

logscool's picture

I too cannot wait to see more amp measurements and hopefully something like what Tyll has put together as a database for headphones for amplifiers and some good descriptions of what every measurement means.

However, I think there is also a lot of value in the kind of measure-as-you-go review style that Bob seems to be using: when you notice a problem with a particular aspect of an amplifier, you can conduct further tests to look more closely at that specific problem and how the amplifier misbehaves in this area, and compare it to other amplifiers to provide some context on how an amplifier that performs well in this area looks.

castleofargh's picture

from my point of view, nobody should buy an audio product that doesn't have extensive well expressed measurements. that would unlock the situation pretty fast I believe. money's the only language some people can talk.

you mention 110db being enough for proper headroom, while from my own trials 100 or 105db are really good for me even on some dynamic classical, when people ask me about a source driving a headphone, I tend to use 115db as the "all clear" value. because that's what I was told when I learned about those stuff. do you think I can go down to 110db for everybody? (not that it matters too much, but that's almost half the max voltage and at least for portable gears, those limits can be reached).

it feels like people go for super power amps because they don't understand anything and get afraid by all the silly talks about "my hd800 is so hard to drive" and other nonsense where guys so often mistake coloration for driving capability.
this article without detailing everything, gives a few examples of headphones and what is really needed to drive them to a certain loudness directly confronted to what the amp will output. I believe that's the kind of information most people really want instead of how to calculate stuff themselves(because that means learning and then they go "don't teach me anything, this is a hobby"). :'(

Bob Katz's picture

Dear.... I agree that speccing an amp to be able to produce 110 dB SPL in a given headphone is an extreme. But I picked it as going at least 10 dB beyond just to be safe. To my experience (despite claims to the contrary by a user in this thread), an amplifier that clips very close to the maximum SPL you want to achieve usually doesn't sound as good as one with more headroom. All I can do is claim this to be the case, I haven't got any objective double blind listening tests on that! So, take it or leave it, that's my opinion. It is based on years of critical listening experience, if that's worth anything to you.

castleofargh's picture

well I don't think anybody is against headroom, I believe what xnor is arguing against is the general idea that more power is always better with no limit to how much more can in fact become a problem.
there has to be a point where more is too much else we would just use speaker amps and be happy with that once adding a voltage divider.
anyway, it's been discussed, I was just wondering about how loud I should look at when estimating my needs for a headphone. from what you say I'll stay in the 110/115db zone and that's ok. so thank you for answering.

Tyll keep those guys coming!!!!!!!!!!!!!! I'm really enjoying myself reading innerfidelity those days. ;-)

tony's picture

I'm glad you measured the Burson to discover if it works properly. Good things are said about the Company and its products but it's hard to tell…. especially a $560 used amp on ebay.
Musical qualities are another matter ( I suspect ). I've owned scads of Amps with only a small number seeming to be Virtuoso performers (Electrocompaniet).
I've seen the "Tweaks" cook up modified Amps that are astonishing musical performers ( that tend to go "out-of-tune" with the weather ).
Bruce Brisson (MIT) & Karen Sumner Cables (Transparent), Koetsu phono cartridges are another range of products that astonish but can they be measured?

Gotta start somewhere though and a solid example of a working amp is as good a place as any.

It's interesting to read about how you work and think.

Stabilant & Cramolin are secrets, nice of you to mention this.

Tony in Michigan

ps. someone on HeadFi just completed a report on Tube Rolling.
He's using a first Gen. Schiit Lyr as the test bed amp, he reports wondrous variations and success from varied tubes. There are many ways to exercise neurosis & psychosis in this hobby.

Hifihedgehog's picture

I can send some DIY portable amplifiers for measurement if he wants to take on this task next. But like they say, one thing at a time. I can see this push for headphone measurements chewing up Tyll's time for the next six months to a year easily. But who knows? Maybe Tyll has the time and wants to do this too? I sure hope he can soon! For now, I just keep eyeing the datasheet section daily for updates.

Bob Katz's picture

I'd be very interested in doing a shootout of your DIY amp rated at at least 2 watts with less than 0.1% THD into any load down to 20 ohms. A shootout against my Burson. I could tell you why that interests me very much, and the general overall reactions to the Burson, the Schiit, the Benchmark and the Prism by myself and the rest of the listening panel, but that would spoil the suspense, so we're gonna have to wait.

I'm keeping track of all the nice comments we've seen here, including some opposing views. As the episodes proceed, we'll be able to return to these important topics once we approach the conclusion of the great headphone (and amp) shootout!

xnor's picture

It's simply not true that more power means better sound. You can try this yourself in a blind test with monoblocks vs. an amp with similar performance but only a tiny fraction of the power output.

When you want to reach 110 dB SPL with a full-scale sine wave and only need 5 mW, then the 3200 mW available will not do any good to sound quality. (In fact, it would likely be detrimental to sound quality if the high gain needed to achieve this power output was fixed.)

You may want to add 3 dB to that 5 mW power requirement (=> 10 mW) in case you want to play full-scale square waves cleanly, and maybe a dB or two (=> 15 mW) before reaching the clipping point, but everything beyond those ~15 mW will not be used, especially not with an almost purely resistive load like the headphone you used.
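The power budget in this comment can be checked with a few lines of arithmetic. This is only a sketch of the commenter's reasoning: the 5 mW starting point comes from the article, the 3 dB allowance reflects that a square wave has twice the power of a sine wave with the same peak, and the 2 dB margin is the upper end of "a dB or two".

```python
import math


def add_db_to_power(power_mw, db):
    """Scale a power value by a number of decibels."""
    return power_mw * 10 ** (db / 10)


SINE_PEAK_POWER_MW = 5.0  # 110 dB SPL into the LCD-X, from the article

# Square wave vs. sine with equal peak voltage: power ratio of 2.
square_vs_sine_db = 10 * math.log10(2)

with_square_wave = add_db_to_power(SINE_PEAK_POWER_MW, 3.0)
with_margin = add_db_to_power(with_square_wave, 2.0)

print(f"Square-wave allowance: {square_vs_sine_db:.2f} dB")
print(f"After +3 dB: {with_square_wave:.1f} mW")
print(f"After a further +2 dB: {with_margin:.1f} mW")
```

The result is roughly 10 mW and then about 16 mW, consistent with the commenter's rounded figures.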

Bob Katz's picture

I don't think either you or I know where the magic fully lies. Until we have a good handle on what makes a real good sounding amp we have to agree to disagree. I still believe that headroom is an important ingredient in an amplifier. It may not be that the CAPABILITY to produce high output is the direct influence on the amplifier's sound quality. It may be a secondary cause or influence. I do recall a loudspeaker power amplifier which was fully class A and only rated at 25 watts and realizing that this manufacturer's CLASS A 25 watts was "worth" more sonically than another manufacturer's CLASS AB 100 watts. But ultimately it is the subjective judgment. In an upcoming episode I will give you my comparison of an economical $100 headphone amplifier whose reported wattage is in the same ballpark as the $1000 amplifier. Then we can fight it out :-). I don't know enough but I know what I like so far, and headroom seems to correlate with sound quality to my experience. If you can supply me a 200 mV capable power amp that can drive that into 20 ohms and sounds great to both you and me then I will stand corrected.

xnor's picture

I think we have a good idea of many factors that can have a huge influence on how we judge products. I'm sure you've read Sean Olive's article.

I'm not here to argue with your subjective impressions. Those are yours and only yours, and others might share similar impressions or not. That's why I think that such parts of reviews should be prefixed with an appropriate explanation.
It's a bit like arguing how you perceive the color of different objects. I'm sure you have heard of "the dress". We can measure the colors on the picture (a gold and blueish white), but different people will perceive it differently.
Or even something as trivial as prescribing inert tablets. On some it has a negative effect, on some it has a positive effect, and on others it has none.

Subjective impressions aside, an important question is, would you notice a cheap (but with similar performance within these few milliwatts) headphone amplifier module being transplanted into your preferred headphone amp?
In order to make a correct universal statement about sound quality, we need to make appropriate tests. Sighted (where you receive clues about which amplifier you're listening to) tests are demonstrably inappropriate for that.

On the measurement side, we certainly lack comprehensiveness and consistency. One thing of interest would be crossover distortion measurements for headphone amplifiers.

Bob Katz's picture

Dear Xnor: Of course I am familiar with the situation regarding blind listening tests and listener prejudice. But the fact is, as I mentioned in episode one, it is impossible to objectively match levels when judging different headphones. When judging different amplifiers it is of course possible. As the episodes go on you will hear about the subjective judgments of 8 experienced audio professionals about the amplifiers under test and you can weigh in again with your reactions at that time.

xnor's picture

Sure, with headphones it is virtually impossible.

But my remark about subjective impressions still stands. It's just paragraphs like the one about more power = better, classes/topologies, ICs are bad etc. that trigger a warning sign about typical audiophile prejudices and the effects it can have on subjective evaluations.

One more remark about the headroom. This is not a case where you need 10W average power to drive a speaker and 1000W to reproduce 20 dB peaks cleanly.
In this case the <10 mW are already the power needed for the peaks, and the average power will be 0.x to 0.0x mW with similarly low volts. Many headphones just don't need much power, they just need clean power.

It really would be great if you could measure distortion with whatever amps you're testing at low levels.
“The first watt is the most important watt.” - Dick Olsher
I say: "The first milliwatt is the most important milliwatt." ;-)

Bob Katz's picture

We all have to make our beds and lie in them. As far as the milliwatt-level measurements... As you can see, the noise floor of the Burson is in the milliwatts. Not shown is that the Burson is very linear up to where it goes crazy... its THD quickly goes well below 0.001% at low levels. So the answer to your question is that if you drop the input signal level to extremely low, you'll find the harmonic distortion components at 1 milliwatt with this amp will be inaudible, buried in the noise floor.

So the first milliwatt is not the most important.... as long as the distortion produced is well below the noise!

The Burson's harmonic distortion behaves linearly.... until it ceases to be linear. That first point is at low frequencies. Even Dick Olsher should look at the actual SPL of the components he's citing as important. If it's below the noise, it would be hard to support his contention. I'm not saying he's wrong, I'm only asserting that in the case of the Burson, you're barking up the wrong tree here.

Finally, all these measurements, as important as they are to establish a base line and to prove that an amplifier meets minimum requirements, in my opinion tell us very little about its sound. For example, to my experience, the quality of regulation of an amp's power supply has far more influence on its sonic quality than frequency response or THD.

Next, if you would like to bark up the tree of requiring a double blind test, you are going to have to learn really and truly how difficult it is to perform a DBT properly. Subjects have to be trained... if you are trying to prove that a difference is audible, you have to find test material that exercises that factor and train subjects to recognize it.

So in the case of the importance of headroom, to prove whether you or I are right in terms of headroom being important, I suggest you prepare yourself for about a year's work. If you want to prove the negative (and I want to prove the positive), we'll need perhaps 100 skilled and trained subjects. Perhaps (if we can find them) 10 of these will be capable of passing the listening test. The other 90 will fail the test because it's very difficult, fatiguing, and rigorous.

So if we don't perform the test sufficiently well, a null result will not prove anything. A statistical negative doesn't prove that the phenomenon is inaudible, it only shows that it is unlikely to be audible. It may instead show that you were not good at training your subjects, or your test material is not suitable for discerning the differences or your protocol was poorly performed.

I do have the experience of having conducted a successful DBT on a very subtle difference: Whether a D/A converter sounds different running on internal clock versus an external low-jitter clock. It took a long time to prepare and it was like pulling teeth to find subjects and get them to attend and take the test!

For this and many other reasons, DBTs are not the cure-all. They are expensive and difficult to perform correctly. Many amateurs who want to throw their weight around and suggest a DBT, as you have, have no idea how difficult it is to do right.

I have studied how to perform these tests with the support and advice of two of the world's experts on the topic. I'm fortunate to know these scientists personally or I would have flunked out of trying to do it. Without their advice you'd be better off just dreaming rather than trying to do one of these tests.

One of these scientists, Dr. Gilbert Soulodre, has helped to define the ITU BS.1116 protocol and has written several scholarly articles on how to perform such tests. So to begin with, you should read and study these papers if you want to start throwing DBT weight around:

1) ITU BS.1116C protocol, revision 3. Especially note the part where the authors caution amateur experimenters from trying this protocol, since "we cannot cover everything needed in this document and we suggest employing experts in conducting such tests"

2) This exemplary example of a BS.1116 test that was performed: http://www.researchgate.net/profile/Michel_Lavoie4/publication/242819973...

According to one of the authors, the latter of these documents is testing differences which experts already knew were audible and testable. When large amounts of money are on the line, they have to get all their ducks in a row. Imagine when there is no money on the line, and what you want to test are two different power amplifiers! The issues multiply because the differences are ultimately far more subtle and difficult to test for.

Both of these scholarly articles make it clear that it is very difficult to perform a DBT. So I don't have the time to create a scholarly DBT for every assertion that I make. But I will try to buttress what I hear as much as possible and you can either choose to believe it or not. "I'm not a doctor, but I do look like one."

By the way, I was not convinced by your citing my friend Sean Olive or any other authority to buttress your opinion on headroom and audio quality. I do think we are still in the dark ages on the subjective area of amplifier differences. I do know that a famous test was done like 20 years ago that seemed to prove there are no audible differences between power amplifiers once levels are matched.... My only response is: They must not have performed the DBT correctly or sufficiently trained their listeners or found good material to prove their assertions.

For you or me to set up a well-constructed BS.1116-quality Double Blind Test would be prohibitively difficult. For me to prove the positive I would need to find at least 10 subjects to take 10 trials each that can pass that test out of the many who cannot pass it. Please don't be arrogant and say that "anyone can set up a DBT" because I've spoken to the world's experts and they are humble enough to state that a DBT is almost automatically biased to come up with a negative, unless very careful and professional-grade controls have been instituted from the beginning.

So in practicality, what you and I have is a battle of words. For my part I'll try to be as sensible and logical as possible and for your part all I ask of you is to be a little less dogmatic and arbitrary.

For the record, I am on the side that some amplifier designers appear to know more than others. That some amplifiers do sound better than others, or at least different. I believe that even two amplifiers that measure nearly identically in THD and headroom can have very different sonic qualities. For example, if one has a well-regulated supply and the other one is as loosey-goosey as they come. Can I prove this? Give me a year and a $100,000 grant and I will endeavor to prove it scientifically. In the meantime, I'll buttress my opinions with my experiences as an audio engineer, and in the case of this Headphone shootout, the opinions of 7 other audio industry professionals. I think that's pretty strong evidence. There will be people who will doubt the results. But in that case, why are you reading this column? Your mind is already made up.

My mind is a bit more open :-). I'm open to the possibility that I'm wrong on the headroom thing and I've invited some Innerfidelitiers here to send me a low-wattage amp that claims to be sonically neutral against my Burson and we'll see which one I and some good friends prefer. Maybe I can make it blind (semi-formal) and at least I'll do it at GUARANTEED matched output levels.

Thanks for listening!


The issues of power supply regulation and design


Two amplifiers with equal frequency response, one

ultrabike's picture

I'm not very well versed on DBT so can't comment on that. I agree in that it seems pretty hard to get it right.

On other stuff, I think I understand where you come from. I agree that in general good power regulation and extra headroom is goodness. But it's goodness when it reduces noise and non-linear distortion, if we are concerned about fidelity.

In that sense I do feel that a decent set of measurements can tell us a lot about the quality of our gear.

Here are some slight corrections:

"The Burson's harmonic distortion behave linearly."

Harmonic distortion is non-linear by definition and cannot behave linearly.

"Finally, all these measurements, as important as they are to establish a base line, to prove that an amplifier meets minimum requirements------- in my opinion, tell us very little about its sound."

I agree with you that measurements tell us little about the sound perception experience. That comes with... experience. But they do tell us quite a bit if we can relate experience to a set of basic measurements. Furthermore, they give us a measure of fidelity.

There are always new, exciting things to learn, but I don't think we are doing that badly. I do agree, however, that there are practical considerations when doing a characterization which result in misleading oversimplifications.

Bob Katz's picture

Dear Ultrabike. What I meant by saying "the Burson's harmonic distortion behaves linearly" is that at low levels harmonic distortion is very low and it increases very gradually as level goes up until it gets out of hand. I didn't create a graph of this and sample the THD at multiple levels but I think you'll see a sloping line representing the increase in THD versus level.

ultrabike's picture

I'm pretty sure you understand this but other folks might not, so I just wanted to clarify for the benefit of others.

In general, I agree that as levels go up or down, so does distortion. At least that's what I've seen :)

xnor's picture

Ok, let me comment on the main points of this huge post:

"As you can see, the noise floor of the Burson is in the milliwatts."

What do you mean it is in the milliwatts? That would be horrible. With a sensitivity of 103 dB/mW (96 dB/mW according to Tyll's measurements) such a noise floor would be very loud.

If you have based your 110 dB SPL on 5 mW into 20 ohm, then the ~20 dB noise floor would be in the picowatt (10^-12) range.
(Btw, it would be nice to add the RMS amplitude of that recorded noise, because such spectrum analyzers are usually not suited for noise.)
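The arithmetic behind those figures can be sketched roughly as follows. This is only an illustration: it assumes the 103 dB SPL at 1 mW sensitivity and the 20 ohm load quoted above, treats the headphone as a purely resistive load, and the function names are made up for the example.

```python
import math

def power_mw_for_spl(spl_db, sensitivity_db_at_1mw):
    """Power in milliwatts needed to reach spl_db, given sensitivity in dB SPL at 1 mW."""
    return 10 ** ((spl_db - sensitivity_db_at_1mw) / 10)

def rms_volts(power_mw, load_ohms):
    """RMS voltage across a resistive load dissipating power_mw (assumes a pure resistance)."""
    return math.sqrt(power_mw / 1000 * load_ohms)

SENSITIVITY = 103.0  # dB SPL at 1 mW, the figure quoted above
LOAD = 20.0          # ohms, the figure quoted above

for spl in (110.0, 20.0):
    p_mw = power_mw_for_spl(spl, SENSITIVITY)
    print(f"{spl:5.1f} dB SPL -> {p_mw:.3g} mW, {rms_volts(p_mw, LOAD):.3g} V rms")
```

With these assumptions 110 dB SPL works out to roughly 5 mW, and a 20 dB SPL level sits about 83 dB below 1 mW, which is in the picowatt range.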

"THD quickly goes well below 0.001% at low levels"

That's good, but I wasn't talking about the Burson specifically. I was talking about headphone amps in general, and if you are worried about crossover distortion in other amps then this is where you should measure, and not just at high levels.

"So the first milliwatt is not the most important.... as long as the distortion produced is well below the noise!"

So.. it is. By the time you reach 1 mW you already produce substantial SPL with many headphones, so generally speaking you want sounds played at a fraction of that power to be reproduced cleanly.

In other words, low THD+N is irrelevant if measured at levels far beyond the levels where normal music listening (including low-level details!) falls into.

"Finally, all these measurements, as important as they are to establish a base line, to prove that an amplifier meets minimum requirements------- in my opinion, tell us very little about its sound."

I disagree and agree with ultrabike here.
What these measurements will tell us little about will be your (or anyone else's) subjective impressions. This is confirmed by Toole's research.

"Next, if you would like to bark up the tree of requiring a double blind test, you are going to have to learn really and truly how difficult it is to perform a DBT properly."
"So I don't have the time to create a scholarly DBT for every assertion that I make."

I think you are barking up the wrong tree here. I am not asking you to do a scholarly, formal DBT with results that could be published in a journal - far from it.

If you say that amp A sounds subjectively better to you than amp B, then aren't you curious if this difference persists if you don't know which one you're listening to beforehand?
Wouldn't you say that something like that should be done if you want to say that it is not just your subjective impression (which I'm perfectly fine with), but a more universal statement?

"I do think we are still in the dark ages on the subjective area of amplifier differences."

Again, that's because your subjective impressions have little to do with just the sound (and therefore measurements) of the device, but all the other factors Olive mentioned in his blog post. The list of human biases is endless and the brain is very powerful.
That's not rocket science, but a basic admission of human nature. Completely inert pills can 'cure' a sick person too. Does that mean some magical ingredient, that we cannot measure, made it into the pill? Nope.

"Please don't be arrogant and say that 'anyone can set up a DBT' because I've spoken to the world's experts and they are humble enough to state that a DBT is almost automatically biased to come up with a negative, unless very careful and professional-grade controls have been instituted from the beginning."

Sorry, but that is nonsense in this context. Doing a personal, casual blind test is not hard. In this case you are the expert, you are no stranger to your listening environment, setup or claimed/heard differences, you don't need to be educated on the test procedure etc..
Assuming matched levels, all you do is have a friend switch cables between the hidden amps. Do a few trials and let him write down the answers.
Sure, it's not scientific, there may still be cues that will lead to false positives (and if you actually notice them then you'd have to call the test off), but it's worth it.

"So in practicality, what you and I have is a battle of words. For my part I'll try to be as sensible and logical as possible and for your part all I ask of you is to be a little less dogmatic and arbitrary."

It looks like you've already pigeon-holed me, and think I am attacking you...

"Your mind is already made up."

"My mind is a bit more open :-)."

That's quite the impudence.

PS: The "opinions of 7 other audio industry professionals" do of course have their merit, but since I can show you PhDs that are of the opinion that the earth is flat or the center of the universe, mere opinions are not good evidence.
Please don't take this as an attack. I am not comparing you to crackpots here, but I'm pointing out the problem with "opinions" and "subjective impressions" (which again I'm completely fine with if they are portrayed as such).

ultrabike's picture

> Bob: "Finally, all these measurements, as important as they are to establish a base line, to prove that an amplifier meets minimum requirements------- in my opinion, tell us very little about its sound."

> Xnor: "I disagree and agree with ultrabike here.
What these measurements will tell us little about will be your (or anyone else's) subjective impressions."

Yes. Actually that's more or less what I meant. One can see this or that random distortion. But to get how things will actually sound I believe there is no substitute for experience.

Note that there might be metrics one could still use to quantify subjective appreciation, which may actually be used for audio (data) compression (like in going from FLAC to MP3 or so). But those are not your classical amp characterization metrics and in the end I think in the proper context both you guys are saying similar things. I think.

xnor's picture

Indeed, the best models of the human ear won't help you with how a device will be perceived subjectively in a sighted test, because what you'd need is an additional psychological model of the listener.

Both sighted and blind listening involves a person that actually listens and therefore results in an experience, but the former one is demonstrably heavily biased.

I have to bring up the inert pill example again. Experiments have shown that color, shape, taste, name ... influence the effect on patients. We are talking about inert pills here, placebos!

By this I am not trying to disparage subjective impressions, because I might receive a placebo myself one day - and I want it to be as effective as possible - but I will not say that it actually does contain a substance which we just cannot measure...
And while we don't need to, it is easy to 'disprove' stuff like homeopathy, because all that drinking a couple of bottles of homeopathy will kill is your thirst. ;)

Bob Katz's picture

First of all, Xnor was correct in terms of how many watts the noise floor is in the Burson. I spoke from guessing and I should have actually done a calculation, so here it is exactly. A rough visual inspection of the noise floor in Spectrafoo shows it's lying at about the equivalent of -13 dB SPL above 125 Hz, since I chose to use an SPL-based scale. That's 103 + 13 = 116 dB below 1 mW. Which, if I know my logarithms, is 0.000 000 000 003 watts, or 3 times 10 to the minus 12 watts or 3 picowatts, so Xnor was correct. I don't know what to make of that; I'd rather think in dB, and the bottom line is that it is inaudible. I don't know where that was supposed to take us.
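As a check on the logarithms, here is a minimal sketch of the conversion. It assumes only the two figures given above (103 dB SPL at 1 mW and a noise floor read off the display at about -13 dB SPL); the function name is invented for the example.

```python
def db_below_1mw_to_watts(db_below):
    """Convert a level that is db_below decibels under 1 mW into watts."""
    milliwatts = 10 ** (-db_below / 10)
    return milliwatts / 1000  # 1 mW = 1e-3 W

SENSITIVITY_DB_AT_1MW = 103.0  # headphone sensitivity used above
NOISE_FLOOR_DB_SPL = -13.0     # visual estimate quoted above

db_below = SENSITIVITY_DB_AT_1MW - NOISE_FLOOR_DB_SPL  # 116 dB
watts = db_below_1mw_to_watts(db_below)
print(f"{db_below:.0f} dB below 1 mW = {watts * 1000:.2e} mW = {watts:.2e} W")
```

Under those assumptions the result lands near 2.5 times 10 to the minus 12 milliwatts, which is about 2.5 times 10 to the minus 15 watts. So the 3 times 10 to the minus 12 figure is best read as milliwatts rather than watts. On either reading the level is far below anything audible.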

Anyway, there is a logical conclusion to my argument about DBTs. I was specifically pointing out that an informal blind test without a significant number of trials is invalid, and that Xnor's suggested 8 out of 10 correct is a likely indication that your judgment is better than chance, but not a very strong one statistically. 20 or 30 trials with 80% correct is very strong, if that's what Xnor was trying to get at. Unless you have enough statistical samples, your judgment is equivalent to flipping a coin and getting it right. Now, please, don't go away mad, just go away. :-)
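To put rough numbers on those trial counts, here is a small sketch of the one-sided binomial probability of scoring at least that well by pure guessing. It assumes independent two-choice trials with a 50% chance of a correct guess on each, which is a simplification of any real listening test. The function name is made up for the example.

```python
from math import comb

def p_at_least(correct, trials, p_guess=0.5):
    """Probability of getting `correct` or more right out of `trials` by guessing alone."""
    return sum(
        comb(trials, k) * p_guess**k * (1 - p_guess) ** (trials - k)
        for k in range(correct, trials + 1)
    )

# 80% correct at three different test lengths
for correct, trials in ((8, 10), (16, 20), (24, 30)):
    print(f"{correct}/{trials} correct: chance probability = {p_at_least(correct, trials):.4f}")
```

On these assumptions 8 out of 10 comes out a little above the conventional 5% threshold, while 80% correct over 20 or 30 trials falls below 1%.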

Bob Katz's picture

3 picowatts. Now what does that prove anyway? I'm lost as to what we were trying to prove. And I think this thread has had more than enough discussion about blind testing. The point I tried to make and Xnor was refuting is that a casual blind test is useful. But it is not... a casual blind test is just as bad as flipping a coin. If you perform 20 trials and I perform 20 trials and I get better than 15 right, it's a good case for my assertion. But try 20 blind trials some time... you will need several hours including many 15 minute rest breaks. Plus several days to design the test. I have bigger fish to fry. And I'm inviting you to go fry this argument somewhere else, please. Next subject!

xnor's picture

Hah, you started it with the long reply and rant about DBTs. I promise to make it a bit shorter this time. ;)

THD+N: In the beginning I was talking generally about the importance of clean power, especially the first milliwatt. In an amp with crossover distortion doing just a 110 dB tone measurement would not reveal problems in the power range the headphone will actually be spending most of the time. THD would not drop proportionally with output level.
That was my point. Nothing really Burson specific.
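A toy illustration of why a single high-level measurement can hide this: the sketch below passes a sine wave through an idealized dead zone around zero and estimates THD at several levels. The dead-zone model, its width, and the function names are all assumptions made for the example. It is not a model of any real amplifier.

```python
import math

def dead_zone(x, width):
    """Toy crossover nonlinearity: the output is zero while |x| <= width."""
    if abs(x) <= width:
        return 0.0
    return x - math.copysign(width, x)

def thd_percent(amplitude, width=0.01, n=2048, max_harmonic=15):
    """THD (%) of one cycle of a sine with the given peak amplitude after dead_zone."""
    out = [dead_zone(amplitude * math.sin(2 * math.pi * k / n), width) for k in range(n)]
    mags = []
    for h in range(1, max_harmonic + 1):
        # Direct DFT at harmonic h of the fundamental
        re = sum(s * math.cos(2 * math.pi * h * k / n) for k, s in enumerate(out))
        im = sum(s * math.sin(2 * math.pi * h * k / n) for k, s in enumerate(out))
        mags.append(2 * math.hypot(re, im) / n)
    harmonics_rms = math.sqrt(sum(m * m for m in mags[1:]))
    return 100 * harmonics_rms / mags[0]

for amplitude in (1.0, 0.3, 0.1, 0.03):
    level_db = 20 * math.log10(amplitude)
    print(f"{level_db:6.1f} dB re full scale: THD = {thd_percent(amplitude):.2f} %")
```

In this toy model THD rises as the signal level falls, which is the opposite of what a measurement taken only at high level would suggest.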

"The point I tried to make and Xnor was refuting is that a casual blind test is useful. But it is not... a casual blind test is just as bad as flipping a coin. If you perform 20 trials and I perform 20 trials and I get better than 15 right, it's a good case for my assertion. But try 20 blind trials some time... you will need several hours including many 15 minute rest breaks. Plus several days to design the test."

Not really.
Nobody said anything about 20 trials. And a casual blind test with a few trials is not as bad as flipping a coin (50:50), not even close. It also does not take days to design the test. I've already mentioned all this in my previous post...

You were already going to level-match the amps, so what's the big problem at least trying to detect which amp you are listening to if a friend plugs your headphones into a randomly chosen one?

If you claim to hear clear differences between amps (which may actually be real), why would it take you hours to do a few trials?
Even if the test failed because you detected a cue that you do not want to invest time in to 'fix' - just sharing that knowledge would help.

But at this point I'm not even asking you anymore to do trials and keep score. Just try judging the amp's sound quality by its sound alone.


The attitude that is coming across feels like "let's better not try, I might learn something which I knew all along but I'd rather suppress".

Tbh, I couldn't find good arguments against the substantiated points I made. Please go back and read what I actually wrote.
That may sound cheeky, but so were your previous responses to me.

tony's picture

Phew, you must have astronomical performance standards if the 600s don't impress you. Most of the World will never have anything near HD600 quality.

This test you lads are embarking on will make interesting reading.

I'm glad you're basing it on an Amp that any hobby person could afford.

Tony in Michigan

Bob Katz's picture

Dear Tony: Didn't you buy the new Audeze midprice phones? How do they compare to the Sennheiser? I'm just underwhelmed by the HD600's... they are missing that magic Pizazz. They sound veiled to me.

tony's picture

The Bottlehead turns the 600s ( my case the 580s) into magic.

Yes on the 8 Audeze Open which I drive with the Schiit Asgard 2. I wanted the Planer but now feel it needs a better matching Amp, more explosive/dynamic Amp.

My experience lately is directing me to better recordings. I got out my older Sennheiser HD598s ( a low priced headphone ) to discover they sound superb playing A music ( such as "Portraits of Cuba").

Further into this is my need for Portability and Mobility that the Sennheiser's Wireless Headphones give, my next door neighbor can hear my collection of great music ( in her home ) from my big system, no wires! Wireless is wonderful. So, I'm now considering the Stage/Pro wireless systems from AKG,Sennheiser and Shure combined with one of the IEMs everyone is talking about being so good. One of these systems would trigger a $2,500 investment.

Being an Audiophile and having "impulse control disorder" could easily have me buying up everything I see, lots of us do this sort of thing.

The Audeze stuff is superb, the Sennheiser stuff is Superb, lots of stuff is superb and all of this gear is superb with your superb recordings.

Today, I'm uncertain about gear but I'm 100% Certain about the A recordings so I'm investing in the Music!

In answer to your comment about the 600s missing, I feel they lack Bass and Treble ( which is why I'm interested in a 31 band eq.), they have that "Velvet Fog" kinda sound that Mel Torme and Nat King Cole have, the tinkle of the high frequencies is down, way down ( I very much miss that too ).
They aren't perfect but they are addictive on that Bottlehead.
They remind me of a low powered sports car, gee I wish it had a bit more ( sort of thing ). An acquaintance let me have a go with his Sennheiser HD800s and matching Sennheiser Amp ( $3,000 worth of stuff ). The 800s have the liquid mid-range of the 600s plus the bass & treble BUT it's $3,000!

Every which way I turn I'm facing $3,000 or greater investment to achieve more addictive magic. I feel like an addict.

Buying the A music is my savior, $20 or less keeps me floating on my high, keep up the good work making it and I'll keep buying!

I wish I could just be satisfied but I keep chasing better. My Psychiatrist is no help, he's an audiophile too, he has shelves of stuff he purchased over the last 4 decades ( kinda like me ).

I like reading about your adventure in headphones, you are every one of us ( we're wanna-be Bob Katz's ), you're living out our dreams, we are probably right in there with you as you experience this stuff, you're sort-of like a Journalist for a Major Newspaper reporting LIVE on a breaking story and it's the biggest story in Audio today, it will probably include the new 1000s from Hifiman, it's an on-going, chapter by chapter adventure. JA from Stereophile should pay you for all the hits this is generating. Your reporting will be the Tipping Point for plenty of purchases, mostly from you being the "Highest Authority" in all of Audio Journalism.

Tony in Michigan

ps. I nearly called those Lyra DAC Prism people today, wooda cost me $2,800 if I'd called, probably

sszorin's picture

A - What is 'A' music which you keep mentioning ?
B - Depending on the type of music you listen to, you will find the HD800 either a decent fit for the job or a not so decent one. The HD800 is a genre-specific headphone. Its biggest drawback is that on recordings and with types of intimate music which require that you are drawn closer to the instruments and vocals, the HD800 pushes you away and you end up as a spectator and not as a participating listener. The strength of the HD800 in dealing with complex orchestral sounds and a large number of instruments on a large orchestral stage is its [over]stretched soundscape, but this advantage becomes a disadvantage with smaller ensembles playing in an intimate club space. Some find this feature of the HD800's sound signature irritating. I do as well.