Audio Precision Explains Headphone Measurements and Head Related Transfer Function

In the image above clipped from the video below, you'll see a little snippet of text from the IEC60268-7 specification for headphone measurements.

No known objective method produces a flat frequency response characteristic from an earphone which is subjectively judged to produce wide band uncolored reproduction.

Before you watch the video, let me provide a few pointers to help you follow along. What we're talking about here is the fact that sound that measures flat on a standard measurement microphone will no longer be flat when you put your head in that sound field and measure the response at your eardrum. Your outer ear—specifically the concha bowl around the ear canal—provides gain between roughly 2kHz and 6kHz. There are other effects that cause sound at the ear drum to deviate from flat as well—ear canal resonances, and head and torso boundary gain for example.

Free-Field Response
Free-Field response is an early acoustic standard showing the relationship between the sound in space and that at your ear drum, with the sound coming from directly in front of you (in an anechoic chamber). The first part of the video (graph at 0:10) shows roughly what this response looks like. It says Head Related Transfer Function (HRTF), but you'll notice the measurement is in an anechoic chamber and the angle of incidence is 0 degrees, which makes it the Free-Field response.

Now the problem with the Free-Field measurement is that it only shows the ear drum response for sounds coming directly in front of you. When a sound source moves away from center, the shape of the pinna (outside of your ear) reacts differently to the sound, and the shape of the response will change. The plot shown at 6:26 shows measurements for angles of incidence at 0, 90, 180, and 270 degrees, and shows how the frequency response at the ear drum changes with incidence angle.

Technically, the HRTF is the entire set of ear drum response curves for all angles of incidence. Free-Field response is generally understood as only the ear drum response for sound directly in front of you, and in an anechoic chamber.

Diffuse Field Response
This film does a really terrific job of showing you that if you add up all the response curves of sound coming at you from every direction, you end up with a new type of response curve called the Diffuse Field Response. Generally, this measurement is made in a very reverberant room that gets filled with sound causing it to approach the head from every direction at once. For a long time after this response was identified it was touted as the best target response curve for headphones; some models even had "Diffuse Field Equalized" printed on the headphone.
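For readers who like to see the idea in code, here's a minimal sketch of that "add up the responses from every direction" notion, using made-up numbers and a simple power average. (A real diffuse-field response is measured acoustically in a reverberant room, not computed this way; this is just to make the averaging concrete.)

```python
import math

def diffuse_field_response(hrtf_mags):
    """Power-average a set of ear-drum magnitude responses (linear
    magnitude vs. frequency, one list per incidence angle) into a
    single diffuse-field-style curve, returned in dB."""
    n_angles = len(hrtf_mags)
    n_freqs = len(hrtf_mags[0])
    curve_db = []
    for f in range(n_freqs):
        power = sum(mag[f] ** 2 for mag in hrtf_mags) / n_angles
        # 10*log10(power) is the same as 20*log10(amplitude)
        curve_db.append(10 * math.log10(power))
    return curve_db

# Toy example: three hypothetical angle responses at four frequencies.
angles = [
    [1.0, 2.0, 1.5, 1.0],   # 0 degrees (free-field-like)
    [1.0, 1.2, 2.5, 0.8],   # 90 degrees
    [1.0, 0.9, 1.1, 0.7],   # 180 degrees
]
print(diffuse_field_response(angles))
```

Where the concha gain shows up at the same frequency for every angle, it survives the average; angle-dependent peaks and dips get smoothed out, which is why the diffuse-field curve looks gentler than any single-angle measurement.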

The problem is, it just doesn't work. Neither sound coming from directly in front of you in an anechoic chamber (Free-Field), nor sound coming at you from all directions in a reverberant chamber (Diffuse Field), represents a good approximation of how you listen to music. This leads to the problem highlighted in the IEC spec, and the conundrum of subjective and objective measurement of headphone response yielding different results. I'll go on, but this would be a good time to watch the flick.

(Click here if you have trouble seeing the video.)

Now, Audio Precision is a class outfit; their gear is spectacular, and their technical publications outstanding. In this video they've done an excellent job of showing us exactly where headphone standards are today. Their conclusion? As the instructor rightly summarizes at 9:28 regarding the IEC headphone measurement specification:

"So basically what they're saying here is: Nobody can agree on what is good."

I'm motivated to post this video here today because I thought it was a nice clear explanation of Free-Field and Diffuse Field equalization curves for readers just getting into this subject. But I'm also motivated to show InnerFidelity readers what a difficult position headphone engineers find themselves in when trying to determine how a headphone should measure, and how important it is that a new headphone target response curve be developed.

Regular readers will be well aware of the work Sean Olive, Todd Welti, and Elisabeth McMullin have been doing at Harman's research facility on just such a target response curve. Their basic premise is that headphones should sound like good speakers in a good listening room.

It just makes sense, doesn't it?

If you'd like to catch up on my previous postings on the subject, check out the articles here, here, and here.

Rillion's picture

One might naively think that headphones should be voiced to sound like two speakers in the traditional equilateral triangle arrangement since that is what most recordings are mixed on. However, without crossfeed effects, a headphone will not perfectly model the reduced treble of mono sounds resulting from two-speaker comb-filtering. On the other hand, headphones should be able to better reproduce a three-speaker/channel arrangement that eliminates the comb filtering with a "center" channel for vocals and solo instruments. The problem with this is that 3-channel (or more) music recordings are not that common.

Tyll Hertsens's picture
Your point is well taken. The differences between headphone and speaker reproduction are complex, but thoughtful adjustments to EQ that more effectively mimic speakers in a room are far better than FF and DF target EQs, I think.
sszorin's picture

To 'imitate speakers in a room' is, I think, the wrong approach. Headphones should represent instruments on a stage; they should represent how the instruments would sound in space if one were standing in the central position of a microphone for vocals.

AstralStorm's picture

In fact, using a simulated three channel model (AKA mid-side stereo) and then applying crossfeed as needed between the three channels gives a much better result than trying to model crossfeed between two speakers.
I bet this is because the brain is actually resynthesizing the center signal from stereo + diffuse reverberation, not just ITD and ILD, but I don't have a hard theory.
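For the curious, here's a rough, frequency-independent sketch of the mid-side structure being described. The gain and delay values are purely illustrative (not AstralStorm's actual DSP), and a real crossfeed would use shaded filters rather than a single delayed tap:

```python
def ms_crossfeed(left, right, side_gain=0.7, delay=8):
    """Split stereo into mid (L+R)/2 and side (L-R)/2, process only
    the side signal, then recombine.  Mono content (side == 0) passes
    through untouched, which is the appeal of this structure."""
    n = len(left)
    mid  = [(left[i] + right[i]) / 2 for i in range(n)]
    side = [(left[i] - right[i]) / 2 for i in range(n)]
    # Subtract a delayed, attenuated copy of the side signal from
    # itself -- a crude stand-in for the delayed contralateral path
    # of a real crossfeed filter.
    fed = [side[i] - side_gain * (side[i - delay] if i >= delay else 0.0)
           for i in range(n)]
    out_l = [mid[i] + fed[i] for i in range(n)]
    out_r = [mid[i] - fed[i] for i in range(n)]
    return out_l, out_r
```

Because the mid channel is never filtered, anything common to both channels (the "center" content) is untouched; only the stereo difference signal gets the crossfeed treatment.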

AstralStorm's picture

The result, given headphones equalized to a personally flat sound (which apparently is less variable than one would think), sounds very close to speakers in a heavily damped room. (Not an anechoic chamber; the distance is simple to gauge.)

Rillion's picture

Hi AstralStorm,

I've experimented with that some also. If you include a delay in only your side signals, then the comb effect will be greatly reduced. This will not accurately represent a two-speaker setup, but it is arguably better for certain music. I notice it mainly adds clarity to male vocalists, at least with the music I listened to carefully.

I'm not sure the delays are translated properly with side channels generated by subtracting right from left: the phase is reversed on one channel and then mid-side gets mixed back together after the processing--it makes my brain hurt thinking about it. Better results might be obtained with a "center cut" algorithm based on phases, for example. There is a LADSPA implementation of this which unfortunately has some slightly audible artifacts (echo or warbling). I have not had time to improve upon it myself.

AstralStorm's picture

The delays are actually translated correctly: a nearly-in-phase signal is crossfed less.
Phase is also correct; in fact, phase cancellation effects work more like on actual speakers in this setup, since they depend on the amount of center summing.
You'll see what exactly happens once you write down the equations.
No modern codec even tries going down the full orthogonalization route as it is unnecessary and inaccurate.

In addition, the center channel should be slightly reduced in volume. With perfect speakers in an anechoic room, the summing is 6 dB for 60°; real setups are generally closer to 3 dB.
The difference is a "cone" soundscape as opposed to a "triangle" soundscape. It depends on the pan law used for the recording too; most use 3 dB.

What is more important is that phase cancellation is then incomplete, just like with actual speakers.
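The 6 dB vs. 3 dB figures follow from how the two arrivals combine, which a couple of lines of arithmetic can show (idealized numbers, of course):

```python
import math

def db(amplitude_ratio):
    """Convert an amplitude ratio to decibels."""
    return 20 * math.log10(amplitude_ratio)

# Perfect speakers in an anechoic room: the two arrivals are coherent
# at the center, so amplitudes add and the center image gains ~6 dB.
coherent_gain = db(1.0 + 1.0)

# In a real room the arrivals at the ear are largely decorrelated,
# so powers add instead, and the gain is closer to ~3 dB.
incoherent_gain = 10 * math.log10(1.0**2 + 1.0**2)

print(coherent_gain, incoherent_gain)
```

This is also why a 3 dB pan law is the common studio choice: it keeps perceived loudness roughly constant as a source is panned across a real pair of speakers.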

Rillion's picture

Seems I was mistaken when I looked at this before. It is easy to verify that the approach of applying crossfeed only to the side channels of a simple M-S separation (with right S reversed phase from left S) does give the right results in the limits of either mono signals or hard-panned signals. I still need to verify the intermediate cases, but suspect it will work out fine. Thanks for bringing it up!
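Those two limits are easy to check numerically with a toy, frequency-independent version of the scheme; a single flat side gain stands in for the real crossfeed filter, and the 0.6 value is arbitrary:

```python
def ms_side_crossfeed(left, right, g=0.6):
    """Attenuate only the side signal by gain g, then recombine.
    A real implementation would filter and delay the side signal;
    a flat gain is enough to probe the two limiting cases."""
    mid  = [(l + r) / 2 for l, r in zip(left, right)]
    side = [g * (l - r) / 2 for l, r in zip(left, right)]
    return ([m + s for m, s in zip(mid, side)],
            [m - s for m, s in zip(mid, side)])

# Mono limit: L == R, so the side signal is zero and nothing changes.
l, r = ms_side_crossfeed([1.0, 0.5], [1.0, 0.5])
assert l == [1.0, 0.5] and r == [1.0, 0.5]

# Hard-panned limit: some signal leaks to the far ear, as crossfeed
# should.  With g = 0.6, a left-only sample comes out as 0.8 / 0.2.
l, r = ms_side_crossfeed([1.0], [0.0])
assert abs(l[0] - 0.8) < 1e-12 and abs(r[0] - 0.2) < 1e-12
```

The intermediate cases interpolate between these two: the closer a source sits to center, the less of it lives in the side signal, so the less crossfeed it receives.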

Rillion's picture

What I meant by "reversed phase" is (right S) = -(left S) .

Rillion's picture

The M-S approach seems fine with delays turned off, but I'm still not sure about what the delays do to the signals. Looks like I have a bit more work to do before I really understand this ...

Rillion's picture

Hi AstralStorm,

I've examined applying crossfeed to the S channel enough to be convinced that it generally does a better job, at least for the most simple crossfeed implementations: the comb that normally affects mono sound is replaced by a very attenuated anti-comb in hard-panned sound. This ends up working better with the way my favorite headphones are voiced (conventional crossfeed darkens their sound too much).

Anyway, can you refer me to any good resources to learn more about the "cone" vs "triangle" soundscape? I already know a bit about the triangle soundscape in an anechoic room since that is what I currently simulate (overlaid with a Bruel & Kjaer room response that gradually boosts the low frequencies).

Rillion's picture

Once the overall tonal balance is matched using shelf filters, crossfeeding just the S channel sounds very close to conventional crossfeed. You have to listen very carefully to perceive differences. My impression is that crossfeeding just the S channel improves the clarity of soloists and singers at the expense of making the band sound less dynamic in music that has a lot of hard-panned instruments. I guess it all comes back to how the music was mixed in the first place (as many people have said in this thread, there seem to be no standards for this). I don't want to sound too critical of the S-channel crossfeed approach, because these listening tests are very time-consuming, so my conclusions may be biased by a small sample size.

castleofargh's picture

if we agree to go with flat speakers as reference sound, shouldn't we start by trying to get a similar signature while using some manner of crossfeed?
I don't really see the point of trying to get the "same sound" when we know from the start that there is nothing realistic in the way the sound is brought to us in headphones.
I don't know if we should keep expecting the sound engineers to think about headphones while mixing albums and pray for some standardization of this process, or push for crossfeed as a default feature on headphone amps. But I really believe that sounding real (no 100%-in-only-one-ear nonsense) is imperative if we ever want headphones to sound flat.
nobody looks at a badly done 3D rendering and thinks "oh, those colors are really well balanced".

Rillion's picture

Certainly aiming to reproduce speakers in a room is better than the status quo. I'm not sure what the best target is, but I do know that a headphone EQ that sounds right paired with a fairly realistic crossfeed will sound overly-bright without realistic crossfeed, at least with some music.

Seth195208's picture

..poses one of the most interesting (and seemingly paradoxical) scientific questions in all of audio. Makes my head spin when I try to make sense of all the variables.

Tyll Hertsens's picture
And that's why I continue to bring up the subject: it's quite confusing for many at first. Takes a while for it all to soak in. Still learning here.
Seth195208's picture

..inherently incorrect by bypassing each individual's own unique HRTF (outer ear, head, and upper body) processing mechanisms that regular in-room audio naturally and fully accounts for? Is a generic approximation really good enough? Really?

Rillion's picture

That is a question that deserves more study -- perhaps there are already studies published on it.

I don't think perfect reproduction of individual HRTFs is necessary for most people, since the human brain can adapt to different acoustic environments. However, there are certain things that are very hard to adapt to, such as hard-panned bass instruments. How closely you need to reproduce individual HRTFs for a comfortable listening experience is an interesting question and probably varies from person to person and with the type of music.

Also, external speakers have the complicated issue of room treatment which can have a huge impact on the frequency response. There are professionals that make a living on acoustic room treatment.

Tyll Hertsens's picture
I wouldn't say "incorrect", but I would say unnatural.

One problem is that once you start going beyond relatively simple, but probably not complete, compensations, you have to start using complex DSP algorithms. Many, myself included, will likely prefer simple approximations that don't degrade the front-end analog signal too much.

Mind you, I'm all for DSP solutions, and I think there will be plenty in the future, but I don't think they'll easily rise to the level of resolution and finesse that audiophiles strive for. Bottom line: I think there's room to develop along both lines.

Seth195208's picture

.. larger ears and larger canals (let alone the shape of those things) compared to smaller ones must have a profound effect on timing, comb filtering, resonances, and frequency response, especially at and above the 2.5 to 3.5kHz peak. There is also no reliable way to test for an individual's perceived "accuracy" at these frequencies other than asking the individual whether it sounds "subjectively" accurate or not. This is where the objective science of HRTF starts breaking down.

ultrabike's picture

Your article makes all of this very accessible to me.

"Regular readers will be well aware of the work Sean Olive, Todd Welti, and Elisabeth McMullin have been doing at Harman's research facility on just such a target response curve. Their basic premise is that headphones should sound like good speakers in a good listening room.

It just makes sense, doesn't it?"

IMO, to some extent. I honestly don't know what speakers were used to develop the standard free-field and diffuse-field target curves. But I bet they were not bad speakers. I feel that saying that headphones should sound like good speakers, in a "good" room or an anechoic chamber, doesn't completely address the problem.

One could say the problem is the use of an anechoic chamber, and that a "good" room should solve the problem. But what is a "good" room? Furthermore, "good" speakers in a "good" listening room at 0 degrees (free-field like)? Maybe 30 degrees? A bunch of degrees mashed together (diffuse-field like)?

What qualifies a speaker as "good" in a "good" listening room? +/-3 dB flat frequency response at 0 degrees, 1 meter, in an anechoic chamber? How about the off-axis frequency roll-off? How much absorption does the "good" room provide? How about room modes? What is the "good" coloration that the "good" speakers + "good" room should have? Will this overlap such coloration onto ANY recording's coloration (with its random mastering effects) in a "good" way?

Speakers are usually evaluated in anechoic chambers. My best guess is that relative performance evaluation would be fairly difficult if every speaker were measured in a different "good" room. Yet the fact remains that most people do not listen in anechoic chambers. What then should be the optimal interaction between the room and the speaker? How much energy should be reflected? What should a reference speaker frequency/angle polar response be?

Maybe something that reproduces "well" recorded music realistically and close to a "well" set up live performance will do fine... probably would have to listen to a good live performance and compare it to what we get from a good speaker in a good room.

Question: Are all of these compensations relative to sound impinging on the head at 180 degrees? That is, does [x]-field compensation take the frequency response difference between the x-degree response and the 180-degree response?

EDIT: Based on this LINKY, it is not referenced to 180 degrees and is applicable only if using an ear or head to measure. Remove the ear or head from the equation while maintaining proper acoustic impedance, and this comp deal may be less necessary.

Tyll Hertsens's picture
Great comments, but I think I'd have to write a whole article to answer it properly.

A speaker that measures flat in an anechoic chamber will measure slightly bass heavy in a good listening room due to the boundary effect of the walls and the wider dispersion of low frequencies, resulting in more sound power in the lows in the room. So, good speakers in a good room naturally sound a little bass heavy, and people are accustomed to that.

Now, a good room is a little tough to define, but there does exist an IEC spec for listening rooms. The problem is that most studio acoustic designers don't really follow the spec; they've spent years developing their own brand of "special sauce," and as a result, for the most part, recording studio speaker response varies to some degree. And as soon as a studio mixes an album on a non-calibrated system, you have no idea where flat really is.

So, I would say that the quick and dirty way to develop a headphone target response would be to put flat-measuring speakers in an IEC standard listening room, and then measure the response using an IEC standard measurement head...possibly taking a few measurements with the head a few degrees off axis to either side and then averaging the measurements.

Problem is, that's just a guess on my part, and is exactly why the folks at Harman are approaching it so methodically and essentially using listener satisfaction as the controlling factor. They do make comments in the papers that they think the headphone target response is likely strongly related to the sound of speakers in the room, and they did make some assumptions along those lines when developing a variety of headphone responses to subjectively test, but they let the subjective response data drive the results and not their assumptions.

For most people wanting to delve into this topic further, the best place to start is Floyd Toole's book "Sound Reproduction: The Acoustics and Psychoacoustics of Loudspeakers and Rooms". It doesn't talk about headphones, but it does go into great detail about how they untangled the problem of speakers being flat in an anechoic chamber and being bass heavy in a room, and how we humans, amazing as we are, somehow perceive the flat speaker in a room that adds some bass as being appropriate. (That's an almost ridiculous oversimplification of the material in the book, though.) He also goes into detail about what he calls "the circle of confusion": if there are no standards followed for correct in-studio acoustic response, then everyone will continue to chase their tail trying to find neutral. The point is that the problems with headphones today are just an extension of the problems seen in the past with speakers...and those problems aren't even fully resolved yet in terms of implementation of what's been learned.

As a side note: Sean Olive worked with Floyd Toole when the work was done at Canada's NRC decades ago. I see it as one of the reasons he's motivated to do the work he's doing because the headphone problem is so strongly related to the idea of finding some sort of meaningful "neutral" and then designing gear around those standards. Sean also happens to be the current President of the Audio Engineering Society, and it feels very reassuring to know the group will consider his work very seriously.

BTW, Floyd's book is also filled with great information about reflections and reverberance and when it's important, and all sorts of cool stuff. I HIGHLY recommend it to audio enthusiasts as a fascinating read.

ultrabike's picture
Thanks Tyll, really appreciate your comments.
jagwap's picture

I like this research into the natural sound of loudspeakers in a room for headphones. It is an interesting goal. However, much of the reason a flat-response speaker in a room is perceived to be flat is that the brain filters out the reflected sounds in favor of the early arrival directly from the speaker. So if a transducer has added bass to resemble the sound of an in-room speaker, this additional bass should be delayed a little, or it will just sound like a bass lift, like these "room feel" headphones tend to have.

Music happens in the time domain. The frequency domain is a useful mathematical transform to allow us to understand the energy in each (1/3rd) octave to give balance when we do not have "DC to light" bandwidth.

DaveinSM's picture

very cool, thanks for sharing this! This is the best headphone review site on the internet that I have been able to find.

Seth195208's picture

Whether it be good, bad, right, wrong, ugly or beautiful, the buck stops with your own individual HRTF. If it is bypassed, in part or in whole, you won't be able to hear good, bad, right, wrong, ugly or beautiful the way your own brain has always expected to hear it. And getting good measurements above 6 kHz is somewhat analogous to the Heisenberg uncertainty principle, where the act of measurement (by having tiny microphones placed inside the ear canal on real human subjects with real living skin, comparable to an elephant in the living room) will grossly alter and invalidate the measurements.

Tyll Hertsens's picture
On the other hand our brains are amazingly adaptive, and are powerfully capable of making adjustments to the ways we perceive. So there's hope.

And along your line of thinking, I've often tried to dream up a way of using the headphone driver to "ping" the ear and receive a reflection from which it could map the shape of the ear to adjust parameters. Maybe we'd need a bunch of lasers!? Mua ha ha ha ha! :)

Seth195208's picture

Our brains are so amazingly adaptive, that listening to high quality audio is totally moot, because our brain will fill in the gaps. Waaaah!

Rillion's picture

Tyll mentioned using lasers to map the ear. Well, someone is already working on it:

zobel's picture

Not the magazine, but the goal of any sound reproducing system. Remember the mag?
What are we trying to reproduce? The sound of live music? Acoustic instruments or voices in a defined space? Electronic instruments in a venue? Computer generated sounds with recorded samples? If a pipe organ or a grand piano can be recorded and played back through loudspeakers in a home setting and you would swear there is a big organ or a piano in your house, you have a system that has high fidelity. That same system should be equally good at reproducing whatever the recording, mixing, and mastering has captured in most cases. A well recorded and mastered performance should sound like it did to the engineers at the studio.

As you know, recordings are most often monitored, mixed, and mastered using near-field speakers with a sub, and headphones. Unless we can experience the sound of those monitors ourselves, we don't know how close to the "original" our system is, since almost all recorded music is realized in the studio from multitrack recordings. Even live recordings have usually been "doctored up" in the studio. If the goal of the recording engineers is to produce a recording that will sound good on your home stereo speakers, then good headphones should also sound like good speakers in a good room. If the goal of the recording is to sound good on headphones, then those same high fidelity headphones should sound good with that recording. If the recording is made to sound loud in cars, on the radio, or over cheapo systems, never mind; high fidelity in that case is not needed, and can actually be a bad thing.

But I ramble. Headphones by their nature are not a natural way to experience sound. That doesn't mean they are any less "high fidelity" than loudspeakers. Nobody listens to music in an anechoic chamber, that is a measuring tool, and lousy for music. As soon as you clamp those speakers to your head, any measurement taken in the anechoic chamber of those speakers is totally irrelevant to how they would sound then. Headphones are a much more complicated transducing system than loudspeakers, since your ears and head become an actual physical functioning part of the sound we hear with them.

So what's the rub with measuring the frequency response of headphones? It's coming up with a better system of measuring them.

In cases where no anechoic chamber is available to measure loudspeakers, other techniques have been used to eliminate the room's effect on measured response. Nearfield, FFT, and quiet, open outdoor techniques have proven successful. What if we took the human ear model (room) out of the process of measuring headphones? Could there be a system to measure only the acoustic output of the driver/enclosure assembly? Would those graphs be more linear, and closer to perceived sound?

Measuring the low end of headphones seems to work through a fake ear, but not acceptably for the upper frequencies. If all those higher-frequency resonances and comb filtering effects actually exist at the ear drum, it doesn't matter, since they don't exist in the brain. It's like trying to go inside an eyeball with a camera to measure what the brain is seeing. Vision tests are subjective, hearing tests are subjective, and ultimately, headphone tests are subjective. Someday we will be able to accurately quantify the frequency response of headphones, but only when those numbers are verified by our hearing.

I find other measurements to be helpful, especially THD charts, waterfall charts, square wave charts, and efficiency/impedance numbers. I applaud Tyll for keeping the faith in objective measurements, and his interest in improving them. He is on an interesting mission!

AstralStorm's picture

This is why I've designed some of my own DSPs (crossfeed, equalizer) using accurate recordings, evaluated after witnessing the performance.

Jazz Casual's picture

"No known objective method produces a flat frequency response characteristic from an earphone which is subjectively judged to produce wide band uncolored reproduction." "So basically what they're saying here is: Nobody can agree on what is good." The subjectivists nod approvingly.

BrightCandle's picture

For some reason I have a lot of issues with surround sound and current average HRTFs. With a variety of recommended headphones I struggle to pick up the proper sound cues given to me by surround sound in games. I can tell the difference between left and right, but not the difference between front and back, and very little in between. I must be quite a long way from the average, and it causes me a lot of problems in competitive gaming. Aureal used to have an implementation that worked fairly well for me, but since they went bust I haven't found a sound card that works well enough.

In many ways I don't care if the sound is coloured; it might be bad for music, but ultimately my personal interest is in good positioning information in games. I only started to get interested in headphones and HRTFs because they didn't work for me as they should. Having tried all the sound cards out there, I have found SBX is better than DH or Realtek, but it's still not very good. I am wondering how we can go about improving this dramatically so that I can have the HRTFs I need to hear surround properly. I think we need an industry that measures people's HRTFs and then provides a plugin for a sound API so it gets applied. Combined with environmental beaming in said API, we might finally get some reasonable sound from a PC that matches the listener.

As to music, I do wonder whether what we want is to hear music as if it were coming from speakers in front of us, or whether we should hear it as something different entirely. It's not clear to me that any one particular approach can be best in this regard. For my positioning in gaming it's obvious that perfect positioning is possible, albeit perhaps at the cost of some sound quality. But for music, how it should be projected and what adjustments should be made can only be judged against some reference comparison, and headphones are capable of reproducing a lot of different listening environments because they cover both ears. It's for that reason it's not clear that there can ever be one standard; equally, with ears being different, there isn't one HRTF, there are over 6 billion of them, and these two points mean no headphone can be the best.

RPGWiZaRD's picture

I'm also a bit of a "weirdo" case when it comes to positioning. I've got both an ASUS Xonar STX II and a Creative SoundBlaster ZxR (although I plan to sell them soon), and I also tested a Titanium HD in the past (as well as a Xonar DG2).

Neither DH nor SBX satisfies me in surround experience. DH sounds a bit too bassy and muffled for my tastes (although with a precise counter-EQ adjustment it sounds better, but still not ideal as far as SQ goes; it varies from game to game, and Skyrim works well with DH for some reason), and there are zero cues regarding height; it works "okay" on a 360-degree plane.

SBX again I thought sounded too "closed-in" with not always that clear positioning, it was a bit too "close"-sounding to me. But I don't believe it's the algorithm itself which is the problem but the Z-series way of handling speaker config which seems to bring a more "in-your-head" experience (at least for me). No matter how I configured it (5.1/7.1 or stereo) it just couldn't provide me that out of the head experience I was looking for.

Well, until... enter Realtek onboard. Quite a few years ago I discovered a nicely working speaker config that provided me a very realistic out-of-the-head experience. It was at the time I finally switched from using the third-party kX Audio drivers for the Audigy 2 ZS (which made me hold on to that card a long time), when the onboard audio surprised me on a P55 Gigabyte board with some ALC889A chip config. I noticed that with 5.1 speakers and certain speaker-type boxes ticked, the sound worked surprisingly well in terms of HRTF, for my ears/brain at least. If you tick those boxes incorrectly, you will have broken sound where front and rear sounds are missing and you only get sounds directly from your sides in games. If you just tick the "front speakers" setting "Full-range speakers" and the "Surround speakers" setting, leaving all the rest of the boxes unticked, you are presented with a very natural out-of-your-head experience in most games, with a surprisingly extensive range.

Furthermore, you can reduce the "front" volume slider in the levels tab from 100 to 97 (seems ideal for me personally) to improve the soundstaging from being more centered and in-your-face to a little more spacious and laid-back; it leaves less focus around the center channel and slightly more towards the sides, leaving an impression of a wider stage. Especially in Far Cry 3, which I've lately gotten into (yeah, I'm late, I know), I hear the directions of incoming cars so clearly, and during gunfights I can very accurately and easily pinpoint the direction of the enemies. It almost feels like a cheat, because when testing the STX II recently, for example, I was confused during a gunfight and couldn't clearly tell which directions the sounds came from. But the most impressive feat is how well it presents "range"; it feels very naturally large and "free-roaming".
The other solutions like DH and SBX sound closed-in, with a "limited space" soundstage, while the Realtek sounds very open in comparison and great SQ-wise, since no additional processing is involved.

I'm using an ASRock Z87 Extreme6 board with a Realtek ALC1150 and onboard amps (2x TI NE5532), which sounds surprisingly great in the lows and highs (a bit analytical-sounding perhaps, but surprisingly good detail/resolution for onboard). I can't speak for other brands except an ASUS Gene VI, which sounded clearly worse: a lack of highs, too smooth, and way exaggerated bass (more so than the ZxR even), though it did have a pleasant midrange.

RPGWiZaRD's picture

+ it also gives an advantage in soundstaging for stereo music listening which I love. :P

jgazal's picture

Tyll, thanks for sharing your findings.

Regarding idiosyncratic HRTF acquisition without acoustical measurements, see this: imagine a Realiser without microphones to acquire the idiosyncratic HRTF. You enter a movie theater, your torso is scanned, and you have 3D audio at your seat, over headphones, instantly.

I do not understand why they use the expression HRTF when they are actually talking about a specific part of the HRTF. When the analysis is focused on the frequency response at the ear canal, it can only relate to the "interaural level difference" (ILD). But the HRTF also encompasses the "interaural time difference" (ITD). This difference in terminology seems important for determining which part of the HRTF is more idiosyncratic.
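For readers less familiar with the two cues being distinguished here, a minimal sketch may help (my own illustration, not taken from any paper discussed in the thread): the classic Woodworth rigid-sphere approximation for the ITD. The head radius is an assumed average value.

```python
import math

HEAD_RADIUS_M = 0.0875   # assumed average head radius in meters
SPEED_OF_SOUND = 343.0   # m/s at room temperature

def itd_woodworth(azimuth_deg):
    """Interaural time difference in seconds for a rigid spherical head
    (Woodworth model); azimuth 0 degrees = source straight ahead."""
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS_M / SPEED_OF_SOUND) * (theta + math.sin(theta))

# A source at 90 degrees (directly to one side) gives the maximum ITD:
print(round(itd_woodworth(90) * 1e6))  # → 656 (microseconds)
```

The ILD, by contrast, is a frequency-dependent level difference caused by head shadowing, which is why a frequency-response-only analysis captures it but says nothing about the ITD.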

I have not yet seen a paper that proves individuals of a given population can be grouped into a reasonable number of clusters with similar HRTFs (similar here meaning that a standard HRTF works as well as a Realiser-acquired idiosyncratic HRTF).

I was imagining a paper with a statistical cluster analysis of groups of individuals sharing similar HRTFs in a population. Such a paper could split the analysis into ILD and ITD. If it proved that ILD can be clustered and ITD cannot, then I think they could use the term HRTF to mean ILD.
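To make the cluster-analysis idea concrete, here is a toy sketch of what such a study might do computationally, on entirely synthetic data (the "ear types", band values, and spreads are invented for illustration; no real HRTF measurements are involved):

```python
import random

random.seed(0)

def kmeans(points, k, iters=50):
    """Minimal k-means; points are feature vectors, e.g. per-band ILDs in dB."""
    centers = random.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            groups[nearest].append(p)
        # Recompute each center as the mean of its group (keep old center if empty).
        centers = [[sum(dim) / len(g) for dim in zip(*g)] if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers, groups

# Two synthetic "ear types" with different ILD profiles (dB in three bands).
type_a = [[6 + random.gauss(0, 0.5), 10 + random.gauss(0, 0.5), 14 + random.gauss(0, 0.5)]
          for _ in range(20)]
type_b = [[3 + random.gauss(0, 0.5), 5 + random.gauss(0, 0.5), 8 + random.gauss(0, 0.5)]
          for _ in range(20)]
centers, groups = kmeans(type_a + type_b, k=2)
print(sorted(len(g) for g in groups))
```

A real study would of course use measured HRTF features and would have to validate the clusters perceptually, which is exactly the missing step being pointed out here.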

I also do not understand why a designer must equalize a headphone according to the KEMAR mannequin's HRTF, which presumably has its own idiosyncratic ILD, since, as I said before, I have found no paper stating that ILDs are not idiosyncratic or that they can be clustered.

If you find a paper identifying one or more groups of individuals with similar ILDs, ITDs or HRTFs, please share it with us.

tm's picture

The REAL problem is that listening to music on loudspeakers in a room (stereo or otherwise) is fundamentally flawed due to "acoustic crosstalk" between the channels. Headphones don't have that problem; they have other problems, but not that one. But, unlike the speaker problem, the headphone problems can be fixed with individual measurement and DSP. But, as a large body of recorded music exists that was tailored to make that fundamentally flawed "speakers in a room" system sound good (hard panning on Beatles albums, for example), headphone and headphone amp designers have no choice but to compromise their designs.

The use of headphones and IEMs has certainly increased in recent years, but I don't think we'll ever get away from the "mix it for speakers" paradigm. So, unless the recording companies start producing alternate material exclusively for headphone listening, it's going to be a compromise.

That said, using an HRTF based system (even one that only approximates your own), along with head tracking, and the subtle use of bass shakers, makes a HUGE difference. The HRTF and headphone frequency response inaccuracies can be largely eliminated with EQ.

But, casual listeners (where the money is) don't give a rat's ass about accuracy, so I don't think an HRTF based system will ever become a standard – it’s just too much of a hassle. Even wading through a library of existing HRTFs, or using a system that allows you to create one close to yours, is a pain. So, in a sense, it doesn’t really matter what “standard” the headphone manufacturers use, as long as they all do the same thing. Those of us who care will have to “fix it” anyway to make the available recorded music sound right in headphones.

Now you can all jump all over me :^) Ready, set, go!

Rillion's picture

I think your statements summarize the situation well. I am pretty obsessive about pursuing a "natural" sound. Currently it is either very expensive or very time consuming to get it right and most people can't or do not wish to invest that much. Still, there is technology out there that could make it much more convenient to achieve these goals if it were properly packaged. I am hopeful that the price/convenience barriers will eventually come down so that the less "obsessed" population will be able to enjoy better sound.

tm's picture

Being an engineer, I've learned never to say never, but I doubt that measuring HRTFs will ever be quick and easy. Without reasonably accurate HRTFs, the user won’t hear the benefits of the “new scheme”. But, I don’t think that’s even the biggest hurdle. I remember several years ago when I let my sister-in-law listen to a Dolby Headphone converter I had purchased and she just didn’t understand the concept of getting the sound out of her head. Now, as a one size fits all solution, maybe it didn’t work for her, but beyond that, she just didn’t understand what it was fixing. This to me is the main reason an HRTF based system will never become mainstream – it’s just too damn hard to explain all of this to the casual listener. I believe that without the visual cues (being there / seeing musicians playing), many folks just can’t perceive 3D sound. Hell, 3D TV is a relatively easy thing to understand / perceive, and the hardware to pull it off is simple for the user to deal with (no individual, intrusive measurements), but it’s been a colossal failure.

I believe the less obsessed population is perfectly happy with the state of headphones. To them, it ain't broke.

Inks's picture

While there are still many questions to be answered, measurements matter and are used by the best manufacturers out there. They don't portray the whole picture and have some grey areas, but they remain an important reference.

At least we do know:
High distortion is simply not a good thing to have. While the threshold at which it becomes audible is somewhat debatable, 1% is a good point of reference. Driver matching is also highly important, as it reflects the manufacturer's quality control, though with measurements it becomes a matter of having enough units to draw conclusions. From my experience, though, it doesn't take that many units to know who is doing serious quality control.

Good headphones are able to cover 20Hz–20kHz; a roll-off inside this range is simply not good, as it results in missing information.

The ~3kHz peak that compensates for the ear's resonance has to be taken into account. An electrically flat-measuring headphone sounds distant in the midrange; while the exact amount of compensation to apply is unclear, it is still important information for a designer.
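As a concrete illustration of what a 3kHz compensation boost could look like as a filter, here is a sketch of a standard peaking-EQ biquad, using the coefficient formulas from the widely used RBJ Audio EQ Cookbook; the +10dB and Q=1.5 values are placeholder assumptions, not a recommended target:

```python
import math

def peaking_eq(fs, f0, gain_db, q):
    """Biquad peaking-EQ coefficients (RBJ Audio EQ Cookbook form).
    Returns (b, a) with a[0] normalized to 1."""
    A = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    b = [1 + alpha * A, -2 * math.cos(w0), 1 - alpha * A]
    a = [1 + alpha / A, -2 * math.cos(w0), 1 - alpha / A]
    return [x / a[0] for x in b], [1.0, a[1] / a[0], a[2] / a[0]]

# e.g. a +10 dB peak centered at 3 kHz (illustrative values only)
b, a = peaking_eq(fs=48000, f0=3000, gain_db=10.0, q=1.5)
```

A nice property of this form is that the magnitude response at f0 is exactly the requested gain, which makes it easy to verify against a target curve.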

Well regarded headphones follow these guidelines very well.

AstralStorm's picture

Sorry, this is actually original research, not yet published. (It should hopefully be published around next year.)
The difference is in the distance of the center. Reducing the center boost yields the cone approximation: with the simple ITD and ILD models you actually get a rhomboid shape, not a true cone.

I have even more tricks up my sleeve now: a more general filter for the crossfeed rather than a lowpass, and more general spatio-frequency ITD and ILD curves rather than the simple flat approximation, in the three-channel case. (The two-channel case has some literature; you can google that.)
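For readers wondering what the simple lowpass crossfeed being generalized here looks like, below is a minimal Bauer-style sketch; the cutoff, attenuation, and delay values are illustrative assumptions on my part, not AstralStorm's actual parameters or filter:

```python
import math

def crossfeed(left, right, fs=48000, cutoff=700.0, gain_db=-6.0, delay_us=300):
    """Bauer-style stereo crossfeed sketch: each ear also receives the
    opposite channel, lowpassed (head shadow), attenuated (ILD) and
    delayed (ITD). All parameter values are illustrative placeholders."""
    g = 10 ** (gain_db / 20)
    d = max(1, int(fs * delay_us * 1e-6))          # delay in samples
    a = math.exp(-2 * math.pi * cutoff / fs)       # one-pole lowpass coefficient

    def shadowed(x):
        # One-pole lowpass followed by a d-sample delay.
        y, state = [], 0.0
        for s in x:
            state = (1 - a) * s + a * state
            y.append(state)
        return [0.0] * d + y[:-d] if d < len(y) else [0.0] * len(y)

    ls, rs = shadowed(left), shadowed(right)
    out_l = [l + g * r for l, r in zip(left, rs)]
    out_r = [r + g * l for r, l in zip(right, ls)]
    return out_l, out_r
```

Replacing the fixed lowpass and flat gain/delay with angle- and frequency-dependent curves is, as I read it, exactly the generalization being described above.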

I'm also working on a harder-to-implement but more powerful head-and-torso model, specifically a three-ellipse model, which is hard but possible to evaluate analytically without the finite element method.