Is neutral sound possible in a recording? A mastering engineer weighs in.

An introduction to our newest contributor here at InnerFidelity – Mr. Rob LoVerde.

For those unfamiliar with LoVerde and his exploits, he began his career in the music business in 1999 at The Hit Factory in New York City, where he worked as a mastering assistant on many chart-topping hits of the day.

In 2003, he started working in the mastering and production department at Sony Music Studios, also in New York City, cutting records for a number of reissue record labels. 

He has been working as a mastering engineer at Mobile Fidelity Sound Lab in Sebastopol, California since 2007. This is what LoVerde describes as “the dream job I always wanted.”

A small sampling of the titles he's mastered:

  • BECK Sea Change
  • THE CARS Shake It Up
  • YES The Yes Album
  • THE BEACH BOYS Pet Sounds
  • BILLY JOEL The Stranger
  • FRANK SINATRA A Swingin' Affair!
  • GRATEFUL DEAD Workingman's Dead
  • ARETHA FRANKLIN Aretha's Gold
  • LOVE Forever Changes
  • PRETENDERS Pretenders
  • SUPERTRAMP Breakfast In America
  • JAMES GANG Rides Again
  • DEREK AND THE DOMINOS Layla And Other Assorted Love Songs
  • ELVIS COSTELLO WITH BURT BACHARACH Painted From Memory

LoVerde and I had been discussing what neutral sound means in both recording and playback, and how many audiophiles and music lovers seem to labour under two assumptions: 1) that they know what it is, and 2) that it is something that actually exists.

So I asked him – a mastering engineer – for his take on the subject.

By Rob LoVerde

It's touching to witness the quest of the music collecting audiophile. Their goal post often seems to be stuck in the ground right next to the ultimate version of their favourite album. The idea that one can obtain, possess, listen to and treasure musical software that most faithfully, precisely and accurately represents what the artist who recorded it intended you to hear is a most attractive one. It also seems to be born of utmost respect for music itself.

But, is this actually possible? This is the question that keeps these same collectors from a good night's sleep.  Certainly, one can safely assume that any musical artist who recorded their work for public consumption wished for their listeners the experience of hearing exactly what they themselves heard, approved and released.  It's almost (but not quite) as safe to assume that the equipment used to record, mix, master and manufacture the software that aims to reproduce this listening experience was designed with a mind's eye directed at providing a neutral conduit between artist and listener.

But, if this is true, then why does no consensus ever reveal itself regarding what something is supposed to sound like?  Why are there countless versions of some of music's most popular recordings and no universal agreement on which is the best?

As a mastering engineer at Mobile Fidelity Sound Lab, I can say that every discussion we have and every decision we make here revolves around the aforementioned quest of the music collecting audiophile. With each and every project, each and every day, we strive to bring to the public the ultimate version of the album we're working on. To produce a product that truly represents, as closely as humanly possible, the artist's original intent and vision. It's a privileged position to be in. We get to create the version of the album we want in our collections.

Any advancement in technology that offers an opportunity to get closer to this vision, whether it be incremental or revolutionary, must be investigated, tested and if found to be a true upgrade, embraced and implemented into our mastering process. 

In the digital domain, we've taken a major leap forward recently with what is called DSD256 (or 4X Direct Stream Digital). 

Since the last couple of years of the 20th century, MFSL had been using DSD64 (or 1X DSD) technology to capture the analog signal of a given master tape for all of our Hybrid SACD releases (and some Gold CD releases, as well). It is called DSD64 because its sampling rate is 64 times that of standard CD technology. This was a real breakthrough in its day, impressing all MFSL staff at the time it took its place as an MFSL mainstay. 

It was a free lunch, giving with both hands, improving our sonic results in every way. We even gave this new technology a name, as applied in our mastering chain: The GAIN 2 System.

In 2014, we discovered a furthering of DSD: DSD256. 

DSD256 is (as you might surmise) 256 times the sampling rate of standard CD technology and four times the sampling rate of standard DSD. With an incredibly high sampling frequency of 11.2 MHz, it probably matches, or even exceeds, the resolution of an analog master tape. But, of course, we can't simply play by the numbers. Listening tests are king.
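
For the curious, the arithmetic behind the format names is simple, since every multiplier is relative to the 44.1 kHz CD rate. A quick sketch in Python:

```python
# Quick arithmetic check of the DSD rates discussed above.
CD_RATE = 44_100  # Hz, Red Book CD sampling rate

for name, mult in [("DSD64", 64), ("DSD128", 128), ("DSD256", 256)]:
    print(f"{name}: {CD_RATE * mult / 1e6:.4f} MHz")

# DSD64:  2.8224 MHz
# DSD128: 5.6448 MHz
# DSD256: 11.2896 MHz  (the "11.2 MHz" figure quoted above)
```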

After countless hours, many early mornings and late nights and A/B trials galore, we determined that DSD256 technology needed to be utilized going forward. It is actually difficult to tell the original master tape and the DSD256 capture apart. The window through which the listener views the artist's creation has been made more transparent than ever. 

For all of these reasons, we now use DSD256 exclusively for all of our analog-to-digital transfers and The GAIN 2 System is now The GAIN HD System. For our Hybrid SACD consumers, I only wish that there was an optical-disc format that supported DSD256. 

For now, we must downsample this ultra-high resolution capture to DSD64 for the SACD layer and 16-bit/44.1 kHz for the CD layer of our digital disc products. The good news is that the resultant audio displays no sonic "toll" taken in the downsampling process thanks to a sampling-frequency converter made by some very intelligent designers. And the downsamples actually sound better, more faithful to the source, than a straight analog-to-DSD64 or 16-bit/44.1 kHz capture as well. Once again, a free lunch gifted with both hands.
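
The details of that converter are proprietary, but the general principle of band-limited downsampling is easy to sketch. Below is a minimal PCM-domain illustration using SciPy; it is a simplification, since real DSD-to-DSD conversion also re-modulates the result back to a 1-bit stream with noise shaping:

```python
# A minimal PCM-domain sketch of the decimation principle only; NOT the
# proprietary converter described above. Real DSD-to-DSD conversion also
# re-modulates the result back to 1 bit with noise shaping.
import numpy as np
from scipy.signal import resample_poly

fs_hi, fs_lo = 352_800, 44_100           # stand-in high rate: 8 x 44.1 kHz
t = np.arange(fs_hi) / fs_hi
x = np.sin(2 * np.pi * 1_000 * t)        # 1 kHz test tone at the high rate

# resample_poly applies an anti-alias low-pass before decimating by 8
y = resample_poly(x, up=1, down=fs_hi // fs_lo)
print(len(x), "->", len(y), "samples")   # 352800 -> 44100
```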

But still, what of this whole idea of witnessing, in a completely neutral fashion, the artist's musical rendering?  The reality is that this is a fantasy. 

Compelling in theory, impossible in practice. 

The truth is that everything in between you and the music serves to cast a shade on the proceedings. 

The goal is to find that which produces the fewest obstructions and allows the greatest purity possible. 

But, nothing is neutral. 

Nothing. 

You, the listener, are at the mercy of everything involved, from which source was used to master the music to the gear that was chosen to make the transfer.

For instance, which tape machine was utilized?  Which reproducing amplifier is connected to that tape machine? Which interconnects tie these two things together? How are these machines calibrated? What did the mastering engineer do to the sound? Was equalization applied?  How about compression? 

What did he or she think the music should sound like? 

And, I haven't even mentioned your sound system, your room, your headphones, or your ears.  How is all of that contouring the sound? With all of this considered, it's fair to say that the closest you'll get is a general approximation of the intended sonic picture.

In recent years among audiophiles, it seems that the search for "accuracy" has superseded the search for pleasure: The very thing that got us all into this in the first place. 

I'd like to think that if your listening experiences make you want to have more of them, if the sound you are hearing on your system is making you love the music, you are probably at the destination the artist wanted you to arrive at after all.

COMMENTS
Kalman Rubinson's picture

All very good but, without any reference to the role of the speakers used for monitoring, there is a large impediment to realizing that neutrality in playback.

Simply Nobody's picture

Also, what headphones (IEMs) were used for monitoring (if they were used at all), is not mentioned ......... This is a headphone (IEM) website ...........

Kalman Rubinson's picture

Ah, yes, I will acknowledge that this is a "headphone (IEM) website" but the issues are the same since whatever device the monitoring is done on will influence the result.

Of course, nothing I can think of will allow us to compensate or hear through the engineer's personal preference but I'd be happy just to be sure I am hearing that.

Rafe Arnott's picture
More than a website just about headphones

It's a place to talk about what it is we are listening to through them as well.

Otherwise what's the point of even having them on our heads or using them?

Having a mastering engineer discuss part of the process of the analog and digital mastering chain is a rare, insightful treat for anyone who claims to be into music.

Also, of all the mastering engineers I've met or known over the years, none use IEMs for mastering.

Perhaps if a recording was being done specifically for IEMs?

Grover Neville's picture

Tried it once. Possible? Yes. Recommended? Not so much. : )

Beagle's picture

And our playback gear is the only thing we can attempt to make "neutral". What's played back on it is whatever it is. You can't make it "neutral" if it is not. Unless you want to EQ every recording. Most have better things to do.

Simply Nobody's picture

It is great to have some recording engineers contribute to the site and share their thoughts .........

roskodan's picture

Was this a weak attempt at discrediting 'neutral'? Doesn't exist? But no one is mentioning reproduction? The 'record' is the 'neutral', the final product.

Reproduction is where 'neutral' comes into play. As in building a system, a chain, that would show the record for what it is. Then we can talk about balancing neutral to one's particular preference in sound signature.

With headphones 'neutral' is a big thing since the listening experience is basically near field monitoring^3.

Records that are made as classic 2 channel mixes, when listening with headphones, put the listener in the place where the mics are. All those mics directly plugged into the ears. With the ears acting as really bad, nonlinear mics themselves.

So in headphones, a 'neutral' response will obviously be based on a compensation curve to reflect all of those 'bad' interactions.

'Neutral' exists, and yes, neutral makes a lot of sense with headphones and audio reproduction in general.

Solarophile's picture

The problem here is that the author never defines what the term "neutral" is supposed to refer to. I do not understand why DSD64 or DSD256 is even relevant.

Of course there is such a thing as "neutral" when it comes to headphones!

badboygolf16v's picture

*Flame edited. Account suspended.

Mrsnikoph78's picture

Basically roskodan's comment. I really would like to hear from more recording engineers, because this article is odd and a little out of focus - clearly some "headphiles" and other audio enthusiasts are interested in "neutral" reproduction of sound, as from a recording. There are very clear objectives placed by some manufacturers on minimizing the distortion of their products (e.g. the non-recorded kind of distortion), and adhering to some kind of a target response for a target audience at a target price point. I like the ones that provide the best value in "neutral" reproduction. If I am seeking that out, does that make me obsessed with accuracy at the cost of enjoyment? No - quite the opposite. Am I seeking out the "flavor neutral"? What is that? I rock a range of mostly inexpensive headphones, speakers of different kinds, and even a little Bose SoundLink Color, and that thing rocks for what it is. Music is a huge part of my life and I keep investing in both recordings and playback equipment. Sure, I can get slightly different "flavors" from my headphones here and there, but those "flavors" usually end up getting corrected with EQ or prevent me from enjoying my music to the greatest extent.

In the multi-billion business that headphones have become, I'm shocked to think that fidelity isn't a top concern - or that it comes only with a massive price tag. How can there be such a sea of gear out there, presumably made to no reference standard? How can it be that a mere cable is ruining my experience? I think the scarier reality being denied is that most of audio is heavily commodified, and niche products have a hard time justifying themselves. Especially when they don't perform. Imagine how shocked you might feel to hear that the quality of my iPad jack or laptop soundcard is so good I'm questioning why I might ever want another amp/DAC stack. My next receiver at home will be "Class D", not because I think it is sonically better per se but because it uses less energy. Efficiency matters too.

By the way, Beck and Yes are both beautiful recordings - I've spent plenty of time with them. I have no illusions that the recording and mastering process is "messy", "imperfect" and that the "system chain" I playback on may or may not be maximizing my access to the best the medium has to offer. But a whole lot of recordings are generally pretty terrible and slight the artists' intent, IMO. Because I sure wouldn't have signed off on a CD like Metallica's Death Magnetic. What does it cost to record a high quality album these days? What sort of equipment is involved? How bad is it that the iTunes store and Amazon have made the "single track" MP3 the most ubiquitous format? Are albums dead? Are more and more artists just doing their own recordings? Such information would be much more useful!

maelob's picture

Love the last two paragraphs, especially coming from a mastering engineer of audiophile recordings. Great article that put things into context.

Kalman Rubinson's picture

Indeed, the best we can do at this time is what you say: "Reproduction is where 'neutral' comes into play. As in building a system, a chain, that would show the record for what it is."

Simply Nobody's picture

On the loudspeakers side, the reproduction chain is a lot longer and could have many weak links ......... On the headphone side, the reproduction chain is a lot shorter and could have fewer weak links :-) ..........

Kalman Rubinson's picture

(My bias against headphone listening is out of bounds here, so my reply is brief.)

No question about that if the studio monitoring is done with headphones and, ideally, with known models.

zobel's picture

You make good points here in trying to nail down what those terms mean. As you stated Rob, nothing is written in stone about what fidelity means, but also, as you point out, advances are being made in ADCs and DACs, and transducers. We, in this pursuit of realism in sound reproduction, have also been upgrading our systems.

What we find useful as consumers is identical to what you find as a producer of recordings. In a nutshell, best resolving power & clarity, detail, imaging, dynamic impact, accuracy in SPL/frequency, lowest noise & distortion, most realism...everything you shoot for as well.
I like that you stated that we don't listen to numbers. Those measurements are design tools, and only parameters we hope to quantify with our ears, and as it turns out, often do. Quantify is not the same as qualify, when the final effect of the differences we hear has so many variables involved, as you so nicely point out.

Thank you for the article! This is all about having fun making music come alive off of our recordings, and as you said, if the pudding makes you hungry for more, therein lies the proof.

Robin Landseadel's picture

Speaking as an ex-recording engineer, I concur with everything you said. To quote the Firesign Theatre: "You can't get there from here."

Even if you're looking for the same old place.

Ortofan's picture

... neutral/accurate as possible and then the playback system can be configured to achieve a sound quality as pleasant as the listener desires? Or should the recording be made to sound as pleasant as possible when played back on a neutral/accurate system? Is a neutral/accurate recording played back on a neutral/accurate system unlistenable? Is a pleasant sounding recording played back on a pleasant sounding system possibly too much of a good thing?

If a sound quality that gives pleasure is less than completely accurate, then how inaccurate should the recording be made? What deviations from accuracy are necessary to create a more pleasant sound quality? Is it a matter of altering the frequency response and/or adding certain types of distortions?

Finally, should the mastering engineer be the person making these sorts of decisions, or should it be left up to the artist?

jherbert's picture

Rob,

thank you for sharing those insights. Some comments though:

- I understand that you try to make the technology used in the mastering process as transparent as possible, so it is not in the way of the source material, hence the use of DSD and the move from 64 to 256 to 512. May I point out that this means you try to keep the signal path as NEUTRAL or may I say transparent as possible? It has to be, if you want the signal to be as pure as possible.

- Mastering starts with digitizing; it does not end there. Considering the myriad of plugins that may or may not be used in production and mastering, and their subtle or not so subtle effect on the final product, these have more impact on sound than DSD vs. PCM might ever have.

- What's needed in the process are tools to evaluate sound and the manipulation of sound to generate a pleasing experience for the listener. You are not writing about these tools, but I am sure you want them to be as neutral and transparent as they can possibly be. Otherwise you would not be able to make an educated judgment while mixing and mastering.

- More of a sidenote: You talk about the resolution of analog master tapes and how this might eventually, finally be approached by DSD512 (!). Makes me wonder how you define resolution, given the shortcomings of analogue technology. Even PCM 24/96 should be superior to almost anything we see in analog technology. This is not counting the limited frequency range of most microphones used in analog productions. As to noise and distortion: nothing to write home about in the good old analog days.

KaiS's picture

Analog tape has an incredible resolution not easily met by digital.
It can record and play back signals way below its residual noise, and these signals are completely undistorted, as well as frequencies beyond 40 kHz.
Compare this to digital, where the threshold of the least significant bit defines the lowest possible level that can be recorded, furthermore only with extremely high nonlinear distortions.
For digital the workaround is using dither noise and noise shaping, but in practice that's not as effective as the theory suggests, because you must not change the signal level in the 16-bit digital domain afterwards if you want to keep its effect. Yet changing levels there is common practice today, where all processing is done in the digital domain and not always at higher bit resolution.

16bit / 44.1 kHz (CD format) does not cover everything that is recorded on the best analog master tapes (1/2", 30ips), neither in (low level) dynamics nor in frequency range.
24bit / 96 kHz comes close.

BTW: I am not an Analog Guru, I prefer digital for all its other advantages, but it's necessary to know its limits.

xnor's picture

That is simply not true. Digital systems surpass analog tape in pretty much any measure, often by orders of magnitude.

"It can record and playback signals way below it's residual noise"
As can digital audio.

"Compare this to digital, where the threshold of the least significant bit defines the lowest possible level that can be recorded"
That's not true at all and easily refuted by DSD, a 1-bit digital format, which can achieve higher than 16-bit SNR in the audible range.

"furthermore only with extremely high nonlinear distortions."
That's not true either. Maybe this was the case 30 years ago, before dithering was used.
But DSD specifically (not digital audio in general!) does have its limitations in this regard.

There's a reason experts in the industry called DSD totally unsuitable for high-end applications.
PCM doesn't have such limitations, which is why it's recommended by the same people.

"dither noise and noiseshaping, but practically that's not as effective as the theory goes, because you may not change the signal level in the 16bit digital domain then to keep its effect"
Eh.. what?

An amplification process amplifies the whole signal, regardless of whether it comes from a tape or from a digital source, hiss and noise included.

But this is actually where digital has another pro: processing can be done in floating point formats with practically unlimited dynamic range ... which is what happens in any DAW.

KaiS's picture

All I wrote is true, and your comment makes me think that you don't have practical experience as a recording engineer or developer using digital and top-level analog.
Just looking at some isolated measurement figures does not help, especially if you did not do the measurements yourself and don't know the circumstances under which they were made.
Of course I could elaborate on the topic even more, but all this information is readily available on the Internet.

If you don't believe me you can test this by yourself:
Render a very low level sine wave 1 kHz using dither and noiseshaping to 16 bit.
Then change the level in the digital domain by 1/2 dB without redithering and you will find that all the distortion that was removed by dither and noiseshaping is there again.
If you did this we can talk again.
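
For readers who want to try the experiment KaiS describes, here is one way to sketch it in Python, simplified to plain TPDF dither with no noise shaping (an assumption; the principle is the same):

```python
# Sketch of the experiment above: dither a quiet sine to 16 bits, then
# change its level WITHOUT redithering and look for harmonic distortion.
# Assumes plain TPDF dither (no noise shaping) and full scale = +/-1.0.
import numpy as np

fs, f0 = 44_100, 1_000
t = np.arange(fs) / fs
x = 10 ** (-80 / 20) * np.sin(2 * np.pi * f0 * t)        # 1 kHz at -80 dBFS

q = 1 / 2**15                                            # 16-bit step size
dither = (np.random.rand(fs) - np.random.rand(fs)) * q   # TPDF, +/-1 LSB
x16 = np.round((x + dither) / q) * q                     # dithered render

y16 = np.round(x16 * 10 ** (-0.5 / 20) / q) * q          # -0.5 dB, no redither

def harmonic_db(sig, harmonic=3):
    """Level of the given harmonic of f0 relative to the fundamental."""
    spec = np.abs(np.fft.rfft(sig * np.hanning(len(sig))))
    return 20 * np.log10(spec[harmonic * f0] / spec[f0] + 1e-15)

print("3rd harmonic, dithered:           %.1f dB" % harmonic_db(x16))
print("3rd harmonic, gain w/o redither:  %.1f dB" % harmonic_db(y16))
```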

I just want to say again: I am not an Analog Guru and I prefer digital not only because it's easier to work on but because of all the faults that analog tape has, like dropouts, modulation noise, wow and flutter, speed offset, frequency response and interchannel phase deviations, and distortion - all multiplied by copy generation losses.

I personally am very sensitive to wow and flutter problems, and even the best tape recorders do not satisfy my demands here, on some music styles I (as a producer) simply can't stand it.

If you go high bit, high samplerate digital, it beats every analog tape. But if you come down to 16bit/44.1kHz CD format there is no way to preserve the complete sound of a top level analog master tape.
But still, CD (and its streaming equivalents) are good sounding formats, much better than all the MP3s that are circulating around.

xnor's picture

But it isn't, as I have pointed out.

As for not having experience, I've developed DSPs... but it's not about credentials, it's simply about facts.

As for your confusion about dithering: of course you have to re-dither if you re-quantize, exactly as described by theory.

And again: DSP pipelines are not only floating point in DAWs but even in audio players. You can have tens if not hundreds of effects with virtually unlimited dynamic range (and a SNR beyond good and evil).

I don't need to do this experiment as I know what the result will be.
Here's a simple question for you though (you shouldn't have to do any experiments for this, it's a really simple calculation): given some RMS value of the quantization noise floor, by how many dB will the noise floor rise due to re-quantization after you've boosted the signal by 2x?

Now what do you think will happen when you re-record an analog recording boosted by 2x? (Let's ignore the plethora of linear and non-linear distortions that will be added in the process and just look at the noise floor again.)
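
A back-of-envelope numeric version of that first question, under the standard textbook assumption of uncorrelated quantization errors with power q^2/12:

```python
# Noise floor after boosting 2x (+6.02 dB) and requantizing at the same
# LSB size, assuming uncorrelated quantization errors of power q^2/12.
import numpy as np

q = 1.0                           # LSB size, arbitrary units
n_orig = q**2 / 12                # original quantization noise power
n_after = 4 * n_orig + q**2 / 12  # boosted floor plus a fresh one

print("noise floor rise: %.2f dB" % (10 * np.log10(n_after / n_orig)))
# ~6.99 dB total: +6.02 dB from the boost, ~+0.97 dB from requantization
```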

zobel's picture

When we speak of the numbers above in terms of sound quality, it occurs to me that within CD format, there are many ways to do it.

Some have observed that well-executed PCM at CD bit depth and sample rate has bettered the sound quality obtained by using higher numbers for both, mainly through the selection of converter architecture. I have had my eyes opened there by using a great multibit DAC, using precision R2R ladders, which I prefer over the sound of excellent Delta Sigma varieties.

It has also been pointed out that the difference between so called 'hi-rez' bit depth and extended sample rates and CD standard is not a real world factor compared to all the other factors involved in recording, when sound quality is concerned.

It has also been demonstrated that 16/44.1 is transparent when an ADC and a DAC are added to the signal chain, compared against a live microphone feed, although, obviously, that depends completely on the resolution of the transducers involved, and the listener's ability to hear differences in blind tests.

Differences exist in digital hardware, which is of course true in analog hardware. Differences exist in the implementation of the hardware as well, and as Rob pointed out, everything we actively and passively do to the signal has an effect on it, but as everyone knows, the most obvious game changers are still the transducers, which is why we are here at a headphone site.

xnor's picture

Hey zobel,
I completely agree with you about sound quality when it comes to production. In reproduction it is usually primarily frequency response that shapes how something sounds.
And yes, this is where headphones still vary significantly and make the largest differences.

As for what you said about Delta Sigma:
High-end ΔΣ DACs typically are multi-bit.

Why? Because of the inherent flaws in 1-bit designs (which DSD inherits).

As for R2R ladders ... do you mean non-oversampling DACs?
Those significantly distort the signal during reproduction.

For example, they use zero-order hold which causes quite a lot of droop in the audible range (towards 20 kHz).
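
The droop is easy to quantify, since a zero-order hold has a sinc-shaped magnitude response (a textbook result, not specific to any product):

```python
# Zero-order-hold droop: the ZOH magnitude response is |sinc(f / fs)|.
import numpy as np

fs = 44_100
for f in (10_000, 15_000, 20_000):
    droop_db = 20 * np.log10(np.sinc(f / fs))  # np.sinc(x) = sin(pi*x)/(pi*x)
    print(f"{f / 1000:g} kHz: {droop_db:+.2f} dB")

# about -0.75 dB at 10 kHz, -1.71 dB at 15 kHz and -3.17 dB at 20 kHz
```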

Zero-order hold is one of the worst interpolation filters, if not the worst, which is why such DACs typically require additional filters to suppress >20 kHz images.
These filters are typically analog, so you get a lot of phase shift in the audible range as well.

Not only do you get an absolute phase shift, variances in those filters' components can cause audible relative phase shifts between channels.

While it's not better technically I also don't think that it sounds better.

zobel's picture

I wonder if it is possible that the latest incarnations of chip-based R2R 32-bit DACs are significantly different from the one you listened to. I currently have 13 different DACs in my house here, and the one I like best is a Schiit Bifrost Multibit DAC. It is my only R2R DAC, but it is definitely my favorite, surpassing more expensive Delta Sigma over-samplers.

Bifrost: Excellent Performance With All-New AKM DAC
Choose Bifrost, and you get a great DAC at an amazing price. Featuring the 32-bit AKM Verita® AK4490 D/A converter and a fully discrete stage for summing, current gain and filtering, Bifrost bests all previous delta-sigma Bifrosts—including the acclaimed Bifrost Uber—for a lower price.

Bifrost Multibit: The Most Affordable, Upgradable Schiit Multibit DAC
Choose Bifrost Multibit, and you get Schiit’s proprietary closed-form, time- and frequency-domain optimized DSP-based digital filter like Yggdrasil and Gungnir Multibit, coupled to a precision Analog Devices AD5547CRUZ digital to analog converter—a D/A never before used in any other audio product.

Bifrost Multibit

D/A Conversion IC: Analog Devices AD5547CRUZ
Digital Filter: proprietary Schiit bitperfect closed-form digital filter implemented on Analog Devices SHARC DSP processor
Analog Stage: precision I/V converter and output buffer based on AD8512

Frequency Response: 20Hz-20KHz, +/-0.1dB, 2Hz-150KHz, -1dB
Maximum Output: 2.0V RMS

THD: <0.005%, 20Hz-20KHz, at max output
IMD: <0.008%, CCIR
S/N: >109dB, referenced to 2V RMS

KaiS's picture

@zobel "It has also been demonstrated that 16 / 44.1 is transparent..."

It's not! When digital audio started, I did this day after day for years, listening to both the original analog signal as it came from the microphone and the sound through AD/DA conversion. The difference was obvious, and even 48kHz/16bit was an improvement.
But hey, we are talking about a signal with full dynamics and a frequency response in the range of 50kHz and beyond, so this is the hardest task you can give to any transmission system, even harder than transferring analog master tapes.
It's really easy to hear the differences in this test.

The situation has bettered since, we are working in 24bit now, and converter quality has increased.
It's not only the combination of samplerate and bit depth that counts; there are a lot of variables, like jitter, that influence the quality of an AD/DA conversion. Now this is better understood than in the early days.

A lot of this increase in quality can be transferred to 16bit CD format, using dither/noiseshaping algorithms.
.
.
@xnor "These filters are typically analog, so you get a lot of phase shift in the audible range as well..."

It is a widespread misconception that digital filters can avoid phase shifts at no cost to other parameters, but - for the same filter there is no difference if it's analog or digital.

There is a physical connection between the frequency and time domain, and no matter what you do you can, for a given filter order (steepness), achieve either perfect frequency-, phase-, or impulse response, never all three.

This is especially problematic at a sample rate of 44.1kHz, because you don't want to compromise the frequency response at 20kHz and want to get high damping above 22.05kHz to avoid aliasing products.
This means you need extremely steep and therefore problematic filters.
Oversampling and then filtering in the digital domain does not change the basic problem at all!

Usually you have to find a good compromise, which can even mean the frequency response of the filter is less than perfect in favor of the other two.
The so-called linear phase response filters, for example, have a huge amount of time smear.
They generate a signal BEFORE the arrival of the main pulse, which can be quite annoying.

There are advantages of digital filters, but you cannot go beyond physics.
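
To put rough numbers on that steepness problem, here is a sketch using SciPy's Kaiser-window estimate for a linear-phase FIR that passes 20 kHz and stops at 22.05 kHz; the targets are illustrative only, not anyone's production filter:

```python
# Rough cost of a linear-phase "brick wall" at 44.1 kHz: pass 20 kHz,
# stop 22.05 kHz, ~100 dB stopband attenuation. Illustrative targets.
from scipy.signal import kaiserord, firwin

fs = 44_100
width = (22_050 - 20_000) / (fs / 2)      # transition width / Nyquist
numtaps, beta = kaiserord(100, width)     # taps needed for ~100 dB
taps = firwin(numtaps, 21_025, window=("kaiser", beta), fs=fs)

delay_ms = (numtaps - 1) / 2 / fs * 1e3   # group delay = pre-response length
print(f"{numtaps} taps, ~{delay_ms:.1f} ms of pre-response before the impulse")
```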

SACD with its DSD format sounded like a solution, but to take full advantage you need a complete production chain all in DSD, which, to my knowledge, does not exist for professional productions based on multitrack recording and mixing.

Zalmo's picture

Not to rock the boat...or run into flak (why put yourself in harm's way?), but, I made my bread for 16 years plus on 16 bit / 44.1 in the daze of the CD remastering craze at one of the Big 4 Recording Corporations (at one point, we were number 1).
When it came time to do a shootout between A to Ds, our criterion was to choose the unit that sounded most natural. That means closest to the analog source-tape (we used the top-of-the-line Studers ... the last of the great reel-to-reel decks).
As we were the arm of a major Recording corporation, we had access to the actual master tapes...and that's what we utilized for our CD releases...exclusively...IF the tapes were available.
So we A/B'd the master tape machine vs. the 1630 passing through 3 A to Ds (played back through Apogee filters), which will (mostly) remain nameless (2 of the companies, to my knowledge, are no longer in business). So, we all chose one out of the three. In the middle of this, I commented, maybe we should choose the one that sounds closest (or the same, if you will) to the actual analog reference playback. So, we switched gears, dropped the crispest sounding A to D and chose the one that...sounded like the master. AND IT DID. One sounded brighter, but, that wasn't closest to the original. And, that was back then, 16/44! In other words, we heard through the A to D that we chose...what we put into it (way before the higher sampling rates/etc.)
One thing I've learned is that the MASTERING is MORE IMPORTANT than the pure "superior" specs of the higher digital bandwidth available. That said...many of the old tapes have deteriorated over the years and the larger digital bandwidth will NOT make that sound better....
One final thing....If you go to a site like SteveHoffman.TV, you will see that MANY prefer the "inferior" bandwidth CD remasterings that our team did...to the newer, hi-falootin' higher digital bandwidth remasterings by many of the major (re)mastering outfits!
To go out on a limb, I would say the musicality of (our) 16/44 remasterings could sound as good, as fulfilling as the higher rates during the majority of the higher sonic levels...and where you will or might hear the difference is where very low musical passages occur. There, the more massive dynamic range WILL (perhaps) be evident in the smoothness of the audible content...but, the way we did things, chasing the fade, it might be a bit harder to tell...except for mastering engineers who are trained to listen more to nuance than most listeners...though I would expect audiophiles might perceive this as well.
Hey, my criterion in mastering was to make the mastering sound musical, with no deleterious effect, no aberrations...and as much like the master as possible. Many agree that our crew did just that. And the sounds of our masterings still hold up.
Even though it was only 16 bit / 44K. Way back when.

zobel's picture

It always is refreshing and informative to hear from real-world, hands-on engineers who have produced recordings themselves. The tools and techniques you used "way back when" do stand up today, as you said, even with "only" 16 bit / 44K. I was glad to read that you chose your ADC by ear, and went for fidelity, with accuracy and naturalness as the yardstick. Ears will always be the best test.
What DAC do you now prefer?

xnor's picture

KaiS, I got your response via email but it's not showing (maybe because the thread got too deep?) so I'm replying to your first post and quoting portions of your response:

"It's is a widespread misconception that digital filters can avoid phase shifts at no cost on other parameters, [...]"

I didn't imply that at all and have never heard anyone competent in the field claim that. Is it widespread among audiophiles?

Have you seen some of the analog filters on non-oversampling DACs?
If you want high suppression while keeping the filter reasonably expensive/complex you won't get a filter with smooth magnitude/phase response...
This also brings me to my other reason why I mentioned analog filters: component tolerances. See my previous post on how this can create audible distortion.

You write: "for the same filter there is no difference if it's analog or digital."

In theory you get the same filter, sure. But in reality you have to deal with real (non-ideal) analog components.
Also, in the digital domain you have more options.

There is a reason why "purist" (lol?) R2R D/A converters that approach high-end typically upsample with digital filters - even if it's a min-phase filter.

"This is especially problematic at a sample rate of 44.1kHz, because you don't want to compromise the frequency response at 20kHz and want to get high damping above 22.05kHz to avoid aliasing products."

During A/D conversion, sure. I don't see how this is problematic with today's technology though.
Btw, if you don't care about stuff above 20 kHz you can relax the filter and allow aliasing down to 20 kHz (or whatever is your cutoff point).

"The so-called linear phase response filters have a huge amount of time smear, [...]"

I know what you mean but linear-phase filters don't time smear. That's the whole point of linear-phase.

"They generate a signal BEFORE the arrival of the main pulse, [...]"

Given the previous 44.1 kHz example, this is only the case if this "main pulse" has energy at >20 kHz.
Up to 20 kHz such an ideal linear-phase filter leaves the signal untouched.

That leads me back to the first reason why I mentioned analog (typically min-phase) filters before: they can produce significant phase shift well below the cutoff point.

So even if such a filter's cutoff is 24 kHz, it may produce significant phase shift in the treble range.
That's not the case with linear phase filters.

"[...] which can be quite annoying."

Sure, not if the ringing is >20 kHz though (it can be above 21 to 22 kHz for the above 44.1 kHz example) or when there is little energy to begin with.
Ideally and typically both are the case.

Additionally, even if it was within the audible range it would be masked quite well by lower frequency, higher energy content.

"SACD with its DSD format sounded like being a solution"

A solution to what? DSD is inherently flawed due to the forced 1-bit conversion (which PCM doesn't have), but I think it did have a positive effect on the technology / industry.

Simply Nobody's picture

Rafe ....... Please hurry .......... Please review that VPI turntable with headphone output ASAP :-) ..........

ednaz's picture

Reading this reminded me... In live performances, Oregon used to joke that they were doing an acoustic performance... as they pointed to the array of microphones scattered all through their performance area, and the huge speakers that allow people beyond the front row to hear.

The concept of neutral or natural is fascinating when digital technology is being used for capture and reproduction of ANYTHING. I don't know the equivalent musical analogy, but when I hear photographers bemoan how digital will never be as real looking as film, I love to jump in with: which film? Tri-X? Hmmm, I see in color, not black and white. Ektachrome VS (very saturated colors)? And then when it's printed, dodging and burning. Other than live performance, there's no REAL neutral or natural, because as the writer says, there are a number of choices all along the production chain.

I like thinking about coming as close to the artist intent as possible. We wouldn't have the loudness wars if that wasn't what artists thought matched their artistic intent. ("As loud as possible.") Decisions are made in the capture, and then re-made in the mix. I'm sure that in the studio, the bass player for Sly and the Family Stone wasn't as loud as it is on their albums.

I enjoy listening to when artists revisit old albums with re-mastering and re-mixing, because it shows they're still thinking about their artistic intent, that it need not be frozen in time.

I will say this, though. Some music is a waste of bits produced at 24/192. Like Alabama Shakes - I love them, but there's no nuance in their intent. Loud in your face and hard edged. Bands like that don't sound much different through my sound system at Apple Store resolution than they do at 24/96. I suspect that DSD gives the engineer more latitude to push sound this way and that, just as more resolution and more dynamic range in cameras lets a photographer "torture" the files without obvious negative effects.

bogdanb's picture

*edited for courtesy.
In photography, when digitising art (photographing museum artworks) there are a couple of standards, one from the US and another from Europe, that soon will become just one (FADGI & Metamorfoze).
What they do is guarantee that the art reproduction was done at the highest possible standards, that the pinnacle of technology was used. Basically, ensure that it was the best reproduction mankind was able to do at this moment in time.

You are saying there is no way to do that in the audio world?

I keep listening to Dropkick Murphys' Warrior's Code album, although it is really a catastrophe in terms of sound quality.
"I'd like to think that if your listening experiences make you want to have more of them, if the sound you are hearing on your system is making you love the music, you are probably at the destination the artist wanted you to arrive at after all"

Not all artists are technically versed enough to know what is possible and thus argue for a better mastering. Do they always have the final say?

In photography, where one photographer might like, let's say, an image with high contrast (Daido Moriyama) and others softer images like the pictorialists, etc., whenever a book is published the images have to be faithful reproductions. Some excel in doing that, adapting the image to the paper chosen (a downsampling, in a way), others do a mediocre job.

What is the standard in the audio world?
Purchasing and paying for a song multiple times is no guarantee of reproduction fidelity.

Grover Neville's picture

Audio loudness standards such as EBU do exist. More people are paying attention to them due to streaming platforms such as YouTube, Apple Music, Spotify, Tidal, etc. specifying LUFS (loudness units) targets. That said, DMR or other ways of measuring audio 'quality' are fuzzy at best, and even the LUFS targets are not always adhered to by studios and engineers (or pressing houses, for example).
Broadcast, film, and video game score folks put out some of the most consistently quality (not always exceptional, but usually solid or good) recordings out there because they have stricter standards and monitor against those targets.
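
For the curious, LUFS metering is available in open-source form; a minimal sketch using the pyloudnorm package (assuming it is installed: pip install pyloudnorm), which implements the ITU-R BS.1770 measurement behind the LUFS targets mentioned above:

```python
# Integrated loudness (LUFS) of a synthetic test tone via pyloudnorm.
import numpy as np
import pyloudnorm as pyln

fs = 48_000
t = np.arange(10 * fs) / fs
audio = 0.1 * np.sin(2 * np.pi * 997 * t)  # 10 s sine at -20 dBFS

meter = pyln.Meter(fs)                     # BS.1770 meter
print(f"{meter.integrated_loudness(audio):.1f} LUFS")
```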

SonicSavourIF's picture

*edited for courtesy and respect.
The interviewed engineer completely misses the core of the discussion about neutrality.
To readers who are interested in the problem of neutrality and accurate sound reproduction, the matter gets discussed to quite some extent in the sound reproduction book by Toole; see
https://www.sciencedirect.com/science/book/9780240520094
Therein it becomes very clear what the goal of capturing audio must be, what it can't be, where the problems of accurate sound reproduction lie and what to do to achieve it. An excellent read and much more enlightening than the audiophile press.

Secondly, readers who want to understand why DSD is not needed and even harmful to "next generation" sound reproduction might be interested in the following paper by Stanley P. Lipshitz and John Vanderkooy:
http://www.thewelltemperedcomputer.com/Lib/SACD.pdf

scottsol's picture

Rob’s assertion that DSD64 has 64 times the resolution of the CD PCM standard of 44/16 is incorrect. That number refers to sampling rate, which is about 2.8 MHz, or 64 times the 44.1 kHz of CD. This gives us no information about resolution without knowing the bit depth. Since DSD is a 1-bit noise shaped scheme there is no way to precisely compare it to a multibit PCM system.

Most people would accept that DSD64 is roughly equivalent to 96/24 PCM. The bitrate per channel of uncompressed 44/16 is about 700 kbps while 96/24 is 2.3 Mbps and DSD64 is 2.8 Mbps. While these numbers can’t be used as a solid figure of merit, it should be quite clear that there is no way DSD has even close to 64 times the resolution of CD.
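
Those per-channel figures are easy to verify:

```python
# Per-channel uncompressed bitrates quoted above.
formats = {
    "CD (44.1 kHz x 16-bit)":     44_100 * 16,
    "PCM (96 kHz x 24-bit)":      96_000 * 24,
    "DSD64 (2.8224 MHz x 1-bit)": 2_822_400,
}
for name, bps in formats.items():
    print(f"{name}: {bps / 1e6:.3f} Mbit/s")

# 0.706, 2.304 and 2.822 Mbit/s respectively
```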

Rafe Arnott's picture
I fixed this error in nomenclature in the piece, thank you for pointing it out.
KaiS's picture

I don't see a point in using the terms natural or neutral in context with pop and rock recordings.

These music styles' recordings are artificial creations using lots and lots of recording techniques and processing that are anything but neutral, natural or transparent, all of it being part of the sound we love about them.

When such a band produces a record there never is an "original" acoustic event that can be reproduced 1:1.
Usually instruments and vocals are recorded one by one or in smaller groups; the band is not playing all together at any time. Even in live recordings instruments are recorded on separate tracks and later corrected, replaced and added at the will of the producers or the musicians.

This way of producing music started when the first tape machines were invented and used for "Multiplayback recording" and has developed until today, where every sound snippet is under the comprehensive control of the producer.

Recording is an art and not just a way to capture "what is" by sticking a microphone in the air.
Whoever is interested can get tons of information about contemporary production methods from the Internet.

.

The other point is preserving those works of art and transferring them into our digital world.
Here the terms accuracy and transparency come to mind.

Every copying of an analog source produces generation losses. Especially the playback machine for analog tape needs to be carefully selected and correctly aligned; this has orders of magnitude more influence on the final result than the choice of the AD converter or the digital system used.

The low and high frequency response on a professional tape machine can be aligned in the range of several dBs, and even the best reference tapes have deviations of 0.5 dB at best, if you still find some.
If you are lucky, the primary recording engineer recorded some reference tones (40 Hz, 100 Hz, 400 Hz, 1 kHz, 3 kHz, 10 kHz, 16 kHz at best) for later playback alignment. Very often those are missing or incomplete.
I always included a frequency sweep 20Hz-20 kHz with my master tapes, but this is the absolute exception.
Then there is the uncertainty between NAB and IEC equalization standards for playback of analog tape, which sometimes can be solved by listening to the release version on vinyl if there are no reference tones present and no markings on the tape box. Very often you just have to select by probability while listening to the result.
Even then it's hard to include what was done during tape to vinyl transfer, because very often there was some special equalization (early form of "Mastering") involved during production of the vinyl release version.

When I listen to the music I love (usually on Tidal lossless) I always like having the choice between a sound as close as possible to the original release (the one I was used to then) and maybe an alternative remastered version.
Even then I don't expect this to be a 100% copy; I'm happy if there are no audible artifacts of an MP3 intermediate step or glitches from a bad CD transfer, as unfortunately is often the case.

In this context I value the work of MFSL in preserving the originals as well as possible, instead of having sloppy transfers done by just some overworked auxiliary engineer or the office secretary ("Now I've copied the cover, shall I copy the disc too?").

A still-on-duty recording and mastering audio engineer.

Leffen's picture

Through time, gear gets more transparent. We figure out how to improve the transparency. We've come a long way from Thomas Edison's phonograph, etc. All types of gear get upgraded this way.

Audiophiles like to be on the bleeding edge of gear improvement. They upgrade their systems as gains get made in transparency. They spend time intellectualizing, evaluating if gains really have been made (since it's a marginal affair).

When you get to the bleeding edge of transparency, different pieces still sound different. Audiophiles pick the one that sounds the most musical to them.

On top of having a transparent system, many audiophiles still make edits for pleasure: EQ, digital effects, or different "coloring" analog pieces to tailor their system to their taste.

The really cool thing is to be able to have a maximally transparent system, and THEN bias it for pleasure as you please.

Then you have the best of both worlds.

This is possible, it's always been possible. Transparency improves. Pleasure never goes away.

There's nothing wrong.

stcigar's picture

..is letting all things be, save for yourself.

echoplex's picture

The reason to try and mix/master in a "neutral" environment is so that the end result will translate as well as possible to markedly (or decidedly) non-neutral environments - like a living room with no acoustic treatment, full of furniture, and (worst of all) with wall to wall carpeting.

The sonic differences between the room a mix/mastering engineer uses and what the typical consumer/audiophile uses are like night and day. Mix/mastering rooms can run into 5+ figures between the foundation, floating floor, room treatments, architecture - before you even get to the choice of monitors, preamps, power-amps, converters. But once you get to a certain level of pro audio gear, the real source of neutrality is the physical design and setup of the room that gear sits in. Engineers who work in different rooms and environments learn how the same equipment sounds in different rooms, e.g., they carry around their own monitors to different rooms.

The sonic differences among choices of monitors from the likes of Dunlavy, Lipinski, Genelec, etc. - coupled with high quality class A or class D amps - in terms of deviation from a flat frequency response, good impulse response, good RT60 measurement, etc., in an acoustically correct room, are trivial compared to what happens when you take music mixed/mastered in that controlled environment and play it back in home living rooms, or on a phone/computer with even $600+ headphones (whose frequency response - however good on paper - is easily ruined by an impedance mismatch).

Essentially, the mix/mastering guys (like Bob Katz - who has written columns here) are tearing apart the micro dynamics of a mix to make it sound better, and can hear/apply 0.5 dB changes in EQ in such a controlled environment - just so consumers have copious leeway to mess up the tonal balance in their living room or on their portable rig. But without producing it in what is by contrast acoustical "white room" conditions with a huge magnifying glass to fix EQ, dynamics etc. - what consumers hear in their living rooms and headphones would sound even worse (than it does now). It's not going to scale linearly from the "white room" of the studio when played back thru the non-linearities of consumer gear, but it will translate better the more controlled the production environment was.

The "neutral" reference comes from measuring the response of the entire audio signal/chain in the mastering/mix room, and then making changes to the treatments, monitor positions, or even the gear to get the best combination of frequency response, impulse response, reverb time, group delay, etc. If you want to learn more about the guidelines for those measurements, then take a course in acoustics (it's required for my students, and for most audio engineers these days who are in an accredited degree program).

Receivers and preamps attempt to mitigate the gross acoustical problems in a room with various canned/automated EQ software systems, e.g., Audyssey. From what I've seen of these, perhaps Trinnov is one of the better ones. But you cannot EQ away nulls in a room from interference/modes. Adding too much EQ starts to cause other problems from changing the gain staging, and can make effects from room modes worse. This is why the home EQ systems just don't touch certain frequency bands, and limit the EQ amplitude. EQ cannot make up for not fixing the room itself.

Many consumers/audiophiles simply don't know what a neutral acoustic environment sounds like. We see the look of (pleasant) shock and awe on students faces as they start to log enough hours in the studio to experience what a really accurate monitoring environment sounds like - and that becomes their reference.

I suspect the current enthusiasm over all the various headphone designs is because people don't even own a pair of speakers anymore. Consumers will spend money on a laptop, iPhone, and an expensive pair of headphones - and that's their investment in audio. But listening with high end Stax, Sennheiser etc. headphones is still nothing like listening to music in a well treated room that's been measured with/optimized for the gear at hand. This is why it's still silly to try and master at home in an untreated room, let alone through headphones.

At least the Harman curve (which Tyl and Bob Katz referenced in previous columns) starts to give a point of common reference to reproduce a diffuse field equalization in headphones. While I know many audiophiles don't like diffuse EQ, it is a step towards neutrality (in headphones at least). And the consumer for whom "less is more" may well be better off than the overzealous audiophile who is convinced that a $15K pair of speakers is going to make music sound better in his untreated living room compared to a $5K pair of speakers.

BTW - the reference above to the 17-year-old (2001) AES paper citing issues with 1-bit sigma/delta conversion is, I believe, obsolete. I don't have the subsequent paper reference handy, but I understand it was proven that the design is in fact mathematically sound. The mastering engineers I know generally remark that DSD can sound better than 24-bit/192K PCM (depending upon the source material).

GeneZ's picture

If anyone recording could achieve neutrality, what the listeners on the other side of the speakers would hear might prove very disappointing. The best sound in the house begins at about twenty feet away from the performers. Yet the mics used to record are usually in very close proximity to the musical instruments. I always found that recordings made at a distance by someone in the audience sounded surprisingly better than what I heard while performing. I believe other musicians would admit the same. Up close, where recordings take place most of the time, the sound needs some "doctoring" to make it sound good. We might not like "neutral" as much as those who see it as an ideal expect.

bogdanb's picture

In photography there are image takers and image makers (look it up in a bookstore).
I guess the same philosophy could be applied to the audio world.

So firstly there is the CAPTURE of the sound, the recording.
(Whatever process is used to calibrate the tools, as long as the losses in the captured data are kept within a minimum defined set of standards, the result should be considered TRANSPARENT/FAITHFUL/MIMETIC.)

Q:
What are the issues at this stage? What are the best practices? Are there multiple methods to get to the same result without loss of information? What are the limits today of what it is possible to record?

Should there maybe be two ways of looking at the recording stage? One a documentation approach, where, let's say, a concert is recorded; and another where the final result, the creation, implies later work in post-processing?

The second stage the POST PROCESSING.
Q:
What are the lossless corrections that might be applied to consider the stage faithful? (let's say the Document aspect of the recording is paramount)
Are there instances when something is recorded with the special intent to be post processed thus making the step/stage an equal part of the creative process?
By now I think it is clear that the more analog-to-digital conversion, or vice versa, is done, the more information will be lost from the file. But is it so? Is there a way around this? Are there any benefits in adding another conversion in this process? If so, why?
Any standards for this stage? Any best practices guidelines?

If I remember correctly Pink Floyd recorded the drums in a hallway of a house because they liked how the drums sounded there.
So one might argue that that brings a "coloration" to the sound, as does modifying a sound after it has been recorded.
Other artists might want to add field recordings, or bring all kinds of stuff to the final file/song/audio work.

DOWNSAMPLING
How much longer will downsampling be needed? Will there ever be a time when a full-resolution file can be streamed, or downloaded?
How come a YouTube stream can sometimes sound so pure that the listener forgets about everything, starts hearing some wonderful sequence and overlap of sounds in a magical way, and maybe wants to repeat the experience?

We all know 8 was replaced by 16, 16 by 32, 32 by 64 ... 128 by 256, and the replacement pace grows exponentially. We also know some like old school stuff: there are people buying very old cars, heck, even old computers; some even buy vinyl, others really old oil paintings.
Other people buy new oil paintings, as they know how long oil will last in time, making their investment safe.
We also know old pianos aren't so good, but some old violins are irreplaceable and unique.

How do we best preserve the sound people made during their lifetime either by singing or using instruments or by composing an audio work without adding our personal interpretation?
How should that be done so that 100 or 1000 years from now, with certainly more advanced rendering machines (headphones, players, etc.), the listening experience would transcend time and the tools used?

KaiS's picture

The further you go through the music recording chain (artist's performance, microphone pick-up, the lengthy creative recording process, creative editing and mixing, mastering, transfer to distributable media, playback at home) the less influence the technology used has on the result, except for the last step, which is your personal responsibility.

There's a saying with us audio engineers: "If you want to make a great recording find yourself a great singer".

Recording is not bringing the whole of an artist's performance into your living room. If that is your intention you can consider the losses as very high in the first place.
There is no transparent or faithful reproduction because it's not possible to stuff all aspects of a musical acoustic event into a recording.

All people involved in doing a recording put in their personal taste, and a lot of decisions are made how something should sound during the process of creating a piece of art we call music recording.

What you like or dislike about the sound of a music track might have been a decision to move a recording microphone several inches to the left or to the right.

Sound wise, once the signal is converted from the acoustic into the electric domain, I can assure you that all the equipment used has comparatively very minor influence on the results, aside from intentional sound processing.
The next big change happens when the signal is translated back into acoustics, at your home speaker or headphones.

The quality of the technology used is a limiting factor, but it's a minor part of the equation.

There are great recordings which stood the test of time that are now 70 years old, done with what they had then, which was by no means up to the standard we have today.

As a producer I like the fact that I now have detailed creative access to every sound snippet and don't need to think about technology very much, but I did great recordings 30 years ago when none of today's possibilities were even in sight. I had great singers then too :-)

Buddha Khan's picture

Hi Rafe

This or watching paint dry?

Simply Nobody's picture

....... or, watch grass grow .......... It is that time of the year :-) ...........

Pokemonn's picture

"But, nothing is neutral.

Nothing. "

maybe future headphone systems will use AI to analyze and compensate each track before playback.
