Paper Review: "The Effects of MP3 Compression on Perceived Emotional Characteristics in Musical Instruments"

In a recent paper, Ronald Mo, Ga Lam Choi, Chung Lee, and Andrew Horner of the Department of Computer Science and Engineering, Hong Kong University of Science and Technology, published the results of a study titled "The Effects of MP3 Compression on Perceived Emotional Characteristics in Musical Instruments." Whoa! Very cool. Let's see what they found.

(I'm very grateful this paper is available for free download under the AES Open Access Policy. I highly recommend you download a free copy here to follow along.)

Here's the abstract:

The topic of the current paper is to determine how these MP3 artifacts change the emotional characteristics of the sound. Even though many previous studies have considered the relationship between music emotion and timbre, the relationship between music emotion and MP3 compression is still unexplored. In light of this, one might wonder how much MP3 compression affects the emotional characteristics of musical instruments. In particular, do all emotional characteristics decrease about equally with more compression, or do some increase and others decrease? Are any emotional characteristics relatively unaffected by compression? Which instruments change the most or least with more compression?

Twenty undergraduate students with normal hearing, who were not trained in music or audio arts but were considered "attentive listeners," took part in the test. Subjects listened to sustained instrument sounds of the bassoon, clarinet, flute, horn, oboe, saxophone, trumpet, and violin. These sustained tone tracks have been used in other studies, making it easy for future researchers to compare results; most of the sounds came from the McGill University instrument sound collection.

Each track was compressed with the LAME MP3 encoder to 112Kbps, 56Kbps, and 32Kbps. The original uncompressed track and the compressed tracks were presented to listeners in randomized pairs and were reproduced by a Sound Blaster X-Fi Xtreme Audio sound card and Sony MDR-7506 headphones.

Subjects were asked to rate the relative emotional impact by choosing between the two samples in each pair. Subjects compared the tracks in terms of ten emotional categories: Happy, Heroic, Romantic, Comic, Calm, Mysterious, Shy, Angry, Scary, and Sad. These emotional characteristics were chosen because they provide a broad spread across the Valence-Arousal plane, which plots emotions along the axes of Valence (unpleasant-pleasant) and Arousal (deactivated-aroused).

Plot at left shows a typical representation of the Valence-Arousal emotional space; plot at right shows the emotional descriptors used in the study placed in V-A space.

The study had numerous checks and balances to ensure that the results gathered were reliable. It states:

Subjects were fairly consistent in their responses. That is, subjects voted for the same tone in both comparisons (AB and BA) about 80% of the time. We measured the level of agreement among the subjects with an overall Fleiss’ Kappa statistics. It was calculated at 0.22, indicating a fair agreement among subjects [53].
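As a rough illustration of what that agreement figure measures, here is a minimal Python sketch of Fleiss' kappa. It is not the authors' code: the function name `fleiss_kappa` and the small vote table are made up for the example, and the paper's own data is not reproduced here. On the widely used Landis and Koch scale, values from 0.21 to 0.40 are conventionally labeled "fair" agreement.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table of vote counts.

    counts[i][j] is the number of raters who assigned item i to category j.
    Every item must have been rated by the same number of raters.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number of ratings")
    n_cats = len(counts[0])

    # Share of all ratings that fell into each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_cats)
    ]
    # Observed agreement for each item: fraction of rater pairs that agree.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_item) / n_items
    p_chance = sum(p * p for p in p_cat)  # agreement expected by chance

    # Undefined (division by zero) if every rating is in a single category.
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical votes: 3 tone pairs, 4 listeners, 2 choices ("A" or "B").
toy_votes = [[4, 0], [3, 1], [2, 2]]
print(round(fleiss_kappa(toy_votes), 3))
```

A kappa of 1 means the listeners always agreed, and a value near 0 means they agreed about as often as chance alone would predict.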

The study generally found that emotions in the positive half of the valence spectrum (pleasant emotions) were negatively affected by increasing compression, and that emotions in the negative half of the valence spectrum (unpleasant emotions) increased with increasing compression.

Although the valence axis is flipped in the above graph, with pleasant feeling to the left and unpleasant feeling to the right, it can easily be seen that positive emotions were negatively affected and negative emotions were increased with lossy compression.

They also found that some instruments were more susceptible to emotional change with varying amounts of compression. From most to least affected, those instruments were: trumpet, sax; oboe; bassoon; clarinet; flute; violin; horn. The paper posits that instruments with more spectral incoherence may be more strongly affected by quantization jitter in the encoding process.

Another interesting point brought out in the paper is that the emotion "angry" suffered little effect with compression, while the emotion "sad" had the strongest variation even though they are roughly at the same negative valence level.

Paper Summary
Your take-away point from this paper: Compression does measurably and reliably diminish pleasure in music listening.

My Interpretation
While compression does diminish the pleasure of music listening, it's important to note that 112Kbps compression barely moves the needle. (You'll have to look at the graphs in the paper to see this.) I suspect that by the time you get to 320Kbps these tests wouldn't show a difference at all. I'm not saying there isn't a difference in emotional impact between 320Kbps and 16/44 or higher; I'm just saying it's more subtle, and it may take more experienced listeners and better gear than the Sony MDR-7506 headphones used in this study to detect it reliably or with much significance.

In other words, while the paper shows heavy compression sucks, it also shows that 112Kbps is pretty good. It's my feeling that by the time you get to 320Kbps MP3 encoding, audio problems that affect listening pleasure are more likely to come from the headphones' imperfect performance than from the bitrate of the source.

I think this study does validate the idea that better audio fidelity results in improved listening pleasure, and I would assume that applies to improved audio fidelity in all aspects of audio reproduction. But I would caution the reader to understand the significant diminishing returns curve seen even at these severe bitrate reductions. I do sense that high-resolution data rates above 16/44 are more pleasurable. But I also weigh in the fact that at those rates the pleasure I derive from listening is much more about the music itself than the fidelity of reproduction. I'm not going to obsess over the fact that my preferred music is only available at CD rates. I'm just going to enjoy.

Lastly, and maybe most importantly, this study clearly shows that monitoring one's own emotional reaction to what one hears is a valid method for evaluating audio performance. Objectivists will often pull out the old "indistinguishable under A/B/X testing" rubric to pooh-pooh the claims of people who discern the small differences made by cables and high resolution. I would counter: Maybe you're in your head too much and should be listening more with your heart.

I would also caution subjectivists that pride of ownership or your emotional state at the time of listening may play a role just as emotionally powerful as, or more powerful than, the emotional gains of actual fidelity improvements.

"The first principle is that you must not fool yourself and you are the easiest person to fool." Richard P. Feynman

COMMENTS
klausosk's picture

Dear Tyll,

What a great article. Not only did you capture the important bits of the paper, you even went further and included a real-world view of it. We need more of this.

I would like to suggest a review of a full system: headphones, cables, player, and music files. How do you build a good headphone system for portable and for desktop use? What is important and what is not for improving listening pleasure (sound quality)? Should I spend £100 more on an extra cable or on an Astell&Kern player?

Thanks,
Ricardo

Bigmaluk's picture

I have to agree with Klausosk about the way you've made a paper easier to understand. There has been report after report, test on test comparing compressed music. Surely when doing tests on compression, cables, etc., it's about the feeling of the tracks you listen to. If you spend too much time doing comparisons about what's clearer, aren't you missing out on what you actually listen to music for: the fun and enjoyment?

MarlenesMusings's picture

So they found out that 32 kBit/s mp3 (or highly compressed music) doesn't sound good. What a stunning discovery! At that bitrate, mp3 resamples to 16 kHz. Meaning: at max you have a frequency response of up to 8.000 Hz. It's not even the compression that's bad, it's the bandwidth-limited sound itself.

P.S.: mp3 is not optimized for these low bitrates. A more recent codec like aac or opus would have yielded better "emotional" results.

P.P.S.: the whole methodology is wrong. "Sounds have (...) timbral and emotional characteristics" - WTF? I thought it was the combination of notes (chords, harmony, counterpoint) creating the emotional reaction. Well, I guess I have been understanding music wrong all my life.

tony's picture

which is: the guy that coined the concept of "Eargasms" is Genius !

I first heard the term at a 2011 RMAF Seminar, it's stuck in the back of my head ever since.

Gear needs to be selected on the basis of how much Dopamine it triggers in a person's brain. This can take extended time lengths.

Gear that delivers the emotional component will continue to deliver it over great lengths of time : "the test of time" .

A few examples : the Original Quad ESL ( from the 1950s ) , the LS3/5a, Sennheiser HD600 & HD800, Etymotics.

I probably should include the vacuum tube ( 6dj8 & 6SN7 variants ) but they die of old age ( 4,000 hours use ).

Steve G. asked how to test eargasms?, he proposed distance. ( a bit of a dirty bird )

Tony in Michigan

Tyll Hertsens's picture
Actually, I think I coined the term many, many years ago. I think it even showed up in a Stereophile HeadRoom ad. And I think I responded "distance" to SG's question of how you measure one on Facebook some time ago.

So yeah, I'm the dirty bird.

tony's picture

I've been valuing gear based on it, it's what I look for. Some gear presents too much addictive for me, like that Valhalla 2 loaded up with Russian glass, phew, guys were lining up at that Ann Arbor headphone meet to try out various cans on that thing. Stupid me even bought a pair of Audeze 8 open based on that bit of listening. ( sold em later as they went flat on my lovely Asgard 2 ).

I'm going for just a "certain" level of thrill, sort of staying away from the Turbo Porsche or "super" bike levels of screaming thrills.

A "Ford" friend just retired, after a decade in Germany. He is an ARC / Infinity Reference Audiophile with his gear in "storage" for 15+ years. He was over at my place during our Bernie Sanders Working and had a listen to my Audiologist "tuned-up" Sennheiser/Schiit system, using my iMac iTune music collection. He was having "Eargasms" say'n he was gonna set-up his Big System again, he misses his System. Well, he's pulling his hair out trying to get it back on song. I heard it, it is boring.
Every call he made resulted in a $5,000 to $15,000 remedy, ARC wants the price of a "new car" to re-tube ( for gods sake ). Still, the Bass Towers play bass but they don't go anywhere as low as my Sennheisers ( which have tons of Beelzebub deeeeep Bass ). His wife says the stuff sounds the same to her! Replacement Phono Cart = $5,000, phew. I have to say that my friend is listening to OLD vinyl which was never very good compared with the kind of results Bob Katz releases, geez, CSNY "almost cut my hair" sounded horrible ( maybe just poorly recorded but we never ( 1990 ) noticed it while smoking the pipe ).

Anyway, old audiophiles, coming back, are blown away by mediocre headphone gear that's properly set up. Blown away by the prices too!
Anyone can have my lovely system for well under $2,000.

Thank you for being there and pointing the way, I could never have gotten here without your guidances.

Tony in Michigan

OldRoadToad's picture

And never will. I think it best to give up for now.

ORT

castleofargh's picture

so from this "study" we've "learned" that the lowest mp3 bitrates aren't transparent. and that when something sounds bad, it sounds bad.
mind blown.
we've always known that mp3 sucks at low bitrates. it's pretty much the worst in those conditions. I use mp3 on my DAPs and I wouldn't use those bitrates even on my audiobooks.
I imagine the discussion to set up the experiment: "hey let's do something useless we all know the result of, using subjective quantification to try and mess up a proper test. oh! and let's use mp3 at bitrates nobody uses! that's gotta serve a purpose." and in the room they were all like "genius! let me call AES"
^_^ I can't stop laughing at this stuff.

next I expect them to test if music becomes more emotional in altitude. and if they use the same logic, I can't wait to learn about the trials done in full vacuum. >:)

Johan B's picture

I find 112Kbps annoying sounding and it does not make me happy.

Scientist1's picture

There is no reason to use lossy any more. You might as well use lossless.

There are in fact NO differences in cables or high res audio.

Scientist1's picture

^please stop perpetuating this nonsense

jaredjcrandall84's picture

Depends on your ears and what equipment you have. Apple and beats suck

thefitz's picture

A shame they only played one instrument at once without any percussion. Once you throw in multiple instruments, especially involving drums, the mp3 sewer sound comes out in full force.

HalSF's picture

Emotional deadness is not just an important marker in audio fidelity testing, it's a key indicator in a lot of the harsh, contemptuous, absolutist audio skepticism you read in comment threads like these. Science with a heart, like Tyll's post, is good; angry-sounding voices delivering withering and dismissive blasts of Perfect Lossy Sound Forever, not so much.

Three Toes of Fury's picture

Ive stuck with mp3 for the convenience: i want a compressed format that will play on any portable player.

I bumped up to 320kbps a few years ago as i figured that once i committed to the format, i should get the most out of it. So far im very pleased with that decision.

That being said, i think its great that there are so many options and formats avail for the audio community. I wouldnt hold it against anyone who either gets more out of different file types (lossless vs lossy) or compression rates within. At the end of the day, folks coming to this site love music and seek out ways to enjoy it on a different level (headphones, dacs, amps, and sound quality).

Peace .n. Living in Stereo

3ToF

PS: I found this on an online forum, i cant vouch for it, or its findings, but i like the approach.

http://www.music.mcgill.ca/~hockman/documents/Pras_presentation2009.pdf

stalepie's picture

That's an odd choice.

Tyll Hertsens's picture
In the paper it describes that a number of other studies have used the same set of compression levels. Researchers tend to set standards for their tests based on past work so as to build bodies of research that can be cross referenced. It also said that those bitrates are used because other tests showed: most people can easily hear the difference at 32Kbps; most people have a hard time hearing differences with 112Kbps; and 56Kbps split the difference.
clohmann's picture

Well, IMHO, the best records & resolutions played back through my cans (HD800) make the sound truly 3 dimensional. Space, depth, and detail! As the resolution decreases, the images start to flatten out. Like a sculptural relief. As the lossy compression increases, so does the change to a 2 dimensional sound. Eventually, everything seems like layers of cardboard cutouts...

zobel's picture

I doubt if women need to be concerned about this, but guys, what do you use to clean your headphones after eargasms? Don't IEMs cause too much ear canal blockage to be worn safely during eargasms, or do they generally just blow out of the ears before the back pressure can damage the ear drums?

jagwap's picture

It is part of the encoder, and can mess up the timing.

If they had used music where more people played together the difference would be clearer, even at higher bitrates.

skris88's picture

Phew, I can "come out" now. A short while back I took all my WAV and FLAC files and converted them to 320Kbps CBR MP3s. But 112Kbps? That's REALLY pushing it. And 16/44? I think that too is more than enough. What IS interesting is that 16/44 MP3s sound better through 24 bit DACs than 16 bit DACs. I suspect it's the loudness wars with 0dBFS being constantly breached. Is that even possible? Well, ReplayGain reports this constantly on MP3 files I obtain, go figure. Hopefully with a 24 bit DAC the same file is some 48dB below 0dBFS, leaving headroom for the peaks to sing rather than be dropped through the DAC. Just my hunch; Tyll should do a more detailed analysis for us.

jagwap's picture

Reading over 0dBFS is because of the heavy clipping of the loudness wars. 0dBFS is rms for a clean sine wave where the peaks touch the limits. If the signal hits the limiter hard and clips it can measure more than 0dBFS.

24bit DACs sounding better than 16bit DACs should not be a surprise, especially if the volume control is not analogue. The digital volume control will reduce the number of bits as it is turned down. Every 6dB down loses 1 bit. 24 bits have some 48dB before 16 bit is damaged by the volume control, if it is a true 24 bit capable converter. Unfortunately that hasn't been invented yet (144dB dynamic range), so 21.5 bit is as good as it gets today. Don't worry, dither will cover that.
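The arithmetic behind those figures can be checked with a short Python sketch. It assumes the idealized rule that each bit doubles the number of amplitude steps, which is worth 20·log10(2), or about 6.02 dB; the function names are hypothetical and real converters fall short of these ideal numbers.

```python
import math

# Each extra bit doubles the number of amplitude steps: about 6.02 dB.
DB_PER_BIT = 20 * math.log10(2)


def dynamic_range_db(bits):
    """Ideal ratio of full scale to one quantization step, in dB."""
    return bits * DB_PER_BIT


def bits_lost(attenuation_db):
    """Bits of resolution given up by a digital volume cut of this many dB."""
    return attenuation_db / DB_PER_BIT


for bits in (16, 21.5, 24):
    print(f"{bits} bits: {dynamic_range_db(bits):.1f} dB")

# Margin a 24-bit path has over 16-bit audio before attenuation eats into it.
print(f"headroom: {dynamic_range_db(24) - dynamic_range_db(16):.1f} dB")
```

Under this idealization a 48 dB digital volume cut costs about 8 bits, which is the gap between a 24-bit and a 16-bit word.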

Don't use MP3. It's technically nasty. AAC is far better. But if you believe MQA, 192kHz 24 bit can be improved on...

skris88's picture

Hi "jagwap". I tried AAC. Too much bother, not enough of an improvement over 320Kbps CBR MP3s. I couldn't hear any improvement, did you? Good if you did. By the way I couldn't hear any difference between 320Kbps CBR MP3s and 800+Kbps FLAC files. Plus with storage space being so cheap and plentiful these days why would anyone even bother with lossy-compressed audio formats? The only reason for choosing MP3s was that it worked on my iOS devices whereas FLAC needed jumping over hurdles.

At 62 I'm living only to listen to and enjoy music, not battle constantly changing technology trends.

What's interesting is that I'm going deaf but I'm a happier man than those younger than me. 10kHz peaks in headphone frequency response curves don't bother me as much as they do Tyll. He doesn't think much of the Sennheiser HD-700 or Superlux HD668Bs, but their "shrill treble" is not something that bothers me in the slightest bit as I can't hear it. Gee, being old is good! :-)

In 1990 I threw away over 5,000 vinyl LPs and I still don't regret that - apart from my bank balance knowing now what they're worth.... actually, no: I've moved houses so many times since then it would have cost me a fortune more than they'd be worth if I had dragged those heavy things around with me.

Nope, my 320Kbps MP3s are fine. Pity more iRadio stations don't offer that tho. There's HD, then there's crap low-res MP3 Internet radio stations. It's no wonder MP3s have such a bad reputation in the audiophile community.

You owe it to yourself to convert a bunch of your 24/192 FLAC files to 16/44 320Kbps CBR MP3s and do a blind AB test. Remember statistically you need to be able to pick the better FLAC files out 2 times out of 3 or higher. Any less and it's time to agree you've been snake-oiled by all this talk of More being better. It's not. I bet 90% of us "audiophiles" will fail when put to the test re "hi-res" FLAC vs 16/44 320Kbps CBR MP3s.
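How convincing a given score is depends on the number of trials as well as the fraction correct. The Python sketch below computes the chance that pure guessing would do at least as well, assuming each trial is an independent 50/50 guess; the function name `abx_p_value` is hypothetical and the trial counts are only examples.

```python
from math import comb


def abx_p_value(correct, trials):
    """Chance of getting at least `correct` right out of `trials` by guessing."""
    favorable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favorable / 2 ** trials


# Roughly the same hit rate looks very different at different trial counts.
for correct, trials in [(2, 3), (10, 15), (20, 30)]:
    print(f"{correct}/{trials}: p = {abx_p_value(correct, trials):.3f}")
```

The smaller the result, the less likely the score came from guessing, so a short run of 2 out of 3 says far less than the same ratio held over many trials.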

Bigmaluk's picture

I have to say skris88 has hit on something that no review can address: hearing loss, whether natural or otherwise. It's never touched on in any review, whether expert or user reviews. How many people have had an audiogram to see if their hearing is "perfect"?

I know mine is less than so due to age and my time as a firefighter. I have been reviewing earphones lately and, surprisingly, found that so far the best sounding for my ears has been a cheap pair of earphones. I'm left flabbergasted: all bar a pair of Bose were not anywhere near as clear and transparent to me.

So where do I go from here? Do I get said cheapies or, as I'm doing, try to listen to more types of phones?
In a nutshell, is it my ears making them sound better, or is it my music tastes and sound preferences making the cheap headphones sound good?

derbigpr500's picture

Just my 2 cents. A while back, probably about a year, I decided to rip a whole bunch of my FLAC's into 320 mp3's of highest quality, so I could transfer them onto my iPhone and make better use of the limited 64 GB of space on it. We're talking about 24/96, 24/192 files that are really well recorded, some of the best in the world, including binaural recordings. I compressed them all to 320 mp3 through foobar2000. ABX tested them with my Beyerdynamic T1's and couldn't tell a difference between the mp3 and the original, no matter how hard I tried. Well, to keep the long story short, I made a new folder on my hard drive for the mp3 ripped stuff as well as a new playlist in foobar2000 for the mp3's. Because of how fussy my foobar2000 interface is, somehow I deleted the original files from the playlist and kept only the mp3's. Suffice to say, I kept listening to that playlist happily for the next couple of months with my main listening setup, being impressed time and time again how amazing it sounds. Little did I know I was listening to mp3's all along instead of original 96 or 192 kHz files. That was proof enough to me that properly ripped mp3's indeed do sound identical to flacs. Now, is it worth ripping a flac song that's 20 MB in size just to drop it down to 7-8 MB as an mp3? To me it was, because it meant I could store 2-3 times more songs on my phone than before. On a desktop setup I wouldn't bother, storage is so cheap nowadays that I don't even care if my DSD files are 200-300 MB per song, so I don't see size as an issue anymore. Following that, mp3 is a useless format nowadays. Does it sound identical to original? In all my experience, yes. Is it worth the time and effort of ripping originals down to mp3 just to save a fraction of your hard drive space? No.

skris88's picture

I agree absolutely!

A lot of it is the "more must be better" syndrome. If you're uneducated about something it is probably the safest option I guess!

I know one person who insists 128Kbps MP3s are fine yet refuses to accept Bluetooth since it is only "near" CD quality!

And others who insist only 32/384 digital audio is the only way to go even though Alan Shaw of Harbeth (makers of some of the finest 'standard' loudspeakers available) says 16/44 audio exceeds human hearing requirements.

Itzme's picture

"Compression" & "MP3" are concepts that I have gone out of my way to have nothing to do with for at least 40 years now

About 20yrs ago I decided to try "portable audio" out. It didn't engage me enough to care to listen to my music that way. There is no doubt in my mind that I'm simply not interested in listening to music where any "nuances" of it are "aurally reduced" to fit a format. Listening to Music has never been a "Background" activity for myself. I would diminish any SQ of what I'm listening to... "Why?".

Along with this goes the attitude that I'm not as bothered by talk about Audio Equipment that is way above what I can realistically spend for it. I'm more interested in what "Flagship" equipment is capable of. Like everything else, "trickle down" parameters exist in Technology.

Wookie's picture

While compressions used in this study were rather drastic for regular listening (but that is understandable - they wanted that to make the study show some actual workable results), what Tyll said is very true - even in 32kbps <-> 112kbps with mp3 codec you are seeing the beginning of the curve showing diminishing returns.

Personally i use FLAC 16/44 but in all honesty i am not entirely sure if i could hear the difference between that and say mp3 320kbps. Would need to setup a proper blind test for this with several tracks i am familiar with. But since space is not an issue for me and 16/44 FLAC is only about triple bitrate (if that) of 320kbps mp3, im just using it.

I do however seriously doubt if anyone outside of select few can actually hear the difference between FLAC 16/44 and higher. I am absolutely convinced i would not be able to with the hardware i currently have, and even with top of the line stuff i am pretty sure i still would not hear it.
Would love to see some blind tests performed on people dealing with music professionally (with trained ears for details) for this specifically - if they can distinguish lets say mp3 320kbps from FLAC 16/44 and FLAC 24/(whatever frequency there is).

dbmcclain's picture

MP3 encodes by removing the frequency components its psychoacoustic model thinks you can't hear because of loudness masking. But even if you can't hear the primary sounds, some IMD artifacts will be below the loud tones making them audible...unless they have been removed and therefore cannot produce IMD products.

My own lab measurements have shown that at loud but comfortable listening levels we are literally swimming in our own IMD bath. But that's how we have always heard sound - until the advent of lossy compression schemes.

- DM