Paper Review: "The Effects of MP3 Compression on Perceived Emotional Characteristics in Musical Instruments"
In a recent paper, Ronald Mo, Ga Lam Choi, Chung Lee, and Andrew Horner of the Department of Computer Science and Engineering, Hong Kong University of Science and Technology, published the results of a study on "The Effects of MP3 Compression on Perceived Emotional Characteristics in Musical Instruments." Whoa! Very cool. Let's see what they found.
(I'm very grateful this paper is available for free download under the AES Open Access Policy. I highly recommend you download a free copy here to follow along.)
Here's the abstract:
The topic of the current paper is to determine how these MP3 artifacts change the emotional characteristics of the sound. Even though many previous studies have considered the relationship between music emotion and timbre, the relationship between music emotion and MP3 compression is still unexplored. In light of this, one might wonder how much MP3 compression affects the emotional characteristics of musical instruments. In particular, do all emotional characteristics decrease about equally with more compression, or do some increase and others decrease? Are any emotional characteristics relatively unaffected by compression? Which instruments change the most or least with more compression?
Twenty undergraduate students with normal hearing, who were not trained in music or audio arts but were considered "attentive listeners," took part in the test. Subjects listened to sustained tones from eight instruments: bassoon, clarinet, flute, horn, oboe, saxophone, trumpet, and violin. These sustained tones have been used in other studies, making it easy for future researchers to compare results; most of the sounds came from the McGill University instrument sound collection.
Each track was compressed with the LAME MP3 encoder at three bit rates: 112Kbps, 56Kbps, and 32Kbps. The original uncompressed track and the compressed tracks were presented to listeners in randomized pairs, played back through a Sound Blaster X-Fi Xtreme Audio sound card and Sony MDR-7506 headphones.
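To make the size of the listening test concrete, here is a minimal Python sketch of one way such a randomized pair list could be built. It assumes, as my guess about the design rather than a detail confirmed above, that every pairing of the four versions of an instrument is presented in both orders. The names `VERSIONS`, `INSTRUMENTS`, and `build_trials` are made up for illustration.

```python
import itertools
import random

# Labels are illustrative; "original" is the uncompressed track.
VERSIONS = ["original", "112Kbps", "56Kbps", "32Kbps"]
INSTRUMENTS = ["bassoon", "clarinet", "flute", "horn",
               "oboe", "saxophone", "trumpet", "violin"]


def build_trials(seed=None):
    """Return a shuffled list of (instrument, first, second) trials.

    Assumption: each pair of versions is played in both orders (AB and BA).
    """
    rng = random.Random(seed)
    trials = [
        (instrument, first, second)
        for instrument in INSTRUMENTS
        # permutations(..., 2) yields every ordered pair of distinct versions
        for first, second in itertools.permutations(VERSIONS, 2)
    ]
    rng.shuffle(trials)
    return trials


if __name__ == "__main__":
    trials = build_trials(seed=1)
    print(len(trials))  # 8 instruments x 12 ordered pairs
```

Under that assumption, each listener would work through 96 pairs for every emotional category, which gives a sense of how much listening a study like this demands.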
Subjects were asked to judge the relative emotional impact by choosing between the two samples in each pair. Subjects compared the tracks in terms of ten emotional categories: Happy, Heroic, Romantic, Comic, Calm, Mysterious, Shy, Angry, Scary, and Sad. These emotional characteristics were chosen because they provide a broad spread across the Valence-Arousal plane, which plots emotions along the axes of Valence (unpleasant-pleasant) and Arousal (deactivated-aroused).
The study had numerous checks and balances to ensure results gathered were reliable. It states:
Subjects were fairly consistent in their responses. That is, subjects voted for the same tone in both comparisons (AB and BA) about 80% of the time. We measured the level of agreement among the subjects with an overall Fleiss’ Kappa statistics. It was calculated at 0.22, indicating a fair agreement among subjects.
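For readers who haven't met Fleiss' Kappa: it compares how often raters actually agree with how often they would agree by chance. The sketch below is my own minimal Python version of the standard formula, not the authors' code, and the small vote tables are invented for illustration. Each row is one item being judged, each column is a category, and each cell is the number of raters who picked that category.

```python
def fleiss_kappa(table):
    """Fleiss' kappa for a table of counts (rows: items, columns: categories).

    Every row must sum to the same number of raters (at least 2).
    """
    n_items = len(table)
    n_raters = sum(table[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in table):
        raise ValueError("each item needs the same number of raters (>= 2)")

    # Observed agreement: share of rater pairs that agree, averaged over items.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in table
    ) / n_items

    # Chance agreement from the overall share of votes in each category.
    total = n_items * n_raters
    p_e = sum((sum(col) / total) ** 2 for col in zip(*table))

    if p_e == 1:
        raise ValueError("kappa is undefined when all votes fall in one category")
    return (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical data: 3 pairs, 4 raters, votes for "first" vs "second" tone.
    print(round(fleiss_kappa([[4, 0], [0, 4], [4, 0]]), 2))  # unanimous votes
    print(round(fleiss_kappa([[3, 1], [1, 3], [2, 2]]), 2))  # mixed votes
```

On the commonly used Landis and Koch scale, values from 0.21 to 0.40 are labeled "fair," which is where the paper's 0.22 lands.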
The study generally found that emotions in the positive half of the valence spectrum (pleasant emotions) were negatively affected by increasing compression, and that emotions in the negative half of the valence spectrum (unpleasant emotions) increased with increasing compression.
They also found that some instruments were more susceptible to emotional change with varying amounts of compression. From most to least affected, those instruments were: trumpet, sax, oboe, bassoon, clarinet, flute, violin, horn. The paper posits that instruments with more spectral incoherence may be more strongly affected by quantization jitter in the encoding process.
Another interesting point brought out in the paper is that the emotion "angry" was little affected by compression, while the emotion "sad" had the strongest variation, even though the two sit at roughly the same negative valence level.
Your take-away point from this paper: Compression does measurably and reliably diminish pleasure in music listening.
While compression does diminish the pleasure of music listening, it's important to note that 112Kbps compression rates barely move the needle. (You'll have to look at the graphs in the paper to see this.) I would suspect by the time you get to 320Kbps these tests wouldn't show a difference at all. I'm not saying there isn't a difference in the emotional impact of 320Kbps compared to 16/44 and higher, I'm just saying it's more subtle, and it may take more experienced listeners and better gear than the Sony MDR-7506 used in this study to detect reliably or with much significance.
In other words, while the paper shows heavy compression sucks, it also shows that 112Kbps is pretty good. It's my feeling that by the time you get to 320Kbps MP3 encoding, audio problems that affect listening pleasure will more likely be in the headphones' imperfect performance rather than the bitrate of the source.
I think this study does validate the idea that better audio fidelity results in improved listening pleasure, and I would assume that applies to improved audio fidelity in all aspects of audio reproduction. But I would caution the reader to understand the significant diminishing returns curve seen even at these severe bitrate reductions. I do sense that high resolution data rates above 16/44 are more pleasurable. But I also weigh in the fact that at those rates the pleasure I derive from listening is much more about the music itself than the fidelity of reproduction. I'm not going to obsess over the fact that my preferred music is only available at CD rates... I'm just going to enjoy.
Lastly, and maybe most importantly, this study clearly shows that monitoring one's own emotional reaction to what one hears is a valid method for evaluating audio performance. Objectivists will often pull out the old "indistinguishable under A/B/X testing" rubric to pooh-pooh the claims of people who discern the small differences made by cables and high-resolution audio. I would counter: Maybe you're in your head too much and should be listening more with your heart.
I would also caution subjectivists that pride of ownership or your emotional state at the time of listening may play a role just as emotionally powerful as, or more powerful than, the emotional gains of actual fidelity improvements.
"The first principle is that you must not fool yourself and you are the easiest person to fool." Richard P. Feynman