I have been thinking about how one would analyze audio with a computer program, especially after hearing about ways to analyze images. I've thought of a couple of problems, and I figured I'd ask about what I think is the easier of the two.

The question is: Given a sound wave (say, as a .wav file), determine whether or not it is dissonant.

Have there been any attempts to solve this problem? Did it turn out to be easy, or is it actually hard? Are there any programs out there that do this, or any research papers that investigate the problem? What techniques could be helpful here? (I have a feeling that a Fast Fourier Transform would come in handy.)
Anyone?

(This post is here in order to get this topic to appear in the "topics I've posted in" list.)
A Fourier transform (FT) seems to be the best approach.

Something like this might work:
1. Throw FT at the problem.
2. Find peaks.
3. Throw away octave information (divide by two until in base octave).
4. Analyze frequency ratios by comparing to known list of consonant ratios.
5. If you discover anything else, label it as dissonant.

Alternative to 4: Apply machine learning concepts to build a classifier.
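The steps above can be sketched in Python with NumPy. The consonant-ratio list and the 2% tolerance are illustrative assumptions, not established values, and a real recording would need windowing and more robust peak-picking than "take the two strongest bins":

```python
import numpy as np

# Just-intonation ratios treated as consonant -- an illustrative
# assumption, not a definitive list.
CONSONANT_RATIOS = [1/1, 2/1, 3/2, 4/3, 5/4, 6/5, 5/3, 8/5]

def fold_to_base_octave(ratio):
    """Step 3: discard octave information by halving/doubling into [1, 2)."""
    while ratio >= 2.0:
        ratio /= 2.0
    while ratio < 1.0:
        ratio *= 2.0
    return ratio

def dominant_freqs(samples, rate, n_peaks=2):
    """Steps 1-2: FFT the signal and return the n_peaks strongest frequencies."""
    spectrum = np.abs(np.fft.rfft(samples))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / rate)
    strongest = np.argsort(spectrum)[::-1][:n_peaks]
    return sorted(freqs[strongest])

def is_consonant(samples, rate, tolerance=0.02):
    """Steps 4-5: compare the interval between the two strongest partials
    against the known ratios; anything else counts as dissonant."""
    f1, f2 = dominant_freqs(samples, rate)
    ratio = fold_to_base_octave(f2 / f1)
    return any(abs(ratio - r) / r < tolerance for r in CONSONANT_RATIOS)
```

On synthetic input this behaves as expected: two sines a just fifth apart (440 Hz and 660 Hz) come out consonant, while 440 Hz against 622 Hz (roughly a tritone) does not.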
mk47at: 4. Analyze frequency ratios by comparing to known list of consonant ratios.
Of course, the question is, where could I find such a list? Also, how should I handle cases where the ratio is very close to a consonant ratio?
mk47at: Alternative to 4: Apply machine learning concepts to build a classifier.
Of course, that would mean finding people to listen to such sounds, and it's possible that different people's hearing might not respond to ratios the same way.
Post edited December 19, 2017 by dtgreene
Note: Talking about intervals in English is not very easy for me, because I have to look up every name. Obviously I'm only familiar with the German names.

The first question is: how much dissonance do you want? There is no perfect answer. Do you only consider unison, octave, fourth and fifth? Do you allow thirds? Sixths?

The second question is: how do you handle equal-tempered tuning (wohltemperierte Stimmung)? I'm not sure about your knowledge of music theory; if you don't know what I mean, think about a piano/guitar and the enharmonische Verwechslung (whatever it is called in English... aha, it's enharmonic equivalent), i.e. C sharp = D flat. This makes everything a lot harder.


https://en.wikipedia.org/wiki/Interval_(music)#Frequency_ratios

https://en.wikipedia.org/wiki/Interval_ratio

https://en.wikipedia.org/wiki/Consonance_and_dissonance
dtgreene: Of course, the question is, where could I find such a list? Also, how should I handle cases where the ratio is very close to a consonant ratio?
This is a good point; you'd need to build in a certain (adjustable?) margin of error if you're analysing anything other than digitally synthesised music. All "real" instruments (acoustic, electro-mechanical and of course analogue electronics) are out of tune by some amount, simply because of physics.
mk47at: The first question is: how much dissonance do you want? There is no perfect answer. Do you only consider unison, octave, fourth and fifth? Do you allow thirds? Sixths?
Also this: musical intervals are a spectrum rather than sitting in one of two columns. In equal-tempered tuning, technically only unisons and octaves are pure, as even the fifths and fourths are off (by about 2 cents). :)
Post edited December 19, 2017 by SirPrimalform
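Both points, the tuning margin and the equal-temperament discrepancy, are easiest to express in cents (1200 cents per octave). A small sketch; the 15-cent default margin is an arbitrary illustration, not a perceptual threshold:

```python
import math

def cents(ratio):
    """Size of a frequency ratio in cents (1200 cents per octave)."""
    return 1200 * math.log2(ratio)

def within_margin(measured, target, margin_cents=15):
    """True if a measured ratio lies within margin_cents of a target ratio."""
    return abs(cents(measured) - cents(target)) <= margin_cents

# The equal-tempered fifth (2**(7/12)) versus the just fifth (3/2):
deviation = cents(2 ** (7 / 12)) - cents(3 / 2)
print(round(deviation, 2))  # about -1.96: the tempered fifth is ~2 cents flat
```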
So… did you give it a try?
mk47at: So… did you give it a try?
No, I have not.

Incidentally, there are some other problems related to audio processing that I've thought of that might be worth investigating, for example:

1. Convert a WAV file to a MIDI file, or alternatively, transcribe the WAV into conventional music notation. (These two problems are basically equivalent, as, to my understanding, MIDI is basically something like music notation for the computer.)

2. Given a WAV file, figure out what musical instrument it is a recording of.

3. Given a WAV file of an ensemble performance, extract one specific musical instrument from it and output a WAV of just that instrument's parts.

In any case, these do not appear to be easy problems. Has any research been done on this sort of thing?
There are WAV-to-MIDI converters out there, but as you might expect, the results vary. I'd expect them to work best for highly tonal solo instruments.

Figuring out what instrument a recording is of should be quite easy. Each instrument has a rather distinctive signature. Sounds like a perfect application for machine learning; just collect a corpus of classified recordings of all the instruments you want recognized.

I'm not sure about the third problem, but there are tools that try to extract (or delete) voice tracks from files. If you already have tools to detect an instrument (problem 2?) then maybe the same tool can be used to train a tool to extract that particular instrument. Again, intuition says that highly tonal instruments ought to be easier.
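As a starting point for such a classifier, one could feed it simple timbre features computed from the spectrum. The two below (spectral centroid and 85% roll-off) are classic choices, but this is only an illustration; real instrument classifiers use much richer feature sets or learn features directly from audio:

```python
import numpy as np

def spectral_features(samples, rate):
    """Two classic timbre features: spectral centroid (the spectrum's
    'center of mass') and the frequency below which 85% of the energy
    lies. Illustrative only, not a recommended feature set."""
    spectrum = np.abs(np.fft.rfft(samples))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / rate)
    centroid = np.sum(freqs * spectrum) / np.sum(spectrum)
    cumulative = np.cumsum(spectrum)
    rolloff = freqs[np.searchsorted(cumulative, 0.85 * cumulative[-1])]
    return centroid, rolloff
```

For a pure 1000 Hz sine, both features sit at 1000 Hz; brighter, shriller sounds push both upward, which is roughly what separates, say, a piccolo from a bassoon.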
Soundwave isn't dissonant. Everyone knows Soundwave is a Decepticon.
dtgreene: [...]
1. Convert a WAV file to a MIDI file, or alternatively, transcribe the WAV into conventional music notation. (These two problems are basically equivalent, as, to my understanding, MIDI is basically something like music notation for the computer.)

2. Given a WAV file, figure out what musical instrument it is a recording of.

3. Given a WAV file of an ensemble performance, extract one specific musical instrument from it and output a WAV of just that instrument's parts.

[...]
(1) is basically automated music transcription. There seems to be some progress in that area.
First couple of things I came across after a very quick search:
https://pdfs.semanticscholar.org/96c1/d283995fa03452835082d88b04e5eb87c1dc.pdf
https://arxiv.org/pdf/1508.01774.pdf
http://ismir2012.ismir.net/event/papers/379_ISMIR_2012.pdf
http://c4dm.eecs.qmul.ac.uk/ismir15-amt-tutorial/AMT_tutorial_ISMIR_2015.pdf

Also some github project, and some paid piece of software:
https://github.com/jsleep/wav2mid
https://www.lunaverus.com/

(2) After a quick search, I found one paper, which also seems to delve into (3), and a classifier on github.
http://cs229.stanford.edu/proj2013/Park-MusicalInstrumentExtractionThroughTimbreClassfication.pdf
https://github.com/bzamecnik/ml/tree/master/instrument-classification

Please note that I didn't actually read any of this (yet), so it might not be the best selection of things.
I also didn't test any of the github projects so YMMV. It should, however, show that there definitely is research into this stuff.
It seems like a pretty interesting area, so if I've got time I might see if I can find some good/state-of-the-art-ish resources.
Post edited January 24, 2018 by Aemenyn
clarry: There are WAV-to-MIDI converters out there, but as you might expect, the results vary. I'd expect them to work best for highly tonal solo instruments.

Figuring out what instrument a recording is of should be quite easy. Each instrument has a rather distinctive signature. Sounds like a perfect application for machine learning; just collect a corpus of classified recordings of all the instruments you want recognized.

I'm not sure about the third problem, but there are tools that try to extract (or delete) voice tracks from files. If you already have tools to detect an instrument (problem 2?) then maybe the same tool can be used to train a tool to extract that particular instrument. Again, intuition says that highly tonal instruments ought to be easier.
Of course, I just thought of one issue that might come up with the second and third problems: Some instruments sound very different at different parts of the range. One example is that, in the upper range, the oboe and clarinet sound pretty similar (to the point that that particular doubling works rather well); however, this is not true in the lower range.

One thing I've been wanting to do (and actually can do now that I have a microphone) is to take the clarinet's low E and use a computer to raise the pitch 3 octaves and see how it sounds; similarly, I could take the altissimo E and lower it 3 octaves. (I would do this by changing the speed, and hence the frequency of the sound wave, by a factor of 8). I am curious to see how that would sound, and I think I might actually do this in the next few days.

By the way, one example of a tricky case of the 3rd problem: Beethoven's 5th symphony starts with the strings playing in unison for the first 8 notes, but if you look at the score, the clarinets also have that part; however, it's in the quietest part of the range, making it rather difficult to hear. I am wondering how difficult it would be to extract that part from a recording of the first 8 notes.
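The speed-change experiment described above (a factor of 8 for three octaves) can be sketched with plain resampling. Note that, unlike a proper pitch shifter, this couples pitch and duration: the clip also becomes 2**octaves times shorter:

```python
import numpy as np

def shift_by_speed(samples, octaves):
    """Pitch-shift by resampling, i.e. by changing playback speed.
    octaves=3 gives the factor-of-8 change discussed above; the
    output is 2**octaves times shorter than the input."""
    factor = 2.0 ** octaves
    n_out = int(len(samples) / factor)
    positions = np.arange(n_out) * factor  # where to read in the original
    # Linear interpolation between original samples at those positions.
    return np.interp(positions, np.arange(len(samples)), samples)
```

Shifting a 100 Hz sine up one octave this way yields a 200 Hz sine at half the length, as expected.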
dtgreene: Of course, I just thought of one issue that might come up with the second and third problems: Some instruments sound very different at different parts of the range. One example is that, in the upper range, the oboe and clarinet sound pretty similar (to the point that that particular doubling works rather well); however, this is not true in the lower range.
That's true. But I think dealing with that is simply a matter of including enough samples across the range.

dtgreene: One thing I've been wanting to do (and actually can do now that I have a microphone) is to take the clarinet's low E and use a computer to raise the pitch 3 octaves and see how it sounds; similarly, I could take the altissimo E and lower it 3 octaves. (I would do this by changing the speed, and hence the frequency of the sound wave, by a factor of 8). I am curious to see how that would sound, and I think I might actually do this in the next few days.
If you do that, you should also try pitch shifting in the frequency domain and compare results. In my experience speedup sounds very artificial, nearly always. Shifting frequencies in frequency domain is more likely to give a natural sounding result, though that's not always the case.

Either way, real instruments are complicated and I think that e.g. the harmonics of a stringed instrument are usually less powerful at high pitches.
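A very crude illustration of shifting in the frequency domain, assuming a single FFT over the whole clip. A serious implementation would be a phase vocoder working on short overlapping frames with phase correction; a whole-clip FFT like this smears any transients, so treat it purely as a sketch of the idea:

```python
import numpy as np

def shift_in_freq_domain(samples, factor):
    """Crude frequency-domain pitch shift: move each FFT bin to
    round(bin * factor), keeping the duration unchanged (unlike a
    speed change). Not a phase vocoder; illustration only."""
    spectrum = np.fft.rfft(samples)
    shifted = np.zeros_like(spectrum)
    for i, value in enumerate(spectrum):
        j = int(round(i * factor))
        if j < len(shifted):
            shifted[j] += value
    return np.fft.irfft(shifted, n=len(samples))
```

Even this toy version shows the key difference from a speed change: a 100 Hz sine shifted by a factor of 2 comes back as a 200 Hz sine of the *same* length.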
clarry: Either way, real instruments are complicated and I think that e.g. the harmonics of a stringed instrument are usually less powerful at high pitches.
I suspect the reverse is true for upper woodwinds. For example, the high notes on the clarinet sound really shrill and piercing, and the same seems to be true of the piccolo and flute. (On the other hand, the bassoon doesn't have the same behavior.)

Might be interesting to study this a bit more.
dtgreene: (…) (These two problems are basically equivalent, as, to my understanding, MIDI is basically something like music notation for the computer.) (…)
Even this is not that easy. Music notation to MIDI is trivial, but the reverse is a quantization problem. MIDI consists (among other commands) of timed "Note On" and "Note Off" messages that have to be turned into notes. And there is "Pitch Bend Change" as well.
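That quantization step can be sketched as follows, assuming a simplified, hypothetical event format of (tick, 'on'/'off', pitch) and a fixed snap grid. Real MIDI files also carry tempo changes, channels, velocities, and overlapping notes of the same pitch, all of which this ignores:

```python
def events_to_notes(events, grid=120):
    """Pair timed Note On/Off events into (pitch, start, duration) tuples,
    snapping ticks to a grid (e.g. 120 ticks = a sixteenth note at 480
    ticks per quarter). Event format is a simplifying assumption."""
    def snap(tick):
        return round(tick / grid) * grid

    open_notes = {}  # pitch -> raw start tick of the sounding note
    notes = []
    for tick, kind, pitch in sorted(events):
        if kind == 'on':
            open_notes[pitch] = tick
        elif kind == 'off' and pitch in open_notes:
            start = open_notes.pop(pitch)
            # Snap both ends; never quantize a note down to zero length.
            duration = max(grid, snap(tick) - snap(start))
            notes.append((pitch, snap(start), duration))
    return notes
```

For example, a Note On at tick 5 with its Note Off at tick 250 snaps to a note starting at 0 with duration 240, swallowing the slight timing slop a human performance would contain.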