Hum to search!
Ever found yourself trying to find a song you've heard before, when all that comes to mind is a vague melody and nothing else that could help identify it? You might have asked around and tried humming it to your friends, but even they, fellow human beings, couldn't figure it out, and not necessarily because you lack a musical ear. It is actually quite a difficult task.
That's where AI can come to the rescue. Long before good public datasets for this task appeared online, researchers at Google had quietly got it working, and their novel tool, based on deep neural networks, was made publicly available and integrated on every Android device in 2018. [1]
Public datasets
The state-of-the-art public dataset for such a task was made available a few weeks ago, and it's a very big jump from what we had previously. [2]
The HumTrans dataset [3] consists of 500 musical compositions of different genres and languages, with each composition divided into multiple segments. In total, the dataset comprises 1000 music segments. To collect the humming recordings, the authors employed 10 college students, who were given well-known musical segments to hum. The dataset encompasses approximately 56.22 hours of audio, making it the largest known humming dataset to date.
The dataset was released on Hugging Face [4], along with a GitHub repository containing baseline results and evaluation codes. [5]
How does it work?
When a melody is hummed, machine learning models transform the audio into a number-based sequence representing the song’s melody. The models are trained to identify songs based on a variety of sources, including humans singing, whistling or humming, as well as studio recordings. The algorithms also take away all the other details, like accompanying instruments and the voice's timbre and tone.
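To make the idea of a "number-based sequence" concrete, here is a minimal hand-written sketch. It is not how Google's models work (those are learned neural networks); it only assumes that a pitch tracker has already produced one frequency per note, and the helper names `hz_to_midi` and `melody_to_intervals` are invented for this illustration. It shows one simple way to keep the melody while throwing away the key it was hummed in.

```python
import math


def hz_to_midi(freq_hz: float) -> int:
    """Round a frequency in Hz to the nearest MIDI note number (A4 = 440 Hz = 69)."""
    return round(69 + 12 * math.log2(freq_hz / 440.0))


def melody_to_intervals(freqs_hz):
    """Turn a list of note frequencies into semitone steps between neighbours.

    Using steps instead of absolute notes makes the result independent of
    the key: humming the same tune higher or lower gives the same sequence.
    """
    notes = [hz_to_midi(f) for f in freqs_hz]
    return [b - a for a, b in zip(notes, notes[1:])]


# Opening of "Twinkle, Twinkle, Little Star" starting on C4 ...
in_c = [261.63, 261.63, 392.00, 392.00, 440.00, 440.00, 392.00]
# ... and the same opening hummed a whole tone higher, starting on D4.
in_d = [293.66, 293.66, 440.00, 440.00, 493.88, 493.88, 440.00]

print(melody_to_intervals(in_c))
print(melody_to_intervals(in_d))
```

Both calls print the same list of steps, which is the point: the sequence describes the shape of the tune, not who hummed it or how high.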
The sequences are then compared with thousands of songs from around the web to identify potential matches in real time.
Basically, it is a classification task, but with as many classes as the number of available songs online, and those are... well, a lot. Each class contains different versions of the same song, be it a live recording, a studio recording, a cover, a remix, and so on.
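With that many classes, the matching step is easier to picture as a search than as a classic classifier: find the catalogue entries closest to the query. The toy sketch below is an assumption-heavy stand-in, not the production approach. It represents each song as a short list of numbers (here, semitone steps between consecutive notes), uses plain edit distance as the similarity measure, and the catalogue with its `song_a`, `song_b`, `song_c` entries is entirely made up.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # drop an element of a
                            curr[j - 1] + 1,      # insert an element of b
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def rank_matches(query, catalogue):
    """Return catalogue titles ordered from closest to farthest from the query."""
    return sorted(catalogue, key=lambda title: edit_distance(query, catalogue[title]))


# Hypothetical catalogue: title -> melody as semitone steps.
catalogue = {
    "song_a": [0, 7, 0, 2, 0, -2],
    "song_b": [2, 2, -4, 2, 2, -4],
    "song_c": [-2, -2, 2, 2, 0, 0],
}

# A hummed query with one slightly wrong step (1 instead of 2).
hummed = [0, 7, 0, 1, 0, -2]

ranking = rank_matches(hummed, catalogue)
print(ranking[0])
```

The imperfect query still lands on `song_a`, because one wrong step costs only one edit. A real system has to do the same thing against a vastly larger catalogue, which is why learned representations and fast approximate search matter there.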
What's next?
It won't be long before YouTube follows with a similar feature, most likely performing far better than anything we've seen before. Tests are currently ongoing, and the feature is already available to a few select users. [6]
Until then, we're left to experiment with what we've already got, and to marvel at how accurate the existing solutions already are, such as the one Google ships on all Android phones, including yours. If you've never tried it, the technology has been hidden right under your nose, waiting to be used, tested, and improved, even if all we intend to do is play around with it.