Friday, October 27, 2023

Hum to search - music recognition

Hum to search!


Ever found yourself trying to find a song you've heard before, but all that comes to mind is a vague melody, without any other useful detail to identify it? You might have asked around and tried humming it to your friends, but even they, fellow human beings, couldn't figure it out, and not always because you lack a musical ear. It is actually quite a difficult task.

That's where AI comes to the rescue. Long before good datasets for this task appeared online, Google's scientists quietly got it working, and their novel deep-neural-network-based tool was made publicly available and integrated into every Android device in 2018. [1]


Public datasets


The state-of-the-art public dataset for this task was made available a few weeks ago, and it is a big jump from what we had previously. [2]

The HumTrans dataset [3] consists of 500 musical compositions of different genres and languages, with each composition divided into multiple segments; in total, the dataset comprises 1000 music segments. To collect the humming data, the authors employed 10 college students, who were given well-known musical segments to hum. The dataset encompasses approximately 56.22 hours of audio, making it the largest known humming dataset to date.

The dataset was released on Hugging Face [4], along with a GitHub repository containing baseline results and evaluation code. [5]
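For anyone who wants to poke at the data directly, it should be loadable through the Hugging Face datasets library. This is only a minimal sketch: the repository identifier below is a placeholder, so check the actual name on the Hugging Face page. [4]

```python
# Minimal sketch of loading HumTrans via the Hugging Face "datasets" library.
# "ORG/HumTrans" is a placeholder repo id; see [4] for the actual identifier.
from datasets import load_dataset

humtrans = load_dataset("ORG/HumTrans")  # hypothetical id, replace before running
print(humtrans)  # inspect the available splits and columns
```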


How does it work?



When a melody is hummed, machine learning models transform the audio into a number-based sequence representing the song's melody. The models are trained to identify songs from a variety of sources, including people singing, whistling, or humming, as well as studio recordings. The algorithms also strip away all the other details, such as accompanying instruments and the voice's timbre and tone.
The resulting sequences are then compared against thousands of songs from the web to identify potential matches in real time.

Basically, it is a classification task, but with as many classes as the number of available songs online, and those are... well, a lot. Each class contains different versions of the same song, be it a live recording, a studio recording, a cover, a remix, and so on.
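To make the idea concrete, here is a minimal sketch of such a pipeline: extract a pitch contour from the hummed audio, normalize it so the key doesn't matter, and rank reference songs by alignment cost. This is not Google's actual method (which relies on learned embeddings), just an illustration assuming the librosa library; the reference database song_db and the file names are hypothetical.

```python
# A toy query-by-humming matcher: pitch contour + dynamic time warping.
import numpy as np
import librosa

def melody_contour(path, sr=16000):
    """Turn an audio file into a key-invariant pitch contour (semitones)."""
    y, _ = librosa.load(path, sr=sr)
    f0, voiced, _ = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"), sr=sr
    )
    semitones = librosa.hz_to_midi(f0[voiced])   # keep only voiced frames
    return semitones - np.nanmean(semitones)     # remove the key (transposition)

def match(query, song_db):
    """Rank songs by length-normalized DTW distance to the hummed query."""
    scores = {}
    for title, ref in song_db.items():
        D, _ = librosa.sequence.dtw(query[np.newaxis, :], ref[np.newaxis, :])
        scores[title] = D[-1, -1] / (len(query) + len(ref))
    return sorted(scores.items(), key=lambda kv: kv[1])

# song_db = {"Song A": melody_contour("song_a.mp3"), ...}  # hypothetical references
# print(match(melody_contour("hum.wav"), song_db)[:3])     # top-3 candidates
```

Normalizing the contour and using an alignment cost instead of exact comparison is what lets the same melody match whether it is hummed slowly, quickly, or in a different key.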


What's next?


It won't be long before YouTube follows with a similar feature, most likely performing far better than anything we've seen before. Tests are currently ongoing, and the feature is already available to a select few users. [6]

Until then, we're left to experiment with what we already have and marvel at how accurate the existing solutions are, for example the one Google ships on all Android phones, including yours. If you've never tried it, the technology has been hiding right under your nose, waiting to be used, tested, and made better, even if we only intend to play around with it.

Thursday, October 26, 2023

Spaceship Titanic Kaggle Competition

Solving the Spaceship Titanic Kaggle Competition


First steps

As a first step, I looked over some articles about data science and how to process the data, and I learned some new concepts about models and how to train them. I also studied various data preprocessing techniques, explored feature engineering strategies, and familiarized myself with the different types of machine learning algorithms.

Understanding the Dataset

The dataset, comprising personal records of passengers, is split into two parts: the training data (found in 'train.csv') consisting of approximately two-thirds (around 8700) of the passengers and the test data (in 'test.csv') with the remaining one-third (approximately 4300) of the passengers. Each entry in the dataset is associated with a unique PassengerId, indicating the group the passenger is traveling with and their position within the group. Key attributes, such as HomePlanet, CryoSleep, Cabin, Destination, Age, VIP status, and various onboard amenities' billing details, provide essential insights into each passenger's profile.

Dataset Attributes

PassengerId: Unique identifier for each passenger, denoting the passenger's group and position within the group.
HomePlanet: Denotes the planet of permanent residence for each passenger, reflecting their point of departure.
CryoSleep: Indicates whether the passenger chose to undergo suspended animation during the voyage, confining them to their cabins.
Cabin: Specifies the cabin number where the passenger is lodged, delineated by deck/number/side, where the side can be 'P' for Port or 'S' for Starboard.
Destination: Specifies the planet where the passenger is scheduled to disembark.
Age: Reflects the age of each passenger.
VIP: Indicates whether the passenger paid for special VIP service during the voyage.
RoomService, FoodCourt, ShoppingMall, Spa, VRDeck: Reflect the respective billing amounts for each passenger at various luxury amenities on the Spaceship Titanic.
Name: Records the first and last names of each passenger.
Transported: Represents the critical target attribute, denoting whether the passenger was transported to another dimension during the collision.

Data Preprocessing and Feature Engineering


The program begins by reading the data from the 'train.csv' file using the Pandas library. It then analyzes the data types, identifying categorical, numerical, and boolean attributes. It also checks the dataset for null values and handles them by imputing the missing values with medians. Additionally, the program converts categorical attributes to numerical ones. For instance, the 'Cabin' column is used to create the 'Deck' and 'Port' features: both are mapped to numeric values based on specific mappings, and the 'Cabin' column is then removed from the dataset. Similar transformations are applied to the other categorical attributes, such as 'HomePlanet', 'Destination', 'VIP', and 'CryoSleep', so that they can be used by the selected models.
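In code, the preprocessing looks roughly like the sketch below. The numeric values in the mappings are illustrative choices, not necessarily the exact ones I used; the overall flow (split 'Cabin', map categories to numbers, impute medians) matches the steps described above.

```python
# Rough sketch of the preprocessing pipeline; mapping values are illustrative.
import pandas as pd

df = pd.read_csv("train.csv")

# Split 'Cabin' (deck/number/side) into 'Deck' and 'Port' features
cabin = df["Cabin"].str.split("/", expand=True)
df["Deck"] = cabin[0].map({"A": 0, "B": 1, "C": 2, "D": 3, "E": 4, "F": 5, "G": 6, "T": 7})
df["Port"] = cabin[2].map({"P": 0, "S": 1})
df = df.drop(columns=["Cabin"])

# Map the remaining categorical/boolean attributes to numbers
df["HomePlanet"] = df["HomePlanet"].map({"Earth": 0, "Europa": 1, "Mars": 2})
df["Destination"] = df["Destination"].map(
    {"TRAPPIST-1e": 0, "55 Cancri e": 1, "PSO J318.5-22": 2}
)
df["VIP"] = df["VIP"].astype(float)
df["CryoSleep"] = df["CryoSleep"].astype(float)

# Impute missing values in the numeric columns with the column median
num_cols = df.select_dtypes("number").columns
df[num_cols] = df[num_cols].fillna(df[num_cols].median())
```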

Model Training and Evaluation


The processed data is then split into training and testing sets using the 'train_test_split' function from the 'sklearn.model_selection' module. Subsequently, I implemented and evaluated various classification models, including Logistic Regression, Random Forest Classifier, Gradient Boosting Classifier, Support Vector Machine (SVM), Gaussian Naive Bayes, K-Nearest Neighbors, and a Multi-layer Perceptron Neural Network. The accuracy of each model is computed using the 'score' method. This was the last step I managed to complete in these 3 weeks. After that, I tried using GridSearchCV to tune the parameters of the Random Forest Classifier. In the end, I reached a prediction score of 78.78%; however, there is still room for improvement.
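A condensed sketch of this step is below. It assumes the preprocessed frame df from the previous snippet, and the hyperparameter grid passed to GridSearchCV is illustrative rather than the exact one I used.

```python
# Sketch of the model comparison and the Random Forest grid search.
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

X = df.drop(columns=["Transported", "PassengerId", "Name"])
y = df["Transported"].astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "RandomForest": RandomForestClassifier(),
    "GradientBoosting": GradientBoostingClassifier(),
    "SVM": SVC(),
    "GaussianNB": GaussianNB(),
    "KNN": KNeighborsClassifier(),
    "MLP": MLPClassifier(max_iter=500),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, model.score(X_test, y_test))  # accuracy on the held-out split

# Tune the Random Forest with a small (illustrative) grid search
grid = GridSearchCV(
    RandomForestClassifier(),
    {"n_estimators": [100, 300], "max_depth": [None, 10, 20]},
    cv=5,
)
grid.fit(X_train, y_train)
print(grid.best_params_, grid.best_score_)
```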

Future Work 

In the future, I would like to try implementing a multi-layer neural network to see how it would perform.


Materials that helped me get here

Courses from this class (Machine Learning)


