![million song api](https://svbtleusercontent.com/8emai4sultogew_retina.png)
Following all the papers I read and after researching acoustic analysis a little, it is quite obvious that the industry uses Mel-frequency cepstral coefficients (MFCC) as the feature vector for a sound sample; I used the librosa implementation. Having a big dataset isn't enough: unlike image tasks, I cannot work directly on the raw sound samples. A quick calculation: 30 seconds × 22,050 samples/second = a vector of length 661,500, which would be a heavy load for a conventional machine learning method. Computing MFCCs begins like this:

1. Take the Fourier transform of (a windowed excerpt of) the signal.
2. Map the powers of the resulting spectrum onto the mel scale.
7Digital is a SaaS provider for music applications; it basically lets you stream music for money. I signed up to 7Digital as a developer, and after their approval I could access their API. Any song stream still costs money, but I found out that they let a user preview a random 30 seconds of a song before paying for it. This is more than enough for my deep learning task, so I wrote “previewDownloader.py”, which downloads a 30-second preview for every song in the MSD dataset. Unfortunately I had only my laptop for this mission, so I had to settle for only 1% of the dataset (around 2.8 GB).
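A minimal sketch of what such a downloader might look like (the original previewDownloader.py is not shown in the post, and the preview endpoint URL here is an assumption, not a documented 7Digital API guarantee):

```python
import os
from urllib.request import urlopen

# Assumed preview endpoint -- the real script's URL scheme may differ.
PREVIEW_URL = "http://previews.7digital.com/clip/{track_id}"


def preview_url(track_id):
    """Build the 30-second preview URL for a 7digital track id."""
    return PREVIEW_URL.format(track_id=track_id)


def download_preview(track_id, out_dir="previews"):
    """Fetch one preview clip and save it as <track_id>.mp3."""
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, "{}.mp3".format(track_id))
    with urlopen(preview_url(track_id), timeout=30) as resp, open(path, "wb") as out:
        out.write(resp.read())
    return path
```

Looping `download_preview` over the 7digital ids stored in the MSD metadata would reproduce the 1%-of-the-dataset collection described above.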
Spotify recruited a deep learning intern who, based on the above work, implemented a music recommendation engine. His simple yet very efficient network made me think that Tao's RBM was not the best approach, and therefore my implementation uses a CNN instead, as in the Spotify blog. One very important note is that Tao's work published results only for 2-, 3-, and 4-class classification. He obviously got really good results for 2-class classification, but the more classes he tried to classify, the poorer his results became. My work tackles the full 10-class challenge, a much more difficult task.

A sub-task of this project was to learn a new SDK for deep learning; I had been waiting for an opportunity to learn Google's new TensorFlow. This project is implemented in Python, and the machine learning part uses TensorFlow.

Getting the dataset might be the most time-consuming part of this work. Working with music is a big pain: every file is usually a couple of MBs, and recordings come in a variety of qualities and parameters (number of frequencies, bits per second, etc.). But the biggest pain is copyright; there is no legitimate dataset of famous songs, as they would cost money. Tao's paper is based on a dataset called GTZAN. This dataset is quite small (100 songs per genre × 10 genres = 1,000 songs overall), and its copyright permissions are questionable. From my perspective, this is one of the reasons that held him back from getting better results.

So I looked for a way to get more data to learn from. Eventually I found the Million Song Dataset (MSD). It is a freely-available collection of audio features and metadata for a million contemporary popular music tracks. There is a project on top of MSD called tagtraum which classifies MSD songs into genres. The problem now was to get the sound itself; here is where I got a little creative. I found that one of the tags every song has in the dataset is an id from a provider called 7Digital.
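The CNN-over-TensorFlow choice described above can be sketched as follows (an illustrative 10-genre architecture over MFCC "images", written against today's tf.keras API; the layer sizes are assumptions, not the project's actual network):

```python
import tensorflow as tf

# Illustrative 10-genre CNN -- NOT the network from the post; the layer
# sizes and pooling choices here are assumptions for demonstration.
def build_model(n_mfcc=13, n_frames=1293, n_genres=10):
    inputs = tf.keras.Input(shape=(n_mfcc, n_frames, 1))  # MFCC matrix as a 1-channel image
    x = tf.keras.layers.Conv2D(32, (3, 3), padding="same", activation="relu")(inputs)
    x = tf.keras.layers.MaxPooling2D(pool_size=(1, 4))(x)  # pool along time only
    x = tf.keras.layers.Conv2D(64, (3, 3), padding="same", activation="relu")(x)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    outputs = tf.keras.layers.Dense(n_genres, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)

model = build_model()
print(model.output_shape)  # (None, 10): one probability per genre
```

Treating the MFCC matrix as an image is what makes genre classification "parallel" to image classification, which is the intuition behind swapping Tao's RBM for a CNN.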
I learned a lot from Tao's paper, but honestly, the results it presented were not impressive. So I had to look at other papers, related but not an exact match. A very influential one was Deep content-based music recommendation, a paper about content-based music recommendation using deep learning techniques. The way its authors got their dataset, and the preprocessing they applied to the sound, really enlightened my implementation. Also, this paper was mentioned lately on the Spotify blog.
Academic research in the field of Deep Learning (Deep Neural Networks) and Sound Processing, Tel Aviv University.

This paper discusses the task of classifying the music genre of a sound sample. When I decided to work in the field of sound processing, I thought that genre classification is a parallel problem to image classification. To my surprise, I did not find many deep learning works that tackled this exact problem. One paper that did tackle it is Tao Feng's paper from the University of Illinois.