Data collection was slightly insane: I wanted massive amounts of MIDI files, but they had to be labeled by genre and had to have some modicum of quality (MIDI files can be hit or miss). I found a torrent of 130,000 MIDI files that formed the basis of my dataset. I also downloaded some MIDI file collections from Kaggle, Google Datasets, and various MIDI websites, but most of these had few songs, were of low quality, or both. I cleaned up the data I had by writing several batch (.bat) files to pull the songs out of thousands of folders. Since I ended up with a lot of quality classical, jazz, ragtime, and folk MIDI files, I created a folder for each genre and filled each one with the appropriate MIDI files.

I decided my DataFrame would consist of one row per song, with columns for the songs' features. I quickly ran into a problem, though: how was I going to parse these songs and pull out the useful information, like key signature, tempo, song length, and number of instruments? Well, it turns out there's a miraculous Python library for MIDI called music21.

I was pretty surprised by this result, because it was the first score I got, and it was on a dataset with broken engineered features. I think that, given more time, I'd be able to increase the accuracy to over 96%. Alternatively, I could expand the genres to include things like rock, blues, and hip-hop.