Z-ENG Music Track Identification (Shazam)
The aim of the project is to analyse traditional broadcast content (television and radio) that is publicly available in Hungary. The analysis covers several modalities, such as audio processing, video processing, and language processing.
In the case of audio processing, the first task is audio segmentation. This means that a longer (e.g. 24-hour) radio program must be segmented according to the type of content (conversation, advertisement, spot, music). Segmentation can be solved, for example, with classification algorithms: the recording is split into short windows, each window is assigned a content type, and consecutive windows of the same type are merged into a segment. A good starting point is the spectrogram of the audio content, which can be processed with 2D convolutional layers (Conv2D).
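The step from per-window classification to segments can be sketched as follows. This is a minimal illustration, not the project's method: it assumes that a classifier (not shown here) has already produced one label per fixed-length window, and the function name `labels_to_segments` and the 1-second window length are hypothetical choices made for the example.

```python
from itertools import groupby


def labels_to_segments(labels, window_s=1.0):
    """Merge runs of identical window labels into (start_s, end_s, label) tuples.

    labels   -- one predicted content type per window, in time order
    window_s -- assumed window length in seconds (hypothetical default)
    """
    segments = []
    position = 0  # index of the first window of the current run
    for label, run in groupby(labels):
        length = len(list(run))
        start = position * window_s
        end = (position + length) * window_s
        segments.append((start, end, label))
        position += length
    return segments


# Hypothetical classifier output for six consecutive 1-second windows.
predicted = ["music", "music", "music", "conversation", "conversation", "advertisement"]
segments = labels_to_segments(predicted)
```

In practice the raw labels would probably be smoothed first (for instance with a majority vote over neighbouring windows), so that a single misclassified window does not split a segment.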
The next task is to identify the musical content. This can be done, for example, with a Shazam-like algorithm. Here, too, we start with the spectrogram: after locating the intensity peaks, we form triplet-based fingerprints from them (for example, the frequencies of two peaks and the time difference between them). A track is identified by looking up the fingerprints of the query in a database of fingerprints of known tracks, so the task also involves information retrieval.
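The fingerprinting and lookup steps can be sketched as follows. This is a simplified illustration under stated assumptions, not a production implementation: it uses NumPy, keeps only the strongest frequency bin of each frame as a "peak" (real systems pick local maxima in the time-frequency plane), and all function names, the frame size, the hop size, and the fan-out value are hypothetical. The test signals are synthetic tone sequences, not real music.

```python
from collections import Counter, defaultdict

import numpy as np


def spectrogram(signal, frame=1024, hop=512):
    """Magnitude spectrogram of shape (n_frames, frame // 2 + 1)."""
    window = np.hanning(frame)
    n_frames = 1 + (len(signal) - frame) // hop
    frames = np.stack([signal[i * hop:i * hop + frame] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))


def find_peaks(spec):
    """Simplification: keep only the strongest frequency bin of each frame."""
    return [(t, int(np.argmax(row))) for t, row in enumerate(spec)]


def fingerprints(peaks, fan_out=5):
    """Pair each anchor peak with the next few peaks: ((f1, f2, dt), t1)."""
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1:i + 1 + fan_out]:
            yield (f1, f2, t2 - t1), t1


def build_index(tracks):
    """Map every triplet to the (track name, anchor frame) pairs where it occurs."""
    index = defaultdict(list)
    for name, signal in tracks.items():
        for key, t1 in fingerprints(find_peaks(spectrogram(signal))):
            index[key].append((name, t1))
    return index


def identify(index, query):
    """Vote on (track, frame offset); a true match accumulates on one offset."""
    votes = Counter()
    for key, t_query in fingerprints(find_peaks(spectrogram(query))):
        for name, t_track in index.get(key, ()):
            votes[(name, t_track - t_query)] += 1
    if not votes:
        return None
    (name, offset), count = votes.most_common(1)[0]
    return name, offset, count


def tone_track(freqs_hz, sr=8000, note_len=2000):
    """Synthetic test signal: a sequence of pure tones (not real music)."""
    t = np.arange(note_len) / sr
    return np.concatenate([np.sin(2 * np.pi * f * t) for f in freqs_hz])


# Two synthetic "tracks" and a query cut from the middle of track A.
tracks = {
    "A": tone_track([440, 660, 550, 880, 770, 990, 500, 700]),
    "B": tone_track([300, 900, 450, 1200, 600, 1000, 350, 800]),
}
index = build_index(tracks)
result = identify(index, tracks["A"][4096:12096])
```

The offset vote is the key design choice: a matching triplet alone is weak evidence, but many triplets agreeing on the same track and the same time offset are unlikely to occur by chance. `identify` returns `None` when no triplet of the query is found in the index.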
The student's task is to analyse the algorithms used in this field and, where possible, to develop them further.