A Python implementation of a Shazam-like song recognition system based on spectrogram peak constellation maps and audio fingerprint hashing.
The system identifies a song from a short audio clip by matching time-aligned spectral fingerprints against a database.
This project demonstrates how audio fingerprinting works at a low level using:
- Digital Signal Processing (DSP)
- Spectrogram peak detection
- Hash-based matching with temporal alignment
Given:
- A database of songs
- A short query clip (e.g., 10 seconds)
The system returns:
- The most likely matching song
- A confidence score based on aligned hash matches
- Convert audio to mono and normalize
- Compute spectrogram (STFT)
- Detect prominent spectral peaks
- Create a constellation map
- Generate compact fingerprints (hashes)
- Match query fingerprints against database
- Vote using time-offset alignment
This approach is inspired by Shazam’s audio fingerprinting technique.
Contains all DSP and fingerprinting logic.
Key Functions
-
read_wav_file()
Reads WAV audio, converts stereo to mono, and normalizes amplitude. -
compute_spectrogram()
Computes magnitude spectrogram using FFT and Hann windowing. -
detect_spectrogram_peaks()
Detects local maxima using neighborhood filtering and thresholding. -
generate_fingerprints()
Generates compact hashes from anchor–target peak pairs using frequency and time deltas.
Implements database creation and song matching.
Database Structure
{
"Song_name": "...",
"Fingerprint": [(hash, time_offset), ...]
}