Whisper transcribes callsigns inconsistently depending on how the controller/pilot speaks and how the audio comes through — e.g. "United two five", "United 25", and "UAL25" can all refer to the same aircraft. This fragments analysis and weakens ADS-B correlation (which matches on callsign).
Goal
Add a normalization pass that canonicalizes each callsign into one consistent form (ICAO airline prefix + flight number, e.g. UAL25) before the transcript batch is sent to Gemini and before ADS-B correlation.
Suggested approach
- Map airline telephony names → ICAO prefixes (e.g. United → UAL, Speedbird → BAW, Cathay → CPA).
- Convert spelled-out number words to digits (two five → 25, niner → 9).
- Produce a canonical token (e.g. UAL25) while preserving the raw transcript text for display.
- Be conservative: when confidence is low or no airline match is found, leave the raw text untouched rather than guessing.
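
The steps above could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: `normalize_callsign`, `TELEPHONY_TO_ICAO`, and `DIGIT_WORDS` are hypothetical names, the airline table holds only the three examples listed above, and compound numbers such as "twenty five" are deliberately left unhandled.

```python
import re
from typing import Optional

# Hypothetical lookup table; only the three airlines named in this issue.
TELEPHONY_TO_ICAO = {"united": "UAL", "speedbird": "BAW", "cathay": "CPA"}
ICAO_PREFIXES = set(TELEPHONY_TO_ICAO.values())

# Spoken digits, including common radiotelephony pronunciations.
DIGIT_WORDS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "tree": "3",
    "four": "4", "fower": "4", "five": "5", "fife": "5", "six": "6",
    "seven": "7", "eight": "8", "nine": "9", "niner": "9",
}


def normalize_callsign(raw: str) -> Optional[str]:
    """Return a canonical token such as 'UAL25', or None when unsure.

    The caller keeps `raw` for display; this function never modifies it.
    """
    # Split into runs of letters and runs of digits, so "UAL25" -> ["ual", "25"].
    tokens = re.findall(r"[a-z]+|\d+", raw.lower())
    if not tokens:
        return None

    head, rest = tokens[0], tokens[1:]
    if head in TELEPHONY_TO_ICAO:
        prefix = TELEPHONY_TO_ICAO[head]
    elif head.upper() in ICAO_PREFIXES:
        prefix = head.upper()
    else:
        return None  # no airline match: leave the raw text alone

    digits = []
    for tok in rest:
        if tok.isdigit():
            digits.append(tok)
        elif tok in DIGIT_WORDS:
            digits.append(DIGIT_WORDS[tok])
        else:
            return None  # unexpected word: do not guess

    number = "".join(digits)
    # Assumed bound: flight numbers of one to four digits.
    if not 1 <= len(number) <= 4:
        return None
    return prefix + number
```

Returning `None` instead of a best guess is what keeps the pass conservative: a caller can store the canonical token next to the untouched raw text and fall back to the raw text whenever no token is produced.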
Where to look
- backend/core/batcher.py (batch assembly + AIRPORT_GEO / ADS-B correlation)
- The Gemini prompt assembly path
Acceptance
- A short unit test covering several spelled-out / numeric / ICAO variants resolving to the same canonical callsign.
- Raw transcript text remains visible on the observation card.
This maps directly to the "callsign matching isn't perfect yet" limitation in the README — one of the most impactful accuracy fixes available.