Add spectrum analyzer overlay #23

@GeneralD

Real-time frequency spectrum visualizer rendered as an overlay bar graph, driven by the currently-playing application's audio output via a CoreAudio per-process tap.


Audio Source Options

| Source | macOS floor | TCC prompt | Background daemon | Notes |
|---|---|---|---|---|
| CoreAudio per-process tap (CATapDescription + AudioHardwareCreateProcessTap) | 14.2 (SDK); 14.4 conservative (insidegui/AudioCap) | kTCCServiceSystemAudioCapture: yes, one-time | LaunchAgent (gui/$UID) CAN present the prompt; LaunchDaemon (session 0) cannot | Recommended: captures only the target app; no system-wide bleed |
| AVAudioEngine tap on default output | 10.15+ | kTCCServiceMicrophone (misleading) | Same LaunchAgent caveat | Captures all system audio; no per-app scoping |
| ScreenCaptureKit audio | 12.3+ (framework); audio capture requires 13.0+ | kTCCServiceScreenCapture | Same LaunchAgent caveat | Heavier API, intended for screen recording |

Recommendation: CoreAudio per-process tap. It is scoped to a single app, is the lightest of the three APIs, and carries no video overhead. Min-OS impact: on hosts running macOS 14.0–14.1 the feature becomes a no-op, as if enabled = false. The implementer must choose between 14.2 (SDK floor) and 14.4 (conservative empirical floor) for the @available gate.
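The floor decision can be kept in one place so that switching between 14.2 and 14.4 is a one-line change. The following is a minimal sketch, not the project's implementation: TapSupport, tapSupport(host:floor:), sdkFloor, and conservativeFloor are hypothetical names, and real tap code would still need a compile-time @available / #available guard around the CoreAudio calls.

```swift
import Foundation

/// Hypothetical result type: whether the spectrum tap may run on this host.
enum TapSupport: Equatable {
    case live   // create the process tap
    case noOp   // behave as if enabled = false
}

// The two candidate floors discussed above.
let sdkFloor = OperatingSystemVersion(majorVersion: 14, minorVersion: 2, patchVersion: 0)
let conservativeFloor = OperatingSystemVersion(majorVersion: 14, minorVersion: 4, patchVersion: 0)

/// Compares a host version against the chosen floor, component by component.
func tapSupport(host: OperatingSystemVersion, floor: OperatingSystemVersion) -> TapSupport {
    let hostKey = (host.majorVersion, host.minorVersion, host.patchVersion)
    let floorKey = (floor.majorVersion, floor.minorVersion, floor.patchVersion)
    return hostKey >= floorKey ? .live : .noOp
}
```

At runtime the host value would come from ProcessInfo.processInfo.operatingSystemVersion; passing it in explicitly keeps the decision testable.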


Prerequisites and Blockers

BLOCKER — Info.plist / codesign gap

lyra is ad-hoc signed and has no Info.plist embedded in the binary. TCC grants kTCCServiceSystemAudioCapture to the executable path, but also requires NSAudioCaptureUsageDescription in a bound Info.plist. Without an embedded bundle identity, TCC silently denies the tap, and every Homebrew reinstall resets the grant path.

Required before any tap code ships:

  • Add Info.plist with CFBundleIdentifier + NSAudioCaptureUsageDescription to the CLI target in Package.swift
  • Add a Makefile post-install codesign step that re-signs with the embedded plist: codesign --force --sign - --entitlements ... --identifier com.generald.lyra /usr/local/bin/lyra

Per-process tap correctness gaps

Three tap correctness issues require explicit design decisions before implementation:

  1. No active audio — kAudioHardwareBadObjectError: AudioHardwareCreateProcessTap fails when the target process has no active audio output (e.g., music paused). Tap creation must gate on playbackRate > 0; on pause, destroy the tap and post zeroed bins.

  2. Browser PID over-broad capture: MRMediaRemoteGetNowPlayingApplicationPID returns the browser PID for YouTube Music / Apple Music web. A process tap on a browser captures all browser audio, not just the music tab. Mitigation options: (a) allowlist known native apps and skip tap for browser PIDs, (b) fall back to system-wide tap when browser detected, (c) document as known limitation.

  3. App-switch lifecycle: CATapDescription binds a fixed AudioObjectID. When the now-playing app changes (kMRMediaRemoteNowPlayingApplicationDidChangeNotification), the old tap must be destroyed and a new one created for the new PID.
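The three gaps above can be folded into a single pure decision function that the DataSource consults on every now-playing update. This is a sketch under stated assumptions: PlaybackSnapshot, TapAction, and nextTapAction are hypothetical names, the isBrowser flag stands in for whatever browser detection is chosen, and the browser branch implements mitigation option (a) only as an example.

```swift
/// Hypothetical snapshot of what the now-playing pipeline reports.
struct PlaybackSnapshot: Equatable {
    var pid: Int32?           // nil when no now-playing app is known
    var playbackRate: Double  // 0 while paused
    var isBrowser: Bool       // assumption: set by an allowlist/denylist lookup
}

/// What the tap owner should do next.
enum TapAction: Equatable {
    case none
    case create(pid: Int32)
    case destroy                // caller also posts zeroed bins
    case recreate(pid: Int32)   // destroy the old tap, then create a new one
}

func nextTapAction(activeTapPID: Int32?, snapshot: PlaybackSnapshot) -> TapAction {
    // Gap 1: no tap while paused. Gap 2, option (a): no tap for browser PIDs.
    let wantedPID: Int32? =
        (snapshot.playbackRate > 0 && !snapshot.isBrowser) ? snapshot.pid : nil

    switch (activeTapPID, wantedPID) {
    case (nil, nil):
        return .none
    case (nil, let new?):
        return .create(pid: new)
    case (_?, nil):
        return .destroy
    case (let old?, let new?):
        // Gap 3: a changed PID means the fixed tap binding is stale.
        return old == new ? .none : .recreate(pid: new)
    }
}
```

Keeping the decision free of CoreAudio calls means the pause, browser, and app-switch cases can be unit-tested without a TCC grant.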

PID pipeline gap

NowPlayingInfo (Entity) has no pid: Int? field. media-remote-helper.swift does not call MRMediaRemoteGetNowPlayingApplicationPID. Adding PID propagation spans five modules:

  • media-remote-helper.swift — add MRMediaRemoteGetNowPlayingApplicationPID call, emit "pid" in JSON
  • Entity/NowPlayingInfo.swift — add pid: Int?
  • NowPlayingRepository — pass through from DataSource
  • MediaRemoteDataSource — decode from JSON
  • TrackInteractor / WallpaperInteractor — surface to Presenters via TrackUpdate if needed

This prerequisite work (~120 lines) should be sequenced before the tap DataSource.
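As an illustration of the Entity and DataSource steps, an optional pid decodes cleanly from helper JSON whether or not the key is present. This is a sketch only: the title field is a stand-in for the existing NowPlayingInfo fields, the "pid" key name follows the bullet above, and decodeNowPlaying is a hypothetical helper.

```swift
import Foundation

/// Reduced stand-in for Entity/NowPlayingInfo.swift.
struct NowPlayingInfo: Codable, Equatable {
    let title: String?  // placeholder for the existing fields
    let pid: Int?       // new: PID of the now-playing application
}

/// Decodes one JSON object as emitted by the helper process.
func decodeNowPlaying(_ json: String) throws -> NowPlayingInfo {
    try JSONDecoder().decode(NowPlayingInfo.self, from: Data(json.utf8))
}
```

Because synthesized Codable uses decodeIfPresent for optional properties, output from an older helper without "pid" still decodes, with pid == nil.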


VIPER Component Plan

New files

| Layer | File | Responsibility |
|---|---|---|
| Entity | Sources/Entity/Config/SpectrumConfig.swift | SpectrumConfig (all-optional Codable, FlexibleDouble, ColorStyle) + SpectrumStyle (all non-optional) |
| Domain | Sources/Domain/DataSource/AudioTapDataSource.swift | AudioTapDataSource protocol + TestDependencyKey |
| Domain | Sources/Domain/Interactor/SpectrumInteractor.swift | SpectrumInteractor protocol + TestDependencyKey |
| DataSource | Sources/AudioTapDataSource/AudioTapDataSourceImpl.swift | CoreAudio CATapDescription + AudioHardwareCreateProcessTap; IOProc → lock-free ring buffer; @available(macOS 14.2, *) gated |
| Interactor | Sources/SpectrumInteractor/SpectrumInteractorImpl.swift | Reads ring buffer on display-link tick, runs vDSP.FFT, publishes [Float] bin array |
| Presenter | Sources/Presenters/Spectrum/SpectrumPresenter.swift | @MainActor ObservableObject; @Published var bins: [Float]; @Published var isAnimating: Bool; isEnabled computed from config |
| View | Sources/Views/Spectrum/SpectrumView.swift | TimelineView(.animation(paused: !presenter.isAnimating)) + Canvas; bar graph rendering |
| DI | Sources/DependencyInjection/DataSourceRegistration+Spectrum.swift | liveValue = AudioTapDataSourceImpl() |
| DI | Sources/DependencyInjection/InteractorRegistration+Spectrum.swift | liveValue = SpectrumInteractorImpl() |

Modified files

| File | Change |
|---|---|
| Sources/Entity/Config/AppConfig.swift | Add let spectrum: SpectrumConfig (non-optional, like ripple) |
| Sources/Entity/Style/AppStyle.swift | Add let spectrum: SpectrumStyle (non-optional) |
| Sources/ConfigRepository/ConfigRepositoryImpl.swift | Map SpectrumConfig → SpectrumStyle |
| Sources/Views/Overlay/OverlayContentView.swift | Add SpectrumView conditionally: if spectrumPresenter.isEnabled (NOT .opacity(0)) |
| Sources/AppRouter/AppRouter.swift | Add private var spectrumPresenter: SpectrumPresenter?; wire into windowFactory closure; call spectrumPresenter?.tick() in onFrame |
| Package.swift | Add AudioTapDataSource and SpectrumInteractor targets + test targets |

OverlayContentView ZStack layer order after change

ZStack {
    WallpaperPlayerView(presenter: wallpaperPresenter)   // layer 0 (bottom)
    if spectrumPresenter.isEnabled {
        SpectrumView(presenter: spectrumPresenter)       // layer 1 — position controlled by SpectrumStyle.placement
    }
    RippleView(presenter: ripplePresenter)               // layer 2
    VStack { HeaderView(...); LyricsColumnView(...) }    // layer 3
    WallpaperLoadingOverlay(presenter: wallpaperPresenter) // layer 4 (top)
}

SpectrumStyle.placement (.underlay / .bottom / .top) adjusts z-position relative to lyrics via offset or ZStack reordering — exact mechanism is an open question.


Config Schema

TOML ([spectrum])

[spectrum]
enabled = true
bar_count = 64
bar_color = ["#1E3A5F", "#4A9EFF"]   # ColorStyle — solid string or gradient array
background_color = "#00000066"        # optional, solid only
bar_width_ratio = 0.7                 # bar width / (bar + gap), FlexibleDouble 0–1
min_db = -80.0                        # FlexibleDouble
max_db = 0.0                          # FlexibleDouble
decay_rate = 0.85                     # FlexibleDouble, per-frame exponential decay
fft_size = 1024                       # FlexibleDouble (decoded as Int at use site)
placement = "bottom"                  # "bottom" | "top" | "underlay"
height_ratio = 0.25                   # fraction of overlay height, FlexibleDouble

Swift Entity shape

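// Entity/Config/SpectrumConfig.swift
// Note: with non-optional fields, synthesized Codable requires every key to be
// present in [spectrum]. The "all-optional" plan above would need optional
// fields or a custom init(from:) that falls back to `defaults`.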
public struct SpectrumConfig: Codable, Sendable {
    public let enabled: Bool
    public let barCount: FlexibleDouble
    public let barColor: ColorStyle
    public let backgroundColor: String?
    public let barWidthRatio: FlexibleDouble
    public let minDb: FlexibleDouble
    public let maxDb: FlexibleDouble
    public let decayRate: FlexibleDouble
    public let fftSize: FlexibleDouble
    public let placement: SpectrumPlacement
    public let heightRatio: FlexibleDouble

    public static let defaults = SpectrumConfig(
        enabled: false,
        barCount: 64,
        barColor: .gradient(["#1E3A5F", "#4A9EFF"]),
        backgroundColor: nil,
        barWidthRatio: 0.7,
        minDb: -80,
        maxDb: 0,
        decayRate: 0.85,
        fftSize: 1024,
        placement: .bottom,
        heightRatio: 0.25
    )
}

public enum SpectrumPlacement: String, Codable, Sendable {
    case bottom, top, underlay
}

// Entity/Style/SpectrumStyle.swift  (resolved from SpectrumConfig; every field non-optional except backgroundColor)
public struct SpectrumStyle: Sendable {
    public let enabled: Bool
    public let barCount: Int
    public let barColor: ColorStyle
    public let backgroundColor: String?
    public let barWidthRatio: Double
    public let minDb: Double
    public let maxDb: Double
    public let decayRate: Double
    public let fftSize: Int
    public let placement: SpectrumPlacement
    public let heightRatio: Double
}
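The Config → Style mapping in ConfigRepositoryImpl is a natural place to sanitize user-supplied values, for example forcing fft_size to a power of two and keeping ratios inside 0–1. The sketch below is illustrative only: RawSpectrum and ResolvedSpectrum are reduced stand-ins that use plain Double instead of FlexibleDouble and omit the color and placement fields, and the upper bounds (65,536 samples, 4,096 bars) are assumed limits, not project decisions.

```swift
/// Reduced stand-in for the decoded [spectrum] values (assumption: Double
/// replaces FlexibleDouble; color and placement fields are omitted).
struct RawSpectrum {
    var barCount: Double = 64
    var barWidthRatio: Double = 0.7
    var decayRate: Double = 0.85
    var fftSize: Double = 1024
    var heightRatio: Double = 0.25
}

/// Reduced stand-in for SpectrumStyle.
struct ResolvedSpectrum: Equatable {
    let barCount: Int
    let barWidthRatio: Double
    let decayRate: Double
    let fftSize: Int
    let heightRatio: Double
}

/// Clamps `value` into `range`, substituting `fallback` for NaN or infinity.
func sanitized(_ value: Double, in range: ClosedRange<Double>, fallback: Double) -> Double {
    guard value.isFinite else { return fallback }
    return min(max(value, range.lowerBound), range.upperBound)
}

/// Largest power of two that is <= n (never below 2).
func floorPowerOfTwo(_ n: Int) -> Int {
    var p = 2
    while p * 2 <= n { p *= 2 }
    return p
}

func resolve(_ raw: RawSpectrum) -> ResolvedSpectrum {
    let fftSize = floorPowerOfTwo(Int(sanitized(raw.fftSize, in: 2...65_536, fallback: 1024)))
    // A real FFT of N samples yields N/2 usable bins, so cap the bar count there.
    let requestedBars = Int(sanitized(raw.barCount, in: 1...4096, fallback: 64))
    return ResolvedSpectrum(
        barCount: min(requestedBars, fftSize / 2),
        barWidthRatio: sanitized(raw.barWidthRatio, in: 0...1, fallback: 0.7),
        decayRate: sanitized(raw.decayRate, in: 0...1, fallback: 0.85),
        fftSize: fftSize,
        heightRatio: sanitized(raw.heightRatio, in: 0...1, fallback: 0.25)
    )
}
```

Resolving once at config-load time keeps the per-frame interactor and view code free of range checks.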

Rendering & Performance Approach

  • SwiftUI Canvas inside TimelineView — same pattern as RippleView. Zero new SPM dependencies, sufficient for 64 bars at 60 fps.
  • vDSP FFT (vDSP.FFT / vDSP_fft_zrip): ~15–25 µs per 1024-sample window on Apple Silicon. No new dependency; Accelerate is already available.
  • IOProc constraint: The CoreAudio render callback runs on a real-time audio thread. It must not allocate heap, call Swift async, or acquire locks. Use a lock-free ring buffer (e.g., C TPCircularBuffer bridged via a thin header, or Swift Atomics ManagedAtomic read/write indices on a fixed backing array) to hand off PCM frames to the interactor.
  • Idle suspension: When playbackRate == 0 or enabled == false, the tap is destroyed (not muted) and SpectrumPresenter.isAnimating = false pauses TimelineView, so no GPU work is done. OverlayContentView conditionally includes SpectrumView (rather than hiding it with .opacity(0)) to avoid the idle Canvas redraws seen in bug #252 (performance degradation with worsened power consumption, heat, and fan noise).
  • Bin count: 64 bars map to FFT output linearly for simplicity; logarithmic mel-scale mapping is a future enhancement.
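  • Exponential decay: Each frame multiplies the previous value by decayRate (bins[i] *= decayRate) and then keeps the larger of that decayed value and the new FFT magnitude. This gives a smooth falloff while storing only the previous frame's bins; a plain overwrite with the new magnitude would discard the decay.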
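The bin-count and decay bullets can be sketched as two pure functions that run after the FFT. This is an illustration under assumptions, not the planned implementation: barLevels and applyDecay are hypothetical names, each bar takes the peak of its bin group (averaging would also fit "linear"), and the decay is read as "keep the larger of the decayed previous value and the fresh value".

```swift
import Foundation

/// Groups `magnitudes` (linear amplitude, one per FFT bin) into `barCount`
/// equal-width bars, converts each bar's peak to dB, and normalizes the
/// result to 0...1 using the configured min_db / max_db range.
func barLevels(magnitudes: [Float], barCount: Int, minDb: Float, maxDb: Float) -> [Float] {
    precondition(barCount > 0 && magnitudes.count >= barCount && maxDb > minDb)
    let binsPerBar = magnitudes.count / barCount
    return (0..<barCount).map { bar in
        let slice = magnitudes[(bar * binsPerBar)..<((bar + 1) * binsPerBar)]
        let peak = slice.max() ?? 0
        // Floor the amplitude so silence maps to a finite dB value.
        let db = 20 * log10(max(peak, 1e-9))
        return min(max((db - minDb) / (maxDb - minDb), 0), 1)
    }
}

/// Per-frame decay: a bar falls by `decayRate` each frame unless the fresh
/// level is higher, in which case it jumps up immediately.
func applyDecay(previous: [Float], fresh: [Float], decayRate: Float) -> [Float] {
    zip(previous, fresh).map { old, new in max(new, old * decayRate) }
}
```

With fft_size = 1024 there are 512 usable bins, so 64 bars cover 8 bins each. The only state carried between frames is the previous output of applyDecay.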

Open Questions

  1. @available floor — 14.2 or 14.4? SDK annotates AudioHardwareCreateProcessTap as API_AVAILABLE(macos(14.2)); insidegui/AudioCap reports reliable behaviour only from 14.4. Which floor does this project target for the live implementation?

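  2. Info.plist ownership: Should Info.plist be added to the CLI Swift Package target (via Package.swift infoPlist: or a Resources bundle), or is it an install-time responsibility managed entirely by the Makefile/formula? Note that SwiftPM target declarations have no infoPlist: parameter, so the Package.swift route would likely mean passing the linker flag -sectcreate __TEXT __info_plist through linkerSettings. The choice affects signing strategy.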

  3. Browser-PID mitigation: For YouTube Music / Apple Music web users (browser PID scenario), which strategy: (a) skip tap + show zeroed bars, (b) fall back to system-wide tap, or (c) document as unsupported?

  4. Lock-free ring buffer strategy: Prefer (a) bridged C TPCircularBuffer (proven in audio apps, adds a C file), (b) Swift Atomics SPM dependency + fixed-size array, or (c) nonisolated(unsafe) + os_unfair_lock (simpler, technically blocks but contention is near-zero)?

  5. PID pipeline sequencing: Should the NowPlayingInfo.pid prerequisite ship in a separate PR first, or be bundled with the initial spectrum DataSource PR?
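For question 4, options (a) and (b) share the same single-producer/single-consumer index arithmetic. The sketch below shows only that arithmetic and is deliberately NOT thread-safe: SampleRing is a hypothetical name, and a real IOProc hand-off would need atomic acquire/release access to the two counters (for example via Swift Atomics) and would write from an unsafe buffer pointer rather than an array.

```swift
/// Fixed-capacity FIFO of Float samples using monotonically increasing
/// read/write counters. Single-threaded illustration only.
struct SampleRing {
    private var storage: [Float]
    private var writeIndex = 0  // total samples ever written
    private var readIndex = 0   // total samples ever read

    init(capacity: Int) {
        precondition(capacity > 0)
        storage = [Float](repeating: 0, count: capacity)
    }

    /// Number of samples currently buffered.
    var count: Int { writeIndex - readIndex }

    /// Writes as many samples as fit and returns how many were accepted.
    /// Excess samples are dropped rather than overwriting unread data.
    mutating func write(_ samples: [Float]) -> Int {
        let n = min(samples.count, storage.count - count)
        for i in 0..<n {
            storage[(writeIndex + i) % storage.count] = samples[i]
        }
        writeIndex += n
        return n
    }

    /// Reads up to `maxCount` samples in FIFO order.
    mutating func read(upTo maxCount: Int) -> [Float] {
        let n = max(0, min(maxCount, count))
        let out = (0..<n).map { storage[(readIndex + $0) % storage.count] }
        readIndex += n
        return out
    }
}
```

The producer only advances writeIndex and the consumer only advances readIndex, which is the property that lets the real version avoid locks.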


Scope Estimate

| Area | Lines |
|---|---|
| PID pipeline prerequisite (5 modules) | ~120 |
| AudioTapDataSource (IOProc + ring buffer + @available stub) | ~200 |
| SpectrumInteractor (vDSP FFT + bin decay) | ~150 |
| SpectrumPresenter | ~100 |
| SpectrumView (Canvas bar graph) | ~80 |
| Entity (SpectrumConfig, SpectrumStyle, SpectrumPlacement) | ~100 |
| DI wiring + AppRouter + OverlayContentView edits | ~80 |
| Package.swift + AppConfig/AppStyle edits | ~45 |
| Tests | ~150 |
| Total | ~1,025 |


Labels: enhancement (New feature or request), help wanted (Extra attention is needed)
