Skip to content

VHID daemon: ProcessType=Interactive to prevent load-induced disconnects (MAL-57 Layer 1)#882

Merged
malpern merged 1 commit into
masterfrom
claude/elastic-borg-a4bf3e
Jun 11, 2026
Merged

VHID daemon: ProcessType=Interactive to prevent load-induced disconnects (MAL-57 Layer 1)#882
malpern merged 1 commit into
masterfrom
claude/elastic-borg-a4bf3e

Conversation

@malpern

@malpern malpern commented Jun 11, 2026

Copy link
Copy Markdown
Owner

Summary

Two live stuck-key incidents on 2026-06-10 (multi-second autorepeat bursts while typing) were traced end to end: under heavy CPU load the Karabiner VHID daemon misses the pqrs client's 3s heartbeat, kanata's output connection dies mid-keystroke, in-flight key Release events are lost, and macOS autorepeats the stuck key until the daemon tears down the old client (~3s). Full evidence and causal chain are documented in docs/bugs/MAL-57-duplicate-keypresses.md (new "2026-06-10 Incident Evidence" section).

Root-cause asymmetry: the shipped SMAppService kanata plist already sets ProcessType=Interactive, but the KeyPath-installed VHID daemon plist set no ProcessType — so launchd could deprioritize exactly the process that must answer heartbeats and write key reports.

Changes

  • Add ProcessType=Interactive to the VHID daemon plist in both generator copies (PlistGenerator.generateVHIDDaemonPlist() + the privileged helper's duplicate), the legacy kanata plist generators (consistency with the shipped SMAppService plist), and the debug script template
  • isVHIDDaemonConfiguredCorrectly() now also requires the key, so a pre-fix plist reports misconfigured and repair rewrites it
  • Tests: generated plist asserts the key (parsed, not string-matched); health check covers the missing-ProcessType case
  • MAL-57 bug doc: 2026-06-10 incident evidence, confirmed causal chain, and the remaining fix layers (kanata-fork work) for future research

Test plan

  • ./Scripts/test-full.sh green on the final code
  • PlistGeneratorTests (27 tests) and ServiceHealthCheckerTests green, including new testGenerateVHIDDaemonPlistUsesInteractiveProcessType and testIsVHIDDaemonConfiguredCorrectlyReturnsFalseWithoutProcessType
  • Verification metric: per-day counts of output backend unavailable during write / connected reconnects in /var/log/com.keypath.kanata.stdout.log before vs after, under similar build load

@claude

claude Bot commented Jun 11, 2026

Copy link
Copy Markdown

Review: VHID daemon ProcessType=Interactive (MAL-57 Layer 1)

The root-cause analysis is thorough and the fix is well-targeted. A few observations:


What's done well

  • Both plist generator copies (PlistGenerator.swift and HelperService.swift) are updated — important since they can diverge independently.
  • The new test parses the plist with PropertyListSerialization rather than string-matching, so it validates the key survives a real plist round-trip.
  • The writeVHIDPlist helper gets a clean default (processType: "Interactive"), which automatically upgrades testIsVHIDDaemonConfiguredCorrectlyReadsProgramArguments and testIsVHIDDaemonConfiguredCorrectlyReturnsFalseForWrongPath to also cover the new requirement.
  • The VHIDManager plist deliberately omits ProcessType=Interactive — correct, since it's a one-shot KeepAlive=false activator, not the heartbeat-answering daemon.

One meaningful gap (acknowledged, but worth flagging explicitly)

isVHIDDaemonConfiguredCorrectly() is called only in ServiceBootstrapper.repairVHIDDaemonServices() (the repair postflight). It is not wired into SystemInspector.appendComponentIssues() or getServiceStatus(), so the wizard will never proactively surface a .vhidDaemonMisconfigured issue for existing users with old plists.

The effect: users who installed before this PR keep the stuck-key bug unless they happen to trigger a VHID repair for a different reason. The PR doc notes this as a follow-up; I'd suggest tracking it as a distinct issue. Connecting isVHIDDaemonConfiguredCorrectly() into SystemInspector + SystemContext.components so the wizard can emit the issue and route to repair would close the proactive-fix gap.


Minor: verbose doc comments

CLAUDE.md style calls for a single short WHY comment when the constraint is non-obvious. The doc block added to generateVHIDDaemonPlist() is four lines explaining the causal chain. The bug doc (MAL-57-duplicate-keypresses.md) already owns that explanation in full; the docstring here could point to it with a single line rather than restating the chain.

Same applies to the test doc comment (testGenerateVHIDDaemonPlistUsesInteractiveProcessType): two lines is borderline fine, but could simply be // MAL-57: Interactive priority prevents heartbeat-miss disconnects under CPU load.


Not a blocker

The duplicate plist generators (PlistGenerator vs HelperService private copies) is pre-existing tech debt, not introduced here. The PR correctly updates both. Worth a future consolidation when convenient, but not something this PR needs to fix.


Overall: Correct and safe to merge. The proactive-repair gap is worth a follow-up issue so it doesn't stay theoretical.

…cts (MAL-57 Layer 1)

Under heavy CPU load the Karabiner VHID daemon can be starved long enough
to miss the pqrs client's 3s heartbeat. Kanata then drops its output
connection mid-keystroke, losing key Release events and leaving the
virtual keyboard's key stuck down — macOS autorepeats it for ~3 seconds
(multi-letter bursts like 'togggggg').

The shipped SMAppService kanata plist already sets ProcessType=Interactive;
the KeyPath-installed VHID daemon plist set no ProcessType at all, so the
process that must answer heartbeats and write key reports was the one
launchd could deprioritize. Changes:

- Add ProcessType=Interactive to the VHID daemon plist in both generator
  copies (PlistGenerator + privileged helper), the legacy kanata plist
  generators (for consistency with the shipped SMAppService plist), and
  the debug script template.
- isVHIDDaemonConfiguredCorrectly() now also requires the key, so a
  pre-fix plist reports misconfigured and repair rewrites it.
- Tests: plist generation asserts the key; health check covers the
  missing-ProcessType case.
- Document the 2026-06-10 live incident evidence in the MAL-57 bug doc:
  full causal chain across kanata, the VHID daemon, and DriverKit, plus
  the remaining fix layers for future work.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@malpern malpern force-pushed the claude/elastic-borg-a4bf3e branch from 31b2847 to 46a5941 Compare June 11, 2026 06:40
@malpern malpern enabled auto-merge June 11, 2026 06:40
@malpern malpern merged commit dc29187 into master Jun 11, 2026
5 checks passed
@malpern malpern deleted the claude/elastic-borg-a4bf3e branch June 11, 2026 06:42
@claude

claude Bot commented Jun 11, 2026

Copy link
Copy Markdown

Code Review

PR: VHID daemon: ProcessType=Interactive to prevent load-induced disconnects (MAL-57 Layer 1)

Overview

A clean, targeted fix for a real production bug. The causal chain (CPU load → heartbeat miss → connection drop → stuck Release → autorepeat burst) is now documented with strong empirical evidence and fixed at the right layer. The code change is minimal and applied consistently.


Strengths

  • Applied in all the right places — all four plist templates (two in PlistGenerator.swift, two in HelperService.swift) and the debug script template are updated consistently. The PR doesn't miss a copy.
  • Tests use proper PropertyListSerialization parsing rather than string-matching the raw XML. This is the right approach and is consistent with testGenerateVHIDDaemonPlistValidXML.
  • writeVHIDPlist helper default upgraded silently — existing tests like testIsVHIDDaemonConfiguredCorrectlyReturnsTrue and testIsVHIDDaemonConfiguredCorrectlyReturnsFalseWithWrongPath now also exercise the ProcessType check without any changes needed to those tests. That's a nice side effect.
  • Incident documentation is well-structured with precise timestamps, cross-referenced log sources, and clearly separated fix layers for future work.

Issues

1. Missing test: wrong ProcessType value (minor)

testIsVHIDDaemonConfiguredCorrectlyReturnsFalseWithoutProcessType covers the absent-key case. A plist with ProcessType=Background (or Adaptive) would be equally broken and is not tested. Suggest:

func testIsVHIDDaemonConfiguredCorrectlyReturnsFalseWithWrongProcessType() throws {
    let expectedPath = "...Karabiner-VirtualHIDDevice-Daemon"
    try writeVHIDPlist(programPath: expectedPath, processType: "Background")
    XCTAssertFalse(checker.isVHIDDaemonConfiguredCorrectly())
}

2. getServiceStatus() does not surface stale plists proactively (acknowledged, but worth tracking)

The PR note says:

wiring plist-content validation into getServiceStatus() so the wizard proactively flags stale plists on old installs is a follow-up.

This is the right call for scope control, but it means a user on an old install will have the wrong plist until they run repair. The bug is load-triggered and intermittent, so affected users may not file a ticket. Worth opening a follow-up issue if one does not exist.

3. isVHIDDaemonConfiguredCorrectly() is only called in the repair postflight

From ServiceBootstrapper.swift:898, this validates after repair rather than driving proactive detection. Until the getServiceStatus() gap is closed, a stale install is only repaired if the user runs the repair flow. This is consistent with the note above — just confirming the scope.


Nit

docs/bugs/MAL-57-duplicate-keypresses.md (pre-existing, not introduced here) has a local absolute path in an existing link at the top of the file:

/Users/malpern/local-code/KeyPath/.worktrees/duplicate-key-investigation/...

Since you're editing this file, worth fixing the link to a repo-relative path while you're here.


Summary

The core fix is correct and consistently applied. Tests use the right parsing approach. The only actionable item before merge is the missing wrong-value ProcessType test case (or a conscious decision to defer it). The getServiceStatus() gap is a legitimate follow-up, not a blocker.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant