diff --git a/docs/architecture/helper-causal-receipts-v0.1.md b/docs/architecture/helper-causal-receipts-v0.1.md
new file mode 100644
index 0000000..cc3aa8a
--- /dev/null
+++ b/docs/architecture/helper-causal-receipts-v0.1.md
@@ -0,0 +1,151 @@
+# Helper Causal Receipts v0.1
+
+Status: Draft
+Scope: SourceOS Shell, BearBrowser, TurtleTerm, Office/PDF runtime, preview/rendering helpers
+Primary goal: preserve user-legible causal intent across helper-process boundaries.
+
+## Problem
+
+- Modern desktop actions are not single-process events.
+- A visible action such as previewing a file, opening a native file picker, rendering a thumbnail, taking a screenshot, cleaning browser cache, or indexing metadata can spawn a hidden helper-process cascade.
+- The macOS unified-log sample that motivated this spec shows repeated helper lifecycle patterns: demand spawn, `xpcproxy` materialization, source attachment, running/init transitions, sandbox-denied lookups, clean exits, supervisor kills, and teardown races.
+- The useful primitive is not the raw process log. The useful primitive is a root-intent-bound receipt.
+- SourceOS must not repeat the opaque pattern where users see helper churn but cannot answer: why did this run, what requested it, what data did it touch, what was denied, and did network/clipboard/account/analytics access occur?
+
+## Design Principles
+
+- Every helper process carries a `root_intent_id`.
+- Every helper spawn has a declared purpose.
+- Every sensitive capability request is recorded as a policy decision.
+- Every denied capability records whether data was accessed. For a sandbox denial, the default is `data_accessed=false` unless an allow event proves otherwise.
+- Every helper exit records completion, duration, and receipt completeness.
+- Every teardown race is normalized before user presentation.
+- Expected denials are evidence of containment, not automatic alerts.
+- Unexpected denials are policy-regression candidates.
+- Local preview helpers deny network, DNS, pasteboard, account lookup, analytics, camera, microphone, location, credential stores, and arbitrary file reads by default.
+- Web thumbnailing is treated as hostile-content rendering, not as static image generation.
+- Native file picker helpers must not inherit browser session authority.
+- Terminal preview helpers must not inherit shell secrets.
+
+## Event Types
+
+| Event Type | Purpose | Required Fields |
+|---|---|---|
+| `root_intent.created` | Start a causal graph for a visible or scheduled action | `event_id`, `root_intent_id`, `timestamp`, `surface`, `actor`, `declared_purpose`, `data_scope`, `default_policy`, `receipt_required` |
+| `helper.spawn` | Record subprocess/helper launch | `event_id`, `parent_event_id`, `root_intent_id`, `parent_process`, `child_process`, `trigger`, `spawn_reason`, `policy_profile` |
+| `capability.request` | Record sensitive service/capability lookup | `event_id`, `root_intent_id`, `requestor`, `capability`, `requested_service`, `decision`, `classification`, `policy_rule`, `data_accessed` |
+| `helper.exit` | Record termination and cleanup | `event_id`, `root_intent_id`, `process`, `exit_status`, `duration_ms`, `children_cleaned`, `unexpected_denials`, `network_used`, `receipt_complete` |
+| `teardown.normalized` | Normalize noisy cleanup/race messages | `raw_message`, `normalized_class`, `severity`, `meaning`, `receipt_complete`, `policy_impact` |
+| `policy.decision` | Record policy evaluation | `policy_profile`, `rule_id`, `capability`, `decision`, `reason`, `override_actor`, `override_expiry` |
+| `data.touch` | Record data class touched | `object_type`, `object_hash`, `path_policy`, `access_mode`, `retention`, `derived_artifact` |
+
+## Policy Profiles
+
+### `preview.local_only.v1`
+
+- Applies to local PDF, image, office, and generic file previews.
+- Allows selected file snapshot reads, thumbnail cache writes, CPU rendering, mediated GPU rendering, allowlisted system font reads, and IPC to the preview broker.
+- Denies network, DNS, pasteboard, analytics, account lookup, contacts, calendar, camera, microphone, location, arbitrary file reads, and unrestricted child processes.
+
+### `preview.web_thumbnail.local_only.v1`
+
+- Applies to HTML thumbnails, web archive previews, local browser export previews, and URL snapshot rendering.
+- Risk model: hostile-content rendering.
+- Allows parsing and rendering the selected local snapshot plus mediated GPU rendering and local thumbnail output.
+- Denies network, DNS, cookies, credentials, local/session storage, extension APIs, service workers, remote fonts, pasteboard, account lookup, analytics, camera, microphone, and location.
+- Non-negotiable invariant: web thumbnail helpers never inherit browser session authority.
+
+### `cache_cleanup.local_only.v1`
+
+- Applies to browser/app cache cleanup and local cache size accounting.
+- Allows cache metadata read, cache entry delete, cache size compute, and local policy report.
+- Denies network, DNS, analytics, account lookup, remote sync, and browser session reads.
+- Special rule: if a network-shaped helper is spawned, its receipt must state whether network authority was actually granted or whether only local cache metadata was inspected.
+
+### `file_picker.native_ui.v1`
+
+- Applies to native open/save panels and file picker preview surfaces.
+- Allows selected file grants, UI rendering, and preview-broker IPC.
+- Denies account lookup, cloud sync triggers, pasteboard access, analytics, browser extension invocation, cookie reads, and browser session reads.
+- Special rule: native file picker helpers must not inherit browser session authority.
+
+### `terminal.preview.local_only.v1`
+
+- Applies to TurtleTerm file previews, hyperlink previews, archive listings, and command-output renderers.
+- Allows selected file/path preview, local rendering, and local temporary artifacts.
+- Denies shell environment reads, shell history reads, SSH key reads, token reads, network fetches, clipboard reads, account lookup, and analytics.
+- Special rule: terminal preview helpers must never inherit shell secrets.
+
+## Denial Classification
+
+| Classification | Meaning | Default Severity |
+|---|---|---|
+| `expected_denial` | Policy intentionally blocked a commonly probed service | notice |
+| `unexpected_denial` | Helper requested capability outside declared profile | warning |
+| `compatibility_probe` | Framework probed optional service and did not require it | notice |
+| `policy_regression` | New build began requesting undeclared capability | error in CI |
+| `malicious_probe_candidate` | Request unrelated to purpose and targeting sensitive data | critical |
+| `missing_service` | Target service absent or not running | notice |
+| `teardown_race` | Request/reply path raced with helper shutdown | trace/notice |
+
+## Teardown Normalization
+
+| Raw Pattern | Normalized Class | Meaning |
+|---|---|---|
+| `no client port found` | `client_endpoint_missing_after_teardown` | Client disappeared before service reply completed |
+| `invalid client reply port -1` | `invalid_reply_endpoint_after_teardown` | Reply endpoint invalid during cleanup |
+| `job not found` / `ENOSERVICE` | `service_removed_before_reply` | Service exited before late lookup/reply completed |
+| `Operation already in progress` | `duplicate_activation_coalesced` | Duplicate demand while helper already running |
+| supervisor `SIGKILL` | `supervisor_worker_lifecycle_kill` | Supervisor ended bounded worker |
+
+## User Inspector Requirements
+
+- Provide a local “Why did this run?” view.
+- Group by `root_intent_id`, not raw PID.
+- Show visible action, parent surface, helper chain, allowed capabilities, denied capabilities, data touched, network/DNS outcome, exit status, incomplete receipts, and policy regressions.
+- Expected denials should be visible on expansion, but not noisy by default.
+- Policy regressions and incomplete receipts should be surfaced immediately.
+
+## Acceptance Tests
+
+| Test | Given | Assert |
+|---|---|---|
+| Root intent propagation | Any helper spawn | `root_intent_id`, `parent_event_id`, `spawn_reason`, and `policy_profile` exist |
+| Local preview no-network | Local PDF/image preview | network, DNS, analytics, and account lookup denied |
+| Web thumbnail isolation | HTML/web thumbnail render | cookies, storage, extensions, network, and pasteboard denied |
+| Cache cleanup transparency | Cache cleanup spawns network-shaped helper | receipt explains reason and egress decision |
+| Native file picker isolation | Browser invokes native file panel | no browser session or extension authority inherited |
+| Terminal preview isolation | TurtleTerm preview | shell secrets and environment denied |
+| Receipt completeness | Completed action | all helper spawns have exit or active-state events |
+| Denial classification | Denied capability request | denial classified and `data_accessed` recorded |
+| Policy regression CI | New undeclared capability | CI fails unless policy is updated |
+| Inspector rendering | Completed DAG | user-readable summary exists |
+
+## Repo Integration
+
+| Repo | Role |
+|---|---|
+| `SourceOS-Linux/sourceos-shell` | Runtime receipt store, helper wrapper, parser/correlator, local inspector |
+| `SourceOS-Linux/BearBrowser` | Browser file picker, cache cleanup, preview and thumbnail helper enforcement |
+| `SourceOS-Linux/TurtleTerm` | Terminal preview and command helper secret isolation |
+| `SocioProphet/ontogenesis` | Ontology classes, properties, SHACL constraints |
+| `SocioProphet/prophet-platform` | Evidence-envelope mapping, evidence-console view, CI trust gates |
+
+## Non-Goals
+
+- Do not alert for every expected denial.
+- Do not ban short-lived helpers.
+- Do not ban multiprocess rendering.
+- Do not assume every denial is malicious.
+- Do require causality, classification, policy, and receipts.
+
+## Security Invariants
+
+- Preview helpers are local-only by default.
+- Web thumbnails do not inherit browser session state.
+- Cache cleanup does not receive network authority by default.
+- Native file picker helpers do not inherit browser session or extension authority.
+- Terminal preview helpers do not inherit shell secrets.
+- Every sensitive capability decision is recorded.
+- Every helper exit is recorded.
+- Incomplete receipts degrade trust state.
diff --git a/policies/helper-receipts/apple_service_family_taxonomy.v0.1.yaml b/policies/helper-receipts/apple_service_family_taxonomy.v0.1.yaml
new file mode 100644
index 0000000..8d349e5
--- /dev/null
+++ b/policies/helper-receipts/apple_service_family_taxonomy.v0.1.yaml
@@ -0,0 +1,167 @@
+version: 0.1
+purpose: >
+  Conservative mapping from Apple/macOS service-family strings to SourceOS helper receipt phases and policy profiles.
+  This supports imported-log analysis without claiming private subsystem intent.
+default_phase: unknown_or_general_launchd_churn
+default_policy_profile: unknown
+families:
+  - match: com.apple.mdworker
+    phase: spotlight_metadata_indexing
+    policy_profile: indexer.metadata_local.v1
+    role: metadata_index_worker
+  - match: mdworker_shared
+    phase: spotlight_metadata_indexing
+    policy_profile: indexer.metadata_local.v1
+    role: metadata_index_worker
+  - match: CacheDelete
+    phase: cache_cleanup
+    policy_profile: cache_cleanup.local_only.v1
+    role: cache_cleanup
+  - match: CacheExtension
+    phase: cache_cleanup
+    policy_profile: cache_cleanup.local_only.v1
+    role: cache_cleanup
+  - match: WebKit
+    phase: web_thumbnail_or_webkit_helper
+    policy_profile: preview.web_thumbnail.local_only.v1
+    role: web_runtime_helper
+  - match: WebThumbnail
+    phase: web_thumbnail_or_webkit_helper
+    policy_profile: preview.web_thumbnail.local_only.v1
+    role: web_thumbnail_helper
+  - match: QuickLook
+    phase: quicklook_preview_rendering
+    policy_profile: preview.local_only.v1
+    role: preview_ui_or_thumbnail
+  - match: CGPDFService
+    phase: quicklook_preview_rendering
+    policy_profile: preview.local_only.v1
+    role: pdf_rendering_helper
+  - match: ImageIOXPCService
+    phase: quicklook_preview_rendering
+    policy_profile: preview.local_only.v1
+    role: image_decode_helper
+  - match: ThumbnailExtension
+    phase: quicklook_preview_rendering
+    policy_profile: preview.local_only.v1
+    role: thumbnail_helper
+  - match: screencapture
+    phase: screenshot_capture
+    policy_profile: screenshot.capture_receipt.v1
+    role: screenshot_capture_ui
+  - match: openAndSavePanelService
+    phase: native_file_picker
+    policy_profile: file_picker.native_ui.v1
+    role: native_file_picker
+  - match: AXVisualSupportAgent
+    phase: accessibility_ui_support
+    policy_profile: accessibility.ui_support.v1
+    role: accessibility_visual_support
+  - match: com.apple.accessibility
+    phase: accessibility_ui_support
+    policy_profile: accessibility.ui_support.v1
+    role: accessibility_service
+  - match: com.apple.filesystems.netfs
+    phase: filesystem_netfs_plugin
+    policy_profile: filesystem.netfs_plugin.v1
+    role: network_filesystem_plugin
+  - match: PlugInLibraryService
+    phase: filesystem_netfs_plugin
+    policy_profile: filesystem.netfs_plugin.v1
+    role: plugin_library_service
+  - match: iconservices
+    phase: iconservices_rendering
+    policy_profile: iconservices.rendering_local.v1
+    role: icon_rendering_service
+  - match: AudioComponentRegistrar
+    phase: audio_component_discovery
+    policy_profile: audio.component_scan_local.v1
+    role: audio_component_registry
+  - match: CarbonComponentScanner
+    phase: audio_component_discovery
+    policy_profile: audio.component_scan_local.v1
+    role: audio_component_scanner
+  - match: trustd
+    phase: trust_and_certificate_services
+    policy_profile: trust.security_local.v1
+    role: certificate_trust_service
+  - match: secinitd
+    phase: trust_and_certificate_services
+    policy_profile: trust.security_local.v1
+    role: security_initialization
+  - match: amfid
+    phase: security_integrity_sidecar
+    policy_profile: security.scan_local.v1
+    role: code_integrity
+  - match: XProtect
+    phase: security_integrity_sidecar
+    policy_profile: security.scan_local.v1
+    role: malware_protection
+  - match: sysextd
+    phase: system_extension_management
+    policy_profile: system_extension.management_local.v1
+    role: system_extension_daemon
+  - match: CloudTelemetry
+    phase: telemetry_sidecar
+    policy_profile: telemetry.local_metric.v1
+    role: telemetry_service
+  - match: ecosystemanalytics
+    phase: telemetry_sidecar
+    policy_profile: telemetry.local_metric.v1
+    role: ecosystem_analytics
+  - match: ecosystemd
+    phase: ecosystem_services
+    policy_profile: ecosystem.service_local.v1
+    role: ecosystem_service
+  - match: biomesyncd
+    phase: biome_sync_services
+    policy_profile: biome.sync_local.v1
+    role: biome_sync_service
+  - match: geod
+    phase: location_services
+    policy_profile: location.service_deny_by_default.v1
+    role: location_service
+  - match: appleaccountd
+    phase: account_identity_services
+    policy_profile: account.identity_deny_by_default.v1
+    role: account_identity_service
+  - match: iCloudNotificationAgent
+    phase: cloud_notification_services
+    policy_profile: cloud_notification_deny_by_default.v1
+    role: icloud_notification_agent
+  - match: maild
+    phase: mail_services
+    policy_profile: mail.service_local.v1
+    role: mail_service
+  - match: WorkflowKit
+    phase: workflow_background_shortcuts
+    policy_profile: workflow.background_shortcut.v1
+    role: background_shortcut_runner
+  - match: BackgroundShortcutRunner
+    phase: workflow_background_shortcuts
+    policy_profile: workflow.background_shortcut.v1
+    role: background_shortcut_runner
+  - match: naturallanguaged
+    phase: natural_language_processing
+    policy_profile: nlp.local_processing.v1
+    role: natural_language_daemon
+  - match: cfprefsd
+    phase: preferences_services
+    policy_profile: preferences.local_service.v1
+    role: preferences_daemon
+  - match: prngseedd
+    phase: secure_random_seed_services
+    policy_profile: security.random_seed_local.v1
+    role: random_seed_service
+  - match: seputil
+    phase: secure_enclave_utility
+    policy_profile: security.secure_enclave_local.v1
+    role: secure_enclave_utility
+  - match: MTLCompilerService
+    phase: metal_shader_compilation
+    policy_profile: gpu.shader_compile_local.v1
+    role: metal_compiler_service
+  - match: swcd
+    phase: shared_web_credentials
+    policy_profile: web_credentials_deny_by_default.v1
+    role: shared_web_credentials_daemon
diff --git a/policies/helper-receipts/service_taxonomy.v0.1.yaml b/policies/helper-receipts/service_taxonomy.v0.1.yaml
new file mode 100644
index 0000000..3a14ce9
--- /dev/null
+++ b/policies/helper-receipts/service_taxonomy.v0.1.yaml
@@ -0,0 +1,106 @@
+version: 0.1
+purpose: >
+  Map named helper/service lookups into SourceOS Helper Causal Receipt capability classes.
+  Unknown services remain mach_service.lookup until explicitly classified.
+default_capability: mach_service.lookup
+rules:
+  - match: pasteboard
+    capability: pasteboard.read
+    sensitivity: high
+    default_preview_decision: deny
+    rationale: Clipboard access is unrelated to passive preview or thumbnail generation.
+  - match: analyticsd
+    capability: analytics.emit
+    sensitivity: high
+    default_preview_decision: deny
+    rationale: Preview and cache helpers must not emit analytics by default.
+  - match: ecosystemanalyticsd
+    capability: analytics.emit
+    sensitivity: high
+    default_preview_decision: deny
+    rationale: Ecosystem analytics is telemetry-adjacent and must be separated from local security checks.
+  - match: tccd
+    capability: privacy.tcc.lookup
+    sensitivity: high
+    default_preview_decision: deny
+    rationale: Privacy/TCC lookups must be receipt-bearing.
+  - match: distributed_notifications
+    capability: notifications.distributed.lookup
+    sensitivity: medium
+    default_preview_decision: deny
+    rationale: Distributed notification lookups are optional UI/system probes in many helper contexts.
+  - match: webprivacyd
+    capability: privacy.web.lookup
+    sensitivity: high
+    default_preview_decision: deny
+    rationale: Web privacy service access is not required for local-only snapshot rendering.
+  - match: PowerManagement
+    capability: power.management.lookup
+    sensitivity: medium
+    default_preview_decision: deny
+    rationale: Passive preview helpers should not require power-management control.
+  - match: CARenderServer
+    capability: render.ca_server.lookup
+    sensitivity: medium
+    default_preview_decision: deny_or_mediated
+    rationale: Rendering helpers may need mediated graphics access, not broad compositor authority.
+  - match: windowserver
+    capability: windowserver.lookup
+    sensitivity: high
+    default_preview_decision: deny_or_mediated
+    rationale: Window-server access must be mediated and tied to visible UI intent.
+  - match: dock
+    capability: dock.lookup
+    sensitivity: low
+    default_preview_decision: deny_or_expected_denial
+    rationale: Dock/fullscreen probes are often UI compatibility probes but must remain classified.
+  - match: LaunchServices
+    capability: launchservices.lookup
+    sensitivity: medium
+    default_preview_decision: deny_or_brokered
+    rationale: LaunchServices can reveal app/file associations and should be brokered.
+  - match: coreservices
+    capability: coreservices.lookup
+    sensitivity: medium
+    default_preview_decision: deny_or_brokered
+    rationale: CoreServices lookups should be scoped to declared preview/file-picker purpose.
+  - match: FileProvider
+    capability: fileprovider.lookup
+    sensitivity: high
+    default_preview_decision: deny
+    rationale: FileProvider can imply cloud/sync surfaces; previews must not trigger sync implicitly.
+  - match: CloudTelemetry
+    capability: telemetry.cloud.lookup
+    sensitivity: high
+    default_preview_decision: deny
+    rationale: Cloud telemetry is not required for local-only helper work.
+  - match: XProtect
+    capability: security.xprotect.lookup
+    sensitivity: medium
+    default_preview_decision: allow_local_security
+    rationale: Local security checks may be allowed if receipt-bearing and separated from telemetry.
+  - match: MobileFileIntegrity
+    capability: security.code_integrity.lookup
+    sensitivity: medium
+    default_preview_decision: allow_local_security
+    rationale: Code integrity checks are acceptable when locally scoped and receipt-bearing.
+  - match: Keychain
+    capability: credentials.keychain.lookup
+    sensitivity: critical
+    default_preview_decision: deny
+    rationale: Preview helpers must not access credentials.
+  - match: securityd
+    capability: security.service.lookup
+    sensitivity: high
+    default_preview_decision: deny_or_brokered
+    rationale: Security service access requires explicit brokered purpose.
+  - match: apsd
+    capability: push_notifications.lookup
+    sensitivity: medium
+    default_preview_decision: deny
+    rationale: Push notification service is unrelated to local preview/cache cleanup.
+  - match: geod
+    capability: location.service.lookup
+    sensitivity: critical
+    default_preview_decision: deny
+    rationale: Location access is never needed for local preview.
diff --git a/schemas/helper-causal-receipts.schema.json b/schemas/helper-causal-receipts.schema.json
new file mode 100644
index 0000000..f82106b
--- /dev/null
+++ b/schemas/helper-causal-receipts.schema.json
@@ -0,0 +1,94 @@
+{
+  "$schema": "https://json-schema.org/draft/2020-12/schema",
+  "$id": "https://sourceos.local/schemas/helper-causal-receipts.schema.json",
+  "title": "Helper Causal Receipts v0.1",
+  "type": "object",
+  "required": ["schema", "event_type", "event_id", "timestamp", "root_intent_id"],
+  "properties": {
+    "schema": { "const": "sourceos.helper_causal_receipt.v0.1" },
+    "event_type": {
+      "enum": [
+        "root_intent.created",
+        "helper.spawn",
+        "capability.request",
+        "helper.exit",
+        "teardown.normalized",
+        "policy.decision",
+        "data.touch"
+      ]
+    },
+    "event_id": { "type": "string", "minLength": 8 },
+    "parent_event_id": { "type": ["string", "null"] },
+    "root_intent_id": { "type": "string", "pattern": "^intent\\." },
+    "timestamp": { "type": "string" },
+    "surface": { "type": ["string", "null"] },
+    "actor": { "type": ["object", "null"], "additionalProperties": true },
+    "declared_purpose": { "type": ["string", "null"] },
+    "data_scope": { "type": ["object", "array", "null"] },
+    "default_policy": { "type": ["string", "null"] },
+    "receipt_required": { "type": ["boolean", "null"] },
+    "parent_process": { "type": ["string", "null"] },
+    "child_process": { "type": ["string", "null"] },
+    "process": { "type": ["string", "null"] },
+    "pid": { "type": ["integer", "null"] },
+    "trigger": { "type": ["string", "null"] },
+    "spawn_reason": { "type": ["string", "null"] },
+    "policy_profile": { "type": ["string", "null"] },
+    "expected_lifetime_ms": { "type": ["integer", "null"], "minimum": 0 },
+    "capability_budget": { "type": ["object", "null"] },
+    "requestor": { "type": ["string", "null"] },
+    "capability": { "type": ["string", "null"] },
+    "requested_service": { "type": ["string", "null"] },
+    "decision": { "enum": ["allow", "deny", "degrade", "missing", "unknown", null] },
+    "classification": {
+      "enum": [
+        "expected_denial",
+        "unexpected_denial",
+        "compatibility_probe",
+        "policy_regression",
+        "malicious_probe_candidate",
+        "missing_service",
+        "teardown_race",
+        "duplicate_activation_coalesced",
+        "supervisor_worker_lifecycle_kill",
+        "unknown",
+        null
+      ]
+    },
+    "policy_rule": { "type": ["string", "null"] },
+    "data_accessed": { "type": ["boolean", "null"] },
+    "exit_status": { "type": ["string", "null"] },
+    "duration_ms": { "type": ["integer", "null"], "minimum": 0 },
+    "children_cleaned": { "type": ["boolean", "null"] },
+    "unexpected_denials": { "type": ["integer", "null"], "minimum": 0 },
+    "network_used": { "type": ["boolean", "null"] },
+    "receipt_complete": { "type": ["boolean", "null"] },
+    "raw_message": { "type": ["string", "null"] },
+    "normalized_class": { "type": ["string", "null"] },
+    "severity": { "enum": ["trace", "info", "notice", "warning", "error", "critical", null] },
+    "meaning": { "type": ["string", "null"] },
+    "policy_impact": { "type": ["string", "null"] },
+    "service_key": { "type": ["string", "null"] },
+    "service_uuid": { "type": ["string", "null"] },
+    "service_family_role": { "type": ["string", "null"] },
+    "phase": { "type": ["string", "null"] },
+    "line_no": { "type": ["integer", "null"] },
+    "tags": { "type": ["array", "null"], "items": { "type": "string" } },
+    "inference_confidence": { "enum": ["high", "medium", "low", "unknown", null] }
+  },
+  "allOf": [
+    {
+      "if": { "properties": { "event_type": { "const": "helper.spawn" } } },
+      "then": { "required": ["parent_process", "child_process", "trigger", "spawn_reason", "policy_profile"] }
+    },
+    {
+      "if": { "properties": { "event_type": { "const": "capability.request" } } },
+      "then": { "required": ["requestor", "capability", "requested_service", "decision", "classification", "data_accessed"] }
+    },
+    {
+      "if": { "properties": { "event_type": { "const": "helper.exit" } } },
+      "then": { "required": ["process", "exit_status", "duration_ms", "receipt_complete"] }
+    }
+  ],
+  "additionalProperties": true
+}
diff --git a/tests/fixtures/helper-receipts/preview_local_only_fail_network_allowed.json b/tests/fixtures/helper-receipts/preview_local_only_fail_network_allowed.json
new file mode 100644
index 0000000..b364b8d
--- /dev/null
+++ b/tests/fixtures/helper-receipts/preview_local_only_fail_network_allowed.json
@@ -0,0 +1,14 @@
+{
+  "schema": "sourceos.helper_causal_receipt.v0.1",
+  "event_type": "capability.request",
+  "event_id": "evt_fixture_preview_fail",
+  "timestamp": "2026-05-06T00:00:01Z",
+  "root_intent_id": "intent.preview.file.fixture",
+  "policy_profile": "preview.local_only.v1",
+  "requestor": "sourceos-pdf-renderer",
+  "capability": "network.egress",
+  "requested_service": "network",
+  "decision": "allow",
+  "classification": "policy_regression",
+  "data_accessed": true
+}
diff --git a/tests/fixtures/helper-receipts/preview_local_only_pass.json b/tests/fixtures/helper-receipts/preview_local_only_pass.json
new file mode 100644
index 0000000..865cbd2
--- /dev/null
+++ b/tests/fixtures/helper-receipts/preview_local_only_pass.json
@@ -0,0 +1,14 @@
+{
+  "schema": "sourceos.helper_causal_receipt.v0.1",
+  "event_type": "capability.request",
+  "event_id": "evt_fixture_preview_pass",
+  "timestamp": "2026-05-06T00:00:00Z",
+  "root_intent_id": "intent.preview.file.fixture",
+  "policy_profile": "preview.local_only.v1",
+  "requestor": "sourceos-pdf-renderer",
+  "capability": "network.egress",
+  "requested_service": "network",
+  "decision": "deny",
+  "classification": "expected_denial",
+  "data_accessed": false
+}
diff --git a/tests/fixtures/helper-receipts/web_thumbnail_pass_clipboard_denied.json b/tests/fixtures/helper-receipts/web_thumbnail_pass_clipboard_denied.json
new file mode 100644
index 0000000..97c0f14
--- /dev/null
+++ b/tests/fixtures/helper-receipts/web_thumbnail_pass_clipboard_denied.json
@@ -0,0 +1,14 @@
+{
+  "schema": "sourceos.helper_causal_receipt.v0.1",
+  "event_type": "capability.request",
+  "event_id": "evt_fixture_webthumb_pass",
+  "timestamp": "2026-05-06T00:00:02Z",
+  "root_intent_id": "intent.preview.web.fixture",
+  "policy_profile": "preview.web_thumbnail.local_only.v1",
+  "requestor": "sourceos-web-thumbnailer",
+  "capability": "pasteboard.read",
+  "requested_service": "pasteboard",
+  "decision": "deny",
+  "classification": "expected_denial",
+  "data_accessed": false
+}
diff --git a/tests/test_unified_log_helper_correlator.py b/tests/test_unified_log_helper_correlator.py
new file mode 100644
index 0000000..92c1893
--- /dev/null
+++ b/tests/test_unified_log_helper_correlator.py
@@ -0,0 +1,45 @@
+from tools.unified_log_helper_correlator import build_dag
+
+
+def test_build_dag_basic_lifecycle():
+    events = [
+        {
+            "event_id": "evt1",
+            "event_type": "helper.spawn",
+            "timestamp": "2026-05-05 20:00:00.000000",
+            "line_no": 1,
+            "root_intent_id": "intent.test",
+            "phase": "cache_cleanup",
+            "pid": 1,
+            "service_key": "svc",
+            "lifecycle_state": "will_spawn",
+        },
+        {
+            "event_id": "evt2",
+            "event_type": "helper.spawn",
+            "timestamp": "2026-05-05 20:00:00.100000",
+            "line_no": 2,
+            "root_intent_id": "intent.test",
+            "phase": "cache_cleanup",
+            "pid": 1,
+            "service_key": "svc",
+            "lifecycle_state": "xpcproxy_spawned",
+        },
+        {
+            "event_id": "evt3",
+            "event_type": "helper.exit",
+            "timestamp": "2026-05-05 20:00:01.000000",
+            "line_no": 3,
+            "root_intent_id": "intent.test",
+            "phase": "cache_cleanup",
+            "pid": 1,
+            "service_key": "svc",
+            "lifecycle_state": "exited",
+        },
+    ]
+
+    dag = build_dag(events)
+    assert dag["summary"]["event_count"] == 3
+    assert dag["summary"]["edge_type_counts"]["contains_phase"] == 1
+    assert dag["summary"]["edge_type_counts"]["same_pid_next_event"] >= 2
+    assert dag["summary"]["edge_type_counts"]["lifecycle_progression"] >= 2
diff --git a/tests/test_unified_log_helper_parser.py b/tests/test_unified_log_helper_parser.py
new file mode 100644
index 0000000..74c79bc
--- /dev/null
+++ b/tests/test_unified_log_helper_parser.py
@@ -0,0 +1,29 @@
+from tools.unified_log_helper_parser import parse_text
+
+
+def test_parse_mdworker_spawn():
+    text = "2026-05-05 20:40:14.174009 (user/501/com.apple.mdworker.shared.08000000-0600-0000-0000-000000000000) : internal event: WILL_SPAWN, code = 0"
+    events = parse_text(text)
+    assert len(events) == 1
+    assert events[0]["event_type"] == "helper.spawn"
+    assert events[0]["phase"] == "spotlight_metadata_indexing"
+    assert events[0]["policy_profile"] == "indexer.metadata_local.v1"
+
+
+def test_parse_denied_pasteboard():
+    text = "2026-05-05 20:42:00.415146 (gui/501 [100023]) : denied lookup: name = com.apple.pasteboard.1, requestor = WebThumbnailExt[53926], error = 159: Sandbox restriction"
+    events = parse_text(text)
+    assert len(events) == 1
+    assert events[0]["event_type"] == "capability.request"
+    assert events[0]["capability"] == "pasteboard.read"
+    assert events[0]["decision"] == "deny"
+    assert events[0]["data_accessed"] is False
+    assert events[0]["phase"] == "web_thumbnail_or_webkit_helper"
+
+
+def test_parse_teardown_without_context():
+    text = "2026-05-05 20:42:03.378197 : invalid client reply port -1"
+    events = parse_text(text)
+    assert len(events) == 1
+    assert events[0]["event_type"] == "teardown.normalized"
+    assert events[0]["normalized_class"] == "invalid_reply_endpoint_after_teardown"
diff --git a/tools/check_helper_receipts.py b/tools/check_helper_receipts.py
new file mode 100644
index 0000000..ca23b15
--- /dev/null
+++ b/tools/check_helper_receipts.py
@@ -0,0 +1,102 @@
+#!/usr/bin/env python3
+"""Minimal CI policy gate for SourceOS Helper Causal Receipts.
+
+This intentionally checks high-value invariants first:
+- every event has a root intent
+- helper spawns declare parent/process/trigger/reason/policy
+- capability requests declare decision/classification/data_accessed
+- local-only profiles do not allow network, DNS, analytics, pasteboard, or account lookup
+"""
+
+from __future__ import annotations
+
+from pathlib import Path
+import argparse
+import json
+import sys
+from typing import Iterable
+
+LOCAL_ONLY_PROFILES = {
+    "preview.local_only.v1",
+    "preview.web_thumbnail.local_only.v1",
+    "terminal.preview.local_only.v1",
+    "cache_cleanup.local_only.v1",
+    "file_picker.native_ui.v1",
+}
+
+DENY_IN_LOCAL_ONLY = {
+    "network.egress",
+    "dns.lookup",
+    "analytics.emit",
+    "pasteboard.read",
+    "pasteboard.write",
+    "account.lookup",
+    "credentials.keychain.lookup",
+}
+
+
+def iter_events(path: Path) -> Iterable[dict]:
+    if path.suffix == ".jsonl":
+        for line in path.read_text(errors="replace").splitlines():
+            if line.strip():
+                yield json.loads(line)
+    else:
+        data = json.loads(path.read_text(errors="replace"))
+        if isinstance(data, list):
+            yield from data
+        else:
+            yield data
+
+
+def check_event(ev: dict) -> list[str]:
+    errors: list[str] = []
+
+    if not str(ev.get("root_intent_id", "")).startswith("intent."):
+        errors.append("root_intent_id must start with 'intent.'.")
+
+    event_type = ev.get("event_type")
+
+    if event_type == "helper.spawn":
+        for field in ["parent_process", "child_process", "trigger", "spawn_reason", "policy_profile"]:
+            if field not in ev or ev.get(field) in (None, ""):
+                errors.append(f"helper.spawn missing {field}")
+
+    if event_type == "capability.request":
+        for field in ["requestor", "capability", "requested_service", "decision", "classification", "data_accessed"]:
+            if field not in ev:
+                errors.append(f"capability.request missing {field}")
+
+        if ev.get("policy_profile") in LOCAL_ONLY_PROFILES:
+            if ev.get("capability") in DENY_IN_LOCAL_ONLY and ev.get("decision") == "allow":
+                errors.append(
+                    f"local-only policy {ev.get('policy_profile')} allowed forbidden capability {ev.get('capability')}"
+                )
+
+    if event_type == "helper.exit":
+        for field in ["process", "exit_status", "duration_ms", "receipt_complete"]:
+            if field not in ev:
+                errors.append(f"helper.exit missing {field}")
+
+    return errors
+
+
+def main() -> int:
+    parser = argparse.ArgumentParser()
+    parser.add_argument("paths", nargs="+", type=Path)
+    args = parser.parse_args()
+
+    failures: list[dict] = []
+    checked = 0
+
+    for path in args.paths:
+        for index, event in enumerate(iter_events(path)):
+            checked += 1
+            for error in check_event(event):
+                failures.append({"path": str(path), "index": index, "error": error})
+
+    print(json.dumps({"checked": checked, "errors": failures}, indent=2, sort_keys=True))
+    return 1 if failures else 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
diff --git a/tools/unified_log_helper_correlator.py b/tools/unified_log_helper_correlator.py
new file mode 100644
index 0000000..86d7482
--- /dev/null
+++ b/tools/unified_log_helper_correlator.py
@@ -0,0 +1,201 @@
+#!/usr/bin/env python3
+"""Helper Causal Receipt DAG correlator.
+ +Reads JSONL emitted by `tools/unified_log_helper_parser.py` and builds a conservative +DAG. The graph is intentionally observational: + +- phase containment edges group events by inferred subsystem family +- same-PID edges preserve chronological process-local continuity +- same-service edges preserve service-key continuity +- lifecycle edges connect spawn/init/exit progressions +- nearby capability edges connect service lookups to recent same-PID helper lifecycle events + +The correlator does not claim private OS intent; it reconstructs visible evidence +for user-facing explanation and policy review. +""" + +from __future__ import annotations + +from pathlib import Path +import argparse +from collections import Counter, defaultdict +from datetime import datetime +import json +from typing import Any + + +def ts_float(timestamp: str) -> float: + for fmt in ("%Y-%m-%d %H:%M:%S.%f", "%Y-%m-%dT%H:%M:%SZ"): + try: + return datetime.strptime(timestamp, fmt).timestamp() + except ValueError: + continue + return 0.0 + + +def load_events(path: Path) -> list[dict[str, Any]]: + events: list[dict[str, Any]] = [] + for line in path.read_text(errors="replace").splitlines(): + if line.strip(): + events.append(json.loads(line)) + return events + + +def build_dag(events: list[dict[str, Any]]) -> dict[str, Any]: + events = sorted(events, key=lambda ev: (ts_float(ev.get("timestamp", "")), ev.get("line_no", 0))) + + nodes: dict[str, dict[str, Any]] = {} + edges: list[dict[str, Any]] = [] + + root_id = "root:intent.imported_log.analysis.synthetic" + nodes[root_id] = { + "id": root_id, + "type": "root_intent", + "label": "Imported unified-log helper receipt analysis", + } + + for event in events: + phase = event.get("phase", "unknown_or_general_launchd_churn") + phase_id = f"phase:{phase}" + if phase_id not in nodes: + nodes[phase_id] = { + "id": phase_id, + "type": "phase", + "label": phase.replace("_", " "), + "phase": phase, + } + edges.append({"source": root_id, "target": phase_id, 
"type": "contains_phase"}) + + event_id = event["event_id"] + nodes[event_id] = { + "id": event_id, + "type": event.get("event_type"), + "timestamp": event.get("timestamp"), + "line_no": event.get("line_no"), + "label": event.get("lifecycle_state") or event.get("capability") or event.get("normalized_class") or event.get("event_type"), + "pid": event.get("pid"), + "service_key": event.get("service_key"), + "service_uuid": event.get("service_uuid"), + "phase": phase, + "policy_profile": event.get("policy_profile"), + "service_family_role": event.get("service_family_role"), + "capability": event.get("capability"), + "requested_service": event.get("requested_service"), + "decision": event.get("decision"), + "classification": event.get("classification"), + "lifecycle_state": event.get("lifecycle_state"), + "exit_status": event.get("exit_status"), + "duration_ms": event.get("duration_ms"), + "severity": event.get("severity"), + "inference_confidence": event.get("inference_confidence"), + } + edges.append({"source": phase_id, "target": event_id, "type": "contains_event"}) + + by_pid: dict[int, list[dict[str, Any]]] = defaultdict(list) + by_service: dict[str, list[dict[str, Any]]] = defaultdict(list) + + for event in events: + if event.get("pid") is not None: + by_pid[int(event["pid"])].append(event) + if event.get("service_key"): + by_service[str(event["service_key"])].append(event) + + for pid, pid_events in by_pid.items(): + pid_events = sorted(pid_events, key=lambda ev: (ts_float(ev.get("timestamp", "")), ev.get("line_no", 0))) + for previous, current in zip(pid_events, pid_events[1:]): + edges.append({ + "source": previous["event_id"], + "target": current["event_id"], + "type": "same_pid_next_event", + "pid": pid, + }) + + lifecycle_order = { + "will_spawn": 1, + "xpcproxy_spawned": 2, + "source_attach": 3, + "running_or_init": 4, + "exited": 5, + } + + for service_key, service_events in by_service.items(): + service_events = sorted(service_events, key=lambda ev: 
(ts_float(ev.get("timestamp", "")), ev.get("line_no", 0))) + + for previous, current in zip(service_events, service_events[1:]): + edges.append({ + "source": previous["event_id"], + "target": current["event_id"], + "type": "same_service_next_event", + "service_key": service_key, + }) + + lifecycle_events = [event for event in service_events if event.get("lifecycle_state")] + for previous, current in zip(lifecycle_events, lifecycle_events[1:]): + previous_order = lifecycle_order.get(previous.get("lifecycle_state"), 99) + current_order = lifecycle_order.get(current.get("lifecycle_state"), 99) + edge_type = "lifecycle_progression" if current_order >= previous_order else "lifecycle_loop_or_reactivation" + edges.append({ + "source": previous["event_id"], + "target": current["event_id"], + "type": edge_type, + "service_key": service_key, + }) + + prior_lifecycle_by_pid: dict[int, list[dict[str, Any]]] = defaultdict(list) + for event in events: + if event.get("event_type") in {"helper.spawn", "helper.exit"} and event.get("pid") is not None: + prior_lifecycle_by_pid[int(event["pid"])].append(event) + + for event in events: + if event.get("event_type") != "capability.request" or event.get("pid") is None: + continue + candidates: list[tuple[float, dict[str, Any]]] = [] + for candidate in prior_lifecycle_by_pid.get(int(event["pid"]), []): + delta = ts_float(event.get("timestamp", "")) - ts_float(candidate.get("timestamp", "")) + if 0 <= delta <= 10: + candidates.append((delta, candidate)) + if candidates: + _, nearest = sorted(candidates, key=lambda item: item[0])[0] + edges.append({ + "source": nearest["event_id"], + "target": event["event_id"], + "type": "nearby_pid_capability_request", + "pid": event.get("pid"), + }) + + summary = { + "event_count": len(events), + "node_count": len(nodes), + "edge_count": len(edges), + "event_type_counts": dict(Counter(event.get("event_type") for event in events)), + "phase_counts": dict(Counter(event.get("phase", 
"unknown_or_general_launchd_churn") for event in events)), + "policy_profile_counts": dict(Counter(event.get("policy_profile", "unknown") for event in events)), + "service_family_role_counts": dict(Counter(event.get("service_family_role", "unknown") for event in events)), + "capability_counts": dict(Counter(event.get("capability") for event in events if event.get("capability"))), + "lifecycle_state_counts": dict(Counter(event.get("lifecycle_state") for event in events if event.get("lifecycle_state"))), + "edge_type_counts": dict(Counter(edge["type"] for edge in edges)), + } + + return { + "schema": "sourceos.helper_causal_receipt_dag.v0.1", + "root_intent_id": "intent.imported_log.analysis.synthetic", + "summary": summary, + "nodes": list(nodes.values()), + "edges": edges, + } + + +def main() -> int: + parser = argparse.ArgumentParser() + parser.add_argument("jsonl", type=Path) + parser.add_argument("--out", type=Path, required=True) + args = parser.parse_args() + + dag = build_dag(load_events(args.jsonl)) + args.out.write_text(json.dumps(dag, indent=2, sort_keys=True)) + print(json.dumps(dag["summary"], indent=2, sort_keys=True)) + return 0 + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/tools/unified_log_helper_parser.py b/tools/unified_log_helper_parser.py new file mode 100644 index 0000000..e9b0fee --- /dev/null +++ b/tools/unified_log_helper_parser.py @@ -0,0 +1,211 @@ +#!/usr/bin/env python3 +"""Unified-log-style helper receipt parser. + +Converts macOS unified-log style text into conservative Helper Causal Receipt +candidate events. This tool is intentionally observational: it reconstructs +visible lifecycle/capability patterns from text and does not claim private OS +subsystem ground truth. 
+""" + +from __future__ import annotations + +from pathlib import Path +import argparse +import json +import re +import uuid +from typing import Optional + +LINE_RE = re.compile( + r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\s+" + r"(?:\((?P<context>[^)]*)\)\s+)?" + r"<(?P<severity>[^>]+)>:\s+" + r"(?P<msg>.*)$" +) +PID_RE = re.compile(r"(?:pid/|\[)(?P<pid>\d+)") +SERVICE_NAME_RE = re.compile(r"name = (?P<name>[^,\s]+)") +REQUESTOR_RE = re.compile(r"requestor = (?P<requestor>[^,\s]+)") +XPCPROXY_RE = re.compile(r"xpcproxy spawned with pid (?P<pid>\d+)") +EXIT_DURATION_RE = re.compile(r"ran for (?P<ms>\d+)ms") +UUIDISH_RE = re.compile(r"([A-F0-9]{8}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{12})", re.I) + +SERVICE_FAMILY_RULES = [ + ("mdworker", "spotlight_metadata_indexing", "indexer.metadata_local.v1", "metadata_index_worker"), + ("CacheDelete", "cache_cleanup", "cache_cleanup.local_only.v1", "cache_cleanup"), + ("CacheExtension", "cache_cleanup", "cache_cleanup.local_only.v1", "cache_cleanup"), + ("WebKit", "web_thumbnail_or_webkit_helper", "preview.web_thumbnail.local_only.v1", "web_runtime_helper"), + ("WebThumbnail", "web_thumbnail_or_webkit_helper", "preview.web_thumbnail.local_only.v1", "web_thumbnail_helper"), + ("QuickLook", "quicklook_preview_rendering", "preview.local_only.v1", "preview_ui_or_thumbnail"), + ("CGPDFService", "quicklook_preview_rendering", "preview.local_only.v1", "pdf_rendering_helper"), + ("ImageIOXPCService", "quicklook_preview_rendering", "preview.local_only.v1", "image_decode_helper"), + ("ThumbnailExtension", "quicklook_preview_rendering", "preview.local_only.v1", "thumbnail_helper"), + ("screencapture", "screenshot_capture", "screenshot.capture_receipt.v1", "screenshot_capture_ui"), + ("openAndSavePanelService", "native_file_picker", "file_picker.native_ui.v1", "native_file_picker"), + ("AXVisualSupportAgent", "accessibility_ui_support", "accessibility.ui_support.v1", "accessibility_visual_support"), + ("com.apple.accessibility", "accessibility_ui_support",
"accessibility.ui_support.v1", "accessibility_service"), + ("com.apple.filesystems.netfs", "filesystem_netfs_plugin", "filesystem.netfs_plugin.v1", "network_filesystem_plugin"), + ("PlugInLibraryService", "filesystem_netfs_plugin", "filesystem.netfs_plugin.v1", "plugin_library_service"), + ("iconservices", "iconservices_rendering", "iconservices.rendering_local.v1", "icon_rendering_service"), + ("AudioComponentRegistrar", "audio_component_discovery", "audio.component_scan_local.v1", "audio_component_registry"), + ("CarbonComponentScanner", "audio_component_discovery", "audio.component_scan_local.v1", "audio_component_scanner"), + ("trustd", "trust_and_certificate_services", "trust.security_local.v1", "certificate_trust_service"), + ("secinitd", "trust_and_certificate_services", "trust.security_local.v1", "security_initialization"), + ("amfid", "security_integrity_sidecar", "security.scan_local.v1", "code_integrity"), + ("XProtect", "security_integrity_sidecar", "security.scan_local.v1", "malware_protection"), + ("CloudTelemetry", "telemetry_sidecar", "telemetry.local_metric.v1", "telemetry_service"), + ("ecosystemanalytics", "telemetry_sidecar", "telemetry.local_metric.v1", "ecosystem_analytics"), + ("geod", "location_services", "location.service_deny_by_default.v1", "location_service"), + ("appleaccountd", "account_identity_services", "account.identity_deny_by_default.v1", "account_identity_service"), + ("iCloudNotificationAgent", "cloud_notification_services", "cloud_notification_deny_by_default.v1", "icloud_notification_agent"), + ("WorkflowKit", "workflow_background_shortcuts", "workflow.background_shortcut.v1", "background_shortcut_runner"), + ("MTLCompilerService", "metal_shader_compilation", "gpu.shader_compile_local.v1", "metal_compiler_service"), + ("swcd", "shared_web_credentials", "web_credentials_deny_by_default.v1", "shared_web_credentials_daemon"), +] + +CAPABILITY_RULES = [ + ("pasteboard", "pasteboard.read", "high"), + ("analyticsd", 
"analytics.emit", "high"), + ("ecosystemanalyticsd", "analytics.emit", "high"), + ("tccd", "privacy.tcc.lookup", "high"), + ("distributed_notifications", "notifications.distributed.lookup", "medium"), + ("webprivacyd", "privacy.web.lookup", "high"), + ("PowerManagement", "power.management.lookup", "medium"), + ("CARenderServer", "render.ca_server.lookup", "medium"), + ("windowserver", "windowserver.lookup", "high"), + ("dock", "dock.lookup", "low"), + ("LaunchServices", "launchservices.lookup", "medium"), + ("coreservices", "coreservices.lookup", "medium"), + ("FileProvider", "fileprovider.lookup", "high"), + ("CloudTelemetry", "telemetry.cloud.lookup", "high"), + ("XProtect", "security.xprotect.lookup", "medium"), + ("MobileFileIntegrity", "security.code_integrity.lookup", "medium"), + ("Keychain", "credentials.keychain.lookup", "critical"), + ("securityd", "security.service.lookup", "high"), + ("apsd", "push_notifications.lookup", "medium"), + ("geod", "location.service.lookup", "critical"), +] + + +def new_event_id() -> str: + return "evt_" + uuid.uuid4().hex + + +def service_key(context: str) -> str: + return re.sub(r"\s+\[\d+\]$", "", context or "global") + + +def family_for(text: str) -> tuple[str, str, str]: + low = text.lower() + for needle, phase, policy, role in SERVICE_FAMILY_RULES: + if needle.lower() in low: + return phase, policy, role + return "unknown_or_general_launchd_churn", "unknown", "unknown" + + +def capability_for(service_name: str) -> tuple[str, str]: + low = service_name.lower() + for needle, cap, sensitivity in CAPABILITY_RULES: + if needle.lower() in low: + return cap, sensitivity + return "mach_service.lookup", "unknown" + + +def base_event(ts: str, context: str, msg: str, severity: str, line_no: int) -> dict: + pid_match = PID_RE.search(context or "") or PID_RE.search(msg) + pid = int(pid_match.group("pid")) if pid_match else None + uuid_match = UUIDISH_RE.search((context or "") + " " + msg) + phase, policy_profile, role = 
family_for((context or "") + " " + msg) + return { + "schema": "sourceos.helper_causal_receipt.v0.1", + "event_id": new_event_id(), + "timestamp": ts, + "root_intent_id": "intent.imported_log.analysis.synthetic", + "raw_message": msg, + "pid": pid, + "line_no": line_no, + "service_key": service_key(context or "global"), + "service_uuid": uuid_match.group(1) if uuid_match else None, + "phase": phase, + "policy_profile": policy_profile, + "service_family_role": role, + "source_severity": severity, + } + + +def classify(parsed: dict, line_no: int) -> Optional[dict]: + context = parsed.get("context") or "global" + msg = parsed["msg"] + base = base_event(parsed["ts"], context, msg, parsed["severity"], line_no) + child = base["service_key"].split("/")[-1].split(" [")[0] + + if "internal event: WILL_SPAWN" in msg: + return {**base, "event_type": "helper.spawn", "lifecycle_state": "will_spawn", "parent_event_id": None, "parent_process": context, "child_process": child, "trigger": "ipc_or_launchd", "spawn_reason": "Demand-spawned helper candidate observed in unified log", "classification": "unknown", "inference_confidence": "medium", "severity": "trace"} + + xpc = XPCPROXY_RE.search(msg) + if xpc: + return {**base, "event_type": "helper.spawn", "lifecycle_state": "xpcproxy_spawned", "parent_event_id": None, "parent_process": context, "child_process": child, "pid": int(xpc.group("pid")), "trigger": "xpcproxy", "spawn_reason": "xpcproxy materialized helper process", "classification": "unknown", "inference_confidence": "high", "severity": "trace"} + + if "SOURCE_ATTACH" in msg: + return {**base, "event_type": "helper.spawn", "lifecycle_state": "source_attach", "parent_event_id": None, "parent_process": context, "child_process": child, "trigger": "source_attach", "spawn_reason": "IPC/event source attached to helper", "classification": "unknown", "inference_confidence": "medium", "severity": "trace"} + + if "service state: running" in msg or "internal event: INIT" in msg or "job 
state = running" in msg: + return {**base, "event_type": "helper.spawn", "lifecycle_state": "running_or_init", "parent_event_id": None, "parent_process": context, "child_process": child, "trigger": "init", "spawn_reason": "Helper entered running/initialized state", "classification": "unknown", "inference_confidence": "medium", "severity": "trace"} + + if "denied lookup" in msg or "failed lookup" in msg: + service_match = SERVICE_NAME_RE.search(msg) + requestor_match = REQUESTOR_RE.search(msg) + service = service_match.group("name") if service_match else "unknown" + capability, sensitivity = capability_for(service) + denied = "denied lookup" in msg + return {**base, "event_type": "capability.request", "requestor": requestor_match.group("requestor") if requestor_match else context, "capability": capability, "capability_sensitivity": sensitivity, "requested_service": service, "decision": "deny" if denied else "missing", "classification": "expected_denial" if denied else "missing_service", "policy_rule": f"{base['policy_profile']}.default_deny" if denied else None, "data_accessed": False, "inference_confidence": "high" if denied else "medium", "severity": "notice"} + + teardown_rules = [ + ("Operation already in progress", "duplicate_activation_coalesced", "Duplicate activation request while helper already active"), + ("no client port found", "client_endpoint_missing_after_teardown", "Client disappeared before service reply completed"), + ("invalid client reply port", "invalid_reply_endpoint_after_teardown", "Reply endpoint invalid during cleanup"), + ("job not found", "service_removed_before_reply", "Service exited before late lookup/reply completed"), + ("ENOSERVICE", "service_removed_before_reply", "Service exited before late lookup/reply completed"), + ] + for needle, normalized_class, meaning in teardown_rules: + if needle in msg: + classification = "teardown_race" if "reply" in normalized_class or "teardown" in normalized_class else 
"duplicate_activation_coalesced" + return {**base, "event_type": "teardown.normalized", "normalized_class": normalized_class, "meaning": meaning, "receipt_complete": True, "policy_impact": "none", "classification": classification, "inference_confidence": "medium", "severity": "notice"} + + if "exited due to" in msg: + duration_match = EXIT_DURATION_RE.search(msg) + status = "clean" if "exit(0)" in msg else "supervisor_kill" if "SIGKILL" in msg else "unknown_exit" + classification = "supervisor_worker_lifecycle_kill" if "SIGKILL" in msg and ("mds[" in msg or "launchd" in msg) else "unknown" + return {**base, "event_type": "helper.exit", "lifecycle_state": "exited", "process": context, "exit_status": status, "duration_ms": int(duration_match.group("ms")) if duration_match else 0, "children_cleaned": None, "unexpected_denials": None, "network_used": None, "receipt_complete": True, "classification": classification, "inference_confidence": "high" if duration_match else "medium", "severity": "trace" if status == "clean" else "notice"} + + return None + + +def parse_text(text: str) -> list[dict]: + events = [] + for line_no, line in enumerate(text.splitlines(), 1): + match = LINE_RE.match(line) + if not match: + continue + event = classify(match.groupdict(), line_no) + if event: + events.append(event) + return events + + +def main() -> int: + parser = argparse.ArgumentParser() + parser.add_argument("input", type=Path) + parser.add_argument("--jsonl", type=Path, required=True) + args = parser.parse_args() + + events = parse_text(args.input.read_text(errors="replace")) + with args.jsonl.open("w") as handle: + for event in events: + handle.write(json.dumps(event, sort_keys=True) + "\n") + + print(json.dumps({"events": len(events)}, indent=2, sort_keys=True)) + return 0 + + +if __name__ == "__main__": + raise SystemExit(main())
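
The line grammar that `parse_text` depends on can be sketched in isolation. This is a minimal illustration, not the tool itself: the group names `ts`, `context`, `severity`, and `msg` are taken from the keys that `classify` reads, while the `split_line` helper, the `LINE_SKETCH_RE` name, and the sample line (including its `gui/501/com.example.helper` context and `Notice` severity) are hypothetical.

```python
import re
from typing import Optional

# Assumed shape of one log line: timestamp, optional "(context)", "<severity>:", message.
# The group names mirror the dictionary keys that classify() looks up.
LINE_SKETCH_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?:\((?P<context>[^)]*)\)\s+)?"
    r"<(?P<severity>[^>]+)>:\s+"
    r"(?P<msg>.*)$"
)


def split_line(line: str) -> Optional[dict]:
    """Return the named fields of one line, or None if the line does not match."""
    match = LINE_SKETCH_RE.match(line)
    return match.groupdict() if match else None


# Hypothetical sample line, invented for illustration only.
SAMPLE = (
    "2026-05-05 20:42:03.378197 (gui/501/com.example.helper [4242]) "
    "<Notice>: invalid client reply port -1"
)

fields = split_line(SAMPLE)
print(fields)
```

Lines that do not fit the grammar return `None`, which matches how `parse_text` skips unmatched input instead of raising.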