Event Aggregation Pipeline for Semantic Processing #4

@rian-be

Problem

With the introduction of the SemanticEvent layer, ChangeTrace will start producing higher-level repository events derived from raw TraceEvent data.

Right now, however, there is no single place responsible for running aggregators and producing those semantic events. As a result:

  • aggregator execution order is implicit
  • adding new aggregators can become fragile
  • event processing logic risks being scattered across different parts of the codebase

As the system grows (visualization, analytics, scene generation), the aggregation step should be explicit and centralized.


Goal

Introduce a Semantic Event Aggregation Pipeline responsible for converting the raw event stream into semantic events:

TraceEvent stream
      │
      ▼
EventAggregationEngine
      │
      ▼
SemanticEvent stream

The pipeline will orchestrate all IEventAggregator implementations and produce the semantic event stream consumed by rendering and analytics.


Proposed Architecture

IEventAggregator

Aggregators interpret low-level events and optionally emit semantic ones.

TraceEvent → SemanticEvent*

* means an aggregator may emit:

  • no events
  • one event
  • multiple events
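The zero/one/many contract above could be sketched roughly as follows. This is only an illustration in Python; the project's actual language and signatures may differ. The `process` method name, the `kind`/`payload` and `name`/`detail` fields, the behaviour given to MergeAggregator, and the FileTouchAggregator class are all assumptions made for the example, not part of the proposal.

```python
from dataclasses import dataclass
from typing import Iterable, Protocol


@dataclass(frozen=True)
class TraceEvent:
    kind: str      # hypothetical field, e.g. "commit" or "merge"
    payload: str   # hypothetical field


@dataclass(frozen=True)
class SemanticEvent:
    name: str      # hypothetical field
    detail: str    # hypothetical field


class IEventAggregator(Protocol):
    def process(self, event: TraceEvent) -> Iterable[SemanticEvent]:
        """Return zero, one, or many semantic events for one trace event."""
        ...


class MergeAggregator:
    """Assumed behaviour: one event for a merge, nothing otherwise."""

    def process(self, event: TraceEvent) -> Iterable[SemanticEvent]:
        if event.kind != "merge":
            return []  # no events
        return [SemanticEvent("BranchMerged", event.payload)]  # one event


class FileTouchAggregator:
    """Hypothetical aggregator: one event per file path listed in a commit."""

    def process(self, event: TraceEvent) -> Iterable[SemanticEvent]:
        if event.kind != "commit":
            return []
        paths = [p for p in event.payload.split(",") if p]
        return [SemanticEvent("FileTouched", p) for p in paths]  # multiple events
```

Returning an iterable in every case keeps the caller free of special handling for "nothing to emit".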

Example aggregators:

  • CommitBundlingAggregator
  • MergeAggregator
  • HotspotAggregator (future)
  • ActorActivityAggregator (future)

EventAggregationEngine

A dedicated component responsible for running aggregators and producing the final semantic event stream.

Responsibilities:

  • receive the TraceEvent stream
  • execute aggregators in a deterministic order
  • emit resulting SemanticEvent instances
  • expose a clean semantic stream to downstream systems

Example flow:

TraceEvent Stream
      │
      ▼
CommitBundlingAggregator
      │
      ▼
MergeAggregator
      │
      ▼
SemanticEvent Stream
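One possible reading of this flow, as a minimal Python sketch: the engine offers every TraceEvent to each aggregator in a fixed order and forwards whatever they emit. The diagram could also be read as aggregators chained into one another; the sketch assumes the simpler fan-out reading. The `run` and `process` names, the event fields, and the aggregator bodies are placeholders invented for the example.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Sequence


@dataclass(frozen=True)
class TraceEvent:
    kind: str      # hypothetical field
    payload: str   # hypothetical field


@dataclass(frozen=True)
class SemanticEvent:
    name: str      # hypothetical field
    detail: str    # hypothetical field


class CommitBundlingAggregator:
    # Placeholder logic: the real bundling rules are not specified here.
    def process(self, event: TraceEvent) -> Iterable[SemanticEvent]:
        if event.kind == "commit":
            return [SemanticEvent("CommitBundled", event.payload)]
        return []


class MergeAggregator:
    # Placeholder logic, assumed for illustration only.
    def process(self, event: TraceEvent) -> Iterable[SemanticEvent]:
        if event.kind == "merge":
            return [SemanticEvent("BranchMerged", event.payload)]
        return []


class EventAggregationEngine:
    def __init__(self, aggregators: Sequence) -> None:
        # Copying into a tuple freezes the order supplied by the caller.
        self._aggregators = tuple(aggregators)

    def run(self, events: Iterable[TraceEvent]) -> Iterator[SemanticEvent]:
        # For each trace event, ask every aggregator in the fixed order.
        for event in events:
            for aggregator in self._aggregators:
                yield from aggregator.process(event)


engine = EventAggregationEngine([CommitBundlingAggregator(), MergeAggregator()])
trace = [TraceEvent("commit", "abc123"), TraceEvent("merge", "feature-x")]
semantic = list(engine.run(trace))
```

Downstream code only ever sees the iterator returned by `run`, which matches the responsibility of exposing a clean semantic stream.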

Design Principles

Deterministic Processing

Aggregator execution order should always be predictable so results remain reproducible.
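One way to make the order predictable, sketched in Python under assumptions: each aggregator is registered with an explicit order number, and ties are broken by name, so the result does not depend on registration or discovery order. The `Registration` type and its `order` field are hypothetical; the proposal does not prescribe a mechanism.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Registration:
    order: int  # hypothetical explicit priority; lower runs first
    name: str   # tie-breaker so equal priorities always sort the same way


def execution_order(registrations: Iterable[Registration]) -> List[str]:
    """Return aggregator names in a sequence independent of input order."""
    ranked = sorted(registrations, key=lambda r: (r.order, r.name))
    return [r.name for r in ranked]


forward = [
    Registration(20, "MergeAggregator"),
    Registration(10, "CommitBundlingAggregator"),
]
backward = list(reversed(forward))
```

Both `execution_order(forward)` and `execution_order(backward)` give the same sequence, which is the property needed for reproducible output.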


Streaming Friendly

The pipeline should work on event streams, avoiding large in-memory collections whenever possible.
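A minimal Python sketch of what streaming-friendly aggregation might look like: a generator that keeps only the bundle currently being built and emits it as soon as it is complete. The bundling rule used here (consecutive commits by the same author) and the `(author, sha)` tuple shape are assumptions chosen to keep the example short.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def bundle_commits(
    events: Iterable[Tuple[str, str]],
) -> Iterator[Tuple[str, List[str]]]:
    """Group consecutive (author, sha) pairs by author, one bundle at a time."""
    current_author: Optional[str] = None
    shas: List[str] = []
    for author, sha in events:
        if author != current_author and shas:
            yield (current_author, shas)  # emit the finished bundle
            shas = []                     # start a fresh list for the next one
        current_author = author
        shas.append(sha)
    if shas:
        yield (current_author, shas)      # flush the last bundle at end of stream


def trace_source() -> Iterator[Tuple[str, str]]:
    # A generator stands in for a real event source; nothing is loaded up front.
    yield ("alice", "a1")
    yield ("alice", "a2")
    yield ("bob", "b1")


bundles = bundle_commits(trace_source())
first = next(bundles)  # available before the source is exhausted
rest = list(bundles)
```

Memory use is bounded by the largest single bundle rather than by the length of the repository history.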


Clear Responsibility Boundaries

Each layer has a specific role:

  • TraceEvent → raw repository history
  • SemanticEvent → interpreted repository activity
  • EventAggregationEngine → transformation between the two

Rendering and analytics should consume only SemanticEvent.


Benefits

  • centralized event processing
  • predictable semantic event generation
  • easier addition of new analytics features
  • clearer separation between reconstruction and interpretation
  • better long-term maintainability

Proposed Tasks

  • Introduce EventAggregationEngine
  • Define the IEventAggregator contract
  • Implement the aggregation pipeline
  • Ensure deterministic aggregator ordering
  • Integrate the pipeline with the event processing flow
  • Document the architecture
