Problem Statement
Two related defects affect the Content Analytics pipeline in environments where ClickHouse replication is enabled:
1 — session_facts data is wiped on every refresh
The session_facts_rmv Refreshable Materialized View is configured with REWRITE mode, which truncates and rewrites the session_facts target table on every scheduled refresh cycle. This destroys all historical session data, making it impossible to retain engagement history beyond the most recent refresh window. The expected behavior is that session records accumulate over time without being purged.
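The difference between the two refresh behaviors can be illustrated with a toy in-memory model. This is only a sketch of the semantics described above, not ClickHouse code: `TargetTable`, `refresh`, and the sample windows are hypothetical names and data introduced for illustration. In ClickHouse itself, the accumulating behavior corresponds to the APPEND keyword in the refreshable materialized view definition.

```python
from dataclasses import dataclass, field


@dataclass
class TargetTable:
    """Toy stand-in for the session_facts target table (not ClickHouse)."""
    rows: list = field(default_factory=list)


def refresh(target: TargetTable, new_rows: list, append: bool) -> None:
    """Apply one scheduled refresh of a refreshable materialized view.

    append=False models the replace-on-refresh behavior described above:
    previous contents are discarded and only the new result remains.
    append=True models APPEND mode: the new result is added to what exists.
    """
    if append:
        target.rows.extend(new_rows)
    else:
        target.rows = list(new_rows)


# Hypothetical refresh results: each refresh only sees the latest window.
window_1 = [("s1", "2024-01-01"), ("s2", "2024-01-01")]
window_2 = [("s3", "2024-01-02")]

replaced = TargetTable()
appended = TargetTable()
for window in (window_1, window_2):
    refresh(replaced, window, append=False)
    refresh(appended, window, append=True)

print(len(replaced.rows))  # 1: only the most recent window survives
print(len(appended.rows))  # 3: history accumulates
```

In this model the replace-style table ends up holding only `window_2`, which mirrors the reported loss of history after each refresh.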
2 — ReplacingMergeTree tables may return duplicated/stale rows
Several aggregate tables (session_facts, engagement_daily, sessions_by_device_daily, sessions_by_browser_daily, sessions_by_language_daily) use the ReplacingMergeTree engine. ReplacingMergeTree collapses rows that share a sorting key only during background merges, so in ClickHouse deployments with replication enabled, reads against these tables are not guaranteed to be deduplicated unless the FINAL keyword is present in the query. The current CubeJS schema files query these tables without FINAL, which can cause metrics to include duplicate or stale rows in replicated environments.
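The effect of reading with and without FINAL can be sketched with a small in-memory model. This is a simplified illustration and not the engine's actual implementation: the `(key, version, value)` row layout, the sample parts, and the helper names `read_plain` and `read_final` are assumptions made for this example.

```python
def read_plain(parts):
    """Return every row from every part, as a read without FINAL may do
    while superseded rows are still waiting to be merged away."""
    return [row for part in parts for row in part]


def read_final(parts):
    """Keep only the highest-version row per key, mimicking FINAL.

    On equal versions the later row wins, so the most recent insert is kept.
    """
    latest = {}
    for row in read_plain(parts):
        key, version, _value = row
        if key not in latest or version >= latest[key][1]:
            latest[key] = row
    return sorted(latest.values())


# Hypothetical unmerged parts: the first day was written twice,
# once as version 1 and later as version 2.
parts = [
    [("2024-01-01", 1, 10)],
    [("2024-01-01", 2, 12), ("2024-01-02", 1, 5)],
]

plain_total = sum(value for _, _, value in read_plain(parts))
final_total = sum(value for _, _, value in read_final(parts))

print(plain_total)  # 27: the stale version-1 row is counted as well
print(final_total)  # 17: only the latest version of each key is counted
```

The inflated total in the plain read is the kind of incorrect metric the issue describes.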
Impact: Loss of session history data and potentially incorrect engagement metrics in any ClickHouse-replicated deployment.
Steps to Reproduce
Issue 1 — Data wipe:
- Start the analytics stack with ClickHouse replication enabled.
- Ingest events so that session_facts accumulates rows.
- Wait for session_facts_rmv to refresh (every 30 s in the dev config).
- Query session_facts: only rows produced in the most recent refresh window are present; all prior history is gone.
Issue 2 — Duplicate rows:
- Start the analytics stack with ClickHouse replication enabled.
- Ingest events and allow engagement_daily (or any *_daily table) to populate.
- Query the table directly without FINAL: duplicate versions of the same row may be returned, depending on merge state.
- Add FINAL to the same query: only the latest version of each row is returned.
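One way to check for unmerged duplicates without knowing each table's columns is to compare row counts with and without FINAL. The sketch below only builds the query strings for the tables named in this issue. The helper `count_queries` is a hypothetical name, and running the queries against a ClickHouse instance is left to whatever client the environment already uses.

```python
AFFECTED_TABLES = [
    "session_facts",
    "engagement_daily",
    "sessions_by_device_daily",
    "sessions_by_browser_daily",
    "sessions_by_language_daily",
]


def count_queries(table: str) -> tuple[str, str]:
    """Return (plain, deduplicated) row-count queries for one table."""
    plain = f"SELECT count() FROM {table}"
    return plain, f"{plain} FINAL"


for table in AFFECTED_TABLES:
    plain, final = count_queries(table)
    print(plain)
    print(final)
```

If the two counts differ for a table, superseded rows are still present and a read without FINAL can return them.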
Acceptance Criteria
- The session_facts_rmv Refreshable Materialized View is updated from REWRITE to APPEND mode.
- Data in session_facts is preserved across all subsequent refresh cycles; no rows are removed on refresh.
- New rows are added to session_facts without side-effects on existing rows.
- FINAL is required for correct deduplication when reading from the following ReplacingMergeTree tables in replicated ClickHouse environments: session_facts, engagement_daily, sessions_by_device_daily, sessions_by_browser_daily, sessions_by_language_daily.
- FINAL is added to all affected SQL queries, or the table engine / insert strategy is updated so FINAL is not required, whichever is the more appropriate fix.
dotCMS Version
Any environment running the analytics stack with ClickHouse replication enabled.
Severity
High - Major functionality broken
Links
NA