A webapp, library, & cli for collecting, comparing, and governing databricks metric views, including converting from PowerBI semantic models (.pbix files).
The project is split into multiple packages:
| Package | Purpose |
|---|---|
daxglot |
DAX lexer, parser, AST, and SQL transpiler (dialect-aware via sqlglot) |
powermglot |
Power Query M parser and SQL transpiler; exposes parse_m_source to extract Unity Catalog table references and filter predicates from M let…in expressions |
pbi2dbr |
CLI + library that extracts a PBIX model, classifies fact/dimension tables, translates DAX measures to SQL, and emits metric-view YAML and DDL |
measurediff |
Extract, Compare & Contrast unity catalog metric view measures including upstream lineage |
webapp |
FastAPI Backend and React Frontend for web UI that provides all of the above |
flowchart TB
%% ── Power BI source ──────────────────────────────────────────────────
subgraph pbix ["📦 .pbix file"]
direction LR
pq["Power Query\nM expressions"]
rels["Relationships\n(FK columns, cardinality)"]
cols["Column schemas\n(names + dtypes)"]
dax["DAX measures\n(expression text)"]
end
%% ── Extraction ───────────────────────────────────────────────────────
subgraph ext ["PbixExtractor"]
direction TB
ucref["UC ref\n(catalog.schema.table)"]
sqlext["NativeQuery\nSQL"]
filtex["SelectRows\nfilter predicate"]
end
subgraph pmg ["powermglot"]
direction TB
mparse["parse_m_source\n(→ MSourceInfo)"]
end
pq --> mparse
mparse --> ucref & sqlext & filtex
%% ── Analysis ─────────────────────────────────────────────────────────
subgraph ana ["ModelAnalyzer"]
direction TB
classify["fact / dim\nclassification"]
jointree["join-tree\nbuilder"]
dimcols["dimension column\nextraction"]
end
rels --> classify & jointree
cols --> dimcols
classify --> jointree
jointree --> dimcols
%% ── DAX translation ──────────────────────────────────────────────────
subgraph daxglot ["daxglot"]
direction TB
parser["DAX parser\n(→ AST)"]
transpiler["transpiler\n(AST → SQL expr)"]
window["window spec builder\n(time-intelligence)"]
end
dax --> parser --> transpiler --> window
%% ── YAML output ──────────────────────────────────────────────────────
subgraph yaml_out ["📄 .yaml file"]
direction TB
y_source["source:\ndev.pbi.sales\n— or —\nSELECT … (NativeQuery SQL)"]
y_filter["filter:\nStatus = 'Active'"]
y_joins["joins:\n- name: customer\n source: dev.pbi.customer\n on: source.CustID = customer.CustID"]
y_dims["dimensions:\n- name: order_date\n expr: OrderDate\n- name: customer_name\n expr: customer.Name"]
y_measures["measures:\n- name: net_sales\n expr: SUM(Amount) FILTER (WHERE …)\n window:\n - order: date\n range: trailing 1 year"]
end
ucref --> y_source
sqlext --> y_source
filtex --> y_filter
jointree --> y_joins
dimcols --> y_dims
window --> y_measures
yaml_out --> sql_out(["📄 sales_mv.sql\nCREATE OR REPLACE VIEW dev.pbi.sales\nWITH METRICS LANGUAGE YAML AS\n$$ .yaml file contents $$"])
PbixExtractor wraps pbixray to pull out:
- Tables and their column schemas (names + pandas dtypes)
- DAX measures (expression text, display folder, description)
- Relationships (from/to table+column, active flag, cardinality)
- Power Query M expressions — parsed by
powermglot.parse_m_sourceto resolve Unity Catalog references (catalog.schema.table), pass-throughNativeQuerySQL, andTable.SelectRowsfilter predicates
parse_m_source handles all common Databricks connector patterns (Databricks.Catalogs, Databricks.Query, Value.NativeQuery, Sql.Database) and returns an MSourceInfo object:
from powermglot import parse_m_source
info = parse_m_source("""
let
Source = Databricks.Catalogs("host", "443", [Catalog="prod"]),
db = Source{[Name="pbi"]}[Data],
orders = db{[Name="orders"]}[Data],
filtered = Table.SelectRows(orders, each [status] = "Active")
in filtered
""")
# info.source_ref == "prod.pbi.orders"
# info.filter_sql == "status = 'Active'"
# info.native_sql is NoneWhen source_ref cannot be resolved (e.g. an unsupported connector), PbixExtractor falls back to regex-based pattern matching and then to the --source-catalog / --source-schema defaults.
from pbi2dbr.extractor import PbixExtractor
model = PbixExtractor("Sales Model.pbix").extract(
source_catalog="dev", # fallback when M expression can't be resolved
source_schema="pbi",
)
print(model.tables) # ['Sales', 'Customer', 'Date', ...]
print(model.measures[0].expression) # "= CALCULATE(SUM(Sales[Amount]), ...)"ModelAnalyzer classifies tables using a heuristic:
- Tables on the many side of at least one relationship → fact table
- Tables that only appear on the one side → dimension table
- Tables with measures but no FK relationships → also treated as fact tables
It then builds a star/snowflake join tree for each fact table (configurable depth) and extracts dimension columns (FK columns are excluded; joined dim columns are exposed with a joined_alias. prefix).
from pbi2dbr.analyzer import AnalysisOptions, ModelAnalyzer
opts = AnalysisOptions(
fact_tables=["Sales"], # override heuristic (optional)
exclude_tables=["_Dates"], # skip auto-generated date tables
max_snowflake_depth=3,
)
analyzer = ModelAnalyzer(model, opts)
fact_tables = analyzer.analyze()
for ft in fact_tables:
print(ft.name, "→", [j.name for j in ft.joins])
# Sales → ['customer', 'product', 'date']Each DAX measure expression is translated to a SQL aggregate expression by daxglot's translate_measure(). The translation covers:
| DAX | SQL |
|---|---|
SUM(Sales[Amount]) |
SUM(Amount) |
AVERAGE(Sales[Price]) |
AVG(Price) |
DISTINCTCOUNT(Sales[CustomerID]) |
COUNT(DISTINCT CustomerID) |
COUNTROWS(Sales) |
COUNT(*) |
= DIVIDE(SUM(Sales[Revenue]), SUM(Sales[Qty]))
SUM(Revenue) / NULLIF(SUM(Qty), 0)= CALCULATE(SUM(Sales[Amount]), Sales[Status] = "Sold")
SUM(Amount) FILTER (WHERE Status = 'Sold')Time-intelligence functions are converted to Databricks metric view window specs rather than SQL subqueries:
| DAX function | Window range |
|---|---|
SAMEPERIODLASTYEAR(Date[Date]) |
trailing 1 year |
PREVIOUSMONTH(Date[Date]) |
trailing 1 month |
PREVIOUSQUARTER(Date[Date]) |
trailing 1 quarter |
DATEADD(Date[Date], -3, MONTH) |
trailing 3 month |
DATESYTD(Date[Date]) |
cumulative + year boundary |
TOTALYTD(SUM(...), Date[Date]) |
cumulative + year boundary |
= CALCULATE(SUM(Sales[Amount]), SAMEPERIODLASTYEAR('Date'[Date]))
measures:
- name: sales_yoy
expr: SUM(Amount)
window:
- order: date
range: trailing 1 year
semiadditive: lastScalar VARs are inlined before translation:
= VAR _Numerator = SUM(Sales[Revenue])
VAR _Denominator = SUM(Sales[Qty])
RETURN DIVIDE(_Numerator, _Denominator)
SUM(Revenue) / NULLIF(SUM(Qty), 0)= IFERROR(DIVIDE(SUM(A[x]), SUM(B[y])), 0)
COALESCE(SUM(x) / NULLIF(SUM(y), 0), 0)= SWITCH(TRUE,
Sales[Region] = "West", "Pacific",
Sales[Region] = "East", "Atlantic",
"Other")
CASE WHEN Region = 'West' THEN 'Pacific'
WHEN Region = 'East' THEN 'Atlantic'
ELSE 'Other' ENDSome DAX patterns have no direct SQL equivalent (e.g. ALLSELECTED, complex iterator context). These are flagged with is_approximate = True and a -- WARNING header in the DDL output, so they can be reviewed manually.
MetricViewGenerator produces two files per fact table.
sales.yaml — the metric view definition:
version: '1.1'
comment: Metric view generated from PowerBI table 'Sales'
source: dev.pbi.sales
joins:
- name: customer
source: dev.pbi.customer
on: source.CustomerID = customer.CustomerID
- name: product
source: dev.pbi.product
on: source.ProductID = product.ProductID
dimensions:
- name: order_date
expr: OrderDate
- name: customer_name
expr: customer.Name
- name: product_category
expr: product.Category
measures:
- name: Net Sales
expr: SUM(Amount) FILTER (WHERE Status = 'Sold')
- name: Net Sales PM
expr: SUM(Amount) FILTER (WHERE Status = 'Sold')
window:
- order: date
range: trailing 1 month
semiadditive: lastsales_mv.sql — the DDL statement:
CREATE OR REPLACE VIEW `dev`.`pbi`.`sales`
WITH METRICS
LANGUAGE YAML
AS
$$
version: '1.1'
source: dev.pbi.sales
...
$$# Install
pip install pbi2dbr # installs daxglot and pbixray automatically
# Convert a PBIX file
pbi2dbr convert \
--pbix "Adventure Works DW 2020.pbix" \
--catalog dev \
--schema pbi \
--output-dir ./outputThis writes one .yaml + one _mv.sql file per detected fact table into ./output/.
Usage: pbi2dbr convert [OPTIONS]
Convert a PBIX file to Databricks metric view YAML and SQL DDL files.
Options:
--pbix PATH Path to the .pbix file. [required]
--catalog TEXT Target Unity Catalog catalog name. [required]
--schema TEXT Target Unity Catalog schema name. [required]
--output-dir PATH Directory to write output files. [default: ./output]
--fact-tables TEXT Comma-separated list of tables to treat as facts.
--prefix TEXT Prefix for generated view names.
--source-catalog TEXT Fallback source catalog (when M expr unresolvable).
--source-schema TEXT Fallback source schema.
--exclude-tables TEXT Comma-separated list of tables to skip.
--include-isolated Include tables with no relationships or measures.
--dialect TEXT SQL dialect for expressions. [default: databricks]
Example — convert with explicit fact tables and a naming prefix:
pbi2dbr convert \
--pbix "Sales Model.pbix" \
--catalog prod \
--schema pbi_views \
--fact-tables "Sales,Returns" \
--prefix "pbi_" \
--source-catalog raw \
--source-schema sales_raw \
--output-dir ./metric_viewsExplore a PBIX file before converting it:
# Show tables and column schemas
pbi2dbr inspect --pbix "Sales Model.pbix" --tables
# Show all DAX measures
pbi2dbr inspect --pbix "Sales Model.pbix" --measures
# Show active relationships
pbi2dbr inspect --pbix "Sales Model.pbix" --relationships
# Show everything
pbi2dbr inspect --pbix "Sales Model.pbix" --allfrom pbi2dbr.extractor import PbixExtractor
from pbi2dbr.analyzer import AnalysisOptions, ModelAnalyzer
from pbi2dbr.generator import MetricViewGenerator
# 1. Extract
model = PbixExtractor("Sales Model.pbix").extract(
source_catalog="dev", source_schema="pbi"
)
# 2. Analyse
facts = ModelAnalyzer(
model,
AnalysisOptions(exclude_tables=["_Dates"]),
).analyze()
# 3. Generate
gen = MetricViewGenerator()
for fact in facts:
spec = gen.build_spec(fact, target_catalog="dev", target_schema="pbi")
yaml_path, sql_path = gen.write(spec, output_dir="./output")
print(f"Wrote {yaml_path} and {sql_path}")Translate a single DAX expression directly:
from daxglot.measure_translator import translate_measure
result = translate_measure(
"= CALCULATE(SUM(Sales[Amount]), SAMEPERIODLASTYEAR('Date'[Date]))"
)
print(result.sql_expr) # SUM(Amount)
print(result.window_spec) # [WindowSpec(order='date', range='trailing 1 year', ...)]
print(result.warnings) # []measurediff has two responsibilities: collecting measure definitions from live Databricks metric views (enriched with recursive column lineage from system.access.column_lineage), and diffing any two collected measures to understand how similar or different they are.
Discovers metric views, parses their YAML definitions, traverses upstream column lineage, and writes one YAML file per measure.
# Collect all metric views in a catalog/schema (with lineage)
measurediff collect --catalog prod --schema finance -o ./definitions
# Skip lineage for speed
measurediff collect --catalog prod --schema finance --no-lineage -o ./definitionsEach output file is named {catalog}.{schema}.{view}.{measure}.yaml and contains the measure expression, optional window specs, and a recursive lineage tree showing how every referenced column flows from its ultimate source table:
metric_view: prod.finance.sales_metrics
name: Monthly_Sales
expr: SUM(source.total)
window:
- order: date
range: trailing 1 months
semiadditive: last
lineage:
- table: prod.finance.sales
column: total
type: UNKNOWN
upstream:
- table: prod.raw.orders
column: total
type: TABLECompares two collected measure files across three dimensions:
| Dimension | Weight | How it's measured |
|---|---|---|
| Expression | 50% | SQL expressions are normalized (table qualifiers stripped via sqlglot) before comparison |
| Window specs | 20% | Each window field (order, range, semiadditive) compared independently |
| Lineage | 30% | Jaccard similarity of ultimate leaf source columns; intermediate hops are flagged separately rather than penalised |
A weighted score produces one of three labels: Identical (≥ 98%), Similar (≥ 60%), or Different (< 60%).
measurediff diff \
playground.bravo.country_sales.Monthly_Sales.yaml \
playground.charlie.alphasales.MonthSales.yaml╭──────────────────────────── Measure Diff ────────────────────────────╮
│ A B │
│ playground.bravo.country_sales playground.charlie.alphasales │
│ Monthly_Sales MonthSales │
╰──────────────────────────────────────────────────────────────────────╯
╭── Similarity ──╮
│ Similar 93.3% │
╰────────────────╯
╭─────────────────── Expression ✓ ───────────────────╮
│ SUM(source.total) │
╰─────────────────────────────────────────────────────╯
╭─────────────────── Window ✗ ───────────────────────╮
│ Spec Field A B ✓/✗ │
│ [0] order date date ✓ │
│ [0] range trailing 1 months trailing ✗ │
│ 30 days │
│ [0] semiadditive last last ✓ │
╰─────────────────────────────────────────────────────╯
╭─────────────── Lineage (leaf sources) ✓ ───────────╮
│ Source Column In A In B │
│ playground.alpha.sales total ✓ ✓ │
│ ⚠ A has extra intermediate hops │
╰─────────────────────────────────────────────────────╯
The extra-hop warning means Monthly_Sales routes through playground.bravo.sales on its way to the same ultimate source — functionally equivalent, but the lineage path is longer.
- Measure references — DAX measures that call other measures (e.g.
[Net Sales PM]) are translated as plain column references. Full composability viaMEASURE(name)is applied where the referenced measure was already translated in the same view. - Row context / iterators —
SUMX,AVERAGEXand other iterator functions are approximated with their scalar aggregate equivalent and flagged as approximate. - Complex filter context —
ALL(),ALLSELECTED(), andKEEPFILTERS()modifiers insideCALCULATE()drop the context modifier and emit a warning; the base aggregate is still emitted. - Auto date tables — PowerBI generates hidden
DateTableTemplate_*andLocalDateTable_*tables. These are typically dimension tables and will not be classified as fact tables. Use--exclude-tablesto suppress them entirely. - Unity Catalog resolution — Power Query M expressions are parsed for Databricks catalog references (
DatabricksCatalog.Contents,Databricks.Catalogs,NativeQuerypatterns). If a table's M expression cannot be parsed,--source-catalog/--source-schemaprovide a fallback.

