- Azure Databricks workspace
- Azure Data Factory (for orchestration)
- Unity Catalog enabled
- Python 3.9+ environment
pip install -r requirements.txt- Open Databricks SQL
- Run
sql/create_tables.sqlto create all tables - Run
sql/security_setup.sqlfor security configuration
- Upload all notebooks from
notebooks/to your Databricks workspace - Ensure notebooks are in the correct order (01, 02, 03, 04)
Set up Databricks secrets for:
- Azure subscription ID
- Databricks token
- Azure tenant ID
databricks secrets create-scope --scope "cost-tracking"
databricks secrets put --scope "cost-tracking" --key "azure-subscription-id" --string-value "<your-subscription-id>"
databricks secrets put --scope "cost-tracking" --key "databricks-token" --string-value "<your-token>"
databricks secrets put --scope "cost-tracking" --key "azure-tenant-id" --string-value "<your-tenant-id>"- 01_cost_tracking_dlt.py - ETL pipeline (run every 15 minutes)
- 02_anomaly_detection.py - Anomaly detection (run hourly)
- 03_cost_forecast.py - Cost forecasting (run every 4 hours)
- 04_pipeline_monitoring.py - Health monitoring (run daily)
{
"activities": [
{
"name": "CostTrackingETL",
"type": "DatabricksNotebook",
"dependsOn": [],
"policy": {
"timeout": "0.12:00:00",
"retry": 0,
"retryIntervalInSeconds": 30
},
"typeProperties": {
"notebookPath": "/Shared/Cost Tracking/01_cost_tracking_dlt",
"baseParameters": {}
}
},
{
"name": "AnomalyDetection",
"type": "DatabricksNotebook",
"dependsOn": [
{
"activity": "CostTrackingETL",
"dependencyConditions": ["Succeeded"]
}
]
}
]
}- Go to Databricks SQL β Dashboards
- Create new dashboard: "Cost Tracking Dashboard"
Use queries from sql/dashboard_queries.sql:
- Cost Trends:
dashboard_cost_trends - Cost Breakdown:
dashboard_compute_breakdown - Anomalies:
dashboard_anomalies - Project Summary:
dashboard_project_summary - Regional Analysis:
dashboard_regional_analysis - Optimization Recommendations:
dashboard_optimization_recommendations - Pipeline Health:
dashboard_pipeline_health
Use config/dashboard_chart_config.json for Chart.js configurations.
cd sample_data
python generate_sample_data.py- Run notebooks in order
- Check data in
cost_tracking.project_costs - Verify anomalies in
cost_tracking.anomalies - Check forecasts in
cost_tracking.forecasts
| Table | Purpose |
|---|---|
cost_tracking.project_costs |
Main cost data |
cost_tracking.anomalies |
Detected anomalies |
cost_tracking.forecasts |
Cost predictions |
cost_tracking.pipeline_logs |
Execution logs |
cost_tracking.workspace_config |
Workspace configuration |
cost_tracking.user_permissions |
User access permissions |
cost_tracking.audit_logs |
Access audit logs |
cost_tracking.optimization_recommendations |
Cost optimization suggestions |
- Check
cost_tracking.pipeline_logsfor execution status - Monitor success rates and execution times
- Set up alerts for failures
- Track daily costs by project
- Monitor for anomalies
- Review optimization recommendations
- Row-level security enabled
- Unity Catalog for data governance
- Audit logging for access tracking
- Secrets stored in Databricks secret scope
- Uses serverless SQL compute
- Auto-termination clusters
- Delta table optimization
- Data retention (90 days)
- Permission Errors: Check Unity Catalog permissions
- Data Not Loading: Verify Azure Cost Management API access
- Anomaly Detection Fails: Check ML cluster configuration
- Forecasting Errors: Ensure sufficient historical data
-- Check data freshness
SELECT MAX(updated_at) FROM cost_tracking.project_costs;
-- Check pipeline health
SELECT * FROM cost_tracking.pipeline_logs
WHERE start_time >= current_date - interval '1 day';
-- Check anomalies
SELECT COUNT(*) FROM cost_tracking.anomalies
WHERE detected_at >= current_date - interval '7 days';
-- Check forecasts
SELECT * FROM cost_tracking.forecasts
WHERE forecast_date >= current_date;- Check inline comments in notebooks
- Review SQL queries for data issues
- Monitor pipeline logs for execution problems
- Use sample data generator for testing
Note: This solution is designed to work within Azure's free tier constraints. Monitor usage to avoid unexpected charges.
