databricks-agent-notebooks is a Python package and CLI for executing Databricks-style notebooks from a local development environment. It makes automated notebook execution possible outside the workspace UI and IDE extensions, and is specifically optimized for coding agents such as Claude Code and Codex.
```shell
# Remote execution (serverless and cluster via Databricks Connect)
uv tool install databricks-agent-notebooks

# Full functionality including LOCAL_SPARK (bundles standalone pyspark)
uv tool install "databricks-agent-notebooks[local-spark]"
```

Or with pip:

```shell
pip install databricks-agent-notebooks                   # remote-only
pip install "databricks-agent-notebooks[local-spark]"    # with LOCAL_SPARK
```

Then, give your agent:
> Run `agent-notebook help` and follow the agent README and `agent-notebook doctor` instructions
- Configured Databricks unified authentication profile in `~/.databrickscfg` or `DATABRICKS_CONFIG_FILE`
- For Scala notebooks: Java 11+ with `coursier` or `cs` on `PATH`
Execution settings come from four levels, each overriding the previous:
1. `pyproject.toml` -- `[tool.agent-notebook]` in the nearest `pyproject.toml` (walks up to the `.git` boundary)
2. Environment variables -- `AGENT_NOTEBOOK_*` env vars
3. Notebook frontmatter -- YAML block at the top of markdown notebooks under the `agent-notebook:` key (the legacy `databricks:` key is also supported)
4. CLI flags -- always win
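As a sketch, a project-level config might look like the following. The `profile` and `cluster` key names mirror the frontmatter keys used throughout this README; the exact `[tool.agent-notebook]` schema, including the shape of `hooks.python.prologue_cells`, is an assumption, not a verified reference:

```toml
[tool.agent-notebook]
profile = "my-workspace"   # Databricks auth profile
cluster = "SERVERLESS"     # execution target; frontmatter and CLI flags override this

[tool.agent-notebook.hooks.python]
# Assumed shape: cells injected between session creation and user content
prologue_cells = ["spark.sql('USE CATALOG main')"]
```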
All resolved config and user-defined params are available at runtime as `agent_notebook_parameters` -- a Python dict injected into every Python and SQL notebook before execution (Scala support planned). Use `hooks.python.prologue_cells` in config to inject setup code (catalog selection, helper imports) between session creation and user content.
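A sketch of reading these values from inside a notebook cell. Only the injected name `agent_notebook_parameters` comes from the description above; the key names `"cluster"` and `"profile"` are illustrative assumptions. Falling back through `globals().get` keeps the cell runnable even where nothing was injected:

```python
# Hypothetical notebook cell: read the injected runtime parameters.
# The runner injects `agent_notebook_parameters` before execution;
# fall back to an empty dict so the cell also runs outside the runner.
params = globals().get("agent_notebook_parameters", {})

# Key names below ("cluster", "profile") are illustrative assumptions.
target = params.get("cluster", "SERVERLESS")
profile = params.get("profile")
print(f"target={target} profile={profile}")
```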
See the agent guide for the full precedence model, supported keys, and examples. See runtime parameters and hooks for the lifecycle extensibility features.
Executing a notebook requires a language and an execution target. The target is set via the `cluster` field: a Databricks cluster name/ID, `SERVERLESS`, or a local master URL like `local[*]`. An optional `profile` selects the Databricks auth profile. These values can be provided via `pyproject.toml`, notebook frontmatter, or CLI flags.
Markdown is the recommended authoring format. A YAML frontmatter block can embed all values so the notebook is self-contained. The frontmatter key is `agent-notebook`:
---
agent-notebook:
  language: python
  profile: my-workspace
  cluster: my-cluster-name
---
# Exploratory analysis
```python
df = spark.sql("SELECT current_catalog(), current_schema()")
display(df)
```

Scala notebooks work the same way (use `cluster: SERVERLESS` for explicit serverless, or omit it for implicit serverless):
---
agent-notebook:
  language: scala
  profile: my-workspace
  cluster: SERVERLESS
---
```scala
spark.sql("SELECT 1").show()
```

Scala notebooks have additional restrictions. See Scala development for details.
Local Spark notebooks skip Databricks entirely -- useful for CI, testing, and development without credentials:
---
agent-notebook:
  language: python
  cluster: "local[*]"
---
```python
result = spark.range(10).count()
print(f"count={result}")
```

Each markdown notebook must use a single language -- you cannot mix Python and Scala cells.
- Databricks Source Format (`.py`, `.scala`, `.sql`)
- Jupyter notebooks (`.ipynb`)
These formats have no frontmatter mechanism, so `profile` and `cluster` must come from `pyproject.toml` or CLI flags.
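For orientation, a Python notebook in Databricks Source Format is a plain script whose cells are delimited by `# COMMAND ----------` markers after a `# Databricks notebook source` header, so there is nowhere in the file to name a profile or cluster. A minimal sketch:

```python
# Databricks notebook source

# COMMAND ----------
# First cell: a computation that needs no Spark session
total = sum(range(10))
print(f"total={total}")

# COMMAND ----------
# Second cell: consumes state from the first cell
assert total == 45
```

Because the markers are ordinary comments, the same file also runs as a plain Python script.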
```shell
# Check installation
agent-notebook doctor

# Run a notebook (profile/cluster from frontmatter or pyproject.toml)
agent-notebook run path/to/notebook.md

# Override profile and cluster on the CLI
agent-notebook run path/to/notebook.md --profile my-workspace --cluster my-cluster-id

# Explicit serverless execution
agent-notebook run path/to/notebook.md --cluster SERVERLESS

# Local execution -- no Databricks credentials needed
agent-notebook run path/to/notebook.md --cluster "local[*]"
agent-notebook run path/to/notebook.md --cluster "local[4]"   # 4 threads
```

Output files are written to `path/to/notebook_output/`:
- `notebook.executed.ipynb` (executed notebook)
- `notebook.executed.md` (Markdown)
- `notebook.executed.html` (HTML)
Use `--output-dir` to change the parent directory, or `--format all` / `--format md` / `--format html` to control rendered output.
| Flag | Description |
|---|---|
| `--profile` | Databricks auth profile (see note on LOCAL_SPARK below) |
| `--cluster` | Execution target: cluster name/ID, `SERVERLESS`, or `local[N]` |
| `--language` | Override notebook language (`python`, `scala`) |
| `--format` | Output format: `all` (default), `md`, `html` |
| `--output-dir` | Output directory (default: input file's parent) |
| `--timeout` | Per-cell timeout in seconds (default: unset) |
| `--allow-errors` | Continue execution on cell errors |
| `--no-inject-session` | Skip Databricks Connect session injection |
| `--no-preprocess` | Skip preprocessing directive expansion |
| `--param NAME=VALUE` | Set a preprocessing parameter (repeatable) |
| `--library PATH` | Add a Python library path to `sys.path` (repeatable) |
| `--clean` | Remove and recreate the output directory before running |
- `--allow-errors` is useful when a notebook contains independent commands, e.g., a series of summary queries.
- Passing a cluster ID is the deterministic path for cluster-based execution. Name resolution is best-effort with a configurable `cluster_list_timeout` (default: 120s).
- `--clean` removes and recreates the output directory -- useful for deterministic re-runs.
The `--cluster` flag is a unified execution target selector:
| Value | Execution mode |
|---|---|
| `SERVERLESS` | Explicit serverless (Databricks Connect) |
| `local[*]`, `local[4]`, etc. | Local Spark session (no Databricks) |
| `my-cluster` or cluster ID | Databricks cluster-backed execution |
| (omitted) | Serverless (implicit default) |
`SERVERLESS` is case-insensitive. Local master patterns are case-sensitive (lowercase only), following Spark conventions (`local`, `local[*]`, `local[N]`, `local[N,M]`).
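The accepted shapes can be illustrated with a small pattern check. This regex is a sketch of the Spark master-URL convention named above, not the tool's actual parser:

```python
import re

# Matches local, local[*], local[N], local[N,M] -- lowercase only,
# mirroring Spark's master-URL convention.
LOCAL_MASTER = re.compile(r"^local(?:\[(?:\*|\d+(?:,\d+)?)\])?$")

def is_local_master(value: str) -> bool:
    return LOCAL_MASTER.fullmatch(value) is not None

for value in ["local", "local[*]", "local[4]", "local[4,2]"]:
    assert is_local_master(value)
for value in ["LOCAL[*]", "Local[4]", "local[]", "serverless"]:
    assert not is_local_master(value)
```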
`--cluster "local[*]"` runs notebooks against a local Spark session with no Databricks credentials needed.
- Python requires pyspark, included with the `[local-spark]` extra. Scala uses `$ivy` imports (self-contained).
- `--no-inject-session` can be combined with local execution if you want to skip the injected session.
- `--profile LOCAL_SPARK` still works for backward compatibility but is deprecated. Use `--cluster "local[*]"` instead.
See the agent guide's local execution section for environment variable tuning, Scala restrictions, and configuration behavior details.
Built-in defaults that apply when no explicit configuration overrides them.
Override via `pyproject.toml`, `AGENT_NOTEBOOK_*` environment variables, frontmatter, or CLI flags.
```yaml
# agent-notebook built-in defaults
cluster_list_timeout: 120   # seconds -- budget for cluster listing and name resolution
# env var: AGENT_NOTEBOOK_CLUSTER_LIST_TIMEOUT
```

- Agent guide -- comprehensive reference for agent and automated use
- Runtime parameters -- accessing resolved config at runtime
- Hooks (prologue cells) -- injecting setup code before notebook content
- First-time setup -- readiness checks
- Scala development -- Scala-specific tips and restrictions
- Runtime-home layout
- Release and publishing notes
External contributions are welcome. See CONTRIBUTING.md.
This project was originally created by Simeon Simeonov with support from Swoop and is available under the MIT License.