Status: living document. Mirrors the source tree as of the time of writing. When you add a new top-level subsystem, update the diagrams below in the same commit.
DataLab-Web is the browser-native sibling of the DataLab desktop application. It runs the Sigima computation engine inside Pyodide (CPython compiled to WebAssembly) and renders a dedicated React + TypeScript UI inspired by the desktop Qt application. Plotting is delegated to Plotly.js — PlotPy (Qt) is not available in the browser.
The whole stack runs inside a single browser tab: there is no backend. The HDF5 workspace file is the only durable artifact (see Persistence model).
flowchart TB
Host["Host page <i>(optional)</i><br/>@datalab-platform/web-sdk"]
subgraph Tab["Browser tab"]
direction TB
subgraph Main["Main thread"]
direction TB
UI["React UI<br/>(src/components, src/App.tsx)"]
Actions["Action registry<br/>(src/actions)"]
DBridge["DialogBridge.tsx"]
RT["DataLabRuntime (TS)<br/>(src/runtime/runtime.ts)"]
Py0["Pyodide #0 — main<br/>bootstrap.py · processor.py<br/>dlw_*.py · Sigima · numpy · scipy"]
UI -->|dispatch| Actions
Actions -->|typed calls| RT
RT -->|runPython| Py0
Py0 -->|dialog requests| DBridge
DBridge --> UI
end
PB["proxyBridge.ts<br/>(whitelisted command router)"]
subgraph Workers["Web Workers"]
direction LR
Mw["macroWorker.ts<br/>Pyodide #1<br/>macro_proxy.py + user code"]
Nw["notebookWorker.ts<br/>Pyodide #N (1 per notebook)<br/>notebook_display.py"]
end
IDB[("IndexedDB<br/>recentStore.ts")]
RT --- PB
PB <-->|postMessage| Mw
PB <-->|postMessage| Nw
UI -. autosave / recovery .-> IDB
end
Host <-->|postMessage · JSON-RPC<br/>remoteBridge.ts| RT
Key facts:
- One Pyodide instance owns the object model (
_MODELinbootstrap.py). All UI reads go through it. - Each macro / notebook runs in its own Web Worker with its own
Pyodide instance. Workers never touch the main object model directly:
they call back to the main runtime through
proxyBridge. - The host page (optional) controls an embedded DataLab-Web through
the same RPC vocabulary, but transported by
remoteBridgeoverwindow.postMessage.
Five layers, left to right — each layer only depends on the ones to its right.
flowchart LR
L1["<b>L1 — UI layer</b><br/>(React components)<br/><br/>src/components/<br/>src/App.tsx<br/>src/main.tsx<br/><br/><i>Purely presentational.<br/>No Pyodide imports.</i>"]
L2["<b>L2 — Orchestration</b><br/><br/>src/actions/<br/>(registry, menu builder)<br/>src/macros/<br/>src/notebook/<br/>src/plugins/<br/>src/aiassistant/<br/>src/preferences/"]
L3["<b>L3 — Runtime / bridge</b><br/>(TypeScript)<br/><br/>RuntimeApi.ts (façade)<br/>runtime.ts<br/>(DataLabRuntime + types)<br/>WorkerRuntimeProxy.ts · kernelWorker.ts<br/>runtimeMode.ts · workerProtocol.ts<br/>RuntimeContext.tsx<br/>WorkspaceContext.tsx<br/>MacroRuntime.ts<br/>proxyBridge.ts · remoteBridge.ts<br/>macroWorker.ts · notebookWorker.ts<br/>src/storage/ (OPFS spill stores)"]
L4["<b>L4 — Python kernel</b><br/>(loaded into Pyodide)<br/><br/>bootstrap.py<br/>processor.py<br/>dlw_main.py<br/>dlw_plugins.py<br/>dlw_h5browser.py<br/>dlw_interactive_fit.py<br/>dlw_title_format.py<br/>notebook_display.py<br/>macro_proxy.py<br/>_guidata_*_shim.py"]
L5["<b>L5 — Computation engine</b><br/><br/>Sigima<br/>+ numpy · scipy<br/>+ scikit-image<br/>+ h5py · pandas …<br/><br/><i>Installed via micropip<br/>on first load.</i>"]
L1 --> L2 --> L3 --> L4 --> L5
| Boundary | Contract |
|---|---|
| L1 → L2 | useAction(id) / event handlers; no direct runtime calls from components. |
| L2 → L3 | Typed methods on the RuntimeApi façade (useRuntime()); the concrete impl is the in-thread DataLabRuntime or a worker-backed proxy. |
| L3 → L4 | pyodide.runPython() / pyodide.runPythonAsync() calling bootstrap.py helpers. |
| L3 ↔ Worker | postMessage (bridge_call / bridge_reply / stdout / stderr / …). |
| L3 ↔ Host | postMessage JSON-RPC (request / response / event) via remoteBridge. |
| L4 → L5 | Python imports (sigima.proc, sigima.objects, …). Catalog is auto-discovered. |
- Components must not import from
src/runtime/runtime.tsdirectly; they consumeuseRuntime()(typed as theRuntimeApifaçade) and dispatch throughsrc/actions/. Depending on the façade — not the concrete class — is what lets the runtime run either in-thread or in a Dedicated Web Worker without touching any call site. - Python helpers return JSON-friendly dicts / lists (
tolist()on arrays), never live PyProxies, to keep the bridge cheap. - The pinned
PYODIDE_VERSIONinruntime.tsand the<script>tag inindex.htmlmust always match — the wheel index is version-specific.
Presentational React. Notable components:
MenuBar.tsx/MenuDropdown.tsx— top menu bar built from the action registry.ObjectTree.tsx,TreeKindSwitcher.tsx,SidePanel.tsx— left panel: hierarchical workspace (groups, signals, images) + properties / metadata / stats / history.SignalPlot.tsx,ImagePlot.tsx,MultiImagePlot.tsx,CentralViewSwitcher.tsx— Plotly-based central plot area.DataSetDialog.tsx+DataSetForm/— auto-generated parameter dialogs from guidata DataSet JSON schemas (do not hand-write a form unless the auto path genuinely cannot express it).DialogBridge.tsx— single React entry-point that receives dialog requests from Python viabootstrap.set_dialog_bridge()and routes them to the appropriate React dialog component.- ROI:
RoiPanel.tsx(non-modal docked editor for signal & image ROIs and for the image Erase area… session — draw / table-edit regions, then apply),RoiGridDialog.tsx,signalRoi.ts,imageRoi.ts. - Other domain dialogs:
H5BrowserDialog.tsx,InteractiveFitDialog.tsx,TextImportWizard.tsx,ObjectPropertiesDialog.tsx,ProfileDefinitionDialog.tsx,SaveToDirectoryDialog.tsx,SeparateViewDialog.tsx,PluginManagerDialog.tsx,PluginConsentDialog.tsx,HelpDialog.tsx. - Macros / notebooks:
MacroPanel.tsx,MacroEditorTabs.tsx,MacroConsole.tsx,notebook/. - AI assistant:
AIAssistant/.
flowchart TB
subgraph A["src/actions/"]
T["types.ts<br/>ActionDescriptor, kinds"]
R["registry.ts<br/>register / get / list actions"]
B["buildMenu.ts<br/>merge static entries +<br/>Sigima catalog → menu tree"]
I["submenuIcons.ts<br/>icon assignment"]
end
Cat["processor.py<br/>build_signal_catalog()<br/>build_image_catalog()"] --> B
R --> B
T --> R
B --> MB["MenuBar.tsx<br/>(renders the tree)"]
MB -->|click| Run["action.run(ctx)"]
Run --> RT2["runtime.<method>(…)"]
Most processings are not hand-registered: buildMenu.ts queries
processor.py (build_signal_catalog, build_image_catalog) and
synthesises menu entries automatically. The static registry only adds
DataLab-Web-specific commands (file IO, ROI editor, plugin manager,
notebooks, macros, …).
Single class that owns the main-thread Pyodide instance and exposes a typed surface to the rest of the app:
classDiagram
class DataLabRuntime {
+load() / ready
+runPython() / runPythonAsync()
+list_signals / get_signal_xy / … (object CRUD)
+createSignalTyped / createImageTyped
+applyFeature(id, oids, params)
+HDF5 workspace save / load
+macros / notebooks CRUD
+plugins (register, enable, hot-reload)
+interactive fit, ROI editing, …
}
DataLabRuntime ..> Pyodide : runPython
DataLabRuntime ..> SignalMeta
DataLabRuntime ..> SignalData
DataLabRuntime ..> ProcessingDescriptor
DataLabRuntime ..> FeatureDescriptor
DataLabRuntime ..> PanelTree
Companion exports (interfaces / types): SignalMeta, SignalStyle,
SignalData, ProcessingDescriptor, LastProcessingInfo, Pattern,
FeatureDescriptor, InteractiveFitInfo / Param / Init / Auto,
PluginInfoMeta, PluginRecord, PluginMenuAction, PanelKind,
ObjectNode, GroupNode, PanelTree, ObjectMeta, plus thin
PyodideAPI / PyProxy aliases.
flowchart LR
RC["<b>RuntimeContext</b><br/>singleton load()<br/>useRuntime()"]
WC["<b>WorkspaceContext</b><br/>dirty flag<br/>current filename<br/>recovered-from-cache flag"]
Consumers["L1 / L2 consumers"]
RC --> Consumers
WC --> Consumers
RuntimeContext guarantees Pyodide is loaded exactly once, even across
HMR reloads (the Python _MODEL and _CATALOG survive re-execution of
bootstrap.py).
flowchart LR
subgraph MT["Main thread"]
direction TB
MR["MacroRuntime.ts<br/>spawns workers on demand<br/>keeps a warm standby<br/>postMessage glue"]
PB2["proxyBridge.ts<br/>whitelisted command router"]
RT3["DataLabRuntime"]
PB2 --> RT3
end
subgraph Workers2["Web Workers"]
direction TB
MW2["macroWorker.ts<br/>Pyodide #1<br/>macro_proxy.py<br/>+ user script"]
NW2["notebookWorker.ts<br/>Pyodide #N (per notebook)<br/>notebook_display.py"]
end
MR ==>|postMessage| MW2
MR ==>|postMessage| NW2
MW2 -. bridge_call .-> PB2
NW2 -. bridge_call .-> PB2
Wire shape (same for both worker kinds):
| Direction | Message types |
|---|---|
| main → worker | init · run · bridge_reply |
| worker → main | ready · started · finished · stdout · stderr · bridge_call |
proxyBridge.ts is a whitelisted command router: it maps the
method string in a bridge_call to a specific main-thread runtime
call. Workers cannot invoke arbitrary methods — only the ones explicitly
listed (add_signal, add_image, list/get/delete object, add/set object
from pickle, add_group, get/set selection, etc.).
flowchart LR
SDK["@datalab-platform/web-sdk<br/>(packages/sdk/)"]
RB["remoteBridge.ts<br/>request / response / event"]
RT4["DataLabRuntime"]
SDK <-->|postMessage<br/>JSON-RPC| RB
RB --> RT4
By default the Pyodide runtime is hosted in its own Dedicated Web Worker,
selected by runtimeMode.ts. This keeps the UI thread responsive during
heavy work and unlocks synchronous OPFS spills (DEW ADR #2). The legacy
in-thread topology — where DataLabRuntime runs on the UI thread — is
kept as a backup / escape hatch, selectable with ?runtime=main (or a
localStorage key / the VITE_RUNTIME_WORKER=0 build flag). Both paths
satisfy the same RuntimeApi façade, so the UI is identical either way.
flowchart LR
subgraph MT3["Main thread"]
direction TB
RC2["RuntimeContext<br/>(picks the mode)"]
WP["WorkerRuntimeProxy.ts<br/>implements RuntimeApi<br/>RPC over postMessage<br/>+ synchronous mirror cache"]
RC2 --> WP
end
subgraph KW["Kernel worker"]
direction TB
KWp["kernelWorker.ts<br/>Pyodide #0 + DataLabRuntime<br/>+ OpfsSyncObjectStore"]
end
WP <-->|"workerProtocol.ts<br/>call / result / mirror /<br/>mutation / dialog (transferables)"| KWp
Why this exists:
- Synchronous OPFS spills. The on-disk storage mode (
StorageMode = "ram" | "disk") spills heavy signal/image arrays out of the WASM heap to the Origin Private File System throughsrc/storage/. The asyncOpfsObjectStore(createWritable) works on the main thread; the fasterOpfsSyncObjectStore(createSyncAccessHandle) is worker-only by spec, so it is selected only when the runtime runs in the worker. Either way HDF5 stays the durable source of truth — OPFS is an ephemeral spill cache. - Zero-copy bridge.
workerProtocol.collectTransferablespasses everyArrayBufferin call arguments and return values as apostMessagetransferable, so binary payloads (image/signal arrays, file/HDF5 bytes) cross the boundary without a structured-clone copy. - Synchronous accessors (
getStorageMode,getSpilledCount,getMemoryUsage, …) are served from a mirror the worker pushes after every call, preserving their synchronous shape across the boundary.
No COOP/COEP headers are required: the design uses transferables, not
SharedArrayBuffer, so it remains deployable to a plain static host
(GitHub Pages). Macro / notebook workers are unchanged — their
bridge_calls still route through proxyBridge to the (now possibly
worker-hosted) runtime.
Modules pushed into Pyodide's virtual FS at load time. They are written
to be re-executable so HMR keeps _MODEL / _CATALOG alive.
-
bootstrap.py— the kernel. Owns_MODEL(the hierarchical object store: panels → groups → objects), the dialog bridge, HDF5 serialisation, macro / notebook records, and ~150 helper functions consumed byDataLabRuntime. Logical groups:- Object CRUD:
add_object_pickled,set_object_pickled,duplicate_object,delete_object,move_object[s],rename_object,add_signal_from_arrays,add_image_from_array. - Tree / groups:
get_panel_tree,create_group,rename_group,delete_group,get_group_titles_with_object_info,resolve_group_oids. - Signals & images access:
get_signal_xy,get_signals_xy,set_signal_style,get_object_stats,get_object_property_schema,set_object_property_values, metadata helpers. - Typed generators:
list_signal_creation_types,create_signal_typed,update_signal_creation_params, idem image. - IO:
list_signal_io_formats,open_signal_from_bytes,save_signal_to_bytes, idem image. HDF5 workspace serialisation. - Macros / notebooks: full CRUD + reorder + replace (used by the in-browser recovery cache when restoring from IndexedDB).
- Dialog bridge:
set_dialog_bridge(bridge)registers the JS callable used by Python helpers that need to ask the user something.
- Object CRUD:
-
processor.py(loaded asdlw_processor.py) — introspects Sigima's catalog and exposesbuild_signal_catalog(),build_image_catalog()and theapply_*dispatchers. Equivalent of desktop DataLab'sregister_1_to_1/register_n_to_1/register_2_to_1. Override hooks allow renaming / hiding / regrouping entries or wiring custom result renderers. -
dlw_main.py— exposes adatalab.*namespace inside Pyodide (panels, current panel, etc.) so that DataLab desktop plugins run unchanged in the browser, provided they useawait param.edit_async(...)for parameter dialogs. -
dlw_plugins.py— host for the Qt-compatible plugin API: discovery, registration, hot-reload, consent dialog, menu wiring. -
dlw_h5browser.py— HDF5 browser backend (consumed byH5BrowserDialog.tsx). -
dlw_interactive_fit.py— interactive-fit backend (consumed byInteractiveFitDialog.tsx). -
dlw_title_format.py— central title formatting for computed results (mirrors Sigima's title strategy). -
notebook_display.py— implements Jupyter-likedisplay()and cell execution semantics inside the notebook worker. -
macro_proxy.py— defines the asyncproxyobject exposed to user scripts (macros and notebook cells). Each method ultimately becomes abridge_callto the main thread. -
_guidata_backends_shim.py/_guidata_jsonschema_shim.py— patch guidata so that DataSet parameter classes (a) do not require a Qt backend and (b) can be serialised as JSON schemas thatDataSetDialog.tsxconsumes. These are temporary backport shims: they are declared insrc/runtime/shims/registry.tsand audited for removal once upstream ships the feature — seeshim-registry.md.
Shipped as a separate tarball (@datalab-platform/web-sdk). Provides
typed TypeScript clients for host pages embedding DataLab-Web in an
iframe. Talks to remoteBridge.ts over the wire protocol described
above. Its package.json version must stay aligned with the app's.
recentStore.ts— IndexedDB-backed cache for the recent-files menu and for crash recovery. Not durable in the project sense: the HDF5 workspace file is the single source of truth.
Implemented in macroWorker.ts / notebookWorker.ts ↔ proxyBridge.ts.
main → worker
{ type: "init" }
{ type: "run", code: string, name?: string }
{ type: "bridge_reply", id: string, ok: boolean, value?: any, error?: string }
worker → main
{ type: "ready" }
{ type: "started", name: string }
{ type: "finished", ok: boolean, error?: string }
{ type: "stdout" | "stderr", text: string }
{ type: "bridge_call", id: string, method: string, payload: any }
method is matched against an explicit allow-list in
proxyBridge.ts; unknown methods produce an error reply.
Small JSON-RPC dialect, versioned (RPC_PROTOCOL_VERSION).
host → iframe { id, type: "request", method, params? }
iframe → host { id, type: "response", result? | error? }
iframe → host { type: "event", name, payload? }
Synthetic method get_protocol_version lets clients negotiate
compatibility independently of the application version.
1. User clicks Processing › <feature> in MenuBar
2. action.run(ctx) (from src/actions/registry.ts)
3. If params needed → DataSetDialog (auto-generated from JSON schema)
4. runtime.applyFeature(featureId, selectedOids, paramValues)
5. pyodide.runPython → processor.apply_*(feature, oids, params)
6. Sigima computes; new SignalObj/ImageObj added to _MODEL
7. bootstrap returns new oids; UI refreshes ObjectTree + plots
8. WorkspaceContext is marked dirty
1. User clicks Run in MacroPanel
2. MacroRuntime spawns / reuses a macroWorker
3. main → worker: { type: "run", code, name }
4. worker executes user code; calls `await proxy.<method>(…)`
5. macro_proxy.py emits { type: "bridge_call", … }
6. proxyBridge dispatches to a whitelisted DataLabRuntime call
7. main → worker: { type: "bridge_reply", … }
8. stdout / stderr stream back to MacroConsole
9. worker → main: { type: "finished", ok, error? }
1. Host imports @datalab-platform/web-sdk and embeds the iframe
2. Host calls client.addSignal(x, y, title)
3. SDK posts { id, type: "request", method: "add_signal", params }
4. remoteBridge.ts dispatches to DataLabRuntime (same allow-list spirit)
5. iframe posts { id, type: "response", result }
6. Model changes also surface as { type: "event", name, payload }
The architecture is mirrored by three test layers — see doc/testing-strategy.md for the full decision tree.
| Layer | Tool | Scope |
|---|---|---|
| Python | pytest (in Pyodide) | bootstrap.py, processor.py, dlw_*.py logic |
| TS unit | Vitest + RTL | runtime types, action registry, components |
| End-to-end | Playwright | full UI through real Pyodide round-trips |
- Type-safety end-to-end: never widen a Pyodide return value beyond its declared TS type. Add new fields in the Python helper and the TS interface in the same commit.
- No business logic in components: orchestration lives in
src/actions/and the runtime. - JSON across the bridge: prefer plain dicts/lists in Python over passing PyProxies.
base: "./"in Vite config — the build is drop-in deployable to GitHub Pages and other sub-path hosts. Do not introduce absolute URLs.- Pinned Pyodide:
PYODIDE_VERSIONinruntime.tsand the<script>tag inindex.htmlmust always be bumped together.