Revamp archiving #1229

Draft
ThomasKroes wants to merge 372 commits into master from feature/revamp_archiving

ThomasKroes commented Mar 12, 2026

This pull request introduces a new workflow-based architecture for project operations (such as open, save, import, and publish), adds support for Zstandard (zstd) compression, and refactors the CMake build system to register new codecs and workflow components. It also removes the legacy ProjectSerializationTask system and adds new utility and action classes to support the updated design.

Core architecture and workflow changes:

  • Introduced a workflow-based system for project operations by adding new workflow and codec-related headers and sources (e.g., WorkflowPlan, WorkflowExecutionOptions, AbstractWorkflowPlanExecutor, BlobCodec, and related classes) to the build system (ManiVault/cmake/CMakeMvSourcesApplication.cmake, ManiVault/cmake/CMakeMvSourcesPublic.cmake).
  • Added new parameter structs for project operations (ProjectOpenParameters, ProjectSaveParameters, etc.) and removed the legacy ProjectSerializationTask and its related code from AbstractProjectManager.
  • Added an abstract method getWorkflowPlanExecutor() to AbstractProjectManager to provide access to the new workflow plan executor.

Codec and compression support:

  • Added Zstandard (zstd) as a dependency via CPM in CMake and linked it to the application (ManiVault/CMakeLists.txt).
  • Registered new codec-related classes (e.g., ZstdBlobCodec, PassthroughBlobCodec, their factories, and settings actions) in the build system.

Actions and utility improvements:

  • Added new action and utility classes for workflow and codec management, including CodecSettingsAction, ActionOperation, and several action recipe classes.

Build system and source organization:

  • Updated CMake source groups and lists to include new workflow, codec, and action classes, and commented out the legacy serialization group.

These changes modernize project serialization and operation handling, add robust compression support, and prepare the codebase for future extensibility.

ThomasKroes self-assigned this Mar 12, 2026
ThomasKroes added the enhancement ("New feature or request"), amendment ("A change to the code to improve the quality"), and architecture labels Mar 12, 2026
ThomasKroes marked this pull request as draft March 12, 2026 15:38
Replace fully-qualified util::ParallelizationOverride with the unqualified ParallelizationOverride and remove a commented-out WorkflowAsyncLauncher::startWorkflowAsync block. Cleans up dead/commented code and aligns type usage with current imports; no functional behavior changes intended.
Introduce a QThreadPool member to WorkflowPlanExecutor and expose control/accessors.

- Add a QThreadPool _threadPool member and public threadPool() accessors.
- Initialize the pool in the constructor: setObjectName, default max threads to QThread::idealThreadCount(), and set an expiry timeout to keep threads warm.
- Add setMaxWorkerThreadCount/getMaxWorkerThreadCount implementations (with a minimum of 1) to configure the pool at runtime.
- Extend the AbstractWorkflowPlanExecutor interface with pure-virtual set/get methods so implementations can expose worker-thread configuration.

This allows tuning and reusing a shared worker thread pool for workflow job execution and makes worker thread sizing configurable.
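The sizing rule above (default to the ideal thread count, never fewer than one worker) can be sketched with the standard library. This is a hedged illustration, not the ManiVault code: `WorkerPoolConfig` is a hypothetical stand-in for the executor's `QThreadPool` settings, and `std::thread::hardware_concurrency()` substitutes for `QThread::idealThreadCount()`.

```cpp
#include <algorithm>
#include <cassert>
#include <thread>

// Hypothetical stand-in for the worker-thread settings held by WorkflowPlanExecutor.
class WorkerPoolConfig {
public:
    WorkerPoolConfig()
    {
        // hardware_concurrency() may return 0 when the value cannot be determined;
        // the clamp in setMaxWorkerThreadCount() turns that into a single worker.
        setMaxWorkerThreadCount(static_cast<int>(std::thread::hardware_concurrency()));
    }

    // Enforce a minimum of one worker thread, as described above.
    void setMaxWorkerThreadCount(int count)
    {
        _maxWorkerThreadCount = std::max(1, count);
    }

    int getMaxWorkerThreadCount() const
    {
        return _maxWorkerThreadCount;
    }

private:
    int _maxWorkerThreadCount = 1;
};
```

In the real executor the clamped value would presumably be forwarded to `QThreadPool::setMaxThreadCount()`; the expiry timeout is omitted from this sketch.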
Change JobFunction to take a non-const Job& so callback functions can modify job state (e.g. call setResult()). Update WorkflowPlan::Job::run() to invoke the new signature using a const_cast to pass a non-const reference. Also remove an unused include (Task.h) from WorkflowPlan.cpp.
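A minimal sketch of the `const_cast` pattern described above, assuming a simplified `Job` that stores its callback as `std::function<void(Job&)>`. The class shape and the `setResult()`/`getResult()` pair are illustrative, not copied from WorkflowPlan.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical, simplified version of WorkflowPlan::Job.
class Job {
public:
    // Callbacks receive a non-const Job& so they can record state such as a result.
    using JobFunction = std::function<void(Job&)>;

    explicit Job(JobFunction function) : _function(std::move(function)) {}

    // run() stays const for callers; const_cast removes constness so the callback can
    // mutate the job. This is well-defined only because the Job object itself is not const.
    void run() const
    {
        if (_function)
            _function(const_cast<Job&>(*this));
    }

    void setResult(std::string result) { _result = std::move(result); }
    const std::string& getResult() const { return _result; }

private:
    JobFunction _function;
    std::string _result;
};
```

Marking the result member `mutable`, or making `run()` non-const, would avoid the cast entirely; the cast keeps the existing `run()` signature unchanged.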
Introduce ProjectOperationParameters as a base for project operations and add a _maxParallelThreads field; ProjectOpenParameters, ProjectImportParameters, ProjectSaveParameters and ProjectPublishParameters now inherit from it. Update the Open Project dialog to expose a Parallel toggle and Maximum number of threads control, persist their defaults via Application settings (using a ProjectOpenParameters/Default/ prefix), and restore/save the working directory and parallel settings. Also wire UI enable/disable behavior for the max threads control and apply the selected values to the returned parameters. Minor cleanup: use auto for jobCount in WorkflowPlanExecutor to avoid explicit int.
Replace the ParallelizationOverride enum-based flow with a simpler boolean parallel flag and explicit max-thread count across execution options and project operation parameters. Project parameter getters (open/import/save/publish) now accept an optional filePath and only show file dialogs when filePath is empty; ProjectManager uses the returned parameters to set the executor thread count and to pass parallel/max-thread settings into workflow execution. Add recursive raw-block detection helpers (isVariantMapRawBlockObject, findRawBlockObject) in serialization and use them to sort datasets by presence of raw block data and size in DataHierarchyManager, improving load ordering. Update WorkflowPlanExecutor to honor the new _parallel flag (force sequential when false). Also include a small fix in Miscellaneous.cpp and adjust related headers/exports.
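The recursive lookup and the load-order sort can be sketched without Qt. Here `Node` is a hypothetical stand-in for a nested `QVariantMap`, its `isRawBlock` flag replaces `isVariantMapRawBlockObject()`, and the ordering (datasets with raw data first, larger blocks first) is an assumption; the commit message only says datasets are sorted by presence of raw block data and size.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a nested QVariantMap entry.
struct Node {
    std::string name;
    bool isRawBlock = false;    // plays the role of isVariantMapRawBlockObject()
    std::size_t size = 0;       // payload size when isRawBlock is true
    std::vector<Node> children; // nested maps
};

// Depth-first search for the first raw-block object, in the spirit of findRawBlockObject().
const Node* findRawBlockObject(const Node& node)
{
    if (node.isRawBlock)
        return &node;

    for (const Node& child : node.children)
        if (const Node* found = findRawBlockObject(child))
            return found;

    return nullptr;
}

// Assumed ordering: datasets with raw data first, then by descending raw-block size.
void sortDatasetMaps(std::vector<Node>& datasetMaps)
{
    std::stable_sort(datasetMaps.begin(), datasetMaps.end(), [](const Node& lhs, const Node& rhs) {
        const Node* lhsBlock = findRawBlockObject(lhs);
        const Node* rhsBlock = findRawBlockObject(rhs);

        if ((lhsBlock != nullptr) != (rhsBlock != nullptr))
            return lhsBlock != nullptr;

        const std::size_t lhsSize = lhsBlock ? lhsBlock->size : 0;
        const std::size_t rhsSize = rhsBlock ? rhsBlock->size : 0;

        return lhsSize > rhsSize;
    });
}
```

`std::stable_sort` keeps datasets with equal keys in their original relative order, which keeps the load order deterministic.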
Add QDataStream-based serializeVariantMap/deserializeVariantMap and export their declarations. Update DatasetImpl to encode properties to a raw variant map when serializing and decode them back when deserializing, with try/catch logging on failures. Also adjust decodeDataBufferFromVariantMap call to pass the block size parameter. These changes centralize binary QVariantMap (de)serialization and add error handling to avoid crashing on malformed property data.
Add configurable parallelism and ensure workflow executors use worker threads. Key changes:

- Introduce UI for parallel settings in the Save dialog (Parallel toggle + max threads control), persist defaults under ProjectSaveParameters/Default, and store chosen directory and thread settings.
- Pass parallel/max-thread options through WorkflowPlan::execute and executeAsync; WorkflowPlan now sets the executor's max worker thread count before execution.
- Make several save workflow stages run on CurrentWorkerThread instead of GuiThread.
- Add a safety check for null executors in executeAsync and log a warning.
- Minor fixes: early return in openProject when parameters are invalid and small refactors/formatting in project save/open flows.

These changes allow project save/open workflows to actually run on worker threads with configurable parallelism and persist the user's preferences.
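With the boolean flag, the worker count an execution actually gets reduces to a small rule. A hedged sketch, assuming `ExecutionOptions` mirrors only the parallel and max-thread fields of the real `WorkflowExecutionOptions`; the field and function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical subset of WorkflowExecutionOptions after the switch to a boolean flag.
struct ExecutionOptions {
    bool parallel = true;
    int maxThreads = 4;
};

// Sequential execution is modelled as a single worker; otherwise honour the
// requested maximum, never going below one thread.
int effectiveWorkerThreadCount(const ExecutionOptions& options)
{
    if (!options.parallel)
        return 1;

    return std::max(1, options.maxThreads);
}
```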
Enable verbose qDebug logging throughout WorkflowPlanExecutor to aid tracing of execution, stage/job dispatch, and thread usage. Configure the executor's thread pool per-execution by calling setMaxWorkerThreadCount from executeRoot/executeChild using executionOptions, and remove prior pre-configuration in WorkflowPlan::execute/executeAsync. Fix a lambda capture/move of executionOptions to avoid double-moving and pass options by value into executeOnCurrentThread/executeRoot.

Also update the serialization API by removing the concurrencyMode parameters from rawDataToVariantMap, decodeDataBufferFromVariantMap, and populateDataBufferFromVariantMap, replacing explicit parallel-stage addStage(...) calls with addParallelStage(...) and updating callers accordingly. These changes simplify concurrency handling in serialization and align call sites with the new signatures.
Decouple the QThreadPool from WorkflowExecutionState by moving ownership into WorkflowExecutionContext as a shared_ptr. makeRoot now creates and configures the pool (object name, max thread count based on execution options, expiry timeout) and passes it into child contexts; createChild forwards the same pool. Call sites (WorkflowPlanExecutor) updated to use WorkflowExecutionContext::getThreadPool(), and the thread-pool member/methods were removed from WorkflowExecutionState.
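The ownership change above amounts to a root context creating one pool and every child holding another `shared_ptr` to it. A minimal sketch under that reading; `ThreadPool` and `ExecutionContext` are hypothetical, simplified stand-ins for `QThreadPool` and `WorkflowExecutionContext`.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for QThreadPool (configuration only).
struct ThreadPool {
    std::string objectName;
    int maxThreadCount = 1;
};

// Simplified execution context: the root creates the pool, children share it.
class ExecutionContext {
public:
    static std::shared_ptr<ExecutionContext> makeRoot(int maxThreads)
    {
        auto pool = std::make_shared<ThreadPool>();

        pool->objectName = "WorkflowThreadPool";
        pool->maxThreadCount = maxThreads < 1 ? 1 : maxThreads;

        return std::shared_ptr<ExecutionContext>(new ExecutionContext(std::move(pool)));
    }

    // Forward the same pool instead of creating a new one.
    std::shared_ptr<ExecutionContext> createChild() const
    {
        return std::shared_ptr<ExecutionContext>(new ExecutionContext(_threadPool));
    }

    const std::shared_ptr<ThreadPool>& getThreadPool() const { return _threadPool; }

private:
    explicit ExecutionContext(std::shared_ptr<ThreadPool> pool) : _threadPool(std::move(pool)) {}

    std::shared_ptr<ThreadPool> _threadPool;
};
```

Because children copy the pointer rather than the pool, the pool lives as long as the longest-lived context that references it.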
Replace datasetList (vector<pair<QVariantMap,bool>>) with a simpler vector<QVariantMap> (datasetMaps). Update enumerateDatasetNames to collect dataset maps only, simplify the sort comparator to operate on maps directly using findRawBlockObject, and remove the unused derived/proxy flag handling and commented partition/reverse code. Add a qDebug log to print dataset ID, name and size, and adjust the subsequent loop to iterate over datasetMaps. This cleans up collection/sorting logic and removes dead code.
Rename threadPool() to getThreadPool() in WorkflowPlanExecutor and update callers; add debug logging for dataset load and progress emissions. Introduce decodeBlockFromFileTo API and change decoding pipeline to run decode jobs first, then validate and memcpy decoded blocks into the target buffer in a sequential "Copy blocks" stage (keeps commented parallel copy attempt). Simplify BlobCodec file decode paths and minor cleanup. Remove WorkflowExecutionNotifier usage from WorkflowExecutionState and stop emitting notifyProgress/messages in WorkflowExecutionContext; also silence WorkflowPlan job verbose logging. These changes streamline decoding into preallocated buffers, surface more debug info, and simplify execution state notifications.
Introduce WorkflowPlan::executeOnCurrentThread and use it for synchronous decoding; switch decode workflow execution to run on the current thread via executeOnCurrentThread. Adjust progress handling: stop force-setting job progress to 1.0 in worker thread execution and warn when adding a child after progress has already started. Remove the Job::setWeight setter and add WorkflowPlan as a friend of AbstractWorkflowPlanExecutor. Also comment out an old memcpy loop in Serialization.cpp as part of the decode changes.
Introduce SharedWorkflowPlanExecutor (std::shared_ptr<AbstractWorkflowPlanExecutor>) and update the WorkflowPlan API and implementations to accept the shared pointer by const reference. Update AbstractProjectManager to return the shared executor and adjust call sites (DataHierarchyManager, Serialization) to pass the shared_ptr instead of dereferencing a raw pointer. Remove several active-workflow virtual methods from AbstractProjectManager and add a missing QElapsedTimer include in AbstractWorkflowPlanExecutor.h.
Add job progress modes and propagate them through execution contexts; switch WorkflowResult to hold a pointer to the execution context. Key changes:

- Add JobProgressMode enum (Automatic, Atomic, Nested) to WorkflowPlan and store per-Job progressMode.
- Thread affinity and execution invocation adjustments: specify JobThreadAffinity in DataHierarchyManager and Serialization, and use executeOnCurrentThread for plan execution.
- Extend WorkflowExecutionContext to store/get progressMode and accept it when creating child contexts; createChild uses progressMode to choose progress node behavior (atomic vs. child-based).
- Update WorkflowPlanExecutor to create child/job contexts with the job's progressMode and to finalize progress for Atomic and Automatic (no children) jobs; remove redundant try/catch in nested execution path.
- Change WorkflowResult to hold a WorkflowExecutionContext* instead of copying the context; update accessors to handle null pointer.
- ProjectManager now constructs a shared WorkflowPlanExecutor.
- Minor: add debugging output when loading datasets.

These changes enable finer-grained control of job completion semantics (atomic vs. nested progress), avoid expensive context copies, and ensure correct progress bookkeeping across different job types.
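One plausible reading of the three progress modes, sketched as a single node type. The semantics are inferred from the bullets above (Atomic jobs jump to 1.0 when finished, Nested jobs report their children, Automatic jobs behave like Nested when they have children and like Atomic otherwise); `ProgressNode` and its unweighted averaging are assumptions, not the actual WorkflowExecutionContext logic.

```cpp
#include <cassert>
#include <vector>

enum class JobProgressMode { Automatic, Atomic, Nested };

// Hypothetical progress node. Children report their own progress in [0, 1]; the
// mode decides whether the node reports them or only its own completion.
struct ProgressNode {
    JobProgressMode mode = JobProgressMode::Automatic;
    bool finished = false;
    std::vector<double> childProgress;

    double progress() const
    {
        const bool reportsChildren = mode == JobProgressMode::Nested
            || (mode == JobProgressMode::Automatic && !childProgress.empty());

        // Atomic jobs, and Automatic jobs without children, are all-or-nothing.
        if (!reportsChildren)
            return finished ? 1.0 : 0.0;

        // A Nested job that has not registered any children yet.
        if (childProgress.empty())
            return 0.0;

        double sum = 0.0;

        for (double value : childProgress)
            sum += value;

        // Unweighted mean for simplicity; real nodes may weight their children.
        return sum / static_cast<double>(childProgress.size());
    }
};
```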
Introduce helpers to read and estimate raw-block sizes (getRawBlockObjectSize, estimateRawBlockTotalSize) and use them when loading datasets to report sizes. Add Job::weighted to set job weights and apply a weight to dataset-load jobs so the workflow scheduler can account for dataset size. Implement WorkflowResultFuture methods (constructors, status/query helpers, makeReady/fromFuture) in the .cpp and simplify the header. Minor cleanup: replace direct Size access and some debug logs with the new helpers.
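Weighting a job by dataset size means a large dataset moves the overall progress more than a small one. A hedged sketch of that aggregation as a weighted mean; `WeightedJob` and `aggregateProgress` are hypothetical names, and the real scheduler may combine weights differently.

```cpp
#include <cassert>
#include <vector>

// Hypothetical job record: weight (e.g. raw-block size, falling back to 1.0) and progress in [0, 1].
struct WeightedJob {
    double weight = 1.0;
    double progress = 0.0;
};

// Weighted mean of job progress; jobs with non-positive weight are ignored.
double aggregateProgress(const std::vector<WeightedJob>& jobs)
{
    double weightedSum = 0.0;
    double totalWeight = 0.0;

    for (const WeightedJob& job : jobs) {
        if (job.weight <= 0.0)
            continue;

        weightedSum += job.weight * job.progress;
        totalWeight += job.weight;
    }

    return totalWeight > 0.0 ? weightedSum / totalWeight : 0.0;
}
```

For two jobs with weights 3 and 1, finishing only the heavier one yields 0.75 overall progress.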
Use raw block object size to weight dataset load jobs (fallback to 1.0) and simplify lambda captures in DataHierarchyManager. Switch from executeOnCurrentThread to execute when scheduling workflow plans so plans run on the provided executor. In toVariantMap, specify job thread affinity and progress mode for dataset save jobs. In Serialization, skip adding/executing the encode stage when there are no jobs, restore the decode loop to memcpy decoded blocks into the destination buffer, and use execute for the decode workflow plan to run on the shared executor. These changes fix scheduling/weighting and ensure encoding/decoding stages behave correctly and avoid no-op stages.
Replace intermediate decoded QByteArray copies with direct in-place decoding: introduce a char* overload populateDataBufferFromVariantMap(...) that writes decoded blocks straight into a caller-provided buffer, and add a QByteArray convenience overload that resizes then calls the char* variant. Update callers (Set, ClusterData, PointData) to use the new API and remove extra memcpy/resizing. Refactor Serialization to use decodeBlockFromFileTo to stream blocks into the destination buffer and remove the sequential copy stage. Add simple Windows-only memory stats and logMemory utilities (and a couple debug logMemory calls); update Serialization header documentation and function declarations accordingly.
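The overload structure described above (a `char*` variant that writes into a caller-owned buffer, plus a container overload that resizes once and delegates) can be sketched as follows. This is an illustration under assumptions: `Block` holds already-decoded bytes and `std::memcpy` stands in for the codec's in-place decode step; the real functions take a QVariantMap and a `QByteArray`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical: each Block holds one block's decoded bytes. In the real code a codec
// decodes straight into the destination; std::memcpy stands in for that step here.
using Block = std::vector<char>;

// Core overload: writes each block into a caller-owned buffer at a running offset.
void populateDataBuffer(const std::vector<Block>& blocks, char* destination, std::size_t capacity)
{
    std::size_t offset = 0;

    for (const Block& block : blocks) {
        if (block.empty())
            continue;

        // offset never exceeds capacity, so the subtraction cannot underflow.
        if (block.size() > capacity - offset)
            throw std::runtime_error("Destination buffer is too small");

        std::memcpy(destination + offset, block.data(), block.size());
        offset += block.size();
    }
}

// Convenience overload (mirrors the QByteArray variant): size the container once,
// then delegate to the char* overload.
void populateDataBuffer(const std::vector<Block>& blocks, std::vector<char>& destination)
{
    std::size_t totalSize = 0;

    for (const Block& block : blocks)
        totalSize += block.size();

    destination.resize(totalSize);
    populateDataBuffer(blocks, destination.data(), destination.size());
}
```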
Change streaming of inline encoded blocks to decode directly into the provided destination buffer instead of allocating temporary decoded buffers. Introduces decodeBlockFromBase64To(decodeJob, createCodec, destination) and updates populateDataBufferFromVariantMap to use it. Add a decoded-size mismatch check in the ZstdBlobCodec decoder and tidy up decode call formatting. Also update BlobCodec::decodeTo docs to recommend overriding for in-place decoding. Misc: comment out dataset Properties loading and add/remove some debug/log statements in PointData and DataHierarchyManager.
Add optimized serialization/deserialization for QVariantMaps/Lists to reduce memory and payload size for large homogeneous lists. New APIs and helpers (getVariantListHomogenousType, saveOptimizedVariantMap/saveOptimizedVariant/saveOptimizedVariantList, loadOptimizedVariantMap/loadOptimizedVariant/loadOptimizedVariantList and typed helpers) convert big homogeneous QVariantList values into compact typed arrays (Int32/UInt32/Int64/UInt64/Float32/Float64/Bool/QString) and nested maps. Bool arrays are stored as byte arrays; MinimumOptimizedListSize is 1024 to avoid overhead on small lists. DatasetImpl now uses loadOptimizedVariant/saveOptimizedVariantMap when reading/writing the "Properties" entry to enable the optimizations. Also avoid an extra QByteArray copy in deserializeVariantMap by constructing QDataStream directly from the input bytes, and restore a memory log call before loading raw point data in PointData. Header declarations for the new functions were added.
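A sketch of the two decisions the optimized path makes: whether a list is homogeneous, and whether it is large enough to be worth converting. `Value` (a `std::variant`) is a hypothetical stand-in for `QVariant`, `getHomogeneousTypeIndex` is an illustrative name rather than the PR's `getVariantListHomogenousType`, and only the bool-to-byte packing is shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-ins for QVariant and QVariantList.
using Value = std::variant<std::int64_t, double, bool, std::string>;
using ValueList = std::vector<Value>;

// Threshold quoted in the description above; smaller lists are stored unchanged.
constexpr std::size_t MinimumOptimizedListSize = 1024;

// Returns the shared alternative index if every element holds the same type.
std::optional<std::size_t> getHomogeneousTypeIndex(const ValueList& list)
{
    if (list.empty())
        return std::nullopt;

    const std::size_t index = list.front().index();

    for (const Value& value : list)
        if (value.index() != index)
            return std::nullopt;

    return index;
}

bool shouldOptimize(const ValueList& list)
{
    return list.size() >= MinimumOptimizedListSize && getHomogeneousTypeIndex(list).has_value();
}

// Booleans are packed one per byte ("stored as byte arrays"); assumes an all-bool list.
std::vector<std::uint8_t> packBools(const ValueList& list)
{
    std::vector<std::uint8_t> bytes;
    bytes.reserve(list.size());

    for (const Value& value : list)
        bytes.push_back(std::get<bool>(value) ? 1 : 0);

    return bytes;
}
```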
Add dedicated loadStringList to deserialize optimized QStringList blocks via QDataStream (Qt_6_8), validate stream status and element count, and convert to QVariantList. Refactor loadOptimizedVariantList to delegate QStringList handling to the new helper while preserving existing typed-array deserialization paths. Remove several debug/logMemory/qDebug calls from PointData::fromVariantMap and Points::fromVariantMap. Also add a QDir entryList call in saveStringList (currently unused) when accessing the project's temporary save directory.
Introduce a metrics subsystem to track workflow execution values and report bytes loaded when opening projects. Adds WorkflowMetric and WorkflowExecutionMetrics (headers and implementations), integrates metrics into WorkflowExecutionState, and stores/snapshots metrics on WorkflowResult via WorkflowPlanExecutor. ProjectOpenWorkflowPlan now registers a "project.data.bytes_loaded" integer metric, Serialization increments that metric when decoding data, and ProjectManager displays the human-readable bytes loaded in the project-open notification. Also updates CMake lists to include the new sources and silences some debug output in DataHierarchyManager.
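Since several decode jobs may add to `project.data.bytes_loaded` concurrently, an integer metric needs an atomic (or mutex-guarded) counter. A hedged sketch; `IntegerMetric` and `formatBytes` are hypothetical, and the commit message does not say which helper produces the human-readable string in the notification.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical integer metric (e.g. for "project.data.bytes_loaded"). Decode jobs on
// different worker threads may add to it concurrently, so the counter is atomic.
class IntegerMetric {
public:
    void add(std::int64_t delta) { _value.fetch_add(delta, std::memory_order_relaxed); }

    // Relaxed ordering suffices for a plain counter read after the jobs have finished.
    std::int64_t snapshot() const { return _value.load(std::memory_order_relaxed); }

private:
    std::atomic<std::int64_t> _value{ 0 };
};

// Formats a byte count with binary units; one decimal for anything above bytes.
std::string formatBytes(std::int64_t bytes)
{
    static const char* const units[] = { "B", "KiB", "MiB", "GiB", "TiB" };

    double value = static_cast<double>(bytes);
    int unit = 0;

    while (value >= 1024.0 && unit < 4) {
        value /= 1024.0;
        ++unit;
    }

    std::ostringstream stream;

    if (unit == 0)
        stream << bytes << ' ' << units[0];
    else
        stream << std::fixed << std::setprecision(1) << value << ' ' << units[unit];

    return stream.str();
}
```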
Replace beginFilterChange()/endFilterChange() sequences with invalidateFilter() in several filter models (OptionAction::StringsFilterModel, ColorSchemesFilterModel, ClustersFilterModel, ColorMapFilterModel) to simplify and reliably reapply filters. Simplify color map mirroring by using QImage::mirrored instead of conditional flipped() calls. Remove Qt6::TaskTree linking and the QtTaskTree include from CMake/headers (cleanup of duplicate/unused TaskTree references). Add missing QtConcurrent include in WorkflowPlanExecutor. Changes touch CMakeLists.txt, multiple action/model source files, ClusterData plugin, WorkflowPlanExecutor.cpp and Workflow.h.
Make cluster deserialization version-aware and add compatibility for app versions prior to 5.0.0: fromVariantMap now checks the project application version and delegates legacy payloads to new fromVariantMapPre500 which reconstructs packed indices and cluster entries. Improve error handling by catching exceptions and logging failures.

Also: comment out legacy sparse-point loading in PointData::fromVariantMap and adjust Points::toVariantMap to call rawDataToVariantMap without the removed boolean parameter. In Serialization::populateDataBufferFromVariantMap, stop requiring the CompressedSize key and default it to 0 when absent to make block decoding more robust.
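Routing by application version needs a component-wise version comparison, since comparing version strings lexically would, for example, rank "10.0.0" below "5.0.0". A minimal sketch; `Version`, `parseVersion`, and `selectDeserializer` are hypothetical helpers, and the returned names merely label which reader would run.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <tuple>

// Hypothetical parsed application version such as "4.9.3".
struct Version {
    int majorVersion = 0;
    int minorVersion = 0;
    int patchVersion = 0;

    // Compare component by component, not as strings.
    bool operator<(const Version& other) const
    {
        return std::tie(majorVersion, minorVersion, patchVersion)
             < std::tie(other.majorVersion, other.minorVersion, other.patchVersion);
    }
};

// Parses "major.minor.patch"; missing trailing components stay 0.
Version parseVersion(const std::string& text)
{
    Version version;
    char separator = 0;
    std::istringstream stream(text);

    stream >> version.majorVersion >> separator >> version.minorVersion >> separator >> version.patchVersion;

    return version;
}

// Payloads written before 5.0.0 take the legacy path.
std::string selectDeserializer(const std::string& applicationVersion)
{
    const Version legacyCutoff{ 5, 0, 0 };

    return parseVersion(applicationVersion) < legacyCutoff ? "fromVariantMapPre500" : "fromVariantMap";
}
```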
Make PointData deserialization aware of project application version: fromVariantMap now checks the current project's application version and dispatches to fromVariantMapPre500 for projects older than 5.0.0. Implemented new fromVariantMapPre500 (and a stub for Points::fromVariantMapPre500) and updated PointData.h to declare the new method. Also improved dense-data parsing, added error handling around data population, and left sparse-data handling commented for future work. Minor cleanup in ClusterData::fromVariantMap removed an unused variable.
Detect application version and run legacy deserialization for data saved by versions < 1.5.0. Introduces PointDataLegacySerialization include and routes older VariantMap formats through legacy::... helpers instead of the inline fromVariantMapPre150 implementation.

Other cleanups:
- Refactored fromVariantMapWorkflow to check app version, use clearer plan naming, and split allocate vs populate stages.
- Use safer QVariant conversions (toULongLong, QVariant::value with defaults) and consistent size_t types.
- Adjusted populateBytesFromBlobMapWorkflow call to pass the Raw map directly.
- Added a legacy loading stage for Points when opening older projects and delegated base dataset loading to fromVariantMapWorkflow.
- Removed the old fromVariantMapPre150 declaration from the header (moved to legacy implementation).

These changes restore backward compatibility while keeping current deserialization flow intact.
Replace the synchronous toVariantMap serialization with a workflow-based approach: rename toVariantMap to toVariantMapWorkflow and change its return type to UniqueWorkflowPlan. The new implementation builds a WorkflowPlan that adds a "Save" stage which publishes raw dense or sparse data into the execution context. Update Points::toVariantMapWorkflow to add a nested stage that runs the PointData raw-data workflow and store related metadata (number of points, dense flag). Add/update header docs for the new workflow-based fromVariantMapWorkflow and toVariantMapWorkflow APIs. This refactors serialization to be staged/asynchronous and enables publishing raw payloads during workflow execution.
Add a WorkflowHandle type and migrate workflow result publishing to an output-based API. Introduces WorkflowHandle (header + implementation) and exposes it in CMake. WorkflowExecutionContext now tracks an output id, supports registering child contexts, setting outputs (setOutput) and taking outputs by id or WorkflowHandle; child lookup and parent access helpers were added. WorkflowExecutionState stores outputs with thread-safe access (setOutput/takeOutput). Replaces multiple executionContext->publishResultValue(...) calls with executionContext->setOutput(...). WorkflowPlan API was changed so stage-adding methods return WorkflowHandle (addStage/addSequentialStage/addParallelStage/addNestedWorkflowStage/etc.) and internal stage/job IDs are generated to produce handles. TaskflowWorkflowPlanExecutor sets stage output ids when compiling stages. Misc: update various workflows (PointData, ClusterData, ClustersSerializer, ActionsManager, DataHierarchyManager, EventManager, PluginManager, WorkspaceManager, Serializable) to use the new output API, tweak PointData/Points logic and add debug traces, and update CMake sources list. Note: These are API/behavior changes and may be breaking for callers relying on the old publishResultValue/getResultScope behavior.
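The output-based API can be pictured as a mutex-guarded map from output ids to values, where `takeOutput` moves a value out so each output is consumed once. A sketch under that assumption; `OutputStore` and this `WorkflowHandle` are simplified, hypothetical versions, and `std::any` stands in for whatever value type the real execution state stores.

```cpp
#include <any>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <optional>
#include <unordered_map>
#include <utility>

// Hypothetical handle identifying the stage or job whose output is wanted.
struct WorkflowHandle {
    std::uint64_t id = 0;
};

// Simplified output store: jobs on different worker threads set outputs,
// and a later stage takes (moves out) the output it needs.
class OutputStore {
public:
    void setOutput(std::uint64_t outputId, std::any value)
    {
        std::lock_guard<std::mutex> lock(_mutex);
        _outputs[outputId] = std::move(value);
    }

    std::optional<std::any> takeOutput(const WorkflowHandle& handle)
    {
        std::lock_guard<std::mutex> lock(_mutex);

        auto it = _outputs.find(handle.id);

        if (it == _outputs.end())
            return std::nullopt;

        std::any value = std::move(it->second);
        _outputs.erase(it);

        return std::optional<std::any>(std::in_place, std::move(value));
    }

private:
    std::mutex _mutex;
    std::unordered_map<std::uint64_t, std::any> _outputs;
};
```

Taking rather than reading an output makes ownership explicit: once a downstream stage has consumed a result, a second `takeOutput` for the same handle finds nothing.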
Introduce output-forwarding support so nested workflow outputs can propagate to their parent contexts. WorkflowExecutionContext: add _forwardOutputToParent flag, isOutputForwarding(), setOutputForwarding(), and hasExplicitOutputId(). TaskflowWorkflowPlanExecutor: when creating stage contexts, inherit the parent's outputId if output forwarding is enabled; for compiled child jobs set the child's outputId and enable forwarding, then capture the nested output after subflow completion and apply it to the parent job context if valid. This ensures explicit output IDs and nested results are correctly forwarded and picked up by parent contexts.
Refactor serialization workflows and standardize how stage outputs are produced/consumed across the codebase. Key changes:

- Project/Points/PointData/DataHierarchyManager: switch from ad-hoc scoped contexts to using WorkflowExecutionContext outputs (takeOutput/setOutput) and nested workflow stages; assemble final maps by collecting outputs from nested handles.
- Serializable: remove synchronous toJsonDocumentScoped/toJsonFileScoped; introduce toJsonDocument(), toJsonFile(), and a new toJsonFileWorkflow() that writes JSON as a workflow stage. Replace scoped calls with toJsonFile/toJsonFileWorkflow where appropriate.
- TaskflowWorkflowPlanExecutor: prefer final output from root context (takeOutput) and fall back to legacy result values; add getFinalStageHandle() and integrate publishing of final stage output into compiled workflow; remove reliance on output-forwarding flag on contexts.
- ProjectSaveWorkflowPlan/WorkspaceManager/ApplicationConfigurationAction/PresetsAction: use new workflow-based JSON file saving (toJsonFile/toJsonFileWorkflow) instead of scoped helpers.
- PointData/Points: improve handling of raw/sparse data serialization (preserve result map across stages, use reinterpret_cast, avoid early returns missing output), and thread-affinity annotations for dataset stages.
- DataHierarchyManager: collect item and dataset maps via nested stages, then assemble hierarchy from collected maps; remove legacy context object usage and add debug log.

Also includes minor fixes and cleanups (casts, lambda captures, small API adjustments) to support the unified workflow contract.
Remove the thread-safe ToVariantMapWorkflowContext and refactor toVariantMapWorkflow to pass intermediate data between workflow stages via the WorkflowExecutionContext outputs. Add explicit sequential stages to save the current workspace and then save dock managers, using executionContext->setOutput / takeOutput to transfer the "CurrentWorkspace" map instead of a shared mutex-protected map. Simplifies concurrency, removes the mutex/variant-map helper, and consolidates construction of the DockManagers entry into the workspace map.
Stop embedding the Dataset variant in DataHierarchyItem::toVariantMapScoped to avoid serializing the dataset inline. Remove the noisy per-serializer warning for >=1000 clusters in ClustersSerializer and instead emit a higher-threshold warning (>500,000 clusters) in Clusters::toVariantMapWorkflow before saving raw data, including a human-readable count. This reduces spurious warnings for moderately sized datasets and warns only for extremely large cluster sets that may impact serialization performance.
Migrate synchronous property/dataset serialization to workflow-based plans. PropertiesSerializer::toVariantMap was replaced by toVariantMapWorkflow that returns a UniqueWorkflowPlan and adds a Serialize stage; header updated with docs. DatasetImpl now provides toVariantMapWorkflow (replacing the old toVariantMap and removing the legacy fromVariantMapPre150), composing nested workflow stages for WidgetAction and PropertiesSerializer and assembling the final QVariantMap in a sequential stage. PointData workflow stage lambdas were simplified and explicit GUI thread affinity arguments removed. This change enables asynchronous, composable serialization and requires callers to use the new workflow-based APIs.
Replace ad-hoc VariantMapWorkflowContext usage with explicit nested workflow stages and unify stage naming/handling across ClusterData, Clusters, PointData and Points.

- ClusterData: remove manual context; add nested "Save raw data base" and "Serialize clusters" stages; add Preflight warning for very large cluster sets; assemble final output from nested stage outputs in a dedicated "Save data" stage.
- Clusters: remove context and sequential context stages; use nested "Save dataset base" and "Encode raw data" stages and compose the final map by taking outputs from those stages.
- PointData / Points: rename stage handle variables from *Handle to *Stage, replace resultMap with outputMap, use executionContext->takeOutput(...) consistently and insert serialized/raw payloads into the dataset map; streamline saving/encoding flow.

Overall this makes the save/export workflows more modular, consistent, and easier to compose and reason about.
Qualify the WidgetAction call with this-> in Set.cpp to resolve dependent-name lookup and ensure the correct member is found inside the lambda. Add Doxygen-style documentation to Set.h for fromVariantMapWorkflow and toVariantMapWorkflow, describing their purpose, parameters, return values, and referencing the Serializable contract and execution semantics.
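For context, the usual reason `this->` is required is two-phase lookup: inside a class template, unqualified names are not looked up in a base class that depends on a template parameter. Whether `Set` itself is templated is not stated in the commit; the sketch below, with hypothetical `ActionHost`/`DatasetLike` classes, only shows the general rule.

```cpp
#include <cassert>
#include <string>

// Hypothetical templated base; its members are "dependent" inside derived templates.
template <typename T>
class ActionHost {
public:
    std::string widgetActionName() const { return "WidgetAction"; }
};

template <typename T>
class DatasetLike : public ActionHost<T> {
public:
    std::string describe() const
    {
        auto serialize = [this]() {
            // An unqualified widgetActionName() would not be found here: names from a
            // dependent base are not looked up when the template is defined.
            return this->widgetActionName();
        };

        return serialize();
    }
};
```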
Convert stage outputs to maps when publishing the project (plugins, actions, events, workspaces) so the resulting variant map contains concrete QVariantMap entries. When writing JSON, wrap the created map under the serialization name so the top-level JSON document is a valid object. Also remove explicit thread-affinity and priority arguments from nested workflow stage creation to rely on default parameters and simplify the API usage.

Files changed: ManiVault/src/Project.cpp, ManiVault/src/private/ProjectSaveWorkflowPlan.cpp, ManiVault/src/util/Serializable.cpp.
Drop the optional parent SharedWorkflowExecutionContext parameter from fromVariantMapWorkflow throughout the codebase and update all call sites and lambdas to use the execution context provided by workflow stages. Added Doxygen-style documentation for fromVariantMapWorkflow/toVariantMapWorkflow in multiple headers. Files updated include Project, Set, ClusterData, PointData, ActionsManager, DataHierarchyManager, EventManager, PluginManager, WorkspaceManager, ProjectOpenWorkflowPlan and related headers. This simplifies the serialization API by avoiding redundant context forwarding and clarifies stage-local executionContext usage.
Tidy and refactor workspace JSON handling and project save workflow. Initialize QJsonDocument inline when reading workspace JSON. Convert several ProjectSaveWorkflowPlan stages from sequential to nested workflow stages, remove explicit thread-affinity/priority args, and return the workspace save step as a nested workflow (using toJsonFileWorkflow) instead of calling saveWorkspace directly. In WorkspaceManager: simplify loadWorkspace flow by removing surrounding try/catch, reformat control flow, fix file dialog widget population and preview binding (use workspace.getCommentsAction() for comments), ensure begin/end load calls are paired correctly, and adjust toVariantMapWorkflow to extract the "CurrentWorkspace" map from the saved output.
Remove the executionContext parameter from ClustersSerializer::fromVariantMapWorkflow and update its declaration and call-site in ClusterData. Rework the workflow into clearer sequential stages: read metadata and indices (with memcpy into allIndices) and a separate "Prepare clusters" stage that resizes the clusters vector. Rename local output map (result -> outputMap) in toVariantMapWorkflow and call executionContext->setOutput(outputMap). Minor formatting and layout cleanups.
Clean up ClusterData/ClustersSerializer/DataHierarchyManager/ProjectManager: remove unused lambda parameters, tidy map assignments (merge serialize output into save output and use operator[] for explicit keys), strip a stray debug qDebug() and an extraneous comment, and format serialized blob assignments. Also set explicit named fields for WorkflowExecutionOptions (including maxConsoleLogDepth) when executing the save workflow. These changes are small refactors to reduce warnings, improve clarity, and adjust executor configuration.
Switch dimension and cluster serialization to use the workflow execution context and nested workflows. Key changes:

- Project.cpp: removed an obsolete commented manager load stage.
- ClustersSerializer: lambda now uses executionContext (instead of parentExecutionContext) when extracting blob bytes.
- DimensionNamesSerializer: converted to a workflow-returning API (toVariantMapWorkflow) that creates a sequential stage to serialize dimension names and sets the stage output; header updated accordingly.
- PointData: added a nested "Serialize dimensions" workflow stage that invokes the new DimensionNamesSerializer workflow, and updated the save stage to consume the nested stage's output.

These changes enable asynchronous/nested workflow composition and ensure blob conversions use the correct execution context.
Convert manager and dataset load jobs into nested workflow stages so each manager/dataset can return its own UniqueWorkflowPlan and execute in the shared execution context. Forward executionContext to bytesFromBlobVariantMap in Points plugin to ensure proper deserialization. Remove prior batched/parallel dataset-loading heuristics in DataHierarchyManager and instead add per-dataset nested workflow stages; also add a debug qDebug() log when loading. These changes simplify orchestration and ensure nested workflows receive the execution context they need.
Introduce support for scheduling nested workflow jobs as parallel stages and adjust executor behavior. WorkflowPlan: add JobProgressMode parameter for nested jobs, expose getHandle(), and add addParallelNestedWorkflowStage with validation for nested/GUI-thread jobs. DataHierarchyManager: remove debug log, pre-reserve dataset job vector, build dataset jobs as nested workflow jobs (with progress mode Nested) and add them as a parallel "Load datasets" stage. TaskflowWorkflowPlanExecutor: minor signature formatting, set child context output id to the job id, and stop propagating nested workflow outputs (takeOutput commented out). These changes enable running dataset nested workflows in parallel and improve job/progress handling.
Rename DataHierarchyItem::toVariantMapScoped -> toVariantMap and update call sites to use the simpler serialization API. Add debug logging in Project and DataHierarchyManager to inspect output maps.

Introduce explicit Job output ID support in WorkflowPlan (setOutputId/getOutputId and backing field) and set the nested-stage job's outputId to the stage handle so nested workflow outputs can be routed by handle. Change addNestedWorkflowStage to create and register a Stage with its Job output ID assigned.

Update TaskflowWorkflowPlanExecutor to create nested child contexts using job metadata (name, weight, progress mode) and to set the child context's output id from the job's outputId before creating the nested plan. Automatic propagation of nested outputs to the parent is commented out, so outputs must now be routed explicitly; the old code is left in place for reference.

Fix JSON serialization by converting executionContext outputs to maps (toMap()) before building QJsonDocument. Replace a batched parallel stage call with addParallelStage for block decode jobs.

Overall these changes centralize nested-job output routing, clean up serialization calls, and add lightweight debugging to help trace saved item/dataset maps.
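The output-routing idea above can be illustrated with a minimal sketch. It assumes a simplified model in which a job carries an output id (the stage handle), the child context inherits that id, and a nested workflow publishes its result into the parent explicitly, keyed by that id, rather than having it propagated automatically. `Job`, `ExecutionContext`, `createChild`, `publishToParent` and `getOutput` are hypothetical stand-ins, not ManiVault's actual WorkflowPlan or WorkflowExecutionContext API, and outputs are plain strings here instead of variant maps.

```cpp
#include <map>
#include <memory>
#include <string>

// Hypothetical, simplified stand-in for a workflow job.
struct Job {
    std::string name;
    std::string outputId; // assumed to be set to the stage handle for nested stages
};

// Hypothetical execution context that stores nested outputs by output id.
class ExecutionContext {
public:
    explicit ExecutionContext(ExecutionContext* parent = nullptr) : _parent(parent) {}

    // Create a child context whose output id is taken from the job metadata.
    std::unique_ptr<ExecutionContext> createChild(const Job& job) {
        auto child = std::make_unique<ExecutionContext>(this);
        child->_outputId = job.outputId;
        return child;
    }

    // Explicitly publish a result to the parent under this context's output id.
    // Contexts without an output id publish nothing.
    void publishToParent(const std::string& value) const {
        if (_parent && !_outputId.empty())
            _parent->_outputs[_outputId] = value;
    }

    // Look up a nested stage's output by handle; empty if nothing was published.
    std::string getOutput(const std::string& outputId) const {
        const auto it = _outputs.find(outputId);
        return it != _outputs.end() ? it->second : std::string{};
    }

private:
    ExecutionContext* _parent = nullptr;
    std::string _outputId;
    std::map<std::string, std::string> _outputs;
};
```

A consuming stage (for example a save stage reading the "Serialize dimensions" result) would then look the value up by the nested stage's handle instead of relying on implicit propagation.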
Remove leftover qDebug() statements from toVariantMapWorkflow in Project.cpp and DataHierarchyManager.cpp to reduce noisy console output during workflow serialization. No functional changes; only debugging prints were removed.
Increase the default block size from 32 to 64 MiB and lower the default Zstd compression level from 3 to 2. Also remove the explicit ZstdCodecSettingsAction destructor (declaration and implementation) and rely on the compiler-generated default, removing trivial boilerplate.
Change dataset load jobs in DataHierarchyManager from WorkflowPlan::JobProgressMode::Nested to ::Atomic so dataset loads are reported as atomic units. In WorkflowExecutionContext::createChild, tighten the conditions that trigger the "adding child to progress node after progress already started" warning (only when parent progressMode is Automatic, child is not a NestedWorkflow, parent progress is in (0,1), and effective weight > 0) and include richer diagnostic fields (parent path, node types, progressMode, child info). Also perform minor formatting cleanup in makeRoot.
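The tightened warning condition can be written as a single predicate. This is a sketch that assumes simplified types: `ProgressMode`, `NodeType`, `ProgressNode` and `shouldWarnLateChild` are hypothetical names, and the `Manual` enumerator only exists to show a non-Automatic parent. The four conditions themselves are the ones listed in the commit message above.

```cpp
// Hypothetical, simplified progress-tree types.
enum class ProgressMode { Automatic, Manual };
enum class NodeType { Job, NestedWorkflow };

struct ProgressNode {
    ProgressMode progressMode;
    double progress; // expected to lie in [0, 1]
};

// True when adding a child should log the
// "adding child to progress node after progress already started" warning.
bool shouldWarnLateChild(const ProgressNode& parent, NodeType childType, double effectiveWeight)
{
    const bool parentIsAutomatic = parent.progressMode == ProgressMode::Automatic;
    const bool childIsNested     = childType == NodeType::NestedWorkflow;

    // Open interval (0, 1): progress has started but has not finished.
    const bool progressStarted = parent.progress > 0.0 && parent.progress < 1.0;

    // A zero-weight child cannot distort the parent's progress, so it is not reported.
    return parentIsAutomatic && !childIsNested && progressStarted && effectiveWeight > 0.0;
}
```

Requiring all four conditions keeps the warning for the one case that actually skews an automatically aggregated progress bar: a weighted, non-nested child attached while the parent is already mid-way.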
Adjust workflow stage signatures and thread affinities, and harden nested workflow execution and cluster deserialization.

Highlights:
- Project/DataHierarchyManager/DimensionNames: adapt stage lambdas to accept SharedWorkflowExecutionContext and specify GUI thread affinity where needed (populateDataHierarchy runs on GuiThread).
- ClustersSerializer: rewrite deserialization to use parallel data-loading jobs that populate byte buffers, then sequentially deserialize headers and rebuild clusters; remove the old, commented-out rebuild helper. Uses populateBytesFromBlobMapWorkflow and avoids nested per-chunk rebuild complexity.
- TaskflowWorkflowPlanExecutor: wrap nested workflow creation/compilation/join in try/catch, report failures to child context for ManiVaultException, std::exception and unknown exceptions, and ensure lifecycle.finish() is called on success.
- WorkflowPlan.h: add static_assert fallback for unsupported stage function signatures.

These changes improve correctness around thread affinity, make cluster loading more efficient and robust, and provide clearer failure reporting for nested workflows.
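The failure-reporting pattern described for TaskflowWorkflowPlanExecutor can be sketched as follows. This assumes a hypothetical `ChildContext` with `reportFailed`/`finish` methods and a stand-in `ManiVaultException` derived from `std::runtime_error`; the real ManiVault types may differ. The catch order matters: the most specific type must come first, and `catch (...)` goes last.

```cpp
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for ManiVault's exception type.
class ManiVaultException : public std::runtime_error {
public:
    using std::runtime_error::runtime_error;
};

// Hypothetical child context that records how a nested workflow ended.
struct ChildContext {
    bool finished = false;
    std::string failure;

    void reportFailed(const std::string& reason) { failure = reason; }
    void finish() { finished = true; }
};

// Create, compile and join a nested workflow; failures are reported to the
// child context instead of escaping into the parent's task graph.
void runNestedWorkflow(ChildContext& child, const std::function<void()>& createCompileAndJoin)
{
    try {
        createCompileAndJoin();
        child.finish(); // only reached on success
    }
    catch (const ManiVaultException& e) { // most specific first
        child.reportFailed(std::string("ManiVault error: ") + e.what());
    }
    catch (const std::exception& e) {
        child.reportFailed(std::string("Standard exception: ") + e.what());
    }
    catch (...) {
        child.reportFailed("Unknown exception");
    }
}
```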
Move the DimensionNamesLoadContext into the .cpp as a local Context struct and rename fields for clarity. Replace inline blob decoding with a nested populateBytesFromBlobMapWorkflow stage, update workflow stage signatures and usages, and remove the now-unnecessary protected context struct from the header. This isolates implementation details from the header, simplifies asynchronous byte population, and cleans up naming.
Add stage weight parameters and refine threading/affinity for workflow stages to better prioritize work, and make task progress reflect workflow state.

- Stages: add weights and refine thread affinity (e.g. manager load stages in Project, dataset save/load stages, and the project-open nested stage).
- WorkflowExecutionContext: add syncTaskProgress and call it from reportFinished/reportFailed/reportSkipped/setProgress; these report methods are now non-const so they can sync progress.
- TaskflowWorkflowPlanExecutor: use a modal GUI scope when reporting progress, log GUI scopes for debugging, and set job progress to 1.0 for Atomic jobs after completion.
- TasksFilterModel: fix an accidental early return in filterAcceptsRow.
- Bump _maxConsoleLogDepth from 6 to 8.

These changes improve progress reporting accuracy and UI behavior during workflow execution.
Replace the previous parallel dataset stage with a batched parallel stage to limit concurrency: compute cores via std::thread::hardware_concurrency(), derive batchSize = clamp(cores/4, 4, 16), and call addBatchedParallelStage("Load datasets", ...) instead of addParallelStage. Also remove a noisy qDebug() in DimensionNamesSerializer that printed dimension names. The old addParallelStage call is left commented for reference.
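The batch-size formula can be sketched in standard C++. `computeBatchSize` and `defaultLoadBatchSize` are hypothetical helper names; only `std::thread::hardware_concurrency()` and `std::clamp` are real standard-library calls. Note that `hardware_concurrency()` is allowed to return 0 when the core count cannot be determined, which this sketch treats as a single core so the lower bound applies.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>

// batchSize = clamp(cores / divisor, lo, hi); assumes divisor > 0 and lo <= hi.
std::size_t computeBatchSize(unsigned cores, unsigned divisor, std::size_t lo, std::size_t hi)
{
    // hardware_concurrency() may return 0 if the value is not computable.
    const std::size_t effectiveCores = cores == 0 ? 1 : cores;
    return std::clamp(static_cast<std::size_t>(effectiveCores / divisor), lo, hi);
}

// The "Load datasets" variant from this commit: clamp(cores / 4, 4, 16).
std::size_t defaultLoadBatchSize()
{
    return computeBatchSize(std::thread::hardware_concurrency(), 4, 4, 16);
}
```

With 8 cores this yields 4 (the lower bound), with 32 cores 8, and with 128 cores 16 (the upper bound). The later change that uses cores/2 clamped to 4..32 corresponds to `computeBatchSize(cores, 2, 4, 32)`.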
Set thread affinities and scheduling weights for multiple workflow stages, and tweak defaults and executor tracing.

- Add WorkflowPlan::JobThreadAffinity and weight parameters to nested/sequential stages in Project and ProjectSaveWorkflowPlan to better control scheduling.
- Increase DataHierarchyManager parallelism: batchSize now uses cores/2 (clamped to 4..32) and the "Save datasets" parallel stage weight is raised to 80.
- Adjust default codec settings: block size default 64->32 MiB and zstd compression level 2->3.
- Enhance TaskflowWorkflowPlanExecutor: create a ChromeObserver, store it on the executor, and dump a chrome trace file (workflow_trace_YYYYMMDD_HHMMSS.json) after root execution for debugging; add necessary includes and member variable.

These changes aim to improve task scheduling, parallel I/O throughput, and developer visibility into taskflow execution.
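The trace file name pattern `workflow_trace_YYYYMMDD_HHMMSS.json` can be produced with `std::strftime`. This is a sketch using the C++ standard library; the actual executor may build the name with Qt date/time classes instead, and `makeTraceFileName`/`makeTraceFileNameNow` are hypothetical names. Wiring up the Taskflow ChromeObserver and writing the dump to this file is not shown.

```cpp
#include <ctime>
#include <string>

// Format a broken-down time as workflow_trace_YYYYMMDD_HHMMSS.json.
std::string makeTraceFileName(const std::tm& time)
{
    char stamp[16]; // "YYYYMMDD_HHMMSS" is 15 characters plus the terminating NUL
    if (std::strftime(stamp, sizeof(stamp), "%Y%m%d_%H%M%S", &time) == 0)
        return "workflow_trace.json"; // fallback if the timestamp did not fit

    return std::string("workflow_trace_") + stamp + ".json";
}

// Name based on the current local time.
std::string makeTraceFileNameNow()
{
    const std::time_t now = std::time(nullptr);
    const std::tm* local = std::localtime(&now); // not thread-safe; acceptable for a one-off debug dump
    return local ? makeTraceFileName(*local) : std::string("workflow_trace.json");
}
```

A timestamped name keeps traces from successive runs side by side instead of overwriting each other.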
Introduce a static helper makeTraceName(QString, QString) and use it to set human-readable task names for job tasks (including nested workflows), stage start/end tasks, and the workflow output task. Also tidy up some function signature formatting for compileSequentialStage/compileParallelStage. This improves traceability in Taskflow/ChromeObserver views by providing consistent names built from kind and element name.
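A helper like makeTraceName can be sketched with `std::string` in place of `QString`. The `"<kind>: <name>"` format and the fallback to the bare kind for unnamed elements are assumptions for illustration; the commit does not state the exact format.

```cpp
#include <string>

// Hypothetical format: "<kind>: <name>", or just "<kind>" when the element is unnamed.
std::string makeTraceName(const std::string& kind, const std::string& name)
{
    return name.empty() ? kind : kind + ": " + name;
}
```

Using one helper for jobs, nested workflows, stage start/end tasks and the output task keeps every entry in the trace viewer in the same "kind: name" shape, which makes related tasks easy to spot and filter.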
Extract workflow-related headers/sources from util/ into a dedicated workflow/ module and update build config and references accordingly. CMakeMvSourcesPublic.cmake: add PUBLIC_WORKFLOW_* lists, remove those files from the PUBLIC_UTIL lists, and add a Workflow source_group. Code changes: update includes (util/... -> workflow/...), switch types and using-directives to mv::workflow (e.g. UniqueWorkflowPlan, WorkflowMessage, WorkflowGuiThreadDispatcher, SharedWorkflowResult), and adjust various classes to use the workflow namespace. Misc: delete/relocate several legacy util workflow files and update the call in DataHierarchyManager to use addBatchedParallelStage("Save datasets", ..., 8) instead of addParallelStage. This reorganizes the workflow API into its own module and aligns source references and build targets.
Add missing headers and small API/namespace fixes across the codebase: include util/Serialization, CoreInterface, Application, BlobCodec, SeverityLevel, ManiVaultException and various Qt headers where needed. Fully qualify SeverityLevel usages as util::SeverityLevel and update ManiVaultException constructor/members accordingly. Adjust a few namespaces/usings and minor callsites (e.g. Archiver usages, BlobCodec type returns, exception construction) to match the refined types. These are maintenance/refactor changes to resolve compile errors and standardize include/namespace usage.

Labels: amendment (A change to the code to improve the quality), architecture, enhancement (New feature or request)
