Ingestion module for foundational DP #276

Open
Aashutosh-cognite wants to merge 1 commit into foundational-dp-cleanup from foundational-dp-ingestion

Conversation

@Aashutosh-cognite
Contributor

@Aashutosh-cognite Aashutosh-cognite commented May 20, 2026

depends on PR #275

This module owns the ingestion workflow, transformation definitions, auth groups, and data model configuration tooling for the Foundation Deployment Pack. It lives under modules/common/ and is registered in packages.toml as part of dp:foundation.

What's included

Two-phase workflow driven entirely by config flags — no YAML editing required when toggling a source on or off:

  • Phase 1 (Population) — transformation tasks for PI, OPC-UA, and SAP run in parallel, landing data into the active DM views (ISATimeSeries, ISAAsset, Equipment, WorkOrder, Operation for ISA; FunctionalLocation, TimeSeriesData, Files for CFIHOS).
  • Phase 2 (Contextualization) — relationship transforms run after population completes, setting Equipment.asset and Operation.workOrder properties.

Which phases and tasks are included is controlled by enabledSources, enabledContextualization, and dataModelVariant in default.config.yaml.
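
As a rough illustration of the flag-driven selection described above, the sketch below shows one way config flags could decide which tasks land in each phase. It is not the PR's build_workflow.py: the task names, the `IngestionConfig` class, and `select_tasks` are hypothetical, and `enabledContextualization` is assumed here to be a boolean. `dataModelVariant` is left out for brevity.

```python
from dataclasses import dataclass

# Hypothetical task catalogue keyed by source; the real snippet names may differ.
POPULATION_TASKS = {
    "pi": ["pi_timeseries"],
    "opcua": ["opcua_timeseries"],
    "sap": ["sap_equipment", "sap_workorders"],
}
# Hypothetical relationship transforms that run after population.
CONTEXTUALIZATION_TASKS = ("equipment_to_asset", "operation_to_workorder")


@dataclass(frozen=True)
class IngestionConfig:
    """Assumed shape of the relevant flags from default.config.yaml."""
    enabled_sources: tuple[str, ...]
    enabled_contextualization: bool = True


def select_tasks(cfg: IngestionConfig) -> dict[str, list[str]]:
    """Return the task names for each phase, given the active flags."""
    unknown = [s for s in cfg.enabled_sources if s not in POPULATION_TASKS]
    if unknown:
        raise ValueError(f"unknown sources: {unknown}")
    # Phase 1: every enabled source contributes its population tasks.
    population = [
        task for source in cfg.enabled_sources for task in POPULATION_TASKS[source]
    ]
    # Phase 2: only meaningful when enabled and something was populated.
    contextualization = (
        list(CONTEXTUALIZATION_TASKS)
        if cfg.enabled_contextualization and population
        else []
    )
    return {"population": population, "contextualization": contextualization}
```

With this shape, toggling a source off only changes the `enabled_sources` tuple; no task list has to be edited by hand.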

Key files

  • scripts/build_workflow.py — generates wf_ingestion_v1.WorkflowVersion.yaml from per-task snippets based on the active config. Run with --check in CI to detect drift.
  • scripts/configure_datamodel.py — writes DM-variant variable overrides (schemaSpace, view names, instanceSpace) into all discovered config.<env>.yaml files, covering both contextualization modules (cdf_entity_matching, cdf_file_annotation) and source system modules (cdf_pi_foundation, cdf_sap_foundation, cdf_opcua_foundation, cdf_files_foundation).
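
The `--check` mode mentioned for build_workflow.py can be pictured as a compare-instead-of-write step. The sketch below is an assumption about how such a check might work, not the script's actual code: `matches_on_disk` and `run` are hypothetical names, and the generated YAML is passed in as a plain string.

```python
import sys
from pathlib import Path


def matches_on_disk(generated: str, target: Path) -> bool:
    """True when the committed file equals the freshly generated text."""
    return target.exists() and target.read_text(encoding="utf-8") == generated


def run(argv: list[str], generated: str, target: Path) -> int:
    """Write the generated file, or only compare it when --check is given."""
    if "--check" in argv:
        if matches_on_disk(generated, target):
            return 0
        # Non-zero exit lets CI fail when the committed file is stale.
        print(f"{target} is out of date; re-run the generator", file=sys.stderr)
        return 1
    target.write_text(generated, encoding="utf-8")
    return 0
```

In CI, the return value would be passed to `sys.exit`, so a stale wf_ingestion_v1.WorkflowVersion.yaml fails the job instead of being silently regenerated.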

@gemini-code-assist
Contributor

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page. Here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

@Aashutosh-cognite
Contributor Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces the cdf_ingestion_foundation module, which provides a framework for orchestrating two-phase ingestion workflows (population and contextualization) across PI, OPC-UA, and SAP source systems. It includes a Python generator script to build workflow versions from task snippets, various SQL transformations, and authorization group definitions. Review feedback primarily addresses Python style guide violations in the build script—such as import sorting, the need for typed data structures (dataclasses/Pydantic), and proper logging—as well as security recommendations to restrict overly broad wildcard scopes in the authorization group capabilities.
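
To make the feedback about typed data structures and logging concrete, here is a minimal sketch of what that style could look like in a generator script. The `TaskSnippet` fields, the `externalId`/`phase`/`dependsOn` keys, and `load_snippets` are illustrative assumptions and are not taken from build_workflow.py.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("build_workflow")


@dataclass(frozen=True)
class TaskSnippet:
    """Typed view of one task snippet (field names are hypothetical)."""
    external_id: str
    phase: str
    depends_on: tuple[str, ...] = ()

    @classmethod
    def from_dict(cls, raw: dict) -> "TaskSnippet":
        # Fail early with a clear message instead of a KeyError deep in the build.
        missing = {"externalId", "phase"} - raw.keys()
        if missing:
            raise ValueError(f"task snippet is missing keys: {sorted(missing)}")
        return cls(
            external_id=raw["externalId"],
            phase=raw["phase"],
            depends_on=tuple(raw.get("dependsOn", ())),
        )


def load_snippets(raw_items: list[dict]) -> list[TaskSnippet]:
    """Convert raw dicts to TaskSnippet objects, logging any that are invalid."""
    snippets = []
    for raw in raw_items:
        try:
            snippets.append(TaskSnippet.from_dict(raw))
        except ValueError as exc:
            # logging instead of print, so callers control verbosity and output.
            logger.warning("Skipping invalid snippet: %s", exc)
    return snippets
```

Whether invalid snippets should be skipped or should abort the build is a design choice; skipping is used here only to show the logging call.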

Comment thread modules/common/cdf_ingestion_foundation/scripts/build_workflow.py
Comment thread modules/common/cdf_ingestion_foundation/scripts/build_workflow.py Outdated
Comment thread modules/common/cdf_ingestion_foundation/scripts/build_workflow.py Outdated
Comment thread modules/common/cdf_ingestion_foundation/scripts/build_workflow.py
Contributor

@BergsethCognite BergsethCognite left a comment

Should we remove the Transformation examples here - maybe just use this module to set up the generic auth groups?

@Aashutosh-cognite
Contributor Author

I have removed the transformations. Right now this contains only transformation configs and SQL placeholder files. Should I remove those as well?

By generic auth groups, do you mean project-level auth groups?

Aashutosh-cognite force-pushed the foundational-dp-cleanup branch from a14c818 to c44274d on May 26, 2026 05:44
Orchestrates two-phase ingestion workflow (population + contextualization)
with config-driven task generation via build_workflow.py and
configure_datamodel.py. Rebased onto foundational-dp-cleanup so this PR
contains only the ingestion foundation module.

Co-authored-by: Cursor <cursoragent@cursor.com>
Aashutosh-cognite force-pushed the foundational-dp-ingestion branch from 9b63eb7 to 48772c0 on May 26, 2026 10:00
@valnaumova
Contributor

Rename cdf_ingestion_foundation to cdf_project_setup or something similar, containing the auth groups and a script that cleans auth groups and populates the relevant configs.
