Module id fix #294

Open
charanraju15 wants to merge 3 commits into main from module_id_fix

@charanraju15 charanraju15 commented May 25, 2026

Summary

Normalizes library module id values to the dp:<pack>:<slug> convention documented in ADDING_PACKAGES_AND_MODULES.md, and aligns Mixpanel usage tracking and Qualitizer deployment-pack ids with those canonical ids.

Previously, many modules used legacy human-readable ids (CDF Common, P&ID Annotation), bare slugs (report_quality), or accelerator-style ids (dp:acc:…). New and updated modules now use stable pack-scoped ids that match package_id and folder layout.

Module id changes (16 module.toml updates)

| Module | Old id | New id |
| --- | --- | --- |
| common/cdf_common | CDF Common | dp:common:cdf_common |
| common/cdf_ingestion | dp:acc:cdf_ingestion | dp:common:cdf_ingestion |
| common/cdf_search | dp:acc:industrial_tools:cdf_search | dp:common:cdf_search |
| contextualization/cdf_entity_matching | CDF Entity Matching | dp:contextualization:cdf_entity_matching |
| contextualization/cdf_file_annotation | dp:acc:contextualization:cdf_file_annotation | dp:contextualization:cdf_file_annotation |
| contextualization/cdf_connection_sql | dp:acc:ctx:cdf_connection_sql | dp:contextualization:cdf_connection_sql |
| contextualization/cdf_p_and_id_annotation | P&ID Annotation | dp:contextualization:cdf_p_and_id_annotation |
| contextualization/cdf_p_and_id_parser | P&ID Parser | dp:contextualization:cdf_p_and_id_parser |
| dashboards/context_quality | Contextualization Quality Dashboard | dp:dashboards:context_quality |
| dashboards/project_health | CDF Project Health Dashboard | dp:dashboards:project_health |
| dashboards/report_quality | report_quality | dp:dashboards:report_quality |
| sourcesystem/cdf_oid_sync | dp:acc:cdf_oid_sync | dp:sourcesystem:cdf_oid_sync |
| tools/…/cdf_performance_testing | Performance Testing Notebook | dp:tool:cdf_performance_testing |
| tools/…/cdf_transformation_jobs_metric_explorer | Transformation Job Metrics… | dp:tool:cdf_transformation_jobs_metric_explorer |
| tools/apps/qualitizer | Qualitizer Application | dp:tool:qualitizer |
| atlas_ai/ootb_agents | Atlas AI OOTB Agents | dp:atlas_ai:ootb_agents |

(title fields, which the Toolkit UI displays, are unchanged.)

Mixpanel & usage tracking

  • fn-handle / notebook source updated to canonical ids in all handlers, Streamlit dashboards, marimo notebooks, and performance-testing notebooks.
  • mixpanel/module_lookup.csv rebuilt: 33 canonical rows + 11 legacy function-source aliases (e.g. dp:cdf_common → same label as dp:common:cdf_common) for projects not yet redeployed.
  • Added mixpanel/README.md describing lookup format and alias purpose.
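A rough sketch of how a lookup with legacy aliases can resolve both old and new source values to one label. The column names `source` and `label`, the sample rows, and the label text are assumptions for illustration; the actual format of mixpanel/module_lookup.csv is described in mixpanel/README.md and is not reproduced here.

```python
import csv
import io

# Hypothetical two-column layout; the real CSV may use other headers.
SAMPLE_CSV = """source,label
dp:common:cdf_common,CDF Common
dp:cdf_common,CDF Common
"""


def load_labels(csv_text: str) -> dict[str, str]:
    """Map every source value (canonical or legacy alias) to its label."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["source"]: row["label"] for row in reader}


def label_for(source: str, labels: dict[str, str]) -> str:
    """Fall back to the raw source value when no row matches."""
    return labels.get(source, source)


labels = load_labels(SAMPLE_CSV)
```

Because an alias row carries the same label as its canonical row, events from functions that have not been redeployed would still group under one label in reports.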

Qualitizer

  • deployment-packs.ts pack ids updated to match canonical module ids (including sourcesystem packs).

Breaking changes

  • Toolkit module id values change for the modules above. Projects or automation that reference old ids (config, scripts, cherry-pick by id) must switch to the new dp:… ids after upgrading the library.
  • Mixpanel source on new deploys uses canonical ids; historical events still use old source values until functions/notebooks are redeployed (aliases in lookup CSV cover labeling only).
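For automation that still references old ids, a small translation step may be enough. The sketch below uses a few old/new pairs from the module id table in this PR; the helper name `migrate_ids` and the idea of keeping ids in a plain list are assumptions, not part of the library.

```python
# A subset of the old -> new pairs listed in this PR; extend as needed.
OLD_TO_NEW = {
    "CDF Common": "dp:common:cdf_common",
    "dp:acc:cdf_ingestion": "dp:common:cdf_ingestion",
    "report_quality": "dp:dashboards:report_quality",
    "dp:acc:cdf_oid_sync": "dp:sourcesystem:cdf_oid_sync",
}


def migrate_ids(module_ids: list[str]) -> list[str]:
    """Replace known legacy ids and leave everything else untouched."""
    return [OLD_TO_NEW.get(module_id, module_id) for module_id in module_ids]
```

Ids that are already canonical, or that the mapping does not know about, pass through unchanged, so the step is safe to run more than once.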

charanraju15 and others added 2 commits May 25, 2026 13:13
Align module.toml ids, fn-handle source values, Qualitizer deployment packs,
and module_lookup.csv with canonical naming; keep legacy source aliases for
older deployed functions.

Co-authored-by: Cursor <cursoragent@cursor.com>
Restore notebooks from main and apply only the canonical Mixpanel source
id change; the prior edit had reformatted entire ipynb files.

Co-authored-by: Cursor <cursoragent@cursor.com>
@charanraju15 charanraju15 requested a review from a team as a code owner May 25, 2026 07:57
Fix mismatched ids in Qualitizer deployment packs and the custom template,
add canonical module id registry in modules/README, document ids in READMEs,
enforce allowed dp:<pack>: prefixes in validate_packages.py, and normalize
cdf_common display title in module.toml and Mixpanel lookup.

Co-authored-by: Cursor <cursoragent@cursor.com>
@valnaumova

/gemini review

@valnaumova valnaumova left a comment

Looks good, thanks for the detailed description. One small comment: I believe we agreed to deprecate these two modules:

tools/…/cdf_performance_testing Performance Testing Notebook dp:tool:cdf_performance_testing
tools/…/cdf_transformation_jobs_metric_explorer Transformation Job Metrics… dp:tool:cdf_transformation_jobs_metric_explorer

@BergsethCognite ?

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request standardizes module IDs across all Toolkit modules, documentation, usage tracking scripts, and the Qualitizer application to use the canonical format dp:<package_short>:<slug>. It also introduces a validation check in validate_packages.py to enforce these ID prefixes. The review feedback suggests enhancing cross-platform compatibility on Windows by using .as_posix(), adopting defensive programming to guard against potential KeyError exceptions during dictionary access, and updating Python type hints to use standard collection types (dict instead of Dict) per Python 3.11+ guidelines.

Comment thread validate_packages.py
Comment on lines +177 to +181
    if not module_id.startswith("dp:") or module_id.count(":") < 2:
        rel_path = str(module_toml.relative_to(base_path_obj))
        mismatches.append((rel_path, module_id, set()))
        continue
    rel_path = str(module_toml.parent.relative_to(base_path_obj))
high

On Windows, calling str() on a Path object uses backslashes (\) as directory separators. Since the module paths in packages.toml use forward slashes (/), the membership check module_rel_path in package_data.get("modules", []) will fail on Windows.

To ensure cross-platform compatibility, use .as_posix() instead of str() when converting Path objects to strings.

Suggested change

    - if not module_id.startswith("dp:") or module_id.count(":") < 2:
    -     rel_path = str(module_toml.relative_to(base_path_obj))
    -     mismatches.append((rel_path, module_id, set()))
    -     continue
    - rel_path = str(module_toml.parent.relative_to(base_path_obj))
    + if not module_id.startswith("dp:") or module_id.count(":") < 2:
    +     rel_path = module_toml.relative_to(base_path_obj).as_posix()
    +     mismatches.append((rel_path, module_id, set()))
    +     continue
    + rel_path = module_toml.parent.relative_to(base_path_obj).as_posix()
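The separator difference described in this comment can be reproduced on any operating system with `PureWindowsPath`, which applies Windows path rules without touching the filesystem. The path and the `declared_modules` list below are made up for illustration and do not come from packages.toml.

```python
from pathlib import PureWindowsPath

# Build a path using Windows rules, regardless of the host OS.
win_path = PureWindowsPath("modules") / "common" / "cdf_common"

as_str = str(win_path)          # uses backslashes
as_posix = win_path.as_posix()  # uses forward slashes

# Hypothetical list of module paths written with forward slashes.
declared_modules = ["modules/common/cdf_common"]

found_with_str = as_str in declared_modules
found_with_posix = as_posix in declared_modules
```

Here `found_with_str` is False and `found_with_posix` is True, which is the mismatch the review comment warns about.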

Comment thread validate_packages.py
Comment on lines +155 to +157
    for package_data in packages.values():
        if module_rel_path in package_data.get("modules", []):
            prefixes.add(_package_id_prefix(package_data["id"]))
high

The dictionary access package_data["id"] is unguarded and will raise a KeyError if any package definition in packages.toml is missing the id field.

To adhere to the defensive programming guidelines and avoid logic gaps, use .get("id") with a safety check.

Suggested change

    - for package_data in packages.values():
    -     if module_rel_path in package_data.get("modules", []):
    -         prefixes.add(_package_id_prefix(package_data["id"]))
    + for package_data in packages.values():
    +     if module_rel_path in package_data.get("modules", []):
    +         if pkg_id := package_data.get("id"):
    +             prefixes.add(_package_id_prefix(pkg_id))
References
  1. Avoid unguarded dictionary access and ensure defensive programming. (link)
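A self-contained sketch of the guarded lookup from this comment. The function `collect_package_ids` and the sample `packages` data are hypothetical; the real code passes the id through `_package_id_prefix`, whose behavior is not shown in this thread, so the sketch simply collects the raw ids.

```python
from typing import Any


def collect_package_ids(module_rel_path: str, packages: dict[str, Any]) -> set[str]:
    """Collect ids of packages that list the module, skipping packages without an id."""
    ids: set[str] = set()
    for package_data in packages.values():
        if module_rel_path in package_data.get("modules", []):
            # .get() returns None for a missing key instead of raising KeyError.
            if pkg_id := package_data.get("id"):
                ids.add(pkg_id)
    return ids


# Hypothetical data: the second package is missing its "id" field.
packages = {
    "complete": {"id": "dp:common", "modules": ["modules/common/cdf_common"]},
    "incomplete": {"modules": ["modules/common/cdf_common"]},
}
result = collect_package_ids("modules/common/cdf_common", packages)
```

One trade-off worth noting: skipping silently hides a malformed package definition, so a validator may prefer to record the missing id as an error instead.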

Comment thread validate_packages.py
Comment on lines +148 to +152
    def _allowed_id_prefixes_for_module(
        module_rel_path: str,
        package_id: str,
        packages: Dict[str, Any],
    ) -> set[str]:
high

In Python 3.11+, standard collection types should be used for type hints instead of importing from typing. Please use dict instead of Dict.

Suggested change

    - def _allowed_id_prefixes_for_module(
    -     module_rel_path: str,
    -     package_id: str,
    -     packages: Dict[str, Any],
    - ) -> set[str]:
    + def _allowed_id_prefixes_for_module(
    +     module_rel_path: str,
    +     package_id: str,
    +     packages: dict[str, Any],
    + ) -> set[str]:
References
  1. Use standard collection types (like dict and list) instead of typing.Dict and typing.List for type hints in Python 3.11+. (link)

Comment thread validate_packages.py
Comment on lines +161 to +164
    def validate_module_id_prefixes(
        base_path: str = "modules",
        packages: Dict[str, Any] | None = None,
    ) -> bool:
high

In Python 3.11+, standard collection types should be used for type hints instead of importing from typing. Please use dict instead of Dict.

Suggested change

    - def validate_module_id_prefixes(
    -     base_path: str = "modules",
    -     packages: Dict[str, Any] | None = None,
    - ) -> bool:
    + def validate_module_id_prefixes(
    +     base_path: str = "modules",
    +     packages: dict[str, Any] | None = None,
    + ) -> bool:
References
  1. Use standard collection types (like dict and list) instead of typing.Dict and typing.List for type hints in Python 3.11+. (link)
