Module id fix#294
Conversation
Align module.toml ids, fn-handle source values, Qualitizer deployment packs, and module_lookup.csv with canonical naming; keep legacy source aliases for older deployed functions. Co-authored-by: Cursor <cursoragent@cursor.com>
Restore notebooks from main and apply only the canonical Mixpanel source id change; the prior edit had reformatted entire ipynb files. Co-authored-by: Cursor <cursoragent@cursor.com>
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. |
Fix mismatched ids in Qualitizer deployment packs and the custom template, add canonical module id registry in modules/README, document ids in READMEs, enforce allowed dp:<pack>: prefixes in validate_packages.py, and normalize cdf_common display title in module.toml and Mixpanel lookup. Co-authored-by: Cursor <cursoragent@cursor.com>
|
/gemini review |
valnaumova
left a comment
There was a problem hiding this comment.
Looks good, thanks for doing a detailed description. One small comment, I believe we agreed to deprecate
| tools/…/cdf_performance_testing | Performance Testing Notebook | dp:tool:cdf_performance_testing |
|---|---|---|
| tools/…/cdf_transformation_jobs_metric_explorer | Transformation Job Metrics… | dp:tool:cdf_transformation_jobs_metric_explorer |
There was a problem hiding this comment.
Code Review
This pull request standardizes module IDs across all Toolkit modules, documentation, usage tracking scripts, and the Qualitizer application to use the canonical format dp:<package_short>:<slug>. It also introduces a validation check in validate_packages.py to enforce these ID prefixes. The review feedback suggests enhancing cross-platform compatibility on Windows by using .as_posix(), adopting defensive programming to guard against potential KeyError exceptions during dictionary access, and updating Python type hints to use standard collection types (dict instead of Dict) per Python 3.11+ guidelines.
| if not module_id.startswith("dp:") or module_id.count(":") < 2: | ||
| rel_path = str(module_toml.relative_to(base_path_obj)) | ||
| mismatches.append((rel_path, module_id, set())) | ||
| continue | ||
| rel_path = str(module_toml.parent.relative_to(base_path_obj)) |
There was a problem hiding this comment.
On Windows, calling str() on a Path object uses backslashes (\) as directory separators. Since the module paths in packages.toml use forward slashes (/), the membership check module_rel_path in package_data.get("modules", []) will fail on Windows.
To ensure cross-platform compatibility, use .as_posix() instead of str() when converting Path objects to strings.
| if not module_id.startswith("dp:") or module_id.count(":") < 2: | |
| rel_path = str(module_toml.relative_to(base_path_obj)) | |
| mismatches.append((rel_path, module_id, set())) | |
| continue | |
| rel_path = str(module_toml.parent.relative_to(base_path_obj)) | |
| if not module_id.startswith("dp:") or module_id.count(":") < 2: | |
| rel_path = module_toml.relative_to(base_path_obj).as_posix() | |
| mismatches.append((rel_path, module_id, set())) | |
| continue | |
| rel_path = module_toml.parent.relative_to(base_path_obj).as_posix() |
| for package_data in packages.values(): | ||
| if module_rel_path in package_data.get("modules", []): | ||
| prefixes.add(_package_id_prefix(package_data["id"])) |
There was a problem hiding this comment.
The dictionary access package_data["id"] is unguarded and will raise a KeyError if any package definition in packages.toml is missing the id field.
To adhere to the defensive programming guidelines and avoid logic gaps, use .get("id") with a safety check.
| for package_data in packages.values(): | |
| if module_rel_path in package_data.get("modules", []): | |
| prefixes.add(_package_id_prefix(package_data["id"])) | |
| for package_data in packages.values(): | |
| if module_rel_path in package_data.get("modules", []): | |
| if pkg_id := package_data.get("id"): | |
| prefixes.add(_package_id_prefix(pkg_id)) |
References
- Avoid unguarded dictionary access and ensure defensive programming. (link)
| def _allowed_id_prefixes_for_module( | ||
| module_rel_path: str, | ||
| package_id: str, | ||
| packages: Dict[str, Any], | ||
| ) -> set[str]: |
There was a problem hiding this comment.
In Python 3.11+, standard collection types should be used for type hints instead of importing from typing. Please use dict instead of Dict.
| def _allowed_id_prefixes_for_module( | |
| module_rel_path: str, | |
| package_id: str, | |
| packages: Dict[str, Any], | |
| ) -> set[str]: | |
| def _allowed_id_prefixes_for_module( | |
| module_rel_path: str, | |
| package_id: str, | |
| packages: dict[str, Any], | |
| ) -> set[str]: |
References
- Use standard collection types (like
dictandlist) instead oftyping.Dictandtyping.Listfor type hints in Python 3.11+. (link)
| def validate_module_id_prefixes( | ||
| base_path: str = "modules", | ||
| packages: Dict[str, Any] | None = None, | ||
| ) -> bool: |
There was a problem hiding this comment.
In Python 3.11+, standard collection types should be used for type hints instead of importing from typing. Please use dict instead of Dict.
| def validate_module_id_prefixes( | |
| base_path: str = "modules", | |
| packages: Dict[str, Any] | None = None, | |
| ) -> bool: | |
| def validate_module_id_prefixes( | |
| base_path: str = "modules", | |
| packages: dict[str, Any] | None = None, | |
| ) -> bool: |
References
- Use standard collection types (like
dictandlist) instead oftyping.Dictandtyping.Listfor type hints in Python 3.11+. (link)
Summary
Normalizes library module
idvalues to thedp:<pack>:<slug>convention documented inADDING_PACKAGES_AND_MODULES.md, and aligns Mixpanel usage tracking and Qualitizer deployment-pack ids with those canonical ids.Previously, many modules used legacy human-readable ids (
CDF Common,P&ID Annotation), bare slugs (report_quality), or accelerator-style ids (dp:acc:…). New and updated modules now use stable pack-scoped ids that matchpackage_idand folder layout.Module id changes (16
module.tomlupdates)ididcommon/cdf_commonCDF Commondp:common:cdf_commoncommon/cdf_ingestiondp:acc:cdf_ingestiondp:common:cdf_ingestioncommon/cdf_searchdp:acc:industrial_tools:cdf_searchdp:common:cdf_searchcontextualization/cdf_entity_matchingCDF Entity Matchingdp:contextualization:cdf_entity_matchingcontextualization/cdf_file_annotationdp:acc:contextualization:cdf_file_annotationdp:contextualization:cdf_file_annotationcontextualization/cdf_connection_sqldp:acc:ctx:cdf_connection_sqldp:contextualization:cdf_connection_sqlcontextualization/cdf_p_and_id_annotationP&ID Annotationdp:contextualization:cdf_p_and_id_annotationcontextualization/cdf_p_and_id_parserP&ID Parserdp:contextualization:cdf_p_and_id_parserdashboards/context_qualityContextualization Quality Dashboarddp:dashboards:context_qualitydashboards/project_healthCDF Project Health Dashboarddp:dashboards:project_healthdashboards/report_qualityreport_qualitydp:dashboards:report_qualitysourcesystem/cdf_oid_syncdp:acc:cdf_oid_syncdp:sourcesystem:cdf_oid_synctools/…/cdf_performance_testingPerformance Testing Notebookdp:tool:cdf_performance_testingtools/…/cdf_transformation_jobs_metric_explorerTransformation Job Metrics…dp:tool:cdf_transformation_jobs_metric_explorertools/apps/qualitizerQualitizer Applicationdp:tool:qualitizeratlas_ai/ootb_agentsAtlas AI OOTB Agentsdp:atlas_ai:ootb_agents(
titlefields are unchanged for Toolkit UI.)Mixpanel & usage tracking
fn-handle/ notebooksourceupdated to canonical ids in all handlers, Streamlit dashboards, marimo notebooks, and performance-testing notebooks.mixpanel/module_lookup.csvrebuilt: 33 canonical rows + 11 legacy function-source aliases (e.g.dp:cdf_common→ same label asdp:common:cdf_common) for projects not yet redeployed.mixpanel/README.mddescribing lookup format and alias purpose.Qualitizer
deployment-packs.tspack ids updated to match canonical module ids (including sourcesystem packs).Breaking changes
idvalues change for the modules above. Projects or automation that reference old ids (config, scripts, cherry-pick by id) must switch to the newdp:…ids after upgrading the library.sourceon new deploys uses canonical ids; historical events still use oldsourcevalues until functions/notebooks are redeployed (aliases in lookup CSV cover labeling only).