[DOP-27960] Move to dataset symlink group#476

Merged
IlyasDevelopment merged 1 commit into develop from feature/DOP-27960 on Jul 2, 2026
Conversation

@IlyasDevelopment IlyasDevelopment commented Jul 2, 2026

Contributor

Change Summary

Introduce fingerprint-based dataset symlink group storage while preserving the old directed symlink interface.

  • Add dataset_symlink_group table and migrate existing dataset_symlink rows into fingerprinted group members.
  • Replace the physical dataset_symlink table with a compatibility view that reconstructs the old from_dataset_id/to_dataset_id/type shape.
  • Add DatasetSymlinkGroupDTO and deterministic symlink fingerprint computation.
  • Update consumer extraction flow to emit symlink groups instead of directed symlink pairs.
  • Update BatchExtractionResult and DatabaseSaver to deduplicate and bulk-insert symlink groups.
  • Keep server lineage read behavior one-hop via the dataset_symlink compatibility view.
  • Update seed data, factories, and tests for group-based symlink writes.
  • Add migration coverage for backfill correctness, fingerprint grouping, role assignment, and compatibility view output.
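The grouping idea in the summary above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the code from this PR: the `SymlinkGroup` class is a hypothetical stand-in for `DatasetSymlinkGroupDTO`, the SHA-256 digest over sorted members is one possible deterministic fingerprint, and the rule that a directed pair takes its `type` from the target member's role is a guess at how the compatibility view might rebuild the old `from_dataset_id`/`to_dataset_id`/`type` shape.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class SymlinkGroup:
    """Hypothetical stand-in for DatasetSymlinkGroupDTO (fields are assumed)."""

    # Each member is a (dataset_id, role) pair; role names are illustrative.
    members: frozenset[tuple[int, str]]

    @property
    def fingerprint(self) -> str:
        # Sort members first so the digest does not depend on insertion order.
        canonical = ";".join(
            f"{dataset_id}:{role}" for dataset_id, role in sorted(self.members)
        )
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def deduplicate(groups: list[SymlinkGroup]) -> list[SymlinkGroup]:
    """Keep the first group seen for each fingerprint (assumed dedup rule)."""
    unique: dict[str, SymlinkGroup] = {}
    for group in groups:
        unique.setdefault(group.fingerprint, group)
    return list(unique.values())


def to_directed_pairs(group: SymlinkGroup) -> list[tuple[int, int, str]]:
    """Expand a group into (from_dataset_id, to_dataset_id, type) rows.

    Assumption: the pair's type is the role of the target member.
    """
    pairs = [
        (from_id, to_id, to_role)
        for (from_id, _), (to_id, to_role) in permutations(group.members, 2)
    ]
    return sorted(pairs)
```

Under these assumptions, two groups built from the same members in a different order share one fingerprint, so `deduplicate` collapses them before a bulk insert, and `to_directed_pairs` yields one row per ordered pair of members, which is the shape the old directed table exposed.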

Checklist

  • Commit message and PR title are comprehensive
  • Keep the change as small as possible
  • Unit and integration tests for the changes exist
  • Tests pass on CI and coverage does not decrease
  • Documentation reflects the changes where applicable
  • docs/changelog/next_release/<pull request or issue id>.<change type>.rst file added describing change
    (see CONTRIBUTING.rst for details.)
  • My PR is ready to review.

github-actions Bot commented Jul 2, 2026

Coverage

Coverage Report

| File | Stmts | Miss | Branch | BrPart | Cover | Missing |
|---|---|---|---|---|---|---|
| data_rentgen/consumer/saver.py | 160 | 13 | 46 | 4 | 91% | 42–43, 183–189, 191–192, 195–196 |
| data_rentgen/consumer/extractors/impl/flink.py | 37 | 1 | 6 | 1 | 97% | 42->43, 43 |
| data_rentgen/consumer/extractors/impl/spark.py | 64 | 1 | 14 | 2 | 98% | 88->90, 108->109, 109 |
| data_rentgen/db/migrations/versions/2026-06-29_4a02d2d5c8b1_create_dataset_symlink_group.py | 37 | 37 | 8 | 8 | 0% | 11–12, 14, 21, 24–27, 30–31, 48–50, 111–114, 117–119, 121, 145, 153–155, 158–159, 161–163, 165, 173, 182, 186, 193, 201–202 |
| data_rentgen/db/models/dataset_symlink.py | 17 | 1 | 0 | 0 | 94% | 21 |
| TOTAL | 8225 | 1070 | 1320 | 238 | 86% | |

Comment thread data_rentgen/consumer/extractors/generic/dataset.py Outdated
Comment thread data_rentgen/dto/base.py Outdated
Comment thread data_rentgen/consumer/extractors/generic/dataset.py Outdated
Comment thread tests/test_consumer/test_extractors/test_extractors_batch_airflow.py Outdated
Comment thread data_rentgen/db/repositories/dataset_symlink.py
Comment thread data_rentgen/db/models/dataset_symlink_group.py Outdated
Comment thread data_rentgen/dto/dataset_symlink.py
Comment thread data_rentgen/dto/dataset_symlink.py Outdated
Comment thread data_rentgen/dto/dataset_symlink.py Outdated
Comment thread tests/test_server/fixtures/factories/dataset.py Outdated
@IlyasDevelopment IlyasDevelopment merged commit 284fb10 into develop Jul 2, 2026
10 of 11 checks passed
@IlyasDevelopment IlyasDevelopment deleted the feature/DOP-27960 branch July 2, 2026 13:05

2 participants