Improve FSDB validation and health‑check CLI with extra roots and safe deletions #99

Merged: jlegrand62 merged 16 commits into dev from feature/is_fsdb on Jun 26, 2026
@jlegrand62 (Member)

Summary

This PR introduces several enhancements to the FSDB subsystem:

  • Extra root directories – FSDB.__init__ and _is_fsdb now accept an extra_dirs list (default ['configs']) naming directories that are ignored during scan validation.
  • Health‑check CLI overhaul – Replaced the old fsdb_check script with a Click‑based fsdb_healthcheck command, adding grouped options, logging controls, and fix flags.
  • Robust validation – Added comprehensive checks for marker files, required metadata subdirectory, files.json integrity, and fileset completeness. Detailed error logging is now provided.
  • Safe deletions – All destructive operations now use send2trash (added as a dependency) and back up files.json before overwriting it.
  • User prompts – Introduced yes_no_abort_choice for abort‑capable confirmations and updated prompt handling throughout the codebase.
  • API cleanup – Removed the obsolete required_filesets attribute and related logic, simplifying scan loading.
  • Type hints & refactoring – Added explicit type annotations to file‑operation helpers, updated signatures, and streamlined internal calls.
  • Error handling – FSDB.__init__ now validates the database path via _is_fsdb and raises NotAnFSDBError for invalid directories.

These changes improve reliability, usability, and safety when working with FSDB databases.
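
To illustrate the constructor contract described in the summary, here is a minimal, self-contained sketch. It is not the code from this PR: `FSDBSketch` and the marker value `"fsdb_marker"` are hypothetical stand-ins (the PR only refers to the marker as `MARKER_FILE_NAME`), and `NotAnFSDBError` is redefined locally so the example runs on its own. The sketch also uses a `None` sentinel for `extra_dirs` instead of a list literal default, which is an assumption of this example made to avoid sharing one list between instances.

```python
from pathlib import Path
from typing import Optional, Sequence

# Hypothetical marker name; the PR only refers to it as MARKER_FILE_NAME.
MARKER_FILE_NAME = "fsdb_marker"


class NotAnFSDBError(Exception):
    """Local stand-in for the exception of the same name used in the PR."""


class FSDBSketch:
    """Illustrative stand-in for a database class that validates its root on creation."""

    def __init__(self, basedir, extra_dirs: Optional[Sequence[str]] = None):
        self.basedir = Path(basedir)
        # Copy into a fresh list so instances never share the default.
        self.extra_dirs = ["configs"] if extra_dirs is None else list(extra_dirs)
        # The root must be an existing directory.
        if not self.basedir.is_dir():
            raise NotAnFSDBError(f"Not a directory: {self.basedir}")
        # The root must contain the marker file.
        if not (self.basedir / MARKER_FILE_NAME).is_file():
            raise NotAnFSDBError(f"Missing marker file in: {self.basedir}")
```

Callers would wrap the construction in `try` / `except NotAnFSDBError` to report an invalid path instead of failing later while loading scans.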

- Import `NotAnFSDBError` in `core.py`
- Import `_is_fsdb` from `plantdb.commons.fsdb.validation` in `core.py`
- Add a `_is_fsdb(basedir)` check in `FSDB.__init__` and raise `NotAnFSDBError` if the directory is not a valid FSDB
- Add return type hint `-> bool` to `_is_fsdb` in `validation.py`
- Expand `_is_fsdb` docstring with detailed description and usage examples
- Verify the provided path is a directory before proceeding
- Ensure the presence of the `MARKER_FILE_NAME` file
- Introduce scan‑directory validation using new helper `_is_scan_dataset`
- Log warnings for empty databases and for any bad scan directories found
- Add new function `_is_scan_dataset` to validate FSDB datasets:
  - Checks for required `files.json` and valid JSON structure
  - Optionally validates filesets if `validate_json_fileset` is true
  - Confirms presence of required `metadata` subdirectory
- Update `_is_safe_to_delete` signature to `-> bool` and improve its docstring.
- Import `_fileset_path` and `_scan_json_file` from `path_helpers` in `validation.py`
- Extend `_is_fsdb` signature to accept `validate_json_fileset` and pass it to scan checks
- Rename parameters to `scan_path` in `_is_scan_dataset` and update all internal path usages
- Add optional `validate_json_fileset` flag documentation to both `_is_fsdb` and `_is_scan_dataset`
- Implement `_is_valid_fileset` to verify fileset directories and required files listed in `files.json`
- Update `_is_scan_dataset` to invoke `_is_valid_fileset` when validation is enabled
- Adjust calls to `_is_scan_dataset` in `_is_fsdb` to include the new flag
- Refactor variable names and path handling for clarity across the validation module
- Import `_is_scan_dataset` in `file_ops.py` (pre‑load for validation utilities).
- Remove `required_fs` handling in `get_scans`; now only checks that the scan directory exists before loading filesets.
- Comment out the user prompt (`yes_no_choice`) and deletion loop for bad scans, preventing interactive prompts in non‑TTY environments.
- Keep existing logic for loading filesets and updating scans unchanged.
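
The scan-level checks listed above (a `files.json` that parses, a `filesets` entry, and a `metadata` subdirectory) can be sketched as follows. This is an illustration under stated assumptions, not the PR's `_is_scan_dataset`: the function name `is_scan_dataset_sketch`, the logger name, and the assumption that `files.json` holds a JSON object with a top-level `filesets` key are all choices of this example. The optional `validate_json_fileset` step is left out here.

```python
import json
import logging
from pathlib import Path

logger = logging.getLogger("fsdb_validation_sketch")  # hypothetical logger name


def is_scan_dataset_sketch(scan_path) -> bool:
    """Return True if `scan_path` looks like a scan dataset, logging why it does not."""
    scan_path = Path(scan_path)
    json_file = scan_path / "files.json"

    # 1. The JSON index must exist.
    if not json_file.is_file():
        logger.error("Missing 'files.json' in %s", scan_path)
        return False

    # 2. It must parse as JSON (decode errors are ValueError subclasses too).
    try:
        content = json.loads(json_file.read_text(encoding="utf-8"))
    except ValueError as err:
        logger.error("Could not parse %s: %s", json_file, err)
        return False

    # 3. It must be an object with a 'filesets' entry (assumed layout).
    if not isinstance(content, dict) or "filesets" not in content:
        logger.error("No 'filesets' entry in %s", json_file)
        return False

    # 4. The required 'metadata' subdirectory must be present.
    if not (scan_path / "metadata").is_dir():
        logger.error("Missing 'metadata' subdirectory in %s", scan_path)
        return False

    return True
```

Returning `False` with a logged reason, rather than raising, lets a caller check every scan directory and report all bad ones in a single pass.
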
…mentation

- Update `pyproject.toml` script entry: replace `fsdb_check` with `fsdb_healthcheck`.
- Modify `validation.py` to suggest the new `fsdb_healthcheck` CLI instead of a TODO comment.
- Add new CLI module `src/commons/plantdb/commons/cli/fsdb_healthcheck.py`:
  - Replace `argparse` with `click` for argument parsing.
  - Introduce `--log-level`, `--fix`, `--fix-missing`, and `--fix-extra` options.
  - Configure logger via `get_logger('fsdb_healthcheck', ...)`.
  - Implement missing‑reference fixing logic with progress bar and backup handling.
  - Stub `--fix-extra` with `NotImplementedError`.
- Remove old `fsdb_check.py` implementation.
- Log an error when the provided path is not a directory.
- Log an error when the required marker file `MARKER_FILE_NAME` is missing.
- Update empty‑FSDB warning to use the path string directly.
- Store bad scan directories as strings and simplify the bad‑scan log output.
- Add explicit error logging for missing `metadata` subdirectory.
- Add error logs for missing `files.json`, JSON parse failures, and missing `filesets` entry.
- Log an error when a fileset directory defined in `files.json` is absent.
- Introduce `_fileset_files_exists` helper to verify all required files exist and log the count of missing files.
- Update `_is_valid_fileset` to use the new helper for detailed missing‑file reporting.
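
The fileset completeness check described above could look roughly like the sketch below. The function names are hypothetical, and the `files.json` entry layout (`{"id": ..., "files": [{"id": ..., "file": ...}]}`) is an assumption of this example rather than something stated in the PR.

```python
import logging
from pathlib import Path

logger = logging.getLogger("fsdb_validation_sketch")  # hypothetical logger name


def count_missing_files(fileset_dir, fileset_entry: dict) -> int:
    """Count files listed in a fileset entry that are absent from `fileset_dir`."""
    fileset_dir = Path(fileset_dir)
    missing = 0
    for file_entry in fileset_entry.get("files", []):
        file_name = file_entry.get("file")  # assumed key holding the file name
        if file_name is None or not (fileset_dir / file_name).is_file():
            missing += 1
    if missing:
        logger.error("%d file(s) missing from fileset directory %s", missing, fileset_dir)
    return missing


def is_valid_fileset_sketch(scan_path, fileset_entry: dict) -> bool:
    """Check that the fileset directory exists and holds every listed file."""
    fileset_id = fileset_entry.get("id")  # assumed key holding the directory name
    if fileset_id is None:
        logger.error("Fileset entry without an 'id' in %s", scan_path)
        return False
    fileset_dir = Path(scan_path) / fileset_id
    if not fileset_dir.is_dir():
        logger.error("Fileset directory %s is absent", fileset_dir)
        return False
    return count_missing_files(fileset_dir, fileset_entry) == 0
```

Counting the missing files instead of stopping at the first one keeps the log to a single summary line per fileset.
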
…ional files‑json updates

- Updated `file_ops.py` to use `Path` from `pathlib` and added `typing` imports for clearer type hints.
- Replaced direct `shutil.rmtree` calls with `send2trash` for safer, reversible deletions of scans, filesets, and metadata directories.
- Introduced `backup_file` usage before overwriting `files.json` when `updates_files_json` is enabled.
- Extended `_load_scan` signature to `def _load_scan(db: 'FSDB', scan_id: str, updates_files_json: bool = False) -> 'Scan | None'` and added detailed docstring.
- Modified `_load_scans` to return a `dict[str, 'Scan']`, accept `updates_files_json` flag, and improved handling of bad scans (no interactive prompts).
- Added validation imports (`_is_valid_fileset`, `_is_scan_dataset`) and guarded type‑checking imports with `TYPE_CHECKING` to avoid circular dependencies.
- Updated helper functions (`_load_scan_filesets`, `_load_fileset`, `_load_fileset_files`) to return a `(result, needs_update)` tuple, propagating the update flag.
- Adjusted internal calls to reflect new return signatures and update logic.
- Replaced legacy `yes_no_choice` prompt handling with commented‑out code, eliminating interactive deletion in non‑TTY environments.
- Updated function signatures for `_load_dummy_fileset`, `_load_file`, `_load_measures`, `_load_scan_measures`, `_delete_file`, `_delete_fileset`, `_delete_scan`, `_make_fileset`, `_make_scan`, and `_store_scan` with explicit type hints and return annotations.
- Reformat `yes_no_choice` signature to use explicit spacing (`default: bool = True`).
- Introduce `yes_no_abort_choice` in `utils.py` to allow aborting a yes/no prompt and return `None` when aborted.
- Add `backup_filename` function to create a timestamped backup path for a given file.
- Add `backup_file` function that copies the original file to the backup path generated by `backup_filename`.
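
As a rough illustration of the helpers added to `utils.py`, the sketch below shows one way a timestamped backup and an abort-capable prompt might be written. The `_sketch` names, the backup naming scheme, the accepted answers, and the injectable `now` and `ask` parameters are assumptions of this example. The only behavior taken from the commit message is that the backup path is timestamped, that the original is copied to it, and that aborting returns `None`.

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import Callable, Optional


def backup_filename_sketch(file_path, now: Optional[datetime] = None) -> Path:
    """Build a timestamped sibling path, e.g. files_20260626-120000.json.bak."""
    file_path = Path(file_path)
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    return file_path.with_name(f"{file_path.stem}_{stamp}{file_path.suffix}.bak")


def backup_file_sketch(file_path, now: Optional[datetime] = None) -> Path:
    """Copy `file_path` to its backup path and return that path."""
    target = backup_filename_sketch(file_path, now)
    shutil.copy2(file_path, target)  # copy2 also preserves file metadata
    return target


def yes_no_abort_choice_sketch(
    question: str,
    default: bool = True,
    ask: Callable[[str], str] = input,
) -> Optional[bool]:
    """Return True for yes, False for no, None for abort; re-ask on other input."""
    hint = "[Y/n/a]" if default else "[y/N/a]"
    while True:
        answer = ask(f"{question} {hint} ").strip().lower()
        if answer == "":
            return default  # empty input selects the default
        if answer in ("y", "yes"):
            return True
        if answer in ("n", "no"):
            return False
        if answer in ("a", "abort"):
            return None
```

Passing `ask` as a parameter keeps the prompt testable without a terminal, which matters for the non-TTY environments mentioned in the earlier commits.
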
…nnect

- Drop the `required_filesets` attribute, its `__init__` parameter, and related docstring sections in `core.py`
- Initialize scans without setting `self.required_filesets`; default validation now relies on the presence of a `metadata` fileset
- Add an `_is_fsdb(self.basedir)` check in `FSDB.connect` to raise `NotAnFSDBError` when the directory is not a valid FSDB
- Clean up import comments and unused code related to required filesets.
- Updated `plantdb/src/commons/pyproject.toml` to include `send2trash` in the `dependencies` list, enabling safe recycle‑bin deletions.
- Grouped CLI options with `click_option_group` into “Fix” and “Logging” sections and reordered parameters (`fsdb_path`, `fix`, `fix_missing`, `fix_extra`, `log_level`)
- Added explicit FSDB marker validation using `MARKER_FILE_NAME` and raise `NotAnFSDBError` when missing
- Replaced direct directory check with `Path.is_dir()` and added early error handling for non‑directory paths
- Collected scan directories while ignoring hidden folders and added empty‑FSDB warning
- Implemented `fix_missing_scans_reference` helper:
  - Validates each scan with `_is_scan_dataset`
  - Updates `files.json` via `_load_scan(..., updates_files_json=True)`
  - Tracks bad scans, prompts user with `yes_no_abort_choice`, and moves them to trash using `send2trash`
- Removed old backup and progress‑bar logic; introduced new interactive deletion flow with clear warnings
- Updated imports: added `Logger`, `OptionGroup`, `optgroup`, `send2trash`, and `yes_no_abort_choice`; removed unused `datetime`, `json`, `shutil`, and `tqdm`
- Adjusted logger initialization comment and eliminated unnecessary `db.connect()`/`db.disconnect()` calls
- Updated documentation strings to reflect new behavior and parameters in `fsdb_healthcheck.py`
- Extend `_is_fsdb` signature with `extra_dirs: list[str] = ['configs']` and update docstring.
- Skip verification of directories listed in `extra_dirs` during scan dataset checks.
- Add `extra_dirs` parameter to `FSDB.__init__` (default `['configs']`) and store it as an instance attribute.
- Pass `self.extra_dirs` to `_is_fsdb` in `FSDB.connect` to respect extra directory handling.
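
The interactive clean-up flow from the health-check commits (collect scan directories while skipping hidden folders and `extra_dirs`, validate each one, ask before moving bad scans to the trash) can be sketched as below. All names are hypothetical, and the validator, prompt, and trash function are passed in as callables so the example stays self-contained. In the PR these roles are played by `_is_scan_dataset`, `yes_no_abort_choice`, and `send2trash`.

```python
from pathlib import Path
from typing import Callable, Optional, Sequence


def list_scan_dirs(fsdb_path, extra_dirs: Sequence[str] = ("configs",)) -> list[Path]:
    """List candidate scan directories, skipping hidden folders and extra roots."""
    skip = set(extra_dirs)
    return sorted(
        p for p in Path(fsdb_path).iterdir()
        if p.is_dir() and not p.name.startswith(".") and p.name not in skip
    )


def remove_bad_scans(
    fsdb_path,
    is_valid: Callable[[Path], bool],
    confirm: Callable[[str], Optional[bool]],
    trash: Callable[[Path], None],
    extra_dirs: Sequence[str] = ("configs",),
) -> list[Path]:
    """Offer to trash each invalid scan; stop as soon as the user aborts."""
    removed = []
    for scan_dir in list_scan_dirs(fsdb_path, extra_dirs):
        if is_valid(scan_dir):
            continue
        answer = confirm(f"Move invalid scan '{scan_dir.name}' to the trash?")
        if answer is None:  # abort: leave the remaining scans untouched
            break
        if answer:
            trash(scan_dir)
            removed.append(scan_dir)
    return removed
```

Treating an abort differently from a "no" lets the user stop the whole run at any prompt, while a "no" only skips the current scan.
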
jlegrand62 self-assigned this on Jun 26, 2026
jlegrand62 added the bug and enhancement labels on Jun 26, 2026
- Updated `src/commons/pyproject.toml` to include `click_option_group` in the `dependencies` list.
- No functional code changes; the move improves file organization and readability.
jlegrand62 merged commit f5fdea6 into dev on Jun 26, 2026
1 of 2 checks passed
Labels: bug, enhancement
