Improve FSDB healthcheck CLI and validation workflow#98

Closed
jlegrand62 wants to merge 13 commits into main from feature/is_fsdb

@jlegrand62

Member

Summary

This PR overhauls the FSDB health‑check and validation utilities:

  • New Click‑based fsdb_healthcheck CLI

    • Replaces the old fsdb_check script.
    • Options are grouped into Fix and Logging sections (--fix, --fix-missing, --fix-extra, --log-level).
    • Early validation of the FSDB path and marker file, with a clear NotAnFSDBError when invalid.
  • Robust validation logic

    • Added full directory checks, marker‑file validation, and detailed logging for missing metadata, files.json, and fileset issues.
    • Introduced _is_scan_dataset and optional JSON‑fileset validation flags.
    • Empty‑FSDB and bad‑scan warnings are now logged with precise paths.
  • Safe deletion and backup

    • Integrated send2trash for reversible deletions of scans, filesets, and metadata.
    • New backup_file / backup_filename helpers create timestamped backups before overwriting files.json.
  • Interactive fix flow

    • yes_no_abort_choice allows users to abort the fixing process.
    • fix_missing_scans_reference validates scans, updates files.json, and moves corrupted scans to the trash.
  • API clean‑up

    • Dropped the required_filesets attribute and related init parameters.
    • Enforced FSDB validation on FSDB.connect.
    • Refactored file‑operation helpers with type hints and updated signatures to propagate update flags.
  • Dependency update

    • Added send2trash to the commons dependencies.

Overall, the changes make FSDB health checks more user‑friendly, safer, and better logged while simplifying the underlying code base.
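
The early validation described above (directory check, then marker-file check, failing with `NotAnFSDBError`) can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `check_fsdb_root` is a hypothetical name, the exception class is a local stand-in, and the value given to `MARKER_FILE_NAME` is a placeholder.

```python
import tempfile
from pathlib import Path

# Placeholder value (assumption): the real constant is defined in plantdb.
MARKER_FILE_NAME = "fsdb_marker"


class NotAnFSDBError(Exception):
    """Local stand-in for the exception of the same name used in this PR."""


def check_fsdb_root(path) -> Path:
    """Return `path` as a Path, or raise NotAnFSDBError if it is not an FSDB root.

    Hypothetical helper: it only mirrors the two early checks listed in the summary.
    """
    root = Path(path)
    if not root.is_dir():
        raise NotAnFSDBError(f"Not a directory: {root}")
    if not (root / MARKER_FILE_NAME).is_file():
        raise NotAnFSDBError(f"Missing marker file '{MARKER_FILE_NAME}' in {root}")
    return root


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    try:
        check_fsdb_root(tmp)  # no marker file yet
        rejected = False
    except NotAnFSDBError:
        rejected = True
    (Path(tmp) / MARKER_FILE_NAME).touch()
    accepted = check_fsdb_root(tmp) == Path(tmp)
```

Raising a dedicated exception lets a caller such as a CLI entry point report the problem and exit before any scan is loaded.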

- Import `NotAnFSDBError` in `core.py`
- Import `_is_fsdb` from `plantdb.commons.fsdb.validation` in `core.py`
- Add a `_is_fsdb(basedir)` check in `FSDB.__init__` and raise `NotAnFSDBError` if the directory is not a valid FSDB
- Add return type hint `-> bool` to `_is_fsdb` in `validation.py`
- Expand `_is_fsdb` docstring with detailed description and usage examples
- Verify the provided path is a directory before proceeding
- Ensure the presence of the `MARKER_FILE_NAME` file
- Introduce scan‑directory validation using new helper `_is_scan_dataset`
- Log warnings for empty databases and for any bad scan directories found
- Add new function `_is_scan_dataset` to validate FSDB datasets:
  - Checks for required `files.json` and valid JSON structure
  - Optionally validates filesets if `validate_json_fileset` is true
  - Confirms presence of required `metadata` subdirectory
- Update `_is_safe_to_delete` signature to `-> bool` and improve its docstring.
- Import `_fileset_path` and `_scan_json_file` from `path_helpers` in `validation.py`
- Extend `_is_fsdb` signature to accept `validate_json_fileset` and pass it to scan checks
- Rename parameters to `scan_path` in `_is_scan_dataset` and update all internal path usages
- Add optional `validate_json_fileset` flag documentation to both `_is_fsdb` and `_is_scan_dataset`
- Implement `_is_valid_fileset` to verify fileset directories and required files listed in `files.json`
- Update `_is_scan_dataset` to invoke `_is_valid_fileset` when validation is enabled
- Adjust calls to `_is_scan_dataset` in `_is_fsdb` to include the new flag
- Refactor variable names and path handling for clarity across the validation module
- Import `_is_scan_dataset` in `file_ops.py` (pre‑load for validation utilities).
- Remove `required_fs` handling in `get_scans`; now only checks that the scan directory exists before loading filesets.
- Comment out the user prompt (`yes_no_choice`) and deletion loop for bad scans, preventing interactive prompts in non‑TTY environments.
- Keep existing logic for loading filesets and updating scans unchanged.
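
The scan-level checks listed above could look roughly like the sketch below. It is an illustration under assumptions, not the implementation in `validation.py`: `is_scan_dataset` is a hypothetical stand-in for `_is_scan_dataset`, and the layout of `files.json` (a `filesets` list whose entries carry an `id`) is assumed for the example.

```python
import json
import logging
import tempfile
from pathlib import Path

logger = logging.getLogger("fsdb_validation_sketch")


def is_scan_dataset(scan_path, validate_json_fileset: bool = False) -> bool:
    """Hypothetical stand-in for `_is_scan_dataset`; returns True for a valid scan."""
    scan_path = Path(scan_path)
    json_file = scan_path / "files.json"
    if not json_file.is_file():
        logger.error("Missing 'files.json' in %s", scan_path)
        return False
    try:
        data = json.loads(json_file.read_text())
    except (ValueError, OSError) as exc:  # JSONDecodeError is a ValueError
        logger.error("Could not parse %s: %s", json_file, exc)
        return False
    if not isinstance(data, dict) or "filesets" not in data:
        logger.error("No 'filesets' entry in %s", json_file)
        return False
    if not (scan_path / "metadata").is_dir():
        logger.error("Missing 'metadata' subdirectory in %s", scan_path)
        return False
    if validate_json_fileset:
        # Assumed layout: each fileset entry is a dict with an 'id' naming its directory.
        for fileset in data["filesets"]:
            fs_id = fileset.get("id")
            if not fs_id or not (scan_path / fs_id).is_dir():
                logger.error("Fileset directory %r is absent from %s", fs_id, scan_path)
                return False
    return True


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    scan = Path(tmp) / "scan_001"
    (scan / "metadata").mkdir(parents=True)
    missing_json = is_scan_dataset(scan)
    (scan / "files.json").write_text(json.dumps({"filesets": [{"id": "images", "files": []}]}))
    shallow_ok = is_scan_dataset(scan)
    deep_ok = is_scan_dataset(scan, validate_json_fileset=True)  # 'images' dir is absent
```

Keeping the fileset check behind a flag means the default call stays cheap, while a health check can opt into the deeper pass.
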
…mentation

- Update `pyproject.toml` script entry: replace `fsdb_check` with `fsdb_healthcheck`.
- Modify `validation.py` to suggest the new `fsdb_healthcheck` CLI instead of a TODO comment.
- Add new CLI module `src/commons/plantdb/commons/cli/fsdb_healthcheck.py`:
  - Replace `argparse` with `click` for argument parsing.
  - Introduce `--log-level`, `--fix`, `--fix-missing`, and `--fix-extra` options.
  - Configure logger via `get_logger('fsdb_healthcheck', ...)`.
  - Implement missing‑reference fixing logic with progress bar and backup handling.
  - Stub `--fix-extra` with `NotImplementedError`.
- Remove old `fsdb_check.py` implementation.
- Log an error when the provided path is not a directory.
- Log an error when the required marker file `MARKER_FILE_NAME` is missing.
- Update empty‑FSDB warning to use the path string directly.
- Store bad scan directories as strings and simplify the bad‑scan log output.
- Add explicit error logging for missing `metadata` subdirectory.
- Add error logs for missing `files.json`, JSON parse failures, and missing `filesets` entry.
- Log an error when a fileset directory defined in `files.json` is absent.
- Introduce `_fileset_files_exists` helper to verify all required files exist and log the count of missing files.
- Update `_is_valid_fileset` to use the new helper for detailed missing‑file reporting.
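
As a rough illustration of the missing-file reporting described above, the sketch below counts the files that a fileset entry lists but that are not on disk. The function names are hypothetical (the PR's helper is `_fileset_files_exists`), and the entry layout (a `files` list of dicts with a `file` key) is an assumption made for the example.

```python
import logging
import tempfile
from pathlib import Path

logger = logging.getLogger("fsdb_validation_sketch")


def missing_fileset_files(fileset_dir, fileset_entry: dict) -> list:
    """Return the file names listed in `fileset_entry` that are absent from `fileset_dir`."""
    fileset_dir = Path(fileset_dir)
    return [
        entry["file"]
        for entry in fileset_entry.get("files", [])
        if not (fileset_dir / entry["file"]).is_file()
    ]


def fileset_files_exist(fileset_dir, fileset_entry: dict) -> bool:
    """Log how many files are missing and return True only when none are."""
    missing = missing_fileset_files(fileset_dir, fileset_entry)
    if missing:
        logger.error("%d file(s) missing from fileset '%s'", len(missing), fileset_dir)
    return not missing


# Small demonstration: one of the two listed files exists.
with tempfile.TemporaryDirectory() as tmp:
    images = Path(tmp) / "images"
    images.mkdir()
    (images / "00000_rgb.jpg").touch()
    entry = {
        "id": "images",
        "files": [
            {"id": "00000_rgb", "file": "00000_rgb.jpg"},
            {"id": "00001_rgb", "file": "00001_rgb.jpg"},
        ],
    }
    missing = missing_fileset_files(images, entry)
    complete = fileset_files_exist(images, entry)
```
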
…ional files‑json updates

- Updated `file_ops.py` to use `Path` from `pathlib` and added `typing` imports for clearer type hints.
- Replaced direct `shutil.rmtree` calls with `send2trash` for safer, reversible deletions of scans, filesets, and metadata directories.
- Introduced `backup_file` usage before overwriting `files.json` when `updates_files_json` is enabled.
- Extended `_load_scan` signature to `def _load_scan(db: 'FSDB', scan_id: str, updates_files_json: bool = False) -> 'Scan | None'` and added detailed docstring.
- Modified `_load_scans` to return a `dict[str, 'Scan']`, accept `updates_files_json` flag, and improved handling of bad scans (no interactive prompts).
- Added validation imports (`_is_valid_fileset`, `_is_scan_dataset`) and guarded type‑checking imports with `TYPE_CHECKING` to avoid circular dependencies.
- Updated helper functions (`_load_scan_filesets`, `_load_fileset`, `_load_fileset_files`) to return a `(result, needs_update)` tuple, propagating the update flag.
- Adjusted internal calls to reflect new return signatures and update logic.
- Replaced legacy `yes_no_choice` prompt handling with commented‑out code, eliminating interactive deletion in non‑TTY environments.
- Updated function signatures for `_load_dummy_fileset`, `_load_file`, `_load_measures`, `_load_scan_measures`, `_delete_file`, `_delete_fileset`, `_delete_scan`, `_make_fileset`, `_make_scan`, and `_store_scan` with explicit type hints and return annotations.
- Reformat `yes_no_choice` signature to use explicit spacing (`default: bool = True`).
- Introduce `yes_no_abort_choice` in `utils.py` to allow aborting a yes/no prompt and return `None` when aborted.
- Add `backup_filename` function to create a timestamped backup path for a given file.
- Add `backup_file` function that copies the original file to the backup path generated by `backup_filename`.
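
The two backup helpers above might be sketched as follows. The function names come from this PR, but the bodies and the backup naming scheme (`<name>.<timestamp>.bak`) are assumptions chosen for illustration; the actual format in `utils.py` may differ.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path


def backup_filename(file_path) -> Path:
    """Build a timestamped sibling path for `file_path` (naming scheme is an assumption)."""
    file_path = Path(file_path)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    return file_path.with_name(f"{file_path.name}.{stamp}.bak")


def backup_file(file_path) -> Path:
    """Copy `file_path` to the path given by `backup_filename` and return that path."""
    file_path = Path(file_path)
    target = backup_filename(file_path)
    shutil.copy2(file_path, target)  # copy2 also preserves file metadata
    return target


# Small demonstration: back up files.json, then overwrite the original.
with tempfile.TemporaryDirectory() as tmp:
    files_json = Path(tmp) / "files.json"
    files_json.write_text('{"filesets": []}')
    backup = backup_file(files_json)
    files_json.write_text('{"filesets": [{"id": "images", "files": []}]}')
    backup_name = backup.name
    backup_content = backup.read_text()
```

Taking the backup before the overwrite means the previous `files.json` can be restored by hand if an update goes wrong.
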
…nnect

- Drop the `required_filesets` attribute, its `__init__` parameter, and related docstring sections in `core.py`
- Initialize scans without setting `self.required_filesets`; default validation now relies on the presence of a `metadata` fileset
- Add an `_is_fsdb(self.basedir)` check in `FSDB.connect` to raise `NotAnFSDBError` when the directory is not a valid FSDB
- Clean up import comments and unused code related to required filesets.
- Updated `plantdb/src/commons/pyproject.toml` to include `send2trash` in the `dependencies` list, enabling safe recycle‑bin deletions.
- Grouped CLI options with `click_option_group` into “Fix” and “Logging” sections and reordered parameters (`fsdb_path`, `fix`, `fix_missing`, `fix_extra`, `log_level`)
- Added explicit FSDB marker validation using `MARKER_FILE_NAME` and raise `NotAnFSDBError` when missing
- Replaced direct directory check with `Path.is_dir()` and added early error handling for non‑directory paths
- Collected scan directories while ignoring hidden folders and added empty‑FSDB warning
- Implemented `fix_missing_scans_reference` helper:
  - Validates each scan with `_is_scan_dataset`
  - Updates `files.json` via `_load_scan(..., updates_files_json=True)`
  - Tracks bad scans, prompts user with `yes_no_abort_choice`, and moves them to trash using `send2trash`
- Removed old backup and progress‑bar logic; introduced new interactive deletion flow with clear warnings
- Updated imports: added `Logger`, `OptionGroup`, `optgroup`, `send2trash`, and `yes_no_abort_choice`; removed unused `datetime`, `json`, `shutil`, and `tqdm`
- Adjusted logger initialization comment and eliminated unnecessary `db.connect()`/`db.disconnect()` calls
- Updated documentation strings to reflect new behavior and parameters in `fsdb_healthcheck.py`
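
The interactive deletion flow described above can be sketched roughly as below. This is a simplified illustration, not the code in `fsdb_healthcheck.py`: the signature of `yes_no_abort_choice` is assumed, `review_bad_scans` is a hypothetical helper, and the `trash` callable stands in for the `send2trash` call so that the example runs with the standard library only.

```python
from typing import Callable, Iterable, List, Optional


def yes_no_abort_choice(question: str, ask: Callable[[str], str] = input) -> Optional[bool]:
    """Return True for yes, False for no, and None when the user aborts."""
    answers = {"y": True, "yes": True, "n": False, "no": False, "a": None, "abort": None}
    while True:
        reply = ask(f"{question} [y/n/a]: ").strip().lower()
        if reply in answers:
            return answers[reply]
        # Any other reply is ignored and the question is asked again.


def review_bad_scans(
    bad_scans: Iterable[str],
    trash: Callable[[str], None],
    ask: Callable[[str], str] = input,
) -> List[str]:
    """Ask about each bad scan, stop on abort, and return the scans that were trashed."""
    trashed = []
    for scan in bad_scans:
        choice = yes_no_abort_choice(f"Move '{scan}' to the trash?", ask)
        if choice is None:  # abort: leave the remaining scans untouched
            break
        if choice:
            trash(scan)
            trashed.append(scan)
    return trashed


# Scripted demonstration: the replies replace keyboard input.
replies = iter(["y", "maybe", "n", "a"])
removed: List[str] = []
trashed = review_bad_scans(
    ["scan_1", "scan_2", "scan_3", "scan_4"],
    trash=removed.append,
    ask=lambda prompt: next(replies),
)
```

Returning `None` for an abort keeps it distinct from a plain "no", so the loop can stop without touching the remaining scans.
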
@jlegrand62 jlegrand62 self-assigned this Jun 26, 2026
@jlegrand62 jlegrand62 added the bug and enhancement labels Jun 26, 2026
- Extend `_is_fsdb` signature with `extra_dirs:list[str]=['configs']` and update docstring.
- Skip verification of directories listed in `extra_dirs` during scan dataset checks.
- Add `extra_dirs` parameter to `FSDB.__init__` (default `['configs']`) and store it as an instance attribute.
- Pass `self.extra_dirs` to `_is_fsdb` in `FSDB.connect` to respect extra directory handling.
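
One possible reading of the `extra_dirs` handling, combined with the hidden-folder filtering mentioned earlier in this PR, is sketched below. `list_scan_dirs` is a hypothetical helper, not a function from plantdb, and the sketch uses a tuple default rather than the list default quoted above to avoid a shared mutable default argument.

```python
import tempfile
from pathlib import Path


def list_scan_dirs(basedir, extra_dirs=("configs",)) -> list:
    """Return candidate scan directories, skipping hidden folders and `extra_dirs`."""
    basedir = Path(basedir)
    return sorted(
        p
        for p in basedir.iterdir()
        if p.is_dir() and not p.name.startswith(".") and p.name not in extra_dirs
    )


# Small demonstration: only the two scan folders should be kept.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("scan_a", "scan_b", "configs", ".cache"):
        (root / name).mkdir()
    (root / "marker").touch()  # plain files are ignored as well
    scan_names = [p.name for p in list_scan_dirs(root)]
```
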
@jlegrand62 jlegrand62 closed this Jun 26, 2026