Improve FSDB healthcheck CLI and validation workflow #98
Closed
jlegrand62 wants to merge 13 commits into
- Import `NotAnFSDBError` in `core.py`
- Import `_is_fsdb` from `plantdb.commons.fsdb.validation` in `core.py`
- Add a `_is_fsdb(basedir)` check in `FSDB.__init__` and raise `NotAnFSDBError` if the directory is not a valid FSDB
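The constructor guard described above can be illustrated with a minimal, standalone sketch. It is not the plantdb implementation: `FSDBSketch` and `is_fsdb_sketch` are hypothetical stand-ins, the check is reduced to "directory plus marker file", and the value given to `MARKER_FILE_NAME` is an assumption.

```python
from pathlib import Path

# Assumption: the real constant is defined in plantdb; this value is a placeholder.
MARKER_FILE_NAME = "fsdb_marker"


class NotAnFSDBError(Exception):
    """Stand-in for the plantdb exception of the same name."""


def is_fsdb_sketch(basedir: Path) -> bool:
    """Simplified check: an existing directory that contains the marker file."""
    return basedir.is_dir() and (basedir / MARKER_FILE_NAME).is_file()


class FSDBSketch:
    """Hypothetical database class that refuses to wrap an invalid directory."""

    def __init__(self, basedir):
        self.basedir = Path(basedir)
        if not is_fsdb_sketch(self.basedir):
            raise NotAnFSDBError(f"Not a valid FSDB: {self.basedir}")
```

Failing in the constructor means callers learn about a bad path immediately rather than on the first read.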
- Add return type hint `-> bool` to `_is_fsdb` in `validation.py`
- Expand `_is_fsdb` docstring with detailed description and usage examples
- Verify the provided path is a directory before proceeding
- Ensure the presence of the `MARKER_FILE_NAME` file
- Introduce scan‑directory validation using new helper `_is_scan_dataset`
- Log warnings for empty databases and for any bad scan directories found
- Add new function `_is_scan_dataset` to validate FSDB datasets:
  - Checks for required `files.json` and valid JSON structure
  - Optionally validates filesets if `validate_json_fileset` is true
  - Confirms presence of required `metadata` subdirectory
- Update `_is_safe_to_delete` signature to `-> bool` and improve its docstring.
- Import `_fileset_path` and `_scan_json_file` from `path_helpers` in `validation.py`
- Extend `_is_fsdb` signature to accept `validate_json_fileset` and pass it to scan checks
- Rename parameters to `scan_path` in `_is_scan_dataset` and update all internal path usages
- Add optional `validate_json_fileset` flag documentation to both `_is_fsdb` and `_is_scan_dataset`
- Implement `_is_valid_fileset` to verify fileset directories and required files listed in `files.json`
- Update `_is_scan_dataset` to invoke `_is_valid_fileset` when validation is enabled
- Adjust calls to `_is_scan_dataset` in `_is_fsdb` to include the new flag
- Refactor variable names and path handling for clarity across the validation module
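The scan checks listed in these commits (required `files.json`, valid JSON with a `filesets` entry, a `metadata` subdirectory, optional fileset validation) could look roughly like the sketch below. The function name and the layout of `files.json` (a `filesets` list whose entries carry an `id`) are assumptions made for illustration, not the code from `validation.py`.

```python
import json
import logging
from pathlib import Path

logger = logging.getLogger("fsdb_validation_sketch")


def is_scan_dataset_sketch(scan_path, validate_json_fileset: bool = False) -> bool:
    """Return True if `scan_path` looks like a scan dataset (simplified)."""
    scan_path = Path(scan_path)
    json_file = scan_path / "files.json"
    if not json_file.is_file():
        logger.error("Missing 'files.json' in %s", scan_path)
        return False
    try:
        content = json.loads(json_file.read_text())
    except json.JSONDecodeError as err:
        logger.error("Could not parse %s: %s", json_file, err)
        return False
    if not isinstance(content, dict) or "filesets" not in content:
        logger.error("No 'filesets' entry in %s", json_file)
        return False
    if not (scan_path / "metadata").is_dir():
        logger.error("Missing 'metadata' subdirectory in %s", scan_path)
        return False
    if validate_json_fileset:
        # Assumed layout: each fileset entry is a dict with an 'id' naming its directory.
        for fileset in content["filesets"]:
            fs_id = fileset.get("id") if isinstance(fileset, dict) else None
            if not fs_id or not (scan_path / fs_id).is_dir():
                logger.error("Fileset %r declared in %s is absent", fs_id, json_file)
                return False
    return True
```

Keeping fileset validation behind a flag lets a cheap structural check run by default, with the more expensive per-fileset pass enabled only when requested.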
- Import `_is_scan_dataset` in `file_ops.py` (pre‑load for validation utilities).
- Remove `required_fs` handling in `get_scans`; now only checks that the scan directory exists before loading filesets.
- Comment out the user prompt (`yes_no_choice`) and deletion loop for bad scans, preventing interactive prompts in non‑TTY environments.
- Keep existing logic for loading filesets and updating scans unchanged.
…mentation
- Update `pyproject.toml` script entry: replace `fsdb_check` with `fsdb_healthcheck`.
- Modify `validation.py` to suggest the new `fsdb_healthcheck` CLI instead of a TODO comment.
- Add new CLI module `src/commons/plantdb/commons/cli/fsdb_healthcheck.py`:
  - Replace `argparse` with `click` for argument parsing.
  - Introduce `--log-level`, `--fix`, `--fix-missing`, and `--fix-extra` options.
  - Configure logger via `get_logger('fsdb_healthcheck', ...)`.
  - Implement missing‑reference fixing logic with progress bar and backup handling.
  - Stub `--fix-extra` with `NotImplementedError`.
- Remove old `fsdb_check.py` implementation.
- Log an error when the provided path is not a directory.
- Log an error when the required marker file `MARKER_FILE_NAME` is missing.
- Update empty‑FSDB warning to use the path string directly.
- Store bad scan directories as strings and simplify the bad‑scan log output.
- Add explicit error logging for missing `metadata` subdirectory.
- Add error logs for missing `files.json`, JSON parse failures, and missing `filesets` entry.
- Log an error when a fileset directory defined in `files.json` is absent.
- Introduce `_fileset_files_exists` helper to verify all required files exist and log the count of missing files.
- Update `_is_valid_fileset` to use the new helper for detailed missing‑file reporting.
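A helper in the spirit of `_fileset_files_exists` might be sketched as follows. The name, signature, and log wording are hypothetical; only the behavior (verify every required file exists and log how many are missing) comes from the commit message.

```python
import logging
from pathlib import Path

logger = logging.getLogger("fsdb_validation_sketch")


def fileset_files_exist_sketch(fileset_dir, file_names) -> bool:
    """Return True only if every name in `file_names` is a file in `fileset_dir`."""
    fileset_dir = Path(fileset_dir)
    missing = [name for name in file_names if not (fileset_dir / name).is_file()]
    if missing:
        # Report the count once rather than one log line per missing file.
        logger.error("%d file(s) missing in fileset '%s'", len(missing), fileset_dir)
    return not missing
```

Logging a single count keeps the output readable for filesets with many files; the `missing` list is still available if per-file detail is needed at debug level.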
…ional files‑json updates
- Updated `file_ops.py` to use `Path` from `pathlib` and added `typing` imports for clearer type hints.
- Replaced direct `shutil.rmtree` calls with `send2trash` for safer, reversible deletions of scans, filesets, and metadata directories.
- Introduced `backup_file` usage before overwriting `files.json` when `updates_files_json` is enabled.
- Extended `_load_scan` signature to `def _load_scan(db: 'FSDB', scan_id: str, updates_files_json: bool = False) -> 'Scan | None'` and added detailed docstring.
- Modified `_load_scans` to return a `dict[str, 'Scan']`, accept `updates_files_json` flag, and improved handling of bad scans (no interactive prompts).
- Added validation imports (`_is_valid_fileset`, `_is_scan_dataset`) and guarded type‑checking imports with `TYPE_CHECKING` to avoid circular dependencies.
- Updated helper functions (`_load_scan_filesets`, `_load_fileset`, `_load_fileset_files`) to return a `(result, needs_update)` tuple, propagating the update flag.
- Adjusted internal calls to reflect new return signatures and update logic.
- Replaced legacy `yes_no_choice` prompt handling with commented‑out code, eliminating interactive deletion in non‑TTY environments.
- Updated function signatures for `_load_dummy_fileset`, `_load_file`, `_load_measures`, `_load_scan_measures`, `_delete_file`, `_delete_fileset`, `_delete_scan`, `_make_fileset`, `_make_scan`, and `_store_scan` with explicit type hints and return annotations.
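The `(result, needs_update)` convention can be illustrated with a small sketch. Everything here is hypothetical: the function names, the `files.json` layout, and the assumption that an update is needed when a listed file is absent from disk. The PR also describes backing up `files.json` with `backup_file` before overwriting it; that step is omitted to keep the sketch short.

```python
import json
from pathlib import Path


def load_fileset_files_sketch(fileset_dir: Path, entries: list) -> tuple:
    """Keep entries whose file exists; flag an update if any were dropped."""
    kept = [e for e in entries if (fileset_dir / e["file"]).is_file()]
    return kept, len(kept) != len(entries)


def load_scan_filesets_sketch(scan_dir: Path, filesets: list) -> tuple:
    """Load every fileset and propagate the update flag upwards."""
    loaded, needs_update = [], False
    for fs in filesets:
        files, changed = load_fileset_files_sketch(scan_dir / fs["id"], fs["files"])
        loaded.append({"id": fs["id"], "files": files})
        needs_update = needs_update or changed
    return loaded, needs_update


def load_scan_sketch(scan_dir, updates_files_json: bool = False) -> list:
    """Rewrite 'files.json' only when asked to and only when something changed."""
    scan_dir = Path(scan_dir)
    json_file = scan_dir / "files.json"
    content = json.loads(json_file.read_text())
    filesets, needs_update = load_scan_filesets_sketch(scan_dir, content["filesets"])
    if needs_update and updates_files_json:
        content["filesets"] = filesets
        json_file.write_text(json.dumps(content, indent=2))
    return filesets
```

Returning the flag instead of writing inside the leaf helpers keeps them free of side effects, so the decision to touch `files.json` is made in one place.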
- Reformat `yes_no_choice` signature to use explicit spacing (`default: bool = True`).
- Introduce `yes_no_abort_choice` in `utils.py` to allow aborting a yes/no prompt and return `None` when aborted.
- Add `backup_filename` function to create a timestamped backup path for a given file.
- Add `backup_file` function that copies the original file to the backup path generated by `backup_filename`.
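A rough sketch of the three helpers follows. The backup naming scheme, the `now` and `ask` parameters, and the accepted answers are assumptions chosen to make the sketch testable; the actual functions in `utils.py` may differ.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_filename_sketch(path, now=None) -> Path:
    """Build a timestamped sibling path, e.g. files_20240102_030405.json.bak."""
    path = Path(path)
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return path.with_name(f"{path.stem}_{stamp}{path.suffix}.bak")


def backup_file_sketch(path, now=None) -> Path:
    """Copy `path` to its timestamped backup location and return that location."""
    backup = backup_filename_sketch(path, now)
    shutil.copy2(path, backup)  # copy2 also preserves file metadata
    return backup


def yes_no_abort_choice_sketch(question: str, ask=input):
    """Return True for yes, False for no, and None when the user aborts."""
    answers = {"y": True, "yes": True, "n": False, "no": False, "a": None, "abort": None}
    while True:
        reply = ask(f"{question} [y/n/a] ").strip().lower()
        if reply in answers:
            return answers[reply]
```

Because the abort case returns `None`, callers should test `is None` explicitly rather than relying on truthiness, which would conflate "no" with "abort".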
…nnect
- Drop the `required_filesets` attribute, its `__init__` parameter, and related docstring sections in `core.py`
- Initialize scans without setting `self.required_filesets`; default validation now relies on the presence of a `metadata` fileset
- Add an `_is_fsdb(self.basedir)` check in `FSDB.connect` to raise `NotAnFSDBError` when the directory is not a valid FSDB
- Clean up import comments and unused code related to required filesets.
- Updated `plantdb/src/commons/pyproject.toml` to include `send2trash` in the `dependencies` list, enabling safe recycle‑bin deletions.
- Grouped CLI options with `click_option_group` into “Fix” and “Logging” sections and reordered parameters (`fsdb_path`, `fix`, `fix_missing`, `fix_extra`, `log_level`)
- Added explicit FSDB marker validation using `MARKER_FILE_NAME` and raise `NotAnFSDBError` when missing
- Replaced direct directory check with `Path.is_dir()` and added early error handling for non‑directory paths
- Collected scan directories while ignoring hidden folders and added empty‑FSDB warning
- Implemented `fix_missing_scans_reference` helper:
  - Validates each scan with `_is_scan_dataset`
  - Updates `files.json` via `_load_scan(..., updates_files_json=True)`
  - Tracks bad scans, prompts user with `yes_no_abort_choice`, and moves them to trash using `send2trash`
- Removed old backup and progress‑bar logic; introduced new interactive deletion flow with clear warnings
- Updated imports: added `Logger`, `OptionGroup`, `optgroup`, `send2trash`, and `yes_no_abort_choice`; removed unused `datetime`, `json`, `shutil`, and `tqdm`
- Adjusted logger initialization comment and eliminated unnecessary `db.connect()`/`db.disconnect()` calls
- Updated documentation strings to reflect new behavior and parameters in `fsdb_healthcheck.py`
- Extend `_is_fsdb` signature with `extra_dirs:list[str]=['configs']` and update docstring.
- Skip verification of directories listed in `extra_dirs` during scan dataset checks.
- Add `extra_dirs` parameter to `FSDB.__init__` (default `['configs']`) and store it as an instance attribute.
- Pass `self.extra_dirs` to `_is_fsdb` in `FSDB.connect` to respect extra directory handling.
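How `extra_dirs` might be applied when collecting scan directories is sketched below, together with the hidden-folder filter mentioned for the healthcheck CLI. The function name is hypothetical, and the tuple default is a choice made in this sketch to avoid sharing a mutable default between calls; the commit itself uses a list.

```python
from pathlib import Path


def collect_scan_dirs_sketch(basedir, extra_dirs=("configs",)) -> list:
    """List candidate scan directories, skipping hidden and extra directories."""
    basedir = Path(basedir)
    return sorted(
        p
        for p in basedir.iterdir()
        if p.is_dir()                      # plain files (e.g. the marker) are ignored
        and not p.name.startswith(".")     # hidden folders are not scans
        and p.name not in extra_dirs       # e.g. 'configs' holds no scan data
    )
```

Only the directories returned here would then be passed to the scan-level validation, so a `configs` folder without `files.json` is not reported as a bad scan.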
Summary
This PR overhauls the FSDB health‑check and validation utilities:
New Click‑based `fsdb_healthcheck` CLI
- Replaces the old `fsdb_check` script.
- Groups options into “Fix” and “Logging” sections (`--fix`, `--fix-missing`, `--fix-extra`, `--log-level`).
- Checks for the FSDB marker file and raises `NotAnFSDBError` when invalid.

Robust validation logic
- Explicit error logging for marker‑file, `metadata`, `files.json`, and fileset issues.
- New `_is_scan_dataset` helper and optional JSON‑fileset validation flags.

Safe deletion and backup
- Uses `send2trash` for reversible deletions of scans, filesets, and metadata.
- `backup_file`/`backup_filename` helpers create timestamped backups before overwriting `files.json`.

Interactive fix flow
- `yes_no_abort_choice` allows users to abort the fixing process.
- `fix_missing_scans_reference` validates scans, updates `files.json`, and moves corrupted scans to the trash.

API clean‑up
- Removed the `required_filesets` attribute and related init parameters.
- Added an `_is_fsdb` validity check in `FSDB.connect`.

Dependency update
- Added `send2trash` to the commons dependencies.

Overall, the changes make FSDB health checks more user‑friendly, safer, and better logged while simplifying the underlying code base.