Improve FSDB validation and health‑check CLI with extra roots and safe deletions #99
Merged
- Import `NotAnFSDBError` in `core.py`
- Import `_is_fsdb` from `plantdb.commons.fsdb.validation` in `core.py`
- Add a `_is_fsdb(basedir)` check in `FSDB.__init__` and raise `NotAnFSDBError` if the directory is not a valid FSDB
- Add return type hint `-> bool` to `_is_fsdb` in `validation.py`
- Expand `_is_fsdb` docstring with detailed description and usage examples
- Verify the provided path is a directory before proceeding
- Ensure the presence of the `MARKER_FILE_NAME` file
- Introduce scan‑directory validation using new helper `_is_scan_dataset`
- Log warnings for empty databases and for any bad scan directories found
- Add new function `_is_scan_dataset` to validate FSDB datasets:
  - Checks for required `files.json` and valid JSON structure
  - Optionally validates filesets if `validate_json_fileset` is true
  - Confirms presence of required `metadata` subdirectory
- Update `_is_safe_to_delete` signature to `-> bool` and improve its docstring.
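
The validation steps above can be sketched with the standard library alone. This is a simplified illustration, not the plantdb code: the functions `is_fsdb` and `is_scan_dataset` are hypothetical stand-ins for `_is_fsdb` and `_is_scan_dataset`, the marker value `"romidb"` is an assumed placeholder for `MARKER_FILE_NAME`, and treating bad scans as warnings only (rather than failing the check) is an assumption based on the "Log warnings" item.

```python
import json
import logging
from pathlib import Path

logger = logging.getLogger("fsdb_validation_sketch")

# Assumption: placeholder for plantdb's MARKER_FILE_NAME constant.
MARKER_FILE_NAME = "romidb"


def is_scan_dataset(scan_path: Path) -> bool:
    """Return True if `scan_path` has a parseable 'files.json' and a 'metadata' subdirectory."""
    files_json = scan_path / "files.json"
    if not files_json.is_file():
        logger.error("Missing 'files.json' in %s", scan_path)
        return False
    try:
        content = json.loads(files_json.read_text())
    except json.JSONDecodeError as err:
        logger.error("Could not parse %s: %s", files_json, err)
        return False
    if not isinstance(content, dict) or "filesets" not in content:
        logger.error("No 'filesets' entry in %s", files_json)
        return False
    if not (scan_path / "metadata").is_dir():
        logger.error("Missing 'metadata' subdirectory in %s", scan_path)
        return False
    return True


def is_fsdb(basedir) -> bool:
    """Return True if `basedir` is a directory containing the marker file.

    Bad scan directories are only reported as warnings (an assumption of this sketch).
    """
    basedir = Path(basedir)
    if not basedir.is_dir():
        logger.error("Not a directory: %s", basedir)
        return False
    if not (basedir / MARKER_FILE_NAME).is_file():
        logger.error("Missing marker file '%s' in %s", MARKER_FILE_NAME, basedir)
        return False
    # Hidden directories are not treated as scans.
    scan_dirs = [p for p in basedir.iterdir() if p.is_dir() and not p.name.startswith(".")]
    if not scan_dirs:
        logger.warning("Empty FSDB: %s", basedir)
    bad_scans = [str(p) for p in scan_dirs if not is_scan_dataset(p)]
    if bad_scans:
        logger.warning("Found %d bad scan directories: %s", len(bad_scans), bad_scans)
    return True
```

Keeping the structural check (directory plus marker) separate from the per-scan report lets a caller still open a database that contains a few broken scans.
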
- Import `_fileset_path` and `_scan_json_file` from `path_helpers` in `validation.py`
- Extend `_is_fsdb` signature to accept `validate_json_fileset` and pass it to scan checks
- Rename parameters to `scan_path` in `_is_scan_dataset` and update all internal path usages
- Add optional `validate_json_fileset` flag documentation to both `_is_fsdb` and `_is_scan_dataset`
- Implement `_is_valid_fileset` to verify fileset directories and required files listed in `files.json`
- Update `_is_scan_dataset` to invoke `_is_valid_fileset` when validation is enabled
- Adjust calls to `_is_scan_dataset` in `_is_fsdb` to include the new flag
- Refactor variable names and path handling for clarity across the validation module
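
A fileset check like the one `_is_valid_fileset` performs could look roughly as follows. This is a hedged sketch: it assumes a `files.json` layout of `{"filesets": [{"id": ..., "files": [{"id": ..., "file": ...}]}]}` and that each fileset directory is named after its `id`; the helper names `missing_fileset_files`, `is_valid_fileset`, and `validate_scan_filesets` are hypothetical.

```python
import json
import logging
from pathlib import Path
from typing import List

logger = logging.getLogger("fsdb_fileset_sketch")


def missing_fileset_files(fileset_path: Path, files: List[dict]) -> List[str]:
    """Return the names of the files listed in 'files.json' that are absent on disk."""
    return [f["file"] for f in files if not (fileset_path / f["file"]).is_file()]


def is_valid_fileset(scan_path: Path, fileset: dict) -> bool:
    """Check one 'files.json' fileset entry against the scan directory."""
    fileset_path = scan_path / fileset["id"]  # assumption: directory named after the fileset id
    if not fileset_path.is_dir():
        logger.error("Fileset directory %s is missing", fileset_path)
        return False
    missing = missing_fileset_files(fileset_path, fileset.get("files", []))
    if missing:
        logger.error("%d file(s) missing from %s", len(missing), fileset_path)
        return False
    return True


def validate_scan_filesets(scan_path) -> bool:
    """Validate every fileset of a scan; check them all so every problem gets logged."""
    scan_path = Path(scan_path)
    content = json.loads((scan_path / "files.json").read_text())
    results = [is_valid_fileset(scan_path, fs) for fs in content["filesets"]]
    return all(results)
```

Building the full `results` list before calling `all()` is deliberate: `all()` over a generator would stop at the first bad fileset and hide the remaining errors from the log.
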
- Import `_is_scan_dataset` in `file_ops.py` (pre‑load for validation utilities).
- Remove `required_fs` handling in `get_scans`; now only checks that the scan directory exists before loading filesets.
- Comment out the user prompt (`yes_no_choice`) and deletion loop for bad scans, preventing interactive prompts in non‑TTY environments.
- Keep existing logic for loading filesets and updating scans unchanged.
…mentation
- Update `pyproject.toml` script entry: replace `fsdb_check` with `fsdb_healthcheck`.
- Modify `validation.py` to suggest the new `fsdb_healthcheck` CLI instead of a TODO comment.
- Add new CLI module `src/commons/plantdb/commons/cli/fsdb_healthcheck.py`:
  - Replace `argparse` with `click` for argument parsing.
  - Introduce `--log-level`, `--fix`, `--fix-missing`, and `--fix-extra` options.
  - Configure logger via `get_logger('fsdb_healthcheck', ...)`.
  - Implement missing‑reference fixing logic with progress bar and backup handling.
  - Stub `--fix-extra` with `NotImplementedError`.
- Remove old `fsdb_check.py` implementation.
- Log an error when the provided path is not a directory.
- Log an error when the required marker file `MARKER_FILE_NAME` is missing.
- Update empty‑FSDB warning to use the path string directly.
- Store bad scan directories as strings and simplify the bad‑scan log output.
- Add explicit error logging for missing `metadata` subdirectory.
- Add error logs for missing `files.json`, JSON parse failures, and missing `filesets` entry.
- Log an error when a fileset directory defined in `files.json` is absent.
- Introduce `_fileset_files_exists` helper to verify all required files exist and log the count of missing files.
- Update `_is_valid_fileset` to use the new helper for detailed missing‑file reporting.
…ional files‑json updates
- Updated `file_ops.py` to use `Path` from `pathlib` and added `typing` imports for clearer type hints.
- Replaced direct `shutil.rmtree` calls with `send2trash` for safer, reversible deletions of scans, filesets, and metadata directories.
- Introduced `backup_file` usage before overwriting `files.json` when `updates_files_json` is enabled.
- Extended `_load_scan` signature to `def _load_scan(db: 'FSDB', scan_id: str, updates_files_json: bool = False) -> 'Scan | None'` and added detailed docstring.
- Modified `_load_scans` to return a `dict[str, 'Scan']`, accept `updates_files_json` flag, and improved handling of bad scans (no interactive prompts).
- Added validation imports (`_is_valid_fileset`, `_is_scan_dataset`) and guarded type‑checking imports with `TYPE_CHECKING` to avoid circular dependencies.
- Updated helper functions (`_load_scan_filesets`, `_load_fileset`, `_load_fileset_files`) to return a `(result, needs_update)` tuple, propagating the update flag.
- Adjusted internal calls to reflect new return signatures and update logic.
- Replaced legacy `yes_no_choice` prompt handling with commented‑out code, eliminating interactive deletion in non‑TTY environments.
- Updated function signatures for `_load_dummy_fileset`, `_load_file`, `_load_measures`, `_load_scan_measures`, `_delete_file`, `_delete_fileset`, `_delete_scan`, `_make_fileset`, `_make_scan`, and `_store_scan` with explicit type hints and return annotations.
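
The `(result, needs_update)` pattern described above can be illustrated with a small, self-contained sketch. Everything here is hypothetical: the function names mirror the PR's helpers only loosely, the reconciliation rule (drop `files.json` entries whose file no longer exists on disk) is an assumed example of what "needs update" could mean, and the plain `.bak` copy stands in for the PR's `backup_file`.

```python
import json
import shutil
from pathlib import Path
from typing import List, Tuple


def load_fileset_files(fileset_path: Path, files: List[dict]) -> Tuple[List[dict], bool]:
    """Return (kept_entries, needs_update); assumed rule: drop entries whose file is missing."""
    kept = [f for f in files if (fileset_path / f["file"]).is_file()]
    return kept, len(kept) != len(files)


def load_scan_filesets(scan_path: Path, filesets: List[dict]) -> Tuple[List[dict], bool]:
    """Propagate each fileset's flag up to the scan level."""
    needs_update = False
    result = []
    for fs in filesets:
        kept, fs_update = load_fileset_files(scan_path / fs["id"], fs.get("files", []))
        result.append({**fs, "files": kept})
        needs_update = needs_update or fs_update
    return result, needs_update


def load_scan(scan_path, updates_files_json: bool = False) -> Tuple[List[dict], bool]:
    """Load a scan; rewrite 'files.json' only when allowed and something changed."""
    scan_path = Path(scan_path)
    files_json = scan_path / "files.json"
    content = json.loads(files_json.read_text())
    filesets, needs_update = load_scan_filesets(scan_path, content["filesets"])
    if needs_update and updates_files_json:
        # Back up before overwriting (hypothetical '.bak' suffix).
        shutil.copy2(files_json, files_json.with_name(files_json.name + ".bak"))
        files_json.write_text(json.dumps({**content, "filesets": filesets}, indent=2))
    return filesets, needs_update
```

Returning the flag instead of writing from deep inside the helpers keeps every write decision in one place, so a read-only load (`updates_files_json=False`) never touches the disk.
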
- Reformat `yes_no_choice` signature to use explicit spacing (`default: bool = True`).
- Introduce `yes_no_abort_choice` in `utils.py` to allow aborting a yes/no prompt and return `None` when aborted.
- Add `backup_filename` function to create a timestamped backup path for a given file.
- Add `backup_file` function that copies the original file to the backup path generated by `backup_filename`.
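
The backup and prompt helpers above could be sketched as follows. The timestamp format (`%Y%m%d_%H%M%S` inserted before the suffix), the optional `now` argument, the accepted answers, and the injectable `input_fn` parameter are all assumptions made for illustration and testability; they are not taken from `utils.py`.

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import Callable, Optional


def backup_filename(path, now: Optional[datetime] = None) -> Path:
    """Return a timestamped sibling path, e.g. files.json -> files_20240131_101500.json."""
    path = Path(path)
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return path.with_name(f"{path.stem}_{stamp}{path.suffix}")


def backup_file(path, now: Optional[datetime] = None) -> Path:
    """Copy `path` to its timestamped backup location and return that location."""
    backup = backup_filename(path, now)
    shutil.copy2(path, backup)
    return backup


def yes_no_abort_choice(question: str, default: bool = True,
                        input_fn: Callable[[str], str] = input) -> Optional[bool]:
    """Ask a yes/no question; return None if the user chooses to abort."""
    hint = "[Y/n/a]" if default else "[y/N/a]"
    while True:  # re-ask until the answer is recognized
        answer = input_fn(f"{question} {hint} ").strip().lower()
        if answer == "":
            return default
        if answer in ("y", "yes"):
            return True
        if answer in ("n", "no"):
            return False
        if answer in ("a", "abort"):
            return None
```

Returning `None` for an abort, rather than raising, lets callers distinguish "no" from "stop everything" with a simple `is None` check.
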
…nnect
- Drop the `required_filesets` attribute, its `__init__` parameter, and related docstring sections in `core.py`
- Initialize scans without setting `self.required_filesets`; default validation now relies on the presence of a `metadata` fileset
- Add an `_is_fsdb(self.basedir)` check in `FSDB.connect` to raise `NotAnFSDBError` when the directory is not a valid FSDB
- Clean up import comments and unused code related to required filesets.
- Updated `plantdb/src/commons/pyproject.toml` to include `send2trash` in the `dependencies` list, enabling safe recycle‑bin deletions.
- Grouped CLI options with `click_option_group` into “Fix” and “Logging” sections and reordered parameters (`fsdb_path`, `fix`, `fix_missing`, `fix_extra`, `log_level`)
- Added explicit FSDB marker validation using `MARKER_FILE_NAME`; `NotAnFSDBError` is raised when the marker is missing
- Replaced direct directory check with `Path.is_dir()` and added early error handling for non‑directory paths
- Collected scan directories while ignoring hidden folders and added empty‑FSDB warning
- Implemented `fix_missing_scans_reference` helper:
  - Validates each scan with `_is_scan_dataset`
  - Updates `files.json` via `_load_scan(..., updates_files_json=True)`
  - Tracks bad scans, prompts user with `yes_no_abort_choice`, and moves them to trash using `send2trash`
- Removed old backup and progress‑bar logic; introduced new interactive deletion flow with clear warnings
- Updated imports: added `Logger`, `OptionGroup`, `optgroup`, `send2trash`, and `yes_no_abort_choice`; removed unused `datetime`, `json`, `shutil`, and `tqdm`
- Adjusted logger initialization comment and eliminated unnecessary `db.connect()`/`db.disconnect()` calls
- Updated documentation strings to reflect new behavior and parameters in `fsdb_healthcheck.py`
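
The control flow of `fix_missing_scans_reference` can be outlined as below. This sketch injects its collaborators as callables so it runs without plantdb or `send2trash`: `is_scan` plays the role of `_is_scan_dataset`, `reload_scan` the role of `_load_scan(..., updates_files_json=True)`, `ask` the role of `yes_no_abort_choice`, and `trash` could be `send2trash.send2trash` in real use. Asking once for all bad scans, rather than once per scan, is an assumption.

```python
import logging
from pathlib import Path
from typing import Callable, List, Optional

logger = logging.getLogger("fsdb_healthcheck_sketch")


def list_scan_dirs(fsdb_path) -> List[Path]:
    """Scan directories are the non-hidden subdirectories of the FSDB root."""
    return sorted(p for p in Path(fsdb_path).iterdir()
                  if p.is_dir() and not p.name.startswith("."))


def fix_missing_scans_reference(
    fsdb_path,
    is_scan: Callable[[Path], bool],
    reload_scan: Callable[[Path], None],
    ask: Callable[[str], Optional[bool]],
    trash: Callable[[str], None],
) -> List[Path]:
    """Refresh valid scans, then offer to trash the bad ones. Returns the trashed paths."""
    scans = list_scan_dirs(fsdb_path)
    if not scans:
        logger.warning("Empty FSDB: %s", fsdb_path)
        return []
    bad_scans = []
    for scan in scans:
        if is_scan(scan):
            reload_scan(scan)  # e.g. reload with files.json updates enabled
        else:
            bad_scans.append(scan)
    if not bad_scans:
        return []
    logger.warning("Found %d bad scan(s): %s", len(bad_scans), [s.name for s in bad_scans])
    choice = ask(f"Move {len(bad_scans)} bad scan(s) to the trash?")
    if choice is None:
        logger.info("Aborted by user; nothing was deleted.")
        return []
    if not choice:
        return []
    for scan in bad_scans:
        trash(str(scan))  # reversible deletion when `trash` is a recycle-bin function
    return bad_scans
```

Injecting the prompt and the deletion function keeps the destructive step testable and makes it easy to run the same logic non-interactively.
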
- Extend `_is_fsdb` signature with `extra_dirs: list[str] = ['configs']` and update docstring.
- Skip verification of directories listed in `extra_dirs` during scan dataset checks.
- Add `extra_dirs` parameter to `FSDB.__init__` (default `['configs']`) and store it as an instance attribute.
- Pass `self.extra_dirs` to `_is_fsdb` in `FSDB.connect` to respect extra directory handling.
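
How `extra_dirs` and the `connect`-time check fit together can be sketched with a toy class. `MiniFSDB`, `scans_to_check`, and `bad_scan_dirs` are hypothetical names, `NotAnFSDBError` here is a local stand-in for plantdb's exception, the marker value is an assumed placeholder, and "bad scan" is reduced to "no `files.json`" for brevity.

```python
from pathlib import Path
from typing import List, Optional

MARKER_FILE_NAME = "romidb"  # assumption: placeholder for plantdb's constant


class NotAnFSDBError(Exception):
    """Local stand-in for plantdb's exception of the same name."""


def scans_to_check(basedir: Path, extra_dirs: List[str]) -> List[Path]:
    """Non-hidden subdirectories, minus extra (non-scan) roots such as 'configs'."""
    return sorted(p for p in basedir.iterdir()
                  if p.is_dir() and not p.name.startswith(".") and p.name not in extra_dirs)


def bad_scan_dirs(basedir, extra_dirs: List[str]) -> List[Path]:
    """Simplified scan check: a scan without 'files.json' is reported as bad."""
    return [p for p in scans_to_check(Path(basedir), extra_dirs)
            if not (p / "files.json").is_file()]


def is_fsdb(basedir) -> bool:
    basedir = Path(basedir)
    return basedir.is_dir() and (basedir / MARKER_FILE_NAME).is_file()


class MiniFSDB:
    def __init__(self, basedir, extra_dirs: Optional[List[str]] = None):
        self.basedir = Path(basedir)
        # A None default avoids sharing one mutable list between instances.
        self.extra_dirs = ["configs"] if extra_dirs is None else list(extra_dirs)
        self.is_connected = False

    def connect(self) -> None:
        # Validation happens at connect time, not at construction time.
        if not is_fsdb(self.basedir):
            raise NotAnFSDBError(f"Not a valid FSDB: {self.basedir}")
        self.scan_dirs = scans_to_check(self.basedir, self.extra_dirs)
        self.is_connected = True
```

The sketch uses `None` as the default instead of a literal `['configs']` list: a mutable default in a Python signature is created once and shared by every call, so in-place changes on one instance would leak into others.
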
- Updated `src/commons/pyproject.toml` to include `click_option_group` in the `dependencies` list.
- No functional code changes; the move improves file organization and readability.
Summary
This PR introduces several enhancements to the FSDB subsystem:
- `FSDB.__init__` and `_is_fsdb` now accept an `extra_dirs` list (default `['configs']`) of directories that are skipped during scan validation.
- Replaced the `fsdb_check` script with a Click‑based `fsdb_healthcheck` command, adding grouped options, logging controls, and fix flags.
- Validation now checks for the `metadata` subdirectory, `files.json` integrity, and fileset completeness, with detailed error logging.
- Deletions now go through `send2trash` (added as a dependency), and `files.json` is backed up before being overwritten.
- Added `yes_no_abort_choice` for abort‑capable confirmations and updated prompt handling throughout the codebase.
- Removed the `required_filesets` attribute and related logic, simplifying scan loading.
- `FSDB.__init__` now validates the database path via `_is_fsdb` and raises `NotAnFSDBError` for invalid directories.

These changes improve reliability, usability, and safety when working with FSDB databases.