Skip to content

Enhancement patches#57

Merged
GavinHuttley merged 17 commits into
HuttleyLab:developfrom
GavinHuttley:develop
Jun 24, 2026
Merged

Enhancement patches#57
GavinHuttley merged 17 commits into
HuttleyLab:developfrom
GavinHuttley:develop

Conversation

@GavinHuttley

@GavinHuttley GavinHuttley commented Jun 24, 2026

Copy link
Copy Markdown
Collaborator

Summary by Sourcery

Introduce richer package metadata handling, dependency and license introspection, and log summarisation utilities, while updating logging labels and documentation to a new README format.

New Features:

  • Add a LogLabel enum for first-class, typed log labels used throughout the logging API.
  • Add dependency inspection helpers that resolve an installed package's declared requirements with environment marker evaluation and optional installation filtering.
  • Add license inspection helpers that resolve package licenses from metadata and expose them via get_package_licenses and CachingLogger.log_licenses.
  • Add a log_summary utility to parse scitrack log files into a structured, label-keyed summary API.

Enhancements:

  • Extend get_package_name to support no-argument calls that infer the caller's installed distribution name via frame inspection.
  • Improve get_version_for_package to prefer importlib.metadata versions, distinguish missing versus uninstalled packages, and support module objects more robustly.
  • Enhance CachingLogger to accept LogLabel values, to log versions and licenses for the caller package plus its installed dependencies and user-specified packages, and to centralize shared metadata logging logic.
  • Adjust built-in logging of system details, Python version, user, and command string to use the new LogLabel values consistently.

Build:

  • Configure coverage.py to focus on the scitrack package and adjust Ruff configuration to ignore COM812.

Documentation:

  • Replace the reStructuredText README with an expanded Markdown README that documents installation, CachingLogger usage, new version/license logging behavior, and log_summary.
  • Update packaging metadata and sdist configuration to point to the new README.md.

Tests:

  • Greatly expand tests around get_package_name, version resolution, dependency parsing and filtering, version and license logging behaviors, and the new log_summary and license utilities to ensure robust, edge-case-aware behavior.

[NEW] returns log file entries grouped by label.
    Useful for client code that wants to generate
    summaries.
Returns a dict keyed by install option ("core" plus one entry per
extra), with values being base package names. Non-installed packages
yield {}; non-extra env markers are evaluated against the current
interpreter so non-applicable deps are dropped.
When True, returns only installed dependencies; each distinct name is
probed at most once per call via a local cache.
…alled

Previously raised a generic ValueError, which made it impossible for
callers of log_versions() to distinguish "package not installed" from
other ValueError conditions. The new behaviour propagates through
log_versions() unchanged.
[CHANGED] CachingLogger.log_versions() now also emits a version line for each
    installed dependency of the caller's package, as reported by
    get_package_dependencies(if_installed=True).
[NEW] for license tracking. The log_licenses() method mirrors
    log_versions(). License resolution prefers PEP 639 License-Expression
    over the legacy License field. Missing licenses are recorded as "UNKNOWN".
…rack

Capture the caller frame at the public-method boundary where f_back is
unambiguously the call site.
The leading per-line timestamp and host:pid prefix were previously
discarded. log_summary() now returns three reserved keys derived from
the log itself: "datetime" (first line's timestamp, when logging
started), "hostname" (host from the line prefix) and "os" (from the
system_details line, when present).
@sourcery-ai

sourcery-ai Bot commented Jun 24, 2026

Copy link
Copy Markdown

Reviewer's Guide

Adds richer package/version/license introspection utilities, a LogLabel enum-based labeling scheme, log summary parsing, and extensive tests while modernizing packaging metadata and tooling configuration.

Sequence diagram for enhanced log_versions and log_licenses metadata logging

sequenceDiagram
    actor User
    participant Logger as CachingLogger
    participant MetaHelper as _log_metadata
    participant Deps as get_package_dependencies
    participant Ver as get_version_for_package
    participant Lic as _license_for_package

    User->>Logger: log_versions(packages)
    activate Logger
    Logger->>Logger: _installed_package_from_globals(parent.f_globals)
    Logger->>MetaHelper: _log_metadata(packages, get_version_for_package, LogLabel.VERSION, caller_name)
    activate MetaHelper
    alt caller_name is installed
        MetaHelper->>Ver: get_version_for_package(caller_name)
        Ver-->>MetaHelper: version
        MetaHelper->>Deps: get_package_dependencies(caller_name, if_installed=True)
        Deps-->>MetaHelper: {group: [dep_names]}
    end
    MetaHelper->>Ver: get_version_for_package(dep_or_user_name)
    Ver-->>MetaHelper: version
    MetaHelper->>Logger: log_message("name==version", label=LogLabel.VERSION)
    Logger-->>User:
    deactivate MetaHelper
    deactivate Logger

    User->>Logger: log_licenses(packages)
    activate Logger
    Logger->>Logger: _installed_package_from_globals(parent.f_globals)
    Logger->>MetaHelper: _log_metadata(packages, _license_for_package, LogLabel.LICENSE, caller_name)
    activate MetaHelper
    alt caller_name is installed
        MetaHelper->>Lic: _license_for_package(caller_name)
        Lic-->>MetaHelper: license
        MetaHelper->>Deps: get_package_dependencies(caller_name, if_installed=True)
        Deps-->>MetaHelper: {group: [dep_names]}
    end
    MetaHelper->>Lic: _license_for_package(dep_or_user_name)
    Lic-->>MetaHelper: license
    MetaHelper->>Logger: log_message("name==license", label=LogLabel.LICENSE)
    Logger-->>User:
    deactivate MetaHelper
    deactivate Logger
Loading

File-Level Changes

Change Details Files
Introduce LogLabel enum and migrate logging API to accept both enum members and strings for labels.
  • Add LogLabel enum encapsulating all built-in labels and implement str for seamless formatting.
  • Update CachingLogger methods (log_message, input_file, output_file, text_data, log_args) to accept LogLabel or string labels and to emit labels via LogLabel constants.
  • Adjust set_logger to use LogLabel-based labels for system details, Python version, user, and command string.
src/scitrack/__init__.py
tests/test_logging.py
Enhance package/version/license discovery and dependency handling, and integrate this into version and license logging.
  • Extend get_package_name to support a no-argument mode that infers the caller’s installed package using frame globals and a helper _installed_package_from_globals.
  • Add _version_via_metadata and change get_version_for_package to prefer importlib.metadata-based resolution, raising PackageNotFoundError for unknown string packages and falling back to module attributes otherwise.
  • Implement dependency parsing helpers (_split_requirement, _extract_extra, marker evaluation helpers) and get_package_dependencies with support for extras, environment markers, and optional installation filtering with memoization.
  • Add license helpers _license_for_package and get_package_licenses, including precedence rules for License-Expression vs License and eager PackageNotFoundError behavior.
  • Refactor CachingLogger.log_versions to log caller package, its installed dependencies, and optional user-specified packages via a shared _log_metadata helper, and add log_licenses with analogous behavior for licenses.
src/scitrack/__init__.py
tests/test_logging.py
Add log_summary utility to parse scitrack log files into structured label-based summaries with derived metadata and reserved label semantics.
  • Introduce log_summary(path, labels=None, all_labels=False) that groups log entries by label, deriving datetime, hostname, and os keys and enforcing reserved label constraints.
  • Recognise default labels from LogLabel plus md5sum variants and optionally extend the recognised set via labels or capture all labels with all_labels=True.
  • Implement robust parsing that skips malformed lines, ignores unknown labels by default, preserves value order, and handles values containing colons.
  • Add extensive tests covering default behaviour, reserved labels, extra labels, all-labels mode, md5sum labels, and interaction with LogLabel enum members.
src/scitrack/__init__.py
tests/test_logging.py
Expand and tighten tests around version logging, dependency handling, license logging, and LogLabel behavior.
  • Add comprehensive tests for get_package_name no-arg behavior across installed packages, subpackages, non-installed packages, main, and missing frames.
  • Update and extend tests for get_version_for_package to expect PackageNotFoundError on bad names and ValueError on bad types, and add separate tests for uninstalled names.
  • Extend CachingLogger.log_versions tests to cover dependency flattening, de-duplication, ordering, external caller resolution, and behavior with uninstalled deps.
  • Add an extensive new test suite for get_package_dependencies, get_package_licenses, log_licenses, LogLabel semantics, and the new log_summary function, including various marker and extras combinations and error cases.
  • Ensure tests confirm memoization of installation checks and that eager-raising behavior avoids partial writes for both versions and licenses.
tests/test_logging.py
Modernize project metadata, documentation, linting, and coverage configuration.
  • Replace README.rst with a new, more detailed README.md and switch pyproject readme and sdist include configuration to use README.md.
  • Add coverage.run and coverage.report configuration in pyproject.toml to scope coverage collection to the scitrack package and avoid stdlib leakage.
  • Update ruff configuration to ignore COM812 in addition to existing ignores.
pyproject.toml
ruff.toml
README.md
README.rst

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@codacy-production

Copy link
Copy Markdown

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

View in Codacy

🟢 Metrics 176 complexity · 32 duplication

Metric Results
Complexity 176
Duplication 32

View in Codacy

NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.

@GavinHuttley GavinHuttley merged commit 54fc447 into HuttleyLab:develop Jun 24, 2026
9 checks passed

@sourcery-ai sourcery-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 7 issues, and left some high level feedback:

  • In get_package_name, log_versions, and log_licenses you now retain references to frames returned by inspect.currentframe() without deleting them, which can create reference cycles; consider explicitly del-ing the frame (and parent) once you’ve extracted the data, as the previous implementation did.
  • The _version_via_metadata helper currently silences all exceptions from importlib.metadata.version by catching Exception; it would be safer to catch PackageNotFoundError (and perhaps a small set of expected errors) so that genuine metadata problems still surface.
  • In _log_metadata, caller_value can be None if _version_via_metadata/_license_for_package returns None, leading to name==None entries; if this is not desired, you might want to either skip those entries or normalise the value into a clearer sentinel (or ensure the helper never returns None).
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `get_package_name`, `log_versions`, and `log_licenses` you now retain references to frames returned by `inspect.currentframe()` without deleting them, which can create reference cycles; consider explicitly `del`-ing the frame (and parent) once you’ve extracted the data, as the previous implementation did.
- The `_version_via_metadata` helper currently silences all exceptions from `importlib.metadata.version` by catching `Exception`; it would be safer to catch `PackageNotFoundError` (and perhaps a small set of expected errors) so that genuine metadata problems still surface.
- In `_log_metadata`, `caller_value` can be `None` if `_version_via_metadata`/`_license_for_package` returns `None`, leading to `name==None` entries; if this is not desired, you might want to either skip those entries or normalise the value into a clearer sentinel (or ensure the helper never returns `None`).

## Individual Comments

### Comment 1
<location path="src/scitrack/__init__.py" line_range="107-115" />
<code_context>
+    return _installed_package_from_globals(parent.f_globals)
+
+
+def _version_via_metadata(name: str) -> str | None:
+    """resolve version from installed distribution metadata (PEP 566).
+
+    Returns None if the distribution is not installed, the lookup raises
+    any exception, or the recorded version is empty.
+    """
+    try:
+        version = importlib.metadata.version(name)
+    except Exception:  # noqa: BLE001
+        return None
+    return version or None
</code_context>
<issue_to_address>
**issue (bug_risk):** Catching bare `Exception` when resolving distribution versions may hide unexpected errors.

This will also mask unrelated errors (e.g. `ValueError` from `importlib.metadata.version` or environmental issues) by turning them into `None`, which makes real problems harder to diagnose. Prefer catching only the expected exceptions (e.g. `PackageNotFoundError`, and `ValueError`/`KeyError` if they are known failure modes) and allow other errors to propagate instead of treating them as “no version”.
</issue_to_address>

### Comment 2
<location path="README.md" line_range="8" />
<code_context>
+
+# About `scitrack`
+
+One of the critical challenges in scientific analysis is to track all the elements involved. This includes the arguments provided to a specific application (including default values), input data files referenced by those arguments and output data generated by the application. In addition to this, tracking a minimal set of system specific information.
+
+`scitrack` is a simple package aimed at researchers writing scripts, or more substantial scientific software, to support the tracking of scientific computation. The package provides elementary functionality to support logging. The primary capabilities concern generating checksums on input and output files and facilitating logging of the computational environment.
</code_context>
<issue_to_address>
**suggestion (typo):** Consider fixing the sentence fragment and adding a hyphen in "system-specific".

For example, you could rephrase it to: "In addition to this, it is important to track a minimal set of system-specific information."

```suggestion
One of the critical challenges in scientific analysis is to track all the elements involved. This includes the arguments provided to a specific application (including default values), input data files referenced by those arguments and output data generated by the application. In addition to this, it is important to track a minimal set of system-specific information.
```
</issue_to_address>

### Comment 3
<location path="README.md" line_range="40" />
<code_context>
+
+> **Warning**
+>
+> Once set, a loggers `.log_file_path` cannot be changed.
+
+## Capturing a programs arguments and options
</code_context>
<issue_to_address>
**suggestion (typo):** Use the possessive form "logger's" instead of "loggers".

Change this sentence to: `Once set, a logger's ".log_file_path" cannot be changed.`

```suggestion
> Once set, a logger's ".log_file_path" cannot be changed.
```
</issue_to_address>

### Comment 4
<location path="README.md" line_range="42" />
<code_context>
+>
+> Once set, a loggers `.log_file_path` cannot be changed.
+
+## Capturing a programs arguments and options
+
+`scitrack` will write the contents of `sys.argv` to the log file, prefixed by `command_string`. However, this only captures arguments specified on the command line. Tracking the value of optional arguments not specified, which may have default values, is critical to tracking the full command set. Doing this is now easy with the simple statement `LOGGER.log_args()`. The logger can also record the versions of named dependencies.
</code_context>
<issue_to_address>
**suggestion (typo):** Add a possessive apostrophe in "program's".

Consider “Capturing a program's arguments and options.” for the heading.

```suggestion
## Capturing a program's arguments and options
```
</issue_to_address>

### Comment 5
<location path="README.md" line_range="145" />
<code_context>
+
+## Other useful functions
+
+Two other useful functions are `get_file_hexdigest()` and `get_text_hexdigest()` compute md5sum for files or text. Those can be used to validate the state recorded in the log-file matches results at a later date, e.g. `output_file()` records the path and md5sum of an output file.
+
+`get_package_licenses(packages)` returns a `{name: license}` mapping for a list of installed packages, raising `PackageNotFoundError` eagerly if any name is not installed.
</code_context>
<issue_to_address>
**suggestion (typo):** Clarify the sentence by adding a missing "that"/"which" before "compute md5sum".

The wording "...and `get_text_hexdigest()` compute md5sum" is ungrammatical; consider restructuring the sentence, e.g. "Two other useful functions are `get_file_hexdigest()` and `get_text_hexdigest()`, which compute md5sums for files or text."

```suggestion
Two other useful functions are `get_file_hexdigest()` and `get_text_hexdigest()`, which compute md5sums for files or text. Those can be used to validate the state recorded in the log-file matches results at a later date, e.g. `output_file()` records the path and md5sum of an output file.
```
</issue_to_address>

### Comment 6
<location path="README.md" line_range="183" />
<code_context>
+$ uv build
+```
+
+[^1]: The hexdigest serves as a unique signature of a files contents.
</code_context>
<issue_to_address>
**suggestion (typo):** Use the possessive form "file's contents" in the footnote.

Change this to "a file's contents" for correct grammar.
</issue_to_address>

### Comment 7
<location path="src/scitrack/__init__.py" line_range="157" />
<code_context>
     return vn


+_REQ_NAME_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")
+_EXTRA_CLAUSE_RE = re.compile(r"""^\s*extra\s*==\s*['\"]([^'\"]+)['\"]\s*$""")
+
</code_context>
<issue_to_address>
**issue (complexity):** Consider extracting and simplifying the new dependency/metadata/log parsing helpers (and optionally using `packaging`) so `__init__.py` is smaller and each feature has clearer, more focused code paths.

You’ve added several valuable features, but a few spots can be simplified without losing functionality.

### 1. Custom PEP 508 / dependency parsing

The stack `_split_requirement` / `_extract_extra` / `_marker_env` / `_version_tuple` / `_eval_atom` / `_evaluate_marker` makes `__init__.py` heavy and brittle. If you can depend on `packaging`, you can drop most of this and rely on `packaging.requirements.Requirement` and its marker evaluation:

```python
# requirements.py (new module)
from __future__ import annotations

from collections.abc import Iterable
from importlib import metadata
from packaging.requirements import Requirement

def get_package_dependencies(
    package: str,
    *,
    if_installed: bool = False,
) -> dict[str, list[str]]:
    try:
        raw = metadata.requires(package)
    except metadata.PackageNotFoundError:
        return {}
    if not raw:
        return {}

    result: dict[str, list[str]] = {}
    installed_cache: dict[str, bool] = {}

    for line in raw:
        req = Requirement(line)
        # Skip environment‑specific deps that don't apply
        if req.marker is not None and not req.marker.evaluate():
            continue

        name = req.name
        if if_installed:
            if name not in installed_cache:
                try:
                    metadata.distribution(name)
                except metadata.PackageNotFoundError:
                    installed_cache[name] = False
                else:
                    installed_cache[name] = True
            if not installed_cache[name]:
                continue

        key = next(iter(req.extras), "core") if req.extras else "core"
        result.setdefault(key, []).append(name)

    return result
```

Then in `__init__.py` you can:

```python
# __init__.py
from .requirements import get_package_dependencies
```

and delete `_REQ_NAME_RE`, `_EXTRA_CLAUSE_RE`, `_split_requirement`, `_extract_extra`, `_marker_env`, `_version_tuple`, `_ATOM_RE`, `_VERSION_VARS`, `_compare`, `_eval_atom`, `_evaluate_marker`, and the current `get_package_dependencies` body. This keeps the behavior but isolates and simplifies PEP 508 handling.

If you must stay stdlib-only, moving the existing logic into `requirements.py` (as above, but without `packaging`) still trims `__init__.py` and makes the parsing easier to test in isolation.

---

### 2. Over‑generalised `_log_metadata`

`_log_metadata` mixes: frame inspection, package normalization, dependency expansion, and logging behavior controlled via multiple flags. With only two call sites, you can inline two clearer flows and keep only a tiny shared helper.

For example:

```python
# __init__.py

def _normalize_package_list(
    packages: list[str] | str | types.ModuleType | None,
    *,
    accept_modules: bool,
) -> set[str]:
    if packages is None:
        return set()
    if isinstance(packages, str) or (accept_modules and inspect.ismodule(packages)):
        items = [packages]
    else:
        items = list(packages)

    return {
        (p.__name__.split(".")[0] if inspect.ismodule(p) else p)
        for p in items
    }

class CachingLogger:
    ...

    def log_versions(self, packages: list[str] | str | None = None) -> None:
        frame = inspect.currentframe()
        parent = frame.f_back if frame is not None else None
        caller_name = (
            _installed_package_from_globals(parent.f_globals)
            if parent is not None
            else ""
        )

        # resolve caller first
        entries: list[tuple[str, str | None]] = []
        if caller_name:
            try:
                caller_value = get_version_for_package(caller_name)
            except importlib.metadata.PackageNotFoundError:
                caller_name = ""
            else:
                entries.append((caller_name, caller_value))

        user_names = _normalize_package_list(
            packages,
            accept_modules=True,
        )

        deps = (
            get_package_dependencies(caller_name, if_installed=True)
            if caller_name
            else {}
        )
        dep_names: set[str] = {n for names in deps.values() for n in names}

        for pkg in sorted((dep_names | user_names) - {caller_name}):
            entries.append((pkg, get_version_for_package(pkg)))

        for name, value in entries:
            self.log_message(f"{name}=={value}", label=LogLabel.VERSION)

    def log_licenses(self, packages: list[str] | str | None = None) -> None:
        frame = inspect.currentframe()
        parent = frame.f_back if frame is not None else None
        caller_name = (
            _installed_package_from_globals(parent.f_globals)
            if parent is not None
            else ""
        )

        entries: list[tuple[str, str]] = []
        if caller_name:
            entries.append((caller_name, _license_for_package(caller_name)))

        user_names = _normalize_package_list(
            packages,
            accept_modules=False,
        )

        deps = (
            get_package_dependencies(caller_name, if_installed=True)
            if caller_name
            else {}
        )
        dep_names: set[str] = {n for names in deps.values() for n in names}

        for pkg in sorted((dep_names | user_names) - {caller_name}):
            entries.append((pkg, _license_for_package(pkg)))

        for name, value in entries:
            self.log_message(f"{name}=={value}", label=LogLabel.LICENSE)
```

This:

- Keeps the behavior: caller package + installed deps + user packages, versions/licenses, ordered the same way.
- Removes the highly parameterized `_log_metadata` and its behavioral flags.
- Leaves a small `_normalize_package_list` helper for shared list handling, which is easy to follow.

---

### 3. `log_summary` responsibilities

`log_summary` currently parses the log line format, manages reserved keys, and aggregates values. Splitting the parsing into a tiny helper reduces nesting and makes semantics clearer:

```python
def _parse_log_line(line: str) -> tuple[str, str, str, str] | None:
    line = line.rstrip("\n")
    if not line:
        return None
    try:
        ts, hostpid, level, message = line.split("\t", 3)
    except ValueError:
        return None
    try:
        label, value = message.split(" : ", 1)
    except ValueError:
        return None
    return ts, hostpid, label, value

def log_summary(...):
    ...
    with Path(path).open(encoding="utf-8") as fh:
        for raw in fh:
            parsed = _parse_log_line(raw)
            if parsed is None:
                continue
            ts, hostpid, label, value = parsed
            ...
```

The rest of the aggregation logic (reserved labels, `all_labels`, `LogLabel`‑based recognition) stays the same but reads more linearly.

---

These changes keep all new capabilities but move complexity out of `__init__.py`, leverage battle‑tested parsing, and reduce indirection in the logging API.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread src/scitrack/__init__.py
Comment on lines +107 to +115
def _version_via_metadata(name: str) -> str | None:
"""resolve version from installed distribution metadata (PEP 566).

Returns None if the distribution is not installed, the lookup raises
any exception, or the recorded version is empty.
"""
try:
version = importlib.metadata.version(name)
except Exception: # noqa: BLE001

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (bug_risk): Catching bare Exception when resolving distribution versions may hide unexpected errors.

This will also mask unrelated errors (e.g. ValueError from importlib.metadata.version or environmental issues) by turning them into None, which makes real problems harder to diagnose. Prefer catching only the expected exceptions (e.g. PackageNotFoundError, and ValueError/KeyError if they are known failure modes) and allow other errors to propagate instead of treating them as “no version”.

Comment thread README.md

# About `scitrack`

One of the critical challenges in scientific analysis is to track all the elements involved. This includes the arguments provided to a specific application (including default values), input data files referenced by those arguments and output data generated by the application. In addition to this, tracking a minimal set of system specific information.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (typo): Consider fixing the sentence fragment and adding a hyphen in "system-specific".

For example, you could rephrase it to: "In addition to this, it is important to track a minimal set of system-specific information."

Suggested change
One of the critical challenges in scientific analysis is to track all the elements involved. This includes the arguments provided to a specific application (including default values), input data files referenced by those arguments and output data generated by the application. In addition to this, tracking a minimal set of system specific information.
One of the critical challenges in scientific analysis is to track all the elements involved. This includes the arguments provided to a specific application (including default values), input data files referenced by those arguments and output data generated by the application. In addition to this, it is important to track a minimal set of system-specific information.

Comment thread README.md

> **Warning**
>
> Once set, a loggers `.log_file_path` cannot be changed.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (typo): Use the possessive form "logger's" instead of "loggers".

Change this sentence to: Once set, a logger's ".log_file_path" cannot be changed.

Suggested change
> Once set, a loggers `.log_file_path` cannot be changed.
> Once set, a logger's ".log_file_path" cannot be changed.

Comment thread README.md
>
> Once set, a loggers `.log_file_path` cannot be changed.

## Capturing a programs arguments and options

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (typo): Add a possessive apostrophe in "program's".

Consider “Capturing a program's arguments and options.” for the heading.

Suggested change
## Capturing a programs arguments and options
## Capturing a program's arguments and options

Comment thread README.md

## Other useful functions

Two other useful functions are `get_file_hexdigest()` and `get_text_hexdigest()` compute md5sum for files or text. Those can be used to validate the state recorded in the log-file matches results at a later date, e.g. `output_file()` records the path and md5sum of an output file.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (typo): Clarify the sentence by adding a missing "that"/"which" before "compute md5sum".

The wording "...and get_text_hexdigest() compute md5sum" is ungrammatical; consider restructuring the sentence, e.g. "Two other useful functions are get_file_hexdigest() and get_text_hexdigest(), which compute md5sums for files or text."

Suggested change
Two other useful functions are `get_file_hexdigest()` and `get_text_hexdigest()` compute md5sum for files or text. Those can be used to validate the state recorded in the log-file matches results at a later date, e.g. `output_file()` records the path and md5sum of an output file.
Two other useful functions are `get_file_hexdigest()` and `get_text_hexdigest()`, which compute md5sums for files or text. Those can be used to validate the state recorded in the log-file matches results at a later date, e.g. `output_file()` records the path and md5sum of an output file.

Comment thread README.md
$ uv build
```

[^1]: The hexdigest serves as a unique signature of a files contents.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (typo): Use the possessive form "file's contents" in the footnote.

Change this to "a file's contents" for correct grammar.

Comment thread src/scitrack/__init__.py
return vn


_REQ_NAME_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (complexity): Consider extracting and simplifying the new dependency/metadata/log parsing helpers (and optionally using packaging) so __init__.py is smaller and each feature has clearer, more focused code paths.

You’ve added several valuable features, but a few spots can be simplified without losing functionality.

1. Custom PEP 508 / dependency parsing

The stack _split_requirement / _extract_extra / _marker_env / _version_tuple / _eval_atom / _evaluate_marker makes __init__.py heavy and brittle. If you can depend on packaging, you can drop most of this and rely on packaging.requirements.Requirement and its marker evaluation:

# requirements.py (new module)
from __future__ import annotations

from collections.abc import Iterable
from importlib import metadata
from packaging.requirements import Requirement

def get_package_dependencies(
    package: str,
    *,
    if_installed: bool = False,
) -> dict[str, list[str]]:
    try:
        raw = metadata.requires(package)
    except metadata.PackageNotFoundError:
        return {}
    if not raw:
        return {}

    result: dict[str, list[str]] = {}
    installed_cache: dict[str, bool] = {}

    for line in raw:
        req = Requirement(line)
        # Skip environment‑specific deps that don't apply
        if req.marker is not None and not req.marker.evaluate():
            continue

        name = req.name
        if if_installed:
            if name not in installed_cache:
                try:
                    metadata.distribution(name)
                except metadata.PackageNotFoundError:
                    installed_cache[name] = False
                else:
                    installed_cache[name] = True
            if not installed_cache[name]:
                continue

        key = next(iter(req.extras), "core") if req.extras else "core"
        result.setdefault(key, []).append(name)

    return result

Then in __init__.py you can:

# __init__.py
from .requirements import get_package_dependencies

and delete _REQ_NAME_RE, _EXTRA_CLAUSE_RE, _split_requirement, _extract_extra, _marker_env, _version_tuple, _ATOM_RE, _VERSION_VARS, _compare, _eval_atom, _evaluate_marker, and the current get_package_dependencies body. This keeps the behavior but isolates and simplifies PEP 508 handling.

If you must stay stdlib-only, moving the existing logic into requirements.py (as above, but without packaging) still trims __init__.py and makes the parsing easier to test in isolation.


2. Over‑generalised _log_metadata

_log_metadata mixes: frame inspection, package normalization, dependency expansion, and logging behavior controlled via multiple flags. With only two call sites, you can inline two clearer flows and keep only a tiny shared helper.

For example:

# __init__.py

def _normalize_package_list(
    packages: list[str] | str | types.ModuleType | None,
    *,
    accept_modules: bool,
) -> set[str]:
    if packages is None:
        return set()
    if isinstance(packages, str) or (accept_modules and inspect.ismodule(packages)):
        items = [packages]
    else:
        items = list(packages)

    return {
        (p.__name__.split(".")[0] if inspect.ismodule(p) else p)
        for p in items
    }

class CachingLogger:
    ...

    def log_versions(self, packages: list[str] | str | None = None) -> None:
        frame = inspect.currentframe()
        parent = frame.f_back if frame is not None else None
        caller_name = (
            _installed_package_from_globals(parent.f_globals)
            if parent is not None
            else ""
        )

        # resolve caller first
        entries: list[tuple[str, str | None]] = []
        if caller_name:
            try:
                caller_value = get_version_for_package(caller_name)
            except importlib.metadata.PackageNotFoundError:
                caller_name = ""
            else:
                entries.append((caller_name, caller_value))

        user_names = _normalize_package_list(
            packages,
            accept_modules=True,
        )

        deps = (
            get_package_dependencies(caller_name, if_installed=True)
            if caller_name
            else {}
        )
        dep_names: set[str] = {n for names in deps.values() for n in names}

        for pkg in sorted((dep_names | user_names) - {caller_name}):
            entries.append((pkg, get_version_for_package(pkg)))

        for name, value in entries:
            self.log_message(f"{name}=={value}", label=LogLabel.VERSION)

    def log_licenses(self, packages: list[str] | str | None = None) -> None:
        frame = inspect.currentframe()
        parent = frame.f_back if frame is not None else None
        caller_name = (
            _installed_package_from_globals(parent.f_globals)
            if parent is not None
            else ""
        )

        entries: list[tuple[str, str]] = []
        if caller_name:
            entries.append((caller_name, _license_for_package(caller_name)))

        user_names = _normalize_package_list(
            packages,
            accept_modules=False,
        )

        deps = (
            get_package_dependencies(caller_name, if_installed=True)
            if caller_name
            else {}
        )
        dep_names: set[str] = {n for names in deps.values() for n in names}

        for pkg in sorted((dep_names | user_names) - {caller_name}):
            entries.append((pkg, _license_for_package(pkg)))

        for name, value in entries:
            self.log_message(f"{name}=={value}", label=LogLabel.LICENSE)

This:

  • Keeps the behavior: caller package + installed deps + user packages, versions/licenses, ordered the same way.
  • Removes the highly parameterized _log_metadata and its behavioral flags.
  • Leaves a small _normalize_package_list helper for shared list handling, which is easy to follow.

3. log_summary responsibilities

log_summary currently parses the log line format, manages reserved keys, and aggregates values. Splitting the parsing into a tiny helper reduces nesting and makes semantics clearer:

def _parse_log_line(line: str) -> tuple[str, str, str, str] | None:
    line = line.rstrip("\n")
    if not line:
        return None
    try:
        ts, hostpid, level, message = line.split("\t", 3)
    except ValueError:
        return None
    try:
        label, value = message.split(" : ", 1)
    except ValueError:
        return None
    return ts, hostpid, label, value

def log_summary(...):
    ...
    with Path(path).open(encoding="utf-8") as fh:
        for raw in fh:
            parsed = _parse_log_line(raw)
            if parsed is None:
                continue
            ts, hostpid, label, value = parsed
            ...

The rest of the aggregation logic (reserved labels, all_labels, LogLabel‑based recognition) stays the same but reads more linearly.


These changes keep all new capabilities but move complexity out of __init__.py, leverage battle‑tested parsing, and reduce indirection in the logging API.

@github-actions

Copy link
Copy Markdown

Coverage Report for CI Build 28068855788

Coverage decreased (-91.5%) to 8.525%

Details

  • Coverage decreased (-91.5%) from the base build.
  • Patch coverage: 11 uncovered changes across 1 file (221 of 232 lines covered, 95.26%).
  • No coverage regressions found.

Uncovered Changes

File Changed Covered %
src/scitrack/init.py 232 221 95.26%

Coverage Regressions

No coverage regressions found.


Coverage Stats

Coverage Status
Relevant Lines: 5619
Covered Lines: 479
Line Coverage: 8.52%
Coverage Strength: 0.4 hits per line

💛 - Coveralls

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant