Enhancement patches by GavinHuttley · Pull Request #57 · HuttleyLab/scitrack

GavinHuttley · 2026-06-24T01:29:17Z

Summary by Sourcery

Introduce richer package metadata handling, dependency and license introspection, and log summarisation utilities, while updating logging labels and documentation to a new README format.

New Features:

Add a LogLabel enum for first-class, typed log labels used throughout the logging API.
Add dependency inspection helpers that resolve an installed package's declared requirements with environment marker evaluation and optional installation filtering.
Add license inspection helpers that resolve package licenses from metadata and expose them via get_package_licenses and CachingLogger.log_licenses.
Add a log_summary utility to parse scitrack log files into a structured, label-keyed summary API.

Enhancements:

Extend get_package_name to support no-argument calls that infer the caller's installed distribution name via frame inspection.
Improve get_version_for_package to prefer importlib.metadata versions, distinguish missing versus uninstalled packages, and support module objects more robustly.
Enhance CachingLogger to accept LogLabel values, to log versions and licenses for the caller package plus its installed dependencies and user-specified packages, and to centralize shared metadata logging logic.
Adjust built-in logging of system details, Python version, user, and command string to use the new LogLabel values consistently.

Build:

Configure coverage.py to focus on the scitrack package and adjust Ruff configuration to ignore COM812.

Documentation:

Replace the reStructuredText README with an expanded Markdown README that documents installation, CachingLogger usage, new version/license logging behavior, and log_summary.
Update packaging metadata and sdist configuration to point to the new README.md.

Tests:

Greatly expand tests around get_package_name, version resolution, dependency parsing and filtering, version and license logging behaviors, and the new log_summary and license utilities to ensure robust, edge-case-aware behavior.

[NEW] returns log file entries grouped by label. Useful for client code that wants to generate summaries.

… is given

Returns a dict keyed by install option ("core" plus one entry per extra), with values being base package names. Non-installed packages yield {}; non-extra env markers are evaluated against the current interpreter so non-applicable deps are dropped.

When True, returns only installed dependencies; each distinct name is probed at most once per call via a local cache.

…alled Previously raised a generic ValueError, which made it impossible for callers of log_versions() to distinguish "package not installed" from other ValueError conditions. The new behaviour propagates through log_versions() unchanged.

[CHANGED] CachingLogger.log_versions() now also emits a version line for each installed dependency of the caller's package, as reported by get_package_dependencies(if_installed=True).

[NEW] for license tracking. The log_licenses() method mirrors log_versions(). License resolution prefers PEP 639 License-Expression over the legacy License field. Missing licenses are recorded as "UNKNOWN".

…rack Capture the caller frame at the public-method boundary where f_back is unambiguously the call site.

The leading per-line timestamp and host:pid prefix were previously discarded. log_summary() now returns three reserved keys derived from the log itself: "datetime" (first line's timestamp, when logging started), "hostname" (host from the line prefix) and "os" (from the system_details line, when present).

sourcery-ai · 2026-06-24T01:29:23Z

Reviewer's Guide

Adds richer package/version/license introspection utilities, a LogLabel enum-based labeling scheme, log summary parsing, and extensive tests while modernizing packaging metadata and tooling configuration.

Sequence diagram for enhanced log_versions and log_licenses metadata logging

sequenceDiagram
    actor User
    participant Logger as CachingLogger
    participant MetaHelper as _log_metadata
    participant Deps as get_package_dependencies
    participant Ver as get_version_for_package
    participant Lic as _license_for_package

    User->>Logger: log_versions(packages)
    activate Logger
    Logger->>Logger: _installed_package_from_globals(parent.f_globals)
    Logger->>MetaHelper: _log_metadata(packages, get_version_for_package, LogLabel.VERSION, caller_name)
    activate MetaHelper
    alt caller_name is installed
        MetaHelper->>Ver: get_version_for_package(caller_name)
        Ver-->>MetaHelper: version
        MetaHelper->>Deps: get_package_dependencies(caller_name, if_installed=True)
        Deps-->>MetaHelper: {group: [dep_names]}
    end
    MetaHelper->>Ver: get_version_for_package(dep_or_user_name)
    Ver-->>MetaHelper: version
    MetaHelper->>Logger: log_message("name==version", label=LogLabel.VERSION)
    Logger-->>User:
    deactivate MetaHelper
    deactivate Logger

    User->>Logger: log_licenses(packages)
    activate Logger
    Logger->>Logger: _installed_package_from_globals(parent.f_globals)
    Logger->>MetaHelper: _log_metadata(packages, _license_for_package, LogLabel.LICENSE, caller_name)
    activate MetaHelper
    alt caller_name is installed
        MetaHelper->>Lic: _license_for_package(caller_name)
        Lic-->>MetaHelper: license
        MetaHelper->>Deps: get_package_dependencies(caller_name, if_installed=True)
        Deps-->>MetaHelper: {group: [dep_names]}
    end
    MetaHelper->>Lic: _license_for_package(dep_or_user_name)
    Lic-->>MetaHelper: license
    MetaHelper->>Logger: log_message("name==license", label=LogLabel.LICENSE)
    Logger-->>User:
    deactivate MetaHelper
    deactivate Logger

File-Level Changes

Change	Details	Files
Introduce LogLabel enum and migrate logging API to accept both enum members and strings for labels.	Add LogLabel enum encapsulating all built-in labels and implement str for seamless formatting. Update CachingLogger methods (log_message, input_file, output_file, text_data, log_args) to accept LogLabel or string labels and to emit labels via LogLabel constants. Adjust set_logger to use LogLabel-based labels for system details, Python version, user, and command string.	`src/scitrack/__init__.py` `tests/test_logging.py`
Enhance package/version/license discovery and dependency handling, and integrate this into version and license logging.	Extend get_package_name to support a no-argument mode that infers the caller’s installed package using frame globals and a helper _installed_package_from_globals. Add _version_via_metadata and change get_version_for_package to prefer importlib.metadata-based resolution, raising PackageNotFoundError for unknown string packages and falling back to module attributes otherwise. Implement dependency parsing helpers (_split_requirement, _extract_extra, marker evaluation helpers) and get_package_dependencies with support for extras, environment markers, and optional installation filtering with memoization. Add license helpers _license_for_package and get_package_licenses, including precedence rules for License-Expression vs License and eager PackageNotFoundError behavior. Refactor CachingLogger.log_versions to log caller package, its installed dependencies, and optional user-specified packages via a shared _log_metadata helper, and add log_licenses with analogous behavior for licenses.	`src/scitrack/__init__.py` `tests/test_logging.py`
Add log_summary utility to parse scitrack log files into structured label-based summaries with derived metadata and reserved label semantics.	Introduce log_summary(path, labels=None, all_labels=False) that groups log entries by label, deriving datetime, hostname, and os keys and enforcing reserved label constraints. Recognise default labels from LogLabel plus md5sum variants and optionally extend the recognised set via labels or capture all labels with all_labels=True. Implement robust parsing that skips malformed lines, ignores unknown labels by default, preserves value order, and handles values containing colons. Add extensive tests covering default behaviour, reserved labels, extra labels, all-labels mode, md5sum labels, and interaction with LogLabel enum members.	`src/scitrack/__init__.py` `tests/test_logging.py`
Expand and tighten tests around version logging, dependency handling, license logging, and LogLabel behavior.	Add comprehensive tests for get_package_name no-arg behavior across installed packages, subpackages, non-installed packages, main, and missing frames. Update and extend tests for get_version_for_package to expect PackageNotFoundError on bad names and ValueError on bad types, and add separate tests for uninstalled names. Extend CachingLogger.log_versions tests to cover dependency flattening, de-duplication, ordering, external caller resolution, and behavior with uninstalled deps. Add an extensive new test suite for get_package_dependencies, get_package_licenses, log_licenses, LogLabel semantics, and the new log_summary function, including various marker and extras combinations and error cases. Ensure tests confirm memoization of installation checks and that eager-raising behavior avoids partial writes for both versions and licenses.	`tests/test_logging.py`
Modernize project metadata, documentation, linting, and coverage configuration.	Replace README.rst with a new, more detailed README.md and switch pyproject readme and sdist include configuration to use README.md. Add coverage.run and coverage.report configuration in pyproject.toml to scope coverage collection to the scitrack package and avoid stdlib leakage. Update ruff configuration to ignore COM812 in addition to existing ignores.	`pyproject.toml` `ruff.toml` `README.md` `README.rst`

Tips and commands

Interacting with Sourcery

Trigger a new review: Comment @sourcery-ai review on the pull request.
Continue discussions: Reply directly to Sourcery's review comments.
Generate a GitHub issue from a review comment: Ask Sourcery to create an
issue from a review comment by replying to it. You can also reply to a
review comment with @sourcery-ai issue to create an issue from it.
Generate a pull request title: Write @sourcery-ai anywhere in the pull
request title to generate a title at any time. You can also comment
@sourcery-ai title on the pull request to (re-)generate the title at any time.
Generate a pull request summary: Write @sourcery-ai summary anywhere in
the pull request body to generate a PR summary at any time exactly where you
want it. You can also comment @sourcery-ai summary on the pull request to
(re-)generate the summary at any time.
Generate reviewer's guide: Comment @sourcery-ai guide on the pull
request to (re-)generate the reviewer's guide at any time.
Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
pull request to resolve all Sourcery comments. Useful if you've already
addressed all the comments and don't want to see them anymore.
Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
request to dismiss all existing Sourcery reviews. Especially useful if you
want to start fresh with a new review - don't forget to comment
@sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

Enable or disable review features such as the Sourcery-generated pull request
summary, the reviewer's guide, and others.
Change the review language.
Add, remove or edit custom review instructions.
Adjust other review settings.

Getting Help

Contact our support team for questions or feedback.
Visit our documentation for detailed guides and information.
Keep in touch with the Sourcery team by following us on X/Twitter, LinkedIn or GitHub.

codacy-production · 2026-06-24T01:30:15Z

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

View in Codacy

🟢 Metrics 176 complexity · 32 duplication

Metric Results

Complexity 176

Duplication 32

View in Codacy

_{NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer}
_{TIP This summary will be updated as you push new changes.}

sourcery-ai

Hey - I've found 7 issues, and left some high level feedback:

In get_package_name, log_versions, and log_licenses you now retain references to frames returned by inspect.currentframe() without deleting them, which can create reference cycles; consider explicitly del-ing the frame (and parent) once you’ve extracted the data, as the previous implementation did.
The _version_via_metadata helper currently silences all exceptions from importlib.metadata.version by catching Exception; it would be safer to catch PackageNotFoundError (and perhaps a small set of expected errors) so that genuine metadata problems still surface.
In _log_metadata, caller_value can be None if _version_via_metadata/_license_for_package returns None, leading to name==None entries; if this is not desired, you might want to either skip those entries or normalise the value into a clearer sentinel (or ensure the helper never returns None).

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- In `get_package_name`, `log_versions`, and `log_licenses` you now retain references to frames returned by `inspect.currentframe()` without deleting them, which can create reference cycles; consider explicitly `del`-ing the frame (and parent) once you’ve extracted the data, as the previous implementation did.
- The `_version_via_metadata` helper currently silences all exceptions from `importlib.metadata.version` by catching `Exception`; it would be safer to catch `PackageNotFoundError` (and perhaps a small set of expected errors) so that genuine metadata problems still surface.
- In `_log_metadata`, `caller_value` can be `None` if `_version_via_metadata`/`_license_for_package` returns `None`, leading to `name==None` entries; if this is not desired, you might want to either skip those entries or normalise the value into a clearer sentinel (or ensure the helper never returns `None`).

## Individual Comments

### Comment 1
<location path="src/scitrack/__init__.py" line_range="107-115" />
<code_context>
+    return _installed_package_from_globals(parent.f_globals)
+
+
+def _version_via_metadata(name: str) -> str | None:
+    """resolve version from installed distribution metadata (PEP 566).
+
+    Returns None if the distribution is not installed, the lookup raises
+    any exception, or the recorded version is empty.
+    """
+    try:
+        version = importlib.metadata.version(name)
+    except Exception:  # noqa: BLE001
+        return None
+    return version or None
</code_context>
<issue_to_address>
**issue (bug_risk):** Catching bare `Exception` when resolving distribution versions may hide unexpected errors.

This will also mask unrelated errors (e.g. `ValueError` from `importlib.metadata.version` or environmental issues) by turning them into `None`, which makes real problems harder to diagnose. Prefer catching only the expected exceptions (e.g. `PackageNotFoundError`, and `ValueError`/`KeyError` if they are known failure modes) and allow other errors to propagate instead of treating them as “no version”.
</issue_to_address>

### Comment 2
<location path="README.md" line_range="8" />
<code_context>
+
+# About `scitrack`
+
+One of the critical challenges in scientific analysis is to track all the elements involved. This includes the arguments provided to a specific application (including default values), input data files referenced by those arguments and output data generated by the application. In addition to this, tracking a minimal set of system specific information.
+
+`scitrack` is a simple package aimed at researchers writing scripts, or more substantial scientific software, to support the tracking of scientific computation. The package provides elementary functionality to support logging. The primary capabilities concern generating checksums on input and output files and facilitating logging of the computational environment.
</code_context>
<issue_to_address>
**suggestion (typo):** Consider fixing the sentence fragment and adding a hyphen in "system-specific".

For example, you could rephrase it to: "In addition to this, it is important to track a minimal set of system-specific information."

```suggestion
One of the critical challenges in scientific analysis is to track all the elements involved. This includes the arguments provided to a specific application (including default values), input data files referenced by those arguments and output data generated by the application. In addition to this, it is important to track a minimal set of system-specific information.
```
</issue_to_address>

### Comment 3
<location path="README.md" line_range="40" />
<code_context>
+
+> **Warning**
+>
+> Once set, a loggers `.log_file_path` cannot be changed.
+
+## Capturing a programs arguments and options
</code_context>
<issue_to_address>
**suggestion (typo):** Use the possessive form "logger's" instead of "loggers".

Change this sentence to: `Once set, a logger's ".log_file_path" cannot be changed.`

```suggestion
> Once set, a logger's ".log_file_path" cannot be changed.
```
</issue_to_address>

### Comment 4
<location path="README.md" line_range="42" />
<code_context>
+>
+> Once set, a loggers `.log_file_path` cannot be changed.
+
+## Capturing a programs arguments and options
+
+`scitrack` will write the contents of `sys.argv` to the log file, prefixed by `command_string`. However, this only captures arguments specified on the command line. Tracking the value of optional arguments not specified, which may have default values, is critical to tracking the full command set. Doing this is now easy with the simple statement `LOGGER.log_args()`. The logger can also record the versions of named dependencies.
</code_context>
<issue_to_address>
**suggestion (typo):** Add a possessive apostrophe in "program's".

Consider “Capturing a program's arguments and options.” for the heading.

```suggestion
## Capturing a program's arguments and options
```
</issue_to_address>

### Comment 5
<location path="README.md" line_range="145" />
<code_context>
+
+## Other useful functions
+
+Two other useful functions are `get_file_hexdigest()` and `get_text_hexdigest()` compute md5sum for files or text. Those can be used to validate the state recorded in the log-file matches results at a later date, e.g. `output_file()` records the path and md5sum of an output file.
+
+`get_package_licenses(packages)` returns a `{name: license}` mapping for a list of installed packages, raising `PackageNotFoundError` eagerly if any name is not installed.
</code_context>
<issue_to_address>
**suggestion (typo):** Clarify the sentence by adding a missing "that"/"which" before "compute md5sum".

The wording "...and `get_text_hexdigest()` compute md5sum" is ungrammatical; consider restructuring the sentence, e.g. "Two other useful functions are `get_file_hexdigest()` and `get_text_hexdigest()`, which compute md5sums for files or text."

```suggestion
Two other useful functions are `get_file_hexdigest()` and `get_text_hexdigest()`, which compute md5sums for files or text. Those can be used to validate the state recorded in the log-file matches results at a later date, e.g. `output_file()` records the path and md5sum of an output file.
```
</issue_to_address>

### Comment 6
<location path="README.md" line_range="183" />
<code_context>
+$ uv build
+```
+
+[^1]: The hexdigest serves as a unique signature of a files contents.
</code_context>
<issue_to_address>
**suggestion (typo):** Use the possessive form "file's contents" in the footnote.

Change this to "a file's contents" for correct grammar.
</issue_to_address>

### Comment 7
<location path="src/scitrack/__init__.py" line_range="157" />
<code_context>
     return vn


+_REQ_NAME_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")
+_EXTRA_CLAUSE_RE = re.compile(r"""^\s*extra\s*==\s*['\"]([^'\"]+)['\"]\s*$""")
+
</code_context>
<issue_to_address>
**issue (complexity):** Consider extracting and simplifying the new dependency/metadata/log parsing helpers (and optionally using `packaging`) so `__init__.py` is smaller and each feature has clearer, more focused code paths.

You’ve added several valuable features, but a few spots can be simplified without losing functionality.

### 1. Custom PEP 508 / dependency parsing

The stack `_split_requirement` / `_extract_extra` / `_marker_env` / `_version_tuple` / `_eval_atom` / `_evaluate_marker` makes `__init__.py` heavy and brittle. If you can depend on `packaging`, you can drop most of this and rely on `packaging.requirements.Requirement` and its marker evaluation:

```python
# requirements.py (new module)
from __future__ import annotations

from collections.abc import Iterable
from importlib import metadata
from packaging.requirements import Requirement

def get_package_dependencies(
    package: str,
    *,
    if_installed: bool = False,
) -> dict[str, list[str]]:
    try:
        raw = metadata.requires(package)
    except metadata.PackageNotFoundError:
        return {}
    if not raw:
        return {}

    result: dict[str, list[str]] = {}
    installed_cache: dict[str, bool] = {}

    for line in raw:
        req = Requirement(line)
        # Skip environment‑specific deps that don't apply
        if req.marker is not None and not req.marker.evaluate():
            continue

        name = req.name
        if if_installed:
            if name not in installed_cache:
                try:
                    metadata.distribution(name)
                except metadata.PackageNotFoundError:
                    installed_cache[name] = False
                else:
                    installed_cache[name] = True
            if not installed_cache[name]:
                continue

        key = next(iter(req.extras), "core") if req.extras else "core"
        result.setdefault(key, []).append(name)

    return result
```

Then in `__init__.py` you can:

```python
# __init__.py
from .requirements import get_package_dependencies
```

and delete `_REQ_NAME_RE`, `_EXTRA_CLAUSE_RE`, `_split_requirement`, `_extract_extra`, `_marker_env`, `_version_tuple`, `_ATOM_RE`, `_VERSION_VARS`, `_compare`, `_eval_atom`, `_evaluate_marker`, and the current `get_package_dependencies` body. This keeps the behavior but isolates and simplifies PEP 508 handling.

If you must stay stdlib-only, moving the existing logic into `requirements.py` (as above, but without `packaging`) still trims `__init__.py` and makes the parsing easier to test in isolation.

---

### 2. Over‑generalised `_log_metadata`

`_log_metadata` mixes: frame inspection, package normalization, dependency expansion, and logging behavior controlled via multiple flags. With only two call sites, you can inline two clearer flows and keep only a tiny shared helper.

For example:

```python
# __init__.py

def _normalize_package_list(
    packages: list[str] | str | types.ModuleType | None,
    *,
    accept_modules: bool,
) -> set[str]:
    if packages is None:
        return set()
    if isinstance(packages, str) or (accept_modules and inspect.ismodule(packages)):
        items = [packages]
    else:
        items = list(packages)

    return {
        (p.__name__.split(".")[0] if inspect.ismodule(p) else p)
        for p in items
    }

class CachingLogger:
    ...

    def log_versions(self, packages: list[str] | str | None = None) -> None:
        frame = inspect.currentframe()
        parent = frame.f_back if frame is not None else None
        caller_name = (
            _installed_package_from_globals(parent.f_globals)
            if parent is not None
            else ""
        )

        # resolve caller first
        entries: list[tuple[str, str | None]] = []
        if caller_name:
            try:
                caller_value = get_version_for_package(caller_name)
            except importlib.metadata.PackageNotFoundError:
                caller_name = ""
            else:
                entries.append((caller_name, caller_value))

        user_names = _normalize_package_list(
            packages,
            accept_modules=True,
        )

        deps = (
            get_package_dependencies(caller_name, if_installed=True)
            if caller_name
            else {}
        )
        dep_names: set[str] = {n for names in deps.values() for n in names}

        for pkg in sorted((dep_names | user_names) - {caller_name}):
            entries.append((pkg, get_version_for_package(pkg)))

        for name, value in entries:
            self.log_message(f"{name}=={value}", label=LogLabel.VERSION)

    def log_licenses(self, packages: list[str] | str | None = None) -> None:
        frame = inspect.currentframe()
        parent = frame.f_back if frame is not None else None
        caller_name = (
            _installed_package_from_globals(parent.f_globals)
            if parent is not None
            else ""
        )

        entries: list[tuple[str, str]] = []
        if caller_name:
            entries.append((caller_name, _license_for_package(caller_name)))

        user_names = _normalize_package_list(
            packages,
            accept_modules=False,
        )

        deps = (
            get_package_dependencies(caller_name, if_installed=True)
            if caller_name
            else {}
        )
        dep_names: set[str] = {n for names in deps.values() for n in names}

        for pkg in sorted((dep_names | user_names) - {caller_name}):
            entries.append((pkg, _license_for_package(pkg)))

        for name, value in entries:
            self.log_message(f"{name}=={value}", label=LogLabel.LICENSE)
```

This:

- Keeps the behavior: caller package + installed deps + user packages, versions/licenses, ordered the same way.
- Removes the highly parameterized `_log_metadata` and its behavioral flags.
- Leaves a small `_normalize_package_list` helper for shared list handling, which is easy to follow.

---

### 3. `log_summary` responsibilities

`log_summary` currently parses the log line format, manages reserved keys, and aggregates values. Splitting the parsing into a tiny helper reduces nesting and makes semantics clearer:

```python
def _parse_log_line(line: str) -> tuple[str, str, str, str] | None:
    line = line.rstrip("\n")
    if not line:
        return None
    try:
        ts, hostpid, level, message = line.split("\t", 3)
    except ValueError:
        return None
    try:
        label, value = message.split(" : ", 1)
    except ValueError:
        return None
    return ts, hostpid, label, value

def log_summary(...):
    ...
    with Path(path).open(encoding="utf-8") as fh:
        for raw in fh:
            parsed = _parse_log_line(raw)
            if parsed is None:
                continue
            ts, hostpid, label, value = parsed
            ...
```

The rest of the aggregation logic (reserved labels, `all_labels`, `LogLabel`‑based recognition) stays the same but reads more linearly.

---

These changes keep all new capabilities but move complexity out of `__init__.py`, leverage battle‑tested parsing, and reduce indirection in the logging API.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

sourcery-ai · 2026-06-24T01:31:22Z

+def _version_via_metadata(name: str) -> str | None:
+    """resolve version from installed distribution metadata (PEP 566).
+
+    Returns None if the distribution is not installed, the lookup raises
+    any exception, or the recorded version is empty.
+    """
+    try:
+        version = importlib.metadata.version(name)
+    except Exception:  # noqa: BLE001


issue (bug_risk): Catching bare Exception when resolving distribution versions may hide unexpected errors.

This will also mask unrelated errors (e.g. ValueError from importlib.metadata.version or environmental issues) by turning them into None, which makes real problems harder to diagnose. Prefer catching only the expected exceptions (e.g. PackageNotFoundError, and ValueError/KeyError if they are known failure modes) and allow other errors to propagate instead of treating them as “no version”.

sourcery-ai · 2026-06-24T01:31:22Z

+
+# About `scitrack`
+
+One of the critical challenges in scientific analysis is to track all the elements involved. This includes the arguments provided to a specific application (including default values), input data files referenced by those arguments and output data generated by the application. In addition to this, tracking a minimal set of system specific information.


suggestion (typo): Consider fixing the sentence fragment and adding a hyphen in "system-specific".

For example, you could rephrase it to: "In addition to this, it is important to track a minimal set of system-specific information."

Suggested change

One of the critical challenges in scientific analysis is to track all the elements involved. This includes the arguments provided to a specific application (including default values), input data files referenced by those arguments and output data generated by the application. In addition to this, tracking a minimal set of system specific information.

One of the critical challenges in scientific analysis is to track all the elements involved. This includes the arguments provided to a specific application (including default values), input data files referenced by those arguments and output data generated by the application. In addition to this, it is important to track a minimal set of system-specific information.

sourcery-ai · 2026-06-24T01:31:22Z

+
+> **Warning**
+>
+> Once set, a loggers `.log_file_path` cannot be changed.


suggestion (typo): Use the possessive form "logger's" instead of "loggers".

Change this sentence to: Once set, a logger's ".log_file_path" cannot be changed.

Suggested change

> Once set, a loggers `.log_file_path` cannot be changed.

> Once set, a logger's ".log_file_path" cannot be changed.

sourcery-ai · 2026-06-24T01:31:22Z

+>
+> Once set, a loggers `.log_file_path` cannot be changed.
+
+## Capturing a programs arguments and options


suggestion (typo): Add a possessive apostrophe in "program's".

Consider “Capturing a program's arguments and options.” for the heading.

Suggested change

## Capturing a programs arguments and options

## Capturing a program's arguments and options

sourcery-ai · 2026-06-24T01:31:22Z

+
+## Other useful functions
+
+Two other useful functions are `get_file_hexdigest()` and `get_text_hexdigest()` compute md5sum for files or text. Those can be used to validate the state recorded in the log-file matches results at a later date, e.g. `output_file()` records the path and md5sum of an output file.


suggestion (typo): Clarify the sentence by adding a missing "that"/"which" before "compute md5sum".

The wording "...and get_text_hexdigest() compute md5sum" is ungrammatical; consider restructuring the sentence, e.g. "Two other useful functions are get_file_hexdigest() and get_text_hexdigest(), which compute md5sums for files or text."

Suggested change

Two other useful functions are `get_file_hexdigest()` and `get_text_hexdigest()` compute md5sum for files or text. Those can be used to validate the state recorded in the log-file matches results at a later date, e.g. `output_file()` records the path and md5sum of an output file.

Two other useful functions are `get_file_hexdigest()` and `get_text_hexdigest()`, which compute md5sums for files or text. Those can be used to validate the state recorded in the log-file matches results at a later date, e.g. `output_file()` records the path and md5sum of an output file.

sourcery-ai · 2026-06-24T01:31:22Z

+$ uv build
+```
+
+[^1]: The hexdigest serves as a unique signature of a files contents.


suggestion (typo): Use the possessive form "file's contents" in the footnote.

Change this to "a file's contents" for correct grammar.

sourcery-ai · 2026-06-24T01:31:22Z

    return vn


+_REQ_NAME_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


issue (complexity): Consider extracting and simplifying the new dependency/metadata/log parsing helpers (and optionally using packaging) so __init__.py is smaller and each feature has clearer, more focused code paths.

You’ve added several valuable features, but a few spots can be simplified without losing functionality.

1. Custom PEP 508 / dependency parsing

The stack _split_requirement / _extract_extra / _marker_env / _version_tuple / _eval_atom / _evaluate_marker makes __init__.py heavy and brittle. If you can depend on packaging, you can drop most of this and rely on packaging.requirements.Requirement and its marker evaluation:

# requirements.py (new module) from __future__ import annotations from collections.abc import Iterable from importlib import metadata from packaging.requirements import Requirement def get_package_dependencies( package: str, *, if_installed: bool = False, ) -> dict[str, list[str]]: try: raw = metadata.requires(package) except metadata.PackageNotFoundError: return {} if not raw: return {} result: dict[str, list[str]] = {} installed_cache: dict[str, bool] = {} for line in raw: req = Requirement(line) # Skip environment‑specific deps that don't apply if req.marker is not None and not req.marker.evaluate(): continue name = req.name if if_installed: if name not in installed_cache: try: metadata.distribution(name) except metadata.PackageNotFoundError: installed_cache[name] = False else: installed_cache[name] = True if not installed_cache[name]: continue key = next(iter(req.extras), "core") if req.extras else "core" result.setdefault(key, []).append(name) return result

Then in __init__.py you can:

# __init__.py from .requirements import get_package_dependencies

and delete _REQ_NAME_RE, _EXTRA_CLAUSE_RE, _split_requirement, _extract_extra, _marker_env, _version_tuple, _ATOM_RE, _VERSION_VARS, _compare, _eval_atom, _evaluate_marker, and the current get_package_dependencies body. This keeps the behavior but isolates and simplifies PEP 508 handling.

If you must stay stdlib-only, moving the existing logic into requirements.py (as above, but without packaging) still trims __init__.py and makes the parsing easier to test in isolation.

2. Over‑generalised _log_metadata

_log_metadata mixes: frame inspection, package normalization, dependency expansion, and logging behavior controlled via multiple flags. With only two call sites, you can inline two clearer flows and keep only a tiny shared helper.

For example:

# __init__.py def _normalize_package_list( packages: list[str] | str | types.ModuleType | None, *, accept_modules: bool, ) -> set[str]: if packages is None: return set() if isinstance(packages, str) or (accept_modules and inspect.ismodule(packages)): items = [packages] else: items = list(packages) return { (p.__name__.split(".")[0] if inspect.ismodule(p) else p) for p in items } class CachingLogger: ... def log_versions(self, packages: list[str] | str | None = None) -> None: frame = inspect.currentframe() parent = frame.f_back if frame is not None else None caller_name = ( _installed_package_from_globals(parent.f_globals) if parent is not None else "" ) # resolve caller first entries: list[tuple[str, str | None]] = [] if caller_name: try: caller_value = get_version_for_package(caller_name) except importlib.metadata.PackageNotFoundError: caller_name = "" else: entries.append((caller_name, caller_value)) user_names = _normalize_package_list( packages, accept_modules=True, ) deps = ( get_package_dependencies(caller_name, if_installed=True) if caller_name else {} ) dep_names: set[str] = {n for names in deps.values() for n in names} for pkg in sorted((dep_names | user_names) - {caller_name}): entries.append((pkg, get_version_for_package(pkg))) for name, value in entries: self.log_message(f"{name}=={value}", label=LogLabel.VERSION) def log_licenses(self, packages: list[str] | str | None = None) -> None: frame = inspect.currentframe() parent = frame.f_back if frame is not None else None caller_name = ( _installed_package_from_globals(parent.f_globals) if parent is not None else "" ) entries: list[tuple[str, str]] = [] if caller_name: entries.append((caller_name, _license_for_package(caller_name))) user_names = _normalize_package_list( packages, accept_modules=False, ) deps = ( get_package_dependencies(caller_name, if_installed=True) if caller_name else {} ) dep_names: set[str] = {n for names in deps.values() for n in names} for pkg in sorted((dep_names | user_names) - {caller_name}): entries.append((pkg, _license_for_package(pkg))) for name, value in entries: self.log_message(f"{name}=={value}", label=LogLabel.LICENSE)

This:

Keeps the behavior: caller package + installed deps + user packages, versions/licenses, ordered the same way.

Removes the highly parameterized _log_metadata and its behavioral flags.

Leaves a small _normalize_package_list helper for shared list handling, which is easy to follow.

3. log_summary responsibilities

log_summary currently parses the log line format, manages reserved keys, and aggregates values. Splitting the parsing into a tiny helper reduces nesting and makes semantics clearer:

def _parse_log_line(line: str) -> tuple[str, str, str, str] | None: line = line.rstrip("\n") if not line: return None try: ts, hostpid, level, message = line.split("\t", 3) except ValueError: return None try: label, value = message.split(" : ", 1) except ValueError: return None return ts, hostpid, label, value def log_summary(...): ... with Path(path).open(encoding="utf-8") as fh: for raw in fh: parsed = _parse_log_line(raw) if parsed is None: continue ts, hostpid, label, value = parsed ...

The rest of the aggregation logic (reserved labels, all_labels, LogLabel‑based recognition) stays the same but reads more linearly.

These changes keep all new capabilities but move complexity out of __init__.py, leverage battle‑tested parsing, and reduce indirection in the logging API.

github-actions · 2026-06-24T01:31:40Z

Coverage Report for CI Build 28068855788

Coverage decreased (-91.5%) to 8.525%

Details

Coverage decreased (-91.5%) from the base build.
Patch coverage: 11 uncovered changes across 1 file (221 of 232 lines covered, 95.26%).
No coverage regressions found.

Uncovered Changes

File	Changed	Covered	%
src/scitrack/init.py	232	221	95.26%

Coverage Regressions

No coverage regressions found.

Coverage Stats


Relevant Lines:	5619
Covered Lines:	479
Line Coverage:	8.52%
Coverage Strength:	0.4 hits per line

💛 - Coveralls

GavinHuttley added 17 commits June 9, 2026 08:59

DEV: ignore ruff pattern

b22b149

ENH: add usage of package metadata for extracting version info

39568b2

ENH: use Enum for logging label consistency

b0d3859

ENH: add a log_summary function

240510b

[NEW] returns log file entries grouped by label. Useful for client code that wants to generate summaries.

DEV: converted README to markdown

46947b6

DOC: changed README to markdown

2716fe0

DOC: describe the new log_summary function

2441c73

ENH: get_package_name() infers caller's installed package when no obj…

2eaf55b

… is given

ENH: added get_package_dependencies()

a0ead94

Returns a dict keyed by install option ("core" plus one entry per extra), with values being base package names. Non-installed packages yield {}; non-extra env markers are evaluated against the current interpreter so non-applicable deps are dropped.

ENH: get_package_dependencies() gains if_installed kwarg

e76016a

When True, returns only installed dependencies; each distinct name is probed at most once per call via a local cache.

ENH: log_versions tracks the caller's installed dependencies

74ba3c8

[CHANGED] CachingLogger.log_versions() now also emits a version line for each installed dependency of the caller's package, as reported by get_package_dependencies(if_installed=True).

ENH: new CachingLogger.log_licenses() and get_package_licenses()

108f831

[NEW] for license tracking. The log_licenses() method mirrors log_versions(). License resolution prefers PEP 639 License-Expression over the legacy License field. Missing licenses are recorded as "UNKNOWN".

TST: test showing log_versions not correctly resolved

8ef48b7

FIX: log_versions/log_licenses resolve the caller's package, not scit…

204c66d

…rack Capture the caller frame at the public-method boundary where f_back is unambiguously the call site.

MAINT: fix coverage including stdlib packages

c0be381

GavinHuttley merged commit 54fc447 into HuttleyLab:develop Jun 24, 2026
9 checks passed

sourcery-ai Bot reviewed Jun 24, 2026

View reviewed changes


		# About `scitrack`

		One of the critical challenges in scientific analysis is to track all the elements involved. This includes the arguments provided to a specific application (including default values), input data files referenced by those arguments and output data generated by the application. In addition to this, tracking a minimal set of system specific information.

	> Once set, a loggers `.log_file_path` cannot be changed.
	> Once set, a logger's ".log_file_path" cannot be changed.

	## Capturing a programs arguments and options
	## Capturing a program's arguments and options


		## Other useful functions

		Two other useful functions are `get_file_hexdigest()` and `get_text_hexdigest()` compute md5sum for files or text. Those can be used to validate the state recorded in the log-file matches results at a later date, e.g. `output_file()` records the path and md5sum of an output file.

		return vn


		_REQ_NAME_RE = re.compile(r"^\s([A-Za-z0-9][A-Za-z0-9._-])")

Uh oh!

Conversation

GavinHuttley commented Jun 24, 2026 • edited by sourcery-ai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by Sourcery

Uh oh!

sourcery-ai Bot commented Jun 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviewer's Guide

Sequence diagram for enhanced log_versions and log_licenses metadata logging

File-Level Changes

Interacting with Sourcery

Customizing Your Experience

Getting Help

Uh oh!

codacy-production Bot commented Jun 24, 2026

Up to standards ✅

Uh oh!

Uh oh!

sourcery-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

sourcery-ai Bot Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

sourcery-ai Bot Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

sourcery-ai Bot Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

sourcery-ai Bot Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

sourcery-ai Bot Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

sourcery-ai Bot Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

sourcery-ai Bot Jun 24, 2026

Choose a reason for hiding this comment

1. Custom PEP 508 / dependency parsing

2. Over‑generalised _log_metadata

3. log_summary responsibilities

Uh oh!

github-actions Bot commented Jun 24, 2026

Coverage Report for CI Build 28068855788

Coverage decreased (-91.5%) to 8.525%

Details

Uncovered Changes

Coverage Regressions

Coverage Stats

💛 - Coveralls

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

GavinHuttley commented Jun 24, 2026 •

edited by sourcery-ai Bot

Loading

sourcery-ai Bot commented Jun 24, 2026 •

edited

Loading

1. Custom PEP 508 / dependency parsing

2. Over‑generalised `_log_metadata`

3. `log_summary` responsibilities