Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 11 additions & 3 deletions architecture/cli.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,12 @@

The command-line surface and its contract with callers (shells, CI, pre-commit).

`main()` is a thin adapter over `fix_directory`: it parses args, guards
`is_dir` (exit 2), delegates the walk to `fix_directory`, renders one
`Fixing <path>` line per returned path, and exits `1`/`0`. It holds no
file-level logic. See [eof-normalization](eof-normalization.md) for how
`fix_directory` and `fix_file` do the actual work.

## Interface

- `eof-fixer <path>` — fix every non-ignored text file under `<path>` in place.
Expand All @@ -19,12 +25,14 @@ visited file.
## Output

For each file that needs (or would need) fixing, the tool writes `Fixing
<filename>\n` to stdout — the relative path as discovery yielded it. Output goes
through `sys.stdout.write`, not `print`.
<filename>\n` to stdout — the relative path as discovery yielded it. Output
goes through `sys.stdout.write`, not `print`. Output is batch-rendered:
all `Fixing` lines are written after `fix_directory` returns, not streamed
during the walk.

## Exit code

`main()` accumulates a result by OR-ing each file's outcome:
`main()` derives its result from the list returned by `fix_directory`:

- **0** — every file already ended correctly (nothing changed / nothing would
change).
Expand Down
29 changes: 22 additions & 7 deletions architecture/eof-normalization.md
Original file line number Diff line number Diff line change
@@ -1,9 +1,24 @@
# EOF normalization

The core capability: given a single open file, decide whether its end-of-file
terminator is correct and, in fix mode, make it so. "Correct" means the file
ends with **exactly one** line terminator — no missing newline, no trailing
blank lines.
The core capability lives in `eof_fixer/fixer.py` and is exposed through two
public functions:

- **`fix_file(file_obj, *, check) -> bool`** — given a single open binary
file, decide whether its end-of-file terminator is correct and, in fix mode,
make it so. Returns `True` if the file was (or, under `check`, would be)
fixed.
- **`fix_directory(root, *, check=False, extra_excludes=()) -> list[pathlib.Path]`**
— drives `discovery.iter_text_files`, owns the default-skip policy
(`DEFAULT_EXCLUDES = (".cache", ".uv-cache")`), opens each file `rb` under
check else `rb+`, calls `fix_file`, and returns the relative paths
fixed (or would-be-fixed) in walk order.

`main()` no longer holds the file-level logic — it delegates entirely to
`fix_directory` and renders the result. See [cli](cli.md) for the adapter
layer.

"Correct" means the file ends with **exactly one** line terminator — no
missing newline, no trailing blank lines.

## Binary skip

Expand Down Expand Up @@ -50,6 +65,6 @@ In fix mode:

In check mode no bytes are written; the action is computed and reported only.

Either way, a file that needed changing contributes a non-zero result to the
caller — see [cli](cli.md) for how that becomes the process exit code and the
`Fixing <file>` line.
Either way, `fix_file` returns `True` if the file needed changing;
`fix_directory` collects those into the returned path list. See [cli](cli.md)
for how that list becomes the process exit code and the `Fixing <file>` lines.
104 changes: 104 additions & 0 deletions eof_fixer/fixer.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,104 @@
import os
import pathlib
from collections.abc import Iterator, Sequence
from typing import IO

from eof_fixer.discovery import iter_text_files


DEFAULT_EXCLUDES = (".cache", ".uv-cache")
_BINARY_SAMPLE_SIZE = 1024


def _is_binary(file_obj: IO[bytes]) -> bool:
current_pos = file_obj.tell()
file_obj.seek(0)
sample = file_obj.read(_BINARY_SAMPLE_SIZE)
file_obj.seek(current_pos)
return b"\x00" in sample


def _detect_trailing(file_obj: IO[bytes]) -> tuple[str, int]:
"""Inspect the end of `file_obj` and return the action needed.

Returns one of:
- ("none", 0) — file is empty, or already ends with exactly one terminator.
- ("append_lf", 0) — file lacks a trailing newline; an LF should be appended.
- ("truncate", offset) — file has excess trailing newlines; truncate to `offset`
(0 means truncate to empty).
"""
try:
file_obj.seek(-1, os.SEEK_END)
except OSError:
return ("none", 0)

last_character = file_obj.read(1)
if not last_character:
return ("none", 0)
if last_character not in {b"\n", b"\r"}:
return ("append_lf", 0)

while last_character in {b"\n", b"\r"}:
if file_obj.tell() == 1:
# All bytes are line terminators — truncate to empty.
return ("truncate", 0)
file_obj.seek(-2, os.SEEK_CUR)
last_character = file_obj.read(1)

position = file_obj.tell()
remaining = file_obj.read()
for sequence in (b"\n", b"\r\n", b"\r"):
if remaining == sequence:
return ("none", 0)
if remaining.startswith(sequence):
return ("truncate", position + len(sequence))

raise AssertionError("unreachable") # pragma: no cover # noqa: EM101


def fix_file(file_obj: IO[bytes], *, check: bool) -> bool:
"""Normalize one open binary file to end with exactly one terminator.

Returns True if the file was fixed (or, under `check`, would be fixed).
Binary files (null byte in the first sample) are skipped, returning False.
"""
if _is_binary(file_obj):
return False

action, offset = _detect_trailing(file_obj)
if action == "none":
return False

if not check:
if action == "append_lf":
# Needs this seek for windows, otherwise IOError
file_obj.seek(0, os.SEEK_END)
file_obj.write(b"\n")
else: # action == "truncate"
file_obj.seek(offset)
file_obj.truncate()

return True


def fix_directory(
root: pathlib.Path,
*,
check: bool = False,
extra_excludes: Sequence[str] = (),
) -> list[pathlib.Path]:
"""Fix every non-ignored text file under `root`; return the relative paths fixed.

`.git` and `DEFAULT_EXCLUDES` (.cache/.uv-cache) are always skipped;
`extra_excludes` adds more skip names on top. Under `check`, nothing is
written but the would-be-fixed paths are still returned.
"""
excludes = [*DEFAULT_EXCLUDES, *extra_excludes]
open_mode = "rb" if check else "rb+"
fixed: list[pathlib.Path] = []
paths: Iterator[pathlib.Path] = iter_text_files(root, excludes)
for relative_path in paths:
with (root / relative_path).open(open_mode) as file_obj:
if fix_file(file_obj, check=check):
fixed.append(relative_path)
return fixed
86 changes: 5 additions & 81 deletions eof_fixer/main.py
Original file line number Diff line number Diff line change
@@ -1,74 +1,8 @@
import argparse
import os
import pathlib
import sys
from typing import IO

from eof_fixer.discovery import iter_text_files


def _is_binary(file_obj: IO[bytes]) -> bool:
current_pos = file_obj.tell()
file_obj.seek(0)
sample = file_obj.read(1024)
file_obj.seek(current_pos)
return b"\x00" in sample


def _detect_trailing(file_obj: IO[bytes]) -> tuple[str, int]:
"""Inspect the end of `file_obj` and return the action needed.

Returns one of:
- ("none", 0) — file is empty, or already ends with exactly one terminator.
- ("append_lf", 0) — file lacks a trailing newline; an LF should be appended.
- ("truncate", offset) — file has excess trailing newlines; truncate to `offset`
(0 means truncate to empty).
"""
try:
file_obj.seek(-1, os.SEEK_END)
except OSError:
return ("none", 0)

last_character = file_obj.read(1)
if last_character not in {b"\n", b"\r"}:
return ("append_lf", 0)

while last_character in {b"\n", b"\r"}:
if file_obj.tell() == 1:
# All bytes are line terminators — truncate to empty.
return ("truncate", 0)
file_obj.seek(-2, os.SEEK_CUR)
last_character = file_obj.read(1)

position = file_obj.tell()
remaining = file_obj.read()
for sequence in (b"\n", b"\r\n", b"\r"):
if remaining == sequence:
return ("none", 0)
if remaining.startswith(sequence):
return ("truncate", position + len(sequence))

return ("none", 0) # pragma: no cover


def _fix_file(file_obj: IO[bytes], check: bool) -> int:
if _is_binary(file_obj):
return 0

action, offset = _detect_trailing(file_obj)
if action == "none":
return 0

if not check:
if action == "append_lf":
# Needs this seek for windows, otherwise IOError
file_obj.seek(0, os.SEEK_END)
file_obj.write(b"\n")
else: # action == "truncate"
file_obj.seek(offset)
file_obj.truncate()

return 1
from eof_fixer.fixer import fix_directory


def main() -> int:
Expand All @@ -85,20 +19,10 @@ def main() -> int:
args = parser.parse_args()

path: pathlib.Path = args.path
check: bool = args.check

if not path.is_dir():
parser.error(f"path is not a directory: {path}")

extra_excludes = [".cache", ".uv-cache", *(args.exclude or [])]
open_mode = "rb" if check else "rb+"

retv = 0
for relative_path in iter_text_files(path, extra_excludes):
with (path / relative_path).open(open_mode) as f:
ret_for_file = _fix_file(f, check=check)
if ret_for_file:
sys.stdout.write(f"Fixing {relative_path}\n")
retv |= ret_for_file

return retv
fixed = fix_directory(path, check=args.check, extra_excludes=args.exclude or [])
for relative_path in fixed:
sys.stdout.write(f"Fixing {relative_path}\n")
return 1 if fixed else 0
102 changes: 102 additions & 0 deletions planning/changes/2026-06-26.02-deepen-fixer/design.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,102 @@
---
summary: Fix behavior now lives behind eof_fixer/fixer.py (fix_file + fix_directory); main() is a thin adapter and tests target the interface instead of the argv/stdout/cwd harness.
---

# Design: Deepen the fix into `fixer.py`; `main()` becomes an adapter

## Summary

Today `main()` fuses three jobs — CLI parsing, the fix-loop, and `stdout`/exit
rendering — so the fix behavior has no interface of its own. Every one of the 16
tests reaches it through the process: swapping `sys.argv`, `sys.stdout`, and
`os.chdir` via a `_run_main_in` harness, even to assert pure byte-level facts
like "`\r\n\r\n` truncates to `\r\n`." This change extracts the EOF-fixing
capability into a deep module `eof_fixer/fixer.py` exposing two honest
interfaces, leaving `main()` a thin adapter. The interface becomes the test
surface.

## Goal

Move the test surface off the global-state harness onto a direct interface,
improving testability and locality. No change to *what* the tool does (which
files it fixes, the `Fixing` lines, the exit codes) — only *where the behavior
lives* and one non-semantic output-timing detail.

## Design

### New module `eof_fixer/fixer.py`

Owns the eof-normalization capability and the directory orchestration that
drives it:

```python
DEFAULT_EXCLUDES = (".cache", ".uv-cache") # local named constant
_BINARY_SAMPLE_SIZE = 1024 # local named constant

_is_binary(file_obj) -> bool # private internal (uses _BINARY_SAMPLE_SIZE)
_detect_trailing(file_obj) -> tuple[str, int] # private internal (unchanged)
fix_file(file_obj, *, check) -> bool # public: fix one open file; True if (would be) fixed
fix_directory(root, *, check=False, extra_excludes=()) -> list[pathlib.Path]
```

`fix_directory` builds `[*DEFAULT_EXCLUDES, *extra_excludes]`, calls
`discovery.iter_text_files`, opens each yielded file (`rb` under `check`, else
`rb+`), runs `fix_file`, and returns the relative paths it fixed (or, under
`check`, would fix) in walk order. The file-level helpers `_is_binary` /
`_detect_trailing` / the old `_fix_file` move here from `main.py`; `fix_file` is
the de-underscored, `bool`-returning form of `_fix_file`.

### `main.py` becomes a thin adapter

```python
parse args (path, --check, --exclude)
if not path.is_dir(): parser.error(...) # CLI-shaped, exit 2 — stays here
fixed = fix_directory(path, check=args.check, extra_excludes=args.exclude or [])
for p in fixed: sys.stdout.write(f"Fixing {p}\n")
return 1 if fixed else 0
```

`main()` no longer imports `os`/`IO`; it holds only argparse, the `is_dir`
guard, the render loop, and the exit code.

## Key decisions (from the design conversation)

- **Return shape:** `list[pathlib.Path]`, walk order — both the `Fixing` lines
and the exit code derive from it. No `FixReport` wrapper (YAGNI); promote
later if a counts consumer appears.
- **Silent module, batch output:** `fix_directory` touches no `stdout`; `main()`
renders after the walk. Output shifts streaming → batch — same lines, order,
and exit code; only timing differs. Acceptable for a fast CI/pre-commit gate.
- **Default-skip policy lives in `fixer.py`** (`DEFAULT_EXCLUDES`), so tests get
production defaults for free and `main()` carries no behavior policy.
`extra_excludes` means "user additions"; `.git` stays inside `discovery`.
- **Constants stay local**, no central settings module — see
`planning/decisions/2026-06-26-no-central-settings-module.md`.
- **Absorbs the open-mode "leak"** (review Candidate 3): the open and the fix
loop now live in one module, so no rule crosses a `main` ↔ `_fix_file` seam.
- The stringly-typed `_detect_trailing` seam (review Candidate 2) is **left
as-is** — out of scope for this change.

## Testing — three layers

- **Content** → `fix_file(BytesIO(...), check=...)`: CRLF/CR/BOM/null-byte/
perfect-unchanged, no filesystem.
- **Orchestration** → `fix_directory(tmp_path, ...)`: gitignore respected,
binary skipped, symlink, `--exclude`, check-vs-fix, cwd-independence — assert
on the returned list and on-disk bytes.
- **Adapter** → keep ~2 through `main()`: path-rejection (exit 2) and one test
that `main()` renders `Fixing` lines + the exit code from the list.
`_run_main_in` shrinks to serve only these.

## Architecture promotion

In the implementing PR: `architecture/eof-normalization.md` (capability now
behind `fixer.fix_file` / `fix_directory`) and `architecture/cli.md` (`main()`
is an adapter over `fix_directory`).

## Risk

- **Output timing change (streaming → batch):** non-semantic; noted in the
release notes when next shipped.
- **Test migration churn:** large but mechanical; the full suite must stay green
at each task. Coverage stays 100%.
Loading