Skip to content

📝 Add docstrings to feat/default-models#2

Open
coderabbitai[bot] wants to merge 1 commit into
mainfrom
coderabbitai/docstrings/c695478
Open

📝 Add docstrings to feat/default-models#2
coderabbitai[bot] wants to merge 1 commit into
mainfrom
coderabbitai/docstrings/c695478

Conversation

@coderabbitai

@coderabbitai coderabbitai Bot commented Mar 3, 2026

Copy link
Copy Markdown

Docstrings generation was requested by @Reallyeasy1.

The following files were modified:

  • build_combined.py
These file types are not supported
  • .gitignore
  • docs/dataset_audit.md
  • docs/experiment_plan.md
  • docs/research_summary.md
  • notebooks/01_data_preparation.ipynb
  • notebooks/02_tfidf_lr_baseline.ipynb
  • notebooks/03_naive_bayes_baseline.ipynb
  • notebooks/04_bert_classification.ipynb
  • notebooks/05_error_analysis.ipynb
  • requirements.txt
ℹ️ Note

CodeRabbit cannot perform edits on its own pull requests yet.

Summary by CodeRabbit

  • Chores
    • Added build tooling to generate a consolidated analysis notebook combining multiple analysis sections.

Docstrings generation was requested by @Reallyeasy1.

* #1 (comment)

The following files were modified:

* `build_combined.py`
@coderabbitai coderabbitai Bot requested a review from Reallyeasy1 March 3, 2026 08:41
@coderabbitai coderabbitai Bot mentioned this pull request Mar 3, 2026
@coderabbitai

coderabbitai Bot commented Mar 3, 2026

Copy link
Copy Markdown
Author

Important

Review skipped

This PR was authored by the user configured for CodeRabbit reviews. By default, CodeRabbit skips reviewing PRs authored by this user. It's recommended to use a dedicated user account to post CodeRabbit review feedback.

To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • ✅ Review completed - (🔄 Check again to review again)

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
build_combined.py (2)

8-10: Anchor project paths to the script file, not process CWD.

Using Path(".") makes notebook discovery/output location depend on where the script is launched from. That can break execution outside repo root.

💡 Suggested change
-ROOT     = Path(".")
+ROOT     = Path(__file__).resolve().parent
 NB_DIR   = ROOT / "notebooks"
 OUT_DIR  = ROOT / "colab_package"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@build_combined.py` around lines 8 - 10, The project paths currently use ROOT
= Path(".") which ties discovery to the current working directory; change ROOT
to be anchored to the script file location by setting ROOT to
Path(__file__).resolve().parent (or equivalent) so NB_DIR and OUT_DIR (the
NB_DIR and OUT_DIR variables) are computed relative to the script's directory
rather than the process CWD.

251-262: Remove the unused out_dir_val parameter from content_cells_nb.

out_dir_val is never used (Line 258), and all callsites pass a redundant third argument. Keeping it increases API noise.

💡 Suggested change
-def content_cells_nb(cells, out_dir_var, out_dir_val):
+def content_cells_nb(cells, out_dir_var):
@@
-        out_dir_val (str): Unused parameter retained for compatibility; it does not affect processing.
@@
-for c in content_cells_nb(nb02, "OUT_TFIDF", "OUT_TFIDF"):
+for c in content_cells_nb(nb02, "OUT_TFIDF"):
@@
-for c in content_cells_nb(nb03, "OUT_NB", "OUT_NB"):
+for c in content_cells_nb(nb03, "OUT_NB"):
@@
-for c in content_cells_nb(nb04, "BERT_OUT", "BERT_OUT"):
+for c in content_cells_nb(nb04, "BERT_OUT"):
@@
-for c in content_cells_nb(nb05, "REPORTS_DIR", "REPORTS_DIR"):
+for c in content_cells_nb(nb05, "REPORTS_DIR"):

Also applies to: 302-326

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@build_combined.py` around lines 251 - 262, The function content_cells_nb
currently accepts an unused parameter out_dir_val; remove out_dir_val from the
function signature and docstring and update any callers that pass three
arguments to call content_cells_nb with only (cells, out_dir_var); ensure the
substitution logic still uses out_dir_var only and adjust any related function
definitions or tests that expect the old 3-arg signature (also apply the same
removal for the other similar function mentioned around the later block).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@build_combined.py`:
- Around line 343-349: The current syntax-check printing block (the branch that
iterates over errors using the errors variable and prints "syntax error(s)" and
the corresponding lines) must abort execution after reporting instead of
continuing to write the combined notebook; update that block to raise an
exception (e.g. raise RuntimeError or SystemExit with a clear message)
immediately after printing the errors so the process fails fast and no combined
artifact is written; apply the same change to the other identical
syntax-reporting block (the second print branch that currently prints "All ...
cells: syntax OK" or the errors) so both places use the same fail-fast behavior.
- Around line 371-375: The three print statements that use f-strings but have no
interpolations should be changed to plain strings to satisfy Ruff F541: update
the prints that output "\ncolab_package/ contents:", "\nAlso copy into
colab_package/:", and "  sarcasm_pairs_step35_clean.jsonl  (from
data/processed/)" by removing the leading f prefix so the calls in
build_combined.py simply call print("<string>") instead of print(f"<string>");
keep the existing string contents and formatting unchanged.

---

Nitpick comments:
In `@build_combined.py`:
- Around line 8-10: The project paths currently use ROOT = Path(".") which ties
discovery to the current working directory; change ROOT to be anchored to the
script file location by setting ROOT to Path(__file__).resolve().parent (or
equivalent) so NB_DIR and OUT_DIR (the NB_DIR and OUT_DIR variables) are
computed relative to the script's directory rather than the process CWD.
- Around line 251-262: The function content_cells_nb currently accepts an unused
parameter out_dir_val; remove out_dir_val from the function signature and
docstring and update any callers that pass three arguments to call
content_cells_nb with only (cells, out_dir_var); ensure the substitution logic
still uses out_dir_var only and adjust any related function definitions or tests
that expect the old 3-arg signature (also apply the same removal for the other
similar function mentioned around the later block).

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c695478 and b0fa464.

📒 Files selected for processing (1)
  • build_combined.py

Comment thread build_combined.py
Comment on lines +343 to +349
if errors:
print(f"\n{len(errors)} syntax error(s):")
for i, ln, msg, line in errors:
print(f" Cell {i:3d} line {ln}: {line!r}")
print(f" {msg}")
else:
print(f"All {len(all_cells)} cells: syntax OK")

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Fail fast when syntax validation finds errors.

Right now, syntax errors are printed but execution still writes the combined notebook. That can ship a broken artifact. After reporting errors, raise and stop.

🐛 Suggested fix
 if errors:
     print(f"\n{len(errors)} syntax error(s):")
     for i, ln, msg, line in errors:
         print(f"  Cell {i:3d} line {ln}: {line!r}")
         print(f"           {msg}")
+    raise SyntaxError(f"Aborting write: {len(errors)} syntax error(s) detected in combined notebook cells.")
 else:
     print(f"All {len(all_cells)} cells: syntax OK")

Also applies to: 352-369

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@build_combined.py` around lines 343 - 349, The current syntax-check printing
block (the branch that iterates over errors using the errors variable and prints
"syntax error(s)" and the corresponding lines) must abort execution after
reporting instead of continuing to write the combined notebook; update that
block to raise an exception (e.g. raise RuntimeError or SystemExit with a clear
message) immediately after printing the errors so the process fails fast and no
combined artifact is written; apply the same change to the other identical
syntax-reporting block (the second print branch that currently prints "All ...
cells: syntax OK" or the errors) so both places use the same fail-fast behavior.

Comment thread build_combined.py
Comment on lines +371 to +375
print(f"\ncolab_package/ contents:")
for p in sorted(OUT_DIR.iterdir()):
print(f" {p.name}")
print(f"\nAlso copy into colab_package/:")
print(f" sarcasm_pairs_step35_clean.jsonl (from data/processed/)")

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, check if build_combined.py exists and verify the specific lines
if [ -f "build_combined.py" ]; then
  echo "=== Lines 371-375 from build_combined.py ==="
  sed -n '371,375p' build_combined.py
else
  echo "File not found. Searching for it..."
  find . -name "build_combined.py" -type f
fi

Repository: SeeYangZhi/Project-LLMao

Length of output: 322


🏁 Script executed:

# Run the verification script from the review comment
python - <<'PY'
from pathlib import Path
p = Path("build_combined.py")
if p.exists():
    for i, line in enumerate(p.read_text(encoding="utf-8").splitlines(), 1):
        s = line.strip()
        if s.startswith("print(f\"") and "{" not in s:
            print(f"Line {i}: {s}")
else:
    print(f"File {p} not found")
PY

Repository: SeeYangZhi/Project-LLMao

Length of output: 246


Fix Ruff F541: remove f prefix from non-interpolated strings.

Lines 371, 374, and 375 are plain strings and should not be f-strings.

💡 Suggested change
-print(f"\ncolab_package/ contents:")
+print("\ncolab_package/ contents:")
@@
-print(f"\nAlso copy into colab_package/:")
-print(f"  sarcasm_pairs_step35_clean.jsonl  (from data/processed/)")
+print("\nAlso copy into colab_package/:")
+print("  sarcasm_pairs_step35_clean.jsonl  (from data/processed/)")
🧰 Tools
🪛 Ruff (0.15.2)

[error] 371-371: f-string without any placeholders

Remove extraneous f prefix

(F541)


[error] 374-374: f-string without any placeholders

Remove extraneous f prefix

(F541)


[error] 375-375: f-string without any placeholders

Remove extraneous f prefix

(F541)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@build_combined.py` around lines 371 - 375, The three print statements that
use f-strings but have no interpolations should be changed to plain strings to
satisfy Ruff F541: update the prints that output "\ncolab_package/ contents:",
"\nAlso copy into colab_package/:", and "  sarcasm_pairs_step35_clean.jsonl 
(from data/processed/)" by removing the leading f prefix so the calls in
build_combined.py simply call print("<string>") instead of print(f"<string>");
keep the existing string contents and formatting unchanged.

@SeeYangZhi SeeYangZhi force-pushed the main branch 4 times, most recently from a51c032 to 1ac8cca Compare March 4, 2026 02:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant