Fix/summary typeerror 476 #503

Open
suhaniiz wants to merge 3 commits into param20h:dev from suhaniiz:fix/summary-typeerror-476

Conversation

@suhaniiz suhaniiz commented Jun 7, 2026

📋 PR Checklist

Thank you for contributing to PDF-Assistant-RAG! 🎉
Please fill out this template before submitting. PRs without it filled in will be closed.


🔗 Related Issue

Closes #476


📝 What does this PR do?

Defensively validates the data type extracted from document chunks in generate_document_summary.

Previously, if a malformed chunk contained a truthy non-string value (such as a non-zero integer or True), it passed the if text: check and was appended to chunk_texts. When " ".join(chunk_texts) then ran, it raised a TypeError of the form sequence item N: expected str instance, int found (or bool found), crashing the summarization process.
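
The failure described above can be reproduced in isolation. This is a minimal sketch, not the project's code: the chunk dictionaries, the "text" key, and the collect_truthy helper are assumptions made for illustration.

```python
# Hypothetical chunks; the real chunk schema in the project may differ.
chunks = [{"text": "Intro paragraph."}, {"text": 42}, {"text": None}]


def collect_truthy(chunks):
    """Mimic an implicit truthy check: keep any value that is truthy."""
    chunk_texts = []
    for chunk in chunks:
        text = chunk.get("text")
        if text:  # 42 is truthy and slips through; None is falsy and is skipped
            chunk_texts.append(text)
    return chunk_texts


chunk_texts = collect_truthy(chunks)

try:
    " ".join(chunk_texts)
    error_message = None
except TypeError as exc:
    # str.join rejects the first non-string item it encounters.
    error_message = str(exc)

print(error_message)
```

Note that None never reaches the join under a truthy check; only truthy non-string values do.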

This PR replaces the implicit truthy check with an explicit isinstance(text, str) validation and uses text.strip() so that whitespace-only chunks are also skipped.
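
The validation described above could look roughly like the sketch below. It is illustrative only: collect_chunk_texts, the "text" key, and the sample chunks are hypothetical, and the actual loop inside generate_document_summary may be structured differently (for example, it may append the stripped text rather than the original).

```python
def collect_chunk_texts(chunks):
    """Keep only chunk texts that are non-blank strings."""
    chunk_texts = []
    for chunk in chunks:
        text = chunk.get("text")  # assumed key name; returns None if missing
        # Explicit type check first, so .strip() is only called on real strings.
        if isinstance(text, str) and text.strip():
            chunk_texts.append(text)
    return chunk_texts


sample_chunks = [
    {"text": "First section."},
    {"text": 42},        # truthy non-string
    {"text": True},      # truthy non-string
    {"text": None},      # missing content
    {"text": "   "},     # whitespace only
    {},                  # no "text" key at all
    {"text": "Second section."},
]

combined = " ".join(collect_chunk_texts(sample_chunks))
print(combined)  # First section. Second section.
```

Because isinstance is evaluated before text.strip(), the short-circuiting and keeps the method call from ever running on a non-string value.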


🗂️ Type of Change

  • 🐛 Bug fix
  • ✨ New feature
  • 🔧 Refactor / code cleanup
  • 📝 Documentation update
  • 🎨 UI / styling change
  • ⚙️ CI / tooling / config change
  • 🧪 Tests

🧪 How was this tested?

  • Tested the affected API endpoints manually by mocking a document input with malformed or missing text keys, to verify that the function skips those chunks gracefully without raising a TypeError.
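
A manual check like the one above could also be captured as an automated test. The following is a sketch under assumptions: join_valid_chunk_texts is a hypothetical stand-in for the text-gathering step (the real test would exercise generate_document_summary with a mocked document), and the "text" key is assumed.

```python
def join_valid_chunk_texts(chunks):
    """Hypothetical stand-in for the text-gathering step under test."""
    return " ".join(
        chunk["text"]
        for chunk in chunks
        if isinstance(chunk.get("text"), str) and chunk["text"].strip()
    )


# Values a malformed chunk might carry in place of real text.
MALFORMED_VALUES = [None, 0, 7, True, False, 3.14, [], {"nested": "x"}, "", " \n\t"]


def test_malformed_chunks_are_skipped():
    for bad_value in MALFORMED_VALUES:
        chunks = [{"text": "good"}, {"text": bad_value}, {"other_key": "no text"}]
        # The valid chunk survives; the malformed ones are dropped silently.
        assert join_valid_chunk_texts(chunks) == "good", repr(bad_value)


def test_all_malformed_gives_empty_string():
    chunks = [{"text": value} for value in MALFORMED_VALUES]
    assert join_valid_chunk_texts(chunks) == ""


if __name__ == "__main__":
    test_malformed_chunks_are_skipped()
    test_all_malformed_gives_empty_string()
    print("all checks passed")
```

The functions follow the test_ naming convention, so a runner such as pytest would collect them if the file were added to a test suite.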

📸 Screenshots (if UI change)

N/A


⚠️ Anything to flag for reviewers?

The fix was kept tightly scoped to just the loop block inside generate_document_summary to maintain a clean git diff and prevent any unnecessary file formatting churn.


✅ Self-Review Checklist

  • My branch is based on dev, not main
  • I have not added any secrets / API keys
  • I have not modified main branch or any HuggingFace deployment config
  • My code follows the existing style (no unnecessary formatting changes)
  • I have updated relevant docs / comments if needed

@suhaniiz suhaniiz requested a review from param20h as a code owner June 7, 2026 07:36

suhaniiz commented Jun 7, 2026

@param20h, this PR is under GSSoC 2026!
Kindly review it and let me know if any changes are needed!


suhaniiz commented Jun 7, 2026

Hi @param20h,

It looks like the Playwright E2E tests workflow failed during the authentication flow (auth-and-chat.spec.ts).

I believe this failure is unrelated to the changes in this branch. This PR is strictly scoped to adding defensive isinstance(text, str) data-type validation inside the backend's generate_document_summary function. It doesn't touch any frontend routing, login selectors, or authentication flows, so it should not affect that spec.

The failure appears to be due to an intermittent CI environment timeout or an existing issue on the base branch. Since I don't have write permissions to trigger a workflow re-run on this repository, feel free to either restart the failed job or proceed with merging the changes directly.

Thank you!


suhaniiz commented Jun 7, 2026

@param20h, this PR is under GSSoC 2026, so feel free to merge it and add the relevant labels!

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] Fix potential TypeError in generate_document_summary when chunk text is None

2 participants