Fix read tool failing on PDF/image files and integrate OCR/vision attachments #5
Open
CharlieZiChen wants to merge 5 commits into
Collaborator
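Recommendation
Do not merge yet. The direction of this PR's fix is correct: PDFs and images should no longer be read directly as UTF-8 text and fail with an error. However, the current implementation ...
Main issues
Suggested fixes
Verification requirements
Before merging, add unit tests that cover:
Current verification results
In a temporary copy of the PR, I ran ...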
13049a8 to bfa43f7
Author
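Goal
This PR fixes the failure that occurs when the workspace read tool parses PDFs and images as UTF-8 text. Following the review feedback, it also narrows ...
Implementation
Main changed files
Verification coverage
Unit tests have been added and pass, covering the review requirements:
Verification run
pnpm --filter @internshannon/sidecar test -- kernel-runtime-config.builder.spec.ts kernel-message-file-context.service.spec.ts workspace-ocr.service.spec.ts local-file.storage.spec.ts --runInBand
pnpm --filter @internshannon/sidecar build
pnpm --filter @internshannon/web desktop:build
All of the above passed.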
Author
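This update is based on the latest ...
What this update implements
How read tool errors are handled now
This update does not disable ... Note that, in chat, the SDK-native ...
Verified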
Summary
This PR fixes the desktop sidecar read flow for PDF and image files, and adds multimodal image support for workspace file mentions.
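The failure mode behind this summary (binary bytes being decoded as UTF-8) can be illustrated with a minimal sketch. This is not the PR's code: `FileKind`, `startsWith`, and `classifyBytes` are hypothetical names, and the approach shown (checking the well-known PNG and PDF magic bytes first, then attempting a strict UTF-8 decode) is only one plausible way to route a file before reading it as text.

```typescript
// Hypothetical sketch: decide how to read a file from its leading bytes.
type FileKind = "pdf" | "png" | "text" | "binary";

// Standard 8-byte PNG signature.
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];
// ASCII "%PDF-", the prefix of a PDF header.
const PDF_SIGNATURE = [0x25, 0x50, 0x44, 0x46, 0x2d];

function startsWith(bytes: Uint8Array, prefix: number[]): boolean {
  if (bytes.length < prefix.length) return false;
  return prefix.every((value, index) => bytes[index] === value);
}

function classifyBytes(bytes: Uint8Array): FileKind {
  if (startsWith(bytes, PNG_SIGNATURE)) return "png";
  if (startsWith(bytes, PDF_SIGNATURE)) return "pdf";
  try {
    // With fatal: true, decode() throws on invalid UTF-8 instead of
    // silently substituting replacement characters.
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return "text";
  } catch {
    return "binary";
  }
}
```

With a classifier like this, only the `"text"` branch would be decoded and returned as a string; the other branches could return metadata, extracted text, or a clear "binary file" message instead of throwing.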
Problems Fixed
readFile() failing on binary files such as PDF/PNG with UTF-8 decode errors like:
stream did not contain valid UTF-8
Failed to read file ...pdf
Failed to read file ...png
@a3s-lab/ocr settings.
@/path/to/image.png workspace file mentions, allowing vision-capable models to analyze non-text visual content.
Files Changed
apps/sidecar/src/modules/kernel/infrastructure/workspace-storage/local-file.storage.ts
pdf-parse.
apps/sidecar/src/modules/kernel/application/kernel-message-file-context.service.ts
@/absolute/path.
apps/sidecar/src/modules/kernel/application/kernel-message-run-intake.service.ts
apps/sidecar/src/runtime/desktop/desktop-kernel-runtime.module.ts
KernelMessageFileContextService.
LocalFileStorage registration to Nest-managed injection so OCR settings can be read.
Verification
Ran targeted TypeScript checks for the modified storage and file-context logic.
Verified Markdown/text files still return original UTF-8 content.
Verified PNG reads no longer throw UTF-8 errors and return metadata such as type, size, and dimensions.
Verified PDF reads no longer throw UTF-8 errors and extract text from a 20-page PDF.
Verified invalid/binary files return a clear non-UTF-8/binary-file message instead of crashing.
Verified image file mentions generate multimodal attachments:
image/png attachment
Dimensions: 800x1066
Verified OCR integration paths:
OCR text: output
Built the sidecar successfully with:
pnpm --filter @internshannon/sidecar build
Started local sidecar and web dev server for manual testing:
127.0.0.1:29653
127.0.0.1:5000
Manually confirmed the improved read/image behavior is effective in the running app.
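The PNG metadata mentioned in the verification list (type, size, dimensions) can be derived from the file header alone. The sketch below is an illustration under assumptions, not the PR's implementation: `ImageAttachmentInfo` and `describePng` are hypothetical names, and the field layout is invented for the example. It relies only on the PNG format itself, where the 8-byte signature is followed by the IHDR chunk, whose width and height are big-endian 32-bit integers at byte offsets 16 and 20.

```typescript
// Hypothetical shape for image metadata attached to a file mention.
interface ImageAttachmentInfo {
  mediaType: "image/png";
  sizeBytes: number;
  width: number;
  height: number;
}

const PNG_MAGIC = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];
const IHDR_TYPE = [0x49, 0x48, 0x44, 0x52]; // ASCII "IHDR"

// Returns metadata for a PNG, or null if the bytes are not a PNG header.
function describePng(bytes: Uint8Array): ImageAttachmentInfo | null {
  // Signature (8) + chunk length (4) + chunk type (4) + width (4) + height (4).
  if (bytes.length < 24) return null;
  for (let i = 0; i < PNG_MAGIC.length; i++) {
    if (bytes[i] !== PNG_MAGIC[i]) return null;
  }
  for (let i = 0; i < IHDR_TYPE.length; i++) {
    if (bytes[12 + i] !== IHDR_TYPE[i]) return null;
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return {
    mediaType: "image/png",
    sizeBytes: bytes.byteLength,
    width: view.getUint32(16), // DataView reads big-endian by default
    height: view.getUint32(20),
  };
}
```

A caller could format the result as a line such as `Dimensions: ${info.width}x${info.height}` for the text response, and pass the raw bytes separately as the multimodal attachment.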