
Add a built-in tool for Windows users that recognizes clipboard images on demand; add a thinking-mode display to make the process more transparent #277

Open
Mcy0618 wants to merge 18 commits into HKUDS:main from Mcy0618:main

Conversation

Mcy0618 (Contributor) commented May 24, 2026

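  • Read images from the system clipboard on Windows (Pillow/PowerShell),
    macOS (PIL/osascript), and Linux (PIL/xclip/wl-paste)
  • Three output modes: base64, file, text (auto-described via a vision model)
  • Settings fallback: when metadata is not injected, load the vision config from settings
  • 23 unit tests covering all platforms and fallback tiers
  • Add the missing show_thinking parameter to run_task_worker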
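
As a rough illustration of the three output modes listed above, here is a minimal sketch. The helper names `grab_clipboard_png` and `render_clipboard_image`, and the `describe` callable standing in for the vision model, are hypothetical and not taken from the PR; only Pillow's `ImageGrab.grabclipboard()` is a real library call.

```python
import base64
import io
from pathlib import Path
from typing import Callable, Optional


def grab_clipboard_png() -> Optional[bytes]:
    """Return the clipboard image as PNG bytes, or None if there is no image.

    Hypothetical helper. Pillow is imported lazily so the rest of the
    module still works when it is not installed.
    """
    from PIL import Image, ImageGrab

    img = ImageGrab.grabclipboard()  # may be an Image, a list of file names, or None
    if not isinstance(img, Image.Image):
        return None
    buf = io.BytesIO()
    img.save(buf, format="PNG")
    return buf.getvalue()


def render_clipboard_image(
    png_bytes: bytes,
    mode: str,
    out_path: Optional[Path] = None,
    describe: Optional[Callable[[bytes], str]] = None,
) -> str:
    """Turn PNG bytes into one of the three outputs (hypothetical helper)."""
    if mode == "base64":
        return base64.b64encode(png_bytes).decode("ascii")
    if mode == "file":
        if out_path is None:
            raise ValueError("file mode needs out_path")
        out_path.write_bytes(png_bytes)
        return str(out_path)
    if mode == "text":
        if describe is None:
            raise ValueError("text mode needs a vision-model callable")
        return describe(png_bytes)  # assumed to return a text description
    raise ValueError(f"unknown mode: {mode!r}")
```

The PowerShell, osascript, xclip, and wl-paste fallbacks named in the description would sit behind `grab_clipboard_png` in the same way; they are omitted here.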

Mcy0618 and others added 18 commits May 17, 2026 15:47
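- Add a PLAN-mode awareness section in build_runtime_system_prompt()
- Rebuild the system prompt in refresh_runtime_client()
- Fix the issue where the LLM, unaware of its own state in PLAN mode, repeatedly tried to call rejected tools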
- Read images from system clipboard on Windows (Pillow/PowerShell),
  macOS (PIL/osascript), and Linux (PIL/xclip/wl-paste)
- Three output modes: base64, file, text (auto-describe via vision model)
- Settings fallback: when metadata not injected, load vision config from settings
- 23 unit tests covering all platforms and fallback tiers
- pyproject.toml: add Pillow>=10.0.0 to [project.optional-dependencies] dev
  so 'uv sync --extra dev' installs it and CI has the expected dependency
- tests: _fake_png_bytes() now catches ImportError and calls pytest.skip()
  instead of letting ImportError propagate as a hard test failure
- ensures test file can be imported in CI environments without Pillow
- remove unused import subprocess from test file (ruff F401)
- drop unused tool variable in test_input_model_is_pydantic (ruff F841)
- fix test_powershell_image_found: explicitly set mock stdout to 'OK' string,
  add mock for os.close to avoid OSError on fd=999 in Linux CI
- all 23 clipboard_screenshot tests pass locally
- MagicMock(stdout='OK') fails because .strip() returns a MagicMock on CI
  instead of a plain string, causing the 'OK' comparison to fail
- Replace with subprocess.CompletedProcess which has real str attributes
- move subprocess import into the test function to keep ruff F401 clean
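
The switch from a MagicMock to subprocess.CompletedProcess described in the commits above can be sketched roughly as follows. `clipboard_status` is a hypothetical stand-in for the code under test, not the PR's actual function, and the PowerShell arguments are illustrative.

```python
import subprocess
from unittest import mock


def clipboard_status() -> str:
    """Hypothetical stand-in: run a command and return its trimmed stdout."""
    result = subprocess.run(
        ["powershell", "-NoProfile", "-Command", "Write-Output OK"],
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def run_with_fake_process() -> str:
    """Patch subprocess.run so it returns a real CompletedProcess.

    CompletedProcess carries plain str attributes, so .strip() yields a str
    that can be compared with 'OK' directly.
    """
    fake = subprocess.CompletedProcess(
        args=["powershell"], returncode=0, stdout="OK\n", stderr=""
    )
    with mock.patch("subprocess.run", return_value=fake):
        return clipboard_status()
```

Because `subprocess.run` is replaced for the duration of the `with` block, the sketch never launches PowerShell and behaves the same on Windows and on a Linux CI runner.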
…lict in runtime.py refresh_runtime_client()
Copilot AI review requested due to automatic review settings May 24, 2026 10:44
Contributor

Copilot AI left a comment


Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds a cross-platform clipboard screenshot tool and introduces optional “thinking” streaming support end-to-end (API client → engine events → backends/frontends), plus small prompt updates.

Changes:

  • Added clipboard_screenshot tool with platform-specific clipboard readers and unit tests.
  • Added show_thinking setting/CLI flag and new thinking delta event plumbing across engine, UI backends, and React terminal frontend.
  • Added PLAN-mode section to the runtime system prompt.
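
To make the thinking-delta plumbing summarized above more concrete, here is a minimal sketch. `ApiThinkingDeltaEvent` and `AssistantThinkingDelta` are names from the PR; `ApiTextDeltaEvent`, `AssistantTextDelta`, `to_engine_events`, and `collect_reply` are hypothetical, and dropping thinking events when `show_thinking` is off is an assumption about the intended behavior.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Union


@dataclass(frozen=True)
class ApiTextDeltaEvent:  # hypothetical name for the API-level text event
    text: str


@dataclass(frozen=True)
class ApiThinkingDeltaEvent:
    text: str


@dataclass(frozen=True)
class AssistantTextDelta:  # hypothetical name for the engine-level text event
    text: str


@dataclass(frozen=True)
class AssistantThinkingDelta:
    text: str


ApiEvent = Union[ApiTextDeltaEvent, ApiThinkingDeltaEvent]
StreamEvent = Union[AssistantTextDelta, AssistantThinkingDelta]


def to_engine_events(api_events: Iterable[ApiEvent], show_thinking: bool) -> Iterator[StreamEvent]:
    """Map API events to engine events; thinking is forwarded only when enabled."""
    for ev in api_events:
        if isinstance(ev, ApiThinkingDeltaEvent):
            if show_thinking:
                yield AssistantThinkingDelta(ev.text)
        else:
            yield AssistantTextDelta(ev.text)


def collect_reply(events: Iterable[StreamEvent]) -> str:
    """Consumers such as channel replies keep only the assistant text."""
    return "".join(e.text for e in events if isinstance(e, AssistantTextDelta))
```

Keeping thinking as a separate event type lets each consumer decide independently whether to render it, log it, or ignore it.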

Reviewed changes

Copilot reviewed 27 out of 27 changed files in this pull request and generated 8 comments.

Show a summary per file
| File | Description |
| --- | --- |
| tests/test_tools/test_clipboard_screenshot_tool.py | Adds unit tests covering clipboard screenshot tool behaviors and platform helpers. |
| src/openharness/ui/textual_app.py | Renders AssistantThinkingDelta events in the Textual UI. |
| src/openharness/ui/runtime.py | Threads show_thinking CLI override into settings merge. |
| src/openharness/ui/protocol.py | Extends transcript roles and backend event types to include thinking. |
| src/openharness/ui/output.py | Adds console rendering for AssistantThinkingDelta with separation from normal text output. |
| src/openharness/ui/backend_host.py | Emits thinking_delta backend events for the React frontend. |
| src/openharness/ui/app.py | Adds show_thinking parameters to entrypoints and streams thinking deltas in print/worker modes. |
| src/openharness/tools/clipboard_screenshot_tool.py | Implements the new clipboard screenshot tool (Windows/macOS/Linux + vision description mode). |
| src/openharness/tools/__init__.py | Registers ClipboardScreenshotTool in the default tool registry. |
| src/openharness/prompts/context.py | Adds explicit PLAN mode guidance to the system prompt builder. |
| src/openharness/engine/stream_events.py | Introduces AssistantThinkingDelta event type and adds it to StreamEvent. |
| src/openharness/engine/query_engine.py | Propagates settings.show_thinking into query runs. |
| src/openharness/engine/query.py | Adds show_thinking to QueryContext and maps API thinking events to engine events. |
| src/openharness/config/settings.py | Adds show_thinking setting and OPENHARNESS_SHOW_THINKING env override. |
| src/openharness/commands/registry.py | Adds /thinking slash command to toggle thinking display. |
| src/openharness/cli.py | Adds --show-thinking CLI flag and passes override into runtime entrypoints. |
| src/openharness/channels/adapter.py | Ignores thinking deltas in channel replies. |
| src/openharness/autopilot/service.py | Ignores thinking deltas when collecting assistant output. |
| src/openharness/api/openai_client.py | Adds thinking delta streaming and converts <think>...</think> blocks into thinking events when enabled. |
| src/openharness/api/copilot_client.py | Forwards show_thinking into the inner API request. |
| src/openharness/api/client.py | Adds show_thinking to ApiMessageRequest and introduces ApiThinkingDeltaEvent. |
| pyproject.toml | Adds Pillow to dev dependencies for test/support tooling. |
| frontend/terminal/src/types.ts | Extends TranscriptItem.role to include thinking. |
| frontend/terminal/src/hooks/useBackendSession.ts | Buffers thinking deltas and flushes them into transcript items before assistant output. |
| frontend/terminal/src/components/TranscriptPane.tsx | Adds label/color handling for thinking transcript entries. |
| TODO.md | Adds/updates implementation checklist items for plan mode and clipboard screenshot tool. |
| .catpaw/rules/python-launcher.md | Adds a repo rule document about using py on Windows. |
Comments suppressed due to low confidence (2)

src/openharness/ui/textual_app.py:1

  • Thinking deltas and assistant text deltas both append into self._assistant_buffer, so once a real assistant response starts, it will include prior thinking text (and vice-versa). Use a dedicated buffer for thinking (e.g., self._thinking_buffer) and/or reset the appropriate buffer when transitioning from thinking → assistant text.
"""Default Textual terminal UI for OpenHarness."""
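
The dedicated-buffer suggestion in the comment above might look roughly like this. The class and method names are hypothetical and not tied to the Textual app's real API; the sketch only shows the idea of keeping thinking and assistant text apart and flushing on the thinking → assistant transition.

```python
from typing import List, Tuple


class TranscriptBuffers:
    """Keep thinking text and assistant text in separate buffers (hypothetical)."""

    def __init__(self) -> None:
        self.thinking = ""
        self.assistant = ""
        self.flushed: List[Tuple[str, str]] = []  # (role, text) transcript items

    def on_thinking_delta(self, text: str) -> None:
        self.thinking += text

    def on_assistant_delta(self, text: str) -> None:
        # Transition thinking -> assistant: flush thinking first so the
        # assistant entry never contains prior thinking text.
        self._flush_thinking()
        self.assistant += text

    def on_turn_complete(self) -> None:
        self._flush_thinking()
        if self.assistant:
            self.flushed.append(("assistant", self.assistant))
            self.assistant = ""

    def _flush_thinking(self) -> None:
        if self.thinking:
            self.flushed.append(("thinking", self.thinking))
            self.thinking = ""
```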

src/openharness/ui/app.py:1

  • run_repl now accepts show_thinking but (per the diff) it is not threaded into either the backend_only path (run_backend_host(...)) or the React TUI launcher path, so --show-thinking likely has no effect in normal interactive mode. Pass show_thinking through to the runtime build/host config (and add it to run_backend_host / launch_react_tui inputs as needed) so the setting is honored consistently.
"""Interactive session entry points."""
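
One way to thread the flag consistently is to treat `None` as "no override" at every entry point, which also matches the `show_thinking or None` pattern used in cli.py. The `Settings` dataclass and `merge_cli_overrides` below are a hypothetical sketch, not the project's actual settings API.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical, trimmed-down settings object."""

    show_thinking: bool = False


def merge_cli_overrides(settings: Settings, show_thinking: Optional[bool] = None) -> Settings:
    """Apply a CLI override only when one was actually given.

    None means "flag not passed", so the value loaded from settings
    (or from an environment override) is kept.
    """
    if show_thinking is None:
        return settings
    return replace(settings, show_thinking=show_thinking)
```

With a store-true flag, `show_thinking or None` turns an absent flag (`False`) into `None`, so a value of `True` already present in settings is not overwritten by the CLI default.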


Comment on lines +84 to +86
def is_read_only(self, arguments: ClipboardScreenshotToolInput) -> bool:
    del arguments
    return True
Comment on lines +237 to +247
script = (
    f"Add-Type -AssemblyName System.Windows.Forms;"
    f"Add-Type -AssemblyName System.Drawing;"
    f"$img = [System.Windows.Forms.Clipboard]::GetImage();"
    f'if ($img -ne $null) {{'
    f'  $img.Save("{tmp_path}", [System.Drawing.Imaging.ImageFormat]::Png);'
    f'  Write-Output "OK"'
    f"}} else {{"
    f'  Write-Output "NO_IMAGE"'
    f"}}"
)
Comment on lines 275 to +279
async def _render_event(event: StreamEvent) -> None:
    if isinstance(event, AssistantThinkingDelta):
        print(f"[DEBUG] Sending thinking_delta: {event.text[:50]}...", file=sys.stderr)
        await self._emit(BackendEvent(type="thinking_delta", message=event.text))
        return
Comment thread src/openharness/cli.py
permission_mode=permission_mode,
max_turns=max_turns,
effort=effort,
show_thinking=show_thinking or None,
Comment thread src/openharness/ui/app.py
Comment on lines +260 to +268
elif isinstance(event, AssistantThinkingDelta):
    collected_text += event.text
    if output_format == "text":
        sys.stderr.write(event.text)
        sys.stderr.flush()
    elif output_format == "stream-json":
        obj = {"type": "thinking_delta", "text": event.text}
        print(json.dumps(obj), flush=True)
        events_list.append(obj)
finish_reason: str | None = None
usage_data: dict[str, int] = {}
- # Buffer to strip inline <think>…</think> blocks across streaming chunks.
+ # Buffer to strip inline blocks across streaming chunks.
Comment on lines +467 to +468
# Matches complete blocks (DOTALL so newlines are included).
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
Comment on lines 473 to +476
def _strip_think_blocks(buf: str) -> tuple[str, str]:
-     """Strip complete ``<think>…</think>`` blocks and return ``(visible_text, leftover)``.
+     """Strip complete ``...`` blocks and return ``(visible_text, leftover)``.

-     Complete pairs are removed via regex. An unclosed ``<think>`` is held in
+     Complete pairs are removed via regex. An unclosed ```` is held in
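
For readers following the diff, the stripping behavior described in the docstring can be sketched as below. This is a minimal reconstruction from the regex and the `(visible_text, leftover)` contract shown in the snippets, not the PR's actual implementation; holding back a partial opening tag such as `<th` at the end of a chunk is an added assumption, and `stream_visible` is a hypothetical driver.

```python
import re
from typing import Iterable, Tuple

# Matches complete <think>...</think> blocks (DOTALL so newlines are included).
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
_OPEN_TAG = "<think>"


def strip_think_blocks(buf: str) -> Tuple[str, str]:
    """Return (visible_text, leftover) for one buffered chunk."""
    cleaned = _THINK_RE.sub("", buf)  # drop complete pairs
    idx = cleaned.find(_OPEN_TAG)
    if idx != -1:
        # Unclosed <think>: hold everything from the tag until it closes.
        return cleaned[:idx], cleaned[idx:]
    # Assumption: also hold back a partial opening tag split across chunks.
    for n in range(len(_OPEN_TAG) - 1, 0, -1):
        if cleaned.endswith(_OPEN_TAG[:n]):
            return cleaned[:-n], cleaned[-n:]
    return cleaned, ""


def stream_visible(chunks: Iterable[str]) -> str:
    """Hypothetical driver: feed chunks through the buffer and join visible text."""
    leftover = ""
    out = []
    for chunk in chunks:
        visible, leftover = strip_think_blocks(leftover + chunk)
        out.append(visible)
    return "".join(out)
```

A real client would also need to decide what to do with a non-empty `leftover` when the stream ends, for example emitting it as thinking text or as visible text; the sketch simply drops it.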

2 participants