Add example agents and integration test workflow by kovtcharov · Pull Request #340 · amd/gaia

kovtcharov · 2026-02-12T01:33:03Z

Summary

Add 3 new example agents showcasing GAIA capabilities
Add real integration tests that execute agents and validate responses
Add CI/CD workflow running on Strix with Lemonade server
Update docs homepage with professional, technical messaging

New Examples

weather_agent.py - Real-time weather via MCP server integration
rag_doc_agent.py - Document Q&A using RAG for private data
product_mockup_agent.py - HTML landing page generator for rapid prototyping

All examples use Qwen3-4B-GGUF for faster inference.

Testing

tests/integration/test_example_agents.py - Real execution tests with response validation
.github/workflows/test_examples.yml - CI/CD on Strix runner with Lemonade server

Test coverage: 10/10 examples (100%)

Tests that actually run:

NotesAgent: Creates notes, validates database operations
ProductMockupAgent: Generates HTML, validates file creation
FileWatcherAgent: Watches directories, validates event handling
Structure tests for MCP-based agents (require external servers)

Documentation Updates

Updated docs homepage (docs/index.mdx)
Replaced marketing slogan with technical value prop: "Agent SDK for AMD Ryzen AI"
Added MCP to list of key capabilities
Added Computer Use Agents (CUA) as use case
More professional, technical tone

CI/CD Workflow

Runs on self-hosted Strix runner (stx label)
Starts Lemonade server with Qwen3-4B-GGUF
Executes agents and validates responses
5-minute timeout per test
Skips copyright header validation (allows external contributions)

All examples are verified, copy-paste ready, and validated in CI/CD pipeline.

- Add weather_agent.py for MCP weather integration
- Add rag_doc_agent.py for document Q&A with RAG
- Add product_mockup_agent.py for HTML landing page generation
- Add integration tests for all example agents
- Add CI/CD workflow to validate examples on every PR

- Add tests for mcp_time_server_agent.py
- Add tests for mcp_windows_system_health_agent.py
- Add tests for sd_agent_example.py
- Coverage now 10/10 (100%) of example files

External contributors can submit examples without AMD copyright

- Use Qwen3-4B-GGUF model for faster inference in examples
- Update workflow to run on self-hosted Strix runner with Lemonade server
- Convert integration tests to actually execute agents and validate responses
- Add Lemonade server startup to CI/CD workflow

- Replace marketing slogan with technical value prop
- Update headline to 'Agent SDK for AMD Ryzen AI'
- Add MCP to list of key capabilities
- Add Computer Use Agents (CUA) to use cases
- More professional tone per leadership feedback

# Conflicts:
#   docs/index.mdx

- Convert test_examples workflow steps to PowerShell for Windows CI
- Make Docker check non-blocking in MCP server tests
- Use python -m pytest instead of bare pytest
- Fix MCPClientMixin import path in weather_agent example
- Improve file watcher test with retry loop to reduce flakiness
- Clean up test formatting (black/isort compliance)

The previous test_examples.yml failed on the stx runner because it ran the Linux-only ./.github/actions/free-disk-space action (requires bash/df) and used `shell: pwsh`, neither of which exist on the self-hosted Windows runner.

The workflow is now split:

* test-examples-unit runs on ubuntu-latest, validates syntax for every example, and runs the structure/import tests from tests/integration/test_example_agents.py. Tests decorated with @requires_lemonade auto-skip here.
* test-examples-integration runs on stx using ./.github/actions/setup-venv and ./.github/actions/install-lemonade, starts Lemonade via the shared start-lemonade.ps1 helper with Qwen3-4B-Instruct-2507-GGUF, and executes the full pytest suite including the LLM-backed tests.

Example agent modernization for current SDK:

* rag_doc_agent.py no longer pulls in the ChatAgent-specific RAGToolsMixin (whose tools require many ChatAgent-only attributes). It now registers a single `query_documents` tool bound to RAGSDK.query() and allow-lists the index directory so the SDK's path validator lets it read local files.
* weather_agent.py follows the idiomatic Agent+MCPClientMixin pattern used by the builder template: _mcp_manager is wired up before super().__init__() and MCP tool registration happens in _register_tools, so a fresh client manager is available when Agent.__init__ composes the system prompt.
* product_mockup_agent.py is unchanged apart from an automated black reformat of a long line.

All structure/import tests pass locally (7 passed, 3 skipped when Lemonade is not running) and black/isort report clean.

…leanup

The stx integration run failed because TestNotesAgent instantiated NotesAgent with the default model_id, which resolves to Qwen3.5-35B-A3B-GGUF, a model we intentionally do not pull on the runner. Lemonade returned HTTP 422 ("model_name=... not registered") and the test then tripped a Windows PermissionError (WinError 32) when pytest tried to delete the tempdir while the SQLite connection was still open.

Changes:

* Introduce TEST_MODEL_ID (env-override via GAIA_TEST_MODEL, default Qwen3-4B-Instruct-2507-GGUF) and thread it through every LLM-backed test: NotesAgent, ProductMockupAgent, FileWatcherAgent. This matches the model our workflow pulls via start-lemonade.ps1.
* Wrap the NotesAgent and FileWatcherAgent assertions in try/finally so close_db() / stop_all_watchers() runs before TemporaryDirectory tries to remove the directory, preventing the Windows file-lock error.
* Switch weather_agent.py to the free open-meteo-mcp server (no API key, vs the PyPI mcp-server-weather package which requires --api_key).
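The two test-side changes above (an environment-overridable model ID and cleanup in try/finally before the temporary directory is removed) can be sketched with the standard library alone. This is an illustration, not the PR's test code: `run_notes_roundtrip` and the `notes` table are hypothetical, and a plain `sqlite3` connection stands in for NotesAgent's database and its `close_db()` call.

```python
import os
import sqlite3
import tempfile
from pathlib import Path

# Env override with a default, mirroring the TEST_MODEL_ID idea from the commit.
TEST_MODEL_ID = os.environ.get("GAIA_TEST_MODEL", "Qwen3-4B-Instruct-2507-GGUF")


def run_notes_roundtrip():
    """Hypothetical test body: use a temp SQLite DB and close it before cleanup."""
    with tempfile.TemporaryDirectory() as tmp:
        conn = sqlite3.connect(Path(tmp) / "notes.db")
        try:
            conn.execute("CREATE TABLE notes (body TEXT)")
            conn.execute("INSERT INTO notes VALUES (?)", ("hello",))
            conn.commit()
            rows = conn.execute("SELECT body FROM notes").fetchall()
            assert rows == [("hello",)]
            return rows
        finally:
            # Runs even if an assertion above fails, so the file handle is
            # released before TemporaryDirectory deletes the directory.
            conn.close()
```

On Windows an open SQLite handle keeps the file locked, so the order matters: the `finally` block closes the connection first, and only then does the `with` block exit and remove the directory.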

ProductMockupAgent and DocAgent passed model_id as an explicit kwarg to super().__init__(**kwargs), which crashed with a duplicate-keyword TypeError when callers (like the integration tests) also passed model_id=... themselves. Switch both to kwargs.setdefault("model_id", ...) so callers can override without colliding. Also updated weather_agent.py's connection-failure hint to reference open-meteo-mcp instead of the stale mcp-server-weather package name.
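A minimal sketch of the collision and the `kwargs.setdefault` fix described above. The `Agent` class here is a hypothetical stand-in with a single `model_id` parameter, not the real GAIA base class, and `BrokenAgent` / `FixedAgent` / `try_broken` are illustrative names.

```python
class Agent:
    """Hypothetical stand-in for the SDK base class; not GAIA's real Agent."""

    def __init__(self, model_id="default-model", **kwargs):
        self.model_id = model_id


class BrokenAgent(Agent):
    def __init__(self, **kwargs):
        # Explicit keyword plus **kwargs: collides when the caller
        # also passes model_id.
        super().__init__(model_id="Qwen3-4B-Instruct-2507-GGUF", **kwargs)


class FixedAgent(Agent):
    def __init__(self, **kwargs):
        # setdefault only fills the value in when the caller did not supply one.
        kwargs.setdefault("model_id", "Qwen3-4B-Instruct-2507-GGUF")
        super().__init__(**kwargs)


def try_broken(model_id):
    """Return the TypeError message raised by BrokenAgent, or None."""
    try:
        BrokenAgent(model_id=model_id)
    except TypeError as exc:
        return str(exc)
    return None
```

With this shape, `FixedAgent()` picks up the example's default while `FixedAgent(model_id="other")` lets the caller's value win, which is what the integration tests rely on.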

Users running rag_doc_agent.py without `[rag]` extras installed hit an obscure ImportError from RAGSDK ("Missing required RAG dependencies: pypdf, sentence-transformers, faiss-cpu"). Add the install hint to the docstring so the fix is discoverable.

The `_split_text_with_llm` chunking helper and `_extract_text_from_json` both re-imported `json` locally even though it is already imported at module scope (line 12). This triggered pylint W0404 (`Reimport`) which is what caused the 'Run Code Quality Checks' workflow to fail with a non-zero exit code on every PR.

Per Tomasz's review on PR #340, the original wording over-promised that GAIA has "no cloud dependency" at all. The accurate statement is that the core runtime needs no cloud (so sensitive data stays on-device), but individual agents can still opt into external services when a use case requires it. The weather_agent.py example in this same PR is exactly such a case.

Changes:

* Hero copy now qualifies the cloud-free claim and explicitly lists opt-in services (weather APIs, Jira, MCP servers).
* Rename the "No Cloud Dependency" card to "Cloud-Optional" with matching clarification.

Tomasz's comment thread on docs/index.mdx: #340 (comment)

Two related bugs surfaced during end-to-end testing of all 3 new example agents with a local Lemonade server:

1. The examples defaulted to `Qwen3-4B-GGUF`, which is the base (non-instruct) model. With it, the LLM hallucinates tool usage instead of actually invoking the registered tools: DocAgent confidently answered "60% Product A / 40% Product B" for a document that explicitly said "70% product, 20% engineering, 10% marketing". The instruct-tuned variant `Qwen3-4B-Instruct-2507-GGUF` (already used by the CI workflow and test_rag.yml) follows tool-use instructions correctly and returns the grounded answer.
2. DocAgent passed `model_id` to Agent but the inner RAGSDK was constructed with an empty RAGConfig, silently falling back to the framework default `Qwen3.5-35B-A3B-GGUF` for its answer-synthesis call. On a runner that does not have the 35B model pulled, this raises HTTP 422 "not registered with Lemonade Server". Plumb the agent's resolved `model_id` into RAGConfig.model so both code paths hit the same loaded model.

Verified end-to-end against local Lemonade + Qwen3-4B-Instruct-2507-GGUF:

* ProductMockupAgent → generated testapp.html with Tailwind + all 3 features
* WeatherAgent → connected to open-meteo-mcp, answered real Tokyo weather
* DocAgent → returned "70% to product, 20% to engineering, 10% to marketing"
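The second bug comes down to two components each resolving their own default model. The sketch below shows the plumbing with hypothetical stand-ins: `RAGConfig`, `RAGSDK`, and `DocAgent` here are simplified illustrations that borrow only the names and the `model` field from the commit message, not the real GAIA classes or their signatures.

```python
from dataclasses import dataclass

# Framework default named in the commit message.
FRAMEWORK_DEFAULT_MODEL = "Qwen3.5-35B-A3B-GGUF"


@dataclass
class RAGConfig:
    """Hypothetical stand-in; only the `model` field comes from the commit text."""

    model: str = FRAMEWORK_DEFAULT_MODEL


class RAGSDK:
    """Hypothetical stand-in that just remembers its config."""

    def __init__(self, config=None):
        # An empty config silently falls back to the framework default.
        self.config = config or RAGConfig()


class DocAgent:
    """Hypothetical agent showing the model_id plumbing only."""

    def __init__(self, **kwargs):
        kwargs.setdefault("model_id", "Qwen3-4B-Instruct-2507-GGUF")
        self.model_id = kwargs["model_id"]
        # Pass the resolved model through instead of an empty RAGConfig(),
        # so the agent loop and answer synthesis use the same model.
        self.rag = RAGSDK(RAGConfig(model=self.model_id))
```

Constructing `RAGSDK()` with no config reproduces the silent fallback, while `DocAgent(model_id=...)` keeps both code paths on whatever model the caller selected.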

@kovtcharov-amd

# GAIA v0.17.3 Release Notes

GAIA v0.17.3 is an extensibility and resilience release. You can now package your own agents into a custom GAIA installer and seed them on first launch, point GAIA at alternative OpenAI-compatible inference servers from the C++ library (Ollama, for example), and start from three new reference agents (weather, RAG Q&A, HTML mockup) that execute against real Lemonade hardware in CI. It also hardens the RAG cache against an insecure-deserialization class of bug (CWE-502); all users should upgrade.

**Why upgrade:**

- **Ship your own GAIA**: Export and import agents between machines, follow a new guide to produce a custom installer that seeds your agents on first launch, and on Windows install everything in one step because the installer now includes the Lemonade Server MSI.
- **Work with alternative inference backends**: The C++ library now preserves OpenAI-compatible `/v1` base URLs instead of rewriting them to `/api/v1`, so servers that expose the standard `/v1` path (Ollama, for example) work out of the box.
- **Start from a working example**: Three new reference agents (weather via MCP, RAG document Q&A, HTML landing-page generator) with integration tests that actually execute against Lemonade on a Strix CI runner.
- **Safer RAG cache**: Replaces `pickle` deserialization with JSON + HMAC-SHA256 (CWE-502). Unsigned or tampered caches are rejected and transparently rebuilt on the next query.
- **Better document handling**: Encrypted or corrupted PDFs now produce distinct, actionable errors (`EncryptedPDFError`, `CorruptedPDFError`) instead of generic failures, and the RAG index is hardened for concurrent queries.

---

## What's New

### Custom Installers and Agent Portability

You can now package a custom GAIA installer that ships with your own agents pre-loaded, and move agents between machines with export/import (PR #795). On Windows, the official installer now includes the Lemonade Server MSI and runs it during install, so a fresh machine has the complete local-LLM stack after a single download (PR #781).

**What you can do:**

- Export an agent from `~/.gaia/agents/` to a portable bundle with `gaia agents export` and import it on another machine with `gaia agents import`
- Follow the new custom-installer playbook at [`docs/playbooks/custom-installer/index.mdx`](/playbooks/custom-installer) to distribute GAIA with your agents pre-loaded (useful for workshops, team deployments, and internal tooling)
- On Windows, the installer now includes Lemonade Server: no separate download for a complete first-run experience

**Under the hood:**

- `gaia agents export` / `gaia agents import` CLI commands round-trip agents between machines as portable bundles
- First-launch agent seeder (`src/gaia/apps/webui/services/agent-seeder.cjs`) copies `<resourcesPath>/agents/<id>/` into `~/.gaia/agents/<id>/` the first time the app starts
- Windows NSIS installer embeds `lemonade-server-minimal.msi` into `$PLUGINSDIR` and runs it via `msiexec /i ... /qn /norestart` during install (auto-cleaned on exit)

---

### Broader Backend Compatibility in the C++ Library

The C++ library now preserves OpenAI-compatible `/v1` base URLs (PR #773) instead of rewriting them to `/api/v1`. That means inference servers that expose the standard OpenAI `/v1` path (for example, Ollama at `http://localhost:11434/v1`) work out of the box without needing a special adapter.

---

### Reference Agents and Real-Hardware Integration Tests

Three new example agents and a Strix-runner CI workflow land together (PR #340).

**What you can do:**

- Copy `examples/weather_agent.py`, `examples/rag_doc_agent.py`, or `examples/product_mockup_agent.py` as a starting point for your own agents
- Run the new integration tests locally against Lemonade to validate agents end-to-end, not just structurally

**Under the hood:**

- `tests/integration/test_example_agents.py` executes agents and validates responses with a 5-minute-per-test timeout
- `.github/workflows/test_examples.yml` runs on the self-hosted Strix runner (`stx` label) with Lemonade serving `Qwen3-4B-Instruct-2507-GGUF`
- Docs homepage refreshed with a technical value prop ("Agent SDK for AMD Ryzen AI") and MCP / CUA added to the capabilities list

---

### Smarter PDF Handling in RAG

Encrypted and corrupted PDFs now surface as distinct, actionable errors (`EncryptedPDFError`, `CorruptedPDFError`, `EmptyPDFError`) instead of generic failures or silent 0-chunk indexes (PR #784, closes #451). Encrypted PDFs are detected before extraction; corrupted PDFs are caught during extraction with a clear message. Combined with the indexing-failure surfacing in PR #723, you get a visible indexing-failed status the moment a document fails, and the RAG index itself is now thread-safe under concurrent queries (PR #746).

---

## Security

### RAG Cache Deserialization Replaced with JSON + HMAC

Fixes an insecure-deserialization issue in the RAG cache (CWE-502, PR #768). Previously, cached document indexes were serialized with Python `pickle`; if an attacker could write to `~/.gaia/` (via a shared drive, a sync conflict, or a malicious extension), loading that cache could execute arbitrary code. v0.17.3 replaces `pickle` with signed JSON: caches are now serialized as JSON and authenticated with HMAC-SHA256 using a per-install key stored at `~/.gaia/cache/hmac.key`. Unsigned or tampered caches are rejected and transparently rebuilt on the next query. Old `.pkl` caches from previous GAIA versions are ignored and re-indexed the next time you query a document.

**You should upgrade if you** share `~/.gaia/` across machines (Dropbox, iCloud, network home directories), run GAIA in a multi-user environment, or have ever imported RAG caches from another source.

---

## Bug Fixes

- **Ask Agent attaches files before sending to chat** (PR #725): Dropped files are indexed into RAG and attached to the active session before the prompt is consumed, so the model sees the document on the first turn instead of the second.
- **Document indexing failures are surfaced** (PR #723): A document that produces 0 chunks now raises `RuntimeError` in the SDK and surfaces as `indexing_status: failed` in the UI, instead of looking like a silent success. Covers RAG SDK, background indexing, and re-index paths.
- **Encrypted or corrupted PDFs produce actionable errors** (PR #784, closes #451): RAG now raises distinct `EncryptedPDFError` and `CorruptedPDFError` exceptions instead of generic failures, so you see exactly what went wrong.
- **RAG index thread safety hardened** (PR #746): Adds `RLock` protection around index mutation paths and rebuilds chunk/index state atomically before publishing it, so concurrent queries read consistent snapshots and failed rebuilds no longer leak partial state.
- **MCP JSON-RPC handler guards against non-dict bodies** (PR #803): A malformed JSON-RPC payload (array, string, null) now returns HTTP 400 `Invalid Request: expected JSON object` instead of an HTTP 500 from a `TypeError`.
- **File-search count aligned with accessible results** (PR #754): The returned count now matches the number of files the tool actually surfaces, instead of a pre-filter total that over-reported results the caller could not access.
- **Tracked block cursor replaces misplaced decorative cursor** (PR #727): Fixes the mis-positioned blinking cursor in the chat input box, which now tracks the actual caret position via a mirror-div technique.
- **Ad-hoc sign the macOS app bundle instead of skipping code signing** (PR #765): The `.app` bundle inside the DMG now carries an ad-hoc signature, so Gatekeeper presents a single "Open Anyway" bypass in System Settings instead of the unrecoverable "is damaged" error. Full Apple Developer ID signing is still being finalized.

---

## Release & CI

- **Publish workflow: single approval gate, no legacy Electron apps** (PR #758): Removed the legacy jira and example standalone Electron apps from the publish pipeline; a single `publish` environment gate governs PyPI, npm, and installer publishing.
- **Claude CI modernization** (PR #797, PR #799, PR #783): Migrated all four `claude-code-action` call sites to `v1.0.99` (pinned by SHA, fixes an issue-handler hang), bumped `--max-turns` from 20 to 50 on both `pr-review` and `pr-comment` for deeper analysis, upgraded to Opus 4.7, standardized 23 subagent definitions with explicit when-to-use sections and tool allowlists, and added agent-builder tooling (manifest schema, `lint.py --agents`, BuilderAgent mixins).

---

## Docs

- **Roadmap overhaul** (PR #710): Milestone-aligned plans with voice-first as P0 and 9 new plan documents for upcoming initiatives.
- **Plan: email triage agent** (PR #796): Specification for an upcoming email triage agent.
- **Docs/source drift resolved** (PR #794): Fixed broken SDK examples across 15 docs, rewrote 5 spec files against the current source (including two that documented entire APIs that don't exist in code), added 20+ missing CLI flags to the CLI reference, and removed 2 already-shipped plan documents (installer, mcp-client).
- **FAQ: data-privacy answer clarified for external LLM providers** (PR #798): Sharper guidance on what leaves your machine when you point GAIA at Claude or OpenAI.

---

## Full Changelog

**21 commits** since v0.17.2:

- `6d3f3f71` - fix: replace misplaced decorative cursor with tracked terminal block cursor (#727)
- `874cf2a3` - fix: Ask Agent indexes and attaches files before sending to chat (#725)
- `4fa121e2` - fix: surface document indexing failures instead of silent 0-chunk success (#723)
- `34b1d06e` - fix(ci): ad-hoc sign macOS DMG instead of skipping code signing (#765)
- `7188b83c` - Roadmap overhaul: milestone-aligned plans with voice-first P0 and 9 new plan documents (#710)
- `1beddac5` - cpp: support Ollama-compatible /v1 endpoints (#773)
- `cf9ac995` - fix: harden rag index thread safety (#746)
- `1c55c31b` - fix(ci): remove legacy electron apps from publish, single approval gate (#758)
- `52946a7a` - feat(installer): bundle Lemonade Server MSI into Windows installer (#774) (#781)
- `e96b3686` - ci(claude): review infra + conventions + subagent overhaul + agent-builder tooling (#783)
- `058674b5` - fix(rag): detect encrypted and corrupted PDFs with actionable errors (#451) (#784)
- `7bcb5d51` - fix: replace insecure pickle deserialization with JSON + HMAC in RAG cache (CWE-502) (#768)
- `a5167e5f` - fix: keep file-search count aligned with accessible results (#754)
- `da5ba458` - ci(claude): migrate to claude-code-action v1.0.99 + fix issue-handler hang (#797)
- `03f546b9` - ci(claude): bump pr-review and pr-comment --max-turns 20 -> 50 (#799)
- `4119d564` - docs(faq): clarify data-privacy answer re: external LLM providers (#798)
- `0cfbcf41` - Add example agents and integration test workflow (#340)
- `c4bd15fb` - docs: fix drift between docs and source (docs review pass 1 + 2) (#794)
- `407ed5b8` - docs(plans): add email triage agent spec (#796)
- `06fb04a4` - fix(mcp): guard JSON-RPC handler against non-dict body (#803)
- `880ad603` - feat(installer): custom installer guide, agent export/import, first-launch seeder (#795)

Full Changelog: [v0.17.2...v0.17.3](v0.17.2...v0.17.3)

---

## Release checklist

- [x] `util/validate_release_notes.py docs/releases/v0.17.3.mdx --tag v0.17.3` passes
- [x] `src/gaia/version.py` → `0.17.3`
- [x] `src/gaia/apps/webui/package.json` → `0.17.3`
- [x] Navbar label in `docs/docs.json` → `v0.17.3 · Lemonade 10.0.0`
- [x] All 21 PRs in the range (v0.17.2..HEAD) are represented in the notes
- [ ] Review from @kovtcharov-amd addressed
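The Security section of the release notes describes replacing `pickle` with JSON authenticated by HMAC-SHA256, where unsigned or tampered caches are rejected and rebuilt. The general technique can be sketched as follows. This is not GAIA's implementation: the envelope layout (`body` / `hmac` fields) and the `sign_cache` / `load_cache` names are assumptions made for illustration, and the key is generated in memory rather than read from `~/.gaia/cache/hmac.key`.

```python
import hashlib
import hmac
import json
import secrets


def sign_cache(payload: dict, key: bytes) -> str:
    """Serialize payload as JSON and wrap it with an HMAC-SHA256 tag."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    tag = hmac.new(key, body.encode("utf-8"), hashlib.sha256).hexdigest()
    # Hypothetical envelope layout: the signed JSON text plus its hex tag.
    return json.dumps({"body": body, "hmac": tag})


def load_cache(blob: str, key: bytes):
    """Return the payload if the tag verifies, else None (caller rebuilds)."""
    try:
        envelope = json.loads(blob)
        body, tag = envelope["body"], envelope["hmac"]
    except (ValueError, KeyError, TypeError):
        return None  # not JSON, or not the expected envelope shape
    if not isinstance(body, str) or not isinstance(tag, str):
        return None
    expected = hmac.new(key, body.encode("utf-8"), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(expected.encode("utf-8"), tag.encode("utf-8")):
        return None
    return json.loads(body)


# Usage: a per-install key would normally be generated once and stored on disk.
key = secrets.token_bytes(32)
blob = sign_cache({"chunks": ["a", "b"]}, key)
```

Unlike `pickle.load`, parsing JSON cannot run code embedded in the file, and the tag check means a cache written by anyone without the key is treated the same as a missing cache.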

kovtcharov requested a review from kovtcharov-amd as a code owner February 12, 2026 01:33

github-actions bot added the devops (DevOps/infrastructure changes) and tests (Test changes) labels Feb 12, 2026

Claude Code added 4 commits February 11, 2026 17:34

Add tests for remaining example agents

739a32f


Remove copyright header validation from examples workflow

391e1be


Update docs homepage with professional messaging

4f5aaa4


github-actions bot added the documentation (Documentation changes) label Feb 12, 2026

itomek approved these changes Feb 12, 2026


Comment thread docs/index.mdx Outdated

Comment thread docs/index.mdx Outdated

kovtcharov-amd and others added 2 commits February 19, 2026 10:18

Merge branch 'main' into kalin/examples

ac9c083

Merge branch 'main' into kalin/examples

ae23be8

kovtcharov self-assigned this Feb 27, 2026

kovtcharov-amd approved these changes Mar 2, 2026


Claude Code added 3 commits March 5, 2026 14:07

Merge branch 'main' into kalin/examples

93221c8

Merge remote-tracking branch 'origin/main' into kalin/examples

a060086


kovtcharov added this to the Futures milestone Mar 16, 2026

kovtcharov added 6 commits April 17, 2026 10:34

Merge remote-tracking branch 'origin/main' into kalin/examples

5727fc2

github-actions bot added the rag (RAG system changes) and performance (Performance-critical changes) labels Apr 17, 2026

kovtcharov added 2 commits April 17, 2026 11:27

kovtcharov modified the milestones: vFutures, v0.17.3 — RAG bug fixes and security hardening [OSS] Apr 17, 2026

kovtcharov added this pull request to the merge queue Apr 17, 2026

Merged via the queue into main with commit 0cfbcf4 Apr 17, 2026
37 checks passed

kovtcharov deleted the kalin/examples branch April 17, 2026 21:59

itomek mentioned this pull request Apr 20, 2026

Release v0.17.3 #831

Merged

6 tasks

kovtcharov merged 18 commits into main from kalin/examples


3 participants
