fix: support llama.cpp as direct backend without Lemonade by stampby · Pull Request #684 · amd/gaia

stampby · 2026-04-01T04:37:48Z

Summary

LemonadeClient: Don't append /api/v1 when base_url already has a path (e.g. http://localhost:8081/v1 stays as-is). Bare URLs without a path still get /api/v1 appended for Lemonade compatibility.
LemonadeClient: Skip _ensure_model_loaded for non-Lemonade backends, detected by absence of all_models_loaded/model_loaded fields in /health response.
System status: Fallback to /v1/models for llama.cpp-style model detection when /health doesn't report model_loaded. Checks both {"models": [...]} (llama.cpp) and {"data": [...]} (OpenAI) formats.

Problem

When using gaia chat --ui --base-url http://localhost:8081/v1 with llama.cpp directly (no Lemonade), three issues caused 404 errors:

base_url normalization unconditionally appended /api/v1, turning http://localhost:8081/v1 into http://localhost:8081/v1/api/v1 — all API calls hit wrong paths
_ensure_model_loaded called Lemonade-specific load_model endpoint which doesn't exist on llama.cpp
System status couldn't detect loaded models because llama.cpp's /health returns {"status":"ok"} without model_loaded field

Test plan

gaia chat --ui --base-url http://localhost:8081/v1 --no-lemonade-check works with llama.cpp
gaia chat --ui with default Lemonade backend still works (no regression)
Bare --base-url http://localhost:8080 still gets /api/v1 appended
System status correctly shows model_loaded from llama.cpp /v1/models
Existing unit tests pass

🤖 Generated with Claude Code

- LemonadeClient: don't append /api/v1 when base_url already has a path (e.g. http://localhost:8081/v1 stays as-is, bare URLs still get /api/v1) - LemonadeClient: skip _ensure_model_loaded for non-Lemonade backends (detected by absence of all_models_loaded/model_loaded in /health) - System status: fallback to /v1/models for llama.cpp-style model detection when /health doesn't report model_loaded Fixes 404 errors when using Gaia Agent UI with llama.cpp directly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

kovtcharov · 2026-04-01T05:03:38Z

@claude please review this PR and make sure it's backwards compatible.

## Summary - Fix `fatal: a branch named 'main' already exists` error when reviewing fork PRs where the contributor's head branch is `main` (their default branch) - Fix corrupted diff base — after redirecting origin to the fork, `origin/$BASE_BRANCH` gets overwritten, so save the base SHA before any URL changes - Use fixed branch name `pr-head` with `-B` (force-create) instead of `-b $HEAD_REF` to avoid name collisions ## Root Cause When a fork PR uses `main` as its head branch ([PR #684](#684)): 1. `actions/checkout` creates local `main` branch from the base repo 2. `git checkout -b main origin/main` fails because `main` already exists 3. Even if that succeeded, `git fetch origin main` (from fork) would overwrite `origin/main`, making the diff compare fork-vs-fork instead of base-vs-fork Failed run: https://github.com/amd/gaia/actions/runs/23859641383/job/69562473226 ## Test plan - [ ] Trigger Claude review on [PR #684](#684) (fork PR with `main` as head branch) — should succeed - [ ] Trigger Claude review on a non-fork PR — should still work as before - [ ] Verify the generated diff in logs shows correct base-vs-fork comparison Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

kovtcharov · 2026-04-15T09:11:26Z

@claude what changes are needed in this PR to get all tests in CI to pass?

github-actions · 2026-04-15T09:12:14Z

Claude finished @kovtcharov's task —— View job

CI Test Failure Analysis

Summary

The PR introduces two changes to lemonade_client.py:

_ensure_model_loaded now calls health_check() first to detect non-Lemonade backends
URL path handling - only appends /api/v1 to bare URLs without a path

Required Test Fixes

The existing unit tests in tests/unit/test_lemonade_model_loading.py will fail because they mock get_status and load_model but don't mock health_check(). The new code calls health_check() before get_status().

Tests that need updating:

Test	File	Issue
`test_calls_load_when_model_not_loaded`	`test_lemonade_model_loading.py:15`	Missing `health_check` mock
`test_skips_load_when_model_already_loaded`	`test_lemonade_model_loading.py:33`	Missing `health_check` mock
`test_handles_status_check_error_gracefully`	`test_lemonade_model_loading.py:65`	Missing `health_check` mock
`test_ensure_model_loaded_passes_prompt_false`	`test_lemonade_model_loading.py:175`	Missing `health_check` mock
`test_model_loaded_when_not_present`	`test_lemonade_model_loading.py:201`	Missing `health_check` mock
`test_model_not_loaded_when_already_present`	`test_lemonade_model_loading.py:245`	Missing `health_check` mock

Fix Required:

Add @patch.object(LemonadeClient, "health_check") to each test and configure the mock to return a Lemonade-style response:

@patch.object(LemonadeClient, "health_check")
@patch.object(LemonadeClient, "get_status")
@patch.object(LemonadeClient, "load_model")
def test_calls_load_when_model_not_loaded(self, mock_load, mock_status, mock_health):
    # Mock health_check to return Lemonade-style response (so it continues to get_status)
    mock_health.return_value = {"status": "ok", "model_loaded": "some-model"}
    # ... rest of test unchanged

Additional Test Needed:

Add a new test to verify the non-Lemonade backend detection works:

@patch.object(LemonadeClient, "health_check")
@patch.object(LemonadeClient, "get_status")
@patch.object(LemonadeClient, "load_model")
def test_skips_load_for_non_lemonade_backend(self, mock_load, mock_status, mock_health):
    """Verify _ensure_model_loaded skips load for non-Lemonade backends (llama.cpp)."""
    client = LemonadeClient(host="localhost", port=8000)
    # Return llama.cpp-style health response (no model_loaded field)
    mock_health.return_value = {"status": "ok"}
    
    client._ensure_model_loaded("model-a", auto_download=True)
    
    # Should NOT call get_status or load_model for non-Lemonade backends
    mock_status.assert_not_called()
    mock_load.assert_not_called()

Backwards Compatibility ✅

The changes are backwards compatible:

URLs with paths (e.g., http://localhost:8081/v1) are preserved as-is
Bare URLs (e.g., http://localhost:8080) still get /api/v1 appended
Lemonade backends continue to work normally (health check returns model_loaded field)

@kovtcharov

Tests were missing health_check() mock after _ensure_model_loaded was updated to detect non-Lemonade backends via /health response. Fixes: - Add @patch.object(LemonadeClient, "health_check") to 6 existing tests - Add mock_health parameter and Lemonade-style return values - Add TestNonLemonadeBackend class with llama.cpp detection test Per CI analysis by @kovtcharov Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

This test returns early before health_check is called (auto_download=False), so the extra mock parameter caused a positional argument mismatch. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

itomek

Thanks for the PR, @stampby! One ask before merge: a parametrized test for the __init__ URL normalization. Something like:

  @pytest.mark.parametrize("base_url,expected", [
    ("http://localhost:8000", "http://localhost:8000/api/v1"), # bare → appended
    ("http://localhost:8081/v1", "http://localhost:8081/v1"), # llama.cpp — the fix
    ("http://localhost:8000/api/v1", "http://localhost:8000/api/v1"), # already normalized
    ("http://proxy.example.com/lemonade", ???), # see question below
  ])

  def test_base_url_normalization(base_url, expected):

  assert LemonadeClient(base_url=base_url).base_url == expected

stampby requested a review from kovtcharov-amd as a code owner April 1, 2026 04:37

github-actions bot added llm LLM backend changes performance Performance-critical changes labels Apr 1, 2026

kovtcharov mentioned this pull request Apr 1, 2026

fix: Claude workflow fails on fork PRs when head branch is 'main' #685

Merged

3 tasks

github-actions bot added the tests Test changes label Apr 15, 2026

fix: remove health_check patch from auto_download_disabled test

e401b5a

This test returns early before health_check is called (auto_download=False), so the extra mock parameter caused a positional argument mismatch. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

itomek requested changes Apr 15, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: support llama.cpp as direct backend without Lemonade#684

fix: support llama.cpp as direct backend without Lemonade#684
stampby wants to merge 3 commits intoamd:mainfrom
stampby:main

stampby commented Apr 1, 2026

Uh oh!

kovtcharov commented Apr 1, 2026 •

edited

Loading

Uh oh!

kovtcharov commented Apr 15, 2026

Uh oh!

github-actions bot commented Apr 15, 2026 •

edited

Loading

Uh oh!

itomek left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

stampby commented Apr 1, 2026

Summary

Problem

Test plan

Uh oh!

kovtcharov commented Apr 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kovtcharov commented Apr 15, 2026

Uh oh!

github-actions bot commented Apr 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

CI Test Failure Analysis

Summary

Required Test Fixes

Tests that need updating:

Fix Required:

Additional Test Needed:

Backwards Compatibility ✅

Uh oh!

itomek left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

kovtcharov commented Apr 1, 2026 •

edited

Loading

github-actions bot commented Apr 15, 2026 •

edited

Loading