Skip to content

fix: support llama.cpp as direct backend without Lemonade#684

Open
stampby wants to merge 3 commits intoamd:mainfrom
stampby:main
Open

fix: support llama.cpp as direct backend without Lemonade#684
stampby wants to merge 3 commits intoamd:mainfrom
stampby:main

Conversation

@stampby
Copy link
Copy Markdown

@stampby stampby commented Apr 1, 2026

Summary

  • LemonadeClient: Don't append /api/v1 when base_url already has a path (e.g. http://localhost:8081/v1 stays as-is). Bare URLs without a path still get /api/v1 appended for Lemonade compatibility.
  • LemonadeClient: Skip _ensure_model_loaded for non-Lemonade backends, detected by absence of all_models_loaded/model_loaded fields in /health response.
  • System status: Fallback to /v1/models for llama.cpp-style model detection when /health doesn't report model_loaded. Checks both {"models": [...]} (llama.cpp) and {"data": [...]} (OpenAI) formats.

Problem

When using gaia chat --ui --base-url http://localhost:8081/v1 with llama.cpp directly (no Lemonade), three issues caused 404 errors:

  1. base_url normalization unconditionally appended /api/v1, turning http://localhost:8081/v1 into http://localhost:8081/v1/api/v1 — all API calls hit wrong paths
  2. _ensure_model_loaded called Lemonade-specific load_model endpoint which doesn't exist on llama.cpp
  3. System status couldn't detect loaded models because llama.cpp's /health returns {"status":"ok"} without model_loaded field

Test plan

  • gaia chat --ui --base-url http://localhost:8081/v1 --no-lemonade-check works with llama.cpp
  • gaia chat --ui with default Lemonade backend still works (no regression)
  • Bare --base-url http://localhost:8080 still gets /api/v1 appended
  • System status correctly shows model_loaded from llama.cpp /v1/models
  • Existing unit tests pass

🤖 Generated with Claude Code

- LemonadeClient: don't append /api/v1 when base_url already has a path
  (e.g. http://localhost:8081/v1 stays as-is, bare URLs still get /api/v1)
- LemonadeClient: skip _ensure_model_loaded for non-Lemonade backends
  (detected by absence of all_models_loaded/model_loaded in /health)
- System status: fallback to /v1/models for llama.cpp-style model detection
  when /health doesn't report model_loaded

Fixes 404 errors when using Gaia Agent UI with llama.cpp directly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@stampby stampby requested a review from kovtcharov-amd as a code owner April 1, 2026 04:37
@github-actions github-actions bot added llm LLM backend changes performance Performance-critical changes labels Apr 1, 2026
@kovtcharov
Copy link
Copy Markdown
Collaborator

kovtcharov commented Apr 1, 2026

@claude please review this PR and make sure it's backwards compatible.

github-merge-queue bot pushed a commit that referenced this pull request Apr 1, 2026
## Summary

- Fix `fatal: a branch named 'main' already exists` error when reviewing
fork PRs where the contributor's head branch is `main` (their default
branch)
- Fix corrupted diff base — after redirecting origin to the fork,
`origin/$BASE_BRANCH` gets overwritten, so save the base SHA before any
URL changes
- Use fixed branch name `pr-head` with `-B` (force-create) instead of
`-b $HEAD_REF` to avoid name collisions

## Root Cause

When a fork PR uses `main` as its head branch ([PR
#684](#684)):
1. `actions/checkout` creates local `main` branch from the base repo
2. `git checkout -b main origin/main` fails because `main` already
exists
3. Even if that succeeded, `git fetch origin main` (from fork) would
overwrite `origin/main`, making the diff compare fork-vs-fork instead of
base-vs-fork

Failed run:
https://github.com/amd/gaia/actions/runs/23859641383/job/69562473226

## Test plan

- [ ] Trigger Claude review on [PR
#684](#684) (fork PR with `main` as head
branch) — should succeed
- [ ] Trigger Claude review on a non-fork PR — should still work as
before
- [ ] Verify the generated diff in logs shows correct base-vs-fork
comparison

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@kovtcharov
Copy link
Copy Markdown
Collaborator

@claude what changes are needed in this PR to get all tests in CI to pass?

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 15, 2026

Claude finished @kovtcharov's task —— View job


CI Test Failure Analysis

  • Read PR diff to understand changes
  • Check PR branch code
  • Run lint checks on PR code
  • Analyze unit tests
  • Identify failing tests and root causes
  • Provide specific fixes needed

Summary

The PR introduces two changes to lemonade_client.py:

  1. _ensure_model_loaded now calls health_check() first to detect non-Lemonade backends
  2. URL path handling - only appends /api/v1 to bare URLs without a path

Required Test Fixes

The existing unit tests in tests/unit/test_lemonade_model_loading.py will fail because they mock get_status and load_model but don't mock health_check(). The new code calls health_check() before get_status().

Tests that need updating:

Test File Issue
test_calls_load_when_model_not_loaded test_lemonade_model_loading.py:15 Missing health_check mock
test_skips_load_when_model_already_loaded test_lemonade_model_loading.py:33 Missing health_check mock
test_handles_status_check_error_gracefully test_lemonade_model_loading.py:65 Missing health_check mock
test_ensure_model_loaded_passes_prompt_false test_lemonade_model_loading.py:175 Missing health_check mock
test_model_loaded_when_not_present test_lemonade_model_loading.py:201 Missing health_check mock
test_model_not_loaded_when_already_present test_lemonade_model_loading.py:245 Missing health_check mock

Fix Required:

Add @patch.object(LemonadeClient, "health_check") to each test and configure the mock to return a Lemonade-style response:

@patch.object(LemonadeClient, "health_check")
@patch.object(LemonadeClient, "get_status")
@patch.object(LemonadeClient, "load_model")
def test_calls_load_when_model_not_loaded(self, mock_load, mock_status, mock_health):
    # Mock health_check to return Lemonade-style response (so it continues to get_status)
    mock_health.return_value = {"status": "ok", "model_loaded": "some-model"}
    # ... rest of test unchanged

Additional Test Needed:

Add a new test to verify the non-Lemonade backend detection works:

@patch.object(LemonadeClient, "health_check")
@patch.object(LemonadeClient, "get_status")
@patch.object(LemonadeClient, "load_model")
def test_skips_load_for_non_lemonade_backend(self, mock_load, mock_status, mock_health):
    """Verify _ensure_model_loaded skips load for non-Lemonade backends (llama.cpp)."""
    client = LemonadeClient(host="localhost", port=8000)
    # Return llama.cpp-style health response (no model_loaded field)
    mock_health.return_value = {"status": "ok"}
    
    client._ensure_model_loaded("model-a", auto_download=True)
    
    # Should NOT call get_status or load_model for non-Lemonade backends
    mock_status.assert_not_called()
    mock_load.assert_not_called()

Backwards Compatibility ✅

The changes are backwards compatible:

  • URLs with paths (e.g., http://localhost:8081/v1) are preserved as-is
  • Bare URLs (e.g., http://localhost:8080) still get /api/v1 appended
  • Lemonade backends continue to work normally (health check returns model_loaded field)

Tests were missing health_check() mock after _ensure_model_loaded
was updated to detect non-Lemonade backends via /health response.

Fixes:
- Add @patch.object(LemonadeClient, "health_check") to 6 existing tests
- Add mock_health parameter and Lemonade-style return values
- Add TestNonLemonadeBackend class with llama.cpp detection test

Per CI analysis by @kovtcharov

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions bot added the tests Test changes label Apr 15, 2026
This test returns early before health_check is called (auto_download=False),
so the extra mock parameter caused a positional argument mismatch.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Collaborator

@itomek itomek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR, @stampby! One ask before merge: a parametrized test for the __init__ URL normalization. Something like:

  @pytest.mark.parametrize("base_url,expected", [
    ("http://localhost:8000", "http://localhost:8000/api/v1"), # bare → appended
    ("http://localhost:8081/v1", "http://localhost:8081/v1"), # llama.cpp — the fix
    ("http://localhost:8000/api/v1", "http://localhost:8000/api/v1"), # already normalized
    ("http://proxy.example.com/lemonade", ???), # see question below
  ])

  def test_base_url_normalization(base_url, expected):

  assert LemonadeClient(base_url=base_url).base_url == expected

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

llm LLM backend changes performance Performance-critical changes tests Test changes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants