Eval bug: Qwen 3.6 27B forcing full prompt re-processing due to lack of cache data #22746

@tarunkapadia-2266

Name and Version

PS D:\llm\llama-b8977-bin-win-hip-radeon-x64> .\llama-cli --version
HIP Library Path: C:\WINDOWS\SYSTEM32\amdhip64_7.dll
ggml_cuda_init: found 1 ROCm devices (Total VRAM: 16368 MiB):
Device 0: AMD Radeon RX 7800 XT, gfx1101 (0x1101), VMM: no, Wave Size: 32, VRAM: 16368 MiB
load_backend: loaded ROCm backend from D:\llm\llama-b8977-bin-win-hip-radeon-x64\ggml-hip.dll
load_backend: loaded RPC backend from D:\llm\llama-b8977-bin-win-hip-radeon-x64\ggml-rpc.dll
load_backend: loaded CPU backend from D:\llm\llama-b8977-bin-win-hip-radeon-x64\ggml-cpu-zen4.dll
version: 8977 (b1d5f5b)
built with Clang 19.1.5 for Windows x86_64

Operating systems

Windows

GGML backends

HIP

Hardware

PC Spec:
Ryzen 7 7700X
32 GB DDR5-5200
Radeon RX 7800 XT 16 GB
2 TB NVMe SSD

Models

Qwen 3.6 27B @ IQ4_XS / Q4_K_M / UD-IQ3_XXS / Q3_K_M

Problem description & steps to reproduce

This is my llama-server startup script. I am using OpenCode as the harness.

#!/bin/bash

QWEN_3_6_27_Q3_K_M='models\\Qwen3.6-27B-Q3_K_M.gguf'
# Bash variable names cannot contain a hyphen, so this name uses an underscore
QWEN_3_6_27_UD_IQ3_XXS='models\\Qwen3.6-27B-UD-IQ3_XXS.gguf'
QWEN_3_6_27_Q4_K_M='models\\Qwen3.6-27B-Q4_K_M.gguf'
QWEN_3_6_27_IQ4_XS='models\\Qwen3.6-27B-IQ4_XS.gguf'

# --- CORE PATHS ---
LLAMA_BIN="llama-b8977-bin-win-hip-radeon-x64/llama-server.exe"
MODELS_PATH="models/"
SLOT_PATH="models/cache"
PROMPT_CACHE='models/cache/main_cache.bin'

# --- CORE SETTINGS ---
CTX_SIZE=131072 #65536
GPU_LAYERS=99
THREADS=16
K_TYPE="q8_0"
V_TYPE="q8_0"
PORT=8081
ADDR="0.0.0.0"

# --- AGENTIC TUNING ---
TEMP="0.6"
MIN_P="0.0"
TOP_P="0.95"
TOP_K="20"
PRESENCE_PENALTY="0.0"
REPEAT_PENALTY="1.0"

# Optimized for Branching/Multi-Agent (OpenCode)
BATCH_SIZE=512
UBATCH_SIZE=512
CACHE_RAM_SIZE=18432 # 18GB RAM for snapshots
CTX_CHECKPOINTS=64 # Sufficient for 18GB
CHECKPOINT_INTERVAL=8192 # Smaller interval = Faster switching between agents

# --- CONSTRUCT COMMAND ---

CMD="$LLAMA_BIN \
  --model \"$QWEN_3_6_27_Q3_K_M\" \
  --models-max 1 \
  --ctx-size $CTX_SIZE \
  --n-gpu-layers $GPU_LAYERS \
  --threads $THREADS \
  --flash-attn on \
  --cache-type-k $K_TYPE \
  --cache-type-v $V_TYPE \
  --batch-size $BATCH_SIZE \
  --ubatch-size $UBATCH_SIZE \
  --cache-ram $CACHE_RAM_SIZE \
  --checkpoint-every-n-tokens $CHECKPOINT_INTERVAL \
  --cache-reuse 1 \
  --ctx-checkpoints $CTX_CHECKPOINTS \
  --log-colors on \
  --slot-prompt-similarity 0.1 \
  --metrics \
  --reasoning on \
  --parallel 2 \
  --cache-prompt \
  --no-prefill-assistant \
  --cont-batching \
  --port $PORT \
  --host $ADDR \
  --jinja \
  --no-mmap \
  --embeddings \
  --webui \
  --chat-template-kwargs '{\"enable_thinking\": true, \"preserve_thinking\": true}'"

# --- ADD SAMPLING ---
CMD="$CMD --temp $TEMP --min-p $MIN_P --top-p $TOP_P --top-k $TOP_K"
CMD="$CMD --presence-penalty $PRESENCE_PENALTY --repeat-penalty $REPEAT_PENALTY"

echo "-------------------------------------------------------"
echo "Launching Llama.cpp Server"
echo "-------------------------------------------------------"
eval $CMD
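
For reference when reading the logs: they report `n_ctx_slot = 65536`, which is consistent with `--ctx-size 131072` being split evenly across the two `--parallel` slots. The following is a minimal sketch of that arithmetic. The even split is an assumption inferred from the logged values, not checked against the llama.cpp source, and `fits_in_slot` is a hypothetical helper used only for illustration.

```shell
#!/bin/sh
# Assumption: the total context is divided evenly across the parallel slots.
CTX_SIZE=131072   # --ctx-size from the script above
N_PARALLEL=2      # --parallel from the script above

n_ctx_slot=$(( CTX_SIZE / N_PARALLEL ))
echo "n_ctx_slot = $n_ctx_slot"

# Hypothetical helper: succeeds if a prompt of $1 tokens fits in one slot.
fits_in_slot() {
  [ "$1" -le "$n_ctx_slot" ]
}

# Prompt sizes (task.n_tokens) taken from the last two requests in the log.
for n_tokens in 86082 53564; do
  if fits_in_slot "$n_tokens"; then
    echo "$n_tokens tokens: fits in one slot"
  else
    echo "$n_tokens tokens: exceeds the per-slot context"
  fi
done
```

Under this assumption the 86082-token request is rejected even though the total context is 131072, while the 53564-token request fits.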

First Bad Commit

No response

Relevant log output

srv  update_slots: all slots are idle
srv  params_from_: Chat format: peg-native
slot get_availabl: id  1 | task -1 | selected slot by LCP similarity, sim_best = 0.965 (> 0.100 thold), f_keep = 0.967
slot launch_slot_: id  1 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> ?top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id  1 | task 93797 | processing task, is_child = 0
slot update_slots: id  1 | task 93797 | new prompt, n_ctx_slot = 65536, n_keep = 0, task.n_tokens = 9287
slot update_slots: id  1 | task 93797 | n_past = 8962, slot.prompt.tokens.size() = 9267, seq_id = 1, pos_min = 9266, n_swa = 0
slot update_slots: id  1 | task 93797 | Checking checkpoint with [8886, 8886] against 8962...
slot update_slots: id  1 | task 93797 | restored context checkpoint (pos_min = 8886, pos_max = 8886, n_tokens = 8887, n_past = 8887, size = 149.626 MiB)
slot update_slots: id  1 | task 93797 | n_tokens = 8887, memory_seq_rm [8887, end)
slot update_slots: id  1 | task 93797 | prompt processing progress, n_tokens = 9283, batch.n_tokens = 396, progress = 0.999569
slot update_slots: id  1 | task 93797 | n_tokens = 9283, memory_seq_rm [9283, end)
reasoning-budget: activated, budget=2147483647 tokens
reasoning-budget: deactivated (natural end)
reasoning-budget: re-activated on new start tag, budget=2147483647 tokens
reasoning-budget: deactivated (natural end)
reasoning-budget: re-activated on new start tag, budget=2147483647 tokens
reasoning-budget: deactivated (natural end)
reasoning-budget: re-activated on new start tag, budget=2147483647 tokens
reasoning-budget: deactivated (natural end)
reasoning-budget: re-activated on new start tag, budget=2147483647 tokens
reasoning-budget: deactivated (natural end)
reasoning-budget: re-activated on new start tag, budget=2147483647 tokens
reasoning-budget: deactivated (natural end)
reasoning-budget: re-activated on new start tag, budget=2147483647 tokens
reasoning-budget: deactivated (natural end)
reasoning-budget: re-activated on new start tag, budget=2147483647 tokens
slot init_sampler: id  1 | task 93797 | init sampler, took 0.99 ms, tokens: text = 9287, total = 9287
slot update_slots: id  1 | task 93797 | prompt processing done, n_tokens = 9287, batch.n_tokens = 4
slot create_check: id  1 | task 93797 | created context checkpoint 8 of 64 (pos_min = 9282, pos_max = 9282, n_tokens = 9283, size = 149.626 MiB)
srv  log_server_r: done request: POST /v1/chat/completions 127.0.0.1 200
reasoning-budget: deactivated (natural end)
slot print_timing: id  1 | task 93797 |
prompt eval time =    2374.07 ms /   400 tokens (    5.94 ms per token,   168.49 tokens per second)
       eval time =   10536.28 ms /   111 tokens (   94.92 ms per token,    10.54 tokens per second)
      total time =   12910.35 ms /   511 tokens
slot      release: id  1 | task 93797 | stop processing: n_tokens = 9397, truncated = 0
srv  update_slots: all slots are idle
srv  params_from_: Chat format: peg-native
slot get_availabl: id  1 | task -1 | selected slot by LCP similarity, sim_best = 0.928 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id  1 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> ?top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id  1 | task 93910 | processing task, is_child = 0
slot update_slots: id  1 | task 93910 | new prompt, n_ctx_slot = 65536, n_keep = 0, task.n_tokens = 10121
slot update_slots: id  1 | task 93910 | n_tokens = 9397, memory_seq_rm [9397, end)
slot update_slots: id  1 | task 93910 | prompt processing progress, n_tokens = 9609, batch.n_tokens = 212, progress = 0.949412
slot update_slots: id  1 | task 93910 | n_tokens = 9609, memory_seq_rm [9609, end)
slot update_slots: id  1 | task 93910 | prompt processing progress, n_tokens = 10117, batch.n_tokens = 508, progress = 0.999605
slot create_check: id  1 | task 93910 | created context checkpoint 9 of 64 (pos_min = 9608, pos_max = 9608, n_tokens = 9609, size = 149.626 MiB)
slot update_slots: id  1 | task 93910 | n_tokens = 10117, memory_seq_rm [10117, end)
reasoning-budget: activated, budget=2147483647 tokens
reasoning-budget: deactivated (natural end)
reasoning-budget: re-activated on new start tag, budget=2147483647 tokens
reasoning-budget: deactivated (natural end)
reasoning-budget: re-activated on new start tag, budget=2147483647 tokens
reasoning-budget: deactivated (natural end)
reasoning-budget: re-activated on new start tag, budget=2147483647 tokens
reasoning-budget: deactivated (natural end)
reasoning-budget: re-activated on new start tag, budget=2147483647 tokens
reasoning-budget: deactivated (natural end)
reasoning-budget: re-activated on new start tag, budget=2147483647 tokens
reasoning-budget: deactivated (natural end)
reasoning-budget: re-activated on new start tag, budget=2147483647 tokens
reasoning-budget: deactivated (natural end)
reasoning-budget: re-activated on new start tag, budget=2147483647 tokens
reasoning-budget: deactivated (natural end)
reasoning-budget: re-activated on new start tag, budget=2147483647 tokens
slot init_sampler: id  1 | task 93910 | init sampler, took 1.08 ms, tokens: text = 10121, total = 10121
slot update_slots: id  1 | task 93910 | prompt processing done, n_tokens = 10121, batch.n_tokens = 4
slot create_check: id  1 | task 93910 | created context checkpoint 10 of 64 (pos_min = 10116, pos_max = 10116, n_tokens = 10117, size = 149.626 MiB)
srv  log_server_r: done request: POST /v1/chat/completions 127.0.0.1 200
reasoning-budget: deactivated (natural end)
slot print_timing: id  1 | task 93910 |
prompt eval time =    4096.00 ms /   724 tokens (    5.66 ms per token,   176.76 tokens per second)
       eval time =   24213.29 ms /   250 tokens (   96.85 ms per token,    10.32 tokens per second)
      total time =   28309.29 ms /   974 tokens
slot      release: id  1 | task 93910 | stop processing: n_tokens = 10370, truncated = 0
srv  update_slots: all slots are idle
srv  params_from_: Chat format: peg-native
slot get_availabl: id  1 | task -1 | selected slot by LCP similarity, sim_best = 0.902 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id  1 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> ?top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id  1 | task 94163 | processing task, is_child = 0
slot update_slots: id  1 | task 94163 | new prompt, n_ctx_slot = 65536, n_keep = 0, task.n_tokens = 11502
slot update_slots: id  1 | task 94163 | n_tokens = 10370, memory_seq_rm [10370, end)
slot update_slots: id  1 | task 94163 | prompt processing progress, n_tokens = 10882, batch.n_tokens = 512, progress = 0.946096
slot update_slots: id  1 | task 94163 | n_tokens = 10882, memory_seq_rm [10882, end)
slot update_slots: id  1 | task 94163 | prompt processing progress, n_tokens = 10990, batch.n_tokens = 108, progress = 0.955486
slot update_slots: id  1 | task 94163 | n_tokens = 10990, memory_seq_rm [10990, end)
slot update_slots: id  1 | task 94163 | prompt processing progress, n_tokens = 11498, batch.n_tokens = 508, progress = 0.999652
slot create_check: id  1 | task 94163 | created context checkpoint 11 of 64 (pos_min = 10989, pos_max = 10989, n_tokens = 10990, size = 149.626 MiB)
slot update_slots: id  1 | task 94163 | n_tokens = 11498, memory_seq_rm [11498, end)
reasoning-budget: activated, budget=2147483647 tokens
reasoning-budget: deactivated (natural end)
reasoning-budget: re-activated on new start tag, budget=2147483647 tokens
reasoning-budget: deactivated (natural end)
reasoning-budget: re-activated on new start tag, budget=2147483647 tokens
reasoning-budget: deactivated (natural end)
reasoning-budget: re-activated on new start tag, budget=2147483647 tokens
reasoning-budget: deactivated (natural end)
reasoning-budget: re-activated on new start tag, budget=2147483647 tokens
reasoning-budget: deactivated (natural end)
reasoning-budget: re-activated on new start tag, budget=2147483647 tokens
reasoning-budget: deactivated (natural end)
reasoning-budget: re-activated on new start tag, budget=2147483647 tokens
reasoning-budget: deactivated (natural end)
reasoning-budget: re-activated on new start tag, budget=2147483647 tokens
reasoning-budget: deactivated (natural end)
reasoning-budget: re-activated on new start tag, budget=2147483647 tokens
reasoning-budget: deactivated (natural end)
reasoning-budget: re-activated on new start tag, budget=2147483647 tokens
slot init_sampler: id  1 | task 94163 | init sampler, took 1.22 ms, tokens: text = 11502, total = 11502
slot update_slots: id  1 | task 94163 | prompt processing done, n_tokens = 11502, batch.n_tokens = 4
slot create_check: id  1 | task 94163 | created context checkpoint 12 of 64 (pos_min = 11497, pos_max = 11497, n_tokens = 11498, size = 149.626 MiB)
srv  log_server_r: done request: POST /v1/chat/completions 127.0.0.1 200
reasoning-budget: deactivated (natural end)
slot print_timing: id  1 | task 94163 |
prompt eval time =    6487.16 ms /  1132 tokens (    5.73 ms per token,   174.50 tokens per second)
       eval time =   17400.56 ms /   177 tokens (   98.31 ms per token,    10.17 tokens per second)
      total time =   23887.72 ms /  1309 tokens
slot      release: id  1 | task 94163 | stop processing: n_tokens = 11678, truncated = 0
srv  update_slots: all slots are idle
srv  params_from_: Chat format: peg-native
slot get_availabl: id  1 | task -1 | selected slot by LCP similarity, sim_best = 0.998 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id  1 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> ?top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id  1 | task 94344 | processing task, is_child = 0
slot update_slots: id  1 | task 94344 | new prompt, n_ctx_slot = 65536, n_keep = 0, task.n_tokens = 11697
slot update_slots: id  1 | task 94344 | n_tokens = 11678, memory_seq_rm [11678, end)
slot update_slots: id  1 | task 94344 | prompt processing progress, n_tokens = 11693, batch.n_tokens = 15, progress = 0.999658
slot create_check: id  1 | task 94344 | created context checkpoint 13 of 64 (pos_min = 11677, pos_max = 11677, n_tokens = 11678, size = 149.626 MiB)
slot update_slots: id  1 | task 94344 | n_tokens = 11693, memory_seq_rm [11693, end)
reasoning-budget: activated, budget=2147483647 tokens
reasoning-budget: deactivated (natural end)
reasoning-budget: re-activated on new start tag, budget=2147483647 tokens
reasoning-budget: deactivated (natural end)
reasoning-budget: re-activated on new start tag, budget=2147483647 tokens
reasoning-budget: deactivated (natural end)
reasoning-budget: re-activated on new start tag, budget=2147483647 tokens
reasoning-budget: deactivated (natural end)
reasoning-budget: re-activated on new start tag, budget=2147483647 tokens
reasoning-budget: deactivated (natural end)
reasoning-budget: re-activated on new start tag, budget=2147483647 tokens
reasoning-budget: deactivated (natural end)
reasoning-budget: re-activated on new start tag, budget=2147483647 tokens
reasoning-budget: deactivated (natural end)
reasoning-budget: re-activated on new start tag, budget=2147483647 tokens
reasoning-budget: deactivated (natural end)
reasoning-budget: re-activated on new start tag, budget=2147483647 tokens
reasoning-budget: deactivated (natural end)
reasoning-budget: re-activated on new start tag, budget=2147483647 tokens
reasoning-budget: deactivated (natural end)
reasoning-budget: re-activated on new start tag, budget=2147483647 tokens
slot init_sampler: id  1 | task 94344 | init sampler, took 1.24 ms, tokens: text = 11697, total = 11697
slot update_slots: id  1 | task 94344 | prompt processing done, n_tokens = 11697, batch.n_tokens = 4
srv  log_server_r: done request: POST /v1/chat/completions 127.0.0.1 200
reasoning-budget: deactivated (natural end)
slot print_timing: id  1 | task 94344 |
prompt eval time =     400.78 ms /    19 tokens (   21.09 ms per token,    47.41 tokens per second)
       eval time =    8571.52 ms /    87 tokens (   98.52 ms per token,    10.15 tokens per second)
      total time =    8972.30 ms /   106 tokens
slot      release: id  1 | task 94344 | stop processing: n_tokens = 11783, truncated = 0
srv  update_slots: all slots are idle
srv  params_from_: Chat format: peg-native
slot get_availabl: id  1 | task -1 | selected slot by LCP similarity, sim_best = 0.986 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id  1 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> ?top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id  1 | task 94433 | processing task, is_child = 0
slot update_slots: id  1 | task 94433 | new prompt, n_ctx_slot = 65536, n_keep = 0, task.n_tokens = 11956
slot update_slots: id  1 | task 94433 | n_tokens = 11783, memory_seq_rm [11783, end)
slot update_slots: id  1 | task 94433 | prompt processing progress, n_tokens = 11952, batch.n_tokens = 169, progress = 0.999665
slot create_check: id  1 | task 94433 | created context checkpoint 14 of 64 (pos_min = 11782, pos_max = 11782, n_tokens = 11783, size = 149.626 MiB)
slot update_slots: id  1 | task 94433 | n_tokens = 11952, memory_seq_rm [11952, end)
reasoning-budget: activated, budget=2147483647 tokens
reasoning-budget: deactivated (natural end)
reasoning-budget: re-activated on new start tag, budget=2147483647 tokens
reasoning-budget: deactivated (natural end)
reasoning-budget: re-activated on new start tag, budget=2147483647 tokens
reasoning-budget: deactivated (natural end)
reasoning-budget: re-activated on new start tag, budget=2147483647 tokens
reasoning-budget: deactivated (natural end)
reasoning-budget: re-activated on new start tag, budget=2147483647 tokens
reasoning-budget: deactivated (natural end)
reasoning-budget: re-activated on new start tag, budget=2147483647 tokens
reasoning-budget: deactivated (natural end)
reasoning-budget: re-activated on new start tag, budget=2147483647 tokens
reasoning-budget: deactivated (natural end)
reasoning-budget: re-activated on new start tag, budget=2147483647 tokens
reasoning-budget: deactivated (natural end)
reasoning-budget: re-activated on new start tag, budget=2147483647 tokens
reasoning-budget: deactivated (natural end)
reasoning-budget: re-activated on new start tag, budget=2147483647 tokens
reasoning-budget: deactivated (natural end)
reasoning-budget: re-activated on new start tag, budget=2147483647 tokens
reasoning-budget: deactivated (natural end)
reasoning-budget: re-activated on new start tag, budget=2147483647 tokens
slot init_sampler: id  1 | task 94433 | init sampler, took 1.15 ms, tokens: text = 11956, total = 11956
slot update_slots: id  1 | task 94433 | prompt processing done, n_tokens = 11956, batch.n_tokens = 4
slot create_check: id  1 | task 94433 | created context checkpoint 15 of 64 (pos_min = 11951, pos_max = 11951, n_tokens = 11952, size = 149.626 MiB)
srv  log_server_r: done request: POST /v1/chat/completions 127.0.0.1 200
reasoning-budget: deactivated (natural end)
slot print_timing: id  1 | task 94433 |
prompt eval time =    1289.55 ms /   173 tokens (    7.45 ms per token,   134.16 tokens per second)
       eval time =   17754.35 ms /   179 tokens (   99.19 ms per token,    10.08 tokens per second)
      total time =   19043.89 ms /   352 tokens
slot      release: id  1 | task 94433 | stop processing: n_tokens = 12134, truncated = 0
srv  update_slots: all slots are idle
srv  params_from_: Chat format: peg-native
slot get_availabl: id  0 | task -1 | selected slot by LRU, t_last = 65026105309
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> ?top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id  0 | task 94614 | processing task, is_child = 0
slot update_slots: id  0 | task 94614 | new prompt, n_ctx_slot = 65536, n_keep = 0, task.n_tokens = 86082
srv    send_error: task id = 94614, error: request (86082 tokens) exceeds the available context size (65536 tokens), try increasing it
slot      release: id  0 | task 94614 | stop processing: n_tokens = 12947, truncated = 0
srv          stop: cancel task, id_task = 94614
srv  update_slots: no tokens to decode
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/chat/completions 127.0.0.1 400
srv  params_from_: Chat format: peg-native
slot get_availabl: id  1 | task -1 | selected slot by LRU, t_last = 65147626039
slot launch_slot_: id  1 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> ?top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id  1 | task 94617 | processing task, is_child = 0
slot update_slots: id  1 | task 94617 | new prompt, n_ctx_slot = 65536, n_keep = 0, task.n_tokens = 53564
slot update_slots: id  1 | task 94617 | n_past = 3, slot.prompt.tokens.size() = 12134, seq_id = 1, pos_min = 12133, n_swa = 0
slot update_slots: id  1 | task 94617 | Checking checkpoint with [11951, 11951] against 3...
slot update_slots: id  1 | task 94617 | Checking checkpoint with [11782, 11782] against 3...
slot update_slots: id  1 | task 94617 | Checking checkpoint with [11677, 11677] against 3...
slot update_slots: id  1 | task 94617 | Checking checkpoint with [11497, 11497] against 3...
slot update_slots: id  1 | task 94617 | Checking checkpoint with [10989, 10989] against 3...
slot update_slots: id  1 | task 94617 | Checking checkpoint with [10116, 10116] against 3...
slot update_slots: id  1 | task 94617 | Checking checkpoint with [9608, 9608] against 3...
slot update_slots: id  1 | task 94617 | Checking checkpoint with [9282, 9282] against 3...
slot update_slots: id  1 | task 94617 | Checking checkpoint with [8886, 8886] against 3...
slot update_slots: id  1 | task 94617 | Checking checkpoint with [8590, 8590] against 3...
slot update_slots: id  1 | task 94617 | Checking checkpoint with [8447, 8447] against 3...
slot update_slots: id  1 | task 94617 | Checking checkpoint with [8249, 8249] against 3...
slot update_slots: id  1 | task 94617 | Checking checkpoint with [8162, 8162] against 3...
slot update_slots: id  1 | task 94617 | Checking checkpoint with [7909, 7909] against 3...
slot update_slots: id  1 | task 94617 | Checking checkpoint with [7401, 7401] against 3...
slot update_slots: id  1 | task 94617 | forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055)
slot update_slots: id  1 | task 94617 | erased invalidated context checkpoint (pos_min = 7401, pos_max = 7401, n_tokens = 7402, n_swa = 0, pos_next = 0, size = 149.626 MiB)
slot update_slots: id  1 | task 94617 | erased invalidated context checkpoint (pos_min = 7909, pos_max = 7909, n_tokens = 7910, n_swa = 0, pos_next = 0, size = 149.626 MiB)
slot update_slots: id  1 | task 94617 | erased invalidated context checkpoint (pos_min = 8162, pos_max = 8162, n_tokens = 8163, n_swa = 0, pos_next = 0, size = 149.626 MiB)
slot update_slots: id  1 | task 94617 | erased invalidated context checkpoint (pos_min = 8249, pos_max = 8249, n_tokens = 8250, n_swa = 0, pos_next = 0, size = 149.626 MiB)
slot update_slots: id  1 | task 94617 | erased invalidated context checkpoint (pos_min = 8447, pos_max = 8447, n_tokens = 8448, n_swa = 0, pos_next = 0, size = 149.626 MiB)
slot update_slots: id  1 | task 94617 | erased invalidated context checkpoint (pos_min = 8590, pos_max = 8590, n_tokens = 8591, n_swa = 0, pos_next = 0, size = 149.626 MiB)
slot update_slots: id  1 | task 94617 | erased invalidated context checkpoint (pos_min = 8886, pos_max = 8886, n_tokens = 8887, n_swa = 0, pos_next = 0, size = 149.626 MiB)
slot update_slots: id  1 | task 94617 | erased invalidated context checkpoint (pos_min = 9282, pos_max = 9282, n_tokens = 9283, n_swa = 0, pos_next = 0, size = 149.626 MiB)
slot update_slots: id  1 | task 94617 | erased invalidated context checkpoint (pos_min = 9608, pos_max = 9608, n_tokens = 9609, n_swa = 0, pos_next = 0, size = 149.626 MiB)
slot update_slots: id  1 | task 94617 | erased invalidated context checkpoint (pos_min = 10116, pos_max = 10116, n_tokens = 10117, n_swa = 0, pos_next = 0, size = 149.626 MiB)
slot update_slots: id  1 | task 94617 | erased invalidated context checkpoint (pos_min = 10989, pos_max = 10989, n_tokens = 10990, n_swa = 0, pos_next = 0, size = 149.626 MiB)
slot update_slots: id  1 | task 94617 | erased invalidated context checkpoint (pos_min = 11497, pos_max = 11497, n_tokens = 11498, n_swa = 0, pos_next = 0, size = 149.626 MiB)
slot update_slots: id  1 | task 94617 | erased invalidated context checkpoint (pos_min = 11677, pos_max = 11677, n_tokens = 11678, n_swa = 0, pos_next = 0, size = 149.626 MiB)
slot update_slots: id  1 | task 94617 | erased invalidated context checkpoint (pos_min = 11782, pos_max = 11782, n_tokens = 11783, n_swa = 0, pos_next = 0, size = 149.626 MiB)
slot update_slots: id  1 | task 94617 | erased invalidated context checkpoint (pos_min = 11951, pos_max = 11951, n_tokens = 11952, n_swa = 0, pos_next = 0, size = 149.626 MiB)
slot update_slots: id  1 | task 94617 | n_tokens = 0, memory_seq_rm [0, end)
slot update_slots: id  1 | task 94617 | prompt processing progress, n_tokens = 512, batch.n_tokens = 512, progress = 0.009559
slot update_slots: id  1 | task 94617 | n_tokens = 512, memory_seq_rm [512, end)
slot update_slots: id  1 | task 94617 | prompt processing progress, n_tokens = 1024, batch.n_tokens = 512, progress = 0.019117
slot update_slots: id  1 | task 94617 | n_tokens = 1024, memory_seq_rm [1024, end)
slot update_slots: id  1 | task 94617 | prompt processing progress, n_tokens = 1536, batch.n_tokens = 512, progress = 0.028676
slot update_slots: id  1 | task 94617 | n_tokens = 1536, memory_seq_rm [1536, end)
slot update_slots: id  1 | task 94617 | prompt processing progress, n_tokens = 2048, batch.n_tokens = 512, progress = 0.03823
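
To make the last request easier to read: the new prompt shares only 3 tokens with what slot 1 had cached (`n_past = 3`), while the oldest of the 15 checkpoints sits at position 7401, so none of them is usable and the whole prompt is processed from position 0. The sketch below is a simplified model of that check, not the llama.cpp implementation. The helper name `pick_checkpoint` and the rule "use the highest checkpoint position below `n_past`" are assumptions, chosen because they reproduce both outcomes seen in the log.

```shell
#!/bin/sh
# Simplified model (assumption): a checkpoint is usable only if its position
# lies below n_past; among usable ones, pick the highest position.
# pick_checkpoint is a hypothetical helper, not a llama.cpp function.
pick_checkpoint() {
  n_past=$1
  shift
  best=""
  for pos in "$@"; do
    if [ "$pos" -lt "$n_past" ]; then
      if [ -z "$best" ] || [ "$pos" -gt "$best" ]; then
        best=$pos
      fi
    fi
  done
  if [ -n "$best" ]; then
    echo "$best"
  else
    echo "none"
  fi
}

# Checkpoint positions (pos_max) listed in the log for task 94617.
CHECKPOINTS="7401 7909 8162 8249 8447 8590 8886 9282 9608 10116 10989 11497 11677 11782 11951"

# Task 93797: n_past = 8962, checkpoints up to 8886 existed at that point.
pick_checkpoint 8962 7401 7909 8162 8249 8447 8590 8886   # prints 8886

# Task 94617: n_past = 3, every checkpoint is above it.
pick_checkpoint 3 $CHECKPOINTS                            # prints none
```

The first call matches the "restored context checkpoint (pos_min = 8886 ...)" line, and the second matches the "forcing full prompt re-processing" line.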
