Pull requests: abetlen/llama-cpp-python

Add chat template for gemma models
#2183 opened Apr 13, 2026 by C00kieFact0ry
perf: vectorize KV cache prefix matching with numpy (see first sketch below)
#2179 opened Apr 11, 2026 by nausicaalii
build: disable soname to reduce binary size
#2177 opened Apr 9, 2026 by Bing-su
fix(ci): avoid duplicate py3-none release builds
#2172 opened Apr 3, 2026 by abetlen
feat: add reasoning_effort to chat completions API (see second sketch below)
#2167 opened Mar 30, 2026 by abetlen
ci: refactor cpu wheel build workflow
#2164 opened Mar 26, 2026 by Bing-su
fix: correct typo 'seperated' to 'separated'
#2120 opened Feb 7, 2026 by thecaptain789
feat: support Granite-Docling model
#2109 opened Jan 4, 2026 by dhdaines
Update to llama.cpp 2026-01-01
#2108 opened Jan 1, 2026 by avion23
Fixed issue #1938
#2085 opened Nov 5, 2025 by TNing
Include x64 directory for CUDA DLLs on Windows
#2083 opened Oct 24, 2025 by ajparsons
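
For context on #2179 above: a generic sketch of numpy-vectorized token-prefix matching, the technique named in that PR's title. This is an illustration only, not the PR's actual diff; the function name, inputs, and example values are made up here.

    import numpy as np

    def common_prefix_length(cached: np.ndarray, prompt: np.ndarray) -> int:
        """Length of the shared token-ID prefix between a cached sequence
        and a new prompt, computed without a Python-level loop."""
        n = min(cached.size, prompt.size)
        # Positions inside the overlap where the two sequences disagree.
        mismatches = np.flatnonzero(cached[:n] != prompt[:n])
        return int(mismatches[0]) if mismatches.size else n

    # Example: the first three tokens match, so only the tail is new.
    cached = np.array([1, 42, 7, 99, 100])
    prompt = np.array([1, 42, 7, 55])
    print(common_prefix_length(cached, prompt))  # 3

Reusing the matched prefix is what lets a KV cache skip recomputing attention state for the tokens a new prompt shares with a cached sequence.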
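
For context on #2167 above: a minimal client-side sketch of the proposed reasoning_effort parameter, assuming it mirrors the OpenAI-style "low"/"medium"/"high" values. The parameter is only the PR's proposal, not a released llama-cpp-python API, and the model path is a placeholder.

    from llama_cpp import Llama

    llm = Llama(model_path="./models/model.gguf")  # placeholder path

    # reasoning_effort is the parameter proposed in #2167; its name and
    # accepted values ("low" / "medium" / "high") are assumptions here.
    response = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Summarize KV cache reuse."}],
        reasoning_effort="low",
    )
    print(response["choices"][0]["message"]["content"])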