hypertide/LlamaFIM

LlamaFIM – Local In‑Line Completion for VS Code

Overview

LlamaFIM is a VS Code extension that provides real‑time inline fill‑in‑the‑middle (FIM) completions powered by a local Llama.CPP server. It is aimed at developers who want an on‑premises LLM assistant without sending any code to the cloud.

The extension implements the inline completion provider API introduced in VS Code 1.78 and forwards user input to a running Llama.CPP endpoint. The server returns an infill response containing the next chunk of text, which the extension then displays as an inline suggestion.

Features

  • Modern inline completion experience (no separate suggestion list).
  • Lightweight client – all heavy lifting occurs on the local Llama.CPP server.
  • Configurable debounce delay, timeout, and server URL.
  • Automatic request cancellation when the cursor moves.
  • Built‑in request timeout to avoid hanging requests.
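The debounce‑and‑cancel behaviour above can be sketched in a few lines of TypeScript. This is an illustrative pattern, not the extension's actual implementation; the names `debounce` and `scheduleCompletion` are invented for the example:

```typescript
// Minimal debounce sketch: a new keystroke cancels the pending timer,
// so only the last call within the delay window actually fires.
function debounce<T extends unknown[]>(
  fn: (...args: T) => void,
  delayMs: number
): (...args: T) => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: T) => {
    if (timer !== undefined) clearTimeout(timer); // cancel the pending call
    timer = setTimeout(() => fn(...args), delayMs);
  };
}

// Usage: rapid keystrokes collapse into a single completion request.
const scheduleCompletion = debounce((text: string) => {
  console.log(`requesting completion for: ${text}`);
}, 250);
```

The same idea extends to in‑flight HTTP requests, where an `AbortController` cancels the request itself rather than just the timer.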

Installation

  1. Install the extension from the VS Code Marketplace, or clone the repository and open the folder with code . to run it from source.
  2. Ensure a Llama.CPP server is running and reachable. By default the extension expects the endpoint at http://localhost:8080. The server can be started with:
    ./llama.cpp/llama-server -m <model.gguf> --port 8080
  3. Reload VS Code or run Reload Window.
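Once the server is reachable, the extension POSTs to its /infill endpoint. A hedged sketch of how such a request body might be assembled (the input_prefix/input_suffix field names follow the llama.cpp server's infill API; n_predict and the helper name are assumptions for illustration):

```typescript
// Sketch of an /infill request body. Field names follow the llama.cpp
// server API; exact parameters may vary between server versions.
interface InfillRequest {
  input_prefix: string; // text before the cursor
  input_suffix: string; // text after the cursor
  n_predict: number;    // maximum number of tokens to generate
}

function buildInfillRequest(
  prefix: string,
  suffix: string,
  nPredict = 128
): InfillRequest {
  return { input_prefix: prefix, input_suffix: suffix, n_predict: nPredict };
}

// The request would then be sent along the lines of:
// await fetch("http://localhost:8080/infill", {
//   method: "POST",
//   headers: { "Content-Type": "application/json" },
//   body: JSON.stringify(buildInfillRequest(before, after)),
// });
```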

Configuration

The extension exposes a handful of workspace settings under the namespace llamafim. Open settings.json and add any of the following options:

Setting        Type     Default                 Description
enabled        boolean  true                    Enable or disable the extension.
debouncedelay  number   250                     Milliseconds to debounce inline completion requests.
url            string   http://localhost:8080   Base URL of the Llama.CPP server (without /infill).
timeout        number   3500                    Request timeout in milliseconds.
contextsize    number   4096                    Maximum context size (in tokens) sent to the Llama.CPP server.

Example:

{
    "llamafim.enabled": true,
    "llamafim.debouncedelay": 200,
    "llamafim.url": "http://127.0.0.1:8080",
    "llamafim.timeout": 5000,
    "llamafim.contextsize": 4096
}
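src/config.ts reads and normalises these settings. A hypothetical sketch of what that normalisation could look like (the function name, clamping bounds, and trailing‑slash handling are assumptions, not the extension's actual code; the defaults match the table above):

```typescript
// Hypothetical settings normalisation with defaults from the README table.
// Clamping bounds are illustrative, not taken from the extension.
interface LlamaFimConfig {
  enabled: boolean;
  debouncedelay: number;
  url: string;
  timeout: number;
  contextsize: number;
}

function normaliseConfig(raw: Partial<LlamaFimConfig>): LlamaFimConfig {
  const clamp = (v: number, lo: number, hi: number) =>
    Math.min(hi, Math.max(lo, v));
  return {
    enabled: raw.enabled ?? true,
    debouncedelay: clamp(raw.debouncedelay ?? 250, 0, 5000),
    // Strip trailing slashes so "/infill" can be appended safely.
    url: (raw.url ?? "http://localhost:8080").replace(/\/+$/, ""),
    timeout: clamp(raw.timeout ?? 3500, 100, 60000),
    contextsize: clamp(raw.contextsize ?? 4096, 256, 32768),
  };
}
```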

Usage

Once configured, simply type in any file. After you pause for the debounce delay, the extension will send the surrounding context to the server and display the returned text as an inline suggestion. The suggestion can be accepted by pressing Tab or rejected by continuing to type.
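Conceptually, the "surrounding context" is the text before and after the cursor, trimmed to the configured context size. A rough sketch of that split (character‑based trimming is an assumption here; the extension may trim by tokens instead):

```typescript
// Split document text into prefix/suffix around a cursor offset, keeping
// at most `maxChars` characters of context split evenly on each side.
// Character-based trimming is illustrative; token-based trimming would be
// more faithful to what the model actually consumes.
function splitContext(
  text: string,
  cursorOffset: number,
  maxChars: number
): { prefix: string; suffix: string } {
  const half = Math.floor(maxChars / 2);
  const prefix = text.slice(Math.max(0, cursorOffset - half), cursorOffset);
  const suffix = text.slice(cursorOffset, cursorOffset + half);
  return { prefix, suffix };
}
```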

The provider is registered for all languages ({ pattern: '**' }).

Status bar interaction

When the extension is active a status bar item appears on the right. Clicking it toggles the provider’s enabled state – the next inline suggestion will be shown or suppressed accordingly. This toggle is runtime only; the setting llamafim.enabled only sets the initial value when VS Code starts.

Development

A quick start guide for contributing:

# Install dependencies
npm install

# Compile TypeScript
npm run compile

# Run tests (if available)
npm test

# Start a watch build while you edit
npm run watch

The project uses ESBuild for bundling and TSLint/ESLint for linting.

Files of Interest

  • src/extension.ts – Entry point, registers the provider.
  • src/provider.ts – Implements request logic and cancellation.
  • src/config.ts – Reads and normalises VS Code settings.
  • src/defs.ts – Type definitions for the Llama.CPP response.

Testing

The test suite lives under test/. It uses Mocha and Chai. To run the tests:

npm test

The current tests cover configuration parsing and inline completion logic with mocked fetch responses.

Contributing

Pull requests are welcome! Please:

  1. Fork the repository.
  2. Create a feature branch.
  3. Run the test suite and ensure all tests pass.
  4. Submit a pull request.

Before submitting, run the linter:

npm run lint

License

This project is licensed under the MIT License. See the LICENSE file for details.

Tip – If you experience performance issues, consider lowering the server's n_predict or increasing debouncedelay in the settings.
