Seekr.

Seekr is a minimal search engine built on an index of roughly 3 million websites.

Components

Seekr consists of several core subsystems working together:

Inverted Index (C++, lazy-loaded): a high-performance inverted index implemented in C++ that loads posting lists on demand from binary files.
Boolean Query Engine: evaluates AND/OR/NOT expressions over posting lists.
Phrase Query Engine: matches phrases with their terms in order.
Positional Phrase Queries: enforces exact positional constraints using positional indexes.
Top-N Scoring: ranks documents with a custom scoring pipeline and returns the N most relevant results.
Spell Correction (ML): ML-based correction and normalization of search queries.
Snippeting: extracts meaningful snippets from documents based on the matched terms.
ML-based Ranking: integrates machine-learning models for ranking refinement.
Python Core Logic: the main orchestration logic is implemented in Python.
C++ Bindings via pybind: performance-critical components are exposed to Python through pybind for fast execution.
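
To make the boolean and positional phrase components more concrete, here is a minimal Python sketch of how such queries can be evaluated over posting lists. It is an illustration under stated assumptions, not Seekr's actual implementation (which lives in C++): the toy `index` layout (term -> doc ID -> sorted positions) and all function names are hypothetical.

```python
from bisect import bisect_left

# Hypothetical toy positional index: term -> {doc_id: sorted token positions}
index = {
    "fast":   {1: [0, 7], 2: [3], 4: [1]},
    "search": {1: [1], 3: [0], 4: [5]},
    "engine": {1: [2], 4: [6]},
}

def postings(index, term):
    """Sorted doc IDs containing the term (empty if the term is unknown)."""
    return sorted(index.get(term, {}))

def intersect(a, b):
    """AND: linear merge of two sorted posting lists."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

def union(a, b):
    """OR: doc IDs present in either list, kept sorted."""
    return sorted(set(a) | set(b))

def difference(a, b):
    """AND NOT: doc IDs in a that do not appear in b."""
    excluded = set(b)
    return [d for d in a if d not in excluded]

def _contains(sorted_positions, value):
    """Binary search for value in a sorted position list."""
    i = bisect_left(sorted_positions, value)
    return i < len(sorted_positions) and sorted_positions[i] == value

def phrase_match(positions_per_term):
    """True if the k-th term occurs exactly k positions after the first one."""
    first, *rest = positions_per_term
    return any(
        all(_contains(pos, start + k) for k, pos in enumerate(rest, 1))
        for start in first
    )

def phrase_docs(index, terms):
    """Docs containing all terms (AND), filtered by the positional check."""
    candidates = postings(index, terms[0])
    for term in terms[1:]:
        candidates = intersect(candidates, postings(index, term))
    return [d for d in candidates
            if phrase_match([index[t][d] for t in terms])]
```

With the toy data above, `phrase_docs(index, ["fast", "search", "engine"])` keeps only document 1: document 4 contains all three terms, but not at consecutive positions.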

Prerequisites

uv
Node.js
npm
Docker (for containerized integration/unit tests and deployment)
CMake (for building the C++ components)
Just (command runner)

Entrypoints

Docker

just deploy

Runs the automated build process: it downloads the dataset and builds the index if the index does not exist yet. This preprocessing can take up to 2 hours.

Afterwards, it spins up a frontend and a backend container.

The search engine frontend is then available at http://localhost:8080.
The API alone is available at http://localhost:8000.

Search Endpoint

GET /search

Query parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| `q` | string | Search query (1–50 characters) |
| `limit` | int | Maximum number of results (1–500, default: 10) |
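
As an illustration of calling the endpoint, the sketch below builds a `/search` URL with the two documented parameters and checks their documented ranges on the client side. The helper names `build_search_url` and `search` are hypothetical, and the response is assumed to be JSON, which this README does not state.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "http://localhost:8000"  # API address from the Docker section

def build_search_url(q, limit=10, base=API_BASE):
    """Build a GET /search URL, enforcing the documented parameter ranges."""
    if not 1 <= len(q) <= 50:
        raise ValueError("q must be 1-50 characters long")
    if not 1 <= limit <= 500:
        raise ValueError("limit must be between 1 and 500")
    return f"{base}/search?{urlencode({'q': q, 'limit': limit})}"

def search(q, limit=10):
    """Send the query to a running backend.

    Assumption: the endpoint answers with a JSON body; the response
    schema is not documented here, so it is returned undecoded beyond JSON.
    """
    with urlopen(build_search_url(q, limit)) as resp:
        return json.load(resp)
```

For example, `build_search_url("inverted index", limit=5)` yields `http://localhost:8000/search?q=inverted+index&limit=5`; calling `search(...)` additionally requires the backend container to be running.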

Manual usage

Download the dataset:

cd src && uv run --project backend python -m backend.search_engine.scripts.download_dataset

Build the index with a memory limit:

just build-index <memory-limit> [<max-docs>]

Start both the frontend and the backend in a dev environment (http://localhost:8080):

just local [<uvicorn-args>]

Query the search engine from the CLI:

cd src && uv run --project backend python -m backend.search_engine.scripts.query --query <query> --limit <limit>



