Pollux

Multimodal orchestration for LLM APIs.

You describe what to analyze. Pollux handles source patterns, context caching, deferred delivery, and multimodal content.

Documentation · Getting Started · Building With Deferred Delivery

Quick Start

import asyncio
from pollux import Config, Source, run

result = asyncio.run(
    run(
        "What are the key findings and their implications?",
        source=Source.from_file("earnings-report.pdf"),
        config=Config(provider="gemini", model="gemini-2.5-flash-lite"),
    )
)
print(result.text)
# Revenue grew 18% YoY to $4.2B, driven by cloud services. Operating
# margins improved from 29% to 34%. Management's $2B buyback and raised
# guidance signal confidence in sustained growth.

run() returns an Output: result.text is the answer, with result.structured, result.usage, and other facets alongside it.

To use OpenAI instead: Config(provider="openai", model="gpt-5-nano").
For Anthropic: Config(provider="anthropic", model="claude-haiku-4-5").
For OpenRouter: Config(provider="openrouter", model="google/gemma-3-27b-it:free").
For a self-hosted OpenAI-compatible server (text, image, and audio): Config(provider="local", model="gemma3:4b", base_url="http://localhost:11434/v1"). Single-model servers can omit model.

For a full walkthrough (install, key setup, first result), see Getting Started.

Which Entry Point Should I Use?

| If you want to... | Use |
| --- | --- |
| Ask one prompt and get an answer now | `run()` |
| Ask many prompts against shared source(s) | `run_many()` |
| Hold a multi-turn thread or run a tool-using agent loop | `interact()` / `Session` |
| Submit non-urgent work and collect it later | `defer()` |

Pollux keeps realtime and deferred work on separate entry points. If the result can wait, submit it once, persist the handle, and collect the same ResultEnvelope later.

What Pollux Handles

Say you have a document and ten questions about it. Without orchestration, each API call re-uploads the file, and your code has to manage caching, retries, and concurrency. Pollux uploads once, caches the content when the provider supports it, fans out your prompts concurrently, and hands back results.
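
To make the "upload once, fan out" idea concrete, here is a minimal sketch of the pattern in plain asyncio. It is not Pollux's implementation: `FakeProvider`, `ask_all`, and the `limit` parameter are hypothetical names invented for illustration, and the provider calls are simulated.

```python
from __future__ import annotations

import asyncio


class FakeProvider:
    """Hypothetical stand-in for a provider client; not part of Pollux."""

    def __init__(self) -> None:
        self.uploads = 0

    async def upload(self, path: str) -> str:
        self.uploads += 1
        await asyncio.sleep(0)  # simulate network I/O
        return f"file-ref:{path}"

    async def ask(self, file_ref: str, prompt: str) -> str:
        await asyncio.sleep(0)  # simulate network I/O
        return f"answer to {prompt!r} using {file_ref}"


async def ask_all(
    provider: FakeProvider, path: str, prompts: list[str], limit: int = 4
) -> list[str]:
    file_ref = await provider.upload(path)  # one upload, shared by every prompt
    gate = asyncio.Semaphore(limit)  # cap the number of in-flight requests

    async def one(prompt: str) -> str:
        async with gate:
            return await provider.ask(file_ref, prompt)

    # gather() returns results in the same order as the prompts.
    return await asyncio.gather(*(one(p) for p in prompts))


provider = FakeProvider()
answers = asyncio.run(ask_all(provider, "paper.pdf", ["Q1", "Q2", "Q3"]))
```

Writing this by hand for every provider, plus caching and retries, is the work the paragraph above describes.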

The same Source interface handles PDFs, images, video, YouTube URLs, and arXiv papers. Your code does not need per-format upload branches. Gemini-specific video clipping and FPS controls are available via Source.with_gemini_video_settings(...); see the sending-content docs for the intended scope.

Need structured output? Pass a Pydantic model as output and get a validated instance alongside the raw text. Switching providers is a config change: swap provider="gemini" for provider="openai" and pick a model that provider serves.

One Upload, Many Prompts

Got three questions about the same paper? run_many() fans them out concurrently:

import asyncio
from pollux import Config, Source, run_many

envelope = asyncio.run(
    run_many(
        ["Summarize the methodology.", "List key findings.", "Identify limitations."],
        sources=[Source.from_file("paper.pdf")],
        config=Config(provider="gemini", model="gemini-2.5-flash-lite"),
    )
)
for answer in envelope.answers:
    print(answer)

run_many() returns an OutputCollection: answers is the per-prompt text in input order, with outputs, structured, usage, and status alongside it.

Add more sources when each prompt should see the same shared context. For per-file collection work, wrap run_many() in your own outer loop over files; that gives you one result record per file while Pollux handles each file's prompt set.
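
The outer loop over files could look something like the sketch below. To keep it runnable without a provider, the prompt fan-out is passed in as a callable; `collect_per_file`, `fake_ask_many`, and the record layout are hypothetical choices, not Pollux API.

```python
import asyncio


async def collect_per_file(paths, prompts, ask_many):
    """Run the same prompt set against each file; return one record per file."""
    records = []
    for path in paths:
        answers = await ask_many(prompts, path)
        records.append({"file": path, "answers": dict(zip(prompts, answers))})
    return records


# Hypothetical stand-in so the sketch runs offline. A real version, based on
# the run_many() example above, might be:
#
#     async def ask_many(prompts, path):
#         envelope = await run_many(
#             list(prompts), sources=[Source.from_file(path)], config=config
#         )
#         return envelope.answers
async def fake_ask_many(prompts, path):
    return [f"{path}: {p}" for p in prompts]


records = asyncio.run(
    collect_per_file(
        ["a.pdf", "b.pdf"], ["Summarize.", "List risks."], fake_ask_many
    )
)
```

Files are processed one after another here; whether to run them concurrently is a workflow decision that stays in your code.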

Multi-Turn Threads and Agent Loops

When you need conversation history or tool calls, use interact() over an Environment and Input. A Session reuses one provider across turns:

import asyncio
from pollux import Config, Environment, Input, Session

async def main():
    env = Environment(instructions="Be concise.")
    async with Session(Config(provider="anthropic", model="claude-haiku-4-5")) as session:
        first = await session.interact(env, Input("Name a primary color."))
        print(first.text)

        # Continue the same thread from the prior turn's continuation:
        second = await session.interact(
            env, Input("Now name its complement.", continuation=first.continuation)
        )
        print(second.text)

asyncio.run(main())

Output.tool_calls exposes any tools the model wants to run; return their results on the next Input to drive an agent loop. For tool declarations, dispatch, and streaming, see Building an Agent Loop.
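
The shape of an agent loop, stripped of any provider, is roughly: ask the model, run whatever tools it requests, feed the results back, and stop when it answers without requesting tools. The sketch below shows only that control flow. `ToolCall`, `Turn`, `fake_model`, and `TOOLS` are hypothetical stand-ins, it is synchronous for brevity, and it does not reflect the actual structure of Output.tool_calls or Input.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToolCall:  # hypothetical shape, for illustration only
    name: str
    arguments: dict


@dataclass
class Turn:  # hypothetical stand-in for one model reply
    text: str
    tool_calls: list[ToolCall] = field(default_factory=list)


# Dispatch table: tool name -> local Python function.
TOOLS: dict[str, Callable[..., str]] = {
    "add": lambda a, b: str(a + b),
}


def fake_model(prompt: str, tool_results: list[str]) -> Turn:
    """Scripted model: requests one tool call, then answers from its result."""
    if not tool_results:
        return Turn(text="", tool_calls=[ToolCall("add", {"a": 2, "b": 3})])
    return Turn(text=f"The sum is {tool_results[0]}.")


def agent_loop(model, prompt: str, max_steps: int = 5) -> str:
    results: list[str] = []
    for _ in range(max_steps):
        turn = model(prompt, results)
        if not turn.tool_calls:
            return turn.text  # no tools requested: this is the final answer
        # Run each requested tool and hand the results to the next turn.
        results = [TOOLS[c.name](**c.arguments) for c in turn.tool_calls]
    raise RuntimeError("agent loop did not finish within max_steps")


final = agent_loop(fake_model, "What is 2 + 3?")
```

The `max_steps` cap is a design choice worth keeping in any real loop, since a model can keep requesting tools indefinitely.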

When the Work Can Wait

Deferred delivery is for long fan-out work, backfills, and scheduled analysis where no one is waiting on the answer in the current process.

import asyncio
from pollux import (
    Config,
    Source,
    collect_deferred,
    defer,
    inspect_deferred,
)

config = Config(provider="openai", model="gpt-5-nano")

handle = asyncio.run(
    defer(
        "Summarize the report in five bullets.",
        source=Source.from_file("market-report.pdf"),
        config=config,
    )
)

snapshot = asyncio.run(inspect_deferred(handle))
if snapshot.is_terminal:
    result = asyncio.run(collect_deferred(handle))
    print(result.answers[0])
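
A job submitted a moment ago is unlikely to be terminal on the first inspection, so a single check like the one above usually needs to be repeated. One way to do that is a small polling helper. This is a sketch under assumptions: `wait_until_terminal` and its `interval` and `timeout` parameters are hypothetical, and the inspect function is injected so the example runs offline with a fake.

```python
import asyncio
from types import SimpleNamespace


async def wait_until_terminal(inspect, handle, interval=30.0, timeout=3600.0):
    """Poll inspect(handle) until the snapshot is terminal or time runs out."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        snapshot = await inspect(handle)
        if snapshot.is_terminal:
            return snapshot
        if loop.time() + interval > deadline:
            raise TimeoutError("deferred job did not reach a terminal state")
        await asyncio.sleep(interval)


def make_fake_inspect(ready_after):
    """Hypothetical stand-in: reports terminal on the Nth inspection."""
    calls = {"n": 0}

    async def fake_inspect(handle):
        calls["n"] += 1
        return SimpleNamespace(is_terminal=calls["n"] >= ready_after)

    return fake_inspect, calls


fake_inspect, calls = make_fake_inspect(ready_after=3)
snapshot = asyncio.run(
    wait_until_terminal(fake_inspect, "job-123", interval=0.01, timeout=1.0)
)
```

In real use you would pass `inspect_deferred` and the handle returned by `defer()`. For work that takes hours, persisting the handle and checking from a later process is usually a better fit than polling in place.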

In production code, persist handle.to_dict() and restore it later with DeferredHandle.from_dict(...). For the full lifecycle, read Submitting Work for Later Collection and Building With Deferred Delivery.
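
Persisting the handle can be as simple as writing a JSON file. The sketch below assumes that handle.to_dict() returns a JSON-serializable dict, which this README does not state explicitly; `save_handle`, `load_handle_dict`, and the example payload are hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path


def save_handle(handle_dict: dict, path: Path) -> None:
    """Write to a temp file first, then rename, so readers never see a partial file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(path.suffix + ".tmp")
    tmp.write_text(json.dumps(handle_dict), encoding="utf-8")
    os.replace(tmp, path)


def load_handle_dict(path: Path) -> dict:
    return json.loads(path.read_text(encoding="utf-8"))


# Real code would do something like:
#     save_handle(handle.to_dict(), target)
#     handle = DeferredHandle.from_dict(load_handle_dict(target))
with tempfile.TemporaryDirectory() as workdir:
    target = Path(workdir) / "handles" / "market-report.json"
    fake = {"provider": "openai", "job_id": "hypothetical-123"}  # made-up shape
    save_handle(fake, target)
    restored = load_handle_dict(target)
```

A database row or object store works equally well; the point is that the handle outlives the submitting process.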

Where Pollux Ends

Pollux owns content delivery, context caching, and provider translation. Prompt design, workflow orchestration, and what you do with results are yours. See Core Concepts for the full boundary model.

Installation

pip install pollux-ai

Set your provider's API key:

export GEMINI_API_KEY="your-key-here"     # or
export OPENAI_API_KEY="your-key-here"     # or
export ANTHROPIC_API_KEY="your-key-here"  # or
export OPENROUTER_API_KEY="your-key-here"

Keys from: Google AI Studio · OpenAI · Anthropic · OpenRouter

For provider="local", no API key is required; point base_url (or POLLUX_LOCAL_BASE_URL) at a self-hosted OpenAI-compatible server.
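
If you want a clear error before the first request, a preflight check along these lines can help. The variable names come from the export lines above; `KEY_ENV_VARS` and `missing_key` are hypothetical helpers written for this example, not part of Pollux, and Pollux may perform its own validation.

```python
import os
from typing import Mapping, Optional

# Provider name -> environment variable holding its API key (None: no key needed).
KEY_ENV_VARS = {
    "gemini": "GEMINI_API_KEY",
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
    "local": None,
}


def missing_key(provider: str, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the name of the unset variable for provider, or None if nothing is missing."""
    if provider not in KEY_ENV_VARS:
        raise ValueError(f"unknown provider: {provider!r}")
    var = KEY_ENV_VARS[provider]
    if var is None or env.get(var):
        return None
    return var


# Check every provider against an example environment with only one key set.
example_env = {"OPENAI_API_KEY": "your-key-here"}
checks = {p: missing_key(p, env=example_env) for p in KEY_ENV_VARS}
```

Calling `missing_key("gemini")` with no `env` argument checks the real process environment.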

Documentation

Getting Started: first result in 2 minutes
Core Concepts: mental model and vocabulary
Building an Agent Loop: multi-turn threads, tool calls, and streaming
Submitting Work for Later Collection: deferred lifecycle API
Migrating to 2.0: moving from the v1 API
Building With Deferred Delivery: when deferred is worth it
API Reference: entry points and types
Cookbook: runnable end-to-end recipes

Full docs at polluxlib.dev.

Contributing

See CONTRIBUTING and TESTING.md for guidelines.

Built during Google Summer of Code 2025 with Google DeepMind.

License

MIT
