Look up the capabilities (context window, modalities, supported features) of various LLMs, fully offline by default.
- Comprehensive Bundled Data: Offline capability data for OpenAI, Anthropic, Google (Gemini), Microsoft (Phi), Amazon (Nova/Titan), Meta (Llama), Mistral, Qwen, DeepSeek, NVIDIA, and Japanese domestic models (NTT tsuzumi, PFN PLaMo, ELYZA, etc. adopted by the Digital Agency's "GENNAI" platform).
- Zero Runtime Dependencies: Built entirely on the Python standard library.
- Alias Resolution: Automatically resolves aliases and provider-specific names (e.g., gpt-4o-2024-08-06 -> gpt-4o, gemini-1.5-pro-preview-0409 -> gemini-1.5-pro).
- Advanced Feature Queries: Check support for vision, multimodal, chat_completion, responses_api, reasoning_effort, thinking_budget, and specific input/output modalities (e.g., image_input, image_output, audio_input).
- High Performance: Evaluated feature checks are cached internally using memoization to avoid redundant calculations.
- Cost Estimation: Estimate API costs based on input and output token counts.
- Drop-in Replacement Checker: Check if a model can be safely replaced by another model based on context window and required features.
- Tokenizer Mapping: Access tokenizer names (e.g., o200k_base) directly from model capabilities.
- Extendable: Load your own local JSON model definitions.
- CLI Included: Query and list model capabilities directly from your terminal.
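
To make the alias-resolution and memoization bullets concrete, here is a minimal standard-library sketch. It is not llmcapa's implementation: the alias table, the `resolve_model_id` name, and the use of `functools.lru_cache` are all assumptions chosen for illustration, with only the two alias pairs taken from the list above.

```python
from functools import lru_cache

# Hypothetical alias table; the real bundled data is not reproduced here.
_ALIASES = {
    "gpt-4o-2024-08-06": "gpt-4o",
    "gemini-1.5-pro-preview-0409": "gemini-1.5-pro",
}
_CANONICAL = {"gpt-4o", "gemini-1.5-pro"}


@lru_cache(maxsize=None)  # memoize repeated lookups of the same name
def resolve_model_id(name: str) -> str:
    """Return the canonical model ID for a name or alias (case-insensitive)."""
    key = name.strip().lower()
    if key in _CANONICAL:
        return key
    if key in _ALIASES:
        return _ALIASES[key]
    raise KeyError(f"unknown model: {name!r}")


print(resolve_model_id("GPT-4o-2024-08-06"))            # gpt-4o
print(resolve_model_id("gemini-1.5-pro-preview-0409"))  # gemini-1.5-pro
```

Caching on the raw input string keeps the sketch short; a real resolver might normalize first so that differently cased spellings share one cache entry.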
pip install llmcapa

Or from source:

pip install .

import llmcapa
# Get model capabilities (case-insensitive, alias-resolved)
cap = llmcapa.get("gpt-4o")
print(cap.context_window) # 128000
print(cap.max_output_tokens) # 16384
print(cap.tokenizer_name) # "o200k_base"
# Check feature support (using strings or Feature enum)
from llmcapa import Feature, ReasoningEffort
print(cap.supports(Feature.LLMC_FEAT_VISION)) # True
print(cap.supports(Feature.LLMC_FEAT_RESPONSES_API)) # True
print(cap.supports(Feature.LLMC_FEAT_REASONING_EFFORT)) # False
# Use ReasoningEffort enum for models supporting reasoning_effort
print(ReasoningEffort.LLMC_EFFORT_HIGH) # "high"
# List all supported features
print(cap.features())
# ['chat_completion', 'function_calling', 'image', 'image_input', 'image_output', 'json_mode', 'multimodal', 'responses_api', 'streaming', 'text', 'text_input', 'text_output', 'vision']

Roughly estimate the number of tokens for a given text (supporting 30+ major languages) and calculate API costs:
Note
Token estimation is a lightweight, offline approximation. For exact token counts, please use the official APIs or dedicated tokenizers from each provider.
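
The cost side of this is plain per-million-token arithmetic. The sketch below is an illustration only, not the library's code: `sketch_estimate_cost` is a hypothetical helper, and the rates (2.50 and 10.00 USD per 1M tokens) are assumed values, not read from the bundled data.

```python
def sketch_estimate_cost(input_tokens, output_tokens,
                         input_per_1m, output_per_1m, currency="USD"):
    """Cost = tokens x (price per 1M tokens) / 1,000,000, summed over both directions."""
    cost = (input_tokens * input_per_1m + output_tokens * output_per_1m) / 1_000_000
    return {"cost": cost, "currency": currency}


# Assumed rates: 2.50 USD per 1M input tokens, 10.00 USD per 1M output tokens.
res = sketch_estimate_cost(1500, 500, input_per_1m=2.50, output_per_1m=10.00)
print(res)  # {'cost': 0.00875, 'currency': 'USD'}
```

Because the result is only as good as the token counts fed into it, an estimated count yields an estimated cost.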
gpt = llmcapa.get("gpt-4o")
# Estimate tokens for multilingual text
# If `tiktoken` is installed, it dynamically uses it for exact OpenAI token counts.
# Otherwise, it falls back to a highly-optimized, standard-library-only estimation.
text = "Hello world! こんにちは世界。"
tokens = gpt.estimate_tokens(text)
print(tokens) # 10 (estimated tokens)
# Estimate API costs based on token counts (returns cost and currency)
res = gpt.estimate_cost(input_tokens=1500, output_tokens=500)
print(res) # {'cost': 0.00875, 'currency': 'USD'}

Check if a model can be safely replaced by another model. The replacement model must have a context window at least as large as the target model and support all required features.
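
The rule just stated can be sketched in a few lines. This is an assumption-laden illustration rather than llmcapa's code: `ModelSketch` and `can_replace` are hypothetical names, the two models are invented, and treating "all required features" as "every feature of the target unless a list is given" is a guess consistent with the examples in this README.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSketch:
    """Hypothetical stand-in for a capability record."""
    model_id: str
    context_window: int
    features: frozenset


def can_replace(target, candidate, required_features=None):
    """True if candidate has enough context and every required feature."""
    # Assumed default: require everything the target model supports.
    required = set(target.features) if required_features is None else set(required_features)
    big_enough = candidate.context_window >= target.context_window
    return big_enough and required <= set(candidate.features)


model_a = ModelSketch("model-a", 128_000,
                      frozenset({"vision", "function_calling", "image_output"}))
model_b = ModelSketch("model-b", 1_000_000,
                      frozenset({"vision", "function_calling", "audio_input"}))

print(can_replace(model_a, model_b))  # False (model-b lacks image_output)
print(can_replace(model_a, model_b,
                  required_features=["vision", "function_calling"]))  # True
```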
gpt4o = llmcapa.get("gpt-4o")
gpt4o_mini = llmcapa.get("gpt-4o-mini")
gemini = llmcapa.get("gemini-3.5-flash")
# gpt-4o-mini has the same context window but lacks image_output (which gpt-4o supports)
print(gpt4o.can_be_replaced_by(gpt4o_mini)) # False
# gemini-3.5-flash has a larger context window but also lacks image_output
print(gpt4o.can_be_replaced_by(gemini)) # False
# If we only require vision and function_calling, gemini-3.5-flash can replace gpt-4o
print(gpt4o.can_be_replaced_by(gemini, required_features=["vision", "function_calling"])) # True

You can check specific input/output modalities or general multimodal support using the Feature enum:
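
As a side note on how a check can accept either a plain string or an enum member, here is a small sketch built on a `str`-based `Enum`. `SketchFeature` and `supports` are hypothetical and deliberately do not mirror llmcapa's member names; the library may do this differently.

```python
from enum import Enum


class SketchFeature(str, Enum):
    """Hypothetical feature flags, not the library's Feature enum."""
    VISION = "vision"
    AUDIO_INPUT = "audio_input"
    IMAGE_OUTPUT = "image_output"


def supports(model_features, feature):
    """Accept a SketchFeature member or its string value."""
    try:
        wanted = SketchFeature(feature)  # works for both members and values
    except ValueError:
        return False  # unknown feature names are simply unsupported
    return wanted.value in model_features


features = {"vision", "audio_input"}
print(supports(features, "vision"))                    # True
print(supports(features, SketchFeature.AUDIO_INPUT))   # True
print(supports(features, SketchFeature.IMAGE_OUTPUT))  # False
print(supports(features, "telepathy"))                 # False
```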
from llmcapa import Feature
gemini = llmcapa.get("gemini-3.5-flash")
print(gemini.supports(Feature.LLMC_FEAT_MULTIMODAL)) # True (supports multiple modalities)
print(gemini.supports(Feature.LLMC_FEAT_AUDIO_INPUT)) # True
print(gemini.supports(Feature.LLMC_FEAT_IMAGE_OUTPUT)) # False

Differentiate between OpenAI-style reasoning_effort and Anthropic-style thinking_budget using the Feature enum:
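
One reason the distinction matters is that the two styles take differently shaped request parameters. The sketch below is a hypothetical helper, not part of llmcapa: it takes a set of feature names and returns extra request fields. The field shapes follow the providers' public API conventions as commonly documented (a `reasoning_effort` string for OpenAI, a `thinking` object with `budget_tokens` for Anthropic); verify them against the official references before relying on them.

```python
def reasoning_params(features, effort="medium", budget_tokens=2048):
    """Pick provider-appropriate reasoning parameters from a feature set.

    `features` is any collection of feature-name strings; the defaults
    for `effort` and `budget_tokens` are arbitrary illustrative values.
    """
    if "reasoning_effort" in features:
        # OpenAI-style: a coarse effort level.
        return {"reasoning_effort": effort}
    if "thinking_budget" in features:
        # Anthropic-style: an explicit token budget for thinking.
        return {"thinking": {"type": "enabled", "budget_tokens": budget_tokens}}
    return {}  # model exposes no reasoning control


print(reasoning_params({"reasoning_effort"}, effort="high"))
# {'reasoning_effort': 'high'}
print(reasoning_params({"thinking_budget"}, budget_tokens=4096))
# {'thinking': {'type': 'enabled', 'budget_tokens': 4096}}
print(reasoning_params({"vision"}))
# {}
```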
from llmcapa import Feature
o1 = llmcapa.get("o1")
print(o1.supports(Feature.LLMC_FEAT_REASONING_EFFORT)) # True
print(o1.supports(Feature.LLMC_FEAT_THINKING_BUDGET)) # False
claude = llmcapa.get("claude-3-7-sonnet")
print(claude.supports(Feature.LLMC_FEAT_REASONING_EFFORT)) # False
print(claude.supports(Feature.LLMC_FEAT_THINKING_BUDGET)) # True

# List all models for a specific provider
for c in llmcapa.list_models(provider="anthropic"):
    print(c.model_id, c.context_window)
# Search models by capability criteria
big_reasoning_models = llmcapa.find(
    supports_reasoning=True,
    min_context_window=200000
)

To update model data or fetch the latest pricing, you can optionally fetch and register models from the OpenRouter API on demand using fetch_openrouter(). The response is cached locally in ~/.llmcapa/openrouter_cache.json and automatically loaded on subsequent imports, keeping the library fully offline during regular usage.
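
The cache-then-load pattern described here can be illustrated without any network access. The sketch below is an assumption throughout: `save_cache` and `load_cache` are hypothetical helpers, the `{"models": [...]}` layout of the cache file is a guess, and a temporary directory stands in for the home directory.

```python
import json
import tempfile
from pathlib import Path


def save_cache(path, models):
    """Write fetched model records to a local JSON cache file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps({"models": models}), encoding="utf-8")


def load_cache(path):
    """Return cached model records, or an empty list if no cache exists yet."""
    if not path.exists():
        return []
    return json.loads(path.read_text(encoding="utf-8")).get("models", [])


with tempfile.TemporaryDirectory() as tmp:
    cache_path = Path(tmp) / ".llmcapa" / "openrouter_cache.json"
    print(load_cache(cache_path))       # [] (nothing fetched yet)
    save_cache(cache_path, [{"model_id": "example/model", "context_window": 8192}])
    print(len(load_cache(cache_path)))  # 1 (served from disk, no network)
```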
# Fetch and register OpenRouter models dynamically
count = llmcapa.fetch_openrouter()
print(f"Registered {count} models from OpenRouter!")
# Lookup using OpenRouter model ID
cap = llmcapa.get("meta-llama/llama-3.3-70b-instruct")
print(cap.context_window) # 131072
print(cap.pricing) # {'input_per_1m': 0.1, 'output_per_1m': 0.32, 'currency': 'USD'}

Load your own model definitions from a local JSON file:
llmcapa.load_extra("my_models.json")

my_models.json format:

{
  "models": [
    {
      "provider": "local",
      "model_id": "my-custom-model",
      "context_window": 32768,
      "max_output_tokens": 4096,
      "supports_function_calling": true,
      "aliases": ["my-model-latest"]
    }
  ]
}

For details on how to extend the library, add new providers, or implement new feature flags, please refer to the DEVELOP.md guide.
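
Before loading a custom file, it can help to sanity-check it with the standard library. The following sketch is not part of llmcapa: `check_models_json` is a hypothetical helper, and which keys count as required is an assumption, since the README does not say.

```python
import json

# Assumed minimum set of keys; the library's actual requirements may differ.
REQUIRED_KEYS = ("provider", "model_id", "context_window")


def check_models_json(text):
    """Return a list of human-readable problems found in a models JSON document."""
    data = json.loads(text)
    problems = []
    for i, entry in enumerate(data.get("models", [])):
        for key in REQUIRED_KEYS:
            if key not in entry:
                problems.append(f"models[{i}]: missing {key!r}")
        cw = entry.get("context_window")
        if cw is not None and (isinstance(cw, bool) or not isinstance(cw, int) or cw <= 0):
            problems.append(f"models[{i}]: context_window must be a positive integer")
    return problems


good = '{"models": [{"provider": "local", "model_id": "my-custom-model", "context_window": 32768}]}'
bad = '{"models": [{"model_id": "x", "context_window": -1}]}'

print(check_models_json(good))  # []
print(check_models_json(bad))
# ["models[0]: missing 'provider'", 'models[0]: context_window must be a positive integer']
```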
# Show capabilities of a specific model
llmcapa show gpt-4o
llmcapa show gpt-4o --json
# List all known models
llmcapa list
llmcapa list --provider google
llmcapa list --json --no-deprecated
# List all known providers
llmcapa providers
# Explicitly fetch and update the OpenRouter models cache (forces cache refresh)
llmcapa update

- Static Snapshot: Bundled capability data is a static snapshot. While we strive to keep it updated with the latest models (including GPT-5.5, Claude Fable, Gemini 3.5, DeepSeek V4, etc.), providers change limits and pricing frequently. Use fetch_openrouter() or verify with official documentation when absolute accuracy is critical.
Apache License 2.0