SEIMEI

Search-Enhanced Interface for Multi-Expertise Integration

Unlike conventional RL, which only optimizes the knowledge stored inside the LLM's weights, SEIMEI jointly optimizes external knowledge, enabling the AI to truly absorb domain-specific and tacit expertise. Build a far more personalized AI, trained just for you, at dramatically lower cost and with higher adaptability!
Explore the docs »

View Demo · Report Bug · Request Feature

Table of Contents
  1. About The Project
  2. Quick Start
  3. Usage A. Integrate your own knowledge
  4. Usage B. Train Reward Model To Optimize Knowledge
  5. Usage C. CLI Chat
  6. Contributing
  7. License
  8. Contact
  9. Acknowledgments

About The Project

Search The Best Knowledge For Accurate Thought


[SEIMEI architecture diagram]

Here is an example of how SEIMEI works. Each agent interacts with the LLM and the documents and produces an inference. These inferences are automatically integrated by the search engine, which produces an answer to the question.
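
As a rough mental model, this orchestration can be pictured as a simple loop. The sketch below is purely illustrative: every class and method name in it is hypothetical and is not SEIMEI's actual API (see Quick Start for the real entry point).

# Conceptual sketch of the loop described above. All names here (Agent,
# SearchEngine, ...) are hypothetical illustrations, not SEIMEI's real API.

class Agent:
    def __init__(self, name):
        self.name = name

    def infer(self, question, findings):
        # In SEIMEI, each agent consults the LLM and the documents here.
        return f"{self.name}: finding #{len(findings) + 1} about {question!r}"

class SearchEngine:
    def select(self, agents, findings):
        # Pick which agent should act next, given what is known so far.
        return agents[len(findings) % len(agents)]

    def integrate(self, findings):
        # Combine all collected inferences into one answer.
        return "; ".join(findings)

agents = [Agent("think"), Agent("code_act"), Agent("answer")]
engine = SearchEngine()
findings = []
for _ in range(3):
    agent = engine.select(agents, findings)
    findings.append(agent.infer("What does this repo do?", findings))
print(engine.integrate(findings))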

(back to top)

Quick Start

Installation

You can install SEIMEI by cloning the repository and installing it in editable mode:

git clone https://github.com/kyotoai/SEIMEI.git
pip install -e SEIMEI/

Set API key

  1. Get a KyotoAI API key from https://kyotoai.net

  2. Run

export KYOTOAI_API_KEY="(your_kyotoai_api_key)"

Run SEIMEI

In the CLI app

Open the seimei terminal app inside your project directory by running

seimei

and start asking questions.

In Python code

import asyncio
from seimei import seimei

async def demo_code_act():
    orchestrator = seimei(
        llm_config={"model": "gpt-5-nano"},
        max_tokens_per_question=30000,
    )

    result = await orchestrator(
        messages=[
            {"role": "user", "content": "Analyze the current directory and suggest changes."},
        ],
    )
    print(result["output"])

asyncio.run(demo_code_act())

(back to top)


Usage A. Integrate your own knowledge

Overview

  1. Prepare a knowledge file.
    Create a CSV with reusable hints for each agent (think, code_act, answer, web_search, or * for all agents).
    This becomes your portable memory layer that can be reused across runs.

  2. Run SEIMEI with knowledge loading rules.
    Pass knowledge_load_config to load CSV/JSON/JSONL files and to inject inline, step-specific hints.
    This lets you control both what knowledge is injected and when it is used.

  3. (Optional) Accumulate new knowledge automatically.
    Enable knowledge_generate_config to append run retrospectives into your CSV after each run.
    The newly generated rows are returned in the response and immediately reusable.

1. Prepare your knowledge file

Create seimei_knowledge/knowledge.csv (minimum columns: agent, knowledge).

agent,knowledge,tags,step,id
code_act,"Prefer rg before grep when scanning large repos","[\"search\",\"shell\"]",,101
think,"Before choosing next action, summarize the last 2 agent findings in one sentence","[\"planning\"]",">=2",102
answer,"End with a short numbered next-step list when uncertainty remains","[\"response\"]",,103
*,"Always verify file paths before proposing edits","[\"safety\"]",,104
  • agent: target agent name (* means all agents).
  • knowledge: guidance text injected into that agent.
  • tags (optional): JSON list or comma-separated string.
  • step (optional): step constraint like 2, >=2, <4, or >=1,<=3.
  • id (optional): stable identifier for tracking and updates.
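
If you hand-edit the CSV, a quick structural check can catch problems before a run. The sketch below uses only the Python standard library; it is illustrative, not part of SEIMEI, and the required column names mirror the schema above.

import csv

# Illustrative sanity check for a knowledge CSV; not part of SEIMEI itself.
# "agent" and "knowledge" are the required columns described above.
REQUIRED = {"agent", "knowledge"}

with open("seimei_knowledge/knowledge.csv", newline="") as f:
    reader = csv.DictReader(f)
    missing = REQUIRED - set(reader.fieldnames or [])
    if missing:
        raise SystemExit(f"missing required columns: {sorted(missing)}")
    for lineno, row in enumerate(reader, start=2):  # header is line 1
        if not row["agent"] or not row["knowledge"]:
            print(f"line {lineno}: empty agent or knowledge field")

print("knowledge.csv looks structurally valid")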

You can also bootstrap entries with the built-in generator:

python3 -m seimei.knowledge.generate_from_generators \
  --count 25 \
  --output seimei_knowledge/knowledge.csv

2. Run SEIMEI with knowledge loading

import asyncio
from seimei import seimei

async def main():
    orchestrator = seimei(
        llm_config={"model": "gpt-5-nano"},
        allow_code_exec=True,
        max_tokens_per_question=30000,
    )

    result = await orchestrator(
        messages=[
            {"role": "user", "content": "Inspect this repo and suggest a safe cleanup plan."},
        ],
        knowledge_load_config=[
            {"load_knowledge_path": "seimei_knowledge/knowledge.csv"},
            {
                "step": [1, 2],
                "agent": "code_act",
                "text": "Run read-only commands first (pwd, ls, rg) before any edits.",
                "tags": ["safety", "planning"],
            },
            {
                "step": 3,
                "agent": ["think", "answer"],
                "text": "Explicitly list unresolved uncertainties before finalizing.",
                "tags": ["quality"],
            },
        ],
    )
    print(result["output"])

asyncio.run(main())

3. Automatic knowledge accumulation (optional)

Provide knowledge_generate_config when calling the orchestrator to append run retrospectives into a CSV knowledge base:

result = await orchestrator(
    messages=[{"role": "user", "content": "Find clever ways to speed up our ETL pipeline."}],
    knowledge_generate_config={
        "save_knowledge_path": "seimei_knowledge/knowledge.csv",
        "knowledge_generation_prompt_path": "seimei/knowledge/prompts/generate_from_runs.md",
    },
    knowledge_load_config=[
        {"load_knowledge_path": "seimei_knowledge/knowledge.csv"},
    ],
)

The helper seimei.knowledge.generate_from_runs analyses the newly created run directory under seimei_runs/ and appends JSON-normalized rows to the CSV (creating it on first use). The orchestrator reloads the knowledge store so subsequent runs benefit from the fresh guidance. The default retrospection prompt lives at seimei/knowledge/prompts/generate_from_runs.md, but you can point knowledge_generation_prompt_path at an alternative such as seimei/knowledge/prompts/excel.md for domain-specific guidance.

Whenever the generator runs, seimei.__call__ includes both a knowledge_result block (metadata, file paths, usage) and a generated_knowledge list that mirrors the rows added to disk:

if result.get("generated_knowledge"):
    for entry in result["generated_knowledge"]:
        print(f"[{entry['agent']}] {entry['knowledge']} (tags={entry.get('tags', [])})")

This makes it easy to review new heuristics right in your notebook or CLI before they are reused in later runs.

(back to top)


Usage B. Train Reward Model To Optimize Knowledge

Overview

  1. Run inference sampling and scoring.
    Use seimei/train/sampling.py to execute repeated no-knowledge vs knowledge-enabled trials and save scored runs.
    This creates the base results file used by downstream conversion and training.

  2. Convert results into training dataset files.
    Use seimei/train/dataset_converter1.py and seimei/train/dataset_converter2.py to transform sampled runs into dataset_list train/test JSON files.
    These files match the input schema expected by reward-model trainers.

  3. Train the reward model.
    Launch seimei.train.adpo_lora_rmtrain (or grpo_lora_rmtrain) on the converted dataset list files.
    Checkpoints are saved to your output directory and can be deployed behind RMSearch.

  4. Evaluate with the trained reward model.
    Re-run sampling using rm_url + klg_sample_mode="rm" and compare summary metrics against baseline sampling output.
    This directly measures whether trained retrieval ranking improves final task scores.

1. Inference sampling

Sampling is a Python API (not a CLI entrypoint), so run it from a small script:

from pathlib import Path
from seimei.train.sampling import Sampling

runner = Sampling(
    dataset_path=Path("exp11_plasma_gkv_v5/dataset.json"),
    output_path=Path("exp11_plasma_gkv_v5/train_v6_results.json"),
    llm_model_name="/workspace/gpt-oss-20b",
    llm_url="https://your-llm-endpoint/v1",
    rm_url=None,           # baseline retrieval
    klg_sample_mode="llm", # knowledge search mode during sampling
    n_no_klg_trials=3,
    n_klg_trials=7,
)
runner.run()

2. Data conversion

Convert the sampling output with the train converters:

python3 seimei/train/dataset_converter1.py \
  --input-path exp11_plasma_gkv_v5/train_v6_results.json \
  --output-path exp11_plasma_gkv_v5/train_v6_results_converted.json \
  --dataset-path exp11_plasma_gkv_v5/dataset.json

python3 seimei/train/dataset_converter2.py \
  --input-path exp11_plasma_gkv_v5/train_v6_results_converted.json \
  --output-path-train exp11_plasma_gkv_v5/train_v6_datasetlist_train.json \
  --output-path-test exp11_plasma_gkv_v5/train_v6_datasetlist_test.json \
  --n-batch-elements 10 \
  --test-ratio 0.1
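
Before launching training, you can sanity-check the converted files. The sketch below assumes only that each file's top level parses as a JSON list; the element schema is defined by the converters, so inspect an element before relying on specific field names.

import json

# Assumption: each converted dataset-list file is a JSON list at the top level.
for path in (
    "exp11_plasma_gkv_v5/train_v6_datasetlist_train.json",
    "exp11_plasma_gkv_v5/train_v6_datasetlist_test.json",
):
    with open(path) as f:
        data = json.load(f)
    print(f"{path}: {len(data)} top-level entries")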

3. Train reward model

accelerate launch --config_file ./accelerate_config.yaml \
  -m seimei.train.adpo_lora_rmtrain \
  --dataset-list-train ./exp11_plasma_gkv_v5/train_v6_datasetlist_train.json \
  --dataset-list-test ./exp11_plasma_gkv_v5/train_v6_datasetlist_test.json \
  --model-name /workspace/qwen4b-reward \
  --output-dir ./exp11_plasma_gkv_v5/model_adpo

Optional alternative:

accelerate launch --config_file ./accelerate_config.yaml \
  -m seimei.train.grpo_lora_rmtrain \
  --dataset-list-train ./exp11_plasma_gkv_v5/train_v6_datasetlist_train.json \
  --dataset-list-test ./exp11_plasma_gkv_v5/train_v6_datasetlist_test.json \
  --model-name /workspace/qwen4b-reward \
  --output-dir ./exp11_plasma_gkv_v5/model_grpo

4. Evaluate your model

Run sampling again with the trained RMSearch endpoint:

from pathlib import Path
from seimei.train.sampling import Sampling

runner = Sampling(
    dataset_path=Path("exp11_plasma_gkv_v5/dataset_test.json"),
    output_path=Path("exp11_plasma_gkv_v5/train_v6_results_eval_rm.json"),
    llm_model_name="/workspace/gpt-oss-20b",
    llm_url="https://your-llm-endpoint/v1",
    rm_url="http://127.0.0.1:8000/rmsearch",  # your deployed trained RM endpoint
    klg_sample_mode="rm",
)
runner.run()

Then compare the summary blocks in baseline vs RM-enabled result files (for example klg_overall_mean, overall_mean_score_improvement, and win/loss/tie fields).
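
A small script can put the two summaries side by side. The sketch below assumes each results file exposes a top-level "summary" mapping containing the fields above; if your version nests them differently, adjust the key paths.

import json

# Assumption: each results file has a top-level "summary" dict holding the
# metrics named above. Adjust load_summary if your files nest them elsewhere.
def load_summary(path):
    with open(path) as f:
        return json.load(f).get("summary", {})

baseline = load_summary("exp11_plasma_gkv_v5/train_v6_results.json")
rm_run = load_summary("exp11_plasma_gkv_v5/train_v6_results_eval_rm.json")

for key in ("klg_overall_mean", "overall_mean_score_improvement"):
    print(f"{key}: baseline={baseline.get(key)}  rm={rm_run.get(key)}")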

(back to top)


Usage C. CLI Chat

Prefer to experiment directly from the terminal? Install SEIMEI (pip install -e . inside this repo) and run:

seimei

The CLI spins up the same orchestrator configuration shown above (code-act agent, gpt-5-nano, code execution enabled) and keeps knowledge loading/saving turned on by default (seimei_knowledge/excel.csv with prompt seimei/knowledge/prompts/excel.md). Every turn streams the agent logs live, clears them once an answer is ready, and redraws the transcript so you see a clean you → SEIMEI exchange.

All defaults (model, agent file, knowledge paths, banners, limits, etc.) sit at the top of seimei/cli.py, so you can tweak them without touching the CLI logic. Flags such as --model, --knowledge-file, or --no-knowledge are also available if you prefer overriding values at runtime.
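
For example (the flag values here are illustrative; --model, --knowledge-file, and --no-knowledge are the flags named above):

seimei --model gpt-5-nano --knowledge-file seimei_knowledge/knowledge.csv
seimei --no-knowledge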

(back to top)


Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request


(back to top)

License

Distributed under the Apache-2.0 License. See LICENSE.txt for more information.

(back to top)

Contact

(back to top)

Acknowledgments

SEIMEI exists because of KyotoAI collaborators who kept pushing ideas into working systems, from research framing to production-grade implementation and evaluation. Thank you for the honest feedback loops, fast iteration, and deep domain discussions that shaped this project. This repository is the result of your continuous engineering and research partnership.

(back to top)
