AI Imaging Agent (Imaging Plaza)

A retrieval-augmented (RAG) AI agent that helps users find the right imaging software for their images and tasks. Upload an image, describe what you want to do, and get ranked tool recommendations with links to runnable demos.

✨ Key Features

🤖 Conversational AI Agent: Natural language interaction with multi-turn context
🔍 Smart Retrieval: BGE-M3 embeddings + FAISS + CrossEncoder reranking
👁️ Vision-Aware Selection: VLM-based tool selection considering both image content and metadata
🏥 Medical Imaging Focus: Specialized support for CT, MRI, DICOM, NIfTI, and other medical formats
🎯 Format-Aware Matching: IO compatibility scoring based on file formats and dimensions
🚀 Demo Integration: Direct execution of Gradio Space demos on your images
📊 Rich UI: Chat interface with image previews, file management, and execution traces

🚀 Quick Start

Prerequisites

Python 3.10–3.12
OpenAI API key (or compatible API endpoint)
Internet connection for model calls

Installation

# Clone the repository
git clone <your-repo-url>
cd ai-agent

# Create virtual environment
python -m venv .venv

# Activate virtual environment
# On Linux/macOS:
source .venv/bin/activate
# On Windows:
.venv\Scripts\activate

# Install the package
pip install --upgrade pip
pip install -e .

# For development (includes test dependencies)
pip install -e ".[dev]"

Configuration

Create a .env file at the repository root:

# Required: OpenAI API key
OPENAI_API_KEY=sk-xxxx

# Optional: GitHub token for repo info tool
GITHUB_TOKEN=ghp_xxxx

# Alternative model providers (EPFL, etc.)
# EPFL_API_KEY_EMBEDDER is required when the embedder or reranker backend is "remote"
EPFL_API_KEY=sk-xxxx
EPFL_API_KEY_EMBEDDER=sk-xxxx

# Software catalog path
SOFTWARE_CATALOG=dataset/catalog.jsonl

# Pipeline configuration
TOP_K=8                # Number of candidates to retrieve
NUM_CHOICES=3          # Number of tools to recommend
AGENT_OUTPUT_RETRIES=3 # Structured output validation retries
EMBED_CATALOG_ON_START=1  # Pre-embed catalog if FAISS is empty

# Logging configuration
LOGLEVEL_CONSOLE=WARNING
LOGLEVEL_FILE=INFO
FILE_LOG=1
LOG_DIR=logs
LOG_PROMPTS=0         # Set to 1 to save prompt snapshots for debugging

# Custom config path
CONFIG_PATH=config.yaml

Model Configuration

The agent model can be configured via config.yaml:

# AI Agent Model Configuration

# Default/fallback model (used for CLI and initial startup)
agent_model:
  name: "gpt-5.1"
  base_url: null                        # null for default OpenAI endpoint
  api_key_env: "OPENAI_API_KEY"

# Available models for UI dropdown
available_models:
  - display_name: "gpt-4o-mini"
    name: "gpt-4o-mini"
    base_url: null
    provider: "OpenAI"
    api_key_env: "OPENAI_API_KEY"
  
  - display_name: "gpt-4o"
    name: "gpt-4o"
    base_url: null
    provider: "OpenAI"
    api_key_env: "OPENAI_API_KEY"
  
  - display_name: "gpt-5-mini"
    name: "gpt-5-mini"
    base_url: null
    provider: "OpenAI"
    api_key_env: "OPENAI_API_KEY"

  - display_name: "gpt-5.1"
    name: "gpt-5.1"
    base_url: null
    provider: "OpenAI"
    api_key_env: "OPENAI_API_KEY"

retrieval:
  embedder:
    backend: "remote"   # "remote" or "local"
    model_name: "Qwen/Qwen3-Embedding-8B"
    base_url: "https://inference-rcp.epfl.ch/v1"
    api_key_env: "EPFL_API_KEY_EMBEDDER"
    timeout_s: 20
    # local example:
    # backend: "local"
    # model_name: "BAAI/bge-m3"
    # device: "cpu" # optional

  reranker:
    backend: "remote"   # "remote" or "local"
    model_name: "BAAI/bge-reranker-v2-m3"
    base_url: "https://inference-rcp.epfl.ch/v1"
    api_key_env: "EPFL_API_KEY_EMBEDDER"
    timeout_s: 20
    # local example:
    # backend: "local"
    # model_name: "BAAI/bge-reranker-v2-m3"
    # device: "cpu" # optional
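
As a rough illustration of how a model entry such as agent_model might be consumed, the sketch below reads the API key from the variable named by api_key_env and treats a null base_url as "use the default endpoint". The names resolve_model and ResolvedModel are hypothetical, and the dict stands in for the parsed YAML; the project's own loader may work differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ResolvedModel:
    name: str
    base_url: Optional[str]  # None means "use the client's default endpoint"
    api_key: str


def resolve_model(entry: Mapping[str, object],
                  env: Mapping[str, str] = os.environ) -> ResolvedModel:
    """Turn one config.yaml model entry into concrete connection settings."""
    key_var = str(entry["api_key_env"])
    api_key = env.get(key_var, "")
    if not api_key:
        raise KeyError(f"environment variable {key_var} is not set")
    base_url = entry.get("base_url")
    return ResolvedModel(
        name=str(entry["name"]),
        base_url=str(base_url) if base_url else None,
        api_key=api_key,
    )


# Stand-in for the parsed `agent_model` section (hypothetical in-memory form).
agent_model = {"name": "gpt-5.1", "base_url": None, "api_key_env": "OPENAI_API_KEY"}
model = resolve_model(agent_model, env={"OPENAI_API_KEY": "sk-xxxx"})
print(model.name, model.base_url)
```

Keeping only the variable name in config.yaml means the file can be committed without exposing secrets.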

Running the App

# Start the chat interface
ai_agent chat

# Open your browser to:
# http://127.0.0.1:7860

Try uploading a cat image and asking:

"I want to segment the cat from this image"

💬 Usage

Chat Interface

The chat interface provides a natural conversation flow:

Upload Files: Drop images (PNG, JPG, TIFF, DICOM, NIfTI, etc.) or other supported files
Describe Your Task: Use natural language like "segment the lungs" or "register brain MRI"
Review Recommendations: Get ranked tool suggestions with accuracy scores and explanations
Run Demos: Click "Run demo" to execute tools directly on your uploaded images
Iterate: Ask for alternatives, refine your query, or upload different files

Supported File Formats

Images:

Standard: PNG, JPG, JPEG, WEBP, BMP, GIF
Medical: DICOM (.dcm), NIfTI (.nii, .nii.gz), TIFF stacks
Scientific: Multi-page TIFF, TIFF with metadata

Other Files:

Data: CSV, JSON, XML
Media: MP3, MP4
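
One detail worth illustrating is the compound .nii.gz extension, which a naive "last suffix" check would read as plain .gz. The sketch below groups uploads by extension using the lists above. The function name, the group names, and the .tif/.tiff spellings are assumptions made for this example, not the project's validator.

```python
from pathlib import Path

# Extension groups based on the lists above; group names are illustrative only.
FORMAT_GROUPS = {
    "standard_image": {".png", ".jpg", ".jpeg", ".webp", ".bmp", ".gif"},
    "medical_image": {".dcm", ".nii", ".nii.gz"},
    "tiff": {".tif", ".tiff"},  # assumed spellings for TIFF files
    "data": {".csv", ".json", ".xml"},
    "media": {".mp3", ".mp4"},
}


def classify_upload(filename: str) -> str:
    """Return the format group for a file name, or "unsupported"."""
    suffixes = [s.lower() for s in Path(filename).suffixes]
    if not suffixes:
        return "unsupported"
    # Try the compound suffix first so "scan.nii.gz" is not read as ".gz".
    candidates = ["".join(suffixes[-2:]), suffixes[-1]]
    for ext in candidates:
        for group, extensions in FORMAT_GROUPS.items():
            if ext in extensions:
                return group
    return "unsupported"


for name in ("scan.nii.gz", "CAT.JPG", "notes.pdf"):
    print(name, "->", classify_upload(name))
```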

Example Queries

"Segment the lungs from this CT scan"
"Register these two brain MRI images"
"Extract text from this medical report image"
"Classify what organ is shown in this ultrasound"
"Detect tumors in this MRI scan"
"I need to analyze DICOM files, what tools are available?"

Understanding Results

Each recommendation includes:

Rank: Priority order (1 = best match)
Accuracy Score: Confidence level (0-100%)
Explanation: Why this tool matches your request
Metadata: Supported modalities, dimensions, formats, license
Demo Link: Direct link to runnable example

🏗️ Architecture

Pipeline Overview

The system follows a two-stage architecture:

User Input (Image + Text Query)
        ↓
┌───────────────────────────────┐
│   RETRIEVAL STAGE             │
│  - BGE-M3 Embeddings          │
│  - FAISS Vector Search        │
│  - CrossEncoder Reranking     │
│  - Format Token Matching      │
└───────────────────────────────┘
        ↓ Top-K Candidates
┌───────────────────────────────┐
│   AGENT SELECTION             │
│  - Pydantic AI Agent          │
│  - OpenAI VLM                 │
│  - Image + Metadata Analysis  │
│  - Multi-Tool Reasoning       │
└───────────────────────────────┘
        ↓
   Ranked Recommendations

Retrieval Stage

No LLM calls - purely text-based search:

Query Construction: User task + format tokens from uploaded files
Embedding: The configured embedder (e.g., BGE-M3 with the local backend) generates the query embedding
Vector Search: FAISS retrieves top candidates
Reranking: CrossEncoder refines results for precision
Retry Broadening: If too few hits, retry with a shorter/broader query
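
The control flow of these steps can be sketched with toy stand-ins. In the example below a bag-of-words vector replaces the dense embedder, a brute-force cosine scan replaces the FAISS index, and a word-overlap score replaces the CrossEncoder. All function names and thresholds are hypothetical, and "broadening" is interpreted here as dropping the format tokens and relaxing the score threshold, which is an assumption.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: bag-of-words counts (stand-in for a dense embedder)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(query: str, docs: Dict[str, str], top_k: int, min_score: float) -> List[Tuple[str, float]]:
    """Brute-force similarity search (stand-in for a vector index lookup)."""
    q = embed(query)
    scored = [(name, cosine(q, embed(text))) for name, text in docs.items()]
    hits = [(name, score) for name, score in scored if score >= min_score]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)[:top_k]


def rerank(task: str, hits: Sequence[Tuple[str, float]], docs: Dict[str, str]) -> List[str]:
    """Toy reranker: order hits by the fraction of task words in each document."""
    words = set(task.lower().split())

    def overlap(name: str) -> float:
        return len(words & set(docs[name].lower().split())) / max(len(words), 1)

    return sorted((name for name, _ in hits), key=overlap, reverse=True)


def retrieve(task: str, format_tokens: Sequence[str], docs: Dict[str, str],
             top_k: int = 8, min_hits: int = 2, min_score: float = 0.3) -> List[str]:
    # Query construction: user task plus format tokens from the uploaded files.
    query = " ".join([task, *format_tokens])
    hits = search(query, docs, top_k, min_score)
    if len(hits) < min_hits:
        # Retry broadening: shorter query (task only) and a relaxed threshold.
        hits = search(task, docs, top_k, min_score / 2)
    return rerank(task, hits, docs)


catalog = {
    "lung-seg": "3d lung segmentation from ct dicom tiff",
    "brain-reg": "brain mri registration nifti",
    "ocr": "extract text from document images png",
}
print(retrieve("lung segmentation", ["dicom", "ct"], catalog))
```

No language model is called anywhere in this flow, which matches the stage's description as a purely text-based search.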

Agent Selection Stage

Single VLM call - multimodal reasoning:

Input Preparation:
- Text: User query + candidate table + file metadata
- Image: PNG preview (converted from any format)
- Context: Original file format, dimensions, modality
Agent Tools:
- search_tools: Search catalog with query
- search_alternative: Find alternatives (iterative)
- repo_info: Fetch GitHub documentation via DeepWiki MCP

Output: Ranked tool selections with accuracy scores and explanations
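
To make the output contract concrete, here is a minimal sketch of a ranked selection record with a validation step and a bounded retry loop, in the spirit of AGENT_OUTPUT_RETRIES. It uses plain dataclasses rather than the project's actual schema, and ToolSelection, validate, and select_with_retries are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class ToolSelection:
    rank: int          # 1 = best match
    name: str          # should be one of the retrieved candidates
    accuracy: float    # confidence in percent, 0-100
    explanation: str


def validate(selections: Sequence[ToolSelection], candidates: Sequence[str]) -> List[str]:
    """Return a list of problems; an empty list means the output is acceptable."""
    problems = []
    if [s.rank for s in selections] != list(range(1, len(selections) + 1)):
        problems.append("ranks must be 1..N in order")
    for s in selections:
        if s.name not in candidates:
            problems.append(f"{s.name!r} is not a retrieved candidate")
        if not 0 <= s.accuracy <= 100:
            problems.append(f"accuracy {s.accuracy} is outside 0-100")
        if not s.explanation.strip():
            problems.append(f"{s.name!r} has no explanation")
    return problems


def select_with_retries(generate: Callable[[], Sequence[ToolSelection]],
                        candidates: Sequence[str],
                        retries: int = 3) -> Sequence[ToolSelection]:
    """Call `generate` until its output validates, with at most `retries` retries."""
    for _ in range(retries + 1):
        selections = generate()
        if not validate(selections, candidates):
            return selections
    raise ValueError("no valid selection after retries")
```

Here generate stands in for the model call; in this sketch a failed validation simply triggers another attempt.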

Key Components

api/pipeline.py: RAG retrieval orchestrator
agent/agent.py: Pydantic AI agent with tool definitions
retriever/: Embedding, FAISS indexing, reranking
generator/: Prompts and schema for tool selection
ui/: Gradio chat interface components
utils/: Image processing, metadata extraction, file validation
catalog/: Catalog syncing from GraphDB (optional)

⚙️ Configuration

Environment Variables

Variable	Description	Default	Required
`OPENAI_API_KEY`	OpenAI API key	-	✅
`EPFL_API_KEY_EMBEDDER`	API key for remote embedder and reranker endpoints	-	✅ (when `retrieval.embedder.backend: remote` and/or `retrieval.reranker.backend: remote`)
`GITHUB_TOKEN`	GitHub token for repo info	-	❌
`SOFTWARE_CATALOG`	Path to catalog JSONL	`dataset/catalog.jsonl`	✅
`TOP_K`	Retrieval candidates count	`8`	❌
`NUM_CHOICES`	Tools to recommend	`3`	❌
`AGENT_OUTPUT_RETRIES`	Structured output validation retries	`3`	❌
`EMBED_CATALOG_ON_START`	Pre-embed catalog on startup when FAISS is empty	`1`	❌
`LOGLEVEL_CONSOLE`	Console log level	`WARNING`	❌
`LOGLEVEL_FILE`	File log level	`INFO`	❌
`FILE_LOG`	Enable file logging	`1`	❌
`LOG_DIR`	Log directory	`logs`	❌
`LOG_PROMPTS`	Save prompt snapshots	`0`	❌
`CONFIG_PATH`	Model config file	`config.yaml`	✅
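
For reference, the defaults in this table can be read with nothing more than the standard library. The sketch below is one possible shape, not the project's configuration module: PipelineSettings and load_settings are hypothetical names, and treating "1" as the only truthy flag value is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class PipelineSettings:
    software_catalog: str
    top_k: int
    num_choices: int
    agent_output_retries: int
    embed_catalog_on_start: bool
    log_prompts: bool
    config_path: str


def load_settings(env: Mapping[str, str] = os.environ) -> PipelineSettings:
    """Read pipeline settings from `env`, falling back to the documented defaults."""

    def flag(name: str, default: str) -> bool:
        # Assumption: "1" enables a flag and any other value disables it.
        return env.get(name, default).strip() == "1"

    return PipelineSettings(
        software_catalog=env.get("SOFTWARE_CATALOG", "dataset/catalog.jsonl"),
        top_k=int(env.get("TOP_K", "8")),
        num_choices=int(env.get("NUM_CHOICES", "3")),
        agent_output_retries=int(env.get("AGENT_OUTPUT_RETRIES", "3")),
        embed_catalog_on_start=flag("EMBED_CATALOG_ON_START", "1"),
        log_prompts=flag("LOG_PROMPTS", "0"),
        config_path=env.get("CONFIG_PATH", "config.yaml"),
    )


print(load_settings({"TOP_K": "5"}))
```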

GraphDB Catalog Sync (Optional)

For automatic catalog syncing from a GraphDB instance:

GRAPHDB_URL=https://your-graphdb.example.com
GRAPHDB_GRAPH=your-graph-name
GRAPHDB_USER=username
GRAPHDB_PASSWORD=password
GRAPHDB_QUERY_FILE=/path/to/query.rq
SYNC_EVERY_HOURS=24  # Auto-refresh interval (0 to disable)
OUTPUT_JSONLD=dataset/catalog.jsonld
OUTPUT_JSONL=dataset/catalog.jsonl

Run manual sync:

ai_agent sync

📋 Catalog Format

The catalog is a JSONL file where each line is a SoftwareDoc following schema.org SoftwareSourceCode structure.

Minimal Example

{
  "name": "3d-lungs-segmentation",
  "description": "3D lung segmentation from CT; returns a mask/overlay.",
  
  "applicationCategory": ["Medical Imaging"],
  "featureList": ["segmentation"],
  "imagingModality": ["CT"],
  "dims": [3],
  "anatomy": ["lung"],
  "keywords": ["mask", "overlay", "lung segmentation", "CT"],
  
  "programmingLanguage": "Python",
  "requiresGPU": false,
  "isAccessibleForFree": true,
  "license": "Apache-2.0",
  
  "supportingData": [
    {
      "datasetFormat": "TIFF",
      "bodySite": "lung",
      "imagingModality": "CT",
      "hasDimensionality": 3
    },
    {
      "datasetFormat": "DICOM",
      "bodySite": "lung",
      "imagingModality": "CT",
      "hasDimensionality": 3
    }
  ],
  
  "runnableExample": [
    {
      "hostType": "gradio",
      "url": "https://huggingface.co/spaces/qchapp/3d-lungs-segmentation",
      "name": "HF Space"
    }
  ]
}
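
The example above is pretty-printed for readability; in the JSONL file each record occupies a single line. A minimal reader, assuming that name and description are the only mandatory fields (an assumption for this sketch) and using the hypothetical helpers iter_catalog and supported_formats, could look like this:

```python
import json
from typing import Iterable, Iterator, List

# Assumption for this sketch: only these two fields are treated as mandatory.
REQUIRED_FIELDS = ("name", "description")


def iter_catalog(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty JSONL line, checking the required fields."""
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        doc = json.loads(line)
        missing = [field for field in REQUIRED_FIELDS if not doc.get(field)]
        if missing:
            raise ValueError(f"line {lineno}: missing fields {missing}")
        yield doc


def supported_formats(doc: dict) -> List[str]:
    """Collect the distinct datasetFormat values listed under supportingData."""
    entries = doc.get("supportingData", [])
    return sorted({str(e["datasetFormat"]).upper() for e in entries if e.get("datasetFormat")})
```

Any iterable of lines works, for example an open file handle such as open("dataset/catalog.jsonl", encoding="utf-8").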

Key Fields

name: Unique identifier for the tool
description: Clear explanation of what the tool does
featureList: Operations (e.g., segmentation, registration, classification)
imagingModality: Medical imaging types (CT, MRI, XR, US, PET)
dims: Supported dimensionalities as integers (e.g., 2, 3, 4 for 2D, 3D, 4D)
anatomy: Body parts/organs
supportingData: Format compatibility information (critical for matching)
runnableExample: Links to live demos (HuggingFace Spaces, notebooks, web apps)
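
To show why supportingData matters for matching, the sketch below scores an uploaded file against a tool's supportingData entries. The weights (5 points for format, 3 for dimensionality, 2 for modality) and the function name io_compatibility are invented for illustration and are not the project's actual scoring rule.

```python
from typing import Iterable, Mapping, Optional


def io_compatibility(supporting_data: Iterable[Mapping[str, object]],
                     file_format: str,
                     ndim: int,
                     modality: Optional[str] = None) -> float:
    """Return the best match over all supportingData entries, in [0, 1]."""
    best = 0
    for entry in supporting_data:
        points = 0
        if str(entry.get("datasetFormat", "")).upper() == file_format.upper():
            points += 5  # illustrative weight: same file format
        if entry.get("hasDimensionality") == ndim:
            points += 3  # illustrative weight: same number of dimensions
        if modality and str(entry.get("imagingModality", "")).upper() == modality.upper():
            points += 2  # illustrative weight: same imaging modality
        best = max(best, points)
    return best / 10


# supportingData from the minimal example above.
lung_tool_data = [
    {"datasetFormat": "TIFF", "bodySite": "lung", "imagingModality": "CT", "hasDimensionality": 3},
    {"datasetFormat": "DICOM", "bodySite": "lung", "imagingModality": "CT", "hasDimensionality": 3},
]
print(io_compatibility(lung_tool_data, "dicom", 3, "CT"))
```

Taking the maximum over entries means a tool is judged by its best-matching supported input rather than an average.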

🔧 Development

Project Structure

ai-agent/
├── src/ai_agent/
│   ├── agent/              # Pydantic AI agent and tools
│   │   ├── agent.py        # Agent definition
│   │   ├── models.py       # Agent state models
│   │   ├── tools/          # Agent tool implementations
│   │   │   ├── search_tool.py
│   │   │   ├── search_alternative_tool.py
│   │   │   ├── gradio_space_tool.py
│   │   │   ├── repo_info_tool.py
│   │   │   └── deepwiki_tool.py
│   │   └── utils.py
│   ├── api/                # Pipeline orchestration
│   │   └── pipeline.py     # RAGImagingPipeline
│   ├── retriever/          # Retrieval components
│   │   ├── text_embedder.py
│   │   ├── vector_index.py
│   │   ├── reranker.py
│   │   └── software_doc.py
│   ├── generator/          # Agent prompts and schemas
│   │   ├── prompts.py
│   │   └── schema.py
│   ├── ui/                 # Gradio interface
│   │   ├── app.py
│   │   ├── handlers.py
│   │   ├── components.py
│   │   ├── formatters.py
│   │   ├── state.py
│   │   └── visualizations.py
│   ├── utils/              # Shared utilities
│   │   ├── config.py       # Configuration management
│   │   ├── file_validator.py
│   │   ├── image_meta.py   # Metadata extraction
│   │   ├── image_io.py
│   │   ├── previews.py
│   │   └── tags.py
│   ├── catalog/            # Catalog syncing
│   │   └── sync.py
│   └── cli.py              # CLI entry point
├── tests/                  # Test suite
│   ├── test_retrieval_pipeline.py
│   ├── test_repo_summary.py
│   └── data/
├── artifacts/              # Generated artifacts
│   └── rag_index/          # FAISS index
├── dataset/                # Catalog data
│   └── catalog.jsonl
├── logs/                   # Application logs
├── config.yaml             # Model configuration
├── pyproject.toml          # Project metadata & dependencies
├── Dockerfile              # Production Docker image
├── tools/image/Dockerfile  # Development Docker image
└── justfile                # Task runner commands

Local Development

# Install in development mode
pip install -e ".[dev]"

# Run tests
pytest tests/

Testing

Run the test suite:

# All tests
pytest tests/

# Specific test file
pytest tests/test_retrieval_pipeline.py

# With verbose output
pytest -v tests/

# With coverage
pytest --cov=ai_agent tests/

Logging & Debugging

Console Logs: Set LOGLEVEL_CONSOLE=DEBUG for verbose output

File Logs: Automatically saved to logs/app_YYYYMMDD.log (rotates daily)

Prompt Snapshots: Enable LOG_PROMPTS=1 to save:

logs/vlm_selector_YYYYMMDD_HHMMSS.txt - System/user prompts
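
The logging variables map naturally onto the standard library's logging module. The sketch below is an approximation under stated assumptions: configure_logging and the logger name are hypothetical, backupCount=7 is arbitrary, and TimedRotatingFileHandler names rotated files app.log.YYYY-MM-DD rather than the app_YYYYMMDD.log pattern described above.

```python
import logging
import os
from logging.handlers import TimedRotatingFileHandler
from pathlib import Path
from typing import Mapping


def configure_logging(env: Mapping[str, str] = os.environ) -> logging.Logger:
    """Attach console and (optionally) daily-rotating file handlers."""
    logger = logging.getLogger("ai_agent_sketch")  # hypothetical logger name
    logger.setLevel(logging.DEBUG)  # let each handler apply its own threshold
    for old in list(logger.handlers):
        logger.removeHandler(old)
        old.close()

    console = logging.StreamHandler()
    console.setLevel(env.get("LOGLEVEL_CONSOLE", "WARNING").upper())
    logger.addHandler(console)

    if env.get("FILE_LOG", "1") == "1":
        log_dir = Path(env.get("LOG_DIR", "logs"))
        log_dir.mkdir(parents=True, exist_ok=True)
        # Rotate at midnight; the retention count is an arbitrary choice.
        file_handler = TimedRotatingFileHandler(
            log_dir / "app.log", when="midnight", backupCount=7
        )
        file_handler.setLevel(env.get("LOGLEVEL_FILE", "INFO").upper())
        logger.addHandler(file_handler)
    return logger
```

With this layout, setting LOGLEVEL_CONSOLE=DEBUG makes the console verbose while the file log stays at its own level.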

📚 API & CLI Reference

CLI Commands

# Launch chat interface
ai_agent chat

# Sync catalog from GraphDB
ai_agent sync

🗺️ Maintainer Guide

For full project documentation with detailed folder responsibilities, environment defaults, and improvement guidelines, see docs/guide.md.

📝 Changelog

See CHANGELOG.md for detailed version history.

Recent Highlights

[1.0.0]

✨ New chat-based interface (ai_agent chat) with rich media and tool integration
🛠️ Fully agent-based architecture replacing legacy pipelines
🔍 Smarter retrieval with automatic retry
🔗 DeepWiki MCP integration for fast GitHub repository documentation access
🔧 YAML configuration (config.yaml) for flexible model and backend setup
🎨 Redesigned UI with Imaging Plaza branding and improved UX
⚡ Performance improvements (pre-embedding, caching, faster startup)
🧹 Major cleanup: removed deprecated code paths, legacy UI, and outdated tests

[0.1.3] - 2025-10-22

Gradio space runner tool
Repository info tool
UI fixes and polish

📄 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

🙏 Credits & Acknowledgments

Developed by: Imaging Plaza Team

Technologies:

Pydantic AI - AI agent framework
OpenAI - GPT vision model
FAISS - Vector search
BGE-M3 - Multilingual embeddings
Gradio - Interactive web UI
DeepWiki - GitHub repository documentation

Medical Imaging Formats:

pydicom - DICOM support
nibabel - NIfTI support

📮 Support

For issues, questions, or contributions, please contact the Imaging Plaza team.

🏥 Medical Disclaimer: This software is a tool recommendation system, not a diagnostic tool. Always consult qualified medical professionals for clinical decisions.
