16 changes: 16 additions & 0 deletions README.md
@@ -6,6 +6,22 @@
<img src="https://img.shields.io/badge/tests-140_passing-green" alt="140 tests">
</p>

A local AI learning assistant with long-term memory, role-based group chat,
web search, model routing and context-tier management.

## Highlights

- **Multi-provider LLM client**: OpenAI / DeepSeek / OpenRouter / SiliconFlow / local models
- **Model routing** with fast / light / deep / archive context tiers
- **Long-term memory** based on Markdown files and safe-writer persistence
- **Web search pipeline**: RSS fetch → article extraction → LLM digest → source-traced discussion
- **SSRF protection** for article fetching, **detect-secrets** in CI
- **Batched session logging** and multi-layer caching for performance
- **Performance budget**: per-mode `max_tokens` bounds on every LLM call
- **140 tests**, Ruff clean, GitHub Actions CI

---

**A local AI study-companion system for personal learning review**, with support for role-based group chat, web search, long-term memory, and after-session summaries.

> Not yet another AI Q&A tool, but an AI learning companion that remembers what you are studying.
73 changes: 73 additions & 0 deletions docs/ARCHITECTURE.md
@@ -0,0 +1,73 @@
# Architecture

```
┌─────────────────────────────────────────────────────────────┐
│ Streamlit Runtime │
│ app.py — entry point, fragment orchestration │
├─────────────────────────────────────────────────────────────┤
│ src/ui/ │
│ ├── sidebar.py Settings, modes, export │
│ ├── status_bar.py Status cards, stats, perf │
│ ├── chat_panel.py Single-chat UI │
│ ├── wechat_panel.py Group-chat UI + news phases │
│ ├── after_session_panel.py Post-session review │
│ ├── session_state.py init / refresh helpers │
│ └── theme.py Catppuccin dark theme │
├─────────────────────────────────────────────────────────────┤
│ src/ │
│ ├── llm_client.py Chat / stream, auto-reconnect │
│ ├── llm_router.py LLM-based routing (JSON mode) │
│ ├── context_builder.py System prompt assembly │
│ ├── config.py Multi-provider config │
│ ├── router.py Route resolution │
│ ├── mode_manager.py Runtime modes, YAML truth │
│ ├── performance_budget.py Max-tokens by mode │
│ ├── role_manager.py Role loading │
│ ├── model_stats.py Usage tracking │
│ │ │
│ ├── memory.py File-based memory with LRU cache │
│ ├── memory_writer.py Structured memory updates │
│ ├── memory_tools.py Read/write tool functions │
│ │ │
│ ├── wechat_format.py Text formatting, role parsing │
│ ├── wechat_state.py Group state I/O │
│ ├── wechat_generator.py LLM generation (opening/reply/ │
│ │ discussion) │
│ ├── wechat_prompt.py Prompt template loading │
│ ├── wechat_memory.py Memory candidate extraction │
│ ├── wechat_service.py High-level orchestration │
│ │ │
│ ├── session_logger.py Session persistence, batch flush │
│ ├── safe_writer.py Atomic writes, retry, backup │
│ ├── health_check.py Read-only health probes │
│ └── news/ News pipeline (see NEWS_PIPELINE) │
├─────────────────────────────────────────────────────────────┤
│ config/runtime_state.yaml — Single source of truth │
│ memory/ — Markdown memory files │
│ chat/ — Group chat transcripts │
│ roles/ — Role definitions │
│ templates/ — Prompt templates │
└─────────────────────────────────────────────────────────────┘
```

## Layers

| Layer | Responsibility |
|---|---|
| **UI** | Streamlit fragments, user interaction, display |
| **Orchestration** | wechat_service.py ties news + memory + generation |
| **LLM** | Client, routing, context assembly, budget control |
| **Memory** | File-based, tiered context groups, safe writer |
| **News** | RSS fetch → article extraction → digest → discussion |
| **State** | YAML truth → Markdown views, synced at runtime |

## Fragment Model

`app.py` uses `@st.fragment` to isolate re-renders:

- `render_sidebar_fragment` — settings, state toggles, actions
- `render_status_fragment` — status cards, stats line
- `render_single_main_fragment` — chat UI
- `render_after_session_fragment` — post-session review

Sidebar actions with global effects use `st.rerun()` (full-page rerun) to refresh all fragments.
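
A minimal sketch of the fragment pattern (widget names and state keys here are illustrative, not the real `app.py` code):

```python
import streamlit as st

@st.fragment
def render_status_fragment() -> None:
    # Re-runs in isolation when widgets inside it change,
    # without re-rendering the rest of the page.
    st.metric("Sessions", st.session_state.get("session_count", 0))

@st.fragment
def render_sidebar_fragment() -> None:
    with st.sidebar:
        if st.button("Reset modes"):
            # A global-affecting action: trigger a full-page rerun
            # so every fragment picks up the new state.
            st.session_state["performance_mode"] = "standard"
            st.rerun()

render_sidebar_fragment()
render_status_fragment()
```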
44 changes: 44 additions & 0 deletions docs/CONTEXT_TIERS.md
@@ -0,0 +1,44 @@
# Context Tiers

The system selects which memory files to include in the LLM context based on the current performance mode. This balances response quality against token usage and latency.

## Tier Definitions

Defined in `src/memory.py` (`CONTEXT_FILE_GROUPS`):

| Tier | Files | Use Case |
|---|---|---|
| **fast** | `index.md`, `current_focus.md` | Quick lookup, simple Q&A |
| **light** | + `summary.md`, `learner_profile.md` | Default daily chat |
| **deep** | + `progress.md`, `project_context.md`, `task_board.md` | Complex reasoning, project review |
| **archive** | + `archive_summary.md`, `agent.md`, `system_detail.md` | Full context, session archive |
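
Since each tier extends the previous one, the group table can be built cumulatively. A sketch of that shape (the private constants are illustrative; only `CONTEXT_FILE_GROUPS` is named in the source):

```python
# Each tier extends the previous one, mirroring the table above.
_FAST = ["index.md", "current_focus.md"]
_LIGHT = _FAST + ["summary.md", "learner_profile.md"]
_DEEP = _LIGHT + ["progress.md", "project_context.md", "task_board.md"]
_ARCHIVE = _DEEP + ["archive_summary.md", "agent.md", "system_detail.md"]

CONTEXT_FILE_GROUPS = {
    "fast": _FAST,
    "light": _LIGHT,
    "deep": _DEEP,
    "archive": _ARCHIVE,
}
```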

## Resolution

`context_mode` is derived from `performance_mode` in `RuntimeModes` (`src/mode_manager.py`):

- `fast` → `fast`
- `standard` → `light`
- `deep` → `deep`
- No direct UI path to `archive` — used programmatically for archival tasks
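
Expressed as code, the mapping might look like this (a sketch; the helper name `resolve_context_mode` is hypothetical):

```python
_PERFORMANCE_TO_CONTEXT = {
    "fast": "fast",
    "standard": "light",
    "deep": "deep",
}

def resolve_context_mode(performance_mode: str) -> str:
    # "archive" is never returned here; callers request it
    # explicitly for archival tasks.
    return _PERFORMANCE_TO_CONTEXT.get(performance_mode, "light")
```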

## Memory Files

Path: `memory/`

| File | Content |
|---|---|
| `index.md` | Learner name, preferred roles, brief background |
| `current_focus.md` | What the learner is currently working on |
| `summary.md` | Session summaries, key learnings |
| `learner_profile.md` | Learning style, strengths, weaknesses |
| `progress.md` | Version-tracked progress log |
| `project_context.md` | Project description, goals, constraints |
| `task_board.md` | Active tasks, backlog |
| `archive_summary.md` | Archived session records |
| `agent.md` | Agent self-configuration notes |
| `system_detail.md` | Technical system context |

## Caching

Memory files are LRU-cached with invalidation on file signature change (`src/memory.py:_read_text_file_cached`). Cache size: 64 entries.
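
One common way to implement signature-keyed LRU caching, shown as a sketch (only `_read_text_file_cached` is named in the source; the other helpers are illustrative):

```python
from functools import lru_cache
from pathlib import Path

def _file_signature(path: Path) -> tuple[float, int]:
    # mtime + size; any change yields a new signature and thus a cache miss.
    stat = path.stat()
    return (stat.st_mtime, stat.st_size)

@lru_cache(maxsize=64)
def _read_text_file_cached(path: str, signature: tuple[float, int]) -> str:
    # `signature` is part of the cache key; it is not used in the body.
    return Path(path).read_text(encoding="utf-8")

def read_memory_file(path: Path) -> str:
    return _read_text_file_cached(str(path), _file_signature(path))
```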
67 changes: 67 additions & 0 deletions docs/MEMORY_SYSTEM.md
@@ -0,0 +1,67 @@
# Memory System

## Overview

File-based long-term memory using Markdown files, managed through a truth hierarchy. No vector store or external database — designed for zero-infrastructure local operation.

## Truth Hierarchy

```
config/runtime_state.yaml (authoritative)
memory/internal_state.md (human-readable view, synced)
memory/interaction_settings.md
chat/wechat_state.md
```

`mode_manager.py` syncs views from YAML on read. Any write goes through `_write_runtime_state()` which updates YAML, then propagates to view files.
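
A minimal sketch of that write path, assuming PyYAML for serialization (the view-rendering helper is hypothetical, and the real project routes file writes through `safe_writer.py`):

```python
from pathlib import Path

import yaml

RUNTIME_STATE = Path("config/runtime_state.yaml")

def _write_runtime_state(state: dict) -> None:
    # 1. Update the single source of truth first.
    RUNTIME_STATE.write_text(
        yaml.safe_dump(state, allow_unicode=True), encoding="utf-8"
    )
    # 2. Propagate the human-readable views (one shown here); the real
    #    project performs these writes via safe_writer.py.
    Path("memory/internal_state.md").write_text(
        _render_view(state), encoding="utf-8"
    )

def _render_view(state: dict) -> str:
    lines = ["# Internal State", ""]
    lines += [f"- **{key}**: {value}" for key, value in state.items()]
    return "\n".join(lines) + "\n"
```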

## File Layout

```
memory/
├── index.md Learner identity, preferences
├── current_focus.md Active learning focus
├── summary.md Session summaries
├── learner_profile.md Learning style, strengths
├── progress.md Versioned progress
├── project_context.md Project description
├── task_board.md Task tracking
├── archive_summary.md Archived history
├── agent.md Agent notes
├── system_detail.md Technical context
├── internal_state.md Runtime state view (synced)
├── interaction_settings.md Interaction state view (synced)
└── pending_updates/
├── wechat_memory_candidates.md LLM-extracted candidates
└── wechat_memory_candidates.json Structured candidate data
```

## Memory Operations

### Reading

`memory.py:_read_text_file_cached(path, signature) → str`

- LRU-cached (64 entries), invalidated on file signature change
- Context-mode selection via `CONTEXT_FILE_GROUPS` (see CONTEXT_TIERS.md)
- `extract_core_section()` strips frontmatter for lightweight reads

### Writing

All writes go through `memory_writer.py` → `safe_writer.py`:

1. **Preview**: Generate update suggestions → user reviews
2. **Confirm**: User selects which updates to apply
3. **Write**: `safe_write_text()` with atomic temp-file + retry + backup
4. **Flush**: Updated context available on next memory bundle refresh
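
A sketch of the atomic temp-file pattern behind step 3 (the real `safe_write_text()` in `src/safe_writer.py` is authoritative):

```python
import os
import shutil
import tempfile
from pathlib import Path

def safe_write_text(path: Path, text: str, retries: int = 3) -> None:
    # Keep a backup of the previous version before touching anything.
    if path.exists():
        shutil.copy2(path, path.with_suffix(path.suffix + ".bak"))
    for attempt in range(retries):
        try:
            # Write to a temp file in the same directory, then swap it in.
            fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
            with os.fdopen(fd, "w", encoding="utf-8") as handle:
                handle.write(text)
                handle.flush()
                os.fsync(handle.fileno())
            os.replace(tmp, path)  # atomic: readers never see a partial file
            return
        except OSError:
            if attempt == retries - 1:
                raise
```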

### Group Chat Memory Extraction

`wechat_memory.py` extracts memory candidates from group chat discussions:

- Triggered by configurable `memory_capture_mode` (manual/auto)
- LLM extracts structured candidates from chat history
- Results stored as Markdown + JSON in `memory/pending_updates/`
- Candidates reviewed before committing to main memory files
57 changes: 57 additions & 0 deletions docs/MODEL_ROUTING.md
@@ -0,0 +1,57 @@
# Model Routing

## Multi-Provider LLM Client

`src/llm_client.py` provides a unified interface across 5 LLM providers:

| Provider | Env Prefix | Default Base URL |
|---|---|---|
| DeepSeek | `DEEPSEEK_*` | `https://api.deepseek.com/v1` |
| OpenAI | `OPENAI_*` | — |
| OpenRouter | `OPENROUTER_*` | `https://openrouter.ai/api/v1` |
| SiliconFlow | `SILICONFLOW_*` | `https://api.siliconflow.cn/v1` |
| Local | `LOCAL_*` | `http://127.0.0.1:8000/v1` |

Selection via `LLM_PROVIDER_PROFILE` env var. Client instances are cached by config signature and automatically rebuilt when settings change.
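
A sketch of signature-keyed client caching, assuming an OpenAI-compatible SDK and `*_BASE_URL` / `*_API_KEY` env-var suffixes (the suffix names are assumptions):

```python
import os

from openai import OpenAI

_client_cache: dict[tuple[str, str, str], OpenAI] = {}

def get_client() -> OpenAI:
    profile = os.environ.get("LLM_PROVIDER_PROFILE", "deepseek").upper()
    base_url = os.environ.get(f"{profile}_BASE_URL", "https://api.deepseek.com/v1")
    api_key = os.environ.get(f"{profile}_API_KEY", "")
    signature = (profile, base_url, api_key)
    # Rebuild the client only when the config signature changes.
    if signature not in _client_cache:
        _client_cache[signature] = OpenAI(base_url=base_url, api_key=api_key)
    return _client_cache[signature]
```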

## Model Profiles

Two model tiers:

- **flash**: Fast, low-cost model for daily chat and group replies
- **pro**: Higher-quality model for summaries, routing, and complex reasoning

Resolution logic (`src/wechat_generator.py:_resolve_model_profile`):

```
performance_mode = deep → pro
performance_mode = fast → flash
selected_model = pro → pro
default → flash
```
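
In Python, the same decision table might read (a sketch; the source's `_resolve_model_profile` is authoritative):

```python
def _resolve_model_profile(performance_mode: str, selected_model: str | None) -> str:
    if performance_mode == "deep":
        return "pro"
    if performance_mode == "fast":
        return "flash"
    if selected_model == "pro":
        return "pro"
    return "flash"
```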

## LLM Router

`src/llm_router.py` performs LLM-based routing when `route_mode == "hybrid"` and `performance_mode != "fast"`. It calls the LLM with a JSON prompt to determine the best role, mode, and model for a user query.

Valid outputs:

- **role**: march7 (casual), keqing (project), nahida (concept), firefly (wrap-up)
- **mode**: 普通 (normal), 苏格拉底 (Socratic), 费曼 (Feynman), 项目 (project), 论文 (paper), 概念地图 (concept map)
- **model**: flash, pro
- **confidence**: high, medium, low

Route caching via `st.session_state.current_route` — cleared when settings change.
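
A sketch of a JSON-mode routing call, assuming an OpenAI-compatible client (the prompt wording and model name are illustrative):

```python
import json

def route_query(client, query: str) -> dict:
    prompt = (
        "Classify the user query. Answer with JSON only, using keys "
        '"role", "mode", "model" (flash|pro) and "confidence" (high|medium|low).\n'
        f"Query: {query}"
    )
    response = client.chat.completions.create(
        model="deepseek-chat",  # illustrative; routing runs on the pro tier
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},  # force parseable output
        max_tokens=200,
    )
    return json.loads(response.choices[0].message.content)
```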

## Performance Budget

All LLM calls are bounded by `src/performance_budget.py`:

| Call Point | Fast | Standard | Deep |
|---|---|---|---|
| Single chat | 700 | 1100 | 1600 |
| Group reply | 520 | 760 | 1050 |
| Opening | 420 | 620 | 850 |
| News digest | 650 | 950 | 1300 |
| News discussion | 520 | 760 | 1000 |
| History lines | 16 | 28 | 40 |
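
These budgets amount to a simple mode-indexed lookup, sketched below (the real module exposes per-call helpers such as `news_digest_max_tokens(performance_mode)`; this consolidated table is illustrative):

```python
_BUDGETS = {
    # call point: (fast, standard, deep) max_tokens
    "single_chat": (700, 1100, 1600),
    "group_reply": (520, 760, 1050),
    "opening": (420, 620, 850),
    "news_digest": (650, 950, 1300),
    "news_discussion": (520, 760, 1000),
}
_MODE_INDEX = {"fast": 0, "standard": 1, "deep": 2}

def max_tokens_for(call_point: str, performance_mode: str) -> int:
    # History-line limits (16/28/40) follow the same mode-indexed pattern.
    return _BUDGETS[call_point][_MODE_INDEX.get(performance_mode, 1)]
```

For example, `max_tokens_for("news_digest", "deep")` returns 1300.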
105 changes: 105 additions & 0 deletions docs/NEWS_PIPELINE.md
@@ -0,0 +1,105 @@
# News Pipeline

Multi-source news aggregation pipeline: search → fetch → extract → digest → discuss → trace.

## Pipeline Stages

```
User query
1. Multi-source RSS fetch
├── Google News RSS
├── Bing News RSS
└── RSSHub (domestic Chinese sources)
2. Dedup + sort + truncate (max 10 items)
3. Link resolution (top N = resolve_top_n, bounded)
4. Article text extraction (top 5 pages, max 5000 chars each)
├── trafilatura (primary)
├── readability-lxml (fallback)
└── raw <p> text (last resort)
5. Digest generation (LLM-summarized)
6. Group discussion (4 roles discuss the news)
7. Source block written to chat transcript
```

## Stage Detail

### 1. RSS Fetch

`src/news/rss_fetcher.py` — parallel multi-source fetch:

- Google News: `https://news.google.com/rss/search?q={query}&hl=zh-CN`
- Bing News: `https://www.bing.com/news/search?q={query}&format=rss`
- RSSHub: Configurable domestic sources
- 600-second article cache per query
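
A sketch of a parallel fetch across the first two sources, using `feedparser` (RSSHub sources and the 600-second cache are omitted; the function name is hypothetical):

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import quote

import feedparser

def fetch_all(query: str) -> list[dict]:
    q = quote(query)
    urls = [
        f"https://news.google.com/rss/search?q={q}&hl=zh-CN",
        f"https://www.bing.com/news/search?q={q}&format=rss",
    ]
    # Fetch feeds in parallel so one slow source does not block the others.
    with ThreadPoolExecutor(max_workers=len(urls)) as pool:
        feeds = pool.map(feedparser.parse, urls)
    return [
        {"title": entry.get("title", ""), "link": entry.get("link", "")}
        for feed in feeds
        for entry in feed.entries
    ]
```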

### 2. Dedup

Title normalization + set-based dedup. Per-query cache (10 min TTL).
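
A sketch of the dedup step (the exact normalization rule is an assumption):

```python
import re

def dedup_items(items: list[dict], limit: int = 10) -> list[dict]:
    seen: set[str] = set()
    unique = []
    for item in items:
        # Normalize: lowercase, strip whitespace and punctuation noise.
        key = re.sub(r"\W+", "", item["title"].lower())
        if key and key not in seen:
            seen.add(key)
            unique.append(item)
    return unique[:limit]
```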

### 3. Link Resolution

`src/news/link_resolver.py` — resolves Google News redirect URLs to actual article URLs. Only resolves top N items (configured via `resolve_top_n`).

### 4. Article Extraction

`src/news/article_fetcher.py` — layered extraction with SSRF protection:

- **Trafilatura**: Fast, accurate extraction for well-formed pages
- **Readability**: Better for complex layouts
- **Raw text**: `<p>` tag concatenation as last resort

Method label tracked per article for quality monitoring.
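
A sketch of the layered fallback with its method label, assuming `trafilatura`, `readability-lxml`, and `lxml` (SSRF checks happen at fetch time and are not shown):

```python
import trafilatura
from lxml import html as lxml_html
from readability import Document

def extract_article(page_html: str, url: str) -> tuple[str, str]:
    """Return (text, method_label), trying extractors from best to crudest."""
    text = trafilatura.extract(page_html, url=url)
    if text:
        return text[:5000], "trafilatura"
    try:
        summary_html = Document(page_html).summary()
        text = lxml_html.fromstring(summary_html).text_content().strip()
        if text:
            return text[:5000], "readability"
    except Exception:
        pass
    # Last resort: concatenate raw <p> text.
    paragraphs = lxml_html.fromstring(page_html).xpath("//p/text()")
    return " ".join(p.strip() for p in paragraphs)[:5000], "raw"
```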

### 5. Digest

`src/news/digest.py` — LLM generates a structured digest with:

- Article coverage summary (which articles were used)
- Key points from each source
- Token-bounded by `news_digest_max_tokens(performance_mode)`

### 6. Discussion

`wechat_generator.py:generate_wechat_news_discussion()` — 4 characters discuss the digest:

- Bound by `news_discussion_max_tokens(performance_mode)`
- Each character references specific news points
- Group state synced after discussion

### 7. Source Tracing

After each news round, a source block is appended to the group chat transcript:

```
【联网检索】
查询:xxx
1. Title | Source | Date | Body status
URL
```

This ensures all discussion claims are traceable to their sources.

## UI Flow

The entry page (`wechat_panel.py`) provides a 4-phase stepper:

1. **Search** — Enter query, configure max articles
2. **Fetch articles** — Read page text (optional)
3. **Generate digest** — LLM summary
4. **Discuss in group** — 4-role news discussion

Each phase is triggered by its own button, making progress visible incrementally.