cyxzdev/discoindex

discoindex

discord data exporter + nia-powered indexing & agent. scrapes your server's messages, normalizes them into clean CSVs, indexes them on nia, and lets you chat with (or build a bot on top of) your indexed data.

built on bun, crustjs, and nia. npm will not work here.

setup

bun install

copy the example config and fill in your credentials:

cp discoindex.toml.example discoindex.toml

you need at minimum a discord bot token. for indexing and agent modes, you also need a nia api key and an llm api key (any openai-compatible provider works).

modes

run bun run dev and you get a menu, or call each mode directly:

bun run dev scrape
bun run dev index
bun run dev agent
bun run dev bot

scrape also accepts flags to skip the interactive prompts:

bun run dev scrape --server 123456789 --channels 111,222,333
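a minimal sketch of how those two flags could be read. this is not the project's parser (the cli is built on crustjs, which handles this itself), and parseScrapeArgs is a hypothetical name. the one point worth showing: discord ids are snowflakes that overflow javascript's safe integer range, so they should stay strings.

```typescript
// Hypothetical helper, not the project's actual argument parser.
interface ScrapeArgs {
  server: string | undefined;
  channels: string[];
}

function parseScrapeArgs(argv: string[]): ScrapeArgs {
  let server: string | undefined = undefined;
  let channels: string[] = [];
  for (let i = 0; i < argv.length; i++) {
    const value = argv[i + 1];
    if (value === undefined) continue;
    if (argv[i] === "--server") server = value;
    if (argv[i] === "--channels") {
      // Keep ids as strings: real snowflakes exceed Number.MAX_SAFE_INTEGER.
      channels = value
        .split(",")
        .map((c) => c.trim())
        .filter((c) => c.length > 0);
    }
  }
  return { server, channels };
}
```

called with ["--server", "123456789", "--channels", "111,222,333"], it returns the server id as a string and the three channel ids as a string array.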

scrape

exports discord messages to CSV. give it a server id and channel ids (interactively or via the --server/--channels flags) and it pulls every message, including those in active and archived threads, normalizes them, and writes two files:

  • messages.csv with every individual message, mentions resolved to names, timestamps converted, embeds and attachments extracted
  • conversations.csv with messages grouped into conversations
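as an illustration of the "mentions resolved to names" step, here is a rough sketch. resolveMentions and the id-to-name map are hypothetical, not the project's code. the only thing taken as given is discord's raw mention syntax, <@ID> (with <@!ID> as the older nickname form).

```typescript
// Hypothetical helper: swap raw user mentions for readable names.
function resolveMentions(content: string, names: Map<string, string>): string {
  return content.replace(/<@!?(\d+)>/g, (raw: string, id: string) => {
    const name = names.get(id);
    // Unknown ids are left untouched rather than guessed at.
    return name === undefined ? raw : `@${name}`;
  });
}
```

for example, with a map of "111" to "alice", the text "thanks <@111>" becomes "thanks @alice", while a mention of an unknown id stays as it was.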

the conversation grouping is the interesting part. it does three passes:

  1. threads get grouped as their own conversations. long threads get chunked so they stay within embedding token limits
  2. reply chains (3+ messages deep) get stitched together by walking the referenced_message_id links backwards. 2-message pairs are intentionally left out because they work better with surrounding context
  3. time windows pick up everything that's left. messages within N minutes of each other (default 30) get grouped together, capped at a max per conversation
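the second pass can be sketched roughly like this. it is a simplified illustration, not the actual implementation: the Msg shape and stitchReplyChains are made-up names, and branching replies are handled naively (the first leaf to claim a message keeps it).

```typescript
interface Msg {
  id: string;
  referenced_message_id: string | null;
}

// Walk reply links backwards from each leaf and keep chains of minDepth+ messages.
function stitchReplyChains(messages: Msg[], minDepth = 3): string[][] {
  const byId = new Map<string, Msg>();
  const repliedTo = new Set<string>();
  for (const m of messages) {
    byId.set(m.id, m);
    if (m.referenced_message_id !== null) repliedTo.add(m.referenced_message_id);
  }

  const consumed = new Set<string>();
  const chains: string[][] = [];
  for (const leaf of messages) {
    // A leaf is a reply that nobody replied to.
    if (leaf.referenced_message_id === null || repliedTo.has(leaf.id)) continue;

    const chain: string[] = [];
    let current: Msg | undefined = leaf;
    // Stop at a missing parent, an already-used message, or a cycle.
    while (current !== undefined && !consumed.has(current.id) && !chain.includes(current.id)) {
      chain.push(current.id);
      const parentId: string | null = current.referenced_message_id;
      current = parentId === null ? undefined : byId.get(parentId);
    }

    if (chain.length >= minDepth) {
      chain.reverse(); // oldest message first
      for (const id of chain) consumed.add(id);
      chains.push(chain);
    }
  }
  return chains;
}
```

given a, b replying to a, and c replying to b, this yields one chain [a, b, c]. a lone pair such as e replying to d yields nothing, so both messages stay available for the time-window pass.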

messages already consumed by threads or reply chains don't appear in time windows, so nothing gets double-counted. my goal was to produce high quality chunks that embed well in nia and surface the right results when searched.
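a sketch of that last pass and the no-double-counting rule, under a few assumptions: "within N minutes of each other" is read as the gap between consecutive messages, the cap of 50 is a placeholder rather than the real default, and groupByTimeWindow is a hypothetical name.

```typescript
interface TimedMsg {
  id: string;
  timestampMs: number;
}

// Group leftover messages by time gap, skipping anything already consumed.
function groupByTimeWindow(
  messages: TimedMsg[],
  consumed: Set<string>,
  windowMinutes = 30,
  maxPerConversation = 50, // placeholder cap, not the project's default
): string[][] {
  const windowMs = windowMinutes * 60 * 1000;
  const remaining = messages
    .filter((m) => !consumed.has(m.id))
    .sort((a, b) => a.timestampMs - b.timestampMs);

  const groups: string[][] = [];
  let current: string[] = [];
  let lastTs = 0;
  for (const m of remaining) {
    const gapTooLarge = m.timestampMs - lastTs > windowMs;
    // Start a new conversation on a long gap or when the cap is reached.
    if (current.length > 0 && (gapTooLarge || current.length >= maxPerConversation)) {
      groups.push(current);
      current = [];
    }
    current.push(m.id);
    lastTs = m.timestampMs;
  }
  if (current.length > 0) groups.push(current);
  return groups;
}
```

the consumed set would hold the ids already claimed by threads and reply chains, which is what keeps a message from showing up in two conversations.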

also supports user tokens (self-bot mode) via token_type = "user" in the config, with cookie and x-super-properties headers for browser session spoofing. this violates discord's TOS and is unstable — use at your own risk.

index

takes the CSVs from scrape and uploads them to nia as a local folder source. you pick which files to index, and it copies them into a directory in the project root (.{id}-nia-index-local), registers it with the nia cli, and syncs. these directories persist: they're not cleaned up automatically (by design).

agent

interactive chat with your indexed data. picks an llm (configurable, any openai-compatible endpoint), gives it a search_knowledge tool backed by nia search, and runs a tool-use loop. the agent can search multiple times per question if it needs to, up to 5 rounds.
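the tool-use loop could look something like the sketch below. it is deliberately abstract: the model and the search tool are injected as plain functions, so no real llm or nia api calls appear, and every name here (ModelTurn, runAgent, the fallback string) is an assumption for illustration. whether the real agent makes one last model call after the fifth search is also a guess.

```typescript
type ModelTurn =
  | { kind: "answer"; text: string }
  | { kind: "search"; query: string };

// Stand-ins for the llm call and the search_knowledge tool.
type Model = (question: string, findings: string[]) => Promise<ModelTurn>;
type Search = (query: string) => Promise<string>;

async function runAgent(
  question: string,
  model: Model,
  searchKnowledge: Search,
  maxRounds = 5,
): Promise<string> {
  const findings: string[] = [];
  for (let round = 0; round < maxRounds; round++) {
    const turn = await model(question, findings);
    if (turn.kind === "answer") return turn.text;
    // The model asked for a search: run it and feed the result back.
    findings.push(await searchKnowledge(turn.query));
  }
  // Search budget spent: give the model one last chance to answer.
  const last = await model(question, findings);
  return last.kind === "answer" ? last.text : "no answer within the round limit";
}
```

the round cap is what keeps a model that never stops asking for searches from looping forever.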

you select which nia sources to chat with on startup. supports local folder sources.

bot

same agent pipeline but running as a discord bot. when someone pings the bot in an enabled channel, it creates a thread and answers there. keeps responding in that thread if pinged again. also supports DMs.

the bot pulls recent conversation history (configurable window) so it has context on what was already discussed. the thinking message while it generates is configurable in the toml.
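roughly, trimming history by a time window and a message cap might look like this. it is only a sketch: selectHistory is a hypothetical name, and treating the window as minutes is an assumption (the real units are whatever discoindex.toml.example documents).

```typescript
interface HistoryMsg {
  author: string;
  content: string;
  timestampMs: number;
}

// Keep messages inside the window, oldest first, limited to the newest maxMessages.
function selectHistory(
  messages: HistoryMsg[],
  nowMs: number,
  windowMinutes: number,
  maxMessages: number,
): HistoryMsg[] {
  if (maxMessages <= 0) return [];
  const cutoff = nowMs - windowMinutes * 60 * 1000;
  const recent = messages
    .filter((m) => m.timestampMs >= cutoff && m.timestampMs <= nowMs)
    .sort((a, b) => a.timestampMs - b.timestampMs);
  return recent.slice(-maxMessages);
}
```

applying the cap after the window filter means a busy thread sends only its most recent messages to the model.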

config

everything lives in discoindex.toml. see discoindex.toml.example for the full reference.

the important sections:

  • [discord] bot token, token type
  • [export] output dir, conversation window settings, bot/system message filters
  • [nia] api key for indexing and search
  • [llm] base url, api key, model (openai-compatible)
  • [agent_prompt] identity, context, voice rules, reference date
  • [support_bot] enabled channels, dm toggle, history window, max history messages, thinking message

build

bun run build

compiles to a standalone binary (discoindex / discoindex.exe).

tests

bun test

