
TheerapatGunthog/AutoLoggingSummarizeWithRAG


Auto Logging Summarization with RAG

Overview

This project implements a Retrieval-Augmented Generation (RAG) pipeline to automatically summarize user logs collected within a one-hour window.
The system analyzes log entries for each IP address, produces a concise summary, and classifies user behavior into:

  • No Harm
  • Harm

RAG enhances contextual understanding by combining relevant document retrieval with large language model (LLM) generation.
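
The per-IP, one-hour grouping described above can be sketched as follows. This is a minimal illustration rather than the repository's code: the `(timestamp, ip, message)` tuple layout and the function name `group_logs_by_ip_and_hour` are assumptions, and windows are assumed to align to clock hours.

```python
from collections import defaultdict
from datetime import datetime


def group_logs_by_ip_and_hour(entries):
    """Bucket (timestamp, ip, message) tuples by (ip, start of hour).

    The tuple layout is a hypothetical log format chosen for illustration.
    """
    buckets = defaultdict(list)
    for timestamp, ip, message in entries:
        # Truncate the timestamp to the start of its clock hour.
        hour_start = timestamp.replace(minute=0, second=0, microsecond=0)
        buckets[(ip, hour_start)].append(message)
    return dict(buckets)


# Made-up sample entries: two IPs, two different hours.
entries = [
    (datetime(2025, 1, 1, 10, 5), "10.0.0.1", "GET /login"),
    (datetime(2025, 1, 1, 10, 40), "10.0.0.1", "POST /login"),
    (datetime(2025, 1, 1, 11, 2), "10.0.0.1", "GET /admin"),
    (datetime(2025, 1, 1, 10, 15), "10.0.0.2", "GET /index.html"),
]
windows = group_logs_by_ip_and_hour(entries)
```

Each `(ip, hour_start)` bucket would then be the unit that is summarized and labeled.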


Architecture

  1. Input: Hourly log data grouped by IP.
  2. Preprocessing: Metadata extraction and OCR (via Docling) for PDF or text-based logs.
  3. Retriever: Hybrid search combining
    • Metadata filtering (topic, MITRE IDs)
    • Keyword search (BM25 via Whoosh)
    • Semantic vector search (FAISS + SentenceTransformer embeddings)
    • Fusion with Reciprocal Rank Fusion (RRF)
    • De-duplication with Maximal Marginal Relevance (MMR)
  4. LLM Summarization: LangChain-compatible LLM (e.g., Ollama backend) summarizes logs and predicts Harm/No Harm.
  5. Evaluation:
    Precision, Recall, and F1-score are computed by comparing the model outputs with ChatGPT (GPT-5) results as the reference standard.
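
The fusion and de-duplication steps of the retriever can be sketched in plain Python. This is an illustration of the general RRF and MMR techniques, not the repository's implementation: the function names, the constant `k=60` (a commonly used default), the `lam` trade-off weight, and the toy 2-D vectors are all assumptions.

```python
import math


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of doc IDs: score(d) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; doc ID breaks ties deterministically.
    return sorted(scores, key=lambda d: (-scores[d], d))


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query_vec, doc_vecs, candidates, top_n=3, lam=0.5):
    """Greedy MMR: trade relevance to the query against similarity to picks so far."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < top_n:
        def mmr_score(doc_id):
            relevance = cosine(query_vec, doc_vecs[doc_id])
            redundancy = max(
                (cosine(doc_vecs[doc_id], doc_vecs[s]) for s in selected),
                default=0.0,
            )
            return lam * relevance - (1 - lam) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical rankings from the keyword and vector retrievers.
bm25_ranking = ["a", "b", "c"]
vector_ranking = ["b", "d", "a"]
fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])

# Toy embeddings: "a" and "b" are near-duplicates, "d" points elsewhere.
doc_vecs = {"a": [1.0, 0.0], "b": [1.0, 0.1], "c": [-1.0, 0.0], "d": [0.0, 1.0]}
diverse = mmr_select([1.0, 1.0], doc_vecs, fused, top_n=2, lam=0.5)
```

Here `fused` orders the documents as `b, a, d, c`, and MMR keeps `b` and `d`: the near-duplicate `a` is skipped even though it ranks second after fusion.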

Results

Metric    | Without RAG (qwen3:8b) | With RAG
----------|------------------------|----------------
Precision | ~0.86                  | ~0.98
Recall    | ~0.55                  | ~0.55 (similar)
F1-score  | ~0.68                  | ~0.70

  • Recall and F1-score are close between the two setups, while precision is higher with RAG. Qualitative analysis shows:
    • Non-RAG tends to produce more “Undetermined” or “No Harm” responses.
    • RAG more accurately detects “Harm” cases when logs contain domain-specific or context-heavy information.
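
As a rough sketch of how the metrics above could be computed against reference labels: the function name `precision_recall_f1` and the sample labels are made up for illustration. It assumes “Harm” is the positive class and that any other response, including “Undetermined”, counts as not flagging harm; the repository may handle this differently.

```python
def precision_recall_f1(predicted, reference, positive="Harm"):
    """Compute precision, recall, and F1 for the positive label."""
    pairs = list(zip(predicted, reference))
    tp = sum(p == positive and r == positive for p, r in pairs)
    fp = sum(p == positive and r != positive for p, r in pairs)
    fn = sum(p != positive and r == positive for p, r in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up labels: model outputs vs. reference labels for five IP windows.
predicted = ["Harm", "No Harm", "Harm", "No Harm", "Undetermined"]
reference = ["Harm", "Harm", "No Harm", "No Harm", "Harm"]
precision, recall, f1 = precision_recall_f1(predicted, reference)
```

Under this convention, an “Undetermined” answer on a window the reference marks as Harm lowers recall but leaves precision unchanged.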

Engineering Insight

  • If logs include technical or domain-specific content with complex structure → RAG provides real benefit through better context retrieval.
  • If logs are short, patterned, or repetitive → non-RAG is sufficient and more efficient in latency and cost.

Requirements

pip install -r requirements.txt

Output

Each output includes:

  • One-hour log summary.
  • Label: No Harm or Harm.
  • Metadata references for transparency.
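
One possible shape for such an output record is sketched below. The class name `HourlySummary` and every field name are hypothetical; the README does not specify the actual output schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class HourlySummary:
    """Hypothetical container for one per-IP, one-hour result."""
    ip: str
    window_start: str          # ISO-8601 start of the one-hour window
    summary: str               # LLM-generated summary of the logs
    label: str                 # "No Harm" or "Harm"
    references: list = field(default_factory=list)  # metadata of retrieved documents


# Made-up example values for illustration only.
record = HourlySummary(
    ip="10.0.0.1",
    window_start="2025-01-01T10:00:00",
    summary="Repeated failed logins followed by access to an admin page.",
    label="Harm",
    references=[{"topic": "credential access"}],
)
serialized = json.dumps(asdict(record), indent=2)
```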
