
SAYOUNCDR/PolySEE


🎓 Multilingual Campus Chatbot

Helping campus teams answer repetitive student questions through a multilingual, retrieval-augmented assistant.


Problem & Goal

  • Students queue at campus offices for answers already buried inside circulars, PDF notices, and policy documents.
  • Staff repeatedly answer the same questions, often in multiple languages, which slows service delivery.
  • Students prefer conversational guidance over searching static documents.

Goal: Provide a campus-friendly chatbot that can ingest official documents, understand queries in multiple languages, and respond accurately while remaining easy for student volunteers to maintain.


Tech Stack

  • Frontend: React 19, Vite 7, Tailwind CSS 4 (via @tailwindcss/vite), Axios, React Router 7.
  • Auth Service: Node.js 20+, Express 5, Mongoose 8, JWT, Winston.
  • Backend API: Python 3.10+/FastAPI, LangChain Community & Text Splitters, ChromaDB, Ollama embeddings (nomic-embed-text), Google Generative AI SDK, pdfplumber/PyPDF2.
  • Database: MongoDB Community Server 6.x (local) for user accounts and chat history.
  • Tooling: npm 10, pip/venv, Ollama runtime, Postman for API checks.

Data flow summary:

  1. Users interact with the React UI. Requests that require authentication are routed through the Express auth gateway backed by MongoDB.
  2. Authenticated chat requests call the FastAPI service, which retrieves supporting context from ChromaDB using embeddings generated by Ollama.
  3. Context is injected into Gemini for response generation, then returned to the frontend along with metadata for dashboards.
  4. Admin uploads invoke asynchronous ingestion pipelines (PDF parse → chunk → embed → persist), with approval gates before student access.
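The "chunk" step of the ingestion pipeline can be illustrated with a minimal sliding-window splitter. This is a sketch rather than the project's code. The backend uses LangChain's text splitters, and `chunk_text` and its `chunk_size`/`overlap` defaults are hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap` chars.

    Hypothetical stand-in for the LangChain text splitter used by the backend;
    the default sizes are illustrative, not the project's settings.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

For example, `chunk_text("abcdefghij", chunk_size=4, overlap=1)` returns `["abcd", "defg", "ghij"]`. Because neighbouring chunks share characters, a sentence that straddles a boundary can still be retrieved from either side.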

Core Features & Trade-offs

  • Document-grounded answers: Uploaded PDFs seed the knowledge base, so responses align with official guidelines. Trade-off: The bot is not useful until an initial ingestion step has run.
  • Multilingual chat UI: Supports English, Hindi, and regional languages via Gemini’s multilingual capabilities. Trade-off: Dependence on Gemini API availability.
  • Role-based access: JWT-backed auth distinguishes students vs. admins. Trade-off: Requires MongoDB instance and token management.
  • Admin approval flow: Prevents unverified documents from affecting answers. Trade-off: Adds manual review step for admins.
  • Activity & document dashboards: Surface retrieval context and recent uploads for demonstrations. Trade-off: Minimal analytics; relies on log polling.
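Approval gating and context injection work together. Students only see approved documents, while the admin view (as with /admin_chat) also sees unapproved ones. The retrieved chunks are then placed into the Gemini prompt. The sketch below illustrates that flow under stated assumptions:

- Word overlap stands in for Chroma's embedding similarity.
- `Chunk`, its `approved` field, `retrieve`, `build_prompt`, and the prompt wording are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    filename: str
    text: str
    approved: bool  # hypothetical flag set by the admin approval step


def retrieve(chunks: list[Chunk], query: str, is_admin: bool, k: int = 3) -> list[Chunk]:
    """Return up to k chunks ranked by word overlap with the query.

    Students only see approved chunks; admins also see unapproved ones.
    Word overlap is a toy stand-in for embedding similarity in Chroma.
    """
    query_words = set(query.lower().split())
    visible = [c for c in chunks if is_admin or c.approved]
    scored = [(len(query_words & set(c.text.lower().split())), c) for c in visible]
    scored = [(score, c) for score, c in scored if score > 0]  # drop non-matches
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep input order
    return [c for _, c in scored[:k]]


def build_prompt(query: str, context: list[Chunk]) -> str:
    """Inject retrieved context into the LLM prompt (hypothetical wording)."""
    if not context:
        return f"No official documents matched. Say so politely.\nQuestion: {query}"
    sources = "\n".join(f"[{c.filename}] {c.text}" for c in context)
    return (
        "Answer using only the context below, in the same language as the question.\n"
        f"Context:\n{sources}\nQuestion: {query}"
    )
```

The empty-context branch mirrors the documented behaviour of stating missing knowledge rather than guessing.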

Setup & Run

1. Prerequisites

  • Node.js 20+, npm 10+.
  • Python 3.10 or 3.11 with python -m venv.
  • MongoDB Community Server running locally (mongod).
  • Ollama (Windows installer) with nomic-embed-text model (ollama pull nomic-embed-text).
  • Google API key with Gemini access (GOOGLE_API_KEY).

2. Clone & Install

git clone <repo-url>
cd PolySEE

3. Environment Variables

  • Copy examples and fill secrets (Windows):
    • copy Auth\.env.example Auth\.env
    • copy backend\core\.env.example backend\core\.env
  • Update the following values:
    • Auth/.env: MONGO_URI, JWT_SECRET, FASTAPI_URL, FRONTEND_ORIGIN (the allowed frontend origin).
    • backend/core/.env: GOOGLE_API_KEY, optional tuning for chunking and persistence.
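Before starting the services, you can check that the required keys listed above are present and non-empty.

- `parse_env`, `missing_keys`, and `REQUIRED_KEYS` are hypothetical helpers.
- The parser handles only simple `KEY=VALUE` lines; it does not support `export` prefixes, inline comments, or multi-line values.
- The optional tuning keys are not checked.

```python
# Required keys taken from the list above; optional tuning keys are not checked.
REQUIRED_KEYS = {
    "Auth/.env": ["MONGO_URI", "JWT_SECRET", "FASTAPI_URL", "FRONTEND_ORIGIN"],
    "backend/core/.env": ["GOOGLE_API_KEY"],
}


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")  # drop surrounding quotes
    return values


def missing_keys(text: str, required: list[str]) -> list[str]:
    """Return the required keys that are absent or have an empty value."""
    values = parse_env(text)
    return [key for key in required if not values.get(key)]
```

For example, call `missing_keys(Path("backend/core/.env").read_text(), REQUIRED_KEYS["backend/core/.env"])` after importing `Path` from `pathlib`. It should return an empty list once the file is filled in.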

4. Backend (FastAPI)

cd backend\core
python -m venv .venv
.venv\Scripts\activate
pip install --upgrade pip
pip install -r requirements.txt
uvicorn app:app --host 127.0.0.1 --port 8000 --reload

Ensure Ollama is running (ollama serve) and the embedding model is downloaded before ingesting documents.
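To confirm the embedding model is downloaded, you can query Ollama's local REST API, whose `GET /api/tags` endpoint lists pulled models.

- The sketch assumes Ollama's default address, `http://localhost:11434`.
- `fetch_tags` and `model_available` are hypothetical helper names.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default address (assumed unchanged)


def fetch_tags(base_url: str = OLLAMA_URL, timeout: float = 3.0) -> dict:
    """GET /api/tags from a running Ollama server (raises URLError if it is down)."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)


def model_available(tags: dict, model: str) -> bool:
    """True if `model`, with or without a ":tag" suffix, appears in an /api/tags response."""
    for entry in tags.get("models", []):
        name = entry.get("name", "")
        if name == model or name.split(":", 1)[0] == model:
            return True
    return False
```

With Ollama running, `model_available(fetch_tags(), "nomic-embed-text")` should return `True` after `ollama pull nomic-embed-text`.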

5. Auth Service (Node + MongoDB)

cd Auth
npm install
node index.js   # or: npx nodemon index.js

MongoDB must be reachable at the MONGO_URI; local default is mongodb://localhost:27017/chatbot.
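To see which host, port, and database the Auth service will target, you can split a `MONGO_URI` with the standard library.

- `mongo_target` is a hypothetical helper.
- It handles only single-host `mongodb://` URIs. Replica-set lists such as `host1,host2` and `mongodb+srv://` URIs are out of scope.

```python
from urllib.parse import urlparse

DEFAULT_MONGO_URI = "mongodb://localhost:27017/chatbot"  # local default from this README


def mongo_target(uri: str = DEFAULT_MONGO_URI) -> tuple[str, int, str]:
    """Return (host, port, database) for a single-host mongodb:// URI.

    Multi-host and mongodb+srv:// URIs are deliberately not handled in this sketch.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "mongodb":
        raise ValueError("only plain mongodb:// URIs are handled in this sketch")
    host = parsed.hostname or "localhost"
    port = parsed.port or 27017  # MongoDB's default port when none is given
    database = parsed.path.lstrip("/")  # empty string if the URI names no database
    return host, port, database
```

For the default URI this yields `("localhost", 27017, "chatbot")`, which matches the database name to look for in Compass or mongosh.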

6. Frontend (Vite + React)

cd Frontend
npm install
npm run dev -- --host

Open http://localhost:5173 to access the UI. Start order: MongoDB → Ollama → FastAPI → Auth → Frontend.
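A quick way to follow the start order is to check which local ports are already accepting connections.

- The ports are this README's defaults, plus Ollama's standard port 11434.
- `SERVICES`, `is_listening`, and `first_missing` are hypothetical names.
- An open port only shows that a process is listening. Use the health endpoints to confirm readiness.

```python
import socket
from typing import Optional

# Default local ports in the recommended start order. 11434 is Ollama's default;
# the others come from this README. Adjust if you changed any of them.
SERVICES = [
    ("MongoDB", "127.0.0.1", 27017),
    ("Ollama", "127.0.0.1", 11434),
    ("FastAPI", "127.0.0.1", 8000),
    ("Auth", "127.0.0.1", 4000),
    ("Frontend", "127.0.0.1", 5173),
]


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def first_missing(services=SERVICES) -> Optional[str]:
    """Name of the first service, in start order, that is not accepting connections."""
    for name, host, port in services:
        if not is_listening(host, port):
            return name
    return None
```

If `first_missing()` returns, for example, `"FastAPI"`, then MongoDB and Ollama are up and the backend is the next thing to start.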

7. Postman Smoke Tests (optional but recommended)

Check each service independently before running the full UI flow.

  1. MongoDB – Ensure mongod is running; use MongoDB Compass or mongosh to confirm the chatbot database exists or can be created.
  2. Auth Service (http://localhost:4000)
  • /auth/health (GET, or POST if the route was added that way) – verify the service is alive, if a health route exists.
  • Seed accounts via the dev-only register route:
    1. Set request to POST http://localhost:4000/auth/register.
    2. Header: Content-Type: application/json.
    3. Body for an admin account:
      {
        "regNo": "23CS001",
        "password": "Pass@123",
        "role": "admin"
      }
      Send and confirm HTTP 201.
    4. Optional student account body:
      {
        "regNo": "23CS101",
        "password": "Student@123",
        "role": "user"
      }
  • POST /auth/login (http://localhost:4000/auth/login) with either account to obtain a JWT token. Example student login payload:
    {
      "regNo": "23CS101",
      "password": "Student@123"
    }
  • GET /auth/me with header Authorization: Bearer <token> to confirm the token is valid.
  3. FastAPI Backend (http://127.0.0.1:8000)
  • GET /health – confirms service readiness; expect { "status": "ok", "vectorstore_loaded": false } on a fresh run.
  • POST /test_retrieval with { "message": "library timing" } – expect { "status": "error", "message": "No docs uploaded" } before ingesting documents.
  • POST /chat with { "message": "What time does the hostel gate close?" } – verify fallback behavior prior to ingestion; re-run after uploading a hostel-policy PDF to confirm grounded answers.
  4. Upload Pipeline
  • POST /upload_pdf_async (form-data file upload) using key file and value Hostel_Guidelines.pdf (or similar test doc).
  • Poll GET /upload_status/{upload_id} until status is completed with num_chunks reported.
  • POST /approve_doc/Hostel_Guidelines.pdf to expose the new content to students. Repeat POST /chat test above to see updated response.
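The "poll until completed" step of the upload pipeline can be scripted instead of repeated by hand in Postman.

- Only the `status` value `completed` and the `num_chunks` field come from the steps above.
- Treating `failed` as an error state is an assumption.
- `wait_for_upload` and `status_fetcher` are hypothetical helpers.

```python
import json
import time
import urllib.request
from typing import Callable


def status_fetcher(upload_id: str, base_url: str = "http://127.0.0.1:8000") -> Callable[[], dict]:
    """Return a callable that GETs /upload_status/{upload_id} and parses the JSON body."""
    def fetch() -> dict:
        with urllib.request.urlopen(f"{base_url}/upload_status/{upload_id}", timeout=10) as resp:
            return json.load(resp)
    return fetch


def wait_for_upload(
    fetch_status: Callable[[], dict],
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until status is "completed"; "failed" as an error state is an assumption."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status.get("status") == "completed":
            return status  # expected to include num_chunks
        if status.get("status") == "failed":
            raise RuntimeError(f"ingestion failed: {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"upload not completed after {timeout}s: {status}")
        sleep(interval)
```

Usage: `wait_for_upload(status_fetcher(upload_id))["num_chunks"]`, with `upload_id` taken from the /upload_pdf_async response. The injectable `fetch_status` and `sleep` parameters keep the loop testable without a running backend.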

After these checks, proceed to use the frontend normally.


API & Data Reference

Auth Service (http://localhost:4000)

  • POST /auth/register – Create user { regNo, password, role }.
  • POST /auth/login – Returns { token, regNo, role }.
  • GET /auth/me – Validate bearer token.
  • POST /api/chat – Authenticated proxy to FastAPI chatbot.
  • GET /api/user/recent-chats – Fetch stored chat history.
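The login-then-verify flow of these endpoints can be scripted with the standard library.

- The request bodies and the `{ token, regNo, role }` response follow this section.
- `login_request`, `me_request`, and `login_and_verify` are hypothetical helpers.
- The shape of the /auth/me response is not documented here, so it is returned as parsed JSON.

```python
import json
import urllib.request

AUTH_URL = "http://localhost:4000"


def login_request(reg_no: str, password: str, base_url: str = AUTH_URL) -> urllib.request.Request:
    """Build POST /auth/login with the { regNo, password } JSON body."""
    body = json.dumps({"regNo": reg_no, "password": password}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/auth/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def me_request(token: str, base_url: str = AUTH_URL) -> urllib.request.Request:
    """Build GET /auth/me carrying the JWT as a bearer token."""
    return urllib.request.Request(
        f"{base_url}/auth/me", headers={"Authorization": f"Bearer {token}"}
    )


def login_and_verify(reg_no: str, password: str) -> dict:
    """Log in, then validate the returned token via /auth/me (Auth service must be running)."""
    with urllib.request.urlopen(login_request(reg_no, password), timeout=10) as resp:
        token = json.load(resp)["token"]  # login returns { token, regNo, role }
    with urllib.request.urlopen(me_request(token), timeout=10) as resp:
        return json.load(resp)
```

For example, `login_and_verify("23CS101", "Student@123")` exercises the same path as the Postman login and /auth/me checks, using the dev student account.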

MongoDB User model (Auth/models/User.js):

{
  "regNo": "string",
  "passwordHash": "string",
  "role": "user|admin",
  "recentChats": [ { "query": "...", "response": "...", ... } ],
  "uploadedFiles": [ { "filename": "...", "uploadedAt": "..." } ]
}

⚠️ Note: User creation routes are exposed for development/testing only. In production, registration should be locked down to admins or seeded data.

Example dev user payloads:

  • POST /auth/register (admin)
    {
      "regNo": "23CS001",
      "password": "Pass@123",
      "role": "admin"
    }
  • POST /auth/register (student)
    {
      "regNo": "23CS101",
      "password": "Student@123",
      "role": "user"
    }
  • POST /auth/login (admin)
    {
      "regNo": "23CS001",
      "password": "Pass@123"
    }
  • POST /auth/login (student)
    {
      "regNo": "23CS101",
      "password": "Student@123"
    }

FastAPI Backend (http://127.0.0.1:8000)

  • GET /health – Service readiness + vector store status.
  • POST /chat – Main RAG endpoint returning ChatResponse.
  • POST /chat_stream – Streaming variant emitting progress tokens.
  • POST /admin_chat – Admin view with unapproved docs included.
  • POST /upload_pdf_async – Async PDF ingestion; returns upload_id.
  • GET /upload_status/{id} – Track ingestion progress.
  • POST /approve_doc/{filename} – Mark ingested document as student-visible.
  • DELETE /delete_doc/{filename} – Remove document from vector store.
  • GET /documents – List document approval status.
  • GET /recent_activities – Recent admin actions.
  • POST /test_retrieval – Debug retrieval results without LLM call.
  • GET /logs – Convenience endpoint tailing auth logs (configurable path).

Vector store: Persistent Chroma collection under backend/core/chroma_db/. Remove this directory to reset knowledge state.
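Resetting the knowledge state can be scripted as a guarded directory delete.

- `reset_vector_store` is a hypothetical helper.
- Stopping the FastAPI service first is a precaution, not a documented requirement. A running process may keep the Chroma files open, which blocks deletion on Windows, and may still hold the old collection in memory.

```python
import shutil
from pathlib import Path


def reset_vector_store(project_root: Path) -> bool:
    """Delete backend/core/chroma_db/ under project_root; True if something was removed.

    Precaution (assumption): stop the FastAPI service first so no process holds the files open.
    """
    store = project_root / "backend" / "core" / "chroma_db"
    if not store.is_dir():
        return False  # nothing to reset
    shutil.rmtree(store)
    return True
```

For example, run `reset_vector_store(Path("."))` from the repository root. Then restart FastAPI and re-upload documents to rebuild the collection.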


Environment Templates

  • Auth/.env.example – All Express service secrets and URLs.
  • backend/core/.env.example – FastAPI, Chroma, and Gemini configuration keys.

Copy both templates to .env in the same folder and populate them with real values before running.

Deployment

Currently optimized for local demos (Gemini calls still require internet access). No public deployment is configured yet. To deploy:

  • Host the FastAPI service on a VM or container; GPU access is optional.
  • Provision a managed MongoDB instance (Atlas) and secure JWT secrets.
  • Run Ollama on the same host or swap to a hosted embedding provider.
  • Build the frontend (npm run build) and serve via a static host (Vercel, Netlify, S3+CloudFront). Deployment automation is pending.

Impact & Metrics

  • Answer accuracy: Grounded responses observed after ingesting policy PDFs; without ingestion, chatbot clearly states missing knowledge.
  • Latency: Chat requests average 1.5–2.5s locally (embedding cache + Gemini). Streaming endpoint improves perceived speed.
  • Scalability assumptions: Chroma is configured for single-node persistence; supports thousands of chunks comfortably on a workstation. MongoDB handles up to a few thousand users in current schema without sharding.
  • Demo workflow: Reset by deleting backend/core/chroma_db/ to showcase before/after ingestion behavior.
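To reproduce the latency figures on your own machine, a small timing harness can wrap any chat call.

- `measure_latency` and its output keys are hypothetical.
- It measures end-to-end wall-clock time, so network, retrieval, and Gemini time are counted together.
- For /chat_stream, time to the first token would be a different measurement.

```python
import statistics
import time
from typing import Callable


def measure_latency(call: Callable[[], object], runs: int = 5) -> dict[str, float]:
    """Time `call` `runs` times and summarise wall-clock latency in seconds.

    `call` would typically send one POST /chat request; any callable works.
    """
    if runs < 1:
        raise ValueError("runs must be >= 1")
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        samples.append(time.perf_counter() - start)
    return {
        "mean_s": statistics.mean(samples),
        "median_s": statistics.median(samples),
        "max_s": max(samples),
    }
```

Reporting the median alongside the mean keeps a single slow Gemini response from skewing the summary.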

What’s Next

  • Improve analytics (conversation success rate, multilingual usage metrics).
  • Add automated tests (unit + integration) for Auth and FastAPI services.
  • Introduce role-based dashboards and approval notifications.
  • Evaluate alternative embedding providers for cloud deployment.
  • Harden security (rate limiting, audit logging, production-ready secrets management).

Contributions and feedback are welcome! Open an issue or submit a pull request with proposed improvements.

About

Multilingual campus chatbot that grounds Gemini responses in uploaded PDFs, built with React, FastAPI, Express, MongoDB, LangChain, ChromaDB, and Ollama.
