Helping campus teams answer repetitive student questions through a multilingual, retrieval-augmented assistant.
- Students queue at campus offices for answers already buried inside circulars, PDF notices, and policy documents.
- Staff repeatedly answer the same questions, often in multiple languages, which slows service delivery.
- Students prefer conversational guidance over searching static documents.
Goal: Provide a campus-friendly chatbot that can ingest official documents, understand queries in multiple languages, and respond accurately while remaining easy for student volunteers to maintain.
- Frontend: React 19, Vite 7, Tailwind CSS 4 (via @tailwindcss/vite), Axios, React Router 7.
- Auth Service: Node.js 20+, Express 5, Mongoose 8, JWT, Winston.
- Backend API: Python 3.10+/FastAPI, LangChain Community & Text Splitters, ChromaDB, Ollama embeddings (nomic-embed-text), Google Generative AI SDK, pdfplumber/PyPDF2.
- Database: MongoDB Community Server 6.x (local) for user accounts and chat history.
- Tooling: npm 10, pip/venv, Ollama runtime, Postman for API checks.
Data flow summary:
- Users interact with the React UI. Requests that require authentication are routed through the Express auth gateway backed by MongoDB.
- Authenticated chat requests call the FastAPI service, which retrieves supporting context from ChromaDB using embeddings generated by Ollama.
- Context is injected into Gemini for response generation, then returned to the frontend along with metadata for dashboards.
- Admin uploads trigger an asynchronous ingestion pipeline (PDF parse → chunk → embed → persist), with an approval gate before students can see the new content.
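The "chunk" stage splits parsed PDF text into overlapping windows before embedding. The project uses LangChain's text splitters for this. The sketch below is a simplified, dependency-free stand-in that uses fixed-size character windows with overlap, and it only illustrates the idea. The chunk_size and chunk_overlap defaults are illustrative assumptions, not the project's configured values.

```python
def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Overlap keeps a sentence that straddles a boundary retrievable from
    either neighbouring chunk. (Simplified stand-in, not LangChain's splitter.)
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    sample = "Hostel gates close at 10 PM. " * 40
    parts = chunk_text(sample, chunk_size=200, chunk_overlap=20)
    print(len(parts), repr(parts[0][:40]))
```

Real splitters usually prefer to break on paragraph or sentence boundaries. Fixed windows are used here only to keep the sketch short.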
- Document-grounded answers: Uploaded PDFs seed the knowledge base, so responses align with official guidelines. Trade-off: The bot is only useful after an initial ingestion step.
- Multilingual chat UI: Supports English, Hindi, and regional languages via Gemini's multilingual capabilities. Trade-off: Dependence on Gemini API availability.
- Role-based access: JWT-backed auth distinguishes students vs. admins. Trade-off: Requires MongoDB instance and token management.
- Admin approval flow: Prevents unverified documents from affecting answers. Trade-off: Adds manual review step for admins.
- Activity & document dashboards: Show retrieval context and recent uploads. Trade-off: Minimal analytics; relies on log polling.
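Role-based access and the admin approval flow together decide which chunks may ground an answer. This is a minimal sketch that assumes each stored chunk carries its source filename as metadata and that the set of approved filenames is known. The Chunk type and visible_chunks helper are hypothetical and not taken from the project's code. Students see approved documents only. Admins, as with /admin_chat, also see unapproved ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    text: str
    source: str  # filename of the PDF this chunk came from (assumed metadata)


def visible_chunks(chunks: list[Chunk], approved: set[str], role: str) -> list[Chunk]:
    """Return the chunks a given role ("user" or "admin") may use as answer context."""
    if role == "admin":
        return list(chunks)  # admins can preview unapproved documents too
    return [c for c in chunks if c.source in approved]
```

Filtering at query time means that approving a document only flips a flag. Nothing has to be re-embedded.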
- Node.js 20+, npm 10+.
- Python 3.10 or 3.11 with python -m venv.
- MongoDB Community Server running locally (mongod).
- Ollama (Windows installer) with the nomic-embed-text model (ollama pull nomic-embed-text).
- Google API key with Gemini access (GOOGLE_API_KEY).
git clone <repo-url>
cd PolySEE
- Copy the example files and fill in secrets (Windows):
copy Auth\.env.example Auth\.env
copy backend\core\.env.example backend\core\.env
- Update the following values:
  - Auth/.env: MONGO_URI, JWT_SECRET, FASTAPI_URL, and the allowed FRONTEND_ORIGIN.
  - backend/core/.env: GOOGLE_API_KEY, plus optional tuning for chunking and persistence.
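Before starting the services, it can help to confirm that each .env file defines the keys listed above. This is a minimal sketch meant to be run from the repository root. The parser handles only simple KEY=value lines: it skips blanks and comments and strips surrounding quotes. It is not a full dotenv implementation. Treating exactly these keys as mandatory is an assumption.

```python
from pathlib import Path

# Keys this README lists for each file; optional tuning keys are omitted (assumption).
REQUIRED = {
    "Auth/.env": ["MONGO_URI", "JWT_SECRET", "FASTAPI_URL", "FRONTEND_ORIGIN"],
    "backend/core/.env": ["GOOGLE_API_KEY"],
}


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines; not a full dotenv parser."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first '=' only
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(text: str, required: list[str]) -> list[str]:
    """Return required keys that are absent or have an empty value."""
    values = parse_env(text)
    return [k for k in required if not values.get(k)]


if __name__ == "__main__":
    for rel_path, keys in REQUIRED.items():
        path = Path(rel_path)
        if not path.exists():
            print(f"{rel_path}: file not found")
            continue
        missing = missing_keys(path.read_text(encoding="utf-8"), keys)
        print(f"{rel_path}: " + (f"missing {missing}" if missing else "ok"))
```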
cd backend\core
python -m venv .venv
.venv\Scripts\activate
pip install --upgrade pip
pip install -r requirements.txt
uvicorn app:app --host 127.0.0.1 --port 8000 --reload
Ensure Ollama is running (ollama serve) and the embedding model is downloaded before ingesting documents.
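To confirm that Ollama is up and the embedding model is present before ingestion, you can query Ollama's local REST API. Its GET /api/tags endpoint, served on port 11434 by default, lists downloaded models. This is a minimal standard-library sketch. It matches the model by name prefix because Ollama reports tagged names such as nomic-embed-text:latest. The helper names are hypothetical.

```python
import json
import urllib.request

OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"  # Ollama's default local endpoint


def has_model(tags_payload: dict, model: str) -> bool:
    """True if an /api/tags payload lists `model`, with or without a :tag suffix."""
    for entry in tags_payload.get("models", []):
        name = entry.get("name", "")
        if name == model or name.startswith(model + ":"):
            return True
    return False


def check_ollama(model: str = "nomic-embed-text", url: str = OLLAMA_TAGS_URL) -> str:
    """Return a human-readable readiness message instead of raising."""
    try:
        with urllib.request.urlopen(url, timeout=3) as resp:
            payload = json.load(resp)
    except (OSError, ValueError) as exc:  # connection errors, timeouts, bad JSON
        return f"Ollama not reachable at {url}: {exc}. Is `ollama serve` running?"
    if has_model(payload, model):
        return f"ok: {model} is available"
    return f"{model} missing; run `ollama pull {model}`"


if __name__ == "__main__":
    print(check_ollama())
```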
cd Auth
npm install
node index.js   # or: npx nodemon index.js
MongoDB must be reachable at MONGO_URI; the local default is mongodb://localhost:27017/chatbot.
cd Frontend
npm install
npm run dev -- --host
Open http://localhost:5173 to access the UI. Start order: MongoDB → Ollama → FastAPI → Auth → Frontend.
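The start order above can be checked with a small script. It probes each service's TCP port in sequence and reports the first one that is not listening. This is a minimal sketch. The ports are the defaults mentioned in this README (MongoDB 27017, FastAPI 8000, Auth 4000, Vite 5173), plus Ollama's default of 11434. Adjust them if your configuration differs. An open port only shows that something is listening, not that the service is healthy.

```python
import socket
from typing import Callable

# (name, host, port) in the recommended start order.
SERVICES = [
    ("MongoDB", "127.0.0.1", 27017),
    ("Ollama", "127.0.0.1", 11434),
    ("FastAPI", "127.0.0.1", 8000),
    ("Auth", "127.0.0.1", 4000),
    ("Frontend", "127.0.0.1", 5173),
]


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def first_down(services=SERVICES, probe: Callable[[str, int], bool] = port_open):
    """Return the name of the first service that is not listening, or None."""
    for name, host, port in services:
        if not probe(host, port):
            return name
    return None


if __name__ == "__main__":
    down = first_down()
    print("all services listening" if down is None else f"start {down} next")
```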
Check each service independently before running the full UI flow.
- MongoDB: Ensure mongod is running; use MongoDB Compass or mongosh to confirm the chatbot database exists or can be created.
- Auth Service (http://localhost:4000)
  - GET /auth/health (if added): verify the service is alive.
  - Seed accounts via the dev-only register route:
    - Set the request to POST http://localhost:4000/auth/register.
    - Header: Content-Type: application/json.
    - Body for an admin account: { "regNo": "23CS001", "password": "Pass@123", "role": "admin" }
    - Optional student account body: { "regNo": "23CS101", "password": "Student@123", "role": "user" }
    - Send and confirm HTTP 201.
  - Send POST /auth/login (http://localhost:4000/auth/login) with either account to obtain a JWT. Example student login payload: { "regNo": "23CS101", "password": "Student@123" }
  - Send GET /auth/me with the header Authorization: Bearer <token> to confirm the token is valid.
- FastAPI Backend (http://127.0.0.1:8000)
  - GET /health: confirms service readiness; expect { "status": "ok", "vectorstore_loaded": false } on a fresh run.
  - POST /test_retrieval with { "message": "library timing" }: expect { "status": "error", "message": "No docs uploaded" } before any documents are ingested.
  - POST /chat with { "message": "What time does the hostel gate close?" }: verify the fallback behavior before ingestion, then re-run after uploading a hostel-policy PDF to confirm grounded answers.
- Upload Pipeline
  - POST /upload_pdf_async as a form-data file upload, using key file and value Hostel_Guidelines.pdf (or a similar test document).
  - Poll GET /upload_status/{upload_id} until the status is completed and num_chunks is reported.
  - POST /approve_doc/Hostel_Guidelines.pdf to expose the new content to students.
  - Repeat the POST /chat test above to see the updated response.
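The polling step can also be scripted. This is a minimal sketch that assumes GET /upload_status/{upload_id} returns JSON whose status field eventually becomes completed, as described above. A failed state is a hypothetical error value. The fetch_status and sleep callables are injected so the loop can be exercised without a running server. The helper names are not part of the project.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "http://127.0.0.1:8000"


def http_status(upload_id: str) -> dict:
    """Fetch the ingestion status for one upload from the FastAPI service."""
    with urllib.request.urlopen(f"{BASE_URL}/upload_status/{upload_id}", timeout=10) as resp:
        return json.load(resp)


def wait_for_upload(
    upload_id: str,
    fetch_status: Callable[[str], dict] = http_status,
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until status is 'completed'; raise on 'failed' (assumed) or on timeout."""
    waited = 0.0
    while True:
        status = fetch_status(upload_id)
        state = status.get("status")
        if state == "completed":
            return status  # expected to include num_chunks
        if state == "failed":  # hypothetical error state
            raise RuntimeError(f"ingestion failed: {status}")
        if waited >= timeout:
            raise TimeoutError(f"upload {upload_id} still '{state}' after {timeout}s")
        sleep(interval)
        waited += interval
```

Once it returns, the document can be approved with POST /approve_doc/{filename} as in the checklist.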
After these checks, proceed to use the frontend normally.
- POST /auth/register: Create a user { regNo, password, role }.
- POST /auth/login: Returns { token, regNo, role }.
- GET /auth/me: Validate a bearer token.
- POST /api/chat: Authenticated proxy to the FastAPI chatbot.
- GET /api/user/recent-chats: Fetch stored chat history.
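These endpoints can be exercised from a script as well as from Postman. This minimal standard-library sketch builds the login request and an authenticated /api/chat request carrying the Authorization: Bearer <token> header. The { "message": ... } body for /api/chat is an assumption that mirrors the FastAPI /chat payload shown in the checks. The helper names are hypothetical.

```python
import json
import urllib.request
from typing import Optional

AUTH_URL = "http://localhost:4000"


def json_request(url: str, payload: dict, token: Optional[str] = None) -> urllib.request.Request:
    """Build a POST request with a JSON body and an optional bearer token."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    return req


def login(reg_no: str, password: str) -> str:
    """POST /auth/login and return the token from the { token, regNo, role } response."""
    req = json_request(f"{AUTH_URL}/auth/login", {"regNo": reg_no, "password": password})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["token"]


def chat(token: str, message: str) -> dict:
    """POST /api/chat through the auth proxy (payload shape assumed)."""
    req = json_request(f"{AUTH_URL}/api/chat", {"message": message}, token=token)
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)
```

With the services running, a typical call sequence is token = login("23CS101", "Student@123") followed by chat(token, "What time does the hostel gate close?"), using the dev-only student account.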
MongoDB User model (Auth/models/User.js):
{
"regNo": "string",
"passwordHash": "string",
"role": "user|admin",
"recentChats": [ { "query": "...", "response": "...", ... } ],
"uploadedFiles": [ { "filename": "...", "uploadedAt": "..." } ]
}
⚠️ Note: User creation routes are exposed for development and testing only. In production, registration should be restricted to admins or replaced with seeded data.
Example dev user payloads:
- POST /auth/register (admin): { "regNo": "23CS001", "password": "Pass@123", "role": "admin" }
- POST /auth/register (student): { "regNo": "23CS101", "password": "Student@123", "role": "user" }
- POST /auth/login (admin): { "regNo": "23CS001", "password": "Pass@123" }
- POST /auth/login (student): { "regNo": "23CS101", "password": "Student@123" }
- GET /health: Service readiness and vector store status.
- POST /chat: Main RAG endpoint returning ChatResponse.
- POST /chat_stream: Streaming variant emitting progress tokens.
- POST /admin_chat: Admin view that includes unapproved documents.
- POST /upload_pdf_async: Async PDF ingestion; returns upload_id.
- GET /upload_status/{id}: Track ingestion progress.
- POST /approve_doc/{filename}: Mark an ingested document as student-visible.
- DELETE /delete_doc/{filename}: Remove a document from the vector store.
- GET /documents: List document approval status.
- GET /recent_activities: Recent admin actions.
- POST /test_retrieval: Debug retrieval results without an LLM call.
- GET /logs: Convenience endpoint that tails auth logs (configurable path).
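/test_retrieval stops after retrieval, while /chat also passes the retrieved context to Gemini. This minimal sketch covers the step in between and assumes retrieval yields (text, source) pairs. When retrieval returns nothing, it returns an error payload shaped like the "No docs uploaded" response seen in the checks. Otherwise it builds a grounded prompt that labels each source and asks the model to reply in the question's language. The prompt wording and the build_prompt name are illustrative, not the project's actual template.

```python
def build_prompt(question: str, retrieved: list[tuple[str, str]]) -> dict:
    """Return an error payload if nothing was retrieved, else a grounded prompt."""
    if not retrieved:
        # Shaped like the /test_retrieval response on a fresh install.
        return {"status": "error", "message": "No docs uploaded"}
    context = "\n\n".join(f"[{source}]\n{text}" for text, source in retrieved)
    prompt = (
        "Answer the student's question using only the official context below. "
        "If the context does not contain the answer, say so. "
        "Reply in the same language as the question.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return {"status": "ok", "prompt": prompt}
```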
Vector store: Persistent Chroma collection under backend/core/chroma_db/. Remove this directory to reset knowledge state.
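Resetting means deleting that directory. It is safest to stop the FastAPI service first, since Chroma may hold its files open while running, especially on Windows. This is a minimal sketch with a guard so it only deletes a directory literally named chroma_db. The function name is hypothetical.

```python
import shutil
from pathlib import Path


def reset_vectorstore(path: str = "backend/core/chroma_db") -> bool:
    """Delete the persistent Chroma directory; return True if something was removed."""
    target = Path(path)
    if target.name != "chroma_db":
        raise ValueError(f"refusing to delete unexpected path: {target}")
    if not target.exists():
        return False  # already reset
    shutil.rmtree(target)
    return True
```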
- Auth/.env.example: All Express service secrets and URLs.
- backend/core/.env.example: FastAPI, Chroma, and Gemini configuration keys.
Copy these files to .env and populate them with real values before running.
Currently optimized for local/offline demos. No public deployment is configured yet. To deploy:
- Host the FastAPI service on a VM or container; GPU access is optional.
- Provision a managed MongoDB instance (Atlas) and secure JWT secrets.
- Run Ollama on the same host or swap to a hosted embedding provider.
- Build the frontend (npm run build) and serve it from a static host (Vercel, Netlify, S3+CloudFront). Deployment automation is pending.
- Answer accuracy: Grounded responses observed after ingesting policy PDFs; without ingestion, the chatbot clearly states that it lacks the relevant knowledge.
- Latency: Chat requests average 1.5 to 2.5 s locally (embedding cache + Gemini). The streaming endpoint improves perceived speed.
- Scalability assumptions: Chroma is configured for single-node persistence; supports thousands of chunks comfortably on a workstation. MongoDB handles up to a few thousand users in current schema without sharding.
- Demo workflow: Reset by deleting backend/core/chroma_db/ to showcase before/after ingestion behavior.
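The latency figures above were observed locally. One way to reproduce comparable numbers on another machine is to time repeated calls and summarise them. This is a minimal sketch in which send stands for any zero-argument callable that performs one chat request, for example a small wrapper around POST /chat. The run count is arbitrary, and the helper name is hypothetical.

```python
import statistics
import time
from typing import Callable


def measure_latency(send: Callable[[], object], runs: int = 10) -> dict:
    """Call `send` `runs` times and summarise wall-clock latency in seconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        send()
        samples.append(time.perf_counter() - start)
    return {
        "runs": runs,
        "mean_s": statistics.mean(samples),
        "median_s": statistics.median(samples),
        "max_s": max(samples),
    }
```

Reporting the median and maximum next to the mean makes a single slow Gemini call visible instead of hiding it in the average.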
- Improve analytics (conversation success rate, multilingual usage metrics).
- Add automated tests (unit + integration) for Auth and FastAPI services.
- Introduce role-based dashboards and approval notifications.
- Evaluate alternative embedding providers for cloud deployment.
- Harden security (rate limiting, audit logging, production-ready secrets management).
Contributions and feedback are welcome! Open an issue or submit a pull request with proposed improvements.