An advanced, autonomous AI desktop assistant designed to integrate seamlessly with your workflow.
Kernel Agent is a desktop automation platform that combines a native Windows application with AI microservices. It is designed to act as a digital companion that understands voice commands, reads what is on your screen, and executes multi-step tasks across your operating system.
Key features include:
- Holographic Desktop Overlay: A sleek, non-intrusive UI built with WinUI 3 that provides instant access to AI capabilities.
- Multi-Modal AI: Integrates Google Gemini, Computer Vision (OpenCV/EasyOCR), and Voice Recognition (Vosk/Google Speech) for a seamless interaction model.
- Autonomous Agents: Includes specialized agents like "Janitor" for system maintenance and "Sentinel" for security monitoring.
- Extensible Architecture: Built on a microservice architecture allowing for easy addition of new capabilities and agents.
The project is built using a robust stack of modern technologies:
- Desktop Application:
- AI Microservice:
- Frontend / Web Dashboard:
To get a local copy up and running, follow these steps.
Ensure you have the following installed on your development machine:
- Node.js (v18+)
- .NET 9.0 SDK
- Python 3.10+
- Git
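The version requirements above can be checked with a short script before installing anything. This is a minimal sketch, not part of the repository: the command names and `--version` flags are assumptions about a typical install, and `check_tool` and `main` are hypothetical helpers.

```python
import re
import shutil
import subprocess

# Minimum versions taken from the prerequisites list above.
# Command names and flags are assumptions about a typical install.
REQUIREMENTS = {
    "node": (["node", "--version"], (18,)),
    "dotnet": (["dotnet", "--version"], (9, 0)),
    "python": (["python", "--version"], (3, 10)),
    "git": (["git", "--version"], None),  # any version is accepted
}


def parse_version(text):
    """Extract the first dotted number in `text` as a tuple of ints, or None."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(version_text, minimum):
    """Return True if the version found in `version_text` is at least `minimum`."""
    if minimum is None:
        return True
    found = parse_version(version_text)
    return found is not None and found >= minimum


def check_tool(command, minimum):
    """Run `command` and report whether the tool exists and is new enough."""
    if shutil.which(command[0]) is None:
        return False
    result = subprocess.run(command, capture_output=True, text=True)
    # Some tools print the version on stderr, so both streams are searched.
    return meets_minimum(result.stdout + result.stderr, minimum)


def main():
    """Print one status line per required tool."""
    for name, (command, minimum) in REQUIREMENTS.items():
        status = "ok" if check_tool(command, minimum) else "missing or too old"
        print(f"{name}: {status}")
```

Calling `main()` prints one line per tool. Versions are compared as integer tuples, so `v18.17.0` satisfies the `v18+` requirement while `3.9.18` does not satisfy `3.10+`.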
- Clone the repository

      git clone https://github.com/ShawnTheCreator/kernalagent.git
      cd kernalagent

- Set up the AI Microservice

      cd Microservice
      python -m venv venv
      # Windows
      .\venv\Scripts\activate
      # Install dependencies
      pip install -r requirements.txt

  - Create a .env file in Microservice/ and add your keys (Gemini API, Firebase Service Account path).

- Set up the Frontend

      cd ../Frontend
      npm install

  - Create a .env.local file with your Firebase and Supabase credentials.

- Set up the Desktop App
  - Open Desktop-App/Kernel Agent.sln in Visual Studio 2022.
  - Ensure "Kernel Agent" is the startup project.
  - Add your service-account.json to the project root and set "Copy to Output Directory" to "Copy if newer".

- Run the System
  - Terminal 1 (Microservice): python app/main.py (or uvicorn app.main:app --reload)
  - Terminal 2 (Frontend): npm run dev
  - Visual Studio: Press F5 to build and run the Desktop App.
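Before starting the microservice, it can help to confirm that the .env file in Microservice/ contains the keys it needs. The sketch below uses only the standard library and is not the project's own loader. The key names `GEMINI_API_KEY` and `FIREBASE_SERVICE_ACCOUNT_PATH` are hypothetical placeholders, since the README does not name the variables the microservice reads.

```python
import os

# Hypothetical key names; substitute the names the microservice actually reads.
REQUIRED_KEYS = ("GEMINI_API_KEY", "FIREBASE_SERVICE_ACCOUNT_PATH")


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Surrounding quotes are removed so KEY="abc" and KEY=abc are equivalent.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


def load_env_file(path):
    """Read `path`, copy its values into os.environ, and return them."""
    with open(path, encoding="utf-8") as handle:
        values = parse_env(handle.read())
    os.environ.update(values)
    return values
```

For example, `missing_keys(load_env_file("Microservice/.env"))` returns an empty list when every required key has a value, assuming it is run from the repository root.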
- Voice Commands: Activate the agent with the wake word (configurable) or by clicking the orb. Try commands like "Open Notepad", "Check system health", or "Summarize this document".
- Agent Forge: Use the "Forge" page in the desktop app to craft custom sub-agents with specific personalities and tool access.
- Dashboard: Access the web dashboard (default localhost:3000) to view agent analytics, memory logs, and manage installed skills.
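To make the Agent Forge idea concrete, a custom sub-agent can be thought of as a name, a personality, and an allow-list of tools. The following is an illustrative sketch only: the `SubAgent` class, its fields, and the tool names are all invented here and do not describe how the Forge page stores agents.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SubAgent:
    """A hypothetical description of a custom sub-agent."""

    name: str
    personality: str  # free-text description of tone and role
    allowed_tools: frozenset = field(default_factory=frozenset)

    def can_use(self, tool):
        """Return True if `tool` is on this agent's allow-list."""
        return tool in self.allowed_tools

    def system_prompt(self):
        """Build instruction text that could be sent to the language model."""
        tools = ", ".join(sorted(self.allowed_tools)) or "none"
        return f"You are {self.name}. {self.personality} Available tools: {tools}."


# Example: an agent that may read files but not change system settings.
# The tool names are invented for this illustration.
librarian = SubAgent(
    name="Librarian",
    personality="You answer briefly and cite file names.",
    allowed_tools=frozenset({"read_file", "search_files"}),
)
```

Keeping tool access as an explicit allow-list means a sub-agent is denied any tool that was not granted to it, which is the safer default for an assistant that can act on the operating system.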
The system operates on a hub-and-spoke model:
- The Hub (Desktop App): The central nervous system. It handles user input (Voice/Text), renders the UI, and performs OS-level actions (File I/O, Window Management).
- The Brain (Microservice): A Python FastAPI server that processes complex requests. It handles LLM inference, runs computer vision tasks, and manages the state of long-running autonomous agents.
- The Cloud (Firebase/Supabase): Syncs user preferences, agent memory, and long-term history across devices.
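The split between Hub and Brain can be sketched as a routing decision: OS-level actions stay in the Hub, and everything else is serialized and forwarded. The real Hub is a WinUI 3 application, so this Python sketch only illustrates the pattern. The action names, the JSON shape, and the `fake_brain` stand-in are assumptions, not the project's actual protocol.

```python
import json

# Actions the Hub is assumed to handle itself; the names are illustrative only.
LOCAL_ACTIONS = {"open_app", "move_window", "read_file"}


def build_brain_request(action, payload):
    """Serialize a request the way a Hub might send it to the Brain over HTTP."""
    return json.dumps({"action": action, "payload": payload})


def fake_brain(request_body):
    """Stand-in for the FastAPI server: reports which action it would process."""
    request = json.loads(request_body)
    return {"handled_by": "brain", "action": request["action"]}


def route(action, payload, brain=fake_brain):
    """Run OS-level actions locally and forward everything else to the Brain."""
    if action in LOCAL_ACTIONS:
        return {"handled_by": "hub", "action": action}
    return brain(build_brain_request(action, payload))
```

With these assumptions, `route("open_app", {"name": "Notepad"})` is handled by the Hub, while `route("summarize_document", {"path": "notes.txt"})` is forwarded to the Brain. Passing the Brain in as a parameter keeps the routing logic testable without a running server.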
- Initial WinUI 3 Desktop Interface
- Python Microservice with Gemini Integration
- Basic Voice Command Execution
- Advanced Vision: Real-time screen context understanding
- Agent Marketplace: Community-driven agent sharing
- Deep OS Integration: More granular control over Windows settings and registry
- Multi-turn Conversation: Improved context retention for complex tasks
See the open issues for a full list of proposed features (and known issues).
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
Distributed under the MIT License. See LICENSE.txt for more information.
Shawn - @ShawnTheCreator
Project Link: https://github.com/ShawnTheCreator/kernalagent