GitHub - ShawnTheCreator/kernalagent

Kernel Agent

An advanced, autonomous AI desktop assistant designed to integrate seamlessly with your workflow.
Explore the docs »

View Demo · Report Bug · Request Feature

Table of Contents

- About The Project
  - Built With
- Getting Started
  - Prerequisites
  - Installation
- Usage
- Architecture
- Roadmap
- Contributing
- License
- Contact

About The Project

Kernel Agent is a desktop automation platform that combines a native Windows application with AI microservices. It is designed to act as a digital companion that understands voice commands, reads your screen, and executes multi-step tasks across your operating system.

Key features include:

- Holographic Desktop Overlay: A sleek, non-intrusive UI built with WinUI 3 that provides instant access to AI capabilities.
- Multi-Modal AI: Integrates Google Gemini, Computer Vision (OpenCV/EasyOCR), and Voice Recognition (Vosk/Google Speech) for a seamless interaction model.
- Autonomous Agents: Includes specialized agents like "Janitor" for system maintenance and "Sentinel" for security monitoring.
- Extensible Architecture: Built on a microservice architecture that makes it easy to add new capabilities and agents.
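
As an illustration of how new agents might be plugged in, here is a minimal sketch of a name-based agent registry. It is not taken from the project's code: the `register` decorator, the `AGENTS` table, and the handler signatures are all hypothetical, and the two handlers only return placeholder strings.

```python
from typing import Callable, Dict

# Hypothetical registry; the project's real plugin mechanism is not shown in this README.
AGENTS: Dict[str, Callable[[str], str]] = {}


def register(name: str):
    """Decorator that stores a handler under a case-insensitive agent name."""
    def decorator(func: Callable[[str], str]) -> Callable[[str], str]:
        AGENTS[name.lower()] = func
        return func
    return decorator


@register("Janitor")
def janitor(task: str) -> str:
    # Placeholder: a real agent would perform system maintenance here.
    return f"janitor would handle maintenance task: {task}"


@register("Sentinel")
def sentinel(task: str) -> str:
    # Placeholder: a real agent would perform security monitoring here.
    return f"sentinel would handle security task: {task}"


def dispatch(agent_name: str, task: str) -> str:
    """Look up an agent by name and pass it the task; raise KeyError if unknown."""
    handler = AGENTS.get(agent_name.lower())
    if handler is None:
        raise KeyError(f"unknown agent: {agent_name}")
    return handler(task)


if __name__ == "__main__":
    print(dispatch("Janitor", "clear temp files"))
```

Under this design, adding an agent only requires defining a function and decorating it; the dispatcher itself does not change.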


Built With

The project is built with the following technologies:

- Desktop Application:
  - .NET 9.0 & WinUI 3
  - C# for core application logic and OS integration
- AI Microservice:
  - FastAPI
  - PyTorch & OpenCV for ML and Vision
  - Google Gemini for reasoning and generation
- Frontend / Web Dashboard:
  - Three.js & React Three Fiber for 3D visualizations


Getting Started

To get a local copy up and running, follow these steps.

Prerequisites

Ensure you have the following installed on your development machine:

- Node.js (v18+)
- .NET 9.0 SDK
- Python 3.10+
- Git
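
To confirm the tools above are installed before continuing, a small check like the following may help. It is a sketch, not part of the repository: it assumes the executables are named `node`, `dotnet`, `python`, and `git` on your PATH and that each accepts `--version`. The minimum versions come from the list above; Git has no stated minimum.

```python
import re
import shutil
import subprocess

# Minimum (major, minor) versions from the prerequisites list; None means any version.
REQUIRED = {
    "node": (18, 0),
    "dotnet": (9, 0),
    "python": (3, 10),
    "git": None,
}


def parse_version(text):
    """Return the first 'major.minor' pair found in text as a tuple, or None."""
    match = re.search(r"(\d+)\.(\d+)", text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def meets_minimum(found, minimum):
    """True when no minimum is required, or when found is at least the minimum."""
    if minimum is None:
        return True
    return found is not None and found >= minimum


def check_tool(name, minimum):
    """Describe whether a tool is on PATH and satisfies its minimum version."""
    path = shutil.which(name)
    if path is None:
        return f"{name}: not found on PATH"
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    found = parse_version(result.stdout or result.stderr)
    status = "ok" if meets_minimum(found, minimum) else "too old or unknown"
    return f"{name}: {status} (reported {found})"


if __name__ == "__main__":
    for tool, minimum in REQUIRED.items():
        print(check_tool(tool, minimum))
```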

Installation

1. Clone the repository

```
git clone https://github.com/ShawnTheCreator/kernalagent.git
cd kernalagent
```

2. Set up the AI Microservice

```
cd Microservice
python -m venv venv
# Windows
.\venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
```
- Create a .env file in Microservice/ and add your keys (Gemini API, Firebase Service Account path).

3. Set up the Frontend
```
cd ../Frontend
npm install
```
- Create a .env.local file with your Firebase and Supabase credentials.

4. Set up the Desktop App
- Open Desktop-App/Kernel Agent.sln in Visual Studio 2022.
- Ensure "Kernel Agent" is the startup project.
- Add your service-account.json to the project root and set "Copy to Output Directory" to "Copy if newer".

5. Run the System
- Terminal 1 (Microservice, from Microservice/): python app/main.py (or uvicorn app.main:app --reload)
- Terminal 2 (Frontend, from Frontend/): npm run dev
- Visual Studio: Press F5 to build and run the Desktop App.
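
The microservice setup step asks for a .env file but does not list its variable names. The sketch below shows one way to check that such a file is complete before starting the server. The names `GEMINI_API_KEY` and `FIREBASE_SERVICE_ACCOUNT_PATH` are assumptions for illustration only; check the microservice source for the names it actually reads. The parser is a deliberately simple stdlib stand-in, not the loader the project uses.

```python
from pathlib import Path

# Hypothetical variable names; the project's real keys may differ.
REQUIRED_KEYS = ("GEMINI_API_KEY", "FIREBASE_SERVICE_ACCOUNT_PATH")


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_path = Path("Microservice") / ".env"
    values = parse_env(env_path.read_text()) if env_path.exists() else {}
    missing = missing_keys(values)
    print("missing:", missing if missing else "none")
```

Run it from the repository root; it reports which of the assumed keys still need a value.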


Usage

- Voice Commands: Activate the agent with the wake word (configurable) or by clicking the orb. Try commands like "Open Notepad", "Check system health", or "Summarize this document".
- Agent Forge: Use the "Forge" page in the desktop app to craft custom sub-agents with specific personalities and tool access.
- Dashboard: Access the web dashboard (default localhost:3000) to view agent analytics, memory logs, and manage installed skills.
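
To make the voice command flow more concrete, here is a rough sketch of how a transcribed command could be mapped to an action. It is an assumption about the general approach, not the project's implementation: the intent names, the regular expressions, and the `fallback_llm` label are hypothetical, modeled only on the example commands above.

```python
import re

# Hypothetical intent patterns, modeled on the example commands above.
INTENTS = [
    ("open_app", re.compile(r"^open\s+(?P<target>.+)$", re.IGNORECASE)),
    ("system_health", re.compile(r"^check\s+system\s+health$", re.IGNORECASE)),
    ("summarize", re.compile(r"^summarize\s+(?P<target>.+)$", re.IGNORECASE)),
]


def route(transcript):
    """Return (intent, target) for a transcript; unmatched text falls back to the LLM."""
    text = transcript.strip().rstrip(".!?")
    for intent, pattern in INTENTS:
        match = pattern.match(text)
        if match:
            # Patterns without a 'target' group yield None as the target.
            return intent, match.groupdict().get("target")
    return "fallback_llm", text


if __name__ == "__main__":
    for phrase in ("Open Notepad", "Check system health", "Summarize this document"):
        print(phrase, "->", route(phrase))
```

Fixed patterns handle the common commands cheaply, while anything they miss can be passed to the language model for interpretation.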


Architecture

The system operates on a hub-and-spoke model:

- The Hub (Desktop App): The central nervous system. It handles user input (Voice/Text), renders the UI, and performs OS-level actions (File I/O, Window Management).
- The Brain (Microservice): A Python FastAPI server that processes complex requests. It handles LLM inference, runs computer vision tasks, and manages the state of long-running autonomous agents.
- The Cloud (Firebase/Supabase): Syncs user preferences, agent memory, and long-term history across devices.
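
The Hub-to-Brain exchange can be pictured as a JSON request and reply. The sketch below models that round trip in plain Python, with no network involved. Everything in it is an assumption made for illustration: the `HubRequest` and `BrainResponse` shapes, the `kind` values, and the canned handler output. The real Hub is a C# application and the real Brain is a FastAPI server, so the actual message format will differ.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class HubRequest:
    kind: str      # hypothetical categories, e.g. "chat" or "vision"
    payload: str   # the user's text or a reference to captured content


@dataclass
class BrainResponse:
    ok: bool
    result: str


def brain_handle(raw_json):
    """Stand-in for the microservice: decode, dispatch on kind, encode the reply."""
    request = HubRequest(**json.loads(raw_json))
    handlers = {
        "chat": lambda text: f"(LLM reply to: {text})",
        "vision": lambda text: f"(vision result for: {text})",
    }
    handler = handlers.get(request.kind)
    if handler is None:
        response = BrainResponse(ok=False, result=f"unsupported kind: {request.kind}")
    else:
        response = BrainResponse(ok=True, result=handler(request.payload))
    return json.dumps(asdict(response))


def hub_send(kind, payload):
    """Stand-in for the desktop app: serialize a request and parse the reply."""
    raw = json.dumps(asdict(HubRequest(kind=kind, payload=payload)))
    return BrainResponse(**json.loads(brain_handle(raw)))


if __name__ == "__main__":
    print(hub_send("chat", "hello"))
    print(hub_send("audio", "clip.wav"))
```

Keeping OS-level actions in the Hub and inference in the Brain means only serializable messages like these cross the process boundary.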


Roadmap

- Initial WinUI 3 Desktop Interface
- Python Microservice with Gemini Integration
- Basic Voice Command Execution
- Advanced Vision: Real-time screen context understanding
- Agent Marketplace: Community-driven agent sharing
- Deep OS Integration: More granular control over Windows settings and registry
- Multi-turn Conversation: Improved context retention for complex tasks

See the open issues for a full list of proposed features (and known issues).


Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

1. Fork the Project
2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
3. Commit your Changes (git commit -m 'Add some AmazingFeature')
4. Push to the Branch (git push origin feature/AmazingFeature)
5. Open a Pull Request


License

Distributed under the MIT License. See LICENSE.txt for more information.


Contact

Shawn - @ShawnTheCreator

Project Link: https://github.com/ShawnTheCreator/kernalagent

