
Kernel Agent

An advanced, autonomous AI desktop assistant designed to integrate seamlessly with your workflow.

Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Architecture
  5. Roadmap
  6. Contributing
  7. License
  8. Contact

About The Project

Kernel Agent is a desktop automation platform that pairs a native Windows application with Python AI microservices. It is designed to act as a digital companion that understands voice commands, reads what is on your screen, and carries out multi-step tasks across the operating system.

Key features include:

  • Holographic Desktop Overlay: A sleek, non-intrusive UI built with WinUI 3 that provides instant access to AI capabilities.
  • Multi-Modal AI: Combines Google Gemini (reasoning), computer vision (OpenCV/EasyOCR), and voice recognition (Vosk/Google Speech), so the agent can act on spoken commands together with on-screen context.
  • Autonomous Agents: Includes specialized agents like "Janitor" for system maintenance and "Sentinel" for security monitoring.
  • Extensible Architecture: Built on a microservice architecture allowing for easy addition of new capabilities and agents.
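The README does not document how new agents are registered. As a rough illustration of the kind of extensibility described above, here is a minimal plugin-registry sketch in Python. The `register_agent` decorator, the `AGENT_REGISTRY` dict, the `BaseAgent` class, and the `JanitorAgent`/`SentinelAgent` classes are all hypothetical and are not taken from the Kernel Agent codebase.

```python
from typing import Callable, Dict, Type


class BaseAgent:
    """Hypothetical base class; every agent implements run()."""

    def run(self, task: str) -> str:
        raise NotImplementedError


# Hypothetical registry mapping agent names to agent classes.
AGENT_REGISTRY: Dict[str, Type[BaseAgent]] = {}


def register_agent(name: str) -> Callable[[Type[BaseAgent]], Type[BaseAgent]]:
    """Return a class decorator that stores the class in AGENT_REGISTRY under `name`."""

    def decorator(cls: Type[BaseAgent]) -> Type[BaseAgent]:
        if name in AGENT_REGISTRY:
            raise ValueError(f"agent {name!r} is already registered")
        AGENT_REGISTRY[name] = cls
        return cls

    return decorator


@register_agent("janitor")
class JanitorAgent(BaseAgent):
    def run(self, task: str) -> str:
        # Placeholder: a real maintenance agent would act on the OS here.
        return f"janitor handled: {task}"


@register_agent("sentinel")
class SentinelAgent(BaseAgent):
    def run(self, task: str) -> str:
        # Placeholder: a real security agent would inspect system state here.
        return f"sentinel handled: {task}"


def dispatch(agent_name: str, task: str) -> str:
    """Instantiate the named agent and run the task; raises KeyError if unknown."""
    agent_cls = AGENT_REGISTRY[agent_name]
    return agent_cls().run(task)


if __name__ == "__main__":
    print(dispatch("janitor", "clear temp files"))
```

With this pattern, adding a capability means defining one class and decorating it; the dispatcher itself does not change.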


Built With

The project is built with the following stack:

  • Desktop Application:
    • .NET 9.0 & WinUI 3
    • C# for core application logic and OS integration
  • AI Microservice:
    • Python FastAPI
    • PyTorch & OpenCV for ML and Vision
    • Google Gemini for reasoning and generation
  • Frontend / Web Dashboard:
    • Next.js
    • React
    • TailwindCSS
    • Three.js & React Three Fiber for 3D visualizations


Getting Started

To get a local copy up and running, follow these steps.

Prerequisites

Ensure you have the following installed on your development machine:

  • Node.js (v18+)
  • .NET 9.0 SDK
  • Python 3.10+
  • Git

Installation

  1. Clone the repository

    git clone https://github.com/ShawnTheCreator/kernalagent.git
    cd kernalagent
  2. Set up the AI Microservice

    cd Microservice
    python -m venv venv
    # Activate the virtual environment (Windows)
    .\venv\Scripts\activate
    # Install dependencies
    pip install -r requirements.txt
    • Create a .env file in Microservice/ and add your keys (Gemini API, Firebase Service Account path).
  3. Set up the Frontend

    cd ../Frontend
    npm install
    • Create a .env.local file with your Firebase and Supabase credentials.
  4. Set up the Desktop App

    • Open Desktop-App/Kernel Agent.sln in Visual Studio 2022.
    • Ensure "Kernel Agent" is the startup project.
    • Add your service-account.json to the project root and set "Copy to Output Directory" to "Copy if newer".
  5. Run the System

    • Terminal 1 (Microservice, run from Microservice/ with the virtual environment active): python app/main.py (or uvicorn app.main:app --reload)
    • Terminal 2 (Frontend): npm run dev
    • Visual Studio: Press F5 to build and run the Desktop App.
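The installation steps above do not name the variables the Microservice expects or the ports each service binds to. A small pre-flight check can catch two common setup mistakes: a missing key in `.env` and a service that failed to start. The sketch below is hypothetical. `GEMINI_API_KEY` and `FIREBASE_CREDENTIALS_PATH` are placeholder names, and it assumes uvicorn's default port 8000 and the Next.js dev server's default port 3000. In a real project the `python-dotenv` package's `load_dotenv()` is the usual way to load a `.env` file; the parser here handles only simple `KEY=VALUE` lines so the script needs nothing beyond the standard library.

```python
"""Hypothetical pre-flight check for a local Kernel Agent setup.

Assumptions (not confirmed by the project): the Microservice reads
GEMINI_API_KEY and FIREBASE_CREDENTIALS_PATH from Microservice/.env,
uvicorn listens on its default port 8000, and the Next.js dev server
listens on port 3000.
"""
import socket
from pathlib import Path
from typing import Dict, List

# Placeholder variable names; replace with the ones the Microservice actually reads.
REQUIRED_KEYS = ["GEMINI_API_KEY", "FIREBASE_CREDENTIALS_PATH"]


def parse_env_file(path: Path) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes from the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: Dict[str, str], required: List[str]) -> List[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


def port_is_listening(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    env_path = Path("Microservice") / ".env"
    if env_path.exists():
        missing = missing_keys(parse_env_file(env_path), REQUIRED_KEYS)
        print("Missing .env keys:", ", ".join(missing) if missing else "none")
    else:
        print(f"{env_path} not found")
    for name, port in [("Microservice", 8000), ("Frontend", 3000)]:
        state = "up" if port_is_listening("127.0.0.1", port) else "down"
        print(f"{name} on port {port}: {state}")
```

Run it from the repository root (for example as `python preflight.py`, a hypothetical filename) after starting the services in step 5.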


Usage

  • Voice Commands: Activate the agent with the wake word (configurable) or by clicking the orb. Try commands like "Open Notepad", "Check system health", or "Summarize this document".
  • Agent Forge: Use the "Forge" page in the desktop app to craft custom sub-agents with specific personalities and tool access.
  • Dashboard: Open the web dashboard (default http://localhost:3000) to view agent analytics and memory logs and to manage installed skills.
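The README does not describe how the Forge stores a sub-agent. Assuming a sub-agent is essentially a name, a personality prompt, and an allow-list of tools, a minimal Python model could look like the sketch below. `SubAgentSpec`, its fields, and the tool names `open_app`, `read_clipboard`, and `delete_file` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import FrozenSet


@dataclass(frozen=True)
class SubAgentSpec:
    """Hypothetical description of a Forge-crafted sub-agent."""

    name: str
    personality: str  # a short system-prompt fragment describing tone and role
    allowed_tools: FrozenSet[str] = field(default_factory=frozenset)

    def can_use(self, tool: str) -> bool:
        """Deny by default: only tools on the allow-list are permitted."""
        return tool in self.allowed_tools

    def system_prompt(self) -> str:
        """Compose a prompt stating the personality and the permitted tools."""
        tools = ", ".join(sorted(self.allowed_tools)) or "none"
        return f"You are {self.name}. {self.personality} Allowed tools: {tools}."


if __name__ == "__main__":
    note_taker = SubAgentSpec(
        name="NoteTaker",
        personality="You write concise meeting notes.",
        allowed_tools=frozenset({"open_app", "read_clipboard"}),
    )
    print(note_taker.can_use("delete_file"))
    print(note_taker.system_prompt())
```

Checking `can_use` on every tool call, rather than relying only on the prompt text, keeps a sub-agent from reaching tools it was not granted even if the model requests them.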


Architecture

The system operates on a hub-and-spoke model:

  • The Hub (Desktop App): The central nervous system. It handles user input (Voice/Text), renders the UI, and performs OS-level actions (File I/O, Window Management).
  • The Brain (Microservice): A Python FastAPI server that processes complex requests. It handles LLM inference, runs computer vision tasks, and manages the state of long-running autonomous agents.
  • The Cloud (Firebase/Supabase): Syncs user preferences, agent memory, and long-term history across devices.
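The hub-and-spoke split implies a routing decision in the Hub: handle simple OS actions locally and forward everything else to the Brain. The actual Hub is written in C#; the Python sketch below only illustrates that decision logic. The `LOCAL_ACTIONS` table, the `open_app` handler, and the verb-plus-argument matching are hypothetical, and the forward step is represented by a return value rather than a real request to the FastAPI service.

```python
import re
from typing import Callable, Dict, Tuple


def open_app(name: str) -> str:
    """Hypothetical local handler; the real Hub would launch the process."""
    return f"launching {name}"


# Hypothetical table of verbs the Hub can handle without calling the Brain.
LOCAL_ACTIONS: Dict[str, Callable[[str], str]] = {
    "open": open_app,
}

# Splits commands like "Open Notepad" into verb="Open", argument="Notepad".
COMMAND_PATTERN = re.compile(r"^\s*(\w+)\s+(.+?)\s*$")


def route(utterance: str) -> Tuple[str, str]:
    """Return ("hub", result) for local actions, or ("brain", utterance) to forward."""
    match = COMMAND_PATTERN.match(utterance)
    if match:
        verb, argument = match.group(1).lower(), match.group(2)
        handler = LOCAL_ACTIONS.get(verb)
        if handler is not None:
            return "hub", handler(argument)
    # Anything else (summaries, health checks, vision) goes to the Python service.
    return "brain", utterance


if __name__ == "__main__":
    for command in ["Open Notepad", "Summarize this document"]:
        print(route(command))
```

Keeping cheap, deterministic actions in the Hub avoids a network round trip and an LLM call for commands like "Open Notepad", while open-ended requests still reach the Brain.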


Roadmap

  • Initial WinUI 3 Desktop Interface
  • Python Microservice with Gemini Integration
  • Basic Voice Command Execution
  • Advanced Vision: Real-time screen context understanding
  • Agent Marketplace: Community-driven agent sharing
  • Deep OS Integration: More granular control over Windows settings and registry
  • Multi-turn Conversation: Improved context retention for complex tasks

See the open issues for a full list of proposed features (and known issues).


Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request


License

Distributed under the MIT License. See LICENSE.txt for more information.


Contact

Shawn - @ShawnTheCreator

Project Link: https://github.com/ShawnTheCreator/kernalagent
