Gemma4-Edge-Bridge

Gemma4-Edge-Bridge is my local AI setup for running Gemma 4:E2B on a mid-range NVIDIA GTX 1050 Ti (4GB VRAM) with a GPU-first configuration tuned for fast, private inference.

💡 Key Features

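Zero-Cloud Privacy: All inference runs locally inside WSL2; nothing is sent to a cloud service.
Full GPU Utilization: Configured so every model layer runs from the card's 4 GB of VRAM.
Seamless Integration: WSL2 mirrored networking bridges Windows Enterprise and Ubuntu, so the Windows-side UI reaches the Ollama server in Ubuntu over localhost.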

🛠️ Setup

1. Copy .wslconfig to your Windows user folder (%UserProfile%) so WSL2 picks up the memory and networking settings, then run wsl --shutdown so the change takes effect the next time WSL2 starts.
2. In WSL2, run sh setup_linux.sh to configure the Ollama service.
3. On Windows, run powershell ./setup_windows.ps1 to prepare the Python UI.
4. Launch the interface: streamlit run app.py.
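The first step depends on what .wslconfig contains. The repository's actual file is not reproduced here, so the Python sketch below only illustrates the general shape under stated assumptions: `render_wslconfig` is a hypothetical helper (not part of this project), and the memory cap you pass in is your own choice. The `[wsl2]` section with `memory=` and `networkingMode=mirrored` keys follows Microsoft's documented .wslconfig format.

```python
from pathlib import Path


def render_wslconfig(memory_gb: int, mirrored: bool = True) -> str:
    """Build the text of a minimal .wslconfig (hypothetical helper).

    memory_gb caps the RAM the WSL2 VM may use. networkingMode=mirrored
    lets Windows processes reach Linux services (such as Ollama) via localhost.
    """
    if memory_gb <= 0:
        raise ValueError("memory_gb must be positive")
    lines = ["[wsl2]", f"memory={memory_gb}GB"]
    if mirrored:
        lines.append("networkingMode=mirrored")
    return "\n".join(lines) + "\n"


def wslconfig_path() -> Path:
    # When run with Windows Python, Path.home() is %UserProfile%,
    # which is where WSL2 looks for .wslconfig.
    return Path.home() / ".wslconfig"
```

From Windows Python, `wslconfig_path().write_text(render_wslconfig(8))` would write an 8 GB cap (an illustrative value); run `wsl --shutdown` afterwards so WSL2 restarts with the new settings.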

🧠 Why a Custom Modelfile?

I build a custom model, gemma-hackathon, from the Modelfile to:

Force GPU Residency: num_gpu sets how many layers Ollama offloads to the GPU. Because 999 exceeds the model's layer count, setting num_gpu 999 requests every layer in the 1050 Ti's 4 GB of VRAM for maximum speed.
Logic-First Tuning: We lower temperature to 0.3, which makes sampling less random and keeps answers precise and technical for engineering tasks.
Environment Persistence: This wraps the Google Gemma 4 base model in a fixed configuration, so the Streamlit UI always talks to the same hardware-optimized version.

⚙️ Specs

Model: Gemma 4:E2B (Quantized)
Host: Windows Enterprise / WSL2 Ubuntu 24.04
Interface: Streamlit / Python 3.13

💻 Running without a GPU (CPU-Only Mode)

This project is designed to be hardware-agnostic. If you do not have an NVIDIA GPU:

Model Config: In the Modelfile, remove the line PARAMETER num_gpu 999, then rebuild the model (for example with sh scripts/create_model.sh).
Performance: The model will run on the CPU from system RAM. Inference is slower than on the GPU, but all logic and features remain fully functional.
Resource Tuning: Set memory in .wslconfig to about 50% of your total physical RAM for stability (WSL2 also defaults to 50% when memory is not set).
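Both CPU-only adjustments are mechanical, so here is a small Python sketch of them under stated assumptions: `strip_num_gpu` and `half_of_ram_gb` are hypothetical helpers, not project scripts, and rounding the 50% figure down to whole gigabytes is my own choice for a clean `memory=` value.

```python
import re

# Matches a "PARAMETER num_gpu ..." line; case-insensitive because Ollama
# Modelfiles do not require uppercase instructions.
_NUM_GPU_LINE = re.compile(r"^\s*PARAMETER\s+num_gpu\b.*$", re.IGNORECASE)


def strip_num_gpu(modelfile_text: str) -> str:
    """Drop the num_gpu line so Ollama chooses layer placement itself."""
    kept = [line for line in modelfile_text.splitlines()
            if not _NUM_GPU_LINE.match(line)]
    return "\n".join(kept) + "\n"


def half_of_ram_gb(total_ram_gb: float) -> int:
    """50% of physical RAM, rounded down to whole GB (minimum 1 GB)."""
    return max(1, int(total_ram_gb // 2))
```

On a 16 GB machine, `half_of_ram_gb(16)` gives 8, which corresponds to `memory=8GB` under `[wsl2]` in .wslconfig.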

📊 Monitoring & Utilities

To confirm the model is actually running on the GPU, use the provided monitoring scripts:

Model Creation: Run sh scripts/create_model.sh to initialize the optimized Gemma 4 environment.
GPU Verification: Run sh scripts/monitor_gpu.sh to see real-time VRAM usage and layer offloading status.
Manual Checks:
- ollama ps: Shows which models are currently in memory and the GPU/CPU split.
- ollama list: Displays all locally available model versions.
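The GPU/CPU split that `ollama ps` prints can also be read programmatically. A minimal Python sketch, assuming the Ollama server listens on its default port 11434 and is reachable over localhost (as mirrored networking allows): it queries Ollama's `/api/ps` REST endpoint, which reports each loaded model's total `size` and the portion held in VRAM (`size_vram`). The helper names are hypothetical and not part of the repository's scripts.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default port; assumed for this setup


def fetch_running_models(base_url: str = OLLAMA_URL, timeout: float = 5.0) -> list[dict]:
    """Return the models currently loaded by Ollama (via the /api/ps endpoint)."""
    with urllib.request.urlopen(f"{base_url}/api/ps", timeout=timeout) as resp:
        return json.load(resp).get("models", [])


def gpu_share(model: dict) -> float:
    """Fraction of a loaded model's memory held in VRAM (1.0 means fully on GPU)."""
    size = model.get("size", 0)
    return model.get("size_vram", 0) / size if size else 0.0


def describe(models: list[dict]) -> list[str]:
    """One human-readable line per loaded model, e.g. 'name: 100% GPU'."""
    return [f"{m.get('name', '?')}: {gpu_share(m):.0%} GPU" for m in models]
```

With the server running, `describe(fetch_running_models())` should report 100% GPU for gemma-hackathon when every layer fits in VRAM; a lower figure means part of the model spilled into system RAM.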
