LLM-Local-Deployment-Guide

A practical guide for local LLM deployment with 4-bit quantization.
4-bit量化本地大模型部署实战指南

🚀 4060 笔记本本地化部署 Qwen2.5-1.5B 进阶实战：4-bit 量化与 Gradio 交互

Local Deployment of Qwen2.5 on RTX 4060: 4-bit Quantization & Gradio UI

中文 | English

🇨🇳 中文指南

🌟 项目亮点

本项目完整记录了如何在搭载 NVIDIA RTX 4060 的笔记本上，从零搭建 AI 环境，并实现大语言模型的本地私有化、全速运行。

🛠️ 核心技术路线

环境搭建：Anaconda + CUDA 12.1 + PyTorch。
推理加速：BitsAndBytes 4-bit 量化（显存降低50%以上）。
交互方式：提供命令行版（快速测试）和Gradio Web UI版（美观交互，支持手机访问）。

📊 性能表现

显卡：RTX 4060 Laptop (8GB VRAM)
模型：Qwen2.5-1.5B-Instruct
加载时间：约 7s（4-bit 量化）
推理速度：接近网页版原生体验

🚀 快速开始（推荐）

1. 创建并激活 Anaconda 环境

conda create -n qwen25 python=3.11
conda activate qwen25

2. 安装Pytorch(CUDA12.1版本)

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

3.安装依赖项目

pip install -r requirements.txt

4.运行项目（二选一）

选项A:命令行快速测试版（无量化，适合快速验证)

python scripts/run_qwen.py

选项B：4-bit 量化 + Gradio Web UI版（推荐！带漂亮聊天界面）

python scripts/my_local_ai.py

启动后浏览器会自动打开界面。手机访问方法：同一 WiFi 下，用手机浏览器打开 http://你的笔记本IP:7860

账号：local，密码：123456。如需修改，请编辑 scripts/my_local_ai.py 最后一行的 auth=("local", "123456")。

📋 依赖列表（requirements.txt 已包含）

torch（CUDA 12.1）
transformers, accelerate, bitsandbytes
gradio
sentencepiece, protobuf 等

注意：

第一次运行会自动下载模型（约1GB），建议开启科学上网或使用 hf-mirror。
4-bit 量化版（my_local_ai.py）显存占用更低，推荐在 4060 上使用。
如果出现 CUDA out of memory，可以尝试降低 max_new_tokens 或关闭浏览器其他标签。
Gradio 默认端口为 7860，可在代码中修改。

📚 更多文档

故障排除

🇬🇧 English Guide

🌟 Highlights

This project documents the end-to-end process of building a local AI environment on a laptop with an NVIDIA RTX 4060, achieving high-speed local inference of LLMs.

🛠️ Tech Stack

Environment: Precise configuration of CUDA 12.1 + PyTorch (GPU version) using Anaconda.
Acceleration: Implemented 4-bit Quantization via BitsAndBytes, reducing VRAM usage by >50% and enabling instant response.
UI & Interaction: Built a custom chat interface with Gradio, supporting local network tunneling for mobile access.

📊 Performance

GPU: RTX 4060 Laptop (8GB VRAM)
Model: Qwen2.5-1.5B-Instruct
Loading Time: ~7s
Speed: Near-native web experience under 4-bit quantization.

🚀 Quick Start

1.Create Conda Environment

conda create -n qwen25 python=3.11
conda activate qwen25

2. Install PyTorch (CUDA 12.1)

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

3.Install Dependencies

pip install -r requirements.txt

4.RUN(Choose One)

Option A: Command-line Quick Version (No quantification, suitable for rapid verification)

python scripts/run_qwen.py

Option B: 4-bit quantization + Gradio web UI version (recommended! Comes with a beautiful chat interface)

python scripts/my_local_ai.py

After startup, the browser will automatically open the interface. Mobile access method: Under the same WiFi network, open http://your laptop’s IP address:7860 in your mobile browser.

Account：local，Password：123456. To change credentials, edit the last line in scripts/my_local_ai.py.

📋 Notes

First run will download the model (~1GB). Use hf-mirror for faster speed.
4-bit version uses less VRAM — strongly recommended for RTX 4060.
For CUDA out of memory, reduce max_new_tokens or close other tabs.

📚 More documents

Troubleshooting

Author: Haven (Jinan University - AI Major)
License: MIT

Name		Name	Last commit message	Last commit date
Latest commit History 33 Commits
docs		docs
scripts		scripts
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt
test_scripts.py		test_scripts.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

LLM-Local-Deployment-Guide

🚀 4060 笔记本本地化部署 Qwen2.5-1.5B 进阶实战：4-bit 量化与 Gradio 交互

🇨🇳 中文指南

🌟 项目亮点

🛠️ 核心技术路线

📊 性能表现

🚀 快速开始（推荐）

1. 创建并激活 Anaconda 环境

2. 安装Pytorch(CUDA12.1版本)

3.安装依赖项目

4.运行项目（二选一）

启动后浏览器会自动打开界面。手机访问方法：同一 WiFi 下，用手机浏览器打开 http://你的笔记本IP:7860

📋 依赖列表（requirements.txt 已包含）

注意：

📚 更多文档

🇬🇧 English Guide

🌟 Highlights

🛠️ Tech Stack

📊 Performance

🚀 Quick Start

1.Create Conda Environment

2. Install PyTorch (CUDA 12.1)

3.Install Dependencies

4.RUN(Choose One)

After startup, the browser will automatically open the interface. Mobile access method: Under the same WiFi network, open http://your laptop’s IP address:7860 in your mobile browser.

📋 Notes

📚 More documents

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

LLM-Local-Deployment-Guide

🚀 4060 笔记本本地化部署 Qwen2.5-1.5B 进阶实战：4-bit 量化与 Gradio 交互

🇨🇳 中文指南

🌟 项目亮点

🛠️ 核心技术路线

📊 性能表现

🚀 快速开始（推荐）

1. 创建并激活 Anaconda 环境

2. 安装Pytorch(CUDA12.1版本)

3.安装依赖项目

4.运行项目（二选一）

启动后浏览器会自动打开界面。 手机访问方法：同一 WiFi 下，用手机浏览器打开 http://你的笔记本IP:7860

📋 依赖列表（requirements.txt 已包含）

注意：

📚 更多文档

🇬🇧 English Guide

🌟 Highlights

🛠️ Tech Stack

📊 Performance

🚀 Quick Start

1.Create Conda Environment

2. Install PyTorch (CUDA 12.1)

3.Install Dependencies

4.RUN(Choose One)

After startup, the browser will automatically open the interface. Mobile access method: Under the same WiFi network, open http://your laptop’s IP address:7860 in your mobile browser.

📋 Notes

📚 More documents

About

Topics

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

启动后浏览器会自动打开界面。手机访问方法：同一 WiFi 下，用手机浏览器打开 http://你的笔记本IP:7860

Packages