
feat: add fp8 optimization support for transformer model #6

Open

eric-gitta-moore wants to merge 6 commits into brandon929:main from eric-gitta-moore:main

Conversation

@eric-gitta-moore

  • Implement fp8 quantization utilities for linear layers
  • Add fp8 optimization option to the gradio demo interface
  • Modify the worker function to handle the fp8-optimized state dict
  • Include monkey patching for the fp8 linear layer forward pass (the quantization and the patched forward are both sketched after this list)
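For orientation, here is a minimal sketch of the two pieces described above: per-tensor FP8 quantization of linear weights, and a monkey-patched forward that dequantizes on the fly. This is not the actual utility from kohya-ss's FramePack-LoRAReady (which this PR merges in); it assumes PyTorch >= 2.1 for `torch.float8_e4m3fn`, and the helper names (`quantize_linear_fp8`, `fp8_linear_forward`, `apply_fp8_monkey_patch`) are illustrative.

```python
import types

import torch
import torch.nn as nn
import torch.nn.functional as F

FP8_E4M3_MAX = 448.0  # largest finite value representable in torch.float8_e4m3fn


def quantize_linear_fp8(linear: nn.Linear) -> None:
    """Replace the weight with an FP8 tensor plus a per-tensor scale (inference-only)."""
    linear.weight.requires_grad_(False)  # FP8 weights are not trainable here
    w = linear.weight.data
    scale = w.abs().max().clamp(min=1e-12) / FP8_E4M3_MAX
    q = (w / scale).clamp(-FP8_E4M3_MAX, FP8_E4M3_MAX).to(torch.float8_e4m3fn)
    linear.weight.data = q
    linear.register_buffer("scale_weight", scale)


def fp8_linear_forward(self: nn.Linear, x: torch.Tensor) -> torch.Tensor:
    """Patched forward: dequantize the FP8 weight on the fly, then run F.linear."""
    w = self.weight.to(x.dtype) * self.scale_weight.to(x.dtype)
    return F.linear(x, w, self.bias)


def apply_fp8_monkey_patch(model: nn.Module) -> None:
    """Quantize every nn.Linear in the model and rebind its forward method."""
    for module in model.modules():
        if isinstance(module, nn.Linear):
            quantize_linear_fp8(module)
            module.forward = types.MethodType(fp8_linear_forward, module)
```

The trade-off is the usual one for weight-only quantization: the state dict shrinks to roughly half the bfloat16 size, at the cost of a dequantize step in every linear forward.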

merge https://github.com/kohya-ss/FramePack-LoRAReady/blob/main/utils/fp8_optimization_utils.py

Add an --offline flag to load models from the local cache instead of downloading from the HuggingFace Hub. This enables usage in environments with restricted internet access.
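As a hedged sketch (the exact wiring in this PR may differ), offline loading with HuggingFace libraries usually comes down to setting the hub's offline environment variables before import, or passing `local_files_only=True` to the loaders:

```python
import os

# Must be set before transformers/diffusers are imported to take effect reliably.
os.environ["HF_HUB_OFFLINE"] = "1"
os.environ["TRANSFORMERS_OFFLINE"] = "1"

from transformers import AutoTokenizer  # any HF from_pretrained loader behaves the same

# local_files_only=True raises an error instead of downloading when the model
# is not already in the local cache. "some/model" is a placeholder repo id.
tokenizer = AutoTokenizer.from_pretrained("some/model", local_files_only=True)
```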
Fix the --offline argument by removing the incorrect store_true action, and lower the minimum value of the gpu_memory_preservation slider from 6 to 0 for better flexibility in low-memory scenarios.
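A minimal Gradio sketch of the adjusted slider; everything except the new minimum (the label, upper bound, and step) is an assumption:

```python
import gradio as gr

gpu_memory_preservation = gr.Slider(
    label="GPU Memory Preservation (GB)",  # assumed label
    minimum=0,    # was 6; 0 lets low-memory setups disable preservation entirely
    maximum=128,  # assumed upper bound
    value=6,
    step=0.1,
)
```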
The FP8 optimization checkbox was disabled by default, which may lead to suboptimal performance for users who are unaware of this setting. Enabling it by default ensures better performance out of the box.
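The corresponding change is a one-line default flip, sketched here with an assumed label:

```python
import gradio as gr

fp8_optimization = gr.Checkbox(label="FP8 Optimization", value=True)  # was value=False
```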