---
title: Apply Lora And Quantize
emoji: 🔬
colorFrom: yellow
colorTo: purple
sdk: gradio
sdk_version: 5.0.1
app_file: app.py
pinned: false
license: mit
short_description: apply_lora_and_quantize
---

# Model Converter for HuggingFace

A tool for merging LoRA adapters into base Large Language Models (LLMs) and producing quantized versions of the result.

## Features

- Automatic system resource detection (CPU/GPU)
- Merging of base models with LoRA adapters
- 4-bit and 8-bit quantization
- Automatic upload to the HuggingFace Hub
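The resource-detection feature likely reduces to logic like the sketch below. The function name and the 8 GB threshold are illustrative, not taken from the script; in practice the inputs would come from `torch.cuda.is_available()` and `torch.cuda.mem_get_info()`.

```python
def pick_device(cuda_available: bool, free_vram_gb: float,
                min_free_vram_gb: float = 8.0) -> str:
    """Return "cuda" when a GPU with enough free memory is reported, else "cpu".

    Hypothetical helper: the real script would feed in values queried
    from torch (torch.cuda.is_available(), torch.cuda.mem_get_info()).
    """
    if cuda_available and free_vram_gb >= min_free_vram_gb:
        return "cuda"
    return "cpu"
```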

## Requirements

- Python 3.8+
- CUDA-compatible GPU (optional, but recommended)
- HuggingFace account and access token

## Installation

```bash
pip install -r requirements.txt
```

## Configuration

Create a `.env` file in the project root:

```
HF_TOKEN=your_huggingface_token
```
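The script presumably reads this file at startup, most likely via `python-dotenv`'s `load_dotenv()`. A minimal stdlib-only equivalent, shown here just to illustrate what loading the file involves:

```python
import os
from pathlib import Path


def load_env(path: str = ".env") -> None:
    """Minimal .env loader: export KEY=VALUE lines as environment variables.

    Illustrative stand-in for python-dotenv's load_dotenv(); like it,
    existing environment variables are not overridden.
    """
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```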

## Usage

Run the script:

```bash
python space_convert.py
```

You will be prompted to enter:

1. Base model path (e.g., "Qwen/Qwen2.5-7B-Instruct")
2. LoRA model path
3. Target HuggingFace repository name

The script will:

1. Check available system resources
2. Choose the optimal device (GPU/CPU)
3. Merge the base model with the LoRA adapter
4. Create 8-bit and 4-bit quantized versions
5. Upload everything to HuggingFace
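With `transformers` and `peft`, steps 3-5 might look roughly like the sketch below. This is not the script's actual code: the function names, repo-naming scheme, and exact call sequence are assumptions, and the heavy imports are kept local so the light helper remains usable on its own.

```python
def quantized_repo_id(target_repo: str, bits: int) -> str:
    """Hypothetical naming scheme: "user/model" -> "user/model-4bit"."""
    return f"{target_repo}-{bits}bit"


def merge_and_quantize(base_model: str, lora_path: str, target_repo: str) -> None:
    """Illustrative merge-then-quantize pipeline (assumed, not the real script)."""
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    # Merge the LoRA adapter into the base weights.
    base = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.float16)
    merged = PeftModel.from_pretrained(base, lora_path).merge_and_unload()

    # Push the merged full-precision model and its tokenizer.
    merged.push_to_hub(target_repo)
    AutoTokenizer.from_pretrained(base_model).push_to_hub(target_repo)

    # Reload the merged model with 8-bit and 4-bit quantization and push each.
    for bits in (8, 4):
        cfg = BitsAndBytesConfig(load_in_8bit=(bits == 8), load_in_4bit=(bits == 4))
        quantized = AutoModelForCausalLM.from_pretrained(target_repo, quantization_config=cfg)
        quantized.push_to_hub(quantized_repo_id(target_repo, bits))
```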

## Memory Requirements

- 7B models: ~16GB RAM/VRAM
- 14B models: ~32GB RAM/VRAM
- Additional disk space: 3x model size
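These figures follow from a back-of-the-envelope rule: parameter count times bytes per parameter (2 bytes in fp16) for the weights alone, plus working overhead for the merge. A hypothetical helper to make the arithmetic explicit:

```python
def estimated_weight_gb(params_billion: float, bytes_per_param: float = 2.0) -> float:
    """Approximate weight memory in GiB: parameters x bytes per parameter.

    fp16 uses 2 bytes per parameter, so a 7B model needs ~13 GiB for
    weights alone, consistent with the ~16GB figure once overhead is added.
    """
    return params_billion * 1e9 * bytes_per_param / 1024**3
```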

## Note

The script automatically handles:

- Resource availability checks
- Device selection
- Error handling
- Progress tracking
- Model optimization