AVI-Edit: Audio-sync Video Instance Editing with Granularity-Aware Mask Refiner


AVI-Edit is a framework for audio-sync video instance editing. It introduces two components: a granularity-aware mask refiner, which iteratively refines coarse user-provided masks into precise instance-level regions, and a self-feedback audio agent, which curates high-quality audio guidance to provide fine-grained temporal control.

Installation

To set up the environment, clone the official repository and install its dependencies:

git clone https://github.com/suimuc/AVI-Edit-Framework.git
cd AVI-Edit-Framework
conda create -n avi_edit python=3.10
conda activate avi_edit
pip install -r requirements.txt
pip install -e .

Usage

The framework supports inference using either a pre-edited audio track or an automated audio agent.

1. Inference with an Edited Audio Track

Use this script when you already have the edited audio:

python scripts/inference_with_edited_audio.py \
  --video-path /path/to/input_video.mp4 \
  --audio-path /path/to/edited_audio.wav \
  --mask-path /path/to/mask.mp4 \
  --prompt "Describe the edited scene here." \
  --output-dir /path/to/output_dir
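
When running many edits in a batch, it can help to assemble this command programmatically and check that the input files exist before launching. The following is a minimal sketch that only uses the flags shown above; the helper names `build_edited_audio_command` and `missing_inputs` are hypothetical and not part of the AVI-Edit repository.

```python
import shlex
from pathlib import Path


def build_edited_audio_command(video, audio, mask, prompt, output_dir):
    """Return the argument list for scripts/inference_with_edited_audio.py.

    Hypothetical helper: it only mirrors the flags shown in the README.
    """
    return [
        "python", "scripts/inference_with_edited_audio.py",
        "--video-path", str(video),
        "--audio-path", str(audio),
        "--mask-path", str(mask),
        "--prompt", prompt,
        "--output-dir", str(output_dir),
    ]


def missing_inputs(*paths):
    """Return the given paths that are not existing files."""
    return [str(p) for p in paths if not Path(p).is_file()]


if __name__ == "__main__":
    cmd = build_edited_audio_command(
        "/path/to/input_video.mp4",
        "/path/to/edited_audio.wav",
        "/path/to/mask.mp4",
        "Describe the edited scene here.",
        "/path/to/output_dir",
    )
    # Print a shell-quoted version of the command for inspection.
    print(shlex.join(cmd))
    # Report placeholder paths that still need to be replaced.
    print("Missing inputs:", missing_inputs(cmd[3], cmd[5], cmd[7]))
```

Passing the returned list to `subprocess.run(cmd, check=True)` from the repository root runs the script without shell quoting issues in the prompt.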

2. Inference with the Audio Agent

Use this script to generate replacement audio automatically from the video, mask, and edit prompt:

python scripts/inference.py \
  --video-path /path/to/input_video.mp4 \
  --mask-path /path/to/mask.mp4 \
  --prompt "Describe the edited scene here." \
  --output-dir /path/to/output_dir \
  --dashscope-api-key "<YOUR_QWEN_OR_OPENAI_COMPATIBLE_API_KEY>" \
  --eleven-api-key "<YOUR_ELEVENLABS_API_KEY>"
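
Typing API keys directly on the command line can leave them in shell history. One option is to read them from environment variables and build the command in Python, as in the sketch below. The variable names `DASHSCOPE_API_KEY` and `ELEVENLABS_API_KEY` and the helper `build_agent_command` are assumptions chosen for illustration; the source does not say whether the scripts read any environment variables themselves, so the keys are still passed through the documented flags.

```python
import os


def build_agent_command(video, mask, prompt, output_dir, env=None):
    """Return the argument list for scripts/inference.py.

    Hypothetical helper: API keys are looked up in `env` (defaults to
    os.environ) under assumed variable names, then passed via the
    flags shown in the README.
    """
    env = os.environ if env is None else env
    key_flags = (
        ("--dashscope-api-key", "DASHSCOPE_API_KEY"),   # assumed variable name
        ("--eleven-api-key", "ELEVENLABS_API_KEY"),     # assumed variable name
    )
    cmd = [
        "python", "scripts/inference.py",
        "--video-path", str(video),
        "--mask-path", str(mask),
        "--prompt", prompt,
        "--output-dir", str(output_dir),
    ]
    for flag, var in key_flags:
        value = env.get(var)
        if not value:
            raise RuntimeError(f"environment variable {var} is not set")
        cmd += [flag, value]
    return cmd


if __name__ == "__main__":
    try:
        cmd = build_agent_command(
            "/path/to/input_video.mp4",
            "/path/to/mask.mp4",
            "Describe the edited scene here.",
            "/path/to/output_dir",
        )
    except RuntimeError as err:
        print(f"Cannot launch: {err}")
    else:
        # Do not print cmd itself, since it contains the secret keys.
        print("Command ready with", len(cmd), "arguments")
```

As with the previous mode, the list can be handed to `subprocess.run(cmd, check=True)` from the repository root.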

Citation

@article{avi-edit,
  title={Audio-sync Video Instance Editing with Granularity-Aware Mask Refiner},
  author={Zheng, Haojie and Weng, Shuchen and Liu, Jingqi and Yang, Siqi and Shi, Boxin and Wang, Xinlong},
  journal={arXiv preprint arXiv:2512.10571},
  year={2025}
}

