Video-Free-Pokemon-LoRA-Hyvideo1.5_I2V
This model ships PokeLoRA.safetensors, a LoRA adapter for HunyuanVideo-1.5 image-to-video (i2v) trained on the PokeFA Pokémon fan-art dataset. It domain-adapts the base model to Pokémon by explicitly reducing a common failure mode: motion-induced identity drift. The adapter must be paired with tencent/HunyuanVideo-1.5; training used the HunyuanVideo-1.5 pipeline with the 480p_i2v transformer. The LoRA may also transfer to higher-resolution (e.g., 720p) i2v variants of HunyuanVideo-1.5, but quality at higher resolutions is not guaranteed and may require adjusting the inference settings.
In i2v, when the character distribution shifts from general to a narrow class (for example, Pokémon), adding motion to a static reference image often causes identity drift: the base model produces frames that remain “Pokémon-ish” but drift away from the exact character identity. PokeLoRA is trained to keep motion and dynamics while improving identity stability under motion prompts (for comparisons and illustrations, see the Generation examples section).
High-level training approach
This LoRA is trained with an image-only, video-free recipe designed for large domain shifts (general → Pokémon) in i2v:
Teacher–student distillation (preserve motion priors)
A frozen teacher (base model, no LoRA) generates a short denoising trajectory. The student (base + LoRA) is trained to match the teacher’s native prediction (flow/velocity/noise—whatever the base model uses).
This keeps the adapted model from “cheating” by collapsing to near-static frames and suppressing motion and background dynamics.
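For concreteness, here is a minimal sketch of what such a distillation objective could look like, assuming a flow-matching base model whose transformer returns its native prediction directly; the function signature and variable names are illustrative, not the repo's actual API:

```python
import torch
import torch.nn.functional as F

def distillation_loss(teacher, student, noisy_latents, timesteps, cond):
    # The frozen teacher (base model, no LoRA) provides the target in the
    # model's native parameterization (e.g., velocity for flow matching).
    with torch.no_grad():
        target = teacher(noisy_latents, timesteps, **cond)

    # The student (base + LoRA) is trained to match the teacher along the
    # same short denoising trajectory, preserving the base model's motion priors.
    pred = student(noisy_latents, timesteps, **cond)
    return F.mse_loss(pred, target)
```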
Identity loss in pixel space (reduce drift under motion)
At mid-to-late timesteps (the mid-to-low noise region), the student prediction is converted to a predicted clean latent with an Euler step and decoded to frames. A DINOv3 embedding space, tuned to Pokémon identity via a projection head that is robust to large changes in style, pose, motion blur, viewpoint, and background, measures similarity between sampled frames and an anchor image. This explicitly penalizes identity drift while still allowing pose and motion changes.
- Identity embedding model (DINOv3 projection head): DINOv3-PokeCon-Head
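A minimal sketch of the identity loss, assuming a rectified-flow parameterization for the Euler step; the VAE interface and the dino_head / anchor_emb names are illustrative rather than the actual implementation (see the GitHub repo for that):

```python
import torch.nn.functional as F

def identity_loss(velocity_pred, noisy_latents, sigma, vae, dino_head, anchor_emb):
    # One Euler step from the noisy latent to a predicted clean latent.
    # For a rectified-flow parameterization x0 = x_t - sigma * v; adjust this
    # to whatever prediction type the base model actually uses.
    pred_clean = noisy_latents - sigma * velocity_pred

    # Decode sampled frames back to pixel space.
    frames = vae.decode(pred_clean)              # illustrative shape: (B, T, C, H, W)

    # Embed frames with the Pokémon-tuned DINOv3 projection head and compare
    # against the anchor (reference image) embedding.
    frame_emb = dino_head(frames.flatten(0, 1))  # (B*T, D)
    sim = F.cosine_similarity(frame_emb, anchor_emb.unsqueeze(0), dim=-1)

    # Penalize identity drift while leaving pose and motion unconstrained.
    return (1.0 - sim).mean()
```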
LoRA targets four groups: (1) MMDoubleStreamBlock attention projections; (2) MMSingleStreamBlock attention + fusion/MLP projections; (3) modulation/gating layers; (4) the final AdaLN modulation.
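As a rough illustration of how such a target set could be assembled with PEFT, the sketch below filters transformer submodules by name; the substring patterns, rank, and alpha are placeholders and likely differ from the actual training config:

```python
import torch.nn as nn
from peft import LoraConfig, get_peft_model

def collect_lora_targets(transformer: nn.Module) -> list[str]:
    # Pick Linear layers in the four groups above by module name; the substring
    # patterns here are illustrative and may not match HunyuanVideo-1.5 exactly.
    patterns = ("double", "single", "mod")
    return [
        name
        for name, module in transformer.named_modules()
        if isinstance(module, nn.Linear) and any(p in name.lower() for p in patterns)
    ]

def add_lora(transformer: nn.Module) -> nn.Module:
    config = LoraConfig(
        r=32,              # illustrative rank; see the GitHub repo for the actual config
        lora_alpha=32,
        target_modules=collect_lora_targets(transformer),
    )
    return get_peft_model(transformer, config)
```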
If you want to reproduce training, adapt it to other domains (characters, styles, rare “vibes”), or check out the implementation details, see the project GitHub page Video-Free-LoRA-Hyvideo1.5-I2V.
Generation examples
Side-by-side comparison (LoRA vs base)
These three videos share the same prompt, reference image, seed, and step count; only the classifier-free guidance (CFG) scale differs. Since this model is trained with a distillation approach, the optimal CFG scale is 1. The sylveon_base clip uses the recommended base-model default of CFG 6, and the sylveon_base_samecfg clip shows the base model at CFG 1 for a direct apples-to-apples comparison.
The LoRA preserves Sylveon identity during running; the base model at CFG 6 shows obvious identity drift; and the base model at CFG 1 has both identity drift and extra blur due to the low CFG.
| LoRA (CFG 1) | Base (CFG 6) | Base (CFG 1) |
|---|---|---|
Prompt, reference image, and source
Prompt:
Sylveon starts sprinting left-to-right in a side-view shot, with the camera tracking alongside.
Reference image source: https://s1.zerochan.net/Sylveon.600.4383743.jpg
More examples
Prompt, reference image, and source
Prompt:
Glaceon and Eevee run and jump side-by-side toward the camera through a snowy forest, front view with a backward-tracking follow camera; Glaceon gradually pulls ahead and fills the frame while Eevee drifts off to the left and disappears
Reference source: https://i.pinimg.com/1200x/f7/d7/c3/f7d7c314bb3321929da1bc5f6adee902.jpg
Prompt, reference image, and source
Prompt:
Umbreon in a vampire-style cape floats through a night sky, front-facing close shot with a slightly low-angle follow camera.
Reference image source: https://www.pinterest.com/pin/17662623532666810/
Prompt, reference image, and source
Prompt:
Eevee jumps off a rock and starts running left-to-right, side-view with a smooth follow camera tracking alongside through a forest path.
Reference image source: https://i.pinimg.com/736x/52/aa/3d/52aa3d882193c130a14f13772440acb7.jpg
Quickstart
Use the generate.py script from the GitHub repo: Video-Free-LoRA-Hyvideo1.5-I2V.
Quick guidance:
- Negative prompt is not needed.
- A safe CFG default is 1. You can experiment with 1-2, but going over 2 is not recommended for this LoRA.
- Default LoRA scale is 0.5 (recommended range 0.3-0.7).
- The adapter file in this repo is PokeLoRA.safetensors.
Example:
```bash
python generate.py \
  --prompt "your prompt" \
  --image_path /path/to/ref.jpg \
  --resolution 480p \
  --model_path /path/to/hunyuanvideo_pipeline \
  --lora_path /path/to/PokeLoRA.safetensors \
  --guidance_scale 1 \
  --lora_scale 0.5
```
Limitations & Caveats
Very long generations may degrade
This LoRA was trained with a short video length (17 frames) for compute reasons (the identity loss requires VAE decoding). As a result, it works best for short-to-medium generations. In my experience, outputs under ~100 frames are generally fine. I have not thoroughly evaluated lengths beyond 100 frames, so longer generations may degrade in quality.
Complex prompts are not well-tested
To target motion-induced identity drift, training used a straightforward prompt structure: {character} + (short motion prompt) + (short camera prompt) sampled from prompt banks. This is feasible because in i2v, the reference image already provides strong conditioning (style, background, lighting, composition, etc.). Recommendation: only specify in the prompt what you need (typically motion and camera), and let the reference image carry the rest. Very long / highly detailed prompts (many constraints, multiple subjects, dense scene descriptions, etc.) were not thoroughly tested, so results may be less predictable.
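As a rough illustration (with made-up bank contents), the training-time prompt structure described above could be built like this:

```python
import random

# Illustrative prompt banks; the actual banks used for training are in the GitHub repo.
MOTION_BANK = [
    "starts sprinting left-to-right",
    "jumps off a rock and runs forward",
]
CAMERA_BANK = [
    "side-view shot with the camera tracking alongside",
    "front view with a backward-tracking follow camera",
]

def build_prompt(character: str) -> str:
    # {character} + (short motion prompt) + (short camera prompt)
    return f"{character} {random.choice(MOTION_BANK)}, {random.choice(CAMERA_BANK)}."

print(build_prompt("Eevee"))
```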
Out-of-domain generations are not well-tested
This LoRA was tuned for Pokémon identity, so it works best with Pokémon reference images. It should transfer better to anime-style characters (shared identity priors like color layout, facial structure, silhouette cues) and is least predictable on photorealistic references. Out-of-domain use is not extensively tested, and gains can be smaller or less consistent the farther the reference image is from the training distribution (Pokémon → anime-style → photorealistic).
License
This repository contains LoRA adapter weights trained from Tencent HunyuanVideo 1.5. The adapter constitutes a "Model Derivative" and is distributed under the Tencent Hunyuan Community License Agreement. Use and distribution must comply with the Agreement (including the Acceptable Use Policy and Territory restrictions).
Pokémon IP notice: Pokémon and all related names, characters, and imagery are trademarks and copyrighted works of The Pokémon Company, Nintendo, Game Freak, and Creatures Inc. This project is fan-made, non-commercial, and is not affiliated with, endorsed by, or sponsored by any of these entities.
Any use of this model that involves commercial exploitation of Pokémon IP (including, but not limited to, selling generated images, using them in paid products or services, or monetized media) may infringe third-party rights. You are solely responsible for ensuring that your use of the model and its outputs complies with applicable laws, platform policies, and the rights of The Pokémon Company and other IP holders.
This description is provided for informational purposes only and does not constitute legal advice.