IDLM-Duo

IDLM-Duo is an Inverse-distilled Diffusion Language Model (IDLM) obtained by distilling the pretrained Duo diffusion language model. It is released with the paper "IDLM: Inverse-distilled Diffusion Language Models".

IDLM extends inverse distillation to discrete token spaces. Instead of running a pretrained diffusion language model for hundreds or thousands of reverse-diffusion steps, IDLM trains a few-step student generator using an auxiliary fake model and the teacher diffusion objective.

Project page: https://david-cripto.github.io/idlm-project-page/
Code: https://github.com/David-cripto/IDLM
Paper: https://arxiv.org/abs/2602.19066

Model Details

Model family: IDLM, discrete diffusion language model
Teacher checkpoint: s-sahoo/duo
Diffusion type: uniform-state / Duo-style diffusion
Training data: OpenWebText
Tokenizer: GPT-2 tokenizer
Context length: 1024 tokens
Parameters: 169,627,250
Tensor type: F32 Safetensors
Architecture config: 12 blocks, 12 heads, hidden size 768, conditioning dimension 128, dropout 0.1
License: MIT
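
As a quick sanity check on the figures above, the raw weight size follows from the parameter count and the 4-byte width of an F32 element. The sketch below is plain arithmetic; it ignores the Safetensors header, activations, and any optimizer state, and the variable names are illustrative only.

```python
# Back-of-the-envelope weight size for the checkpoint described above.
NUM_PARAMETERS = 169_627_250   # parameter count listed in Model Details
BYTES_PER_F32 = 4              # single-precision float

weight_bytes = NUM_PARAMETERS * BYTES_PER_F32
print(f"{weight_bytes:,} bytes")           # 678,509,000 bytes
print(f"{weight_bytes / 1e6:.1f} MB")      # 678.5 MB
print(f"{weight_bytes / 2**20:.1f} MiB")   # 647.1 MiB
```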

Intended Use

This checkpoint is intended for research on diffusion language models, inverse distillation, and few-step sampling.

Installation

The sampling code depends on CUDA and FlashAttention.

git clone https://github.com/David-cripto/IDLM.git
cd IDLM

conda create -n idlm python=3.12
conda activate idlm
conda install nvidia/label/cuda-12.4.0::cuda-toolkit
pip install -r requirements.txt
pip install flash_attn==2.7.4.post1
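
After installation, a small preflight check along the following lines may help confirm that the dependencies are visible from the active environment. This is a sketch using only the Python standard library: check_environment is a hypothetical helper, not part of the IDLM repository, and it assumes that requirements.txt installs torch.

```python
import importlib.util
import shutil


def check_environment():
    """Report which of the expected dependencies this interpreter can see."""
    return {
        # The CUDA toolkit ships the nvcc compiler; look for it on PATH.
        "nvcc_on_path": shutil.which("nvcc") is not None,
        # Assumption: torch is installed via requirements.txt.
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "flash_attn_installed": importlib.util.find_spec("flash_attn") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

The check only tests that the packages can be found; it does not verify that the installed FlashAttention build matches the CUDA version.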

Loading the Checkpoint

The Hugging Face repository contains custom model code, so pass trust_remote_code=True when loading the model.

from transformers import AutoModelForMaskedLM, AutoTokenizer

model_id = "kekchpek/idlm-duo"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(
    model_id,
    trust_remote_code=True,
)

Direct AutoModelForMaskedLM loading exposes the denoising network. For text generation, use the sampler in the official IDLM repository.
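
One way to confirm that the download worked is to count the loaded parameters and compare the result with the figure in Model Details. The sketch below assumes the custom model class is a regular PyTorch module exposing parameters(); count_parameters and load_and_count are hypothetical helper names, and tied weights or buffers could make the count differ from the one on this card.

```python
def count_parameters(model):
    """Total number of elements across all tensors returned by model.parameters()."""
    return sum(p.numel() for p in model.parameters())


def load_and_count(model_id="kekchpek/idlm-duo"):
    """Download the checkpoint and return its parameter count."""
    # Imported lazily so count_parameters can be used without transformers installed.
    from transformers import AutoModelForMaskedLM

    model = AutoModelForMaskedLM.from_pretrained(model_id, trust_remote_code=True)
    return count_parameters(model)
```

Calling load_and_count() downloads the weights, so it is best run once in an environment that already has the dependencies from the Installation section.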

Generate Samples

mkdir -p samples

python -m main \
  mode=sample_eval \
  loader.batch_size=2 \
  loader.eval_batch_size=8 \
  data=openwebtext-split \
  algo=duo \
  algo.backbone=hf_dit \
  eval.checkpoint_path=kekchpek/idlm-duo \
  sampling.steps=16 \
  sampling.num_sample_batches=10 \
  sampling.noise_removal=greedy \
  +wandb.offline=true \
  eval.generated_samples_path=samples/idlm_duo_16steps.json

To sweep over step counts, rerun the command with different values of sampling.steps, changing eval.generated_samples_path so that runs do not overwrite each other. The paper reports both ancestral (a) and Greedy-Tail (g) sampling variants.
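
A sweep over step counts can be scripted by rebuilding the command above for each value. The sketch below only assembles and prints the command lines; every flag is copied from the example command, the step counts are the ones listed in the Evaluation table, and build_command is a hypothetical helper rather than part of the repository.

```python
import shlex

# Flags copied unchanged from the sampling command above.
BASE_ARGS = [
    "python", "-m", "main",
    "mode=sample_eval",
    "loader.batch_size=2",
    "loader.eval_batch_size=8",
    "data=openwebtext-split",
    "algo=duo",
    "algo.backbone=hf_dit",
    "eval.checkpoint_path=kekchpek/idlm-duo",
    "sampling.num_sample_batches=10",
    "sampling.noise_removal=greedy",
    "+wandb.offline=true",
]


def build_command(steps):
    """Return the argument list for one run with the given number of sampling steps."""
    return BASE_ARGS + [
        f"sampling.steps={steps}",
        # One output file per step count so runs do not overwrite each other.
        f"eval.generated_samples_path=samples/idlm_duo_{steps}steps.json",
    ]


for steps in (4, 8, 16, 32):
    print(shlex.join(build_command(steps)))
```

To execute the runs instead of printing them, each list can be passed to subprocess.run(cmd, check=True) from the repository root.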

Evaluation

The paper reports generation perplexity (GenPPL, lower is better) and sample entropy (higher is better) on OpenWebText-style generation. The released evaluation code defaults to gpt2-large for GenPPL.
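
For intuition, the two metrics can be sketched in a few lines. The definitions here are assumptions rather than a copy of the released evaluation code: entropy is taken as the Shannon entropy (in nats) of the token frequencies within one generated sample, and GenPPL as the exponential of the mean per-token negative log-likelihood assigned by the evaluator model.

```python
import math
from collections import Counter


def token_entropy(token_ids):
    """Shannon entropy (nats) of the empirical token distribution in one sample."""
    counts = Counter(token_ids)
    total = len(token_ids)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def perplexity(token_nlls):
    """exp(mean negative log-likelihood); token_nlls are per-token NLLs in nats."""
    return math.exp(sum(token_nlls) / len(token_nlls))


# Two tokens used equally often give ln(2) nats of entropy.
print(token_entropy([7, 7, 9, 9]))
# A constant NLL of ln(2) per token corresponds to a perplexity of 2.
print(perplexity([math.log(2)] * 4))
```

Under this definition a 1024-token sample has entropy of at most ln(1024), about 6.93 nats, which is reached only when every token is distinct. Low entropy therefore signals repetitive samples, which is why it is reported alongside GenPPL.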

Sampling steps	Sampler	GenPPL (lower is better)	Entropy (higher is better)
32	Greedy-Tail	54.05	5.49
16	Greedy-Tail	68.04	5.55
8	Greedy-Tail	93.00	5.56
4	Greedy-Tail	144.74	4.28
32	Ancestral	63.10	5.54
16	Ancestral	78.00	5.58
8	Ancestral	117.88	5.62
4	Ancestral	495.85	5.56

For comparison, the Duo teacher is reported at 1024 steps with GenPPL 71.72 / entropy 5.22 under Greedy-Tail sampling and GenPPL 77.69 / entropy 5.55 under ancestral sampling.
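
The numbers above can be turned into a step-reduction estimate: for each sampler, find the fewest student steps whose GenPPL is no worse than the teacher's 1024-step GenPPL. The sketch below uses only values from this section and compares GenPPL alone, so entropy should still be checked separately; fewest_matching_steps is an illustrative helper.

```python
TEACHER_STEPS = 1024
TEACHER_GENPPL = {"Greedy-Tail": 71.72, "Ancestral": 77.69}

# Student GenPPL by sampler and number of sampling steps (from the table above).
STUDENT_GENPPL = {
    "Greedy-Tail": {32: 54.05, 16: 68.04, 8: 93.00, 4: 144.74},
    "Ancestral": {32: 63.10, 16: 78.00, 8: 117.88, 4: 495.85},
}


def fewest_matching_steps(sampler):
    """Smallest step count whose GenPPL is at or below the teacher's, else None."""
    matching = [
        steps
        for steps, genppl in STUDENT_GENPPL[sampler].items()
        if genppl <= TEACHER_GENPPL[sampler]
    ]
    return min(matching) if matching else None


for sampler in STUDENT_GENPPL:
    steps = fewest_matching_steps(sampler)
    if steps is not None:
        print(f"{sampler}: {steps} steps, {TEACHER_STEPS // steps}x fewer than the teacher")
# Greedy-Tail: 16 steps, 64x fewer than the teacher
# Ancestral: 32 steps, 32x fewer than the teacher
```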

Training Summary

IDLM-Duo was trained by initializing the student and fake model from the pretrained Duo teacher and alternating between:

1. Updating the fake model on student-generated samples using the teacher diffusion loss.
2. Updating the student using the teacher-fake loss gap.

The Duo setting uses a Gaussian relaxation and soft token inputs for stable backpropagation through the diffusion objective.
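
The alternating scheme can be illustrated on a deliberately tiny problem. The sketch below is a toy under strong assumptions, not the IDLM training code: the data are one-dimensional and continuous rather than discrete tokens, a single noise level t = 0.5 is used, the student is a Gaussian with a learnable mean, the teacher and fake denoisers are shifts of the noisy input, and gradients are written out by hand. It does not implement Duo's Gaussian relaxation or soft token inputs. All names are hypothetical.

```python
import random


def train_toy(teacher_mean=2.0, steps=1000, lr=0.05, batch=32, seed=0):
    """Alternate fake-model and student updates on a 1-D Gaussian toy problem."""
    rng = random.Random(seed)
    t = 0.5                          # single fixed noise level
    # Frozen teacher denoiser f_T(x_t) = x_t + teacher_bias, which is the optimal
    # denoiser for data drawn from N(teacher_mean, 1) at t = 0.5.
    teacher_bias = t * teacher_mean
    mu = 0.0                         # student generator: x0 = mu + z, z ~ N(0, 1)
    b = 0.0                          # fake denoiser: f_F(x_t) = x_t + b

    def sample_batch():
        """Draw student samples x0 and their noised versions x_t."""
        pairs = []
        for _ in range(batch):
            x0 = mu + rng.gauss(0.0, 1.0)
            x_t = (1 - t) * x0 + t * rng.gauss(0.0, 1.0)
            pairs.append((x0, x_t))
        return pairs

    for _ in range(steps):
        # Step 1: fit the fake denoiser to student samples with the
        # denoising loss (f_F(x_t) - x0)^2, treating the student as fixed.
        grad_b = sum(2 * (x_t + b - x0) for x0, x_t in sample_batch()) / batch
        b -= lr * grad_b

        # Step 2: update the student on the loss gap
        # (f_T(x_t) - x0)^2 - (f_F(x_t) - x0)^2, treating both denoisers as fixed.
        grad_mu = 0.0
        for x0, x_t in sample_batch():
            res_teacher = x_t + teacher_bias - x0
            res_fake = x_t + b - x0
            # Both residuals change by -t per unit change in mu.
            grad_mu += 2 * (res_teacher - res_fake) * (-t)
        mu -= lr * grad_mu / batch

    return mu, b


mu, b = train_toy()
print(f"student mean: {mu:.2f}, fake bias: {b:.2f}")
```

In this toy the fake denoiser tracks the student's current distribution, so the gap shrinks to zero only when the student mean reaches the mean the teacher was fitted to; with the defaults, mu should end up close to teacher_mean.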

Citation

@article{li2026idlm,
  title={IDLM: Inverse-distilled Diffusion Language Models},
  author={Li, David and Gushchin, Nikita and Abulkhanov, Dmitry and Moulines, Eric and Oseledets, Ivan and Panov, Maxim and Korotin, Alexander},
  journal={arXiv preprint arXiv:2602.19066},
  year={2026}
}


Paper for kekchpek/idlm-duo

IDLM: Inverse-distilled Diffusion Language Models

Paper • 2602.19066 • Published Feb 22