IDLM-Duo

IDLM-Duo is an Inverse-distilled Diffusion Language Model (IDLM) whose teacher is the pretrained Duo diffusion language model. It is released with the paper "IDLM: Inverse-distilled Diffusion Language Models" (arXiv:2602.19066).

IDLM extends inverse distillation to discrete token spaces. Instead of running a pretrained diffusion language model for hundreds or thousands of reverse-diffusion steps, IDLM trains a few-step student generator using an auxiliary fake model and the teacher diffusion objective.

Model Details

  • Model family: IDLM, discrete diffusion language model
  • Teacher checkpoint: s-sahoo/duo
  • Diffusion type: uniform-state / Duo-style diffusion
  • Training data: OpenWebText
  • Tokenizer: GPT-2 tokenizer
  • Context length: 1024 tokens
  • Parameters: 169,627,250
  • Tensor type: F32 Safetensors
  • Architecture config: 12 blocks, 12 heads, hidden size 768, conditioning dimension 128, dropout 0.1
  • License: MIT
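
As a quick sanity check on the figures above, the weight footprint and per-head width follow from simple arithmetic. The sketch below assumes 4 bytes per F32 parameter and an even split of the hidden size across heads; the function and constant names are illustrative only and do not come from the IDLM code.

```python
# Figures taken from the model details above.
NUM_PARAMETERS = 169_627_250
HIDDEN_SIZE = 768
NUM_HEADS = 12
BYTES_PER_F32 = 4  # assumption: weights only, 4 bytes per F32 value


def weight_bytes(num_parameters: int, bytes_per_param: int = BYTES_PER_F32) -> int:
    """Return the raw size of the weights in bytes (no optimizer state or activations)."""
    return num_parameters * bytes_per_param


def head_dim(hidden_size: int, num_heads: int) -> int:
    """Return the per-head width, assuming the hidden size splits evenly across heads."""
    if hidden_size % num_heads != 0:
        raise ValueError("hidden size must be divisible by the number of heads")
    return hidden_size // num_heads


size_bytes = weight_bytes(NUM_PARAMETERS)
print(f"weights: {size_bytes:,} bytes (~{size_bytes / 2**20:.0f} MiB)")
print(f"per-head dimension: {head_dim(HIDDEN_SIZE, NUM_HEADS)}")
```

Under these assumptions the weights alone take roughly 0.68 GB, which covers neither activations nor any sampler buffers.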

Intended Use

This checkpoint is intended for research on diffusion language models, inverse distillation, and few-step sampling.

Installation

The sampling code depends on CUDA and FlashAttention.

git clone https://github.com/David-cripto/IDLM.git
cd IDLM

conda create -n idlm python=3.12
conda activate idlm
conda install nvidia/label/cuda-12.4.0::cuda-toolkit
pip install -r requirements.txt
pip install flash_attn==2.7.4.post1
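
After installation, a short check along these lines can confirm that the environment is usable before launching a sampling run. It is a minimal sketch: the list of import names is an assumption about what requirements.txt provides, and the helper names are hypothetical.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def cuda_available() -> bool:
    """Return True only if PyTorch is installed and reports a usable CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch  # imported lazily so the check still runs without PyTorch

    return torch.cuda.is_available()


# Assumption: these are the import names of the packages the sampler needs.
required = ["torch", "transformers", "flash_attn"]
print("missing packages:", missing_packages(required))
print("CUDA available:", cuda_available())
```

An empty list of missing packages together with a True CUDA flag suggests the sampling command can at least start; it does not verify that the installed versions match the pinned ones.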

Loading the Checkpoint

The Hugging Face repository contains custom model code, so pass trust_remote_code=True when loading the model.

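from transformers import AutoModelForMaskedLM, AutoTokenizer

model_id = "kekchpek/idlm-duo"

# The checkpoint uses the GPT-2 tokenizer.
tokenizer = AutoTokenizer.from_pretrained(model_id)

# trust_remote_code=True lets transformers run the custom model code
# shipped in the repository.
model = AutoModelForMaskedLM.from_pretrained(
    model_id,
    trust_remote_code=True,
)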

Direct AutoModelForMaskedLM loading exposes the denoising network. For text generation, use the sampler in the official IDLM repository.

Generate Samples

mkdir -p samples

python -m main \
  mode=sample_eval \
  loader.batch_size=2 \
  loader.eval_batch_size=8 \
  data=openwebtext-split \
  algo=duo \
  algo.backbone=hf_dit \
  eval.checkpoint_path=kekchpek/idlm-duo \
  sampling.steps=16 \
  sampling.num_sample_batches=10 \
  sampling.noise_removal=greedy \
  +wandb.offline=true \
  eval.generated_samples_path=samples/idlm_duo_16steps.json

To sweep over step counts, rerun the command with different values of sampling.steps. The paper reports both ancestral (a) and Greedy-Tail (g) sampling variants.
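
A step sweep can be scripted by generating one command per step count. The sketch below reuses the overrides from the sampling command above and varies only sampling.steps and the output path; the output file naming and the build_command helper are hypothetical. It keeps sampling.noise_removal=greedy, since the value that selects ancestral sampling is not shown here and should be checked in the repository's config.

```python
import shlex

# Overrides copied from the sampling command above; only the step count,
# sampler, and output path vary across the sweep.
BASE_OVERRIDES = [
    "mode=sample_eval",
    "loader.batch_size=2",
    "loader.eval_batch_size=8",
    "data=openwebtext-split",
    "algo=duo",
    "algo.backbone=hf_dit",
    "eval.checkpoint_path=kekchpek/idlm-duo",
    "sampling.num_sample_batches=10",
    "+wandb.offline=true",
]


def build_command(steps: int, noise_removal: str = "greedy") -> list[str]:
    """Build the argument list for one sampling run."""
    # Hypothetical naming scheme for the per-run output file.
    output_path = f"samples/idlm_duo_{steps}steps_{noise_removal}.json"
    return [
        "python", "-m", "main",
        *BASE_OVERRIDES,
        f"sampling.steps={steps}",
        f"sampling.noise_removal={noise_removal}",
        f"eval.generated_samples_path={output_path}",
    ]


for steps in (4, 8, 16, 32):
    # Print the command rather than running it; pass the list to
    # subprocess.run(cmd, check=True) from the repository root to execute it.
    print(shlex.join(build_command(steps)))
```

Printing the commands first makes it easy to inspect the overrides before committing GPU time to the full sweep.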

Evaluation

The paper reports generation perplexity (GenPPL, lower is better) and sample entropy (higher is better) on OpenWebText-style generation. The released evaluation code defaults to gpt2-large for GenPPL.
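
To make the two metrics concrete, the sketch below computes them on toy inputs. It assumes that sample entropy is the Shannon entropy (in nats) of the token frequencies within one generated sample, and that GenPPL is the exponential of the mean per-token negative log-likelihood under the evaluator model. The exact definitions in the released evaluation code may differ, and the function names are hypothetical.

```python
import math
from collections import Counter


def sample_entropy(token_ids):
    """Shannon entropy (in nats) of the empirical token distribution of one sample."""
    if not token_ids:
        raise ValueError("token_ids must be non-empty")
    total = len(token_ids)
    counts = Counter(token_ids)
    # Each distinct token contributes p * log(1 / p) with p = count / total.
    return sum((c / total) * math.log(total / c) for c in counts.values())


def perplexity(token_nlls):
    """Perplexity as the exponential of the mean per-token negative log-likelihood."""
    if not token_nlls:
        raise ValueError("token_nlls must be non-empty")
    return math.exp(sum(token_nlls) / len(token_nlls))


# Toy inputs: hypothetical token ids and per-token NLLs, not real model output.
repetitive = [7, 7, 7, 7, 7, 7, 7, 7]
diverse = [1, 2, 3, 4, 5, 6, 7, 8]
print(sample_entropy(repetitive), sample_entropy(diverse))
print(perplexity([math.log(4.0)] * 5))
```

The repetitive sample has zero entropy while the diverse one has the maximum for its length, which is why the two metrics are read together: a low GenPPL can be reached by degenerate, repetitive text, and that shows up as low entropy.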

Sampling steps   Sampler       GenPPL (lower is better)   Entropy (higher is better)
32               Greedy-Tail   54.05                      5.49
16               Greedy-Tail   68.04                      5.55
8                Greedy-Tail   93.00                      5.56
4                Greedy-Tail   144.74                     4.28
32               Ancestral     63.10                      5.54
16               Ancestral     78.00                      5.58
8                Ancestral     117.88                     5.62
4                Ancestral     495.85                     5.56

For comparison, the Duo teacher is reported at 1024 steps with GenPPL 71.72 / entropy 5.22 under Greedy-Tail sampling and GenPPL 77.69 / entropy 5.55 under ancestral sampling.
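
The reported numbers can be turned into a step-count comparison with a few lines of arithmetic. The sketch below copies the values from the table and the teacher comparison above and finds, for each sampler, the smallest listed step count whose GenPPL is at or below the teacher's. The helper name is hypothetical, and the ratio counts sampling steps only, not wall-clock time.

```python
# Results copied from the table and the teacher comparison above:
# {sampler: {sampling steps: (GenPPL, entropy)}}.
IDLM_RESULTS = {
    "Greedy-Tail": {32: (54.05, 5.49), 16: (68.04, 5.55), 8: (93.00, 5.56), 4: (144.74, 4.28)},
    "Ancestral": {32: (63.10, 5.54), 16: (78.00, 5.58), 8: (117.88, 5.62), 4: (495.85, 5.56)},
}
TEACHER_STEPS = 1024
TEACHER_RESULTS = {"Greedy-Tail": (71.72, 5.22), "Ancestral": (77.69, 5.55)}


def fewest_steps_matching_teacher(sampler):
    """Return the smallest step count whose GenPPL is at or below the teacher's, or None."""
    teacher_ppl, _ = TEACHER_RESULTS[sampler]
    matching = [s for s, (ppl, _) in IDLM_RESULTS[sampler].items() if ppl <= teacher_ppl]
    return min(matching) if matching else None


for sampler in IDLM_RESULTS:
    steps = fewest_steps_matching_teacher(sampler)
    if steps is None:
        print(f"{sampler}: no listed step count reaches the teacher's GenPPL")
    else:
        print(f"{sampler}: {steps} steps, {TEACHER_STEPS // steps}x fewer than the teacher")
```

On these numbers the student matches the teacher's GenPPL at 16 steps with Greedy-Tail sampling (64x fewer steps) and at 32 steps with ancestral sampling (32x fewer). The comparison uses GenPPL alone, so the entropy column should still be checked alongside it.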

Training Summary

IDLM-Duo was trained by initializing the student and fake model from the pretrained Duo teacher and alternating between:

  1. Updating the fake model on student-generated samples using the teacher diffusion loss.
  2. Updating the student using the teacher-fake loss gap.

The Duo setting uses a Gaussian relaxation and soft token inputs for stable backpropagation through the diffusion objective.
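
The alternation described above can be sketched as a plain control loop. This is a structural illustration only, not the training code from the IDLM repository: the three callables are hypothetical placeholders, and the one-to-one ratio of fake-model to student updates is an assumption, since the update schedule is not stated here.

```python
def train_alternating(num_iterations, sample_student, update_fake, update_student):
    """Alternate fake-model and student updates, returning the logged losses.

    All three callables are hypothetical placeholders:
      sample_student()       returns a batch of student-generated samples
      update_fake(batch)     takes one optimizer step on the fake model using the
                             diffusion loss on the batch and returns that loss
      update_student(batch)  takes one optimizer step on the student using the
                             teacher-fake loss gap and returns that gap
    """
    history = []
    for iteration in range(num_iterations):
        # Step 1: fit the fake model to the current student distribution.
        fake_loss = update_fake(sample_student())
        # Step 2: move the student using the teacher-fake loss gap.
        loss_gap = update_student(sample_student())
        history.append({"iteration": iteration, "fake_loss": fake_loss, "loss_gap": loss_gap})
    return history


# Stub callables that only record the call order, to show the control flow.
calls = []
history = train_alternating(
    num_iterations=2,
    sample_student=lambda: "batch",
    update_fake=lambda batch: calls.append("fake") or 0.0,
    update_student=lambda batch: calls.append("student") or 0.0,
)
print(calls)
```

Passing the updates in as callables keeps the loop independent of any particular framework; in a real run they would wrap the model forward passes, the loss computation, and the optimizer steps.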

Citation

@article{li2026idlm,
  title={IDLM: Inverse-distilled Diffusion Language Models},
  author={Li, David and Gushchin, Nikita and Abulkhanov, Dmitry and Moulines, Eric and Oseledets, Ivan and Panov, Maxim and Korotin, Alexander},
  journal={arXiv preprint arXiv:2602.19066},
  year={2026}
}