Recent studies have shown that learning a meaningful internal representation can both accelerate generative training and enhance the generation quality of diffusion transformers. However, existing approaches typically either introduce an additional, complex representation-training framework or rely on a large-scale pre-trained representation foundation model to provide representation guidance during the original generative training.
In this work, we argue that the unique discriminative process inherent to diffusion transformers makes it possible to offer such guidance without needing external components. We thus introduce SRA, a simple, straightforward method that introduces representation guidance in a self-distillation manner.
Experimental results show that SRA accelerates training and improves generation performance for both DiTs and SiTs.
We find that the diffusion transformer exhibits a roughly coarse-to-fine discriminative process when only generative training is performed.
In short, SRA aligns the output latent representation of the diffusion transformer in earlier layers (at higher noise levels) to that in later layers (at lower noise levels), progressively enhancing overall representation learning during a purely generative training process.
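For intuition, below is a minimal sketch of this self-distillation idea, assuming a PyTorch-style setup; the model, layer indices, noise schedule, projection head, and loss weight (`TinyDiT`, `student_layer`, `teacher_layer`, `lam`, etc.) are hypothetical placeholders, not the authors' exact implementation.

```python
# Hedged sketch of SRA-style self-representation alignment (hypothetical names).
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyDiT(nn.Module):
    """Stand-in for a diffusion transformer that exposes per-block features."""
    def __init__(self, dim=256, depth=8):
        super().__init__()
        self.blocks = nn.ModuleList([
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
            for _ in range(depth)
        ])
        self.out = nn.Linear(dim, dim)

    def forward(self, x, return_features=False):
        feats = []
        for blk in self.blocks:
            x = blk(x)
            feats.append(x)
        pred = self.out(x)  # e.g. noise / velocity prediction
        return (pred, feats) if return_features else pred

def sra_step(student, teacher, proj, x0, t_hi, t_lo,
             student_layer=3, teacher_layer=6, lam=0.5):
    """One training step: generative loss + self-representation alignment loss."""
    noise = torch.randn_like(x0)
    # Student sees the noisier input; the EMA teacher sees the less noisy one.
    x_hi = (1 - t_hi) * x0 + t_hi * noise   # higher noise level
    x_lo = (1 - t_lo) * x0 + t_lo * noise   # lower noise level

    pred, s_feats = student(x_hi, return_features=True)
    diff_loss = F.mse_loss(pred, noise)      # placeholder generative objective

    with torch.no_grad():
        _, t_feats = teacher(x_lo, return_features=True)

    # Align an earlier student feature to a later (deeper) teacher feature.
    align_loss = -F.cosine_similarity(
        proj(s_feats[student_layer]), t_feats[teacher_layer], dim=-1
    ).mean()
    return diff_loss + lam * align_loss

# Usage: the teacher is a frozen EMA copy of the student, updated each step.
student = TinyDiT()
teacher = copy.deepcopy(student).requires_grad_(False)
proj = nn.Linear(256, 256)                   # lightweight alignment head
opt = torch.optim.AdamW(list(student.parameters()) + list(proj.parameters()), lr=1e-4)

x0 = torch.randn(4, 16, 256)                 # a batch of latent tokens
loss = sra_step(student, teacher, proj, x0, t_hi=0.8, t_lo=0.3)
loss.backward(); opt.step(); opt.zero_grad()
with torch.no_grad():                        # EMA update of the teacher
    for p_t, p_s in zip(teacher.parameters(), student.parameters()):
        p_t.mul_(0.999).add_(p_s, alpha=0.001)
```

No external representation model or auxiliary training framework appears here: the guidance signal comes from the model's own deeper, lower-noise features.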
SRA yields consistent benefits for different baselines across model sizes.
SRA achieves comparable or superior performance to other methods that rely on either an auxiliary representation training paradigm or a representation foundation model.
SRA genuinely enhances the representation capacity of the baseline model, and its generative capability is indeed strongly correlated with the representation guidance.
@article{jiang2025sra,
title={No Other Representation Component Is Needed: Diffusion Transformers Can Provide Representation Guidance by Themselves},
author={Jiang, Dengyang and Wang, Mengmeng and Li, Liuzhuozheng and Zhang, Lei and Wang, Haoyu and Wei, Wei and Zhang, Yanning and Dai, Guang and Wang, Jingdong},
journal={arXiv preprint arXiv:2505.02831},
year={2025}
}