Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation Paper • 2604.24763 • Published 5 days ago • 64
Enhancing Spatial Understanding in Image Generation via Reward Modeling Paper • 2602.24233 • Published Feb 27 • 60
ScaleEdit-12M: Scaling Open-Source Image Editing Data Generation via Multi-Agent Framework Paper • 2603.20644 • Published Mar 21 • 5
EditCaption: Human-Aligned Instruction Synthesis for Image Editing via Supervised Fine-Tuning and Direct Preference Optimization Paper • 2604.08213 • Published 23 days ago • 1
ImageEdit-R1: Boosting Multi-Agent Image Editing via Reinforcement Learning Paper • 2603.08059 • Published Mar 9 • 1
OARS: Process-Aware Online Alignment for Generative Real-World Image Super-Resolution Paper • 2603.12811 • Published Mar 13 • 1
EditHF-1M: A Million-Scale Rich Human Preference Feedback for Image Editing Paper • 2603.14916 • Published Mar 16 • 1
Trust Your Critic: Robust Reward Modeling and Reinforcement Learning for Faithful Image Editing and Generation Paper • 2603.12247 • Published Mar 12 • 24
Diff-Aid: Inference-time Adaptive Interaction Denoising for Rectified Text-to-Image Generation Paper • 2602.13585 • Published Feb 27 • 1
Omni-Video 2: Scaling MLLM-Conditioned Diffusion for Unified Video Generation and Editing Paper • 2602.08820 • Published Feb 9 • 1
DeepGen 1.0: A Lightweight Unified Multimodal Model for Advancing Image Generation and Editing Paper • 2602.12205 • Published Feb 12 • 82
RPiAE: A Representation-Pivoted Autoencoder Enhancing Both Image Generation and Editing Paper • 2603.19206 • Published Mar 19 • 1
RefAlign: Representation Alignment for Reference-to-Video Generation Paper • 2603.25743 • Published Mar 26 • 2
EVLF: Early Vision-Language Fusion for Generative Dataset Distillation Paper • 2603.07476 • Published Mar 8 • 1
Rethinking UMM Visual Generation: Masked Modeling for Efficient Image-Only Pre-training Paper • 2603.16139 • Published Mar 17 • 33
OmniScript: Towards Audio-Visual Script Generation for Long-Form Cinematic Video Paper • 2604.11102 • Published 19 days ago • 8