saeed abhari
galois77
AI & ML interests
None yet
Recent Activity
Updated a collection 8 days ago: THE ORB
Updated a collection 28 days ago: THE ORB
Updated a collection about 1 month ago: Multi-language
Organizations
None yet
energy based models
Poetry
- Fann or Flop: A Multigenre, Multiera Benchmark for Arabic Poetry Understanding in LLMs
  Paper • 2505.18152 • Published • 1
- AraPoemBERT: A Pretrained Language Model for Arabic Poetry Analysis
  Paper • 2403.12392 • Published
- ALBERTI, a Multilingual Domain Specific Language Model for Poetry Analysis
  Paper • 2307.01387 • Published • 1
- CharPoet: A Chinese Classical Poetry Generation System Based on Token-free LLM
  Paper • 2401.03512 • Published
Agentic
- Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning
  Paper • 2505.01441 • Published • 39
- LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities
  Paper • 2504.16078 • Published • 21
- Emergent Agentic Transformer from Chain of Hindsight Experience
  Paper • 2305.16554 • Published
- DiaTool-DPO: Multi-Turn Direct Preference Optimization for Tool-Augmented Large Language Models
  Paper • 2504.02882 • Published • 7
Inference
Videos
- AnyAnomaly: Zero-Shot Customizable Video Anomaly Detection with LVLM
  Paper • 2503.04504 • Published • 4
- Zero-1-to-A: Zero-Shot One Image to Animatable Head Avatars Using Video Diffusion
  Paper • 2503.15851 • Published • 10
- NormalCrafter: Learning Temporally Consistent Normals from Video Diffusion Priors
  Paper • 2504.11427 • Published • 19
- HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation
  Paper • 2505.04512 • Published • 36
Image generation
- Continuous Diffusion Model for Language Modeling
  Paper • 2502.11564 • Published • 53
- EasyControl: Adding Efficient and Flexible Control for Diffusion Transformer
  Paper • 2503.07027 • Published • 29
- Efficient Generative Model Training via Embedded Representation Warmup
  Paper • 2504.10188 • Published • 12
- Improving Editability in Image Generation with Layer-wise Memory
  Paper • 2505.01079 • Published • 29
RL
- Towards General-Purpose Model-Free Reinforcement Learning
  Paper • 2501.16142 • Published • 30
- RL + Transformer = A General-Purpose Problem Solver
  Paper • 2501.14176 • Published • 28
- SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
  Paper • 2501.17161 • Published • 123
- Process-Supervised Reinforcement Learning for Code Generation
  Paper • 2502.01715 • Published
Benchmarks and challenges
- PhD Knowledge Not Required: A Reasoning Challenge for Large Language Models
  Paper • 2502.01584 • Published • 9
- CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging
  Paper • 2502.05664 • Published • 24
- Craw4LLM: Efficient Web Crawling for LLM Pretraining
  Paper • 2502.13347 • Published • 30
- Can Large Language Models Help Multimodal Language Analysis? MMLA: A Comprehensive Benchmark
  Paper • 2504.16427 • Published • 18
Evaluators
THE ORB
- UniVA: Universal Video Agent towards Open-Source Next-Generation Video Generalist
  Paper • 2511.08521 • Published • 37
- Black-Box On-Policy Distillation of Large Language Models
  Paper • 2511.10643 • Published • 48
- Depth Anything 3: Recovering the Visual Space from Any Views
  Paper • 2511.10647 • Published • 95
- VGGT: Visual Geometry Grounded Transformer
  Paper • 2503.11651 • Published • 35
OCR
Multi-language
Multimodal
- Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction
  Paper • 2505.02471 • Published • 15
- Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
  Paper • 2505.02567 • Published • 80
- Self-Rewarding Vision-Language Model via Reasoning Decomposition
  Paper • 2508.19652 • Published • 84
Check-later
- Hogwild! Inference: Parallel LLM Generation via Concurrent Attention
  Paper • 2504.06261 • Published • 110
- InteractVLM: 3D Interaction Reasoning from 2D Foundational Models
  Paper • 2504.05303 • Published • 5
- TTRL: Test-Time Reinforcement Learning
  Paper • 2504.16084 • Published • 120
- Hybrid 3D-4D Gaussian Splatting for Fast Dynamic Scene Representation
  Paper • 2505.13215 • Published • 29
ahan
- PhysDreamer: Physics-Based Interaction with 3D Objects via Video Generation
  Paper • 2404.13026 • Published • 24
- Distilling Diversity and Control in Diffusion Models
  Paper • 2503.10637 • Published • 14
- GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing
  Paper • 2503.10639 • Published • 53
- GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation
  Paper • 2504.02782 • Published • 57
Training optimization
- The Curse of Depth in Large Language Models
  Paper • 2502.05795 • Published • 40
- Transformers without Normalization
  Paper • 2503.10622 • Published • 170
- Parallel Scaling Law for Language Models
  Paper • 2505.10475 • Published • 83
- Learning to Skip the Middle Layers of Transformers
  Paper • 2506.21103 • Published • 18
Reasoning
- Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
  Paper • 2501.18585 • Published • 61
- Improving Multi-Step Reasoning Abilities of Large Language Models with Direct Advantage Policy Optimization
  Paper • 2412.18279 • Published
- Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback
  Paper • 2501.10799 • Published • 15
- Reward-Guided Speculative Decoding for Efficient LLM Reasoning
  Paper • 2501.19324 • Published • 39
Instructions
Thousand brains theory