Sam
samsam55
AI & ML interests
None yet
Recent Activity
updated a collection 29 days ago: Computer Use
updated a collection about 2 months ago: Visual Multi Modal LLM
updated a collection about 2 months ago: Misc
Organizations
None yet
Self Improving
Deep Search
Computer Use
Visual Multi Modal LLM
- NaViL: Rethinking Scaling Properties of Native Multimodal Large Language Models under Data Constraints
  Paper • 2510.08565 • Published • 19
- Detect Anything via Next Point Prediction
  Paper • 2510.12798 • Published • 46
- PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model
  Paper • 2510.14528 • Published • 108
- DeepEyesV2: Toward Agentic Multimodal Model
  Paper • 2511.05271 • Published • 42
Misc
- UNIDOC-BENCH: A Unified Benchmark for Document-Centric Multimodal RAG
  Paper • 2510.03663 • Published • 15
- LLM-guided Hierarchical Retrieval
  Paper • 2510.13217 • Published • 19
- AnyUp: Universal Feature Upsampling
  Paper • 2510.12764 • Published • 11
- katanemo/Arch-Router-1.5B
  Text Generation • 2B • Updated • 2.97k • 237
3D Models & Modeling
- Towards Scalable and Consistent 3D Editing
  Paper • 2510.02994 • Published • 5
- UP2You: Fast Reconstruction of Yourself from Unconstrained Photo Collections
  Paper • 2509.24817 • Published • 8
- NANO3D: A Training-Free Approach for Efficient 3D Editing Without Masks
  Paper • 2510.15019 • Published • 63
- Skyfall-GS: Synthesizing Immersive 3D Urban Scenes from Satellite Imagery
  Paper • 2510.15869 • Published • 48
Datasets
Run on CPU Optimizations
World View Creation (out painting 3D)
Coding LLMs
TTS & Speech to Text
- Taming Text-to-Sounding Video Generation via Advanced Modality Condition and Interaction
  Paper • 2510.03117 • Published • 11
- ResembleAI/chatterbox
  Text-to-Speech • Updated • 525k • 1.37k
- thewh1teagle/phonikud
  0.3B • Updated • 205 • 1
- UniMoE-Audio: Unified Speech and Music Generation with Dynamic-Capacity MoE
  Paper • 2510.13344 • Published • 62
Agents
Reinforcement Learning Etc..