Steering Vision-Language-Action Models as Anti-Exploration: A Test-Time Scaling Approach Paper • 2512.02834 • Published about 1 month ago • 39
Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Image Generation Paper • 2507.08441 • Published Jul 11, 2025 • 61
Florence2 + SAM2 🔥 Segment and caption objects in images and videos • Featured • Runtime error • 515
Physical AI Collection Collection of open, commercial-grade datasets for physical AI developers • 23 items • Updated 10 days ago • 103
TripoSG: High-Fidelity 3D Shape Synthesis using Large-Scale Rectified Flow Models Paper • 2502.06608 • Published Feb 10, 2025 • 39
openai/whisper-medium Automatic Speech Recognition • 0.8B • Updated Feb 29, 2024 • 674k • 272