Xiangtai Li

LXT

https://lxtgh.github.io/

AI & ML interests

Computer Vision, Multi-Modal Understanding, Generative AI

Recent Activity

upvoted a paper 6 days ago

Vision-Language-Action Models for Autonomous Driving: Past, Present, and Future

upvoted a paper 6 days ago

LLaDA2.0: Scaling Up Diffusion Language Models to 100B

liked a model 6 days ago

WeiChow/EditMGT

authored 13 papers 9 days ago

DenseWorld-1M: Towards Detailed Dense Grounded Caption in the Real World

Paper • 2506.24102 • Published Jun 30

One Flight Over the Gap: A Survey from Perspective to Panoramic Vision

Paper • 2509.04444 • Published Sep 4

VimoRAG: Video-based Retrieval-augmented 3D Motion Generation for Motion Language Models

Paper • 2508.12081 • Published Aug 16

DiT360: High-Fidelity Panoramic Image Generation via Hybrid Training

Paper • 2510.11712 • Published Oct 13 • 30

Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs

Paper • 2510.18876 • Published Oct 21 • 36

Are Video Models Ready as Zero-Shot Reasoners? An Empirical Study with the MME-CoF Benchmark

Paper • 2510.26802 • Published Oct 30 • 33

Visual Spatial Tuning

Paper • 2511.05491 • Published Nov 7 • 51

Towards Open Vocabulary Learning: A Survey

Paper • 2306.15880 • Published Jun 28, 2023

RobuRCDet: Enhancing Robustness of Radar-Camera Fusion in Bird's Eye View for 3D Object Detection

Paper • 2502.13071 • Published Feb 18

MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation

Paper • 2511.09611 • Published Nov 12 • 68

authored 3 papers 6 months ago

Mixed-R1: Unified Reward Perspective For Reasoning Capability in Multimodal Large Language Models

Paper • 2505.24164 • Published May 30

UltraVideo: High-Quality UHD Video Dataset with Comprehensive Captions

Paper • 2506.13691 • Published Jun 16 • 2

Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology

Paper • 2507.07999 • Published Jul 10 • 49

authored 4 papers 7 months ago

OmniAudio: Generating Spatial Audio from 360-Degree Video

Paper • 2504.14906 • Published Apr 21

Towards Semantic Equivalence of Tokenization in Multimodal LLM

Paper • 2406.05127 • Published Jun 7, 2024

So-Fake: Benchmarking and Explaining Social Media Image Forgery Detection

Paper • 2505.18660 • Published May 24 • 1

PixelThink: Towards Efficient Chain-of-Pixel Reasoning

Paper • 2505.23727 • Published May 29 • 5
