Ye Fang

aleafy

AI & ML interests

None yet

Recent Activity

upvoted a paper 21 days ago

V-RGBX: Video Editing with Accurate Controls over Intrinsic Properties

upvoted a paper about 1 month ago

ViSAudio: End-to-End Video-Driven Binaural Spatial Audio Generation

updated a model 3 months ago

aleafy/test

View all activity

Organizations

None yet

upvoted a paper 21 days ago

V-RGBX: Video Editing with Accurate Controls over Intrinsic Properties

Paper • 2512.11799 • Published 23 days ago • 29

upvoted a paper about 1 month ago

ViSAudio: End-to-End Video-Driven Binaural Spatial Audio Generation

Paper • 2512.03036 • Published Dec 2, 2025 • 20

upvoted a paper 4 months ago

CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning

Paper • 2508.20096 • Published Aug 27, 2025 • 36

upvoted a paper 5 months ago

SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience

Paper • 2508.04700 • Published Aug 6, 2025 • 52

upvoted a paper 6 months ago

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction

Paper • 2507.15852 • Published Jul 21, 2025 • 38

upvoted a paper 7 months ago

Video World Models with Long-term Spatial Memory

Paper • 2506.05284 • Published Jun 5, 2025 • 55

upvoted 3 papers 9 months ago

MM-IFEngine: Towards Multimodal Instruction Following

Paper • 2504.07957 • Published Apr 10, 2025 • 35

GenDoP: Auto-regressive Camera Trajectory Generation as a Director of Photography

Paper • 2504.07083 • Published Apr 9, 2025 • 22

RelightVid: Temporal-Consistent Diffusion Model for Video Relighting

Paper • 2501.16330 • Published Jan 27, 2025 • 2

upvoted a paper 10 months ago

OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference

Paper • 2502.18411 • Published Feb 25, 2025 • 74

upvoted a paper 12 months ago

BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning

Paper • 2501.03226 • Published Jan 6, 2025 • 43

upvoted 5 papers about 1 year ago

FiVA: Fine-grained Visual Attribute Dataset for Text-to-Image Diffusion Models

Paper • 2412.07674 • Published Dec 10, 2024 • 20

X-Prompt: Towards Universal In-Context Image Generation in Auto-Regressive Vision Language Foundation Models

Paper • 2412.01824 • Published Dec 2, 2024 • 64

MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models

Paper • 2410.17637 • Published Oct 23, 2024 • 35

PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction

Paper • 2410.17247 • Published Oct 22, 2024 • 47

SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree

Paper • 2410.16268 • Published Oct 21, 2024 • 69

upvoted 2 papers over 1 year ago

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

Paper • 2407.03320 • Published Jul 3, 2024 • 94

Make-it-Real: Unleashing Large Multimodal Model's Ability for Painting 3D Objects with Realistic Materials

Paper • 2404.16829 • Published Apr 25, 2024 • 4

upvoted 2 papers almost 2 years ago

InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model

Paper • 2401.16420 • Published Jan 29, 2024 • 55

GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation

Paper • 2401.04092 • Published Jan 8, 2024 • 21

Ye Fang

AI & ML interests

Recent Activity

Organizations

aleafy's activity