Improve model card: Add tags, detailed description, usage, and citation

by nielsr HF Staff - opened Nov 18, 2025



This PR expands the model card for SpatialThinker by:

- Adding `pipeline_tag: image-text-to-text` to improve discoverability for multimodal vision-language tasks.
- Adding `library_name: transformers`, as evidenced by the model's architecture files and the GitHub requirements. This metadata enables the automated "how to use" widget.
- Expanding the model description with the paper's abstract.
- Including direct links to the Hugging Face paper page, the project page, and the GitHub repository.
- Integrating key sections from the GitHub README to provide comprehensive usage information: Requirements, Installation, Training, Merge Checkpoints, Evaluation (with shell command examples), Supported Evaluation Datasets, Citation (BibTeX), and Acknowledgements.
- Adding the overview image from the GitHub repository.
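
For readers unfamiliar with model card metadata: the two tags above live in the YAML front matter at the top of the repository's `README.md`. The sketch below renders that block from a plain dictionary. The helper `build_front_matter` is hypothetical, written only for illustration, and handles simple string values only. The PR's actual diff is not shown here and may contain additional fields.

```python
def build_front_matter(metadata: dict[str, str]) -> str:
    """Render simple string key/value pairs as a YAML front matter block.

    Hypothetical helper for illustration; it does not quote or escape
    values, so it is only suitable for plain scalars like the ones below.
    """
    lines = ["---"]
    for key, value in metadata.items():
        lines.append(f"{key}: {value}")
    lines.append("---")
    return "\n".join(lines) + "\n"


# The two metadata fields named in the PR description.
card_metadata = {
    "pipeline_tag": "image-text-to-text",
    "library_name": "transformers",
}

front_matter = build_front_matter(card_metadata)
print(front_matter)
```

The printed block is what would sit above the card's Markdown body. For anything beyond flat string fields, a real YAML serializer is the safer choice.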

These updates aim to provide a more complete and useful model card for the Hugging Face community.


Ready to merge

This branch is ready to get merged automatically.
