Abstract
V-Pretraining uses downstream task gradients to reshape the self-supervised pretraining task, improving downstream capabilities with a small amount of labeled feedback and showing pilot evidence of improved token efficiency.
Can a small amount of verified goal information steer the expensive self-supervised pretraining of foundation models? Standard pretraining optimizes a fixed proxy objective (e.g., next-token prediction), which can misallocate compute away from downstream capabilities of interest. We introduce V-Pretraining: a value-based, modality-agnostic method for controlled continued pretraining in which a lightweight task designer reshapes the pretraining task to maximize the value of each gradient step. For example, consider self-supervised learning (SSL) with sample augmentation. The V-Pretraining task designer selects pretraining tasks (e.g., augmentations) whose pretraining loss gradient is aligned with a gradient computed on a downstream task (e.g., image segmentation). This steers pretraining towards the downstream capabilities of interest. Notably, the pretrained model is never updated on downstream task labels; they are used only to shape the pretraining task. Under matched learner update budgets, V-Pretraining of 0.5B–7B language models improves reasoning (GSM8K test Pass@1) by up to 18% relative over standard next-token prediction, using only 12% of GSM8K training examples as feedback. In vision SSL, we improve state-of-the-art results on ADE20K by up to 1.07 mIoU and reduce NYUv2 RMSE while improving ImageNet linear accuracy, and we provide pilot evidence of improved token efficiency in continued pretraining.
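To make the selection rule concrete, here is a minimal sketch (PyTorch) of one way a task designer could score candidate augmentations by gradient alignment and then apply only the self-supervised update. The helper names (flat_grad, v_pretrain_step, ssl_loss, downstream_loss, candidate_augs) and the cosine-similarity scoring are illustrative assumptions, not the paper's released implementation.

    import torch
    import torch.nn.functional as F


    def flat_grad(loss, params):
        # Flatten the gradient of `loss` w.r.t. the shared encoder parameters.
        grads = torch.autograd.grad(loss, params, allow_unused=True)
        return torch.cat([
            (g if g is not None else torch.zeros_like(p)).reshape(-1)
            for g, p in zip(grads, params)
        ])


    def v_pretrain_step(encoder, ssl_loss, downstream_loss, candidate_augs,
                        unlabeled_batch, labeled_batch, optimizer):
        params = [p for p in encoder.parameters() if p.requires_grad]

        # Downstream (verified) gradient: used only to score candidate tasks,
        # never applied to the encoder weights.
        g_down = flat_grad(downstream_loss(encoder, labeled_batch), params)

        # Task designer: pick the augmentation whose SSL gradient aligns best
        # with the downstream gradient.
        best_aug, best_score = None, float("-inf")
        for aug in candidate_augs:
            g_ssl = flat_grad(ssl_loss(encoder, aug(unlabeled_batch)), params)
            score = F.cosine_similarity(g_ssl, g_down, dim=0).item()
            if score > best_score:
                best_aug, best_score = aug, score

        # Learner update: a standard self-supervised step on the selected task.
        optimizer.zero_grad()
        ssl_loss(encoder, best_aug(unlabeled_batch)).backward()
        optimizer.step()
        return best_aug, best_score

In practice one would likely subsample parameters or use a low-dimensional gradient sketch to keep the scoring cheap; the full-gradient cosine above is only for clarity.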
Community
We’re entering the age of research, not just the age of scaling. Bigger models gave us horsepower. But pretraining still has almost no steering wheel.
Today’s foundation models learn in an open loop: pick a proxy objective (next‑token / fixed augmentations) → burn trillions of tokens → hope the capabilities we care about “emerge”.
That hope is getting expensive. If the “AGI won’t happen from brute-force scaling” camp is even partly right, then the bottleneck is clear: value per gradient step.
So we asked a practical question: Can a small amount of verified goal information steer the massive unlabeled pretraining phase—without turning pretraining into supervised finetuning?
A potential answer: V‑Pretraining (Value‑Based Pre‑Training with Downstream Feedback). https://arxiv.org/abs/2601.22108. Idea: add a lightweight task designer that reshapes the self‑supervised task so each unlabeled update is more useful downstream.
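For the language-model setting, the same recipe could look roughly like the sketch below: score candidate unlabeled text batches against a gradient from a small verified set (e.g., a GSM8K subset) and take the next-token-prediction step on the best-aligned batch. All names here (next_token_loss, select_and_step) are illustrative assumptions, the model is assumed to return logits directly, and the learner never trains on the verified labels.

    import torch
    import torch.nn.functional as F


    def next_token_loss(model, tokens):
        # Standard next-token prediction loss; assumes `model(input_ids)`
        # returns logits of shape (batch, seq_len, vocab_size).
        logits = model(tokens[:, :-1])
        return F.cross_entropy(
            logits.reshape(-1, logits.size(-1)), tokens[:, 1:].reshape(-1)
        )


    def flat_grad(loss, params):
        # Flatten the gradient of `loss` w.r.t. the model parameters.
        grads = torch.autograd.grad(loss, params, allow_unused=True)
        return torch.cat([
            (g if g is not None else torch.zeros_like(p)).reshape(-1)
            for g, p in zip(grads, params)
        ])


    def select_and_step(model, optimizer, candidate_batches, verified_batch):
        params = [p for p in model.parameters() if p.requires_grad]

        # Gradient on the small verified set: a scoring signal only,
        # never applied to the weights.
        g_goal = flat_grad(next_token_loss(model, verified_batch), params)

        # Task designer: choose the unlabeled batch whose pretraining
        # gradient is most aligned with the goal gradient.
        scores = [
            F.cosine_similarity(
                flat_grad(next_token_loss(model, b), params), g_goal, dim=0
            ).item()
            for b in candidate_batches
        ]
        best = candidate_batches[max(range(len(scores)), key=scores.__getitem__)]

        # The learner still only minimizes next-token prediction on unlabeled text.
        optimizer.zero_grad()
        next_token_loss(model, best).backward()
        optimizer.step()

The exact scoring rule and update schedule in the paper may differ; this is just the shape of the idea.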