ONNX Inference Error
I tried to run inference with the models (2X, 3X, 4X) on an 860x863 image from my computer in .onnx format with onnxruntime_directml 1.20.1, and the following error appeared: "Error in operation 'Upscaling process': [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Got invalid dimensions for input: args_0 for the following indices
index: 2 Got: 863 Expected: 128
index: 3 Got: 860 Expected: 128
Please fix either the inputs/outputs or the model."
Do the models have size restrictions on the input image? As far as I can see in the export-onnx-model.ipynb file, the dimensions you exported to the .onnx format are static rather than dynamic. Could you correct that and publish the fix in the Hugging Face repositories of UltraZoom 2X, 3X, and 4X?
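For reference, a quick way to confirm whether an exported model's input dimensions are static or dynamic is to inspect the session inputs with onnxruntime; the file name below is just a placeholder for the downloaded model:

import onnxruntime as ort

# Placeholder path; point this at the downloaded .onnx model.
session = ort.InferenceSession("ultrazoom-2x.onnx")

for inp in session.get_inputs():
    # Static dimensions print as integers (e.g. 128), while dynamic
    # dimensions print as symbolic names (e.g. 'height').
    print(inp.name, inp.shape)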
Ah, nice catch @binatheis. I think that has something to do with the way I exported the ONNX version; see the last cell in this notebook: https://github.com/andrewdalpino/UltraZoom/blob/master/export-model.ipynb. I believe there is a way to specify that the last two dimensions of the input tensor are dynamic (see the sketch below). I'll give this a look when I have free time.
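A minimal sketch of what that fix could look like, assuming the model loads via UltraZoom.from_pretrained and its forward pass takes a single image tensor; the input name args_0 matches the error message above, but the exact export call in the notebook may differ:

import torch

from ultrazoom.model import UltraZoom

model = UltraZoom.from_pretrained("andrewdalpino/UltraZoom-2X")
model.eval()

# Any example shape works for tracing; dynamic_axes is what allows
# other sizes at inference time.
example_input = torch.randn(1, 3, 128, 128)

torch.onnx.export(
    model,
    example_input,
    "ultrazoom-2x.onnx",
    input_names=["args_0"],
    output_names=["output"],
    # Mark batch, height, and width as dynamic so images of any size
    # (such as 860x863) are accepted.
    dynamic_axes={
        "args_0": {0: "batch", 2: "height", 3: "width"},
        "output": {0: "batch", 2: "height", 3: "width"},
    },
)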
Did you try the PyTorch version? Did that work ok?
No, I haven't run the model with PyTorch, but I assume everything would work fine there. The problem is with .onnx; as you know, it's a more accessible format that's easier to run, since the dependencies are minimal. I hope you manage to fix it.
Ok @binatheis, try the latest ONNX model I just uploaded and let me know if that fixes it.
I confirm that everything is working now, thank you very much, Andrew. I hope you continue to publish similar content in the future, especially in .onnx format for consumer computers.
Sweet, I'm actually working on a version 2/fine-tune of UltraZoom currently, stay tuned!
Hey @binatheis, I just released a new version of this model that integrates control modules into the architecture, allowing you to independently adjust the amount of deblurring, denoising, and deartifacting. I'd love to hear your feedback!
Hi Andrew, yes. Just yesterday I was checking GitHub and saw that you had updated it. I already downloaded it from the Hugging Face repository, but I haven't tried it yet. I think that starting Monday I'll dedicate myself to testing it and finishing up the product I'll be launching with these models. Could you explain those three parameters in more detail: what they do, and how they could be used in Python with the .onnx format and onnxruntime?
Sure, with the new "Ctrl" models the input is not only a low-resolution image x of size [b, 3, h, w] but also a control vector c of size [b, 3]. The three features of the control vector are gaussian_blur, gaussian_noise, and jpeg_compression. They are values between 0 and 1 and are interpreted as the level of that kind of degradation assumed to be present in the input image. So if you have a noisy camera, a meme that's been shared on social media 100 times, or an AI-generated image, you should be able to fine-tune the model at inference time to get the most out of it based on what you know about the source. Under the hood, the control vector gets injected into control modules within the encoder network, steering the activations. Here's how you'd do it using the ultrazoom library; with ONNX you'd just pass the control vector along with the image, and the output is still the upscaled image.
import torch
from torchvision.io import decode_image, ImageReadMode
from torchvision.transforms.v2 import ToDtype, ToPILImage

from ultrazoom.model import UltraZoom
from ultrazoom.control import ControlVector

model_name = "andrewdalpino/UltraZoom-2X-Ctrl"
image_path = "./dataset/bird.png"

model = UltraZoom.from_pretrained(model_name)

image_to_tensor = ToDtype(torch.float32, scale=True)  # uint8 -> float32 in [0, 1]
tensor_to_pil = ToPILImage()

# Decode the image as RGB and add a batch dimension: [1, 3, h, w].
image = decode_image(image_path, mode=ImageReadMode.RGB)
x = image_to_tensor(image).unsqueeze(0)

c = ControlVector(
    gaussian_blur=0.5,     # Higher values indicate more degradation,
    gaussian_noise=0.6,    # which increases the strength of the
    jpeg_compression=0.7,  # corresponding enhancement.
).to_tensor()

y_pred = model.upscale(x, c)

# Drop the batch dimension and convert back to a PIL image for display.
pil_image = tensor_to_pil(y_pred.squeeze(0))
pil_image.show()
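For the ONNX route, a rough equivalent using onnxruntime and NumPy might look like the following; the file name and the input ordering are assumptions, so check session.get_inputs() for the actual input names:

import numpy as np
import onnxruntime as ort
from PIL import Image

# Placeholder path for the exported Ctrl model.
session = ort.InferenceSession("ultrazoom-2x-ctrl.onnx")

image = Image.open("./dataset/bird.png").convert("RGB")

# [h, w, 3] uint8 -> [1, 3, h, w] float32 scaled to [0, 1].
x = np.asarray(image, dtype=np.float32) / 255.0
x = x.transpose(2, 0, 1)[np.newaxis, ...]

# Control vector [b, 3]: gaussian_blur, gaussian_noise, jpeg_compression.
c = np.array([[0.5, 0.6, 0.7]], dtype=np.float32)

# Assumes the image is the first input and the control vector the second.
names = [inp.name for inp in session.get_inputs()]
(y_pred,) = session.run(None, {names[0]: x, names[1]: c})

# [1, 3, 2h, 2w] float32 -> [2h, 2w, 3] uint8 for display.
y = (y_pred.squeeze(0).clip(0.0, 1.0).transpose(1, 2, 0) * 255.0).astype(np.uint8)
Image.fromarray(y).show()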
I understand a little more now. The new feature serves to better guide image generation; it's as if certain parameters were exposed that the model uses to improve generation, taking into account that not all input images are of the same quality. I'm going to try it from onnxruntime to see if it works and what effect it has on an image. It would be interesting if you could provide visual examples of how playing with the parameters affects the image, and what analysis needs to be done on an input image to know what value to set for each parameter.
Yup, that's exactly right @binatheis. It's like having three knobs that you can adjust, rather than letting the model make assumptions about how much degradation exists in the image.
Absolutely, I'll work on some examples and get back to you.
Ok here we go
This comparison demonstrates the strength of the enhancements (deblurring, denoising, and deartifacting) applied to the upscaled image.
This comparison demonstrates the individual enhancements applied in isolation.
Perfect, now I understand completely. I'll start in a few minutes and then let you know how my experiment turns out.
Andrew, I've tried the new UltraZoom-2X-Ctrl model and I can confirm that the controls work perfectly in .onnx with onnxruntime in Python. I didn't need the ultrazoom library you have on PyPI; NumPy was sufficient since we were using ONNX.
I hope to continue hearing about your new projects. I have several in mind and would love to have a partner for one of them, as it involves knowledge of training, etc., although it would be more like a bootstrapped startup idea.
Awesome, glad to hear it works using ONNX Runtime in Python. Let's keep in touch! Do you have Telegram? If so, you can find me at https://t.me/RubixML.