# Stable-Lime-v1.0

Stable-Lime-v1.0 is an unconditional diffusion model based on the Denoising Diffusion Probabilistic Models (DDPM) architecture. It has been trained specifically to generate images representing the "essence of Lime."
## Model Details
- Model Type: Unconditional Image Generation (Diffusion)
- Architecture: UNet2DModel with DDPMScheduler
- Framework: PyTorch & Hugging Face Diffusers
- Resolution: $64 \times 64$ pixels
- Channels: 3 (RGB)
- License: MIT (Assumed based on open-source usage)
## Intended Use
This model is designed for:
- Generating $64 \times 64$ images of limes (or lime-like textures).
- Educational purposes regarding the implementation of DDPM loops.
- Low-resolution, "retro" aesthetic generation.
**Out of Scope:**
- Text-to-Image generation (this model does not accept text prompts).
- High-resolution photorealism (limited by the 64px architecture).
## Training Data

The model was trained on a proprietary dataset located at `dataset_lime/processed`.
- Preprocessing: Images were resized to $64 \times 64$ and normalized to the range $[-1, 1]$.
- Augmentation: Random horizontal flips were applied during training to improve generalization.
## Training Procedure

### Hyperparameters
The model was trained using the following configuration ("The Lime Settings"):
| Parameter | Value | Description |
|---|---|---|
| Batch Size | 16 | Small batch size suitable for consumer GPUs. |
| Learning Rate | $1 \times 10^{-4}$ | Optimizer step size (AdamW). |
| Epochs | 5 | Note: this is a very short training duration. |
| Timesteps | 1000 | Number of diffusion noise steps. |
| Image Size | 64 | Output resolution. |
### Architecture Specification

The U-Net uses a deep six-level structure with a self-attention block near the bottleneck:
- Block Output Channels: `(128, 128, 256, 256, 512, 512)`
- Downsampling: 4x `DownBlock2D`, 1x `AttnDownBlock2D`, 1x `DownBlock2D`
- Upsampling: Mirror of the downsampling blocks.
### Loss Function

The model optimizes the Mean Squared Error (MSE) between the actual noise added and the predicted noise:

$$\mathcal{L} = \mathrm{MSE}\left(\epsilon, \epsilon_\theta(x_t, t)\right)$$
Where $\epsilon$ is the Gaussian noise and $\epsilon_\theta$ is the model's prediction at timestep $t$.
## Limitations & Biases
- Undertraining Risk: With only 5 epochs of training, the model may not have fully converged. Generated images may appear blurry or retain visible noise (static) rather than clear lime features.
- Resolution: The output is strictly $64 \times 64$, resulting in pixelated, low-fidelity images.
- Dataset Bias: The model's output is entirely dependent on the variety found in `dataset_lime`. If the dataset contained only green limes, it will not generate yellow limes (lemons).