FLUX.1-dev LoRA – Ultra-Realistic Cinematic Photography

This repository contains a QLoRA fine-tuned adapter for black-forest-labs/FLUX.1-dev, trained to generate images in a consistent ultra-realistic cinematic photography style.

The LoRA was trained on the dataset akba08/ultra-realistic-cinematic-photography (see Training Data below).

This model is ideal if you want FLUX.1-dev to produce:

  • High-detail, realistic wildlife photos
  • Cinematic landscapes and seascapes
  • Premium food photography
  • Macro floral shots with bokeh
  • Emotional pet portraits

🧩 Model Details

  • Base model: black-forest-labs/FLUX.1-dev
  • Type: LoRA adapter (QLoRA) for the transformer (MMDiT) only
  • Library: πŸ€— diffusers + peft + bitsandbytes
  • Quantization: NF4 (4-bit) for base transformer weights using BitsAndBytesConfig
  • Precision: fp16 mixed precision during training
  • Adapter rank: r = 4
  • Trainable parameters: LoRA layers on attention projections (to_q, to_k, to_v, to_out.0)
  • Frozen components:
    • CLIP and T5 text encoders
    • VAE (except during latent caching)
    • Non-LoRA transformer weights

This repo contains only the LoRA weights. You must load them on top of the original FLUX.1-dev base model.


πŸ“š Training Data

  • Dataset: akba08/ultra-realistic-cinematic-photography
  • Size: ~165 high-quality images
  • Content:
    • Wildlife (lions, tigers, leopards, birds, flamingos, marine birds, etc.)
    • Domestic animals (dogs, cats)
    • Food photography (steak, lamb, casseroles, pizza, soups)
    • Flowers and macro shots (lilies, blossoms, dew-covered petals)
    • Landscapes and nature (beaches, mountains, rivers, sunsets)
  • Style characteristics:
    • Ultra-realistic rendering
    • Cinematic lighting (sunset, warm tones, moody light)
    • Shallow depth of field & strong bokeh
    • High detail & DSLR-like composition

Each image is paired with a descriptive prompt ending in:

..., ultra-realistic cinematic photography


πŸ‹οΈ Training Procedure

Training was performed in Google Colab on a single T4 GPU using a modified version of the official flux_lora_quantization example.

Key Steps

  1. Pre-compute text embeddings

    • Used FluxPipeline + T5 encoder (text_encoder_2) from FLUX.1-dev
    • Encoded all prompts and saved to a .parquet file:
      • embeddings_akba08_finetune.parquet
    • This allows training without keeping the heavy text encoders in GPU memory.
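
A minimal sketch of this pre-computation step, assuming diffusers' FluxPipeline; the prompts list and the exact parquet column layout here are assumptions (the official example stores flattened embedding arrays in a similar way):

```python
import pandas as pd
import torch
from diffusers import FluxPipeline

# Load only the text encoders; the transformer and VAE are skipped to save memory.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=None,
    vae=None,
    torch_dtype=torch.float16,
).to("cuda")

prompts = [...]  # assumed: one caption per training image

rows = []
with torch.no_grad():
    for p in prompts:
        # encode_prompt returns (T5 embeds, pooled CLIP embeds, text ids)
        prompt_embeds, pooled_prompt_embeds, _ = pipe.encode_prompt(
            prompt=p, prompt_2=p, max_sequence_length=512
        )
        rows.append({
            "prompt": p,
            "prompt_embeds": prompt_embeds.cpu().numpy().flatten().tolist(),
            "pooled_prompt_embeds": pooled_prompt_embeds.cpu().numpy().flatten().tolist(),
        })

pd.DataFrame(rows).to_parquet("embeddings_akba08_finetune.parquet")
```
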
  2. Set up QLoRA training

    • Load:
      • VAE in fp16
      • Transformer in 4-bit NF4 with BitsAndBytesConfig
    • Prepare transformer for k-bit training with prepare_model_for_kbit_training
    • Add LoRA adapters with:
      ```python
      LoraConfig(
          r=4,
          lora_alpha=4,
          init_lora_weights="gaussian",
          target_modules=["to_k", "to_q", "to_v", "to_out.0"],
      )
      ```
      
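Putting step 2 together, a minimal sketch assuming diffusers' BitsAndBytesConfig and peft's k-bit utilities (exact arguments are assumptions based on the official example):

```python
import torch
from diffusers import AutoencoderKL, BitsAndBytesConfig, FluxTransformer2DModel
from peft import LoraConfig, prepare_model_for_kbit_training

model_id = "black-forest-labs/FLUX.1-dev"

# VAE in fp16: used only once, to cache latents in step 3.
vae = AutoencoderKL.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float16)

# Base transformer quantized to 4-bit NF4 with fp16 compute.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
transformer = FluxTransformer2DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    quantization_config=bnb_config,
    torch_dtype=torch.float16,
)

# Make the quantized model trainable, then attach the LoRA adapters.
transformer = prepare_model_for_kbit_training(transformer, use_gradient_checkpointing=True)
lora_config = LoraConfig(
    r=4,
    lora_alpha=4,
    init_lora_weights="gaussian",
    target_modules=["to_k", "to_q", "to_v", "to_out.0"],
)
transformer.add_adapter(lora_config)
```
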
  3. Cache latents

    • Encode all images once with the VAE
    • Store latent_dist in RAM
    • Remove VAE from GPU to reduce VRAM usage
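
A minimal sketch of the caching loop, reusing the fp16 vae from the step 2 sketch; train_dataloader is an assumed dataloader yielding image tensors scaled to [-1, 1]:

```python
import torch

vae.to("cuda")
vae.requires_grad_(False)

cached_latent_dists = []
with torch.no_grad():
    for pixel_values in train_dataloader:  # assumed: (B, 3, H, W) tensors in [-1, 1]
        pixel_values = pixel_values.to("cuda", dtype=torch.float16)
        # Keep the full latent distribution so a fresh latent can be sampled
        # each training step (optionally move .parameters to CPU RAM).
        cached_latent_dists.append(vae.encode(pixel_values).latent_dist)

# The VAE is no longer needed on the GPU during training.
vae.to("cpu")
torch.cuda.empty_cache()
```
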
  4. Training configuration

```python
pretrained_model_name_or_path = "black-forest-labs/FLUX.1-dev"
data_df_path = "embeddings_akba08_finetune.parquet"
output_dir = "akba08_finetune_lora_flux_nf4"

width, height = 896, 896
train_batch_size = 1
learning_rate = 1e-4
guidance_scale = 1.0

gradient_accumulation_steps = 4
gradient_checkpointing = True
rank = 4
max_train_steps = 700      # demo runs also tested with 100 steps
checkpointing_steps = 100
mixed_precision = "fp16"
weighting_scheme = "none"
report_to = "wandb"
seed = 0
```
  • Optimizer: bitsandbytes.optim.AdamW8bit
  • Scheduler: constant LR schedule via get_scheduler
  • Noise schedule: FlowMatchEulerDiscreteScheduler from FLUX.1-dev
  • Loss: weighted MSE against the flow-matching target (noise - latents), following the SD3 / FLUX training utilities (sketched below)
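
A condensed sketch of how these pieces fit together; transformer is the LoRA-wrapped model from the step 2 sketch, and the sigma interpolation mirrors the SD3/FLUX flow-matching recipe:

```python
import torch
import torch.nn.functional as F
from bitsandbytes.optim import AdamW8bit
from diffusers.optimization import get_scheduler

# 8-bit AdamW over the LoRA parameters only.
optimizer = AdamW8bit(
    [p for p in transformer.parameters() if p.requires_grad],
    lr=1e-4,
)
lr_scheduler = get_scheduler("constant", optimizer=optimizer)

# Flow matching: the latents are interpolated toward noise at a sampled sigma,
#   noisy_model_input = (1.0 - sigma) * latents + sigma * noise
# and the transformer is trained to predict the velocity (noise - latents).
def training_loss(model_pred, latents, noise):
    target = noise - latents
    # weighting_scheme = "none": uniform weighting, i.e. a plain MSE.
    return F.mse_loss(model_pred.float(), target.float(), reduction="mean")
```
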

Hardware & Memory

  • Environment: Google Colab, free T4 GPU

  • Approx. memory usage: ~9.8 GB VRAM during training

  • Example run:

    • max_train_steps = 100 β†’ ~30 minutes on T4
    • Extrapolated max_train_steps = 700 → several hours on a T4 (or ~40 minutes on an RTX 4090, per the blog post referenced by the original example)

🎯 Intended Use

This LoRA is intended for:

  • Generating ultra-realistic, cinematic photographs of:

    • Animals (wild and domestic)
    • Food and dishes
    • Flowers and nature
    • Landscapes and seascapes
  • Fine-tuning experiments with FLUX.1-dev on consumer hardware

  • Demonstrating QLoRA + NF4 + LoRA + cached latents for efficient training

Not intended for:

  • Sensitive, medical, biometric, or surveillance use
  • Critical decision-making applications
  • Generating harmful, misleading, or NSFW content

βš™οΈ How to Use

1️⃣ Load FLUX.1-dev and this LoRA (Diffusers)

```python
import torch
from diffusers import FluxPipeline

base_model = "black-forest-labs/FLUX.1-dev"
lora_repo = "akba08/flux_cinematic_photography_finetuned"

# Load the base pipeline in fp16, attach the LoRA weights, then move to GPU.
# (Avoid combining device_map with a later .to("cuda"); pick one placement strategy.)
pipe = FluxPipeline.from_pretrained(
    base_model,
    torch_dtype=torch.float16,
)
pipe.load_lora_weights(lora_repo)
pipe.to("cuda")

prompt = "tropical beach at sunset with palm trees leaning over calm waves, vibrant pink-orange sky, high-detail reflections on water, ultra-realistic cinematic photography"
image = pipe(
    prompt,
    num_inference_steps=28,
    guidance_scale=3.5,
    height=896,
    width=1024,
    generator=torch.Generator("cpu").manual_seed(0),
).images[0]

image.save("flux_cinematic_beach.png")
```
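
Depending on your diffusers version, you can also dial the LoRA strength up or down at inference time via joint_attention_kwargs; a short sketch reusing pipe and prompt from above:

```python
# Optional: adjust the LoRA strength at inference time. The Flux pipeline
# reads the LoRA scale from joint_attention_kwargs.
image = pipe(
    prompt,
    num_inference_steps=28,
    guidance_scale=3.5,
    height=896,
    width=1024,
    joint_attention_kwargs={"scale": 0.8},  # 1.0 = full trained effect
    generator=torch.Generator("cpu").manual_seed(0),
).images[0]
```
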

2️⃣ General Prompting Tips

The LoRA is trained heavily on prompts that end with:

“ultra-realistic cinematic photography” or “ultra-realistic cinematic photography style”

Good prompt patterns:

  • "a golden retriever lying on a wooden floor with soft window light on its face, expressive eyes, shallow depth of field, ultra-realistic cinematic photography style"
  • "macro close-up of a pink lily covered in morning dew, glowing sunrise light, creamy bokeh background, ultra-realistic cinematic photography"
  • "a steaming cast-iron pan filled with roasted vegetables and herb-seasoned beef, warm rustic kitchen lighting, soft bokeh, ultra-realistic cinematic photography"

πŸ”¬ Evaluation & Qualitative Results

Validation prompts used during testing included:

  • "tropical beach at sunset with palm trees leaning over calm waves, vibrant pink-orange sky, high-detail reflections on water, ultra-realistic cinematic photography"
  • "a steaming cast-iron pan filled with roasted vegetables and herb-seasoned beef, warm rustic kitchen lighting, soft bokeh, ultra-realistic cinematic photography"
  • "macro close-up of a pink lily covered in morning dew, glowing sunrise light, creamy bokeh background, ultra-realistic cinematic photography"
  • "a golden retriever lying on a wooden floor with soft window light on its face, expressive eyes, shallow depth of field, ultra-realistic cinematic photography style"

Qualitatively, the LoRA:

  • Preserves FLUX.1-dev’s high fidelity and coherence

  • Strengthens:

    • Cinematic color grading
    • Shallow DOF / bokeh
    • Realistic lighting and reflections
  • Produces consistent high-detail photographic outputs across categories (animals, food, nature, flowers)


πŸ§ͺ FP8 / torchao Note (Optional Extension)

The training script can be extended to FP8 training with torchao on GPUs with compute capability β‰₯ 8.9 (e.g., NVIDIA H100).

Key ideas:

  • Use torchao.float8.convert_to_float8_training to inject FP8 layers
  • Define module_filter_fn to choose which modules go FP8
  • Train with --do_fp8_training for additional memory/speed savings (see the sketch below)
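
A minimal sketch of the conversion step, reusing transformer from the training setup; the filter policy here is an assumption and should be adapted to your model:

```python
from torchao.float8 import convert_to_float8_training

def module_filter_fn(module, fqn: str) -> bool:
    # Assumed policy: keep the final projection in higher precision,
    # convert every other eligible nn.Linear to FP8.
    return fqn != "proj_out"

# Swaps eligible nn.Linear layers for their FP8 training equivalents in place.
convert_to_float8_training(transformer, module_filter_fn=module_filter_fn)
```
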

The Colab version here focuses on NF4 + QLoRA for consumer-grade hardware (T4 / 4090 style setups).


πŸ”’ Limitations & Risks

  • Biases: The dataset is biased toward:

    • Animals, nature, landscapes, food, flowers
    • Bright / moody cinematic scenes
    • Little to no human/portrait imagery in the training set
  • Out-of-distribution prompts:

    • May produce less coherent results for:

      • Abstract concepts
      • Complex multi-character scenes
      • Text-heavy compositions
  • Style locking:

    • Outputs may lean toward cinematic color grading even when you prompt for neutral/flat styles, especially if the prompt includes the phrase “cinematic photography”.

Users are responsible for ensuring that generated content respects local laws, platform policies, and ethical norms.


πŸ“œ License

  • Base model license: see black-forest-labs/FLUX.1-dev
  • LoRA adapter license: TBD by the author (currently marked as license: other in the metadata; please update to a chosen license such as mit or apache-2.0)

πŸ“– Citation

Coming Soon


🀝 Acknowledgements

  • Base model: Black Forest Labs – FLUX.1-dev

  • Training framework: πŸ€— diffusers, accelerate, peft, bitsandbytes

  • Quantization ideas & FP8 references inspired by:

    • Hugging Face flux_lora_quantization research example
    • torchao FP8 training resources

πŸ’¬ Contact

For questions, issues, or suggestions:

  • Open an issue or discussion on this model’s Hugging Face model page.
  • Mention this model by name: β€œFLUX.1-dev LoRA – Ultra-Realistic Cinematic Photography”.
