FLUX.1-dev LoRA – Ultra-Realistic Cinematic Photography

This repository contains a QLoRA fine-tuned adapter for black-forest-labs/FLUX.1-dev, trained to generate images in a consistent ultra-realistic cinematic photography style.

The LoRA was trained on the dataset akba08/ultra-realistic-cinematic-photography (see Training Data below).

This model is ideal if you want FLUX.1-dev to produce:

  • High-detail, realistic wildlife photos
  • Cinematic landscapes and seascapes
  • Premium food photography
  • Macro floral shots with bokeh
  • Emotional pet portraits

🧩 Model Details

  • Base model: black-forest-labs/FLUX.1-dev
  • Type: LoRA adapter (QLoRA) for the transformer (MMDiT) only
  • Library: πŸ€— diffusers + peft + bitsandbytes
  • Quantization: NF4 (4-bit) for base transformer weights using BitsAndBytesConfig
  • Precision: fp16 mixed precision during training
  • Adapter rank: r = 4
  • Trainable parameters: LoRA layers on attention projections (to_q, to_k, to_v, to_out.0)
  • Frozen components:
    • CLIP and T5 text encoders
    • VAE (except during latent caching)
    • Non-LoRA transformer weights

This repo contains only the LoRA weights. You must load them on top of the original FLUX.1-dev base model.


πŸ“š Training Data

  • Dataset: akba08/ultra-realistic-cinematic-photography
  • Size: ~165 high-quality images
  • Content:
    • Wildlife (lions, tigers, leopards, birds, flamingos, marine birds, etc.)
    • Domestic animals (dogs, cats)
    • Food photography (steak, lamb, casseroles, pizza, soups)
    • Flowers and macro shots (lilies, blossoms, dew-covered petals)
    • Landscapes and nature (beaches, mountains, rivers, sunsets)
  • Style characteristics:
    • Ultra-realistic rendering
    • Cinematic lighting (sunset, warm tones, moody light)
    • Shallow depth of field & strong bokeh
    • High detail & DSLR-like composition

Each image is paired with a descriptive prompt ending in:

..., ultra-realistic cinematic photography


πŸ‹οΈ Training Procedure

Training was performed in Google Colab on a single T4 GPU using a modified version of the official flux_lora_quantization example.

Key Steps

  1. Pre-compute text embeddings

    • Used FluxPipeline + T5 encoder (text_encoder_2) from FLUX.1-dev
    • Encoded all prompts and saved to a .parquet file:
      • embeddings_akba08_finetune.parquet
    • This allows training without keeping the heavy text encoders in GPU memory.
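
A minimal sketch of this pre-computation step, assuming diffusers' FluxPipeline; the prompts list and the exact parquet column layout here are assumptions (the official example stores flattened embedding arrays in a similar way):

```python
import pandas as pd
import torch
from diffusers import FluxPipeline

# Load only the text encoders; the transformer and VAE are skipped to save memory.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=None,
    vae=None,
    torch_dtype=torch.float16,
).to("cuda")

prompts = [...]  # assumed: one caption per training image

rows = []
with torch.no_grad():
    for p in prompts:
        # encode_prompt returns (T5 embeds, pooled CLIP embeds, text ids)
        prompt_embeds, pooled_prompt_embeds, _ = pipe.encode_prompt(
            prompt=p, prompt_2=p, max_sequence_length=512
        )
        rows.append({
            "prompt": p,
            "prompt_embeds": prompt_embeds.cpu().numpy().flatten().tolist(),
            "pooled_prompt_embeds": pooled_prompt_embeds.cpu().numpy().flatten().tolist(),
        })

pd.DataFrame(rows).to_parquet("embeddings_akba08_finetune.parquet")
```
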
  2. Set up QLoRA training

    • Load:
      • VAE in fp16
      • Transformer in 4-bit NF4 with BitsAndBytesConfig
    • Prepare transformer for k-bit training with prepare_model_for_kbit_training
    • Add LoRA adapters with:
      ```python
      LoraConfig(
          r=4,
          lora_alpha=4,
          init_lora_weights="gaussian",
          target_modules=["to_k", "to_q", "to_v", "to_out.0"],
      )
      ```
      
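Putting step 2 together, a minimal sketch assuming diffusers' BitsAndBytesConfig and peft's k-bit utilities (exact arguments are assumptions based on the official example):

```python
import torch
from diffusers import AutoencoderKL, BitsAndBytesConfig, FluxTransformer2DModel
from peft import LoraConfig, prepare_model_for_kbit_training

model_id = "black-forest-labs/FLUX.1-dev"

# VAE in fp16: used only once, to cache latents in step 3.
vae = AutoencoderKL.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float16)

# Base transformer quantized to 4-bit NF4 with fp16 compute.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
transformer = FluxTransformer2DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    quantization_config=bnb_config,
    torch_dtype=torch.float16,
)

# Make the quantized model trainable, then attach the LoRA adapters.
transformer = prepare_model_for_kbit_training(transformer, use_gradient_checkpointing=True)
lora_config = LoraConfig(
    r=4,
    lora_alpha=4,
    init_lora_weights="gaussian",
    target_modules=["to_k", "to_q", "to_v", "to_out.0"],
)
transformer.add_adapter(lora_config)
```
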
  3. Cache latents

    • Encode all images once with the VAE
    • Store latent_dist in RAM
    • Remove VAE from GPU to reduce VRAM usage
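
A minimal sketch of the caching loop, reusing the fp16 vae from the step 2 sketch; train_dataloader is an assumed dataloader yielding image tensors scaled to [-1, 1]:

```python
import torch

vae.to("cuda")
vae.requires_grad_(False)

cached_latent_dists = []
with torch.no_grad():
    for pixel_values in train_dataloader:  # assumed: (B, 3, H, W) tensors in [-1, 1]
        pixel_values = pixel_values.to("cuda", dtype=torch.float16)
        # Keep the full latent distribution so a fresh latent can be sampled
        # each training step (optionally move .parameters to CPU RAM).
        cached_latent_dists.append(vae.encode(pixel_values).latent_dist)

# The VAE is no longer needed on the GPU during training.
vae.to("cpu")
torch.cuda.empty_cache()
```
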
  4. Training configuration

```python
pretrained_model_name_or_path = "black-forest-labs/FLUX.1-dev"
data_df_path = "embeddings_akba08_finetune.parquet"
output_dir = "akba08_finetune_lora_flux_nf4"

width, height = 896, 896
train_batch_size = 1
learning_rate = 1e-4
guidance_scale = 1.0

gradient_accumulation_steps = 4
gradient_checkpointing = True
rank = 4
max_train_steps = 700      # demo runs also tested with 100 steps
checkpointing_steps = 100
mixed_precision = "fp16"
weighting_scheme = "none"
report_to = "wandb"
seed = 0
```
  • Optimizer: bitsandbytes.optim.AdamW8bit
  • Scheduler: constant LR schedule via get_scheduler
  • Noise schedule: FlowMatchEulerDiscreteScheduler from FLUX.1-dev
  • Loss: weighted MSE against the flow-matching target (noise - latents), following the SD3 / FLUX training utilities (sketched below)
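
A condensed sketch of how these pieces fit together; transformer is the LoRA-wrapped model from the step 2 sketch, and the sigma interpolation mirrors the SD3/FLUX flow-matching recipe:

```python
import torch
import torch.nn.functional as F
from bitsandbytes.optim import AdamW8bit
from diffusers.optimization import get_scheduler

# 8-bit AdamW over the LoRA parameters only.
optimizer = AdamW8bit(
    [p for p in transformer.parameters() if p.requires_grad],
    lr=1e-4,
)
lr_scheduler = get_scheduler("constant", optimizer=optimizer)

# Flow matching: the latents are interpolated toward noise at a sampled sigma,
#   noisy_model_input = (1.0 - sigma) * latents + sigma * noise
# and the transformer is trained to predict the velocity (noise - latents).
def training_loss(model_pred, latents, noise):
    target = noise - latents
    # weighting_scheme = "none": uniform weighting, i.e. a plain MSE.
    return F.mse_loss(model_pred.float(), target.float(), reduction="mean")
```
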

Hardware & Memory

  • Environment: Google Colab, free T4 GPU

  • Approx. memory usage: ~9.8 GB VRAM during training

  • Example run:

    • max_train_steps = 100 β†’ ~30 minutes on T4
    • Extrapolated max_train_steps = 700 → several hours on a T4 (or ~40 minutes on an RTX 4090, per the blog post referenced by the original example)

🎯 Intended Use

This LoRA is intended for:

  • Generating ultra-realistic, cinematic photographs of:

    • Animals (wild and domestic)
    • Food and dishes
    • Flowers and nature
    • Landscapes and seascapes
  • Fine-tuning experiments with FLUX.1-dev on consumer hardware

  • Demonstrating QLoRA + NF4 + LoRA + cached latents for efficient training

Not intended for:

  • Sensitive, medical, biometric, or surveillance use
  • Critical decision-making applications
  • Generating harmful, misleading, or NSFW content

βš™οΈ How to Use

1️⃣ Load FLUX.1-dev and this LoRA (Diffusers)

```python
import torch
from diffusers import FluxPipeline

base_model = "black-forest-labs/FLUX.1-dev"
lora_repo = "akba08/flux_cinematic_photography_finetuned"

# Load the base pipeline in fp16, attach the LoRA weights, then move to GPU.
# (Avoid combining device_map with a later .to("cuda"); pick one placement strategy.)
pipe = FluxPipeline.from_pretrained(
    base_model,
    torch_dtype=torch.float16,
)
pipe.load_lora_weights(lora_repo)
pipe.to("cuda")

prompt = "tropical beach at sunset with palm trees leaning over calm waves, vibrant pink-orange sky, high-detail reflections on water, ultra-realistic cinematic photography"
image = pipe(
    prompt,
    num_inference_steps=28,
    guidance_scale=3.5,
    height=896,
    width=1024,
    generator=torch.Generator("cpu").manual_seed(0),
).images[0]

image.save("flux_cinematic_beach.png")
```
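
Depending on your diffusers version, you can also dial the LoRA strength up or down at inference time via joint_attention_kwargs; a short sketch reusing pipe and prompt from above:

```python
# Optional: adjust the LoRA strength at inference time. The Flux pipeline
# reads the LoRA scale from joint_attention_kwargs.
image = pipe(
    prompt,
    num_inference_steps=28,
    guidance_scale=3.5,
    height=896,
    width=1024,
    joint_attention_kwargs={"scale": 0.8},  # 1.0 = full trained effect
    generator=torch.Generator("cpu").manual_seed(0),
).images[0]
```
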

2️⃣ General Prompting Tips

The LoRA is trained heavily on prompts that end with:

“ultra-realistic cinematic photography” or “ultra-realistic cinematic photography style”

Good prompt patterns:

  • "a golden retriever lying on a wooden floor with soft window light on its face, expressive eyes, shallow depth of field, ultra-realistic cinematic photography style"
  • "macro close-up of a pink lily covered in morning dew, glowing sunrise light, creamy bokeh background, ultra-realistic cinematic photography"
  • "a steaming cast-iron pan filled with roasted vegetables and herb-seasoned beef, warm rustic kitchen lighting, soft bokeh, ultra-realistic cinematic photography"

πŸ”¬ Evaluation & Qualitative Results

Validation prompts used during testing included:

  • "tropical beach at sunset with palm trees leaning over calm waves, vibrant pink-orange sky, high-detail reflections on water, ultra-realistic cinematic photography"
  • "a steaming cast-iron pan filled with roasted vegetables and herb-seasoned beef, warm rustic kitchen lighting, soft bokeh, ultra-realistic cinematic photography"
  • "macro close-up of a pink lily covered in morning dew, glowing sunrise light, creamy bokeh background, ultra-realistic cinematic photography"
  • "a golden retriever lying on a wooden floor with soft window light on its face, expressive eyes, shallow depth of field, ultra-realistic cinematic photography style"

Qualitatively, the LoRA:

  • Preserves FLUX.1-dev’s high fidelity and coherence

  • Strengthens:

    • Cinematic color grading
    • Shallow DOF / bokeh
    • Realistic lighting and reflections
  • Produces consistent high-detail photographic outputs across categories (animals, food, nature, flowers)


πŸ§ͺ FP8 / torchao Note (Optional Extension)

The training script can be extended to FP8 training with torchao on GPUs with compute capability β‰₯ 8.9 (e.g., NVIDIA H100).

Key ideas:

  • Use torchao.float8.convert_to_float8_training to inject FP8 layers
  • Define module_filter_fn to choose which modules go FP8
  • Train with --do_fp8_training for additional memory/speed savings (see the sketch below)
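
A minimal sketch of the conversion step, reusing transformer from the training setup; the filter policy here is an assumption and should be adapted to your model:

```python
from torchao.float8 import convert_to_float8_training

def module_filter_fn(module, fqn: str) -> bool:
    # Assumed policy: keep the final projection in higher precision,
    # convert every other eligible nn.Linear to FP8.
    return fqn != "proj_out"

# Swaps eligible nn.Linear layers for their FP8 training equivalents in place.
convert_to_float8_training(transformer, module_filter_fn=module_filter_fn)
```
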

The Colab version here focuses on NF4 + QLoRA for consumer-grade hardware (T4 / 4090 style setups).


πŸ”’ Limitations & Risks

  • Biases: The dataset is biased toward:

    • Animals, nature, landscapes, food, flowers
    • Bright / moody cinematic scenes
    • Little to no human/portrait imagery in the training set
  • Out-of-distribution prompts:

    • May produce less coherent results for:

      • Abstract concepts
      • Complex multi-character scenes
      • Text-heavy compositions
  • Style locking:

    • Outputs may lean toward cinematic color grading even when you prompt for neutral/flat styles, especially if the prompt includes the phrase “cinematic photography”.

Users are responsible for ensuring that generated content respects local laws, platform policies, and ethical norms.


πŸ“œ License

  • Base model license: see black-forest-labs/FLUX.1-dev
  • LoRA adapter license: TBD by the author (currently marked as license: other in the metadata; please update to a chosen license such as mit or apache-2.0)

πŸ“– Citation

Coming Soon


🀝 Acknowledgements

  • Base model: Black Forest Labs – FLUX.1-dev

  • Training framework: πŸ€— diffusers, accelerate, peft, bitsandbytes

  • Quantization ideas & FP8 references inspired by:

    • Hugging Face flux_lora_quantization research example
    • torchao FP8 training resources

πŸ’¬ Contact

For questions, issues, or suggestions:

  • Open an issue or discussion on this model’s Hugging Face model page.
  • Mention this model by name: β€œFLUX.1-dev LoRA – Ultra-Realistic Cinematic Photography”.
