# FLUX.1-dev LoRA - Ultra-Realistic Cinematic Photography
This repository contains a QLoRA fine-tuned adapter for black-forest-labs/FLUX.1-dev, trained to generate images in a consistent ultra-realistic cinematic photography style.
The LoRA was trained on the following dataset:
- Dataset: `akba08/ultra-realistic-cinematic-photography`
- Style: ultra-realistic, photorealistic, cinematic lighting, shallow depth of field, high dynamic range
This model is ideal if you want FLUX.1-dev to produce:
- High-detail, realistic wildlife photos
- Cinematic landscapes and seascapes
- Premium food photography
- Macro floral shots with bokeh
- Emotional pet portraits
## Model Details
- Base model: `black-forest-labs/FLUX.1-dev`
- Type: LoRA adapter (QLoRA) for the transformer (MMDiT) only
- Library: Hugging Face `diffusers` + `peft` + `bitsandbytes`
- Quantization: NF4 (4-bit) for base transformer weights using `BitsAndBytesConfig`
- Precision: `fp16` mixed precision during training
- Adapter rank: `r = 4`
- Trainable parameters: LoRA layers on attention projections (`to_q`, `to_k`, `to_v`, `to_out.0`)
- Frozen components:
  - CLIP and T5 text encoders
  - VAE (except during latent caching)
  - Non-LoRA transformer weights
This repo contains only the LoRA weights. You must load them on top of the original FLUX.1-dev base model.
## Training Data
- Dataset: `akba08/ultra-realistic-cinematic-photography`
- Size: ~165 high-quality images
- Content:
  - Wildlife (lions, tigers, leopards, birds, flamingos, marine birds, etc.)
  - Domestic animals (dogs, cats)
  - Food photography (steak, lamb, casseroles, pizza, soups)
  - Flowers and macro shots (lilies, blossoms, dew-covered petals)
  - Landscapes and nature (beaches, mountains, rivers, sunsets)
- Style characteristics:
  - Ultra-realistic rendering
  - Cinematic lighting (sunset, warm tones, moody light)
  - Shallow depth of field & strong bokeh
  - High detail & DSLR-like composition
Each image is paired with a descriptive prompt ending in:
..., ultra-realistic cinematic photography
## Training Procedure
Training was performed in Google Colab on a single T4 GPU using a modified version of the official flux_lora_quantization example.
### Key Steps
**Pre-compute text embeddings**
- Used `FluxPipeline` + the T5 encoder (`text_encoder_2`) from FLUX.1-dev
- Encoded all prompts and saved them to a `.parquet` file: `embeddings_akba08_finetune.parquet`
- This allows training without keeping the heavy text encoders in GPU memory (a rough sketch of this step follows below).
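The embedding pre-computation roughly follows the pattern below. This is a hedged sketch, not the exact script: the column names, the flattened storage format, and loading the pipeline with `transformer=None` / `vae=None` are assumptions based on the public `flux_lora_quantization` example.

```python
# Sketch only: pre-compute CLIP/T5 prompt embeddings and cache them in a parquet file.
import pandas as pd
import torch
from diffusers import FluxPipeline

ckpt = "black-forest-labs/FLUX.1-dev"

# Load only tokenizers + text encoders; skip the transformer and VAE to save memory.
pipe = FluxPipeline.from_pretrained(
    ckpt, transformer=None, vae=None, torch_dtype=torch.float16
).to("cuda")

# Placeholder caption list: one prompt per training image, ending with the trigger phrase.
prompts = ["a lion resting at sunset, ultra-realistic cinematic photography"]

rows = []
with torch.no_grad():
    for prompt in prompts:
        # encode_prompt returns the T5 sequence embeddings, the CLIP pooled embeddings,
        # and the text ids consumed by the FLUX transformer.
        prompt_embeds, pooled_prompt_embeds, text_ids = pipe.encode_prompt(
            prompt=prompt, prompt_2=prompt, max_sequence_length=512
        )
        rows.append(
            {
                "prompt": prompt,
                "prompt_embeds": prompt_embeds.cpu().numpy().flatten().tolist(),
                "pooled_prompt_embeds": pooled_prompt_embeds.cpu().numpy().flatten().tolist(),
                "text_ids": text_ids.cpu().numpy().flatten().tolist(),
            }
        )

# In a real run you would also record tensor shapes so embeddings can be reshaped at load time.
pd.DataFrame(rows).to_parquet("embeddings_akba08_finetune.parquet")
```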
**Set up QLoRA training**
- Load:
  - VAE in `fp16`
  - Transformer in 4-bit NF4 with `BitsAndBytesConfig`
- Prepare the transformer for k-bit training with `prepare_model_for_kbit_training`
- Add LoRA adapters (a sketch of the full setup follows this list) with:

```python
LoraConfig(
    r=4,
    lora_alpha=4,
    init_lora_weights="gaussian",
    target_modules=["to_k", "to_q", "to_v", "to_out.0"],
)
```
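A condensed sketch of that setup, assuming the `diffusers` `BitsAndBytesConfig` and the standard `peft` helpers; the exact loading code in the training script may differ:

```python
# Sketch of the NF4 + LoRA setup (simplified; not the verbatim training script).
import torch
from diffusers import AutoencoderKL, BitsAndBytesConfig, FluxTransformer2DModel
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

ckpt = "black-forest-labs/FLUX.1-dev"

# VAE stays in fp16 (only needed once, for latent caching).
vae = AutoencoderKL.from_pretrained(ckpt, subfolder="vae", torch_dtype=torch.float16)

# Base transformer quantized to 4-bit NF4.
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
transformer = FluxTransformer2DModel.from_pretrained(
    ckpt, subfolder="transformer", quantization_config=nf4_config, torch_dtype=torch.float16
)

# Make the quantized model trainable, then attach the LoRA adapters.
transformer = prepare_model_for_kbit_training(transformer, use_gradient_checkpointing=True)
lora_config = LoraConfig(
    r=4,
    lora_alpha=4,
    init_lora_weights="gaussian",
    target_modules=["to_k", "to_q", "to_v", "to_out.0"],
)
transformer = get_peft_model(transformer, lora_config)
transformer.print_trainable_parameters()  # only the LoRA projections should be trainable
```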
**Cache latents**
- Encode all images once with the VAE
- Store the resulting `latent_dist` in RAM
- Remove the VAE from the GPU to reduce VRAM usage (see the caching sketch below)
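A minimal caching sketch, assuming a simple resize-and-normalize preprocessing pipeline and a hypothetical `image_paths` list; the real script may store the distribution parameters differently:

```python
# Sketch: encode every training image once, keep the latent distribution stats on the CPU,
# then drop the VAE from the GPU.
import torch
from PIL import Image
from torchvision import transforms
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained(
    "black-forest-labs/FLUX.1-dev", subfolder="vae", torch_dtype=torch.float16
).to("cuda")

preprocess = transforms.Compose(
    [
        transforms.Resize((896, 896)),
        transforms.ToTensor(),
        transforms.Normalize([0.5], [0.5]),  # scale pixels to [-1, 1]
    ]
)

image_paths = ["example.jpg"]  # placeholder list of training image paths
cached_latents = []
with torch.no_grad():
    for path in image_paths:
        pixels = preprocess(Image.open(path).convert("RGB")).unsqueeze(0).to("cuda", torch.float16)
        latent_dist = vae.encode(pixels).latent_dist
        # Keep mean/std so fresh latents can be re-sampled at every training step.
        cached_latents.append({"mean": latent_dist.mean.cpu(), "std": latent_dist.std.cpu()})

# The VAE is no longer needed on the GPU during training.
vae.to("cpu")
torch.cuda.empty_cache()
```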
**Training configuration**

```python
pretrained_model_name_or_path = "black-forest-labs/FLUX.1-dev"
data_df_path = "embeddings_akba08_finetune.parquet"
output_dir = "akba08_finetune_lora_flux_nf4"

width, height = 896, 896
train_batch_size = 1
learning_rate = 1e-4
guidance_scale = 1.0
gradient_accumulation_steps = 4
gradient_checkpointing = True
rank = 4
max_train_steps = 700  # demo runs also tested with 100 steps
checkpointing_steps = 100
mixed_precision = "fp16"
weighting_scheme = "none"
report_to = "wandb"
seed = 0
```
- Optimizer: `bitsandbytes.optim.AdamW8bit`
- Scheduler: constant LR schedule via `get_scheduler`
- Noise schedule: `FlowMatchEulerDiscreteScheduler` from FLUX.1-dev
- Loss: weighted MSE on `noise - latents`, following the SD3 / FLUX training utilities (a sketch follows this list)
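For reference, a simplified sketch of the optimizer, scheduler, and loss target. With `weighting_scheme = "none"` the weighted MSE reduces to plain MSE; the `transformer` argument is assumed to be the LoRA-wrapped model from the setup step above.

```python
# Simplified sketch of the optimizer/scheduler and the flow-matching loss target.
import bitsandbytes as bnb
import torch.nn.functional as F
from diffusers.optimization import get_scheduler


def build_optimizer_and_scheduler(transformer, lr=1e-4, max_train_steps=700):
    params = [p for p in transformer.parameters() if p.requires_grad]  # LoRA params only
    optimizer = bnb.optim.AdamW8bit(params, lr=lr)
    lr_scheduler = get_scheduler(
        "constant", optimizer=optimizer, num_warmup_steps=0, num_training_steps=max_train_steps
    )
    return optimizer, lr_scheduler


def flow_matching_loss(model_pred, latents, noise):
    # The transformer regresses the "velocity" from data to noise, so the target is noise - latents.
    # With weighting_scheme="none" the per-sample weights are all 1, i.e. plain MSE.
    target = noise - latents
    return F.mse_loss(model_pred.float(), target.float(), reduction="mean")
```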
### Hardware & Memory
- Environment: Google Colab, free T4 GPU
- Approx. memory usage: ~9.8 GB VRAM during training
- Example run: `max_train_steps = 100` took ~30 minutes on the T4
- Extrapolated: `max_train_steps = 700` takes several hours on a T4 (or ~40 minutes on a 4090, per the original blog post reference)
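As a rough sanity check, assuming training time scales roughly linearly with the number of steps: 700 × (30 min / 100 steps) ≈ 210 minutes, i.e. about 3.5 hours on a T4, which is consistent with the "several hours" estimate above.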
## Intended Use
This LoRA is intended for:
- Generating ultra-realistic, cinematic photographs of:
  - Animals (wild and domestic)
  - Food and dishes
  - Flowers and nature
  - Landscapes and seascapes
- Fine-tuning experiments with FLUX.1-dev on consumer hardware
- Demonstrating QLoRA (NF4-quantized base + LoRA adapters) with cached latents for efficient training
Not intended for:
- Sensitive, medical, biometric, or surveillance use
- Critical decision-making applications
- Generating harmful, misleading, or NSFW content
## How to Use
### 1. Load FLUX.1-dev and this LoRA (Diffusers)
```python
import torch
from diffusers import FluxPipeline

base_model = "black-forest-labs/FLUX.1-dev"
lora_repo = "akba08/flux_cinematic_photography_finetuned"

pipe = FluxPipeline.from_pretrained(
    base_model,
    torch_dtype=torch.float16,
)
pipe.load_lora_weights(lora_repo)
pipe.to("cuda")

prompt = "tropical beach at sunset with palm trees leaning over calm waves, vibrant pink-orange sky, high-detail reflections on water, ultra-realistic cinematic photography"

image = pipe(
    prompt,
    num_inference_steps=28,
    guidance_scale=3.5,
    height=896,
    width=1024,
    generator=torch.manual_seed(0),
).images[0]

image.save("flux_cinematic_beach.png")
```
### 2. General Prompting Tips
The LoRA is trained heavily on prompts that end with `ultra-realistic cinematic photography` or `ultra-realistic cinematic photography style`.
Good prompt patterns:
- "a golden retriever lying on a wooden floor with soft window light on its face, expressive eyes, shallow depth of field, ultra-realistic cinematic photography style"
- "macro close-up of a pink lily covered in morning dew, glowing sunrise light, creamy bokeh background, ultra-realistic cinematic photography"
- "a steaming cast-iron pan filled with roasted vegetables and herb-seasoned beef, warm rustic kitchen lighting, soft bokeh, ultra-realistic cinematic photography"
## Evaluation & Qualitative Results
Validation prompts used during testing included:
"tropical beach at sunset with palm trees leaning over calm waves, vibrant pink-orange sky, high-detail reflections on water, ultra-realistic cinematic photography""a steaming cast-iron pan filled with roasted vegetables and herb-seasoned beef, warm rustic kitchen lighting, soft bokeh, ultra-realistic cinematic photography""macro close-up of a pink lily covered in morning dew, glowing sunrise light, creamy bokeh background, ultra-realistic cinematic photography""a golden retriever lying on a wooden floor with soft window light on its face, expressive eyes, shallow depth of field, ultra-realistic cinematic photography style"
Qualitatively, the LoRA:
- Preserves FLUX.1-dev's high fidelity and coherence
- Strengthens:
  - Cinematic color grading
  - Shallow DOF / bokeh
  - Realistic lighting and reflections
- Produces consistent, high-detail photographic outputs across categories (animals, food, nature, flowers)
## FP8 / torchao Note (Optional Extension)
The training script can be extended to FP8 training with torchao on GPUs with compute capability ≥ 8.9 (e.g., NVIDIA H100).
Key ideas:
- Use
torchao.float8.convert_to_float8_trainingto inject FP8 layers - Define
module_filter_fnto choose which modules go FP8 - Train with
--do_fp8_trainingfor further memory/speed efficiencies
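A hedged sketch of the conversion step, following the torchao float8 documentation; the filter rule below is an illustrative assumption, and `transformer` is assumed to be the FLUX transformer prepared for training:

```python
# Sketch: swap eligible Linear layers to FP8 training (requires compute capability >= 8.9).
import torch.nn as nn
from torchao.float8 import convert_to_float8_training


def module_filter_fn(module: nn.Module, fqn: str) -> bool:
    # Only convert Linear layers whose dimensions are multiples of 16;
    # FP8 kernels need well-shaped matmuls to pay off.
    if not isinstance(module, nn.Linear):
        return False
    return module.in_features % 16 == 0 and module.out_features % 16 == 0


convert_to_float8_training(transformer, module_filter_fn=module_filter_fn)
```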
The Colab version here focuses on NF4 + QLoRA for consumer-grade hardware (T4 / 4090 style setups).
## Limitations & Risks
- Biases: the dataset is skewed toward:
  - Animals, nature, landscapes, food, flowers
  - Bright / moody cinematic scenes
  - Little to no people or portrait content
- Out-of-distribution prompts: may produce less coherent results for:
  - Abstract concepts
  - Complex multi-character scenes
  - Text-heavy compositions
- Style locking: outputs may lean toward cinematic grading even when you prompt for neutral/flat styles, especially if the prompt includes the phrase "cinematic photography".
Users are responsible for ensuring that generated content respects local laws, platform policies, and ethical norms.
## License
- Base model license: see `black-forest-labs/FLUX.1-dev`
- LoRA adapter license: TBD by the author (currently marked as `license: other` in the metadata; please update to your chosen license, such as `mit` or `apache-2.0`)
## Citation
Coming soon.
## Acknowledgements
- Base model: Black Forest Labs, `FLUX.1-dev`
- Training framework: Hugging Face `diffusers`, `accelerate`, `peft`, `bitsandbytes`
- Quantization ideas & FP8 references inspired by:
  - The Hugging Face `flux_lora_quantization` research example
  - torchao FP8 training resources
## Contact
For questions, issues, or suggestions:
- Open an issue or discussion on this modelβs Hugging Face model page.
- Mention this model by name: "FLUX.1-dev LoRA - Ultra-Realistic Cinematic Photography".