# SD Image Upscaler: Stage-B SD 1.5 LoRA (research artifact)
A LoRA from the SD Image Upscaler capability study (github.com/bradhinkel/SD_image_upscaler), Phase 4c. Read this card before using.
## What this LoRA actually does (and doesn't)
This is a research artifact, not a production-grade weight.
Tested through the project's full Phase 2 two-stage upscaling pipeline
(stable-diffusion-x4-upscaler -> stable-diffusion-v1-5 + ControlNet Tile,
img2img, tiled) on a 60-image frozen test set at 5x:
| Pipeline | Mean LPIPS at 5x (lower = better) |
|---|---|
| Real-ESRGAN baseline | 0.299 |
| Two-stage (no LoRA) | 0.433 |
| Two-stage + this LoRA | 0.443 |
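For context on the "img2img, tiled" stage: stage B runs SD 1.5 over overlapping fixed-size crops of the upscaled canvas. A minimal sketch of computing the tile origins, assuming 512-px tiles and a 64-px overlap (the repo's actual tiling parameters may differ):

```python
def tile_origins(size: int, tile: int = 512, overlap: int = 64) -> list[int]:
    """Left/top coordinates of overlapping tiles covering one axis.

    Tiles advance by (tile - overlap); the last tile is clamped so it
    ends exactly at `size`. Assumes size >= tile.
    """
    stride = tile - overlap
    origins = list(range(0, size - tile, stride))
    origins.append(size - tile)  # final tile flush with the edge
    return origins

# e.g. a 1280x960 stage-B canvas:
xs = tile_origins(1280)  # -> [0, 448, 768]
ys = tile_origins(960)   # -> [0, 448]
```

Each (x, y) pair from the two lists is one 512x512 crop fed through the ControlNet-Tile img2img step, then blended back with its neighbours over the overlap region.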
The LoRA does not improve the pipeline on average. It does shift the texture/color prior in interesting per-category ways:
| Test slice | LoRA win rate vs no-LoRA two-stage | Δ mean LPIPS |
|---|---|---|
| traditional / landscape | 30% | +0.0145 (worse) |
| traditional / cityscape | 20% | +0.0156 (worse) |
| traditional / animals | 30% | +0.0040 |
| hard / fine_architecture | 25% | +0.0150 (worse) |
| hard / hf_texture | 50% | +0.0093 (worse) |
| hard / night | 62.5% | -0.0056 (better) |
| hard / reflection | 60% | +0.0019 |
| hard / noise | 50% | +0.0109 (worse) |
| hard / text | 46% | +0.0092 (worse) |
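Each slice row reduces paired per-image LPIPS scores to the two numbers shown. A sketch of that reduction (function name and the per-image values are illustrative, not data from the study):

```python
def slice_stats(lpips_lora: list[float], lpips_base: list[float]):
    """Win rate (fraction of images where the LoRA run scores lower LPIPS)
    and delta mean LPIPS (positive = LoRA worse) over paired scores."""
    assert len(lpips_lora) == len(lpips_base)
    wins = sum(a < b for a, b in zip(lpips_lora, lpips_base))
    delta = sum(lpips_lora) / len(lpips_lora) - sum(lpips_base) / len(lpips_base)
    return wins / len(lpips_lora), delta

# Illustrative 4-image slice:
rate, delta = slice_stats([0.40, 0.45, 0.38, 0.50],
                          [0.42, 0.44, 0.39, 0.46])
# rate = 0.5 (2 of 4 images improved), delta = +0.005 (worse on average)
```

Note the two numbers can disagree, as in the reflection slice above: a LoRA can win on most images (rate > 0.5) while still being worse on average if its losses are larger than its wins.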
The LoRA learned a slight darken-and-saturate prior that genuinely helps night scenes but mildly hurts daylight categories, even the very domains it was trained on (landscape / cityscape / animals).
## Why publish a non-improving LoRA?
This is the third training attempt in the study. The first two targeted
LoRA on stable-diffusion-x4-upscaler's cross-attention modules and produced
catastrophically destructive deltas (output LPIPS 0.78-0.92, vs base 0.33)
regardless of recipe. Detailed failure analysis is in the project's Phase 4c
writeup; the short version is that x4-upscaler's denoising trajectory is
unusually fragile to U-Net perturbations and the SUPIR paper's
architectural choices (zero-init additive adapters on intermediate
ResBlocks, NOT LoRA on attention) are validated by our negative result.
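For reference, the SUPIR-style design mentioned above (an additive branch whose output projection starts at zero, so the adapter is an exact identity at step 0) can be sketched with NumPy. Class name and layer sizes here are illustrative, not SUPIR's actual code:

```python
import numpy as np

class ZeroInitAdapter:
    """h -> h + W_out @ relu(W_in @ h), with W_out zero-initialized.

    At init the residual branch contributes exactly zero, so the host
    network's behaviour is untouched until training moves W_out away
    from zero -- the property that makes this safer than perturbing
    existing weights directly.
    """
    def __init__(self, dim: int, hidden: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.w_in = rng.standard_normal((hidden, dim)) * 0.02
        self.w_out = np.zeros((dim, hidden))  # the zero-init guarantee

    def __call__(self, h: np.ndarray) -> np.ndarray:
        return h + self.w_out @ np.maximum(self.w_in @ h, 0.0)

h = np.arange(4.0)
adapter = ZeroInitAdapter(dim=4, hidden=8)
assert np.array_equal(adapter(h), h)  # exact identity before any training
```

Standard LoRA shares the zero-init property (its B matrix starts at zero), so zero-init alone doesn't explain the x4-upscaler failures; the relevant SUPIR choice is *where* the branch attaches (intermediate ResBlocks rather than attention projections).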
This stage-B LoRA is technically functional (does not destabilise the pipeline), and that's noteworthy on its own. The honest finding is that small-scale cross-attention LoRA on SD 1.5 isn't sufficient to close the gap to dedicated SR architectures (Real-ESRGAN, SUPIR) when LR is clean bicubic-downsampled HR.
## Architecture

- Base model: stable-diffusion-v1-5/stable-diffusion-v1-5
- Adapter: PEFT LoRA, rank 16, alpha 8 (effective scale 0.5)
- Targets: `to_q` / `to_k` / `to_v` / `to_out.0` in the UNet cross-attention
- Trainable params: 3.2M (0.7% of base)
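A back-of-envelope check on the 3.2M figure, under the standard SD 1.5 UNet dims (channel widths 320/640/1280, text width 768, 16 transformer blocks per attention type). This is my reconstruction, not the repo's count, and it assumes PEFT's suffix matching on `to_q`/`to_k`/`to_v`/`to_out.0` picks up both self-attention (attn1) and cross-attention (attn2), which those module names would match:

```python
def lora_params(d_in: int, d_out: int, r: int = 16) -> int:
    # One LoRA pair: A (r x d_in) plus B (d_out x r)
    return r * (d_in + d_out)

TEXT_DIM = 768
# (channel width, number of transformer blocks at that width) in the SD 1.5 UNet
BLOCKS = [(320, 5), (640, 5), (1280, 6)]

total = 0
for c, n in BLOCKS:
    attn1 = 4 * lora_params(c, c)            # self-attn: q, k, v, out are all c -> c
    attn2 = (2 * lora_params(c, c)           # cross-attn: q and out are c -> c
             + 2 * lora_params(TEXT_DIM, c)) # cross-attn: k, v project text (768 -> c)
    total += n * (attn1 + attn2)

print(f"{total / 1e6:.2f}M trainable LoRA params")  # ~3.19M
```

Under this assumption the count lands at ~3.19M, matching the card's 3.2M; restricting to attn2 alone would give roughly half that.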
## Training

- 7786 (LR_128, HR_512) pairs from DIV2K + Unsplash Lite (`bradhinkel/sd-image-upscaler-pairs`, private dataset)
- BLIP-large captions (one per HR tile)
- 8000 steps, batch 4, lr 1e-4, fp16 UNet + fp32 VAE
- Loss flat at ~0.15 across all 8000 steps (base SD 1.5 already achieves this on photographs at step 0; LoRA's contribution is the small prior shift, not loss reduction)
- RunPod RTX 5090 32 GB, ~63 min, ~$0.71
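A quick arithmetic check on those bullets (all inputs copied from the list above):

```python
steps, batch, pairs = 8000, 4, 7786
epochs = steps * batch / pairs
print(f"~{epochs:.1f} epochs over the pair set")  # ~4.1

minutes, cost_usd = 63, 0.71
print(f"~{steps / (minutes * 60):.1f} steps/s, "
      f"${cost_usd / steps * 1000:.3f} per 1k steps")
```

Roughly four passes over the data, which is consistent with the flat-loss observation: more epochs would only overfit the prior shift, not reduce the denoising loss.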
## Usage

```python
from diffusers import StableDiffusionControlNetImg2ImgPipeline, ControlNetModel
from peft import PeftModel
import torch

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11f1e_sd15_tile", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
    safety_checker=None,
).to("cuda")
pipe.unet = PeftModel.from_pretrained(pipe.unet, "bradhinkel/sd-image-upscaler-sd15-lora")

# Use as the stage-B refinement step in a two-stage upscale pipeline.
# See github.com/bradhinkel/SD_image_upscaler for the full inference path.
```
## Recommended usage
If you use this artifact at all, use it on night scenes only, at LoRA scale 0.5-1.0. For daylight imagery, prefer the same pipeline without the LoRA; it is slightly better.
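The scale knob multiplies the LoRA delta before it is added to the base projection: out = W x + s * (alpha/r) * B A x, where alpha/r = 8/16 = 0.5 here. A NumPy illustration of how scale linearly interpolates between the base model and the full-strength LoRA (matrices are random stand-ins, not the released weights):

```python
import numpy as np

rng = np.random.default_rng(42)
d, r, alpha = 8, 16, 8
W = rng.standard_normal((d, d))           # frozen base weight
A = rng.standard_normal((r, d)) * 0.02    # LoRA down-projection
B = rng.standard_normal((d, r)) * 0.02    # LoRA up-projection
x = rng.standard_normal(d)

def forward(scale: float) -> np.ndarray:
    return W @ x + scale * (alpha / r) * (B @ (A @ x))

base, full, half = forward(0.0), forward(1.0), forward(0.5)
# The output is linear in scale, so scale=0.5 lands exactly halfway
# between the base output and the full-LoRA output:
assert np.allclose(half, (base + full) / 2)
```

Because the interpolation is linear, scales between 0.5 and 1.0 simply dial the darken-and-saturate prior up or down; there is no threshold effect to tune around.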
## Source attributions

- Training dataset: DIV2K (research-only license) + Unsplash Lite (ML training permitted under the Lite Dataset terms)
- Captions: Salesforce/blip-image-captioning-large
## License
OpenRAIL inherited from SD 1.5 base.