SD Image Upscaler: Stage-B SD 1.5 LoRA (research artifact)

A LoRA from the SD Image Upscaler capability study at github.com/bradhinkel/SD_image_upscaler (Phase 4c). Read this card before using.

What this LoRA actually does (and doesn't)

This is a research artifact, not a production-grade weight.

Tested through the project's full Phase 2 two-stage upscaling pipeline (stable-diffusion-x4-upscaler -> stable-diffusion-v1-5 + ControlNet Tile, img2img, tiled) on a 60-image frozen test set at 5x:

| Pipeline | Mean LPIPS at 5x (lower = better) |
|---|---|
| Real-ESRGAN baseline | 0.299 |
| Two-stage (no LoRA) | 0.433 |
| Two-stage + this LoRA | 0.443 |

The LoRA does not improve the pipeline on average. It does shift the texture/color prior in interesting per-category ways:

| Test slice | LoRA win rate vs no-LoRA two-stage | Δ mean LPIPS |
|---|---|---|
| traditional / landscape | 30% | +0.0145 (worse) |
| traditional / cityscape | 20% | +0.0156 (worse) |
| traditional / animals | 30% | +0.0040 |
| hard / fine_architecture | 25% | +0.0150 (worse) |
| hard / hf_texture | 50% | +0.0093 (worse) |
| hard / night | 62.5% | -0.0056 (better) |
| hard / reflection | 60% | +0.0019 |
| hard / noise | 50% | +0.0109 (worse) |
| hard / text | 46% | +0.0092 (worse) |
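For clarity on how to read the table: win rate is the fraction of images in a slice where the LoRA run scores lower (better) LPIPS than the no-LoRA two-stage run, and Δ mean LPIPS is the difference of per-slice means (positive = LoRA is worse). A minimal sketch of both metrics, assuming per-image LPIPS scores are already available as parallel lists:

```python
def win_rate(lora_scores, base_scores):
    """Fraction of images where the LoRA run has lower (better) LPIPS."""
    wins = sum(l < b for l, b in zip(lora_scores, base_scores))
    return wins / len(lora_scores)


def delta_mean(lora_scores, base_scores):
    """Mean LPIPS difference; positive means the LoRA run is worse on average."""
    return sum(lora_scores) / len(lora_scores) - sum(base_scores) / len(base_scores)
```

Note that the two can disagree: a slice can have a >50% win rate while still showing a positive Δ mean LPIPS (e.g. hard / reflection above) if the LoRA's losses are larger than its wins.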

The LoRA learned a slight darken-and-saturate prior that genuinely helps night scenes but mildly hurts daylight categories, including the very domains it was trained on (landscape / cityscape / animals).

Why publish a non-improving LoRA?

This is the third training attempt in the study. The first two targeted LoRA at stable-diffusion-x4-upscaler's cross-attention modules and produced catastrophically destructive deltas (output LPIPS 0.78-0.92, vs 0.33 for the base model) regardless of recipe. A detailed failure analysis is in the project's Phase 4c writeup. The short version: x4-upscaler's denoising trajectory is unusually fragile to U-Net perturbations, and the SUPIR paper's architectural choices (zero-init additive adapters on intermediate ResBlocks, NOT LoRA on attention) are validated by our negative result.

This stage-B LoRA is technically functional (it does not destabilise the pipeline), which is noteworthy on its own. The honest finding is that small-scale cross-attention LoRA on SD 1.5 is not sufficient to close the gap to dedicated SR architectures (Real-ESRGAN, SUPIR) when the LR input is clean bicubic-downsampled HR.

Architecture

  • Base model: stable-diffusion-v1-5/stable-diffusion-v1-5
  • Adapter: PEFT LoRA, rank 16, alpha 8 (effective scale 0.5)
  • Targets: to_q / to_k / to_v / to_out.0 in the UNet cross-attention
  • Trainable params: 3.2M (0.7% of base)
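The bullets above correspond to an adapter config along these lines. This is a sketch of the expected adapter_config.json contents, not a dump of the shipped file; field names follow PEFT's LoraConfig conventions, and values come from the bullets:

```python
# Sketch of the adapter configuration implied by the Architecture section.
adapter_config = {
    "peft_type": "LORA",
    "base_model_name_or_path": "stable-diffusion-v1-5/stable-diffusion-v1-5",
    "r": 16,                # LoRA rank
    "lora_alpha": 8,        # alpha < rank, so the adapter is down-weighted
    "target_modules": ["to_q", "to_k", "to_v", "to_out.0"],
}

# PEFT applies the LoRA delta scaled by lora_alpha / r.
effective_scale = adapter_config["lora_alpha"] / adapter_config["r"]  # 0.5
```

Setting alpha to half the rank bakes in the 0.5 effective scale mentioned above, independent of any scale applied at inference time.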

Training

  • 7786 (LR_128, HR_512) pairs from DIV2K + Unsplash Lite (bradhinkel/sd-image-upscaler-pairs, private dataset)
  • BLIP-large captions (one per HR tile)
  • 8000 steps, batch 4, lr 1e-4, fp16 UNet + fp32 VAE
  • Loss flat at ~0.15 across all 8000 steps (base SD 1.5 already achieves this on photographs at step 0; LoRA's contribution is the small prior shift, not loss reduction)
  • RunPod RTX 5090 32 GB, ~63 min, ~$0.71
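For scale, the schedule in the bullets works out to roughly four passes over the pair set (simple arithmetic from the numbers above):

```python
pairs = 7786          # (LR_128, HR_512) training pairs
steps = 8000          # optimizer steps
batch = 4             # images per step

samples_seen = steps * batch   # 32,000 samples
epochs = samples_seen / pairs  # ~4.1 passes over the dataset
```

That is a short run by fine-tuning standards, consistent with the flat loss curve: the adapter has time to shift the prior, not to learn super-resolution from scratch.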

Usage

```python
from diffusers import StableDiffusionControlNetImg2ImgPipeline, ControlNetModel
from peft import PeftModel
import torch

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11f1e_sd15_tile", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
    safety_checker=None,
).to("cuda")
pipe.unet = PeftModel.from_pretrained(pipe.unet, "bradhinkel/sd-image-upscaler-sd15-lora")

# Use as the stage-B refinement step in a two-stage upscale pipeline.
# See github.com/bradhinkel/SD_image_upscaler for the full inference path.
```

Recommended usage

If you use this artifact at all, use it for night scenes only, at LoRA scale 0.5-1.0. For daylight imagery, prefer the same pipeline without the LoRA; it is slightly better.

Source attributions

  • Training dataset: DIV2K (research-only license) + Unsplash Lite (ML training permitted under the Lite Dataset terms)
  • Captions: Salesforce/blip-image-captioning-large

License

OpenRAIL inherited from SD 1.5 base.
