Residual Convolutional Autoencoder Ensemble

Deep learning models for image reconstruction using residual convolutional autoencoders.

Model Architecture

Two variants of a deep convolutional autoencoder with residual blocks:

  • Model A: latent_dim=512, dropout=0.15
  • Model B: latent_dim=768, dropout=0.20

Architecture Details

Input: (B, 3, 256, 256) RGB images in range [-1, 1]
Encoder: 6-layer CNN with residual blocks (256→128→64→32→16→8→4)
Latent: Fully connected projection to latent_dim
Decoder: 6-layer TransposeCNN with residual blocks (4→8→16→32→64→128→256)
Output: (B, 3, 256, 256) Reconstructed images + (B, latent_dim) latent codes
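
The spatial sizes listed above follow from halving the resolution at each encoder layer and doubling it at each decoder layer. A minimal sketch of that arithmetic, assuming stride-2 convolutions with kernel size 4 and padding 1. These hyperparameters are assumptions chosen for illustration; the actual values are defined in model.py.

```python
def conv_out(size, kernel=4, stride=2, padding=1):
    """Output size of a standard convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_transpose_out(size, kernel=4, stride=2, padding=1):
    """Output size of a transposed convolution (no output padding, dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel


def trace(size, layer_fn, num_layers=6):
    """Return the spatial size before the first layer and after each layer."""
    sizes = [size]
    for _ in range(num_layers):
        size = layer_fn(size)
        sizes.append(size)
    return sizes


# Assumed kernel/stride/padding; only the resulting sizes come from the card.
encoder_sizes = trace(256, conv_out)
decoder_sizes = trace(4, conv_transpose_out)
print(encoder_sizes)
print(decoder_sizes)
```

Other settings (for example kernel size 3 with padding 1, or a transposed convolution with output padding) can produce the same size sequence, so this only shows that six stride-2 layers are consistent with the listed resolutions.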

Training Details

  • Dataset: Real images (256x256 resolution)
  • Loss: MSE (Mean Squared Error)
  • Optimizer: AdamW with weight decay
  • Training: 100+ epochs with validation monitoring
  • Best Validation Loss:
    • Model A: 0.025486
    • Model B: 0.025033
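
To put the validation losses on a more familiar scale, the MSE values can be converted to RMSE and PSNR. The sketch below assumes the loss was computed on tensors in the [-1, 1] range described under Architecture Details, so the dynamic range is 2. The helper name mse_to_psnr is illustrative and not part of model.py.

```python
import math


def mse_to_psnr(mse, data_range=2.0):
    """Convert mean squared error to PSNR in dB for a given dynamic range."""
    return 10.0 * math.log10(data_range ** 2 / mse)


# Best validation losses reported above.
val_mse = {"Model A": 0.025486, "Model B": 0.025033}

for name, mse in val_mse.items():
    rmse = math.sqrt(mse)  # typical per-pixel error in [-1, 1] units
    print(f"{name}: RMSE={rmse:.4f}, PSNR={mse_to_psnr(mse):.2f} dB")
```

Under this assumption both models land at roughly 22 dB, with Model B slightly ahead, which matches the small gap between the two validation losses.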

Usage

import torch
from PIL import Image
from torchvision import transforms
from model import ResidualConvAutoencoder, load_model

# Option 1: Load pre-trained model
model, checkpoint = load_model('model_a_best.pth', latent_dim=512, dropout=0.15)

# Option 2: Create from scratch (untrained weights; use instead of Option 1)
# model = ResidualConvAutoencoder(latent_dim=512, dropout=0.15)

model.eval()  # disable dropout for inference

# Prepare image (normalize to [-1, 1])
transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
    transforms.Lambda(lambda x: x * 2 - 1)  # [0,1] -> [-1,1]
])
image = Image.open('example.jpg').convert('RGB')  # placeholder path; force 3 channels

# Inference
with torch.no_grad():
    img_tensor = transform(image).unsqueeze(0)  # add batch dim: (1, 3, 256, 256)
    reconstructed, latent = model(img_tensor)

    # Get reconstruction error
    error = torch.nn.functional.mse_loss(reconstructed, img_tensor)

Model Files

  • model_a_best.pth - Model A checkpoint (latent_dim=512)
  • model_b_best.pth - Model B checkpoint (latent_dim=768)
  • model.py - Model architecture definition
  • config.json - Training configuration
  • training_history.json - Full training metrics

Research Findings

Important Note: These models were trained as image reconstruction autoencoders. Testing revealed they function as enhancement/denoising models rather than anomaly detectors:

  • ✅ Successfully reconstructs natural images
  • ✅ Can denoise corrupted images (JPEG artifacts, blur, contrast)
  • ⚠️ Not suitable for detecting modern AI-generated images
  • ⚠️ Shows negative discrimination for degraded images (reconstructs them better)

Performance on Synthetic Corruptions

Corruption Type     Separation from Real
Noise Added         +122.1% ✅
Color Shifted       +23.8% ⚠️
Patch Corrupted     +12.6% ❌
JPEG Compressed     -9.8% ❌
Contrast Altered    -90.1% ❌
Blurred             -92.5% ❌

Negative percentages indicate the model reconstructs corrupted images better than real images (denoising effect).
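
The card does not define how separation is computed. One plausible reading, stated here as an assumption, is the relative change in mean reconstruction error between corrupted and real images. A minimal sketch under that assumption, using made-up error values purely for illustration:

```python
from statistics import mean


def separation_pct(real_errors, corrupted_errors):
    """Relative change (%) in mean reconstruction error versus real images.

    Assumed definition, not confirmed by the source:
    100 * (mean(corrupted) - mean(real)) / mean(real)
    """
    real_mean = mean(real_errors)
    return 100.0 * (mean(corrupted_errors) - real_mean) / real_mean


# Hypothetical per-image MSE values, not measurements from these models.
real = [0.024, 0.026, 0.025]
noisy = [0.050, 0.060, 0.055]    # higher error than real -> positive separation
blurred = [0.002, 0.003, 0.001]  # lower error than real -> negative separation

print(f"noisy:   {separation_pct(real, noisy):+.1f}%")
print(f"blurred: {separation_pct(real, blurred):+.1f}%")
```

With this definition, a corruption is only detectable by thresholding reconstruction error when its separation is clearly positive, which is why blur and contrast changes defeat the approach.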

Limitations

  1. Not an anomaly detector: The models enhance and denoise their inputs rather than reproduce them faithfully, so reconstruction error is not a reliable anomaly score
  2. Poor for fake detection: Cannot reliably distinguish modern AI-generated images from real ones
  3. Pixel-space limitations: Modern AI images are statistically similar to real images in pixel space

Recommended Use Cases

✅ Image denoising and enhancement
✅ Feature extraction (latent representations)
✅ Image compression/reconstruction
✅ Transfer learning backbone
❌ Fake image detection (use supervised classifiers instead)
❌ Anomaly detection (use different approach)

Citation

If you use these models in your research, please cite:

@misc{residual_autoencoder_ensemble_2024,
  author = {ash12321},
  title = {Residual Convolutional Autoencoder Ensemble},
  year = {2024},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/ash12321/residual-autoencoder-ensemble}}
}

License

MIT License - See LICENSE file for details

Contact

For questions or issues, please open an issue on the Hugging Face model page.
