# 🧠 ArcFace Industrial Face Recognition

ResNet-50 backbone trained with ArcFace loss on VGGFace2 — production-grade face embeddings for real-world identity verification.

This model is part of a broader comparative study of deep face recognition loss functions (ArcFace, SphereFace, Triplet Loss). After systematic evaluation across three experimental rounds, ArcFace was selected as the production model based on its training stability, numerical reliability at scale, and consistent generalization gains as data grows.


## 📊 Model Performance

| Metric | Value |
|---|---|
| Training Accuracy | 99% |
| Validation Accuracy | 85% |
| Training Loss | 0.03 |
| Validation Loss | 4.00 |
| Training Identities | 3,000 |
| Images per Identity | ~200 |
| Epochs | 100 |

Validation loss reflects expected open-set generalization behavior — the model is trained on a closed identity set and evaluated against unseen faces. This gap narrows with more training data.


πŸ—οΈ Architecture

```
Input Image (112×112×3)
        │
        ▼
  ResNet-50 Backbone
        │
        ▼
  512-D L2-Normalized Embedding
        │
        ▼
   ArcFace Head
   (Additive Angular Margin, m=0.5, s=64)
```
| Component | Details |
|---|---|
| Backbone | ResNet-50 |
| Embedding Dimension | 512 |
| Loss Function | ArcFace (m=0.5, s=64) |
| Input Resolution | 112 × 112 |
| Embedding Normalization | L2 |
| Optimizer | SGD with cosine decay |
| Learning Rate | 0.1 |
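During training, the ArcFace head adds the angular margin m to the target-class angle only, then scales all cosines by s before softmax. A dependency-free NumPy sketch of that logit transform (training-time only; at inference the head is discarded and the raw embedding is used):

```python
import numpy as np

def arcface_logits(embeddings: np.ndarray, weights: np.ndarray,
                   labels: np.ndarray, m: float = 0.5, s: float = 64.0) -> np.ndarray:
    """Additive angular margin logit transform.

    embeddings: (N, D) L2-normalized embeddings
    weights:    (C, D) L2-normalized class centers
    labels:     (N,)   integer class ids
    """
    cos = embeddings @ weights.T                      # cosine similarities, (N, C)
    cos = np.clip(cos, -1.0 + 1e-7, 1.0 - 1e-7)       # keep arccos well-defined
    rows = np.arange(len(labels))
    theta = np.arccos(cos[rows, labels])              # angle to the target center
    cos[rows, labels] = np.cos(theta + m)             # add margin m to target angle only
    return s * cos                                    # scale by s before softmax
```

The margin always lowers the target logit (cos(θ + m) < cos(θ) on the relevant range), which is what forces tighter intra-class clustering on the hypersphere.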

## 🚀 Usage

### Load the Model

```python
import torch
from src.config import load_config
from src.models.face_model import build_face_model

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
cfg = load_config("configs/base.yaml")

model = build_face_model(cfg, num_classes=None)  # inference mode
ckpt = torch.load("model.pt", map_location=device)
state_dict = ckpt.get("model_state", ckpt)

# Keep only backbone weights; the ArcFace head is not needed at inference
backbone_state = {
    k.replace("backbone.", "", 1): v
    for k, v in state_dict.items()
    if k.startswith("backbone.")
}
model.backbone.load_state_dict(backbone_state, strict=False)
model.eval().to(device)
```

### Extract Embeddings

```python
import cv2
import numpy as np
from src.data.preprocessing import PreprocessingPipeline

pipeline = PreprocessingPipeline(
    preproc_cfg=cfg.preprocessing,
    image_size=cfg.data.image_size,
    apply_detection=False,
)

@torch.no_grad()
def get_embedding(img_bgr: np.ndarray) -> np.ndarray:
    img_rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)
    chw = pipeline(img_rgb)
    tensor = torch.from_numpy(chw).unsqueeze(0).to(device)
    emb = model(tensor)
    return emb.squeeze().cpu().numpy().astype(np.float32)

img = cv2.imread("face.jpg")
embedding = get_embedding(img)  # shape: (512,), L2-normalized
```

### Compute Similarity

```python
import numpy as np
from numpy.linalg import norm

def cosine_similarity(emb1: np.ndarray, emb2: np.ndarray) -> float:
    return float(np.dot(emb1, emb2) / (norm(emb1) * norm(emb2)))

sim = cosine_similarity(embedding_a, embedding_b)
# sim ∈ [-1, 1] — higher = more similar
# Typical threshold for same-person: ≥ 0.5
```

## 🗄️ Production Pipeline

This model powers a complete two-stage attendance system:

### Stage 1 — Database Population (`build_database.py`)

Registers known identities by computing gallery embeddings and storing them in ChromaDB (vector similarity search) linked to identity metadata in MongoDB. Runs a built-in Top-K evaluation on held-out probe images after registration.
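ChromaDB and MongoDB specifics aside, registration reduces to storing one L2-normalized gallery embedding per identity and answering nearest-identity queries by cosine similarity. A dependency-free, in-memory sketch of that index (class and method names are illustrative, not the repo's actual API):

```python
import numpy as np
from typing import List, Tuple

class GalleryIndex:
    """In-memory stand-in for the ChromaDB gallery: identity -> normalized embedding."""

    def __init__(self, dim: int = 512):
        self.ids: List[str] = []
        self.matrix = np.empty((0, dim), dtype=np.float32)

    def register(self, identity: str, embedding: np.ndarray) -> None:
        emb = (embedding / np.linalg.norm(embedding)).astype(np.float32)  # store L2-normalized
        self.ids.append(identity)
        self.matrix = np.vstack([self.matrix, emb[None, :]])

    def query(self, embedding: np.ndarray, top_k: int = 1) -> List[Tuple[str, float]]:
        emb = embedding / np.linalg.norm(embedding)
        sims = self.matrix @ emb                       # cosine similarity vs. all identities
        order = np.argsort(sims)[::-1][:top_k]         # best matches first
        return [(self.ids[i], float(sims[i])) for i in order]
```

Because both sides are normalized, the dot product is exactly the cosine similarity used at verification time.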

### Stage 2 — Real-Time Inference (`realtime_attendance.py`)

Reads a live webcam feed, detects faces with MTCNN, embeds each crop through this model, and queries ChromaDB for the nearest registered identity. Recognized faces are labeled with name and similarity score; unknown faces are flagged. A cooldown timer prevents duplicate attendance logs.
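The duplicate-log cooldown mentioned above can be as simple as a per-identity timestamp check; a minimal sketch, assuming a 30-second window (the actual window in the repo may differ):

```python
import time
from typing import Dict, Optional

class AttendanceLogger:
    """Logs each identity at most once per cooldown window."""

    def __init__(self, cooldown_s: float = 30.0):
        self.cooldown_s = cooldown_s
        self._last_seen: Dict[str, float] = {}

    def try_log(self, identity: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        last = self._last_seen.get(identity)
        if last is not None and now - last < self.cooldown_s:
            return False                      # still cooling down; skip duplicate log
        self._last_seen[identity] = now       # record attendance timestamp
        return True
```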


## 📈 Gallery Evaluation Results

Evaluated on 50 registered identities (250 held-out probe images, never seen during training or registration):

| Metric | Result |
|---|---|
| Top-1 Accuracy | 92.00% (230 / 250) |
| Top-3 Accuracy | 96.00% (240 / 250) |
| Top-5 Accuracy | 96.80% (242 / 250) |
| Failed reads | 0 / 250 |

Probe images are drawn from the same VGGFace2 distribution as the gallery but are a completely separate split — never used during model training or gallery registration.
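The Top-K numbers above follow the standard definition: a probe counts as correct if its true identity appears among the k most similar gallery entries. A NumPy sketch of that metric:

```python
import numpy as np

def top_k_accuracy(sims: np.ndarray, true_ids: np.ndarray,
                   gallery_ids: np.ndarray, k: int) -> float:
    """sims:        (P, G) probe-vs-gallery cosine similarities
    true_ids:    (P,)   ground-truth identity per probe
    gallery_ids: (G,)   identity of each gallery entry"""
    top = np.argsort(sims, axis=1)[:, ::-1][:, :k]            # best-k gallery indices per probe
    hits = (gallery_ids[top] == true_ids[:, None]).any(axis=1)  # true id among the k?
    return float(hits.mean())
```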


## 🔬 Why ArcFace for Production

ArcFace was selected over SphereFace (the Round 3 LFW leader) based on engineering considerations critical for deployment:

- Additive angular margin has a direct geometric interpretation on the hypersphere — the decision boundary is fixed and predictable, making threshold calibration reliable across unseen identities.
- Numerical stability at scale — SphereFace's multiplicative margin becomes sensitive as class count increases; ArcFace's formulation remains stable regardless.
- Consistent data scaling — validation accuracy improved monotonically from 83% (1,000 identities) to 85% (3,000 identities), confirming predictable generalization gains as the training set grows.
- Industry standard — ArcFace is the de facto choice in production face recognition systems, with extensive tooling for quantization, ONNX export, and edge deployment.

## 📦 Training Data

| Dataset | Identities | Images | Resolution |
|---|---|---|---|
| VGGFace2 (subset) | 3,000 | ~600,000 | 112 × 112 |

Full dataset: VGGFace2 112×112 on Kaggle


## 🔗 Related Resources

| Resource | Link |
|---|---|
| 📓 Training & Evaluation Notebook | Kaggle — ArcFace Training |
| 📄 ArcFace Paper | arXiv:1801.07698 |
| 🤗 Triplet Loss Model | AbdoSaad24/TripletLossModels |
| 🤗 SphereFace Model | AbdoSaad24/BestSphereFaceModel |
| 🤗 ArcFace (Research, R3) | AbdoSaad24/BestArcFaceModel |

## 📋 Citation

```bibtex
@inproceedings{deng2019arcface,
  title     = {ArcFace: Additive Angular Margin Loss for Deep Face Recognition},
  author    = {Deng, Jiankang and Guo, Jia and Xue, Niannan and Zafeiriou, Stefanos},
  booktitle = {CVPR},
  year      = {2019}
}
```

## ⚠️ Limitations & Responsible Use

- This model was trained on a subset of VGGFace2. Performance may degrade on faces from demographics underrepresented in the training data.
- The model is intended for attendance and access control systems where subjects have consented to enrollment.
- Do not use for surveillance, tracking, or identification of individuals without explicit consent.
- Threshold selection (default: 0.5 cosine similarity) should be calibrated to your deployment environment — lower thresholds increase false acceptances, higher thresholds increase false rejections.
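One simple calibration recipe is to collect genuine-pair and impostor-pair similarity scores from your own enrollment data and pick the threshold at the equal-error-rate point; a sketch (the score arrays in the test are illustrative, not measured values):

```python
import numpy as np

def calibrate_threshold(genuine: np.ndarray, impostor: np.ndarray) -> float:
    """Pick the threshold where false-accept and false-reject rates are closest (EER).

    genuine:  cosine similarities of same-person pairs
    impostor: cosine similarities of different-person pairs
    """
    candidates = np.linspace(-1.0, 1.0, 2001)                      # 0.001-step sweep
    far = np.array([(impostor >= t).mean() for t in candidates])   # false-accept rate
    frr = np.array([(genuine < t).mean() for t in candidates])     # false-reject rate
    return float(candidates[np.argmin(np.abs(far - frr))])
```

In deployments where a false accept is costlier than a false reject (e.g. access control), shift the operating point above the EER threshold instead.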

Part of the Face Recognition Comparative Study — ArcFace · SphereFace · Triplet Loss.
