StyleID: A Perception-Aware Dataset and Metric for Stylization-Agnostic Facial Identity Recognition
Abstract
StyleID presents a human perception-aware dataset and evaluation framework for facial identity preservation under stylization, featuring two datasets derived from psychometric experiments and calibrated semantic encoders that improve correlation with human judgments.
Creative face stylization aims to render portraits in diverse visual idioms such as cartoons, sketches, and paintings while retaining recognizable identity. However, current identity encoders, which are typically trained and calibrated on natural photographs, are severely brittle under stylization: they often mistake changes in texture or color palette for identity drift, or fail to detect geometric exaggerations. These failures reveal the lack of a style-agnostic framework for evaluating and supervising identity consistency across varying styles and strengths. To address this gap, we introduce StyleID, a human perception-aware dataset and evaluation framework for facial identity under stylization. StyleID comprises two datasets: (i) StyleBench-H, a benchmark that captures human same-different verification judgments across diffusion- and flow-matching-based stylization at multiple style strengths, and (ii) StyleBench-S, a supervision set derived from psychometric recognition-strength curves obtained through controlled two-alternative forced-choice (2AFC) experiments. Leveraging StyleBench-S, we fine-tune existing semantic encoders to align their similarity orderings with human perception across styles and strengths. Experiments demonstrate that our calibrated models yield significantly higher correlation with human judgments and enhanced robustness on out-of-domain, artist-drawn portraits. All of our datasets, code, and pretrained models are publicly available at https://kwanyun.github.io/StyleID_page/.
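For readers who want to picture the evaluation idea in the abstract, the Python sketch below shows one plausible way to score an identity encoder against human judgments: compute the encoder's cosine similarities for photo/stylized pairs and rank-correlate them with the 2AFC-derived recognition strengths. The function and argument names (`encode`, `pairs`) are illustrative placeholders, not the released StyleID API.

```python
# Hypothetical sketch of a StyleBench-style evaluation: correlate an identity
# encoder's similarity scores with human recognition-strength judgments.
# `encode` and `pairs` are placeholders, not the released StyleID interface.
import numpy as np
from scipy.stats import spearmanr

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def human_alignment(pairs, encode) -> float:
    """pairs: iterable of (photo, stylized, human_strength) with strength in [0, 1]."""
    model_scores, human_scores = [], []
    for photo, stylized, strength in pairs:
        model_scores.append(cosine(encode(photo), encode(stylized)))
        human_scores.append(strength)
    rho, _ = spearmanr(model_scores, human_scores)
    return rho
```

Under this kind of protocol, a "calibrated" encoder in the paper's sense would simply be one whose rank correlation with the human recognition-strength curves is markedly higher than that of a photo-trained baseline.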
Community
Stylization-Agnostic Facial Identity Recognition
That psychometric supervision trick is neat: StyleBench-S effectively seeds alignment with human perception across styles and strengths. But I'm curious how robust that alignment is when the supervision data is imperfect or biased, e.g. if the 2AFC judgments skew toward certain artists or stylizations. Did you run ablations where you perturb StyleBench-S by subsampling or injecting label noise, to see whether the angular-margin plus supervised-contrastive losses still track human judgments? BTW, the arxivlens breakdown helped me parse the method details, especially how the adapters sit on top of a frozen CLIP backbone and how the losses interact with the manifold.
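To make the setup this comment refers to concrete, here is a minimal, hedged PyTorch sketch of an adapter over a frozen CLIP image encoder trained with an ArcFace-style angular-margin loss plus a supervised contrastive term. The class names, dimensions, margin, scale, and temperature are assumptions for illustration, not the authors' released implementation.

```python
# Illustrative sketch only: a small adapter over frozen CLIP image features,
# trained with an ArcFace-style angular-margin loss plus a supervised
# contrastive term. All hyperparameters and module names are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class IdentityAdapter(nn.Module):
    def __init__(self, clip_dim=768, embed_dim=512, num_ids=1000,
                 margin=0.3, scale=30.0):
        super().__init__()
        self.adapter = nn.Sequential(
            nn.Linear(clip_dim, embed_dim), nn.GELU(),
            nn.Linear(embed_dim, embed_dim),
        )
        self.weight = nn.Parameter(torch.randn(num_ids, embed_dim))
        self.margin, self.scale = margin, scale

    def forward(self, clip_feats):
        # clip_feats: features from a frozen CLIP backbone (no gradient flows back)
        return F.normalize(self.adapter(clip_feats), dim=-1)

    def arcface_loss(self, emb, labels):
        # Cosine logits against per-identity prototypes, with an angular margin
        # added only on the ground-truth class before rescaling.
        cos = F.linear(emb, F.normalize(self.weight, dim=-1)).clamp(-1 + 1e-7, 1 - 1e-7)
        theta = torch.acos(cos)
        target = torch.zeros_like(cos).scatter_(1, labels.unsqueeze(1), self.margin)
        return F.cross_entropy(self.scale * torch.cos(theta + target), labels)

def supcon_loss(emb, labels, temperature=0.1):
    """Supervised contrastive loss over a batch of L2-normalized embeddings."""
    sim = emb @ emb.T / temperature
    mask = (labels.unsqueeze(0) == labels.unsqueeze(1)).float()
    mask.fill_diagonal_(0)                                         # exclude self-pairs as positives
    logits = sim - torch.eye(len(emb), device=emb.device) * 1e9    # exclude self-pairs from denominator
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    pos_count = mask.sum(1).clamp(min=1)
    return -(mask * log_prob).sum(1).div(pos_count).mean()
```

The robustness ablation asked about above could then be approximated by subsampling StyleBench-S or flipping a fraction of its 2AFC labels before training an adapter like this, and re-measuring rank correlation against held-out human judgments.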
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Towards In-Context Tone Style Transfer with A Large-Scale Triplet Dataset (2026)
- PixelSmile: Toward Fine-Grained Facial Expression Editing (2026)
- A2BFR: Attribute-Aware Blind Face Restoration (2026)
- NearID: Identity Representation Learning via Near-identity Distractors (2026)
- MegaStyle: Constructing Diverse and Scalable Style Dataset via Consistent Text-to-Image Style Mapping (2026)
- Mixture of Style Experts for Diverse Image Stylization (2026)
- CleanStyle: Plug-and-Play Style Conditioning Purification for Text-to-Image Stylization (2026)