Sentin-Edge AI Models

Sentin-Edge AI is a suite of privacy-preserving, edge-optimized neural networks designed to detect human stress, emotional states, and gaze direction in real time using only a standard webcam.

This repository contains the INT8-quantized TensorFlow Lite models. They are optimized for deployment on CPU/NPU edge hardware (such as Android devices or a Raspberry Pi) via the XNNPACK delegate and require less than 2 MB of total storage.


Model Inventory

This repository hosts three interconnected models:

1. Affective CNN (affective_cnn_int8.tflite)

  • Task: 7-Class Emotion Classification & Feature Extraction
  • Architecture: 4-Layer Convolutional Neural Network (32→64→128→128)
  • Input: 1x48x48x1 (Grayscale Face Crop)
  • Output: 1x7 (Emotion Probabilities) + 1x256 (Deep Feature Embedding)
  • Size: 1.4 MB
  • Quantization: INT8 Full Integer

2. Stress TCN (affective_tcn_int8.tflite)

  • Task: Temporal Stress Level Estimation
  • Architecture: Multi-Modal Temporal Convolutional Network (TCN)
  • Input: 1x15x268 Sequence (256-dim CNN Embedding + 12-dim temporal geometry metrics)
  • Output: 1x1 (Stress Score [0.0 - 1.0])
  • Size: 137 KB
  • Quantization: INT8 Full Integer

3. Gaze Hybrid (gaze_hybrid_int8.tflite)

  • Task: Screen Coordinate Gaze Regression
  • Architecture: MLP Feature Extractor + TCN + Head Pose Fusion
  • Input: 1x15x10 (Eye/Iris Geometry Sequence) + 1x3 (Head Pose Pitch/Yaw/Roll), passed as two separate tensors (see the sketch after this list)
  • Output: 1x2 (Normalized Screen X, Y)
  • Size: 57 KB
  • Quantization: INT8 Full Integer
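
Unlike the single-input Affective CNN, the Gaze Hybrid takes two tensors per invocation. The sketch below shows one way to drive it with the TFLite Interpreter; the shape-based input matching and the placeholder repo id are assumptions, since tensor ordering and naming are not documented in this card.

from huggingface_hub import hf_hub_download
import tensorflow as tf
import numpy as np

model_path = hf_hub_download(
    repo_id="YourUsername/sentin-edge-ai",
    filename="gaze_hybrid_int8.tflite"
)
interpreter = tf.lite.Interpreter(model_path=model_path)
interpreter.allocate_tensors()

# Match the two inputs by shape rather than by list position, since
# get_input_details() ordering is not guaranteed to be stable.
inputs = {tuple(int(s) for s in d['shape']): d
          for d in interpreter.get_input_details()}
seq_detail = inputs[(1, 15, 10)]   # eye/iris geometry sequence
pose_detail = inputs[(1, 3)]       # head pose pitch/yaw/roll

def quantize(x, detail):
    # Convert a float array to the tensor's INT8 representation, if quantized.
    scale, zero_point = detail['quantization']
    if scale > 0:
        return np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)
    return x

geometry_seq = np.zeros((1, 15, 10), dtype=np.float32)  # placeholder input
head_pose = np.zeros((1, 3), dtype=np.float32)          # placeholder input

interpreter.set_tensor(seq_detail['index'], quantize(geometry_seq, seq_detail))
interpreter.set_tensor(pose_detail['index'], quantize(head_pose, pose_detail))
interpreter.invoke()

# Dequantize the 1x2 output back to normalized screen coordinates.
out_detail = interpreter.get_output_details()[0]
raw = interpreter.get_tensor(out_detail['index'])
scale, zero_point = out_detail['quantization']
gaze_xy = (raw.astype(np.float32) - zero_point) * scale if scale > 0 else raw
print("Normalized screen (x, y):", gaze_xy)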

System Architecture & Pipeline

Sentin-Edge AI uses a Dual-Head Multi-Task Learning approach: the Affective CNN emits both emotion probabilities and a reusable feature embedding from a shared backbone. All processing runs entirely on-device without any network calls.

High-Level Architecture

[Diagram: high-level architecture overview (image not reproduced in this card)]

Frame Processing Sequence

[Diagram: per-frame processing sequence (image not reproduced in this card)]

Benchmarks & Quantization

To achieve real-time performance on edge devices, these models undergo strict INT8 quantization.

Quantization Strategy
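
The conversion script is not published in this repository. The sketch below shows the standard TFLite full-integer post-training quantization recipe that matches the properties listed in the inventory (INT8 weights, activations, and I/O). The SavedModel path and the representative dataset are placeholders; the original FP32 networks would first need to be exported from PyTorch (e.g. via ONNX) to a TensorFlow SavedModel.

import numpy as np
import tensorflow as tf

def representative_dataset():
    # A few hundred real face crops should be used here so the converter
    # can calibrate activation ranges; random data is only a placeholder.
    for _ in range(100):
        yield [np.random.uniform(-1, 1, (1, 48, 48, 1)).astype(np.float32)]

# "saved_model_dir" is a placeholder path for the exported FP32 model.
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Force full-integer kernels and INT8 I/O, matching the hosted models.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

with open("affective_cnn_int8.tflite", "wb") as f:
    f.write(converter.convert())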

Performance Metrics

The table below illustrates the trade-off between model footprint and accuracy preservation relative to the original FP32 PyTorch models.

| Model         | Original FP32 Size | Quantized INT8 Size | Storage Reduction | Accuracy Variance |
|---------------|--------------------|---------------------|-------------------|-------------------|
| Affective CNN | ~5.6 MB            | 1.4 MB              | ~75%              | < 5% drop         |
| Stress TCN    | ~540 KB            | 137 KB              | ~75%              | < 3% drop         |
| Gaze Hybrid   | ~220 KB            | 57 KB               | ~74%              | < 4% drop         |

Intended Use

These models are intended for privacy-first edge computing research. Because they process frames entirely in memory and return lightweight heuristic scores, they are well suited to:

  • HCI (Human-Computer Interaction) accessibility research
  • On-device cognitive load and stress monitoring
  • Privacy-preserving biometric telemetry

Out-of-Scope Use Cases:

  • Medical diagnosis
  • Automated proctoring or disciplinary surveillance without consent
  • High-stakes biometric security authentication

Python Usage Example

You can download and run these models from the Hugging Face Hub using the TensorFlow Lite Interpreter.

from huggingface_hub import hf_hub_download
import tensorflow as tf
import numpy as np

# 1. Download the Edge Model
model_path = hf_hub_download(
    repo_id="YourUsername/sentin-edge-ai", 
    filename="affective_cnn_int8.tflite"
)

# 2. Load the TFLite Interpreter
interpreter = tf.lite.Interpreter(model_path=model_path)
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# 3. Prepare Input (48x48 Grayscale normalized to [-1, 1])
dummy_face = np.zeros((1, 48, 48, 1), dtype=np.float32)

# Manually quantize input if model expects INT8
scale, zero_point = input_details[0]['quantization']
if scale > 0:
    dummy_face = np.clip(np.round(dummy_face / scale) + zero_point, -128, 127).astype(np.int8)

interpreter.set_tensor(input_details[0]['index'], dummy_face)
interpreter.invoke()

# 4. Extract Output (dequantize if the model emits INT8)
raw_output = interpreter.get_tensor(output_details[0]['index'])
out_scale, out_zero_point = output_details[0]['quantization']
if out_scale > 0:
    emotion_probs = (raw_output.astype(np.float32) - out_zero_point) * out_scale
else:
    emotion_probs = raw_output
print("Emotion Probabilities:", emotion_probs)

Training Data & Limitations

Affective CNN was trained on the FER-2013 dataset. Stress TCN was trained on synthetic sequences correlating facial micro-tremors and AU4/AU12 activity with emotional arousal. Gaze Hybrid was pre-trained on MPIIGaze.

Limitations:

  • Due to INT8 quantization, the models exhibit up to 5% accuracy variance compared to their FP32 PyTorch equivalents.
  • Gaze tracking requires per-user calibration (a 5-point screen registration) for sub-degree accuracy.
  • Accuracy degrades significantly in extreme low-light environments where MediaPipe cannot resolve the iris contours.
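
The exact calibration procedure is not documented here, but a common way to implement a 5-point screen registration is a least-squares affine fit from raw model outputs to the known on-screen target positions. The sketch below illustrates that approach under that assumption.

import numpy as np

def fit_calibration(raw_xy, target_xy):
    """Least-squares affine map from raw gaze outputs to screen
    coordinates, fitted on the 5 registration points.
    raw_xy, target_xy: (5, 2) arrays of normalized coordinates."""
    A = np.hstack([raw_xy, np.ones((len(raw_xy), 1))])  # add bias column
    M, *_ = np.linalg.lstsq(A, target_xy, rcond=None)   # (3, 2) affine matrix
    return M

def apply_calibration(M, raw_xy):
    return np.hstack([raw_xy, np.ones((len(raw_xy), 1))]) @ M

# Example: 5 on-screen targets (corners + center) and the raw predictions
# the model produced while the user fixated each one (placeholder values).
targets = np.array([[0.1, 0.1], [0.9, 0.1], [0.5, 0.5], [0.1, 0.9], [0.9, 0.9]])
raw = targets + np.random.normal(0, 0.03, targets.shape)
M = fit_calibration(raw, targets)
print(apply_calibration(M, raw))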
