3. Method / Model / Algorithm
3.1 Overview
3.1.1 Objective
The objective of this study is to design and implement a real-time facial emotion
recognition system for estimating students’ engagement levels during classroom learning
sessions. By analyzing students’ facial expressions from live or recorded video streams, the
system aims to provide quantitative indicators of classroom engagement to support
teaching evaluation and learning behavior analysis.
3.1.2 Model Selection
The system is built on the DeepFace framework, an open-source facial analysis
library that integrates multiple deep learning models for face recognition and for
age, gender, and emotion analysis.
For emotion recognition, we use DeepFace’s built-in Emotion CNN model, which has been
pre-trained on large-scale facial expression datasets such as:
ExpW (Expression in-the-Wild)
FER2013
These datasets contain facial images captured in real-world environments, with large
variations in lighting, pose, and background, making the model suitable for classroom
environments.
The system recognizes seven basic emotional categories:
happy
sad
angry
fear
surprise
disgust
neutral
3.1.3 Experimental Setup
The proposed system was developed and tested on a workstation with the following
specifications:
Hardware: Intel Core i7 Processor, 16GB RAM, Integrated Webcam (720p
resolution).
Software Environment: Python 3.10 on Windows 10.
Key Libraries: DeepFace (for emotion extraction), OpenCV (for image processing),
and Pandas/Matplotlib (for data analysis and visualization).
3.2 System Architecture and Pipeline
The overall processing pipeline is described as follows:
Video Input
↓
Face Detection & Alignment
↓
Face Preprocessing
↓
Emotion Classification (DeepFace CNN)
↓
Temporal Smoothing
↓
Engagement Score Computation
↓
Visualization & Data Logging
3.3 Detailed Processing Steps
Step 1: Video Acquisition
The system acquires visual data from either:
A live video stream via a webcam.
A pre-recorded classroom video file (MP4, AVI, etc.).
Using OpenCV:
import cv2
cap = cv2.VideoCapture(0) # 0 = default webcam
ret, frame = cap.read()
Each frame is processed individually in real time. The default frame rate is approximately
20–30 FPS, depending on hardware performance.
Step 2: Face Detection and Alignment
The system performs face detection using MTCNN or RetinaFace, integrated within
DeepFace.
DeepFace automatically performs:
Face detection
Face alignment (eye & face orientation correction)
Face cropping
Face resizing to model input size
Example code:
from deepface import DeepFace

results = DeepFace.analyze(
    frame,
    actions=['emotion'],
    detector_backend='mtcnn',
    enforce_detection=False
)
Key configurations:
detector_backend = 'mtcnn' ensures robust detection.
enforce_detection=False prevents the program from crashing when no faces are
found.
If no face is detected, a fallback algorithm based on OpenCV Haar Cascades can be applied
to attempt secondary detection.
Step 3: Face Preprocessing
Each detected face undergoes the following preprocessing steps:
1. Alignment using facial landmarks.
2. Resizing to required input shape for the CNN (typically 48×48 or 224×224).
3. Correction of illumination where needed (optional grayscale conversion).
4. Normalization of pixel values to range [0, 1].
If lighting is poor:
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
These operations help improve robustness under real classroom conditions.
Step 4: Emotion Recognition
After preprocessing, each face image is passed into the DeepFace CNN model.
The output format for each face includes:
Emotion probabilities for all 7 classes
The dominant emotion
Example output structure:
{
    "emotion": {
        "angry": 0.02,
        "disgust": 0.00,
        "fear": 0.03,
        "happy": 0.80,
        "sad": 0.04,
        "surprise": 0.07,
        "neutral": 0.04
    },
    "dominant_emotion": "happy"
}
The dominant emotion is selected as:
emotion = results[0]['dominant_emotion']
Step 5: Temporal Smoothing
Facial expressions change rapidly and might fluctuate due to blinking or slight head
movements. To reduce noise, the system applies temporal smoothing using a sliding
window.
For each detected student:
Collect emotion predictions over the last N frames (e.g., N = 10).
Compute the dominant emotion as the most frequent label in the window.
Alternatively, average emotion probabilities.
This approach produces more stable emotion predictions.
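The majority-vote variant of this sliding window can be sketched with the standard library alone; the class name and window size are illustrative choices, not part of the original system:

```python
from collections import Counter, deque

class EmotionSmoother:
    """Sliding-window majority vote over the last N dominant-emotion labels."""

    def __init__(self, window_size=10):
        # deque with maxlen automatically discards the oldest label
        self.window = deque(maxlen=window_size)

    def update(self, label):
        """Add the newest prediction and return the most frequent label in the window."""
        self.window.append(label)
        return Counter(self.window).most_common(1)[0][0]
```

One smoother instance is kept per tracked student, so a single blink-induced misclassification cannot flip the reported emotion.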
Step 6: Engagement Score Computation
To quantify classroom engagement, each emotion is mapped to an engagement weight
informed by educational psychology research on affect and learning.
Emotion    Engagement Weight    Interpretation
Happy      1.0                  Highly engaged
Surprise   0.9                  Curious and active
Neutral    0.6                  Attentive but passive
Sad        0.4                  Low interest
Fear       0.3                  Nervous or uncomfortable
Angry      0.3                  Negative engagement
Disgust    0.2                  Strong disengagement
The engagement score is defined as:
Engagement = (1/n) * Σ_{i=1}^{n} Score(emotion_i)
Where:
n = number of detected faces in a frame
emotion_i = dominant emotion of face i
Example:
If three students show happy, neutral, and sad:

Engagement = (1.0 + 0.6 + 0.4) / 3 ≈ 0.67
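The per-frame score can be computed directly from the weight table; the dictionary and function names below are illustrative:

```python
# Engagement weights from the mapping table in Step 6
ENGAGEMENT_WEIGHTS = {
    'happy': 1.0, 'surprise': 0.9, 'neutral': 0.6,
    'sad': 0.4, 'fear': 0.3, 'angry': 0.3, 'disgust': 0.2,
}

def engagement_score(emotions):
    """Mean engagement weight over the dominant emotions detected in one frame.

    Returns None when no faces were detected, so the caller can skip logging.
    """
    if not emotions:
        return None
    return sum(ENGAGEMENT_WEIGHTS[e] for e in emotions) / len(emotions)
```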
Step 7: Visualization
The system visualizes the results both in real time and offline.
1. Emotion Distribution Pie Chart
import matplotlib.pyplot as plt
emotions = ['happy', 'neutral', 'sad']
values = [0.8, 0.15, 0.05]
plt.pie(values, labels=emotions, autopct='%1.1f%%')
plt.title("Emotion Distribution")
plt.show()
2. Engagement Score Over Time
import pandas as pd
import matplotlib.pyplot as plt
data = {'time': [1, 2, 3, 4, 5],
        'engagement': [0.6, 0.7, 0.8, 0.75, 0.65]}
df = pd.DataFrame(data)
plt.plot(df['time'], df['engagement'])
plt.xlabel("Time (minutes)")
plt.ylabel("Engagement Score")
plt.title("Engagement Trend")
plt.show()
3.4 Fallback & Error Handling
If no face is detected → log "No face detected".
If an exception occurs during analysis:
    - Skip the frame.
    - Continue processing the next frame.
If lighting is poor → convert the frame to grayscale.
If frame processing is too slow:
    - Process every 2nd or 3rd frame instead of every frame.
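The lighting and frame-skipping rules can be sketched as two small helpers. Both the brightness threshold and the stride value are illustrative assumptions, not parameters measured from the deployed system:

```python
import numpy as np

LOW_LIGHT_THRESHOLD = 60   # mean intensity below this triggers grayscale handling (assumed value)
FRAME_STRIDE = 3           # analyze every 3rd frame when processing lags (assumed value)

def is_poorly_lit(frame):
    """Heuristic low-light check based on mean pixel intensity."""
    return float(frame.mean()) < LOW_LIGHT_THRESHOLD

def should_analyze(frame_index):
    """Keep up with real time by analyzing only every FRAME_STRIDE-th frame."""
    return frame_index % FRAME_STRIDE == 0
```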
3.5 Example Results
Time (s)   Detected Emotions   Engagement
0–10       happy, neutral      0.80
10–20      neutral, sad        0.55
20–30      happy               0.95
30–40      neutral             0.60
40–50      sad                 0.40
50–60      happy               1.00
Average engagement ≈ 0.72, indicating a generally attentive classroom.
3.6 Pseudocode
import cv2
from deepface import DeepFace

cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    if not ret:
        break
    try:
        results = DeepFace.analyze(
            frame,
            actions=['emotion'],
            detector_backend='mtcnn',
            enforce_detection=False
        )
        for res in results:
            emotion = res['dominant_emotion']
            print("Detected emotion:", emotion)
    except Exception as e:
        print("No face detected or error:", e)
    cv2.imshow("Emotion Recognition", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
3.7 Summary
The system integrates real-time video capture with CNN-based facial emotion
analysis.
DeepFace enables robust emotion recognition without training from scratch.
Engagement scores provide an interpretable metric for classroom behavior.
The pipeline is scalable and can be extended with additional modalities such as
audio or eye-gaze tracking.