by Katinka Gereb

We used an AI exam proctoring assistant as a use case for AWS Rekognition

The education sector is one of the industries that has been impacted the most during the pandemic, with lectures and many exams moving online. This has resulted in increased workloads for teachers and lecturers, not only in the classroom but also when it comes to exam proctoring. We’ve been thinking about this problem a lot at Eliiza, wondering how we could use technology to make things a little easier for people, not more complicated. We started looking at AWS Rekognition and how we could adapt this technology for the good of the education industry.

Firstly, what is exam proctoring?

Exam proctoring traditionally involves a supervisor monitoring student activity during an online exam. The goal of exam proctoring is to detect suspicious activity that might indicate cheating, and it covers a range of tasks from identity verification to incident reporting, and even intervention. 

During an exam, however, a supervisor’s role is broader than just monitoring for suspicious incidents. They also support students with enquiries about accessing the exam questions and submitting their results, and their presence is crucial should any health crisis arise.

The limitations of exam proctoring

Online exam proctoring is a labour-intensive task where supervisors simultaneously have to support students with their enquiries, pay attention to every participant in the call, and take notes if suspicious activity occurs. They have to keep this up for hours on end, with hundreds of exams taking place every semester at any given university.

An additional problem arises from the privacy concerns around exam proctoring software. In many cases the software has to be installed on a student’s computer, opening the door to privacy invasion and raising the question of whether the software can access the contents of a student’s device. If the software is too sensitive to movement, noise, or the appearance of certain objects, it can generate too many false positives and alert far too often. There are known cases where such software cut students off in the middle of an exam, leaving them unable to continue.

An AI exam proctoring assistant

In this blog we propose a solution in which automated computer vision and a human supervisor work together to perform the role of an exam proctor.

Our solution requires no software installation, and it is designed to assist supervisors in the proctoring process, substantially easing their workload rather than taking the human out of the loop completely.

Our solution addresses two aspects of the proctoring process: student identity verification and suspicious activity detection (e.g. a student being out of view for more than five minutes).

Identity Verification using AWS Rekognition

The goal of our solution is to support the natural flow of an online exam and assist supervisors in their exam proctoring efforts. In the Zoom exam setting, the supervisor takes a front-facing screenshot of the students in order to carry out identity verification. The screenshot of the student sitting the exam is compared to their ID image from a pre-existing database. Face comparison and identity verification is not an easy task: ID images were sometimes provided years ago and students may look different now, and variations in make-up and lighting conditions can also influence how a face is perceived.

We perform this task by using AWS Rekognition’s CompareFaces API to compare the faces in the exam screenshot to the ID images. AWS Rekognition’s underlying models are pre-trained and readily available, meaning that the user does not need to build and train a machine learning model. The API response returns information about face matches (and unmatched faces), the bounding box of each face, facial landmarks, pose details (pitch, roll, and yaw), image quality (brightness and sharpness), and, most importantly, the similarity level and face detection confidence. One can set a similarity threshold above which a face comparison is accepted. New arrivals, or students who leave the room and come back, can be re-identified in the same manner, just by providing a fresh screenshot.
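A rough sketch of what this step can look like with boto3 is below. The bucket name and object keys are placeholders, and the 90% similarity threshold is an illustrative choice rather than a recommendation; note that Rekognition already filters FaceMatches by the threshold passed to the API, so the local helper is only useful if you request a low API threshold and tune the acceptance level afterwards.

```python
def compare_student_to_id(id_key, screenshot_key, bucket="exam-assets",
                          threshold=80.0):
    """Call Rekognition CompareFaces: the ID photo is the source face and the
    exam screenshot (which may contain many faces) is the target image."""
    import boto3  # imported lazily so the pure helper below has no dependencies

    client = boto3.client("rekognition")
    return client.compare_faces(
        SourceImage={"S3Object": {"Bucket": bucket, "Name": id_key}},
        TargetImage={"S3Object": {"Bucket": bucket, "Name": screenshot_key}},
        SimilarityThreshold=threshold,
    )


def accepted_matches(response, threshold=90.0):
    """Keep only the face matches whose similarity clears our own threshold,
    which may be stricter than the one passed to the API call."""
    return [m for m in response.get("FaceMatches", [])
            if m["Similarity"] >= threshold]
```

In practice the right threshold is a trade-off: too high and old ID photos fail to match, too low and different people are accepted.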

To test the output, we identified a few cases where the performance of the API declines. The most obvious is poor image quality: if the image is too dark or too blurry, it is hard to see the face even for a human, and an AI algorithm is no exception. Detection confidence also drops when a face is partly covered (a person resting their chin on their hand, for example) or turned away from the camera. However, the overall performance of the AWS Rekognition API is very promising, with a 96% face-matching success rate when tested on roughly 25 screenshots containing 370 faces.

Suspicious Activity Detection

After the identification step the exam proceeds, and the supervisor turns to suspicious activity detection. Our solution focuses on one type of suspicious activity, defined as a student being away from the camera for more than five minutes. Short absences can be legitimate (printing the exam paper, bathroom breaks), but any absence longer than five minutes has to be reported.

Designing a product that can measure student presence is made challenging by several factors. Students drop in and out of the call, so their location in the Zoom gallery view changes; any solution therefore has to be able to re-identify students at any given time. Moreover, students are not always facing the camera during the exam, depending on the camera angle and head movement (e.g. looking down at the exam papers), so facial recognition cannot be used for identification during the exam itself.

Thankfully, AWS Rekognition has various APIs that can be used to detect and identify the students. When faces are not visible, each student’s name is still displayed on screen, and we can extract this text with text detection and use it for identification. To check whether a student is in view, label detection can be applied: the AWS Rekognition API returns a response indicating whether a “person” is detected in the image. This detection covers the full body (even if only partially visible) rather than just the face. Location-based matching of names and persons then tells us whether or not a student is away from the camera. If a student has dropped out of the video completely, neither their name nor their person will be detected.
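One way to sketch the location-based matching is below. The overlap rule is our illustrative assumption: Rekognition returns bounding boxes normalised to the image size, and in Zoom’s gallery view a participant’s name label sits inside their tile, so a name box and a “Person” instance box from the same tile should overlap horizontally.

```python
def box_center_x(box):
    """Horizontal center of a Rekognition-style bounding box
    ({'Left', 'Top', 'Width', 'Height'}, all normalised to [0, 1])."""
    return box["Left"] + box["Width"] / 2.0


def students_in_view(name_boxes, person_boxes):
    """name_boxes: {student_name: bounding box of their Zoom name label},
    e.g. from DetectText. person_boxes: bounding boxes of 'Person'
    instances from DetectLabels. A student counts as in view if some
    detected person's horizontal span contains the center of their name
    label (a simple same-tile heuristic)."""
    present = set()
    for name, nbox in name_boxes.items():
        cx = box_center_x(nbox)
        for pbox in person_boxes:
            if pbox["Left"] <= cx <= pbox["Left"] + pbox["Width"]:
                present.add(name)
                break
    return present
```

A production version would also check vertical proximity so that tiles stacked above each other are not confused; the horizontal-only rule here is just the simplest illustration of the idea.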

False positives/negatives will inevitably appear from time to time. Since we are looking at a five-minute window for suspicious activity detection, the solution is robust against a few misdetections, as we only report cases where students are consistently away for at least five minutes. When an incident is detected, a screenshot of the Zoom image is sent to an S3 bucket so the evidence is preserved for future reference.

At the end, a report is generated showing the number of incidents for each student, with notes on when and for how long students were away. This type of incident accounts for 50% of all incidents generally reported by supervisors. Without an automated assistant, a supervisor can only support a small number of students before the workload becomes unsustainable. If 50% of the supervisor’s time is freed up, however, more students can be hosted in one exam, meaning that fewer exam sessions are needed, reducing the cost and time invested during the exam period.

The underlying AWS architecture

The ultimate goal is to design a solution that can be used in real time. The architecture behind the solution is therefore serverless, eliminating the need for expensive and hard-to-maintain servers. This makes the solution not only more efficient but also cheaper.

In the final solution we decided to perform our analysis on image frames taken 10 seconds apart. This is a sufficiently detailed time range for our needs (the goal being suspicious activity detection over 5 mins), and significantly reduces the processing time and cost. We used infrastructure as code so the deployment of our processing pipeline is repeatable. The solution architecture diagram is shown in Fig. 1.
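The sampling choice is easy to sanity-check with a little arithmetic; the helper below is purely illustrative.

```python
def sample_timestamps(duration_s, interval_s=10):
    """Timestamps (seconds from exam start) at which a frame is grabbed."""
    return list(range(0, int(duration_s), interval_s))


# A 3.5-hour exam sampled every 10 seconds yields 1260 frames to process.
n_frames = len(sample_timestamps(3.5 * 3600))
```

At roughly 4 seconds of pipeline time per frame, this is what keeps the processing cost of a multi-hour exam manageable.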

Fig. 1 – AWS architecture diagram, consisting of 3 Lambda functions, 2 SQS queues, AWS Rekognition, S3 buckets, 2 DynamoDB tables, and SNS.

On average, there were about 1260 frames to be processed for a 3.5-hour exam. It took approximately 4 seconds for the pipeline to process an image, and more than 75% of the processing time was spent on AWS Rekognition’s person and text detection.

Exam proctoring pipeline evaluation

After the pipeline finished processing all the exam videos, we manually compared the incident reports generated by the pipeline with those given by supervisors. 

Overall, the processing pipeline was able to correctly detect 22 out of 27 incidents. 

The 5 missed detections were mainly caused by false positives (the pipeline detecting a student while they were not in front of the camera), and could easily be verified by the supervisor in a real-time solution with a built-in feedback loop.

Surprisingly, there were also 2 new incidents detected by the pipeline that the supervisors had missed. It is also interesting to note that the pipeline produces an objective report, whereas current reporting procedures and the way incidents are categorised are often affected by a supervisor’s subjective view.

Looking at the scalability of AWS Rekognition

The performance of the AWS Rekognition APIs is naturally expected to decrease if the faces in the image are too small. To understand the scalability of our solution and push its boundaries, we tested it on larger Zoom settings. The upper limit of Zoom’s gallery view is 49 participants, and we recruited 25 and 42 participants from Mantel Group to take part in two simulated Zoom sessions and help us test the performance of AWS Rekognition.

We found that AWS Rekognition performed well in both scenarios, with the face comparison results for 42 participants presented below in Fig. 2. We also successfully performed text and person detection on this larger group setting (not shown for privacy considerations).

Based on these results we conclude that the AWS Rekognition technology is not a limiting factor for the scalability of the solution, and the exam proctoring solution will be able to handle a larger Zoom setting.

Fig. 2 – Face comparison results for 42 people

In summary – creating an exam proctoring tool that actually makes things better for everyone

In this blog we presented an exam proctoring assistant that requires no software installation, avoiding the privacy invasion that comes with it, and that works as a support tool rather than a replacement for the human in the loop.

The solution can assist a supervisor with identity verification and with the heavy load of accurately detecting when a student is present or away, so the supervisor can focus on the important parts: helping students with their enquiries and removing any blockers that occur during the exam.

Designing a solution that works with images rather than video, and uses face comparison on demand for student identification, also makes the solution truly affordable.