Using machine learning to improve student safety
The Cal Poly Digital Transformation Hub (DxHub), powered by Amazon Web Services (AWS), engaged with school administrators to understand and determine the best approach to improving student safety at school sites. Using Amazon’s “Working Backwards” philosophy, the team initially met with subject matter experts within the district, including a principal, the director of school safety, a local police officer, school staff, and IT administrators, to better understand the issues impacting student safety and ideate potential solutions. The team identified impending weather events, wild animals on campus, and student fights as examples of things that impact student safety. To explore how some of those issues might be addressed, the team began by reviewing what type of information was available as input. Given the schools’ existing investment in on-site cameras, a decision was made to leverage those cameras, using machine learning to quickly inform administrators about possible dangers that could be inferred from the footage.
In a crisis, information flows in from many sources and can often be diluted or relayed incorrectly, obscuring the facts. This haphazard flow of information creates chaos when responding to events, even for highly trained personnel. When every second matters, teachers and staff members need to know about threats as soon as they emerge, and how to communicate and respond to them appropriately, to maximize the likelihood that students and others on campus remain safe. Because the district already had a large number of cameras at each campus, real-time monitoring by humans was not practical. Of all the possible ideas to explore, the team chose to build a computer model to see whether it could evaluate real-time footage of public areas and detect a fight as soon as it began. Once a fight was detected, an image and the location of the fight could be relayed directly to school administrators to verify.
Innovation in Action
To train a computer vision model, you need sample data of what a fight looks like so the computer can learn the characteristics a human would recognize. That means sample scenes of students fighting and of students behaving as they normally would, so the model can be trained on the differences. The team built an automated solution that could ingest camera footage as video in AVI format and convert that video into sliced images. The conversion was triggered automatically when a video was uploaded to Amazon S3, and the conversion itself was done in Python running on Amazon Elastic Container Service (ECS). Once the videos had been converted and split into images, the images were uploaded and ready for labeling.
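The upload-triggered step can be pictured with a small Python sketch. It is not the team's code: it only shows how a handler might pull AVI uploads out of a standard S3 event notification and decide which frames to keep. The one-frame-per-second sampling rate, the `frames/` prefix, and the file naming scheme are all assumptions made for illustration, and the actual video decoding (for example with a library such as OpenCV) is left out.

```python
from urllib.parse import unquote_plus


def avi_uploads(event):
    """Return (bucket, key) pairs for each .avi object in an S3 event notification."""
    uploads = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 notifications (spaces become '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        if key.lower().endswith(".avi"):
            uploads.append((bucket, key))
    return uploads


def frame_keys(video_key, total_frames, fps, every_seconds=1.0, prefix="frames/"):
    """Hypothetical naming scheme: keep one frame every `every_seconds` of video."""
    step = max(1, round(fps * every_seconds))
    # Use the file name without directory or extension as the folder for its frames.
    stem = video_key.rsplit("/", 1)[-1].rsplit(".", 1)[0]
    return [f"{prefix}{stem}/frame_{i:06d}.jpg" for i in range(0, total_frames, step)]
```

Sampling a subset of frames rather than every frame keeps the labeling workload manageable, since consecutive frames of the same scene are nearly identical.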
Using Amazon SageMaker Ground Truth, the team labeled each image as a fight or no fight for later training. The initial approach attempted to identify the actual fight itself by drawing a box around each set of students fighting, but this dataset proved less useful. By switching from object detection to binary classification, the model could leverage more information in the image, such as students forming a circle around the fight or pulling cell phones out of their pockets to record the event.
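To make the difference between the two labeling schemes concrete, the sketch below collapses a bounding-box annotation into a single image-level label. The record layout and field names are hypothetical and are not the SageMaker Ground Truth output format, and the article does not say whether the team converted existing box labels or relabeled the images from scratch.

```python
def to_binary_label(annotation):
    """Collapse a hypothetical bounding-box annotation into an image-level label.

    The box coordinates are discarded: the classifier is told only whether the
    image contains a fight, so it is free to use cues anywhere in the frame.
    """
    boxes = annotation.get("boxes", [])
    has_fight = any(box["label"] == "fight" for box in boxes)
    return {"image": annotation["image"], "label": "fight" if has_fight else "no_fight"}


# Illustrative record only; the file name and coordinates are made up.
detection_record = {
    "image": "frame_000030.jpg",
    "boxes": [{"label": "fight", "left": 410, "top": 220, "width": 180, "height": 240}],
}
classification_record = to_binary_label(detection_record)
```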
The input dataset was curated carefully to select fight and no-fight images from the same scene so the model could clearly evaluate the subtle differences between them. Once the dataset was finalized, the team used Amazon SageMaker notebooks to train the model using the PyTorch framework and transfer learning based on the pretrained ResNet-34 image classification model. Special attention was needed to make sure all images had the same input resolution.
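One common way to bring every image to the same input resolution is to resize the shorter side to the target size and then center-crop a square. The sketch below computes only the geometry of that operation in plain Python. The 224-pixel target is the size conventionally used with ImageNet-pretrained ResNet models and is an assumption here; the article does not state which resolution or resizing method the team used.

```python
def resize_then_center_crop(width, height, target=224):
    """Return the resized dimensions and the crop box for a square input.

    The shorter side is scaled to `target`, preserving aspect ratio, and a
    `target` x `target` square is then cut from the center. The crop box is
    (left, top, right, bottom) in resized-image pixel coordinates.
    """
    if width <= 0 or height <= 0 or target <= 0:
        raise ValueError("width, height, and target must be positive")
    short_side = min(width, height)
    # max() guards against rounding a side down to just under the target.
    new_w = max(target, round(width * target / short_side))
    new_h = max(target, round(height * target / short_side))
    left = (new_w - target) // 2
    top = (new_h - target) // 2
    return (new_w, new_h), (left, top, left + target, top + target)
```

For a 1920x1080 camera frame this keeps the central portion of the scene, so a camera whose area of interest sits near the edge of the frame might call for a different strategy, such as padding instead of cropping.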
False Positive Example
After several iterations, the resulting model achieved 93% accuracy with 83% precision on our initial test dataset. After training and testing were complete, the team was asked to deploy the model at the northern Colorado School Innovation Center during the summer as a proof of concept. Amazon SNS was used to send an alert to subscribers as soon as a fight was detected. This text message included a picture of the fight for the human in the loop to verify. Notifications were received within seconds of the initial detection.
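Accuracy and precision answer different questions, which is why both are reported. Accuracy is the share of all images classified correctly, while precision is the share of fight alerts that were actual fights. A minimal sketch of both calculations follows; the label strings are assumptions, and any inputs passed to these functions in an example are illustrative rather than the project's test results.

```python
def confusion_counts(y_true, y_pred, positive="fight"):
    """Count true/false positives and negatives for a binary classifier."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1  # alert raised, fight was real
            else:
                fp += 1  # alert raised, no fight (false alarm)
        else:
            if truth == positive:
                fn += 1  # fight missed
            else:
                tn += 1  # correctly stayed quiet
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Fraction of all predictions that were correct."""
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


def precision(tp, fp):
    """Fraction of positive predictions (alerts) that were correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0
```

Because an alert reaches a human who must verify it, precision maps directly to how often staff are interrupted by a false alarm.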
In our field deployment at the Innovation Center, the results were very promising. While no actual fights between students occurred at the center during our testing, staff at the school were able to stage mock fights to test the model. In multiple examples, the system detected a fight within seconds of it starting and delivered alert texts in near real time. The model correctly identified the staff pretending to fight, but it also produced a few false positives. When the false positives were analyzed, it was clear which characteristics the model had used to predict a fight. In the anonymized image to the left, students stand in a circle around two students who are leaning in toward each other. These characteristics resemble those of a fight. The model was retrained with this type of additional data to lower the number of false positives. This solution can also be extended to include other use cases as needed.
The DxHub team sees a promising future for this solution and is working with interested parties on how this technology could be integrated into future products that promote student safety. We are looking forward to seeing how others continue to improve its features and functionality and what other use cases follow.