Graffiti image recognition

Problem

Urban graffiti is a significant problem in the United States and is no less prevalent in Orange County, California. Graffiti is a form of vandalism that can significantly reduce property values and is a key indicator of crime and gang activity in the area. Orange County spends roughly $1 million annually to repair the damage caused by graffiti vandalism. To combat the damage inflicted by graffiti vandalism on homes, businesses, government buildings, and other establishments, the Orange County Sheriff’s Department built an application called TAGRS. Established in 2008, TAGRS is a shared service platform that allows law enforcement officers and professional graffiti repair companies to document graffiti vandalism damage and tag it to a specific ‘moniker,’ defined as a unique symbol of identity used by a vandal or gang. It also allows investigators to assess the amount of damage caused by a particular vandal and apply a restitution value for damages if the vandal is convicted in court. Since TAGRS’s inception, the Orange County Sheriff’s Department has provided access to dozens of law enforcement jurisdictions throughout Southern California. In collaboration with these departments, TAGRS has accumulated upwards of 800,000 stored and documented graffiti images, many of which have been annotated by graffiti experts. While these numbers are impressive, the effort required to analyze this volume of graffiti is difficult for investigators to justify.

Approach

The Orange County Sheriff’s Department approached California Polytechnic State University’s Digital Transformation Hub (DxHub), powered by Amazon Web Services, to inquire about the possibility of augmenting or automating the annotation and analysis workflow for existing graffiti tag images stored in the TAGRS application. A streamlined process would enhance law enforcement’s ability to approach graffiti vandalism cases with insights into the magnitude of damage that a vandal has caused. During the innovation and solution workshops, the depth and quality of the graffiti image data were assessed and the ideal workflow of a Machine Learning (ML) application was conceptualized. After careful consideration, the DxHub decided to work with a select subset of imagery that had been pre-annotated with a specific moniker and that included over a thousand images per moniker. DxHub staff and students created a proof of concept ML model to demonstrate the performance and training requirements needed to assess the basic feasibility of applying this technology to the TAGRS application.

Technology

The images were initially labeled only with which moniker appeared in the image, not where the moniker was located within it. This led the team to start with multi-class image classification as the basis of the ML model.

The team then split the data into three distinct sets to train, validate, and test the ML model. The training dataset contained 80% of the data and was used to train the model through supervised ML. Supervised training tells the model when it has correctly or incorrectly assessed the contents of an image, with the goal of improving accuracy over multiple iterations. The validation dataset included 10% of the data and was used to check how well the ML model was assessing new images, rather than picking up on image noise (e.g. a bird sitting in the background of an image) or misinterpreting graffiti images (e.g. graffiti on a train versus a train in the image background). The test dataset contained the remaining 10% of the data and was used at the end of the training/validation process to test whether the ML model was ready to assess monikers in previously unseen images.
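The 80/10/10 split described above can be sketched in a few lines of Python. This is a minimal illustration, not the team's actual code: the function name `split_by_moniker`, the file paths, and the moniker labels are all hypothetical, and splitting each moniker separately (so every moniker appears in all three sets) is an assumption rather than something the project states.

```python
import random
from collections import defaultdict


def split_by_moniker(labeled_images, train_frac=0.8, val_frac=0.1, seed=0):
    """Split (image_path, moniker) pairs into train/validation/test lists.

    Each moniker is shuffled and split on its own, so all three sets
    keep roughly the same proportion of every moniker.
    """
    by_moniker = defaultdict(list)
    for path, moniker in labeled_images:
        by_moniker[moniker].append(path)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, val, test = [], [], []
    for moniker in sorted(by_moniker):
        paths = by_moniker[moniker]
        rng.shuffle(paths)
        n_train = int(len(paths) * train_frac)
        n_val = int(len(paths) * val_frac)
        train += [(p, moniker) for p in paths[:n_train]]
        val += [(p, moniker) for p in paths[n_train:n_train + n_val]]
        # Whatever is left over becomes the held-out test set.
        test += [(p, moniker) for p in paths[n_train + n_val:]]
    return train, val, test


# Hypothetical data: two monikers with 1,000 images each.
images = [(f"moniker_a/{i}.jpg", "moniker_a") for i in range(1000)]
images += [(f"moniker_b/{i}.jpg", "moniker_b") for i in range(1000)]

train, val, test = split_by_moniker(images)
print(len(train), len(val), len(test))
```

Keeping the test set untouched until the very end is what lets it stand in for images the model has never encountered.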

Next Steps

Using Amazon SageMaker and its built-in image classification model, the team reached 76% validation accuracy after tuning hyperparameters on the two monikers with the largest datasets. While the validation score is promising, the team will continue to explore other approaches, such as object detection and data augmentation, to improve the ML model’s accuracy and achieve better results for monikers that have less image data collected.
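For readers unfamiliar with the metric, validation accuracy is simply the share of validation images whose predicted moniker matches the annotated one. The sketch below is illustrative only and is not the team's evaluation code: the helper name `accuracy_report` and the labels are hypothetical. It also breaks accuracy down per moniker, which is one way to see whether monikers with less data are lagging behind.

```python
from collections import Counter


def accuracy_report(true_labels, predicted_labels):
    """Return (overall_accuracy, {moniker: accuracy}) for paired label lists."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("label lists must not be empty")

    totals = Counter(true_labels)
    # Count only the predictions that match the annotated moniker.
    correct = Counter(
        t for t, p in zip(true_labels, predicted_labels) if t == p
    )
    overall = sum(correct.values()) / len(true_labels)
    per_moniker = {m: correct[m] / n for m, n in totals.items()}
    return overall, per_moniker


# Hypothetical validation results for two monikers.
truth = ["moniker_a", "moniker_a", "moniker_a", "moniker_b"]
preds = ["moniker_a", "moniker_a", "moniker_b", "moniker_b"]

overall, per_moniker = accuracy_report(truth, preds)
print(f"overall: {overall:.2f}")
for moniker, acc in sorted(per_moniker.items()):
    print(f"{moniker}: {acc:.2f}")
```

A single overall figure can hide a weak class, so a per-moniker view is a useful complement when some monikers have far fewer images than others.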

About the DxHub

The Cal Poly Digital Transformation Hub (DxHub) is a strategic relationship with Amazon Web Services (AWS) and is the world’s first cloud innovation center supported by AWS on a university campus. The primary goal of the DxHub is to provide real-world problem-solving experiences to students by immersing them in the application of proven innovation methods, in combination with the latest technologies, to solve important challenges in the public sector. The challenges being addressed cover a wide variety of topics, including homelessness, evidence-based policing, digital literacy, virtual cybersecurity laboratories and many others. The DxHub leverages the deep subject matter expertise of government, education and non-profit organizations to clearly understand the customers affected by public sector challenges and to develop solutions that meet those customers’ needs.