E-Scooter Counting on Sidewalks with Machine Learning


In 2018, the City of Santa Monica launched a pilot program that allowed commercial operators to deploy dockless e-scooters throughout the city. The scooters are referred to as dockless because they can be rented via smartphone and returned anywhere within city limits. The introduction of this new form of transportation has strained the city’s resources and its ability to monitor and enforce where the scooters are ridden, such as on streets or on sidewalks. The city has tried several methods of preventing scooter riding on sidewalks, including signage, infrastructure, and education and outreach campaigns. Despite this, sidewalk riding remains one of the most prevalent e-scooter issues facing city officials.

The physical and cultural clashes between pedestrians and riders have caused a backlash against e-scooter companies. This backlash could jeopardize the adoption of this alternative mode of transportation for the people of Santa Monica. While the city knows that the public has concerns about e-scooters, the overall magnitude of the issue is not well understood. The only data currently available to quantify these difficulties are public reports of conflict, enforcement records, and citation metrics. The options for measuring the problem directly are limited and time-consuming, such as stationing city officials at intersections to record e-scooter activity. Because this method is inefficient, the city decided to look for other ways to measure the problem.


The City of Santa Monica approached the Cal Poly Digital Transformation Hub (DxHub), powered by Amazon Web Services (AWS), with a desire to measure the magnitude of dockless e-scooter sidewalk riding and the likelihood of conflicts with pedestrians. To accomplish this, a solution was identified that applies machine learning to data from video recording devices to count e-scooter traffic on sidewalks and in the streets. Existing camera feeds would be processed by a specially trained machine learning algorithm that recognizes specific sidewalk activity in past, present, and future video data. It is worth noting that the machine learning model only identifies that there is a person on a scooter; it cannot determine the identity of the rider. The algorithm will produce statistics that city planners can use as a robust measure of the locations and times at which e-scooter sidewalk riding is most common. The city will also be able to use this data to inform pedestrians about areas where they should be more aware of e-scooter presence. The data will support city planners, law enforcement, and city officials in making evidence-based management decisions and policies. With this information, city staff will be able to better implement interventions such as infrastructure improvements, signaling and signage, education, outreach programs, and other policy strategies.

Technical Solution

Two machine learning models are used to count the number of people riding on the sidewalk and on the street. The first performs object detection, drawing a bounding box around each person riding a scooter in the video. The second performs semantic segmentation, identifying which areas of the image are sidewalk and which are street.

Using Amazon SageMaker Ground Truth, images from traffic cameras are labeled with bounding boxes around people riding scooters, and these labeled datasets are used to train the object detection model. The team chose RetinaNet as its object detection algorithm. RetinaNet is designed to detect many objects in an image or video without requiring excessive processing power. Using Jupyter Notebook, a code development tool, in combination with Amazon SageMaker, the team trained the model to detect people riding scooters and draw a box around them. The same process was followed to train the semantic segmentation model, with training data provided by the Mapillary Vistas dataset.

To produce the counts, each frame of the video is run through both models. Code then overlays the bounding box from the object detection model on the output of the semantic segmentation model, and a calculation determines whether the rider was on the sidewalk or on the street. Separate counts are kept of the riders seen on the sidewalk and of the riders seen on the street.

To make this available to end users, a web interface was built with the Django web framework and hosted on an Amazon EC2 instance. The web interface allows the end user to upload footage from a traffic camera and then handles the processing of the video through the machine learning models. It outputs a video showing the bounding boxes as well as an overall count of riders on the sidewalk and on the street.
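The overlay step can be illustrated with a short sketch. The write-up does not say which calculation the prototype uses, so the rule below is an assumption: it looks only at the bottom strip of each bounding box, where the scooter meets the ground, and takes a majority vote between sidewalk and street pixels. The class ids, the `strip_frac` parameter, and the function names are hypothetical, and the segmentation mask is represented as plain nested lists so that the example needs nothing beyond the standard library.

```python
from collections import Counter

# Hypothetical class ids; the real label map depends on how the
# segmentation model was trained.
SIDEWALK_ID = 1
STREET_ID = 2


def classify_rider(box, mask, strip_frac=0.2):
    """Return "sidewalk", "street", or "unknown" for one bounding box.

    box  -- (x_min, y_min, x_max, y_max) in pixel coordinates
    mask -- non-empty 2D grid (list of rows) of per-pixel class ids,
            with the same dimensions as the video frame
    """
    height, width = len(mask), len(mask[0])
    x_min, y_min, x_max, y_max = (int(round(v)) for v in box)

    # Clip the box to the frame so the slices below stay in bounds.
    x_min, x_max = max(x_min, 0), min(x_max, width)
    y_min, y_max = max(y_min, 0), min(y_max, height)
    if x_max <= x_min or y_max <= y_min:
        return "unknown"

    # Assumed rule: vote using only the bottom strip of the box.
    strip_height = max(1, int((y_max - y_min) * strip_frac))
    votes = Counter()
    for row in mask[y_max - strip_height:y_max]:
        votes.update(row[x_min:x_max])

    sidewalk, street = votes[SIDEWALK_ID], votes[STREET_ID]
    if sidewalk == 0 and street == 0:
        return "unknown"
    return "sidewalk" if sidewalk >= street else "street"


def count_detections(boxes, mask):
    """Tally the surface under every detection in a single frame."""
    counts = {"sidewalk": 0, "street": 0, "unknown": 0}
    for box in boxes:
        counts[classify_rider(box, mask)] += 1
    return counts


# Toy frame: left half sidewalk, right half street.
mask = [[SIDEWALK_ID] * 100 + [STREET_ID] * 100 for _ in range(100)]
boxes = [(10, 20, 40, 90), (150, 30, 180, 95)]
print(count_detections(boxes, mask))
# {'sidewalk': 1, 'street': 1, 'unknown': 0}
```

This sketch tallies detections in a single frame. Counting each rider once across a whole video would also require associating detections between frames, which is not shown here.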


Going forward, the City of Santa Monica can better understand the issue by using this more efficient approach to measuring the magnitude of dockless e-scooter sidewalk riding and the likelihood of conflicts with pedestrians.

Supporting Artifacts

Amazon’s Working Backwards process results in several artifacts that help inform and guide the end result. Below is a description of each artifact and its purpose in the process:
Press Release: During the Innovation Workshop, a fictional Press Release is drafted. This is a tool that is used to define the solution and why it matters to the customer.
Storyboard: A series of frames designed to illustrate the problem and the impact of the solution visually.
Source Code: All of the code and assets developed during the course of creating the prototype.
Scooter Detection Example and Scooter Detection Web Interface: These videos highlight some examples of the prototype and demo the working solution.

About the DxHub

The Cal Poly Digital Transformation Hub (DxHub) is a strategic relationship with Amazon Web Services (AWS) and is the world’s first cloud innovation center supported by AWS on a university campus. The primary goal of the DxHub is to provide real-world problem-solving experiences to students by immersing them in the application of proven innovation methods, in combination with the latest technologies, to solve important challenges in the public sector. The challenges being addressed cover a wide variety of topics, including homelessness, evidence-based policing, digital literacy, virtual cybersecurity laboratories, and many others. The DxHub leverages the deep subject matter expertise of government, education, and non-profit organizations to clearly understand the customers affected by public sector challenges and develops solutions that meet those customers’ needs.