Teaching Data Science in the Cloud Using Arduino

Overview

Kurt Colvin is a professor in the Industrial & Manufacturing Engineering department at California Polytechnic State University (Cal Poly). As a faculty member in Industrial Engineering, he wanted to introduce his students to data science and analytics. Modern technology has produced a wide range of data-collecting devices that can inform how we approach real-world problems, but the result is often a surplus of data and a shortage of people who can interpret it. To address this gap, Kurt proposed a new experimental course within the College of Engineering in which students would collect and analyze large datasets. To explore emerging technologies for analyzing these datasets, Kurt approached Cal Poly’s Digital Transformation Hub (DxHub) and asked whether the cloud could be integrated into his class as the platform for data analysis.

Problem

Existing campus resources depend on a hardware provisioning cycle and a software configuration that stay fixed for the entire quarter. That rigidity conflicted with Kurt’s goals for the course: he needed the flexibility to adjust lab activities during the quarter through trial and error, and he wanted an environment where students could experiment and fail safely. To meet these criteria, Kurt’s Industrial Engineering students partnered with the DxHub to build a data-collecting device using an Arduino, an electronics prototyping platform for creating interactive devices. The resulting device recorded latitude and longitude GPS coordinates, timestamps, and temperatures, among other measurements. Students carried the device on their daily commute to campus and throughout the school day, including between classes. The data from these activities was then downloaded and converted into a format suitable for analysis.
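
The conversion step is not described in detail. As a minimal sketch, assume the logger writes one CSV row per sample with the hypothetical header `timestamp,latitude,longitude,temperature_c` and ISO 8601 timestamps; the actual device format is not documented here. The Python below turns such a file into typed records and drops rows with missing or out-of-range values, such as samples logged before the GPS module has a fix.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Reading:
    """One sample from the (hypothetical) logger: time, position, temperature."""
    timestamp: datetime
    latitude: float
    longitude: float
    temperature_c: float


def parse_log(text: str) -> list[Reading]:
    """Convert raw logger CSV text into typed readings, skipping bad rows.

    Assumes a header line 'timestamp,latitude,longitude,temperature_c' and
    ISO 8601 timestamps; this format is an illustration, not the real device's.
    """
    readings = []
    for row in csv.DictReader(io.StringIO(text)):
        try:
            reading = Reading(
                timestamp=datetime.fromisoformat(row["timestamp"]),
                latitude=float(row["latitude"]),
                longitude=float(row["longitude"]),
                temperature_c=float(row["temperature_c"]),
            )
        except (KeyError, TypeError, ValueError):
            # Empty or missing fields (e.g. no GPS fix yet) cannot be parsed.
            continue
        # Keep only coordinates inside the valid latitude/longitude range.
        if -90 <= reading.latitude <= 90 and -180 <= reading.longitude <= 180:
            readings.append(reading)
    return readings
```

For example, a file whose second data row has empty latitude and longitude fields yields only the readings from the complete rows.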

Innovation in Action

Kurt began collaborating with Cal Poly’s Information Technology Services (ITS) team to create a virtual lab in which each student had access to a compute resource in Amazon Web Services (AWS). Working out how to use cloud resources was a learning experience for both ITS and the Industrial Engineering department. Along the way, Kurt decided that his students should develop the skills needed to interpret data and to share results with their peers. He concluded that a simple web application framework that could ingest GPS data files and display the formatted content would meet these requirements. As the DxHub team began to architect this solution, its developers realized that AWS’s automation layer would spare students the details of installing and configuring a web server, and would also remove the need to install a database on top of the base operating system.
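
The Results section identifies this automation layer as AWS CloudFormation but does not reproduce the templates. The following Python sketch builds a minimal template for one student’s instance as a plain dictionary. The `ImageId` parameter, the `student-<id>` Name tag, the resource name `StudentInstance`, and the Apache install in `UserData` are illustrative assumptions, not the DxHub’s actual configuration.

```python
import json


def student_stack_template(student_id: str, instance_type: str = "t2.small") -> dict:
    """Build a minimal CloudFormation template for one student's web server.

    Hypothetical sketch: the AMI is supplied as a template parameter, and the
    UserData script (Amazon Linux with Apache httpd) is a placeholder for the
    software the DxHub actually installed.
    """
    user_data = "\n".join([
        "#!/bin/bash",
        "yum install -y httpd",          # the template installs the web server, not the student
        "systemctl enable --now httpd",  # start it now and on every boot
    ])
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Parameters": {
            "ImageId": {"Type": "AWS::EC2::Image::Id"},
        },
        "Resources": {
            "StudentInstance": {
                "Type": "AWS::EC2::Instance",
                "Properties": {
                    "ImageId": {"Ref": "ImageId"},
                    "InstanceType": instance_type,
                    # A predictable Name tag lets later tooling find this instance.
                    "Tags": [{"Key": "Name", "Value": f"student-{student_id}"}],
                    "UserData": {"Fn::Base64": user_data},
                },
            }
        },
    }


print(json.dumps(student_stack_template("07"), indent=2))
```

Serialized with `json.dumps`, each template could be passed as the `TemplateBody` of a CloudFormation CreateStack call, one stack per student. A database package could be added to `UserData` in the same way.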

Results

Using AWS CloudFormation templates, DxHub developers automated the build of each server so that it completed in minutes. Each student then received an EC2 instance provisioned with the software required to ingest and display the data. The instances ran continuously for the 10-week quarter and were shut down at its end. Running the 26 t2.small instances cost $430 per month. One larger instance was deployed to analyze the entire class dataset. The total cost of the project, including data transfer charges, EBS storage, and the larger class instance, was $1,215.78.
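
The monthly figure can be sanity-checked with simple arithmetic. The sketch below assumes an on-demand rate of $0.023 per hour, the us-east-1 list price for a Linux t2.small instance; the case study does not state the region or the rate actually paid. It also assumes the conventional 730-hour billing month. Under those assumptions, 26 always-on instances come to about $437 per month and accrue 4,368 instance-hours per week, in line with the "around 4,000" weekly hours cited in the Update.

```python
HOURS_PER_WEEK = 24 * 7
HOURS_PER_MONTH = 730  # conventional billing month: 8,760 hours per year / 12


def always_on_estimate(instances: int, hourly_rate: float, weeks: int = 10) -> dict:
    """Estimate usage and cost when every instance runs 24/7.

    hourly_rate is an assumed on-demand price in USD; real rates vary by region.
    """
    return {
        "instance_hours_per_week": instances * HOURS_PER_WEEK,
        "instance_hours_per_quarter": instances * HOURS_PER_WEEK * weeks,
        "monthly_cost_usd": round(instances * hourly_rate * HOURS_PER_MONTH, 2),
    }


estimate = always_on_estimate(instances=26, hourly_rate=0.023)
print(estimate)
# {'instance_hours_per_week': 4368, 'instance_hours_per_quarter': 43680, 'monthly_cost_usd': 436.54}
```

The small gap from the reported $430 likely reflects the actual rate and billing period, which the case study does not give. The 43,680 instance-hours over a 10-week quarter show why idle time dominates the cost of an always-on design.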

In evaluating the outcome of the project, the team found that Kurt’s Industrial Engineering students needed workstations to run additional data analytics tools. Kurt addressed this by using lab computers in the following Winter Quarter. However, configuration controls on the lab machines limited which data tools could be used once the machine image had been created. To improve this in future courses, the DxHub will likely pilot Amazon WorkSpaces as an alternative work environment where students can use GUI-rich tools such as Jupyter notebooks. Finally, the team found that EC2 instances do not need to run continuously throughout the quarter. To avoid unnecessary machine usage, DxHub developers will implement a solution that powers off instances when they are not in use. Developers will also build a simple web application on a serverless architecture in AWS, which will let future students start and stop their instances with ease. Together, these components will form a virtual lab that can be used on demand and run only when needed. As a result, the cost of implementation will decrease, making the service both scalable and affordable.
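
The planned power-off mechanism needs a rule for deciding when an instance is idle. The sketch below is a hypothetical version of that rule: given a map from instance ID to the time of its last observed activity, it returns the instances that have been idle for at least a threshold. The activity signal itself (for example, a last login or web request) is an assumption; the one-hour default mirrors the idle window the DxHub later adopted.

```python
from datetime import datetime, timedelta, timezone


def instances_to_stop(last_activity: dict[str, datetime],
                      now: datetime,
                      idle_limit: timedelta = timedelta(hours=1)) -> list[str]:
    """Return, in sorted order, the IDs of instances idle for at least idle_limit.

    last_activity maps an instance ID to its most recent activity time. Which
    activity counts (logins, HTTP requests, CPU load) is an assumption.
    """
    return sorted(
        instance_id
        for instance_id, seen in last_activity.items()
        if now - seen >= idle_limit
    )


now = datetime(2018, 10, 1, 12, 0, tzinfo=timezone.utc)
activity = {
    "i-0aaa": now - timedelta(minutes=20),  # recently active: keep running
    "i-0bbb": now - timedelta(hours=2),     # idle for two hours: stop
}
print(instances_to_stop(activity, now))  # ['i-0bbb']
```

A scheduled job could pass the returned IDs to the EC2 StopInstances API. Stopped instances accrue no instance-hour charges, although their EBS volumes are still billed.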

Update

In Fall 2018, the DxHub redesigned the EC2 resources to power down automatically after one hour of inactivity. This change is expected to reduce billable EC2 usage from around 4,000 instance-hours per week to a few hundred. Using several AWS services (Amazon Cognito, AWS Lambda, Amazon API Gateway, and an Amazon S3 static website), students in Kurt’s data science classes can now start their EC2 instance, identified by its tag name, on demand. DxHub developers estimate that these improvements will dramatically reduce the cost of the service.
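
The on-demand start can be pictured as a Lambda function behind API Gateway that looks up a student’s instance by its Name tag and starts it. The Python sketch below assumes an API Gateway proxy integration, a JSON request body of the hypothetical form `{"name": "student-07"}`, and that Cognito authentication is enforced before the function runs. `describe_instances` and `start_instances` are standard boto3 EC2 client calls; the client is passed in as a parameter so the logic can be exercised without AWS credentials.

```python
import json


def _response(status: int, payload: dict) -> dict:
    """Shape a reply in the format API Gateway's proxy integration expects."""
    return {"statusCode": status, "body": json.dumps(payload)}


def find_instance_ids(ec2, name: str) -> list[str]:
    """Return the IDs of instances whose Name tag equals `name`."""
    response = ec2.describe_instances(
        Filters=[{"Name": "tag:Name", "Values": [name]}]
    )
    return [
        instance["InstanceId"]
        for reservation in response["Reservations"]
        for instance in reservation["Instances"]
    ]


def handler(event, context, ec2=None):
    """Hypothetical Lambda entry point: start the instance named in the request."""
    if ec2 is None:
        import boto3  # included in the AWS Lambda Python runtime
        ec2 = boto3.client("ec2")
    try:
        body = json.loads(event.get("body") or "{}")
    except ValueError:
        body = {}
    name = body.get("name") if isinstance(body, dict) else None
    if not name:
        return _response(400, {"error": "missing instance name"})
    ids = find_instance_ids(ec2, name)
    if not ids:
        return _response(404, {"error": f"no instance tagged {name}"})
    ec2.start_instances(InstanceIds=ids)
    return _response(200, {"started": ids})
```

A matching stop action would follow the same pattern with `stop_instances`, and the S3-hosted page would simply call the API Gateway endpoint with the student’s tag name.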

About the DxHub

The Cal Poly Digital Transformation Hub (DxHub) is a strategic relationship with Amazon Web Services (AWS) and the world’s first cloud innovation center supported by AWS on a university campus. The DxHub’s primary goal is to give students real-world problem-solving experience by immersing them in proven innovation methods, combined with the latest technologies, to solve important challenges in the public sector. These challenges span a wide variety of topics, including homelessness, evidence-based policing, digital literacy, and virtual cybersecurity laboratories, among many others. The DxHub draws on the deep subject matter expertise of government, education, and non-profit organizations to understand the customers affected by public sector challenges and to develop solutions that meet those customers’ needs.