Using Soft Skills to Predict Labor Market Outcomes


In countries with a skills mismatch and high youth unemployment, low-skilled youth can find it difficult to signal their strengths on a CV (resume) to potential employers, and hence to find employment. Yet 21st-century skills (soft skills) are in high demand from employers globally, and these same youth often have extensive soft skills. These skills are also malleable and can be developed at little or no cost, with high impact. On the demand side, employers often struggle first to identify which soft skills are most critical for a position, and then to determine which candidates have them. A better understanding of which soft skills matter for which industries, and at which position levels within them, would help employers on the demand side. It would also inform the design of programs that help unemployed youth build and assess the soft skills that matter, and signal those skills more effectively to potential employers.

The Cal Poly Digital Transformation Hub (DxHub) powered by Amazon Web Services (AWS) collaborated with Cal Poly faculty and students and the World Bank to develop a model that analyzes patterns of soft skills measured over panels of youth. The goal was to understand which soft skills drive which labor market outcomes and, in particular, which soft skills matter for which industries and job levels within them (especially entry-level jobs). The initial model was built on a comprehensive set of publicly available anonymized data from five countries. Because the approach is data-agnostic, these soft skills could be measured and then used to predict the success of individuals across various industries and position levels.


To address this problem, the Cal Poly faculty and student team developed a comprehensive model, exploring several approaches to extracting soft skills, and their association with success in particular markets, from the publicly available anonymized datasets provided. While each country and region has different occupations and required skills, segmenting results by industry allowed students to start finding patterns in which soft skills were most important in each occupation. We hope that this information, further refined through modeling, can be used by education and labor development practitioners to inform the design of skills development programs for unemployed youth and help reduce cycles of poverty.

Innovation in Action

This project was led by a group of students from the Data Science Capstone class under Dr. Dennis Sun and Dr. Jonathan Ventura. The first step in this challenge was to efficiently categorize soft skills from psychometric questions. Those categorizations, along with demographic variables, could then feed a supervised machine learning model that predicts market success. Once this was complete, students would subset their data by industry to determine which soft skills were more advantageous in certain occupations.
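As a rough illustration of the first step, the sketch below turns one respondent's answers to psychometric questions into per-skill scores. It is a minimal sketch under stated assumptions, not the team's code: the question IDs, the `QUESTION_TO_SKILL` mapping, the 1-to-5 answer scale, and the use of a simple average are all hypothetical. In the project, the assignment of questions to soft skills came from a trained classifier rather than a hand-written table.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical mapping from survey question ID to a soft-skill label.
# Hard-coded here for illustration only.
QUESTION_TO_SKILL = {
    "q1": "grit",
    "q2": "grit",
    "q3": "teamwork",
    "q4": "conscientiousness",
}

def skill_scores(responses):
    """Average a respondent's answers (assumed 1-5 scale) within each soft skill."""
    grouped = defaultdict(list)
    for question, answer in responses.items():
        skill = QUESTION_TO_SKILL.get(question)
        if skill is not None:  # skip questions with no assigned skill
            grouped[skill].append(answer)
    return {skill: mean(answers) for skill, answers in grouped.items()}

# Made-up respondent; "q9" has no assigned skill and is ignored.
respondent = {"q1": 4, "q2": 5, "q3": 3, "q4": 2, "q9": 1}
scores = skill_scores(respondent)
print(scores)
```

Scores of this kind, one row per respondent, are the sort of input that a downstream model of market success could use alongside demographic variables.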

The team of students used Amazon S3 to share and store the large datasets that were needed to train the model. The team deployed a JupyterHub server and used Amazon SageMaker for statistical analysis and machine learning model development. Using supervised machine learning with a support vector machine to classify soft skills, they identified nine separate soft skills and assigned them to the questions in their data. Factor analysis was then applied for dimensionality reduction as students moved toward predicting market success, using salary as their main measure of success for the U.S. dataset. A multiple regression model was fit with soft skills as predictors of market success, and almost every soft skill had a statistically significant p-value. After developing another model to determine which soft skills were most relevant to market success, students found the three most important soft skills in their data: grit/work ethic, teamwork, and conscientiousness. These results were then segmented by industry.
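The regression and industry segmentation described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the team's implementation: it fits ordinary least squares through the normal equations in plain Python, it does not compute p-values, and the records, industry names, and coefficients are made up for illustration.

```python
from collections import defaultdict

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols(rows, targets):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    design = [[1.0] + list(row) for row in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, targets)) for i in range(k)]
    return solve(xtx, xty)

# Made-up records: (industry, grit score, teamwork score, salary in thousands).
records = [
    ("retail", 1, 1, 27), ("retail", 2, 1, 29), ("retail", 1, 2, 32), ("retail", 3, 3, 41),
    ("tech", 1, 1, 56), ("tech", 2, 1, 61), ("tech", 1, 2, 57), ("tech", 3, 3, 68),
]

# Segment by industry, then fit one regression per segment.
by_industry = defaultdict(list)
for industry, grit, teamwork, salary in records:
    by_industry[industry].append(((grit, teamwork), salary))

coefficients = {}
for industry, samples in by_industry.items():
    rows = [features for features, _ in samples]
    targets = [salary for _, salary in samples]
    coefficients[industry] = ols(rows, targets)

for industry, (intercept, b_grit, b_teamwork) in coefficients.items():
    print(f"{industry}: intercept={intercept:.2f} grit={b_grit:.2f} teamwork={b_teamwork:.2f}")
```

Comparing the fitted coefficients across segments is one simple way to see which skills carry more weight in each industry. A real analysis would use a statistics library to obtain standard errors and p-values, and far more data per segment.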


The team's analysis of how soft skills relate to labor market success in the U.S., using supervised machine learning, produced models that can also be run in other countries with a variety of industries. While comprehensive data is required to ensure that certain soft skills can accurately be attributed to success in certain industries, this is a promising first step in utilizing soft skills for employment opportunities, especially among those who might lack the education and resources for technical skills.

Supporting Documents

Student Paper
This document was coauthored by students in DATA 452 for their Data Science Capstone Project, Spring 2021. Student participants include Ben Cahill, Sahil Bobba, Mason Ogden, and Jay Ahn.

About the DxHub

Cal Poly’s Digital Transformation Hub (DxHub) was one of the earliest collaborations between Amazon Web Services (AWS) and an educational institution focused on innovation and digital transformation. While providing students with real-world learning experiences, the DxHub applies proven innovation methodologies in combination with the deep subject matter expertise of the public sector and the technical expertise of AWS to solve challenging problems in ways not contemplated before. For more information, visit