Using Soft Skills to Predict Labor Market Outcomes


In countries with a skills mismatch and high youth unemployment, low-skilled youth can find it difficult to signal their strengths to potential employers on their CV (resume), and therefore to find employment. Yet 21st-century skills (soft skills) are in high demand among employers globally, and these same youth often have extensive soft skills. These skills are also malleable and can be developed at low or no cost, with high impact. On the demand side, employers often struggle first to understand which soft skills are most critical for a position, and then to identify which candidates have the required set. A better understanding of which soft skills matter for which industries, and at which position levels within them, would help employers on the demand side. It would also inform the design of programs that help unemployed youth build and assess the soft skills that matter and signal those skills more effectively to potential employers.

The Cal Poly Digital Transformation Hub (DxHub), powered by Amazon Web Services (AWS), collaborated with Cal Poly faculty and students and the World Bank to develop a model that analyzes patterns in soft skills measured across panels of young people. The goal was to understand which soft skills drive which labor market outcomes, and especially which soft skills matter for which industries and at which job levels within them (particularly entry-level jobs). The initial model was built from a comprehensive set of publicly available, anonymized data from five countries, and its data-agnostic approach is intended to allow these soft skills to be measured and then used to predict the success of individuals across various industries and position levels.


To address this problem, the Cal Poly faculty and student team developed a comprehensive model, using several approaches to extract soft skills from the publicly available, anonymized datasets and to relate them to success in particular labor markets. Although occupations and required skills vary by country and region, segmenting the results by industry allowed the students to begin finding patterns in which soft skills were most important in each occupation. We hope that these findings, once further refined through additional modeling, can be used by education and labor development practitioners to design skills development programs for unemployed youth and help reduce cycles of poverty.

Innovation in Action

This project was led by Cal Poly students across several disciplinary programs. The first group of students to take on the project was enrolled in Cal Poly's cross-disciplinary Data Science program, and a team of master's-level Quantitative Economics students then developed the work further. The first step was to efficiently categorize soft skills from psychometric questions. These categories could then be combined with demographic variables in a supervised machine learning model that predicts labor market success. Once this was complete, students would subset their data by industry to determine which soft skills were more advantageous in particular occupations.
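The categorization step can be pictured as a text classification problem: each psychometric question is assigned a soft-skill label by a support vector machine, the model type the Data Science team reports using. The sketch below is a minimal illustration under stated assumptions, not the team's code. The write-up does not say which features the SVM used, so TF-IDF features built from the question wording are an assumption, and the questions, the three labels (the project identified nine), and the helper name `train_item_classifier` are all hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# HYPOTHETICAL labeled items: (question text, soft-skill label).
# Invented for illustration; the project used nine skill categories, not three.
TRAIN_ITEMS = [
    ("I finish whatever I begin, even when it is difficult", "grit"),
    ("Setbacks do not discourage me from working hard", "grit"),
    ("I work hard to reach my long-term goals", "grit"),
    ("I enjoy working with others on a shared task", "teamwork"),
    ("I cooperate well with people in a group", "teamwork"),
    ("I help my team members when they need support", "teamwork"),
    ("I keep my belongings neat and organized", "conscientiousness"),
    ("I pay attention to details in my work", "conscientiousness"),
    ("I make plans and follow through with them", "conscientiousness"),
]


def train_item_classifier(items):
    """Fit a TF-IDF + linear SVM pipeline mapping question text to a soft-skill label.

    ASSUMPTION: word and word-pair TF-IDF features; the original feature set is unknown.
    """
    texts, labels = zip(*items)
    model = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2)),  # unigrams and bigrams of question text
        LinearSVC(C=1.0),                     # linear support vector classifier
    )
    model.fit(list(texts), list(labels))
    return model


if __name__ == "__main__":
    clf = train_item_classifier(TRAIN_ITEMS)
    new_items = [
        "I keep working on a task until it is done",
        "I like being part of a group project",
    ]
    # Print the predicted soft-skill label for each unseen question.
    for question, label in zip(new_items, clf.predict(new_items)):
        print(f"{label:>18}: {question}")
```

With real data, the training set would be the questions already labeled by hand, and the fitted classifier would label the remaining questions before any outcome modeling.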

All student teams used Amazon S3 to share and store the large datasets needed to train the model. The teams deployed a JupyterHub server and used Amazon SageMaker for statistical analysis and machine learning model development. Using supervised machine learning with a support vector machine, the Data Science team identified nine distinct soft skills and assigned each question in their data to one of them. As students moved toward predicting market success, they applied factor analysis for dimensionality reduction and, for the U.S. dataset, used salary as their main measure of success. A multiple regression model with soft skills as predictors of market success showed a statistically significant p-value for almost every soft skill. After developing another model to determine which soft skills were most relevant to market success, students found the three most important soft skills in their data: grit/work ethic, teamwork, and conscientiousness. These results were then segmented by industry.

The Quantitative Economics student team built on this work by adding agreeableness, openness, and stress to their models. They used decision trees, random forests, neural networks, and a multinomial logit with a regularization parameter to analyze the data. They found evidence that machine learning and specialized econometric methods can help place disadvantaged workers into jobs based on their soft skills and personality traits, while acknowledging that further research, including the use of additional machine learning and econometric models, is needed.


Because these supervised machine learning models analyze how soft skills are associated with labor market success in the U.S., the same approach can be applied in other countries with a variety of industries. While comprehensive data is required to ensure that particular soft skills can accurately be linked to success in particular industries, this is a promising first step toward using soft skills to expand employment opportunities, especially for those who may lack the education and resources to develop technical skills.

Supporting Documents

Student Paper
This document was coauthored by students in DATA 452 for their Data Science Capstone Project, Spring 2021. Student participants include Ben Cahill, Sahil Bobba, Mason Ogden, and Jay Ahn.
Student Paper
This document was coauthored by students in the Master of Science in Quantitative Economics program in Spring 2022. Student participants include Matt Gevercer, Zoe Krieger, Diego Saavedra, Benjamin Schneider, and Benjamin Zwarg.

About the DxHub

Cal Poly’s Digital Transformation Hub (DxHub) was one of the earliest collaborations between Amazon Web Services (AWS) and an educational institution focused on innovation and digital transformation. While providing students with real-world learning experiences, the DxHub applies proven innovation methodologies in combination with the deep subject matter expertise of the public sector and the technical expertise of AWS to solve challenging problems in ways not contemplated before. For more information, visit