Can Machine Learning Promote and Improve Diversity in Hiring?

Businesses are setting challenging diversity targets. HR and Recruiting teams are asking their software vendors, “Can Machine Learning build a more diverse talent pool?”

Vin Vashishta | Originally Published: October 7th, 2020


I have spent 6 years building machine learning software for hiring and recruiting. What have I learned? Through terabytes of data, I have seen the causes of low diversity and the challenges to improving it. Women represent 25%-35% of general software-focused roles. In niche roles like data science, that number drops to 20%. In information security, it drops further to 8%-12%.

Those figures are based on companies with over 10 employees in a specific role category. Expanding the research to other protected classes shows even deeper divides. On most large technical teams, fewer than 10% of members are non-white.

Smaller teams show the larger challenges for diverse candidates. They overwhelmingly underrepresent the demographics of the general population. Since small businesses account for a significant percentage of roles, their impact on diversity numbers is greater.

Looking at the macro picture, manufacturing, construction, repair/maintenance, logistics, and some sales roles have less than 5% female representation. Non-white males fare better in these roles but still fall short of full representation based on location-specific demographics.

Aggressive Workplace Diversity Goals

The hard numbers are pushing companies like Google to set aggressive, long-overdue diversity targets. The business world has spent the last 3 years focused on diversity and inclusion. However, the trends show companies falling short of those targets.

HR and Recruiting are asking, can machine learning put a finger on the scale towards more diverse candidates? Yes, and as you would expect, we need to rethink how machine learning is applied to solve the problem.

Existing software suites cannot shoehorn a diversity solution in. Applicant Tracking Systems are a great example. There is nothing in the ATS that can improve diversity enough to meet hiring targets. That is because the data fed into the system is designed to avoid bias against protected classes based on known candidate pools. The way their models are built prevents exclusion instead of expanding inclusion beyond traditional candidate pools. Their approach and datasets cause underperformance.

The Research: Causes and Impacts of Exclusion

Businesses need a higher quantity of diverse applicants. To reach hiring targets, an overwhelming quantity. Machine learning can achieve that by modeling the causes and impacts of exclusion on diverse candidates. Why do existing systems fall short? They are trained on employees and candidates who are not excluded from the process.

When the problem is reframed to look at people who are excluded, the model changes. It seems obvious: if we want to improve diversity, we need to study the people who never make it into the hiring process. What causes exclusion?

Causal machine learning can provide solutions to these types of business problems. Causal models use data to answer why something happens, while traditional models describe what happens. Research on the impacts of exclusion lines up with the results of causal modeling.

Gaps in access to education, skills, upward mobility, and experience are the main causes of exclusion from the hiring process. Bias against protected classes also reduces participation, discouraging diverse candidates from applying in the first place. Improving diversity in hiring starts here, before a candidate ever applies.

Filtering or ranking applicants is a common machine learning use case. However, implicit filtering starts earlier, with disadvantage. Diverse candidates assume they are disqualified from roles they could succeed in. That assumption of disqualification is at the heart of why candidate pools lack diversity.

Including Diverse, Disenfranchised Candidates to Drive Diversity

This is the part of causal modeling that got me interested in the first place. Now I am researching the impacts of assumed disqualification. The professional presence of diverse candidates is different. Their resumes and online profiles have different characteristics.

An external candidate search fails to discover and appropriately score them because it is trained on resumes from people who are already included in the hiring process. Simply put, if a person never applies for the role or identifies with it, their data is not included in the machine learning system’s training. Those models do not generalize to discovering and scoring diverse candidates.
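To make the training gap concrete, here is a minimal sketch in Python. It is not the system described in this post. The resumes, the scoring rule, and the function names train_term_weights and score are all hypothetical, chosen only to show how a scorer that learns its vocabulary from past applicants can give zero credit to someone who describes comparable work in different words.

```python
from collections import Counter


def train_term_weights(resumes):
    """Weight each term by the fraction of training resumes that contain it."""
    counts = Counter()
    for text in resumes:
        counts.update(set(text.lower().split()))
    return {term: n / len(resumes) for term, n in counts.items()}


def score(resume, weights):
    """Average weight of the resume's terms; unseen terms contribute zero."""
    terms = set(resume.lower().split())
    return sum(weights.get(term, 0.0) for term in terms) / len(terms)


# Hypothetical training data: only people who already applied for the role.
applicant_resumes = [
    "python sql machine learning statistics",
    "python statistics machine learning dashboards",
    "sql python statistics forecasting",
]
weights = train_term_weights(applicant_resumes)

# A resume written in the vocabulary the model was trained on.
typical = "python sql statistics machine learning"
# Hypothetical resume from someone who never identified with the role and
# describes related work in words the model has never seen.
excluded = "built sales forecasts in spreadsheets automated monthly reports"

typical_score = score(typical, weights)
excluded_score = score(excluded, weights)
print(f"typical: {typical_score:.2f}, excluded: {excluded_score:.2f}")
```

The second resume scores zero because none of its words appear in the training set, not because the person lacks ability. Real systems are far more sophisticated than this word-overlap toy, but the dependence on who is in the training data is the same kind of problem.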

If a person has been discouraged from career mobility, they may not add skills to their resume with the goal of moving into a better role. Someone who has experienced bias is more likely to have imposter syndrome. Their resumes are less likely to accurately highlight expertise.

Those are 2 examples of the many differences that lead to candidate pools without enough diversity to meet hiring goals. Small changes in the data can cause a machine learning model to eliminate qualified candidates. This flaw hides in most models.
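As a rough illustration of how a small change can eliminate a qualified candidate, consider the sketch below. The skill list, the cutoff value, and the functions skill_match and passes_filter are assumptions made up for this example and do not come from any real applicant tracking system. The same person is filtered out simply because one skill is left off the resume.

```python
# Hypothetical role requirements and pass/fail cutoff (illustrative only).
REQUIRED_SKILLS = {"python", "sql", "statistics", "communication"}
CUTOFF = 0.75


def skill_match(listed_skills):
    """Fraction of the required skills that appear on the resume."""
    return len(REQUIRED_SKILLS & set(listed_skills)) / len(REQUIRED_SKILLS)


def passes_filter(listed_skills):
    """Hard filter: keep the candidate only if the match meets the cutoff."""
    return skill_match(listed_skills) >= CUTOFF


# The candidate lists everything they know.
full_listing = ["python", "sql", "statistics", "excel"]
# The same candidate understates their expertise and omits one skill.
understated = ["python", "sql", "excel"]

full_result = passes_filter(full_listing)
understated_result = passes_filter(understated)
print(full_result, understated_result)
```

Nothing about the candidate's ability changed between the two listings. Only the resume did, and a hard cutoff turns that one-line difference into a rejection.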


In this post, I summarized my research on diversity. The most important takeaway is support for the concepts of implicit filtering and assumption of disqualification. These are not novel ideas and have been the focus of diversity research. What is poorly understood is how to apply machine learning to model their impacts. Current machine learning models do not have the data necessary to build and accurately score diverse candidate pools.

Can machine learning promote and improve diversity in hiring? Yes. This post is meant to support that conclusion based on a different approach to building models. In my next post, I will explain how those models achieve that potential in the real world.