I have studied the Data Science career path for the last 4 years. Where do we, those of us in the data science and machine learning field, come from? How did we arrive in the field? What jobs did we have before we entered the field?
There is confusion about education and career path because none of these questions are comprehensively answered. How are you supposed to break into a field when no one defines how to break into the field?
This post is based on my analysis of over 24,000 active data scientists. Prior research and analysis use job description data or recruiter/HR surveys. Those results assume the job description is an accurate representation of the person hired. What I have done is look at the resumes and LinkedIn profiles of people who were hired.
Job Description Requirements vs Profiles of Hired Candidates: They Are Different
Data Scientist job descriptions are very inconsistent. Some variations are expected, and those variations are similar to any other technical job:
- Company and Business Domain Specifics
- Technology Stack and Tools
- Project Specifics
Other variations come from a company’s lack of familiarity with the role:
- Broad Sets of Deep Learning Domain Knowledge
- Familiarity With non-Machine Learning Supporting Architectural Components
- Advanced Degrees for non-Research Roles
Finally, there are the outright ridiculous job descriptions. Research and results have been skewed by these. Myths and misconceptions come from including the absurd in the dataset.
Some requirements in the first two categories of job descriptions are indicative of the person who is eventually hired. These are well-connected requirements. Python and R are common to both the job requirement and the profile of the new hire. The lowest level of educational requirement is common to both requirement and profile. Spark, Hadoop, and TensorFlow are other examples.
Other requirements are not reflected in the new hire’s profile. Years of experience differ from the original job description. Specific majors and level of education differ. Domain knowledge differs. There are many requirements that do not end up eliminating a candidate from being hired.
Aspiring data scientists should ignore some parts of the job description. Apply for jobs you are qualified to do. If the job description asks for 3 years of Python experience, being proficient in Python is good enough. If the job asks for a bachelor’s or master’s in Computer Science, having either degree level in a relevant field is good enough. If the job asks for 2 to 3 years of experience in the field, equivalent experience or strong project work is enough.
I explored the requirement of prior data science experience further. Looking at purely frequency-based career paths for those who currently hold the “Data Scientist” job title, the results do not support the enforcement of that requirement. Many data scientists come out of academia having held jobs in research or teaching. The next highest percentage come from variations of the data analyst role.
To say that no one’s hiring data scientists without data science experience is false. There are more hires (aggregating external roles) from outside of the field than lateral moves from within it. Research assistant, analyst, and software engineer are all common paths into the field.
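A frequency-based look at entry paths can be sketched in a few lines. This is only an illustration: the `careers` records are invented toy data, not the dataset behind this post, and the helper names `role_before_entry` and `entry_path_counts` are hypothetical. The idea is to count which job title each person held immediately before their first "Data Scientist" role.

```python
from collections import Counter

# Hypothetical toy data: each list is one person's job titles, oldest first.
# These records are invented for illustration only.
careers = [
    ["Research Assistant", "Data Scientist"],
    ["Data Analyst", "Senior Data Analyst", "Data Scientist"],
    ["Software Engineer", "Data Scientist"],
    ["Data Scientist", "Data Scientist"],
    ["Research Assistant", "Data Scientist", "Senior Data Scientist"],
]

TARGET = "Data Scientist"


def role_before_entry(titles, target=TARGET):
    """Return the title held just before the first target role, or None."""
    for i, title in enumerate(titles):
        if title == target:
            # No earlier role on record means no known entry path.
            return titles[i - 1] if i > 0 else None
    return None


def entry_path_counts(histories, target=TARGET):
    """Count the roles that immediately precede the first target role."""
    counts = Counter()
    for titles in histories:
        previous = role_before_entry(titles, target)
        if previous is not None:
            counts[previous] += 1
    return counts


counts = entry_path_counts(careers)
for title, n in counts.most_common():
    print(f"{title}: {n}")
```

A real analysis would also need to normalize title variations (for example, grouping "Senior Data Analyst" with "Data Analyst") before counting.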
Data Science is a diverse field requiring capabilities crossing over several disciplines. It makes sense that:
- An analyst with data science education would be qualified.
- A software developer with experience deploying and maintaining machine learning models would be qualified.
- A research assistant who works on machine learning projects in academia would be qualified.
The concept of equivalent qualifications and/or experience makes sense in a field like this. I built a classification model that takes a LinkedIn profile or resume and predicts the person’s job title. When I examined the trained model’s architecture and weights, specific language around topics like Decision Trees, GANs, NLP, CV, etc. was the only reliable feature. That conclusion is analytical, though: this language is strongly correlated with the Data Scientist label and weakly correlated with any other job title.
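To make the idea of inspecting a title classifier concrete, here is a minimal sketch. It is not the model described above, whose architecture is not specified; it swaps in a small bag-of-words naive Bayes classifier written with the standard library. The `profiles` examples are invented, and every function name is a hypothetical choice for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy profiles, invented for illustration only.
profiles = [
    ("trained decision trees and deployed an nlp model in python", "Data Scientist"),
    ("built gans for cv research and improved model accuracy", "Data Scientist"),
    ("built dashboards and wrote sql reports for sales", "Data Analyst"),
    ("wrote excel reports and sql queries for finance", "Data Analyst"),
]


def tokenize(text):
    return text.lower().split()


def train(examples):
    """Collect per-label word counts for a multinomial naive Bayes model."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in examples:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return word_counts, label_counts, vocab


def log_likelihood(word, label, model):
    """Log P(word | label) with add-one smoothing."""
    word_counts, _, vocab = model
    total = sum(word_counts[label].values())
    return math.log((word_counts[label][word] + 1) / (total + len(vocab)))


def predict(text, model):
    """Return the label with the highest posterior score for the text."""
    _, label_counts, vocab = model
    n_examples = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        score = math.log(label_counts[label] / n_examples)
        for word in tokenize(text):
            if word in vocab:  # ignore words never seen in training
                score += log_likelihood(word, label, model)
        scores[label] = score
    return max(scores, key=scores.get)


def top_terms(label, other, model, k=5):
    """Rank vocabulary terms by log-odds of `label` versus `other`."""
    _, _, vocab = model
    odds = {
        word: log_likelihood(word, label, model) - log_likelihood(word, other, model)
        for word in vocab
    }
    return sorted(odds, key=odds.get, reverse=True)[:k]


model = train(profiles)
print(predict("deployed an nlp model and trained decision trees", model))
print(top_terms("Data Scientist", "Data Analyst", model))
```

The `top_terms` step is the part that mirrors the analysis in the text: ranking terms by how much more likely they are under one label than another shows which language the model leans on.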
I also cannot build clear capability boundaries between analysts, researchers, machine learning engineers, and several other fields. Asking for a definition of Data Scientist is like asking for a definition of a salad. There is no simple answer.
Breaking into the Field
Most aspiring data scientists are doing all the right things to break into the field. They are working on projects as part of their degree program or online training. They are landing internships to get experience in the corporate world. They are working in roles that support data science teams. They are analysts or software developers who reskill into the role.
What I learned from the classifier is the importance of capabilities. The profiles of data scientists (and of those about to be hired into that role) talk about models in a very specific way. Language varies, but these are some patterns I found. Sentences and sections pair core skills like Python with verbs like built, wrote, implemented, etc. Algorithm types are also connected with verbs like trained, deployed, improved, etc. There is a lot more nuance to most patterns, but they all describe applying skills.
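One rough way to check a resume for this kind of active language is to flag sentences where a skill and an action verb appear together. The sketch below is an assumption-laden illustration: the `ACTION_VERBS` set comes from the verbs listed above, the `SKILLS` set is a small assumed sample, and `applied_skill_sentences` is a hypothetical helper, not part of the classifier described in this post.

```python
import re

# Assumed word lists; a real check would need much larger vocabularies.
SKILLS = {"python", "spark", "hadoop", "tensorflow"}
ACTION_VERBS = {"built", "wrote", "implemented", "trained", "deployed", "improved"}


def applied_skill_sentences(text):
    """Return sentences that mention both a skill and an action verb."""
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    hits = []
    for sentence in sentences:
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        skills = words & SKILLS
        verbs = words & ACTION_VERBS
        if skills and verbs:
            hits.append((sentence, sorted(skills), sorted(verbs)))
    return hits


resume = (
    "Built a churn model in Python and deployed it to production. "
    "Familiar with Python and Spark."
)

for sentence, skills, verbs in applied_skill_sentences(resume):
    print(f"{sentence} -> skills={skills}, verbs={verbs}")
```

In this example only the first sentence is flagged. The second lists tools without describing anything done with them, which is the difference between sounding knowledgeable and sounding capable.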
This type of active language might explain why some hiring managers do not enforce all job requirements. There is support for the concept of equivalent qualifications. Breaking into the field allows for multiple sources of capability. It seems like hiring managers can tell the difference between candidates who are knowledgeable and candidates who are capable.
Changing How You Approach Getting Hired
Change how you read job descriptions:
- Ignore the Ridiculous.
- Infer the Capabilities They Are Looking For.
- There is No Such Thing as the Perfect Candidate.
- Projection. Imagine Yourself in the Job. Could You Realistically Do It?
Projection is a sales tactic that translates well into your job hunt. In projection, I will talk to you as if you already have the job. How has your first month gone? What do you think of your new boss and coworkers? What do they have you working on?
With projection, you change your understanding of the job description from flat words on a page to an actual role within the business. You insert yourself into that role and assess whether you can succeed.
After that thought exercise, you can customize your resume to explain your capability to do the job. As you project yourself into the job, answer the question, “How are you applying your education, project work, and experience to be successful?” Make small changes to each section so they answer that question.
Now you have inferred the capabilities they are looking for and built a resume that uses more active language.
Hiring managers rarely get the perfect candidate. Do not self-screen. If you have all the capabilities, you should apply. Most hiring managers are forced to put in arbitrary years of experience or education specifics. It is an outdated but widely adopted HR compliance practice.
Projection also helps you eliminate roles you are not ready for or that do not match your profile very well. Reduce the number of jobs you apply for by only choosing the ones you can realistically project yourself into.
Answering, “What do you think of your new boss and coworkers?” forces you to look at who you will be working with. This is a good opportunity to network. Many jobs stay open for months. You have time to connect, be an active follower, and build the early relationships that can help you network your way in.
Removing the assumption that a job description is an all-or-nothing document will change the way you look at breaking into the field. Change the way you look at your skills and focus on applications. Python is a programming language you use to build things. Talk about the process and potential to build rather than just the tool. Change your objective from being the perfect candidate to being a capable candidate and you will have a lot more success on your job hunt.