How do I become a data scientist? How do I get into machine learning? I get this question more than any other. It is a long, challenging road. I will not lie about that. That road is different depending upon where in your career you happen to be. I am going to outline roadmaps from a few common career checkpoints. Before I do, I need to start with some prerequisites.
What You Must Have
There is a great deal of flexibility in the data science skill set. Backgrounds include psychology, physics, math, economics, computer science, biology, and many others. Coding languages are all over the map. Areas of focus and industries have great variation. However, there are a few hard and fast requirements.
You must get math. I do not mean you have taken advanced math classes. I mean they made sense at an applied level. I am biased towards physics, but economics has similar advantages; they teach a student how to apply advanced math, not simply how to solve advanced problems.
Coding and data structures come naturally. Platforms are easy to pick up. I have learned a dozen, maybe more, programming languages over my career. I never had trouble with them because they are just objects, functions, interfaces, data structures, and loops redone in different formats. (I know that Quipper and other quantum programming languages break that paradigm…but let’s not go there now.) That is the level of comfort you need to have with programming languages to succeed as a data scientist. I can go from C to R to Java to Python to C# in the span of a year’s worth of projects. The same goes for databases: Mongo to MySQL to Cassandra to MSSQL to Dynamo. For real-world projects, Excel and SAS will not do the job.
You run towards problems, not away from them. An analytical, solution-driven mind is critical to succeed as a data scientist. If you get a little excited every time something goes wrong and get a rush when you solve a problem, you are exactly the right person for this field. Models are like puppies. They need constant attention and you must be ready to clean up a lot of messes.
Starting Point – Recent College Graduate (Bachelor’s)
As a recent grad, you have two jobs over the next couple of years: graduate from a good Master’s program, and get some hands-on, real-world experience. Here are my recommendations for Master’s programs.
The Standards: UC Berkeley, USC, Stanford, MIT, University of Washington
Excellent Emerging Programs: University of Illinois, Northwestern, Arizona State, University of San Francisco, NYU
These recommendations are based on a combination of reputation in the community as well as my experiences with students and graduates. It is by no means exhaustive. I get asked a lot about how important your undergrad program is. It can help to start in the program you want to get your Master’s in but otherwise it is not particularly important. Can you get a data science job without a Master’s? Yes. It will limit your options and might put a ceiling on your career but there are several companies that hire data scientists with just a Bachelor’s.
Your second job is to get some real-world experience. I started working in tech at 19 doing odd jobs installing networks, building servers, creating websites, and administering databases. I graduated with 3 years of experience in my field and a portfolio of projects. Getting hired for my first job was a lot easier.
Getting real-world experience can be tricky while you are in school. Internships have a wait list and few companies want to hire a part-time, in-school data scientist. My advice: start going to hackathons sponsored by well-known companies like Google, Facebook, or IBM. I was at the IBM Connect conference earlier this year and they had a hackathon associated with it. The winners were chosen at 4:30. By 9:30 they all had job offers in the Watson group.
Contribute to open source projects in the data science and machine learning space. This is another area that companies source talent from. It is also a great way to get your name out in the community. Quality contributions are the start of a strong reputation in the field. Podcasts and videos can do the same. Share what you are learning with others. Interview your professors and colleagues. You could be one view away from your first job offer.
Look for opportunities to be part of research projects. Getting one or two peer reviewed papers published while you are going to school looks great on a resume. An interesting project can become a research paper with a little extra work. Remember that your Master’s Thesis should be published so make sure to select a topic in a field you’d like to work in. Also think about starting a company. A research project can easily be the foundation of a marketable product. Think about practical applications for everything you learn. Running a startup during college, even one that does not get very far, is good project and business experience.
Starting Point – Recent College Graduate (Master’s)
If you are coming out of a data science program with some experience in the field, you are headed into your first job. It will not be hard to find. There is a massive shortage of talent. A few pieces of advice here. For your first job, avoid recruiters. Once you have a better feel for your career path, the field, and what companies are looking for, recruiters are an amazing resource. As tempting as it will be, do not jump on the first offer. Explore your options fully, then decide on the best place for you and your career. Find a good mentor. Having someone to bounce ideas off of who has been there is invaluable.
That last paragraph was for the people who do not really need this article. If you come out of a Master’s program in data science or machine learning with experience and a clear idea of what you want to do, you’re pretty much set. The reality for most people is far different. Many have just graduated from a Master’s program outside of data science. Many do not have experience in the field. A lot of recent graduates are just starting to explore the idea of a career in machine learning or data science.
Start by making a choice between two major branches. Do you want to build solutions or research solutions? Building is applied, practical, and hands-on. You enjoy seeing your work in production and in users’ hands. You want to see it in real life and be able to point at products you have built. Research is theoretical and free from the constraints of making it work as a product. Proof of concept is as far as research goes. Most of my career has been on the applied side. I give the theoreticians a hard time, but my solutions sound so great to my clients because I draw heavily from research.
If you want to do research, you are headed back to school for a PhD. It is an absolute necessity for the theoretical side. This is typically the path recent graduates outside of the data science field go down. If you want to build solutions, it is time to get some real experience. Refer to the last section’s overview of getting some data science and machine learning experience. If you’re coming from a Master’s program outside of data science and you feel drawn to the applied side, your journey is a bit tougher. Skip ahead to the next section where I discuss coming into the field from a different discipline.
In the meantime, look for a job related to the field. There are a lot of software development, analyst, quant, and research assistant roles which will get you exposure to data science. If you are qualified for one of those, start sending out your resume. You will be promoted into the field as fast as your skills and experience advance. As I said, there is a talent shortage and companies are getting creative to fill their openings. Do not be surprised if your company is willing to pay for training or additional certifications to get you up to speed.
Starting Point – Working Professional
This can be the toughest transition point into data science or machine learning. If you are in a tangential field like data engineering, there is probably only a small skills gap. For a software developer, product manager, scientist, or anyone in a field that does not have much overlap with data science, the skills gap is much larger. Step 1 is assessing your relevant skills.
You need a lot of math. Calculus, finite math, logic, linear algebra, probability and statistics, and graph theory at a minimum should be well understood.
You need average coding skills across a backend language like Java, C#, or C/C++ as well as a data science/machine learning heavy language like Python or R.
You need to know at least one common SQL database and one NoSQL database. Understanding framework pieces like Hadoop, Spark, Amazon ML, and Azure ML is also important.
You need to be able to convert business problems into algorithmic or model solutions. You also need to be able to convert a research paper into a solution to a business problem. Math provides the tools and common language. Analytical thinking ties those tools to solutions.
Familiarity with data visualization software/libraries and best practices is an often-overlooked must-have. Excellent communication skills are needed. Without the ability to visualize and communicate results, all the other skills go to waste.
If you have a Bachelor’s or less, you’re probably going back to school. There are several good programs with work-friendly options for students. Pick one that works with your schedule, so you stick with it. Your current employer will likely help with tuition. Select classes to fill your individual skills gap. If you have a small skills gap or need a refresher on topics you have not been using, look at one or two certifications. Code boot camps are great refreshers if you have been out of coding for a few years. No matter how large your skills gap, work with your employer on a career path that will take you into a data science or machine learning role. If your current company does not offer one, look for a job somewhere that does. Transitioning within your current employer will be much easier than getting all the skills and then making a big jump somewhere else.
Do not let age, skills gap, or your current job deter you from making a change if it is what you really want to do. If you are 40, work in accounting, and have not taken any math higher than Algebra, you can still have a 20+ year career in machine learning once you are done with school. If you have the requirements from the first section and the desire, you will be fine.