Data Scientists need to understand the tools and technologies they use to build models. Especially for these types of questions, memorization will not get you hired. You need to go beyond showing knowledge of these tools to showing competency with them.
In some interviews, these questions are not explicitly asked. You need to know when and how to include them in answers to broader questions.
Why Are You Getting Asked Technical Questions?
The interviewers have a few objectives: assess your coding maturity, assess your ability to develop in a team, and assess your familiarity with basic development tools.
What Kind of Questions Can You Get?
These are high-level questions, and you can get a number of variations, or no question at all. You should cover these technical concepts in any case as part of model development or project and prior experience walkthroughs. If you are given a sample project, make sure you incorporate these into your final submission or your walkthrough during the interview.
What are Coding Best Practices?
“I write code to be readable, maintainable, easily reused, and to fit the team’s development standards. Comments and documentation should explain my code, so it is readable by data scientists and people outside of the field. I format my code to match the team’s style guide. I include unit or functional tests where possible.”
The basics of best practices are about look and feel more than functionality. Your answer needs to focus on your role within a larger team. The sample answer makes clear that my code does not exist in a vacuum. Coding is always a team sport even if you start out as the only team member. Show your basic technical maturity in this answer.
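A minimal sketch of what "readable, documented, and tested" can look like in practice. The function name `min_max_scale` and its behavior on constant input are illustrative assumptions, not something a particular team standard prescribes.

```python
from typing import List, Sequence


def min_max_scale(values: Sequence[float]) -> List[float]:
    """Rescale values linearly to the range [0, 1].

    A constant input has no spread to rescale, so every element maps to 0.0
    (an assumption made for this sketch). Raises ValueError on empty input.
    """
    if not values:
        raise ValueError("values must not be empty")
    lowest, highest = min(values), max(values)
    spread = highest - lowest
    if spread == 0:
        return [0.0 for _ in values]
    return [(v - lowest) / spread for v in values]


def test_min_max_scale_maps_extremes_to_zero_and_one():
    # The test name states the expected behavior, so a failure explains itself.
    assert min_max_scale([2.0, 4.0, 6.0]) == [0.0, 0.5, 1.0]
```

The docstring covers the edge cases a teammate would otherwise have to discover by reading the body, and the test doubles as a usage example.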
Do you prefer R or Python?
“I am more familiar with Python so that is my go-to. They both have support and advanced functionality for machine learning, I just know Python better.”
This is a bear trap question. Use this answer and substitute R for Python if you have used R more. Spaces vs tabs for indenting and variable naming conventions are other bear traps. Mature developers answer these questions diplomatically. You must answer with flexibility and without taking shots at different approaches.
What source control have you used?
“I am most familiar with Git. I version both code and datasets. I use branches for different versions of the model, especially when I am customizing. Detailed commit comments have saved me when I needed to roll back after multiple unsuccessful improvement attempts.”
In Data Science there are many more source control options to choose from. I like Git, but substitute your favorite or whatever you are most familiar with. Hit the basics of versioning data and code. This is a basic question, and your answer must speak to your maturity with applied machine learning in a team environment.
What are unit tests and functional tests?
“Unit tests are code written to run during the build or check-in and validate individual component functionality. Functional tests validate, as the name implies, core functionality. Unit tests for models are time consuming to run and can slow down check-ins and builds. I use mock objects to speed up unit tests as much as possible.
There is a time balancing aspect to unit testing. The first version often gets released with only the critical unit tests if there is a time crunch. During maintenance and improvement, we can go back and add coverage.
I have used unittest, pytest, and some TensorFlow testing. However, I spend a bit of time Googling if I haven’t written unit tests in a while.”
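The mock object idea from the answer above can be sketched as follows. The helper `flag_high_risk`, its `threshold` parameter, and the `predict` method on the model are hypothetical names chosen for illustration; only `unittest` and `unittest.mock.Mock` come from the Python standard library.

```python
import unittest
from unittest.mock import Mock


def flag_high_risk(model, rows, threshold=0.8):
    """Return the indices of rows whose predicted score meets the threshold."""
    scores = model.predict(rows)
    return [i for i, score in enumerate(scores) if score >= threshold]


class FlagHighRiskTest(unittest.TestCase):
    def test_only_scores_at_or_above_threshold_are_flagged(self):
        # Stand-in for a trained model: nothing is trained or loaded,
        # so the test runs instantly during a build or check-in.
        fake_model = Mock()
        fake_model.predict.return_value = [0.10, 0.80, 0.95]

        flagged = flag_high_risk(fake_model, rows=[[1], [2], [3]])

        self.assertEqual(flagged, [1, 2])
        fake_model.predict.assert_called_once_with([[1], [2], [3]])
```

Saved to a file, this can be run with `python -m unittest`. The test checks the logic around the model rather than the model itself, which is what keeps it fast.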
Unit tests are necessary, so you need to be able to talk about the basics. You may be asked to describe what unit tests can cover. The answer is everything. You can unit test data. You can unit test training and on and on.
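As one possible illustration of unit testing data, the sketch below checks a small sample for missing values and unexpected labels. The function `find_data_problems`, the column names, and the allowed label set are all assumptions made up for this example.

```python
def find_data_problems(rows, required_columns, allowed_labels, label_column="label"):
    """Return a list of human-readable problems found in a list of dict rows."""
    problems = []
    for index, row in enumerate(rows):
        # A column counts as missing if it is absent or explicitly None.
        missing = [c for c in required_columns if row.get(c) is None]
        if missing:
            problems.append(f"row {index}: missing {missing}")
        label = row.get(label_column)
        if label is not None and label not in allowed_labels:
            problems.append(f"row {index}: unexpected label {label!r}")
    return problems


def test_training_sample_is_clean():
    # Hypothetical sample rows; a real test would load a small fixture file.
    sample = [
        {"age": 34, "income": 52000.0, "label": 1},
        {"age": 51, "income": 78000.0, "label": 0},
    ]
    assert find_data_problems(sample, ["age", "income", "label"], {0, 1}) == []
```

Returning a list of problems instead of raising on the first one makes a failed check-in easier to diagnose, since every bad row is reported at once.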
Unit testing is simple to code and complex to implement. For machine learning, unit testing requires a lot of judgement. My answer touches on that by referencing the balancing act of coverage, impact on check-ins and builds, and the realities of deadlines.
Data Scientists write unit tests because most downstream developers do not understand models well enough to write effective unit tests. This is an applied machine learning concept. Answering this question or including it in a project description gets you hired.
You do not need to be able to pass a software developer’s test to be an excellent Data Scientist. You need best practices to get hired. You need to be able to walk through at least these questions to show a level of maturity. Applied machine learning is not knowledge of a tool, it is the capability to build with it. Displaying that in an interview will get you hired.