How to Answer Questions about Statistical Significance and Model Selection in a Data Science Interview

Model selection questions require advanced answers that focus on your process for applied machine learning. A theoretical answer will not get you hired.

Vin Vashishta | Originally Published: October 13th, 2020


This is a common and complex line of questioning in data science interviews. You can give an accurate answer and still bomb the interview, so you need to know how to communicate your expertise.

Applied machine learning is real-world and results-driven. Hypothesis testing, statistical significance, model selection, optimization, and iterative training and testing cycles are the basics of building a solution. Answers that get you hired cover both foundational statistics and their applications. Without both parts, the interview is all but over.

Why Are You Getting Asked Statistical Significance Questions?

Interviewers have multiple assessment criteria:

  • Your comprehension and synthesis of applied statistics.
  • Your capability for model evaluation and selection.
  • Your ability to objectively support model performance in production.

How Do You Answer Statistical Significance Questions?

These are process questions. Your answers must cover statistical first principles, their application to model evaluation, and their impact on production-ready models. Even a question that sounds simple needs a process answer.

What is a Null Hypothesis?

“There is no single answer here, only tradeoffs. Foundationally, a Null Hypothesis is the beginning of an experiment and defines its purpose. We can evaluate existing data to form a best guess about some condition, and the Null Hypothesis gives us a simple way to frame whether the data supports that guess.

Where do we go from here? We have made an assertion that is supported by some amount of data. The level of support is measured with one of a wide range of statistical significance measures. There is no one-size-fits-all or standard choice based on the problem or model evaluation type alone; there are too many factors.

Now we must make the experiment more rigorous. Can we provide a level of proof for our assertion that will stand up to review?

The Null Hypothesis and initial data exploration have indicated an experimental design. The statistical significance measure has given us supporting evidence that the experiment is worth performing. From this foundation, we move forward with everything from model selection to model performance evaluation.”

If you evaluate this answer strictly in terms of statistical concepts, it is incomplete at best and could be considered wrong at worst. However, foundational statistics operate under precise assumptions that do not extend cleanly to the real world. For that reason, an applied answer is different from a theoretical one.

That answer walks through my process. The Null Hypothesis starts as a business problem, a model selection problem, and so on. I either validate or refute it using a subset of the available data if there is a lot, or all of the available data if there is very little. The result becomes either the justification for the experiment I designed from the initial problem statement and the solution(s) under consideration, or a reason to abandon it.

Your answer must show, first, that you have a process and, second, that the process is foundationally sound. You must synthesize statistical concepts and demonstrate your capability to apply them in the real world.
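To make the "validate or refute on held-out data" step concrete, here is a minimal sketch in Python of one way it could look. It assumes a classification model, a Null Hypothesis of "the candidate model's accuracy is no better than a baseline rate," and an exact one-sided binomial test. The function name `binomial_p_value`, the 62-of-100 result, and the 0.5 baseline are hypothetical illustrations, not the author's method or a prescribed choice of test.

```python
from math import comb


def binomial_p_value(successes: int, trials: int, p0: float) -> float:
    """Exact one-sided binomial test (hypothetical helper for illustration).

    H0: the true success rate is at most p0 (no better than baseline).
    H1: the true success rate is greater than p0.
    Returns P(X >= successes) for X ~ Binomial(trials, p0).
    """
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    if not 0.0 <= p0 <= 1.0:
        raise ValueError("p0 must be a probability")
    # Sum the binomial probabilities of every outcome at least as extreme.
    return sum(
        comb(trials, k) * p0**k * (1 - p0) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical result: the candidate model classifies 62 of 100 held-out
# examples correctly; the assumed baseline (e.g. majority-class) rate is 0.5.
alpha = 0.05  # significance level, fixed before looking at the result
p = binomial_p_value(62, 100, 0.5)
print(f"p-value = {p:.4f}")
print("reject H0" if p < alpha else "fail to reject H0")
```

The direction of a one-sided test has to be chosen before seeing the results, and the binomial test assumes the held-out examples are independent. Naming assumptions like these is part of showing that the process is foundationally sound.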

What is a Significance Test?

“I am going to make an assertion, you are going to challenge it, and I am going to have to support it. I can use a significance test to provide some level of support for my assertion, and you may be able to find another significance test that calls it into question. You can also find fault with my choice of that specific test, because every test has shortcomings.

I spend time researching the potential significance tests for my specific problem, solution, and data. My research focuses on the nuances of their shortcomings. I use the significance test metric to support my model selection and my model performance estimates, and sometimes to support specific model parameters.

I research them because part of my presentation needs to cover my blind spots. Presenting a score alone is insufficient. A thorough defense or justification requires me to explain what I cannot support. That allows for informed peer review.”

That is the point of all this: review. These lines of questioning start with modes of evaluation but lead to rigorous evaluation frameworks. In very small teams, evaluation may be the responsibility of a single person. That is scary because there is no safety net. If you make a mistake, it ends up in production and could blow up at any time.

The ability to self-evaluate in small teams and to peer-evaluate in larger teams is what you need to demonstrate to get hired. Your answer must give interviewers confidence that you are a contributor with a deliberate, mature approach to model selection and validation.
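The significance-test answer, including its emphasis on blind spots, can also be made concrete. The sketch below, in Python, assumes you have per-fold scores for two candidate models evaluated on the same cross-validation folds and compares them with an exact sign-flip (paired permutation) test. The function name `paired_permutation_test` and the fold scores are hypothetical, and this is one possible test among many, not a recommended default.

```python
import itertools
from statistics import mean


def paired_permutation_test(scores_a, scores_b):
    """Exact two-sided sign-flip test on paired per-fold score differences.

    H0: both models perform equally, so each per-fold difference is
    equally likely to be positive or negative.
    Returns the fraction of sign patterns whose |mean difference| is at
    least as large as the observed one (the observed pattern included).
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(mean(diffs))
    # Enumerate every assignment of +/- signs to the differences.
    patterns = list(itertools.product((1, -1), repeat=len(diffs)))
    extreme = 0
    for signs in patterns:
        flipped = mean([s * d for s, d in zip(signs, diffs)])
        # Small tolerance so floating-point noise does not drop ties.
        if abs(flipped) >= observed - 1e-12:
            extreme += 1
    return extreme / len(patterns)


# Hypothetical accuracy per fold for two models on the same 5 folds.
model_a = [0.81, 0.79, 0.84, 0.80, 0.83]
model_b = [0.78, 0.77, 0.80, 0.79, 0.80]

p = paired_permutation_test(model_a, model_b)
print(f"p-value = {p:.4f}")  # 2 of the 32 sign patterns are as extreme
```

Model A wins on every fold, yet the p-value is 2/32 = 0.0625. With only five folds there are 2^5 = 32 sign patterns, so this test can never go below 0.05. Fold scores are also correlated because cross-validation training sets overlap, so any p-value computed from them is approximate. These are the kinds of limitations the answer says to present alongside the score.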


    I have skipped over customized model and significance test concepts. There is too much depth to cover in a post of any size.

    As you answer interview questions, keep your communication objectives in mind. Your goal is to improve your odds of getting hired. You have a limited amount of time and attention. Your answers must convey capability over memorization.