How to Answer a Statistics Question in a Data Science Interview

Your answer to an interview question aims at getting you hired. Everyone will tell you what to memorize. Everyone knows memorization does not get you hired. You need to know how to answer a statistics question.

Vin Vashishta | Originally Published: October 8th, 2020


Why are You Getting Asked a Statistics Question?

The interviewers have a few objectives:

  • Gauge how well you understand the inner workings of machine learning models.
  • Gauge your comprehension of statistical core concepts.
  • Gauge your ability to clearly communicate complex concepts to the team.

How Do You Answer to Get Hired?

    Basic statistics questions, covering everything from averages to standard deviation, are core concept questions. Give simple, concise answers. For example: What is Standard Deviation?

    “Standard Deviation gives you a simplistic understanding of the characteristics of your data relative to the population/sample mean.”

    Let’s break down that answer. “Simplistic” tells interviewers that you understand the limits of what Standard Deviation can reveal. “Population/Sample” tells the interviewer that you understand the concept applies to either and that the distinction is worth mentioning. “Characteristics of your data” tells the interviewer that you connect the concept to an application in data analysis.
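If the interviewer pushes on the population/sample distinction, it helps to have the mechanics at hand. The sketch below is a minimal illustration, not part of the original answer: the data values are made up, and it uses only Python's standard `statistics` module. The population version divides the squared deviations by n, while the sample version divides by n - 1.

```python
import math
import statistics

# Hypothetical data, made up for illustration (e.g. response times in ms).
data = [12.0, 15.0, 11.0, 14.0, 18.0]

mean = statistics.mean(data)

# Population standard deviation: squared deviations divided by n.
population_sd = statistics.pstdev(data)

# Sample standard deviation: squared deviations divided by n - 1
# (Bessel's correction), used when the data is a sample of a larger population.
sample_sd = statistics.stdev(data)

# The same quantities computed by hand, to show what the library calls do.
squared_deviations = [(x - mean) ** 2 for x in data]
manual_population_sd = math.sqrt(sum(squared_deviations) / len(data))
manual_sample_sd = math.sqrt(sum(squared_deviations) / (len(data) - 1))

print(f"mean={mean:.2f}")
print(f"population_sd={population_sd:.3f} sample_sd={sample_sd:.3f}")
```

On any dataset with more than one distinct value, the sample figure comes out slightly larger than the population figure, and the gap shrinks as the sample grows.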

    Can you poke holes in that answer? Yes, and that gives you the chance to go deeper. The initial response is both correct and general enough to defend by elaboration.

    Explain Central Limit Theorem.

    “In probability theory, the central limit theorem (CLT) establishes that, in many situations, when independent random variables are added, their properly normalized sum tends toward a normal distribution (informally a bell curve) even if the original variables themselves are not normally distributed.” – Wikipedia

    Wrong. This is memorization, not comprehension. Let’s try again.

    “CLT relates to linear machine learning models in two ways. It allows us to complete significance testing to compare performance between models and it allows us to understand the confidence interval for those models’ practical performance.”

    Go back to why you are being asked statistics questions. The second answer hits all three objectives. Notice I did not give the formal definition. I did not blunder my way through the differences between LLN and CLT or drone on about the Gaussian and multiple sample means. Comprehension, applied, clearly communicated.
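The two applications in that answer can be made concrete if a follow-up question asks for it. What follows is a rough sketch under stated assumptions, not a prescribed method: the per-example outcomes are simulated, the function names `accuracy_ci` and `two_proportion_z_test` are hypothetical, and the two models are assumed to be scored on independent test sets. If both models were scored on the same examples, a paired test would be more appropriate. Both functions lean on the CLT by treating accuracy, which is a sample mean of 0/1 outcomes, as approximately normal.

```python
import math
import random
from statistics import NormalDist


def accuracy_ci(correct, confidence=0.95):
    """Normal-approximation confidence interval for accuracy.

    `correct` is a list of 0/1 outcomes, one per test example.
    """
    n = len(correct)
    p = sum(correct) / n
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, p - half_width, p + half_width


def two_proportion_z_test(correct_a, correct_b):
    """Two-sided z-test for a difference in accuracy.

    Assumes the two models were evaluated on independent test sets.
    """
    n_a, n_b = len(correct_a), len(correct_b)
    p_a, p_b = sum(correct_a) / n_a, sum(correct_b) / n_b
    pooled = (sum(correct_a) + sum(correct_b)) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Simulated outcomes for two hypothetical models (illustration only).
random.seed(0)
model_a = [1 if random.random() < 0.85 else 0 for _ in range(1000)]
model_b = [1 if random.random() < 0.80 else 0 for _ in range(1000)]

acc, low, high = accuracy_ci(model_a)
z_stat, p_val = two_proportion_z_test(model_a, model_b)
print(f"model A accuracy={acc:.3f}, 95% CI=({low:.3f}, {high:.3f})")
print(f"z={z_stat:.2f}, p-value={p_val:.4f}")
```

The normal approximation is weakest with small test sets or accuracies near 0 or 1, which is exactly the kind of limit worth naming when the interviewer starts poking holes.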

    Again, you can poke holes in the answer.

    Why Does This Help You Get Hired?

    Remember, interview answers are going to be interpreted and often questioned. Accept it and be ready to follow up. Trying to build an answer that covers every possible base will lead to more questions and interpretations, not fewer.

    Statistics questions are applied math and that makes them different from most questions. There is too much nuance to fully cover and once you start down that road, you are stuck trying to fill in all the gaps. That is not going to get you hired.

    In both cases, my full answer would be most people’s concluding remark. This is critical to answering statistics questions. I have restated the concept in my own words. I have spent most of my answer describing applications. Both pieces show that I understand the concept well enough to go beyond the canned answer.

    That is how you get hired.

    What Else Could You Be Asked?

    There is a lot more ground to cover. This was week one of an introductory statistics class. However, these are two common Data Science Interview questions. In my next posts I will be diving into more statistics questions. Remember, your answers need to get you hired. That requires a better answer.

    If you have specific questions you want me to answer, send them to me on Twitter @V_Vashishta. I will answer the best ones in posts to come.