How to Answer a Data Science Interview Question About Sampling

Answering an interview question is different from answering a question in class or in most other settings. The point of your answer is to get you hired. A textbook answer will not get you hired. Getting hired requires you to stand out.

Vin Vashishta | Originally Published: October 9th, 2020


Why Are You Getting Asked a Question About Sampling?

Interviewers have several reasons for asking sampling questions.

  • Gauge your applied statistical comprehension.
  • Hear about your experience with large or complex datasets.
  • Explore your understanding of experimental design.
  • Gauge your understanding of the impacts of sampling on model performance.

How Do You Answer a Question About Sampling?

    Sometimes you get a simple sampling question, such as “Name a few types of sampling methods.”

    “There are probability and non-probability methods. A few probability methods are simple random, stratified, cluster, and systematic sampling.”

    The interviewer may push you for more or ask about non-probability methods. Use brief answers and try to move on from simple questions quickly.
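If the interviewer does push for more on the probability methods, it helps to have a mental picture of how two of the less familiar ones work. The sketch below is a minimal illustration using only the Python standard library. The record ids, the "store" groupings, and the function names are all made up for the example.

```python
import random


def systematic_sample(items, step, rng):
    """Take every `step`-th item after a random starting offset."""
    start = rng.randrange(step)
    return items[start::step]


def cluster_sample(clusters, n_clusters, rng):
    """Pick whole clusters at random and keep every item inside them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical data: 100 record ids, and the same ids grouped into 10 "stores".
records = list(range(100))
stores = {f"store_{i}": records[i * 10:(i + 1) * 10] for i in range(10)}

every_tenth = systematic_sample(records, step=10, rng=rng)
picked_stores = cluster_sample(stores, n_clusters=3, rng=rng)
```

Systematic sampling walks the list at a fixed interval, while cluster sampling keeps or drops entire groups. That difference is usually enough to anchor a one-sentence explanation of each.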

    Explain random sampling.

    “Random sampling gives each data point an equal chance of being included in the sample. In more complex datasets, you can use Monte Carlo methods or other sampling methods to avoid some of the sampling issues you will encounter.”

    Again, a simple, brief answer. However, notice the last sentence. It opens the door to a more complex line of questioning. That is a key aspect I will come back to repeatedly. Leave openings to showcase your knowledge.
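If the interviewer takes that opening, a few lines of Python can back up the definition. The sketch below assumes a made-up population of order values and uses only the standard library. The repeated draws at the end are one simple Monte Carlo style check on how much a sample statistic moves from sample to sample, not the only way to apply Monte Carlo methods to sampling.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 order values.
population = [rng.gauss(50, 15) for _ in range(10_000)]

# Simple random sample without replacement:
# every data point has the same chance of being drawn.
sample = rng.sample(population, k=500)

# Monte Carlo style check: repeat the draw many times
# and measure how much the sample mean varies between draws.
sample_means = [
    statistics.fmean(rng.sample(population, k=500)) for _ in range(200)
]
spread = statistics.stdev(sample_means)
```

A small `spread` relative to the population mean suggests a sample of this size is stable enough for early exploration. A large one is a cue to draw more data or switch methods.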

    Explain stratified sampling. If you get asked to define this method rather than simple random sampling, you could be heading down a more complex line of questioning. Anticipate these types of questions. Recognize them immediately and shape an answer to cover multiple bases.

    “Stratified sampling assumes you have done a complete initial analysis of your data. You have begun to explore the system under measurement with respect to multiple variables. Those variables are used to segment different classes of data. Each class is assumed to be important for the model to learn the system more completely. In many cases, stratified sampling reveals classes with insufficient data to model the system completely. Stratified sampling is useful in complex systems models where massive datasets are unavailable for early model exploration or the objective is causal modeling. It is very useful for research in behavior, marketing, and many other experiment- or survey-design-heavy projects.”

    If you have spent most of your time focused on machine learning basics, this answer makes no sense. It does not contain references to Python or imbalanced classes in the way you are probably accustomed to discussing sampling.

    Conclude your answer with follow-up questions.

    “Do you want to see a simple implementation of this in Python?”

    You could do a screen share for pseudocode or write a code snippet to show your technical capabilities. Implementations for complex sampling are a sign of applied statistical capabilities. This is real-world data analysis and wrangling.
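A simple implementation might look something like the sketch below. It is only an illustration under stated assumptions: the customer records, the `segment` field, and the `stratified_sample` helper are hypothetical, and it uses proportional allocation, which is one of several ways to size each stratum.

```python
import random
from collections import defaultdict


def stratified_sample(rows, key, fraction, rng, min_per_stratum=1):
    """Draw the same fraction from every stratum defined by `key`."""
    strata = defaultdict(list)
    for row in rows:
        strata[row[key]].append(row)

    sample = []
    for label in sorted(strata):
        members = strata[label]
        # Proportional allocation, but never fewer than min_per_stratum rows
        # and never more than the stratum actually holds.
        k = min(len(members), max(min_per_stratum, round(len(members) * fraction)))
        sample.extend(rng.sample(members, k))
    return sample


rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Hypothetical customers: a large "retail" segment and a small "enterprise" one.
customers = (
    [{"id": i, "segment": "retail"} for i in range(900)]
    + [{"id": i, "segment": "enterprise"} for i in range(900, 1000)]
)

sample = stratified_sample(customers, key="segment", fraction=0.1, rng=rng)

# Count how many rows each segment contributed to the sample.
counts = defaultdict(int)
for row in sample:
    counts[row["segment"]] += 1
```

The `min_per_stratum` floor is a design choice worth mentioning out loud. It keeps a small class from disappearing from the sample, and a stratum that only reaches the floor is a signal that the class may not have enough data to model.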

    “Do you want me to walk through a few examples of how to explore datasets to determine which variable(s) I will create my segments around?”

    You would explain how to select the variables so the resulting model would have access to representative sets for each class of data. Think about an example in marketing. Customer segments/cohorts behave differently. A company will often gather several data points about each customer. Which data points should you use to define a segment?
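One way to make that question concrete is to score each candidate variable by how much of an outcome's variation it explains when you group on it. The sketch below uses eta squared (between-group sum of squares divided by total sum of squares) as that score. This is one heuristic among many, and the customer records, field names, and `variance_explained` helper are invented for the example.

```python
import statistics
from collections import defaultdict


def variance_explained(rows, variable, outcome):
    """Share of the outcome's variance explained by grouping on `variable`."""
    values = [row[outcome] for row in rows]
    grand_mean = statistics.fmean(values)
    total_ss = sum((v - grand_mean) ** 2 for v in values)

    groups = defaultdict(list)
    for row in rows:
        groups[row[variable]].append(row[outcome])

    # Between-group sum of squares: group size times squared distance
    # of the group mean from the overall mean.
    between_ss = sum(
        len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups.values()
    )
    return between_ss / total_ss if total_ss else 0.0


# Hypothetical customer records: spend differs by plan but barely by region.
customers = [
    {"plan": "basic", "region": "east", "monthly_spend": 20},
    {"plan": "basic", "region": "west", "monthly_spend": 22},
    {"plan": "basic", "region": "east", "monthly_spend": 18},
    {"plan": "basic", "region": "west", "monthly_spend": 21},
    {"plan": "premium", "region": "east", "monthly_spend": 80},
    {"plan": "premium", "region": "west", "monthly_spend": 78},
    {"plan": "premium", "region": "east", "monthly_spend": 83},
    {"plan": "premium", "region": "west", "monthly_spend": 79},
]

scores = {
    var: variance_explained(customers, var, "monthly_spend")
    for var in ("plan", "region")
}
best = max(scores, key=scores.get)
```

In this toy data, `plan` separates customers who behave differently and `region` does not, so `plan` would be the stronger candidate for defining segments. With real data you would also weigh domain knowledge and check that each resulting segment is large enough to sample from.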

    This is a key concept in initial data analysis and experimental design. You want to show the interviewers that you do more than dump a large dataset into a deep learning model for training. The reason behind sampling is tied to the flaws in that approach. Sampling improves both efficiency and accuracy. Make sure your answer covers both, no matter what sampling question you get.

Answer Questions to Get Hired

    Always keep your objective at the front of your mind. Your answers should each improve your probability of getting hired. Optimize your answers. Sampling questions give you an opportunity to stand out from the crowd. Be prepared to take that opportunity.