How to Answer Questions About GPUs, Optimization, and Scaling in a Data Science Interview

Optimization and scaling are core competencies of Applied Machine Learning. If you can cover these topics well, you will stand out from the crowd.

Vin Vashishta | Originally Published: October 18th, 2020


Why Are You Getting Asked Scaling and Optimization Questions?

Optimization and scaling are advanced lines of technical interview questioning. Interviewers bring them up for a few reasons:

  • You can save your company millions with a few lines of code.
  • Interviewers are assessing what level of model complexity you can build and implement.
  • Interviewers are assessing your familiarity with tools and frameworks.

What Kind of Questions Can You Be Asked?

    The question below is contrived so I can build you an answer framework. In a real interview, this topic often comes up while you describe a past project or as an offshoot of a model development question.

    What is the Difference Between a CPU and a GPU? What Impacts Do They Have on Optimization and Scale?

    “The main differences are which operations each is optimized for and how its memory supports those operations. The CPU is a general-purpose processor with a few cores, relatively low memory bandwidth, and a large amount of memory available per core. The GPU has many more cores and much higher memory bandwidth, but far less memory available per core.

    GPUs apply the same instruction across many data elements at once, much faster than a CPU can. GPUs rely on data parallelism and handle floating-point operations efficiently, which makes them better at workloads like linear algebra. This is the end of the story for most models.

    There is a limit to the GPU’s advantage. As arithmetic intensity increases, I need to adapt the code to optimize its use of the GPU. This is where CUDA comes into play. Optimization at this low a level is complex, and the CUDA toolchain’s compiler and profiling tools help identify what to optimize.
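    (To make arithmetic intensity concrete: it is the ratio of floating-point operations performed to bytes moved between memory and the processor. The two cases below are idealized, assuming float32 data at 4 bytes per element and that each input and output crosses memory exactly once.)

$$
I = \frac{\text{FLOPs performed}}{\text{bytes moved to and from memory}}
$$

$$
\text{Vector add } c_i = a_i + b_i,\ i = 1, \dots, n: \qquad I = \frac{n}{3 \cdot 4n} = \frac{1}{12}
$$

$$
\text{Matrix multiply } C = AB,\ A, B \in \mathbb{R}^{n \times n}: \qquad I \approx \frac{2n^3}{3 \cdot 4n^2} = \frac{n}{6}
$$

    The vector add stays at 1/12 FLOP per byte regardless of n, so it is memory-bound, while a 4096 × 4096 matrix multiply reaches roughly 680 FLOPs per byte, which is why dense linear algebra is where GPUs pull furthest ahead.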

    NVIDIA provides GPU profiling tools that show processing and memory usage statistics. Those give me an idea of what needs optimization. A little code can go a long way. I review the code for anything running on a single thread. A lot of older NumPy and Pandas code runs on a single thread, and migrating it to Dask DataFrames, which process partitions of the data in parallel, is one way to fix that. (Expand with a tangible example here.)
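    (One tangible example, as a minimal sketch: Dask DataFrames split the data into partitions, compute a partial result per partition (on a thread pool by default on a single machine), then combine the partials. The code below reproduces that partition, map, and combine pattern with only Python's standard library; the function names are hypothetical. With Dask itself, the equivalent is roughly `dd.from_pandas(df, npartitions=4)["price"].mean().compute()` after `import dask.dataframe as dd`, where `df` and the `"price"` column are placeholders.)

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(values, n_partitions):
    """Split a list into at most n_partitions contiguous chunks of near-equal size."""
    size = max(1, -(-len(values) // n_partitions))  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def partial_sum_count(partition):
    """Per-partition step: keep (sum, count), not a mean, so partials combine exactly."""
    return sum(partition), len(partition)


def parallel_mean(values, n_partitions=4, max_workers=4):
    """Mean computed by mapping over partitions in a thread pool, then combining."""
    if not values:
        raise ValueError("parallel_mean() needs at least one value")
    partitions = split_into_partitions(values, n_partitions)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(partial_sum_count, partitions))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


print(parallel_mean(list(range(1, 101))))  # 50.5
```

    Threads mirror Dask's default for DataFrames. For CPU-bound pure-Python work, CPython's global interpreter lock limits thread speedups, so a `ProcessPoolExecutor` (or Dask's processes scheduler) is the usual swap.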

    Running a standalone Spark cluster handles threading optimization and scaling to multiple GPUs. That is another easy optimization win. Batch sizes can be reduced if the profiler shows a memory bottleneck. I have a lot of latitude for changes that impact the training side.
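    (A minimal sketch of backing off the batch size when memory runs out. `try_batch` is a hypothetical callable that runs one training step at a given size. Which exception signals out-of-memory depends on the framework, so it is a parameter here: Python's built-in `MemoryError` covers host memory, and recent PyTorch versions raise `torch.cuda.OutOfMemoryError` for GPU memory.)

```python
def find_fitting_batch_size(try_batch, start_size=512, min_size=1,
                            oom_errors=(MemoryError,)):
    """Halve the batch size until one step at that size runs without running out of memory.

    try_batch(size) should run a single training step and raise one of
    oom_errors if the device cannot hold the batch.
    """
    size = start_size
    while size >= min_size:
        try:
            try_batch(size)
            return size  # first (largest) size in the halving sequence that fits
        except oom_errors:
            size //= 2  # memory bottleneck: retry with half the batch
    raise RuntimeError(f"no batch size >= {min_size} fits in memory")


# Usage with a simulated device that can only hold 100 samples per step.
def simulated_step(size):
    if size > 100:
        raise MemoryError(f"batch of {size} does not fit")


print(find_fitting_batch_size(simulated_step))  # 64
```

    Halving is a coarse search; a finer tuner could binary-search between the last size that failed and the first size that fit.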

    Once I get past those and a few other simple changes, I usually rely on a Machine Learning Engineer or Software Developer involved in deployment, integration, or maintenance to do a code review. The larger the changes I make, the more likely there will be downstream impacts.

    I keep a version compatibility matrix because the dependencies between Python, ML libraries, hardware optimization tools, and architecture components like Spark are difficult to track. I use Docker to manage environments. I use Kubernetes to manage resources and have used YARN in the past. (Expand with an example configuration here.)
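    (One way to keep a version compatibility matrix checkable, as a minimal sketch: store each tested combination of components as a row and compare an environment against it. The component names and version strings below are illustrative placeholders, not real compatibility data, and the function names are hypothetical. Each row could correspond to one pinned Docker image.)

```python
# Hypothetical matrix: each row is one environment combination that has been tested.
# Version strings are illustrative placeholders, not real compatibility data.
TESTED_ENVIRONMENTS = [
    {"python": "3.10", "cuda": "11.8", "torch": "2.0", "spark": "3.4"},
    {"python": "3.11", "cuda": "12.1", "torch": "2.2", "spark": "3.5"},
]


def mismatches(environment, row):
    """Components whose version in `environment` differs from the tested row."""
    return {name: (environment.get(name), version)
            for name, version in row.items()
            if environment.get(name) != version}


def check_environment(environment, matrix=TESTED_ENVIRONMENTS):
    """Return (is_tested, differences from the closest tested row)."""
    closest = min(matrix, key=lambda row: len(mismatches(environment, row)))
    diff = mismatches(environment, closest)
    return not diff, diff


ok, diff = check_environment(
    {"python": "3.11", "cuda": "12.1", "torch": "2.3", "spark": "3.5"})
print(ok, diff)  # False {'torch': ('2.3', '2.2')}
```

    Reporting the closest tested row, rather than a bare pass or fail, points straight at the one component to pin or upgrade.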

    How do I select what hardware to use for a new, complex model? A Roofline Model takes several performance metrics (I do not know all of them offhand) and estimates the attainable performance of the model, or of its components, from the processor’s peak compute throughput, its memory bandwidth, and the code’s arithmetic intensity. It provides an upper bound on performance.
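    (The core Roofline bound fits in a few lines: attainable throughput is the lower of the processor's peak compute rate and its memory bandwidth multiplied by the code's arithmetic intensity in FLOPs per byte. The hardware numbers below are hypothetical, chosen for easy arithmetic rather than taken from any real GPU.)

```python
def roofline(peak_flops, bandwidth, intensity):
    """Upper bound on attainable FLOP/s under the Roofline Model.

    peak_flops: peak compute throughput in FLOP/s
    bandwidth:  peak memory bandwidth in bytes/s
    intensity:  arithmetic intensity of the kernel in FLOPs per byte moved
    """
    return min(peak_flops, bandwidth * intensity)


def ridge_point(peak_flops, bandwidth):
    """Intensity above which a kernel stops being memory-bound and becomes compute-bound."""
    return peak_flops / bandwidth


# Hypothetical accelerator: 10 TFLOP/s peak compute, 500 GB/s memory bandwidth.
PEAK, BW = 10e12, 500e9
print(ridge_point(PEAK, BW))   # 20.0 FLOPs per byte
print(roofline(PEAK, BW, 4))   # 2000000000000.0 -> memory-bound at 2 TFLOP/s
print(roofline(PEAK, BW, 50))  # 10000000000000.0 -> compute-bound at peak
```

    Comparing the bound for the same kernel across candidate hardware shows whether more compute or more bandwidth would help: a kernel below the ridge point gains nothing from extra peak FLOP/s.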

    This is used to evaluate which hardware will give the best performance. Then we do a cost analysis, weighing hardware costs against the decrease in training time or inference latency. Latency requirements usually justify the expense of higher-end hardware.
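    (The cost analysis can start as simple arithmetic: cost per run is the hourly price multiplied by the hours per run, and a training-time or latency requirement rules out options that are too slow. The option names, prices, and run times below are made-up placeholders, and the helper names are hypothetical.)

```python
from dataclasses import dataclass


@dataclass
class HardwareOption:
    name: str
    hourly_cost: float    # dollars per hour (placeholder values below)
    hours_per_run: float  # measured, or estimated from a performance bound

    @property
    def cost_per_run(self):
        return self.hourly_cost * self.hours_per_run


def cheapest_within(options, max_hours):
    """Cheapest option per run among those that finish within max_hours, else None."""
    fast_enough = [o for o in options if o.hours_per_run <= max_hours]
    return min(fast_enough, key=lambda o: o.cost_per_run, default=None)


options = [
    HardwareOption("cpu-cluster", hourly_cost=1.0, hours_per_run=40.0),  # $40 per run
    HardwareOption("gpu-node", hourly_cost=12.0, hours_per_run=5.0),     # $60 per run
]
print(cheapest_within(options, max_hours=48).name)  # cpu-cluster
print(cheapest_within(options, max_hours=8).name)   # gpu-node
```

    With a loose deadline the slower, cheaper option wins; once the time requirement tightens, only the pricier hardware qualifies.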

    Machine Learning does not really scale infinitely. We could always spin up more resources, but cost eventually becomes the constraint. There is also the refactoring of the model code to account for. Legacy models can require a large effort to optimize for GPUs, well beyond the cheap wins I talked about earlier.

    I have only had a few instances where this level of hardware analysis and optimization was required. Most times, long training cycles mean I am using the wrong approach or too much data. It is easier to work through different models or scale back customization. The gains of the more complex approach are difficult to quantify early in a project, so the level of effort is hard to justify. (You probably do not have an example to share. Knowing these concepts is enough even for a mid-level interview.)

    I have built complex models out of necessity, but I have learned to avoid them. Excessive complexity or training data is expensive, especially for maintenance cycles.”

    You need to cover CPU vs GPU at some point during a Data Science interview. You can dive deeper into optimizing Pandas (or other libraries) for GPUs and parallel processing, or simply explain coding for parallel processing. If your strengths lean towards infrastructure, elaborate on specific configurations of tools like Spark, Kubernetes, Docker, and AWS for scaling and optimization.

    Data Scientists are technical in a different way. You are not a software developer; you are a model developer. The fundamentals of model development must be covered in your interview answers. Tasks can run in parallel. Tasks can be written for GPUs and CPUs. Tools and architectural components play a large role in building complex models. You must understand these concepts.

    My first exposure to GPUs and CPUs, scaling, model optimization, and reducing latency was building a large pricing model. Recommender models have kept getting more complex and I have had to do the same work for those projects. Even with that experience, I still rely on code reviews to help with optimization and scaling.

    Optimization has changed significantly and continues to change very rapidly. This answer is obsolete even as I write it. NVIDIA has tools in limited release that go beyond what I discuss here. After your answer, interviewers may bring up advances in optimization. They are not contradicting you or saying your answer is wrong. Avoid getting defensive.

    TensorFlow and PyTorch provide additional ground for you to cover in an interview. There are implementation specifics for each. If you have project experience optimizing models built with those libraries, it is worth discussing in detail. Training data pipeline optimization is another specific area worth spending time explaining.
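    Training data pipeline optimization often comes down to overlapping data loading with computation so the GPU is not left waiting. tf.data's `Dataset.prefetch` and PyTorch's `DataLoader` (through `num_workers`) handle this for you; as a minimal sketch of the underlying idea, the code below uses only Python's standard library, with a background thread filling a bounded queue. The `prefetch` function is hypothetical, not either library's implementation.

```python
import queue
import threading

_DONE = object()  # end-of-stream marker


class _LoadError:
    """Wraps an exception raised while loading so the consumer can re-raise it."""

    def __init__(self, exc):
        self.exc = exc


def prefetch(batches, buffer_size=2):
    """Yield items from `batches` while a background thread loads up to buffer_size ahead.

    Slow loading (disk, network, decoding) then overlaps with whatever the
    consumer does with each batch, such as a training step.
    """
    buffer = queue.Queue(maxsize=buffer_size)

    def producer():
        try:
            for batch in batches:
                buffer.put(batch)  # blocks while the buffer is full
        except Exception as exc:
            buffer.put(_LoadError(exc))  # hand loading errors to the consumer
        finally:
            buffer.put(_DONE)

    # daemon=True so an abandoned generator cannot keep the interpreter alive
    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = buffer.get()
        if item is _DONE:
            return
        if isinstance(item, _LoadError):
            raise item.exc
        yield item


print(list(prefetch(range(5))))  # [0, 1, 2, 3, 4]
```

    In a training loop this would look like `for batch in prefetch(load_batches()): train_step(batch)`, where `load_batches` and `train_step` are hypothetical. A thread suits I/O-bound loading; CPU-heavy decoding is where worker processes, such as PyTorch's `DataLoader` workers, help more.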

    I also build in Java and C. If you have experience here, share it. Those languages, along with Scala, make you stand out.

    I only added a couple of tangible examples in my answer. Fill in detail about your hands-on experience with an example. People write books about this topic. I have condensed a lot of information into a single-page answer. It is not complete or comprehensive, but it is a good guide to outline your answer around.


    Talking through optimization shows you are capable of building complex models. Answering these questions will separate you from the crowd.