Adversarial Machine Learning: A Horror Story

Our field has discussed security threats to machine learning for six years. We have built solutions rapidly over the last year. However, we still have a long way to go.

October 20th, 2020

Vin Vashishta

Machine learning opens an entirely new category of software vulnerability. The concept of exploiting a model is new to many. The potential impacts of a single exploit are difficult to trace. Mitigation and countermeasures are poorly understood across the field.

People like Ian Goodfellow have been working on this problem for over six years. Research and applications have come a long way during that time. There are openly available toolkits to assess and improve the robustness of machine learning models against adversarial attacks.

A lot has happened in the last year on the Adversarial Machine Learning (AML) front. We are in the early stages of applied AML and the real-world use cases are revealing gaps in research. Our initial assumptions are being challenged and I will discuss two gaps I see emerging.

Early Research Concerns

The overwhelming focus of research has been on computer vision. While that work is applicable to other classification tasks, its mitigation approaches do not generalize well to other applications.

Researchers have validated their assessment and mitigation with canned models. These are openly available, pretrained models or models built from open data sources. This approach to validation does not cover currently deployed models in a comprehensive way.

Most evaluations of deployed models depend on the assumption that the validation objectives are comprehensive. The potential flaw goes back to the research’s reliance on canned models and known threats. Models built and trained outside of this limited framework are not evaluated. Applications of the research assume that its conclusions generalize broadly and that the threats evaluated are comprehensive.

It also overlooks different attacker objectives. Not every attack is designed to force misclassification. Attacks can be focused on reverse engineering the model. By understanding how the model functions, an attacker can use intended functionality to their advantage.
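A minimal sketch of reverse engineering, assuming the simplest possible case: a model that flags a transaction when a single value crosses a hidden threshold. The function names and the threshold are hypothetical and chosen for illustration only. The attacker never sees the model's internals and only observes yes/no answers, yet recovers the decision boundary with a small number of queries.

```python
# Hypothetical black box: the attacker can call it but cannot read its internals.
def black_box_flags(amount: float) -> bool:
    """Toy model: flags any transaction at or above a hidden threshold."""
    hidden_threshold = 731.25  # unknown to the attacker (illustrative value)
    return amount >= hidden_threshold


def estimate_threshold(query, low: float, high: float, tolerance: float = 0.01) -> float:
    """Binary-search the decision boundary using only yes/no answers."""
    if query(low) or not query(high):
        raise ValueError("boundary must lie between low and high")
    while high - low > tolerance:
        mid = (low + high) / 2
        if query(mid):
            high = mid  # boundary is at or below mid
        else:
            low = mid   # boundary is above mid
    return high


estimated = estimate_threshold(black_box_flags, 0.0, 10_000.0)
safe_amount = estimated - 1.0  # stays just under the recovered boundary
print(f"estimated boundary: {estimated:.2f}")
print(f"flagged at {safe_amount:.2f}? {black_box_flags(safe_amount)}")
```

No misclassification is forced here. The model behaves exactly as designed, and the attacker simply operates on the side of the boundary the model considers normal. Real models have many features and nonlinear boundaries, so extraction takes more work, but the principle is the same.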

Model Development Lifecycle Adversarial Vulnerabilities

Research has focused on model hardening. Researchers and practitioners are aware of other vulnerabilities in the model development lifecycle. However, model development lifecycles and machine learning capability maturity vary widely. The top and bottom ends of the maturity range introduce unexplored vulnerabilities.

Low maturity development lifecycles result in machine learning models that produce descriptive analytics. These models can be overextended to prescriptive analytics uses. In these cases, an attack can be model agnostic. Hardening the model provides no mitigation value.

A company hosts an event with open, online registration. An attacker uses a botnet to create thousands of phony registrations. Basic heuristics can inform the attacker’s strategy to avoid detection. Why? The anomaly detection model is analytical and does not generalize to undetected classes of malicious behavior.
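The registration scenario can be sketched with a toy detector. Everything here is an assumption for illustration: the per-IP limit, the address scheme, and the function names are hypothetical, and a real detector would use more signals. The point is that a detector built around one known pattern can be sidestepped by an attacker who guesses the rule.

```python
from collections import Counter

MAX_PER_IP = 5  # hypothetical rule: more than 5 sign-ups from one IP is suspicious


def flag_suspicious(registrations):
    """Return the set of IPs that exceed the per-IP registration limit."""
    counts = Counter(ip for ip, _email in registrations)
    return {ip for ip, n in counts.items() if n > MAX_PER_IP}


def botnet_registrations(total, per_ip):
    """Spread `total` fake sign-ups across bot IPs, `per_ip` sign-ups each."""
    registrations = []
    for i in range(total):
        bot = i // per_ip  # index of the bot submitting this sign-up
        ip = f"10.0.{bot // 256}.{bot % 256}"
        registrations.append((ip, f"user{i}@example.com"))
    return registrations


# A naive attacker hammers the form from a few machines.
naive_attack = botnet_registrations(total=3000, per_ip=50)
# An informed attacker stays at the limit on every machine.
informed_attack = botnet_registrations(total=3000, per_ip=MAX_PER_IP)

naive_flagged = flag_suspicious(naive_attack)
informed_flagged = flag_suspicious(informed_attack)
print(len(naive_flagged), "IPs flagged in the naive attack")
print(len(informed_flagged), "IPs flagged in the informed attack")
```

Both attacks submit the same 3,000 phony registrations. Only the one that ignores the rule is caught, and no amount of hardening against known classes changes that outcome.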

Hardening approaches test generalization to adversarial data for known classes. Immature development lifecycles result in a set of classes that insufficiently represent the true scope of classes. The known classes are easily reverse engineered, sometimes using simple heuristics. Even though the model is hardened, it is still vulnerable to attacks based on model development flaws.

Beyond the impacts on event planning, the registration dataset may be used for additional analysis and model training. The input data is poisoned. Immature development lifecycles do not include model maintenance and logging. This allows models with built-in vulnerabilities to be exploited over extended periods.

Data is sold without transparency into the gathering mechanisms. Poisoned data can propagate due to immature data gathering, validation, and transformation processes.

On the other end of the spectrum, advanced, mature model development lifecycles are more complex. There are a limited number of companies that operate at this level of maturity. Vulnerabilities may exist in each phase of the model development lifecycle.

However, a lack of available data and awareness of complex lifecycles makes research difficult. There is significant research coverage of mitigation strategies for complex models. The assumption is that model evaluation is sufficient to mitigate threats from prior phases of development. That assumption has not been tested against attacks targeting a comprehensive set of development lifecycles.

Complex, Robust Deep Learning Model Adversarial Vulnerabilities

Hardened models, functioning as designed, can be exploited by attackers. Recommender systems are intended to increase engagement. Engagement is broadly defined from marketing to social domains. On Twitter, as a benign actor, I want you to engage with my content. Twitter’s timeline recommendation system wants to serve content that keeps users engaged and active on the platform.

The timeline recommendation system is easy to understand. Users who engage with my content see my content more often. The implications for similar content being shown to that user are more complicated. It is safe to assume that the subjects of my content have some influence on the recommender.

That fundamental understanding of Twitter’s timeline recommendations has allowed attackers to inject themselves into users’ timelines. I am not going to discuss the variety of impacts because the line between benign and malicious is objectively undefined. Still, this is a vulnerability in the timeline recommendation model.

The model is robust against adversarial attacks targeted at the model. Through network effects, new adversarial threats are possible. This is an example of an obscure vulnerability, with high impact, and expensive mitigation strategies. Simple rules can lead to complex systems that produce unexpected emergent behavior.

Recommender models predict the emergent behaviors under measurement. They prescribe actions to influence those behaviors towards a desired outcome. Attackers have the same desired outcome. The attacker and the recommender are collaborative. That is the root cause of attackers escaping detection, and in many cases it also amplifies their effectiveness.
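A toy illustration of that collaboration follows. It is not Twitter's algorithm. It is a hypothetical recommender that ranks posts by engagement count within the topics a viewer follows, with invented post and user names. The attacker does not perturb any input to the model. They simply supply the engagement signal the model is built to reward.

```python
from collections import defaultdict


def rank_timeline(engagements, topic_of, viewer_topics, top_n=3):
    """Rank posts by engagement count, counting only topics the viewer follows."""
    scores = defaultdict(int)
    for _user, post in engagements:
        if topic_of[post] in viewer_topics:
            scores[post] += 1
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda p: (-scores[p], p))
    return ranked[:top_n]


# Post "x" belongs to the attacker and is tagged with a topic the viewer follows.
topic_of = {"a": "ml", "b": "ml", "c": "ml", "x": "ml"}

organic = [
    ("u1", "a"), ("u2", "a"), ("u3", "a"),
    ("u4", "b"), ("u5", "b"),
    ("u6", "c"),
]
# Coordinated accounts engage with the attacker's post.
coordinated = [(f"bot{i}", "x") for i in range(4)]

before = rank_timeline(organic, topic_of, {"ml"})
after = rank_timeline(organic + coordinated, topic_of, {"ml"})
print("before:", before)
print("after: ", after)
```

With four coordinated engagements, the attacker's post moves to the top of the viewer's timeline and displaces organic content. The model is functioning as intended in both rankings, which is why hardening it against adversarial inputs does not address this class of exploit.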


These are the two main gaps in research I am working to fill. This post is an introduction to my work. Over the next several posts in the series, I will present solutions. Some are simple, focused on building models using best practices. Bringing the development lifecycle in line with the assumptions of maturity baked into model mitigation strategies covers the first gap. The second gap requires technical solutions. My posts will focus on proving out the exploit and the mitigation.