I am going to summarize and review a large body of work, covering research from 2005 through the present day. I include my own experience, as well as implementations from Microsoft and Facebook, to fill in practical details. This introduction covers both theory and application.
Adversarial Machine Learning (AML) has a lot of moving pieces and gaps in coverage. Constant advances and research keep the pieces moving around.
AML is a three-headed monster. Models, threats, and mitigations all respond to each other. An advance on one front has implications for the other two. Threats and mitigations are built competitively. Threats are partly built out of sight of model and AML researchers. That is why AML is difficult to learn.
I am going to build forward from very simple summaries of attacks, vulnerabilities, and mitigations. Starting in plain language is a powerful mental warm-up. Skip the next three sections if you already have good definitions for those key concepts.
What is the Machine Learning Version of Hacking?
Attackers want to hijack your model. They are going to use an exploit to get in, and there are several points where they can find a weakness. Attackers use that weakness to gain control of the model.
It helps to start out thinking in familiar terms. I get a virus onto your computer and I…
- Encrypt all your files or delete everything. That attack was meant to break things.
- Turn it into a bot that I can use. The attack was meant to intentionally control your computer.
- Steal data from it. That attack was meant to get access to something private.
Machine learning attacks have similar goals. An attack that gets your model to lose accuracy is meant to break things. An attack that gets your model to serve a specific, incorrect inference result is meant to intentionally control it. An attack that gathers enough inferences to reverse engineer your model is meant to get access to something private or steal your model architecture details.
Facebook is working to stop NYU researchers from gathering users’ data through a voluntarily installed browser extension. The researchers are creating a novel dataset to understand how Facebook targets user segments to serve political ads. This is a reverse engineering attack, and it has Facebook worried enough to take legal action.
What Makes a Model Vulnerable?
Traditional security lapses apply to machine learning training data. An unsecured database or a dataset downloaded from a malicious actor can lead to a pre-exploited model. In the Facebook example, their ad serving model is vulnerable to inference data gathering. The model serves an ad to a user and both data points (ad type and user segment) are gathered across several users.
Most models mimic a system. The way the model works and the way the system under measurement works are different. The greater those differences, the more vulnerable the model is to an adversarial attack.
The gaps in the model’s performance allow an attacker to find input data edge cases that cause model inaccuracy. Sometimes the attacker just wants to expose the model’s flaws. Other times they want a hedge fund’s algorithmic trading model to run up the price of a stock the attackers own.
Models can get attacked during training and learn to respond predictably but inaccurately. That is the basis for adversarial learning. It is a threat mitigation too. If adversarial training is used to build or secure the model, a lot of data edge cases get covered during training.
In another vulnerability case, the model is not trained (because of poor validation and security) to respond predictably to data edge cases. Data is maliciously engineered to be out of sample, and the adversary eventually learns which edge cases are exploitable.
What Makes a Model Secure?
You now know one way to make a model more secure: adversarial training. Current AML threat mitigations follow a pattern: reduce the total number of possible data edge cases. If that sounds like brute force, you are not wrong.
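As a rough illustration of adversarial training, here is a minimal sketch in plain Python. Everything in it is an assumption made for the example: the tiny dataset, the logistic regression model, the perturbation budget `epsilon`, and the function names. Each training step first shifts the input in the direction that hurts the model most, then learns from that shifted input, which is one simple way to cover edge cases during training.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sign(v):
    return (v > 0) - (v < 0)


def predict_proba(w, b, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def worst_case_input(w, b, x, y, epsilon):
    """Shift each feature by epsilon in the direction that raises the loss."""
    p = predict_proba(w, b, x)
    # For logistic loss, d(loss)/d(x_i) = (p - y) * w_i.
    return [xi + epsilon * sign((p - y) * wi) for xi, wi in zip(x, w)]


def adversarial_train(data, epsilon, epochs=50, lr=0.1):
    """SGD for logistic regression where every update uses a perturbed input."""
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        for x, y in data:
            x_adv = worst_case_input(w, b, x, y, epsilon)
            error = y - predict_proba(w, b, x_adv)
            w = [wi + lr * error * xi for wi, xi in zip(w, x_adv)]
            b += lr * error
    return w, b


def accuracy(w, b, data, epsilon=0.0):
    """Accuracy on inputs perturbed by up to epsilon (0.0 means clean inputs)."""
    hits = 0
    for x, y in data:
        x_eval = worst_case_input(w, b, x, y, epsilon)
        hits += (predict_proba(w, b, x_eval) >= 0.5) == (y == 1)
    return hits / len(data)


# Toy, made-up data: class 1 in the upper right, class 0 in the lower left.
DATA = [([2.0, 1.5], 1), ([-2.0, -1.5], 0), ([1.5, 2.5], 1), ([-1.5, -2.5], 0)]
w, b = adversarial_train(DATA, epsilon=0.5)
clean_acc = accuracy(w, b, DATA)
robust_acc = accuracy(w, b, DATA, epsilon=0.5)
```

For a linear model the sign step is the exact worst case inside the budget. For deep models it is only an approximation, which is part of why this approach starts to look like brute force.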
There are threat detection mechanisms. Many involve using a second model to discover attacks. The attack detection model applies some default behavior to prevent the main model from being exploited. Training a model to detect attacks is limited by available training data and has the same vulnerabilities as any other model.
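The detect-then-default pattern can be sketched as follows. This is a deliberately simple stand-in, not a real detection model: the "detector" is just a range check learned from trusted training rows, and the names (`guarded_predict`, `needs_review`) and the `slack` value are hypothetical.

```python
def fit_feature_ranges(training_rows):
    """Learn per-feature (min, max) from trusted training data."""
    columns = list(zip(*training_rows))
    return [(min(col), max(col)) for col in columns]


def looks_out_of_sample(row, ranges, slack=0.1):
    """Flag inputs outside the training range by more than `slack` of its width."""
    for value, (low, high) in zip(row, ranges):
        pad = slack * (high - low)
        if value < low - pad or value > high + pad:
            return True
    return False


def guarded_predict(model, row, ranges, default="needs_review"):
    """Route suspicious inputs to a default behavior instead of the main model."""
    if looks_out_of_sample(row, ranges):
        return default
    return model(row)


# Made-up training rows and a stand-in main model, for illustration only.
TRAINING = [[0.2, 10.0], [0.5, 12.0], [0.9, 15.0]]
ranges = fit_feature_ranges(TRAINING)


def main_model(row):
    return "approve" if row[0] > 0.4 else "deny"


in_range = guarded_predict(main_model, [0.6, 11.0], ranges)
out_of_range = guarded_predict(main_model, [0.6, 40.0], ranges)
```

A trained detector would replace `looks_out_of_sample`, and it would inherit the limits described above: it only knows the attacks it has data for.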
In my opinion, the most effective mitigation is to model the system under measurement rather than the dataset. That is expensive and difficult to justify for simple business cases. A middle ground is explainable machine learning. Understanding how your model works also exposes which data edge cases it could be vulnerable to. Capability awareness is a similar model security strategy.
There are few Red Teams working from the attacker’s perspective to secure models. AML research is model centric. Traditional security teams are infrastructure and software centric. Making a model secure is a combination of the two.
The Foundational Paper
Adversarial Machine Learning has been worked on for at least fifteen years. This paper was my introduction to the field. It made the case for real world implications of AML. Prior to this most work was interesting but not practical. This is the start of what I call AML in practice.
In this paper, the authors explore reverse engineering a model to build targeted evasion attacks. They focused on spam detection but, more broadly, on the concept of misclassification. If you look back at the earlier bullet points, the proposed attacker stole something private to learn how to intentionally control the model. The word soup (evasion attack, misclassification) gets dense, so refer to those early sections for a baseline if you need one.
This paper explains that some models are too difficult to reverse engineer while others are relatively simple. The authors specify a method for calculating the cost of reverse engineering attempts. The point of reverse engineering in this example is to learn how to word an email to evade detection by the spam filter.
Misclassification is achieved by sending an input data edge case, the specifically worded email, which causes the model to respond in a predictable (by the attacker), inaccurate (classifying the email as not spam) way.
Removing some words, like the name of the product the spammer is trying to sell, has a high cost and the authors detailed a way to calculate that cost. Some evasion methods worked but were not useful. The email could succeed in fooling the spam filter but fail to deliver its message.
The dominant strategy is an email that evades detection and has a high chance of getting the desired response. In this case, response means the marketing email is successful and moves the customer forward in the sales funnel, closer to buying the product. The authors use real world examples of reverse engineering trained spam filters, then successfully apply their research to find dominant strategies.
This whole framework is still in use today. Most of the research I will explore going forward is a variation or improvement on this theme.
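To make the cost idea concrete, here is a minimal sketch in Python. It is not the paper's method: the word weights, the threshold, and the removal costs are made-up values, and the filter is a toy linear scorer. The sketch brute-forces the cheapest set of word removals that slips under the spam threshold.

```python
from itertools import combinations

# Hypothetical per-word spam weights for a toy linear filter (not from the paper).
SPAM_WEIGHTS = {"free": 2.0, "winner": 2.5, "pills": 3.0, "click": 1.5, "meeting": -1.0}
THRESHOLD = 4.0  # a score at or above THRESHOLD is flagged as spam

# Hypothetical cost to the spammer of dropping each word.
# The product name ("pills") is expensive to remove.
REMOVAL_COST = {"free": 1.0, "winner": 1.0, "pills": 10.0, "click": 2.0}


def spam_score(words):
    return sum(SPAM_WEIGHTS.get(w, 0.0) for w in words)


def is_spam(words):
    return spam_score(words) >= THRESHOLD


def cheapest_evasion(words):
    """Brute-force the lowest-cost set of removals that evades the filter."""
    removable = [w for w in words if w in REMOVAL_COST]
    best = None
    for k in range(len(removable) + 1):
        for removed in combinations(removable, k):
            remaining = [w for w in words if w not in removed]
            if not is_spam(remaining):
                cost = sum(REMOVAL_COST[w] for w in removed)
                if best is None or cost < best[0]:
                    best = (cost, set(removed), remaining)
    return best


email = ["winner", "click", "free", "pills", "today"]
cost, removed, evasive_email = cheapest_evasion(email)
```

With these assumed numbers, the cheapest evasion drops "winner", "free", and "click" and keeps "pills". An evasion that removed the product name would get past the filter and fail to deliver its message.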
In 2013, I worked in an area of the Business Strategy field called Competitive Intelligence. In 2015, I started reverse engineering competitors’ models using social data and inference data gathering. My objective was to learn the functional capabilities of their models so my clients could understand the product landscape. I discovered exploits as well. I want to emphasize this is happening in practice and is no longer an academic exercise.
The Modernization and Advancement of AML in Practice
Ian Goodfellow, supported by many others, has published some of the most thorough research into AML in practice. I want to highlight three papers in particular and this blog post. In 2013/2014, this paper explained why models are vulnerable to adversarial attacks.
That explanation focuses on two main points that I simplified in the What Makes a Model Vulnerable section. Deep learning models, especially when this paper was written, can be opaque. Using incomplete validation methods, data scientists can assume their models generalize when they really do not.
The authors extended explainable machine learning research by deconstructing models. They found a governing dynamic present in many model types. The models were reconstructions of the training data in different forms. There were hard boundaries around classes and large gaps in the high dimensional data spaces between them.
An offshoot of that makes models vulnerable to adversarial data. The authors exploited this vulnerability in a generalizable way. They described a method to easily discover the data edge cases, then minimize the noise required to cause a misclassification. In this case, the dominant strategy caused model misclassification using data edge cases that also avoided human detection.
This is the classic image example where I create a picture that a person can accurately identify but the image recognition model misclassifies. Think stop sign for autonomous vehicles or people identified as animals, etc.
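The "smallest noise that flips the prediction" idea can be sketched on a toy model. The example below is an illustration only: the logistic classifier weights and the input are made up, and the perturbation uses a gradient sign step as a stand-in, so it should not be read as the authors' exact procedure.

```python
import math

# Hypothetical weights of an already-trained logistic classifier (illustrative only).
WEIGHTS = [1.5, -2.0, 0.5, 1.0]
BIAS = -0.23


def predict_proba(x):
    z = sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS
    return 1.0 / (1.0 + math.exp(-z))


def sign(v):
    return (v > 0) - (v < 0)


def gradient_sign_perturb(x, true_label, epsilon):
    """Move each feature by at most epsilon in the direction that increases the loss."""
    p = predict_proba(x)
    # For logistic loss, d(loss)/d(x_i) = (p - y) * w_i.
    grad = [(p - true_label) * w for w in WEIGHTS]
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]


def smallest_flipping_epsilon(x, true_label, step=0.01, max_eps=1.0):
    """Scan for the smallest per-feature budget that flips the predicted class."""
    steps = int(round(max_eps / step))
    for i in range(1, steps + 1):
        eps = i * step
        x_adv = gradient_sign_perturb(x, true_label, eps)
        if (predict_proba(x_adv) >= 0.5) != (true_label == 1):
            return eps, x_adv
    return None, x


x = [0.6, 0.1, 0.4, 0.3]  # classified as class 1 by the toy model
eps, x_adv = smallest_flipping_epsilon(x, true_label=1)
```

Here every feature moves by the same small amount, and that is enough to change the predicted class. In an image, the equivalent change is spread thinly over thousands of pixels, which is why a person does not notice it.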
The cause of this vulnerability was clarified in a subsequent paper. This time the authors identified and refuted proposed mitigations for data edge cases. They also introduced the concept of adversarial training (maybe clarified is a better term, but I give them credit). Adversarial methods have advanced from there.
They also touch on the capability awareness of machine learning models. Some models give more accurate measurements of certainty than most. Capability awareness is a strong defense against data edge cases because the model does not overestimate the accuracy of its own inference. The model avoids the vulnerability by saying, “I am not sure.”
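A minimal sketch of the "I am not sure" behavior, assuming the model already outputs class probabilities. The function name and the 0.8 cutoff are hypothetical choices for the example.

```python
def classify_with_abstention(class_probs, min_confidence=0.8):
    """Return (label, confidence), with label None when confidence is too low."""
    label, confidence = max(class_probs.items(), key=lambda kv: kv[1])
    if confidence < min_confidence:
        return None, confidence  # the model says "I am not sure"
    return label, confidence


confident = classify_with_abstention({"stop_sign": 0.97, "yield_sign": 0.03})
unsure = classify_with_abstention({"stop_sign": 0.55, "yield_sign": 0.45})
```

This only works as a defense if the probabilities are trustworthy. A model that reports 0.97 on an adversarial input will sail straight past the cutoff, which is why accurate certainty estimates matter.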
In the third paper, the authors detailed AML in practice. They conducted real world, successful attacks. Model security often did, and still does, get labeled as an academic thought experiment. This was an important proof to help change that.
Recent AML Attack Evaluations and Frameworks
Research in 2017, and much of it going forward, fell into two categories:
- Proving Threats to Models in Real World Environments
- Discovering Hypothetical Threats and Proposing Mitigations
Much of it focused on computer vision. As the first paper showed, natural language and cybersecurity/anomaly detection have similar vulnerabilities. 2015 and 2016 revealed the vulnerabilities of recommender systems. Content moderation has pushed Facebook, Google, and others to advance AML in practice.
Efforts are ad hoc and uncoordinated. Companies are working with better structure and publishing regularly. However, that work is hard to find unless you are actively looking for it. There are gaps in research which are not obvious. AML needed structure.
Summaries like this one worked to cover existing threats and pull all the AML research into a practical framework. The starting point defines what can be attacked. The authors pulled from multiple other works to define the model development process as a data pipeline. That became the model Attack Surface.
Research explored each phase of model development for vulnerabilities. They proved how an attacker could exploit that vulnerability and detailed the potential impacts. The authors, and many others, pulled that body of knowledge into categories.
The first three sections in this post are an overview of most categorical assessments:
- The Attacker: Intent and Capabilities.
- The Model and Data Pipeline: Vulnerabilities, Exploits, and Implications.
- Mitigations: Preventing and Responding to Threats.
Attacker capabilities describe how well they understand the model and the level of access they have to the data pipeline. There are three main categories of model understanding and attacker capability:
- Black Box: The Attacker Has No Knowledge of the Model.
- Grey Box: The Attacker Has Some Knowledge of the Model.
- White Box: The Attacker Knows Everything About the Model.
Data Pipeline Exploits and Mitigations
There is also attacker access to the input/training data. Microsoft has a very comprehensive outline for how to secure data. That page is part of a larger overview of AML in practice. It is rigorous. I am going to pull from the summary paper and the Microsoft deep dive. I will inject my own knowledge to add specifics from real world applications in these final sections.
Each attacker capability category frames the starting point for a real-world scenario. Training data can be poisoned in a few ways. The most common ways are accessing the data used for training before it is gathered or while it is being gathered.
These vulnerabilities follow a more traditional Information Security attack type. However, the attackers’ understanding of machine learning training allows them to poison the data to create a specific flaw that they can exploit in the deployed model. These flaws are very difficult to detect once they are part of a deployed model.
Microsoft details a way to defend against data poisoning. Trust Boundaries start with the assumption that all input data is or could be poisoned. To mitigate that threat, they explain the concept of data pedigree also called a chain of custody. This metadata allows for a threat assessment at each stage of data gathering and transformation. The data lifecycle, master data management, data security, and data quality all tie into implementing Trust Boundaries in practice.
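One way to picture data pedigree is as a log that records a fingerprint of the data at every stage. The sketch below is an assumption-heavy illustration, not Microsoft's implementation: the `Pedigree` class, the stage names, and the actors are all hypothetical, and the data is a two-row toy dataset.

```python
import hashlib
import json
from dataclasses import dataclass, field


def fingerprint(rows):
    """Stable SHA-256 digest of a dataset snapshot (a list of JSON-serializable rows)."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


@dataclass
class Pedigree:
    """Hypothetical chain-of-custody log: one entry per gathering or transformation stage."""
    entries: list = field(default_factory=list)

    def record(self, stage, actor, rows):
        previous = self.entries[-1]["entry_hash"] if self.entries else ""
        entry = {"stage": stage, "actor": actor, "data_hash": fingerprint(rows)}
        # Chain each entry to the one before it so edits to the log are detectable.
        entry["entry_hash"] = hashlib.sha256(
            (previous + json.dumps(entry, sort_keys=True)).encode("utf-8")
        ).hexdigest()
        self.entries.append(entry)

    def verify(self, stage, rows):
        """True only if the data still matches what was recorded at that stage."""
        for entry in self.entries:
            if entry["stage"] == stage:
                return entry["data_hash"] == fingerprint(rows)
        return False  # untracked stage: treat the data as untrusted


raw = [{"text": "Great product ", "label": 1}, {"text": "Terrible", "label": 0}]
pedigree = Pedigree()
pedigree.record("ingest", "vendor_feed", raw)

cleaned = [dict(row, text=row["text"].strip().lower()) for row in raw]
pedigree.record("clean", "etl_job", cleaned)

# Simulate poisoning: a label is flipped after the "clean" stage was recorded.
poisoned = [dict(cleaned[0], label=0), cleaned[1]]
```

Calling `pedigree.verify("clean", poisoned)` fails while `pedigree.verify("clean", cleaned)` passes. That gives a checkpoint for a threat assessment at each stage instead of only at the end of the pipeline.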
Three Attacker Capability Types
A White Box attack scenario most commonly occurs due to a production implementation of a pretrained model. These models are publicly available and well documented. The attacker knows the training dataset, algorithm details, and the trained model architecture. This level of access allows for a complete exploration of vulnerabilities and exploits. Attacks are targeted and effective.
Grey Box attack scenarios imply the attacker can make educated guesses about the pipeline. This is very common. Specific applications of computer vision, natural language, recommenders, etc. have a limited set of algorithm choices. Most implementations have little to no customization. Training data characteristics can be inferred from the company’s access to third party and internally gathered data. This allows for a generally targeted, best guess exploration of vulnerabilities and exploits. That greatly reduces the number of exploit attempts needed to be successful.
Black Box attack scenarios require the attacker to either apply exploits blindly or reverse engineer the model to reveal vulnerabilities. The effectiveness of a Black Box attack is related to the attackers’ access to the model as well as its complexity and hardening. These attacks are most likely to succeed when the target model is built using an immature development process. Legacy models, partially monitored or maintained models, models built using poor validation practices, and simple, unhardened models are all examples.
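A toy version of Black Box reverse engineering shows why access matters. In this sketch the "deployed model" is a hypothetical one-feature threshold classifier, and the attacker sees only the returned labels. Real models are far more complex, so treat this as an illustration of the idea rather than a practical attack.

```python
def make_black_box(secret_threshold):
    """Stand-in for a deployed model: callers see only labels; state counts queries."""
    state = {"queries": 0}

    def predict(x):
        state["queries"] += 1
        return int(x >= secret_threshold)

    return predict, state


def locate_boundary(predict, low, high, tolerance=1e-3):
    """Binary search for the decision boundary using labels only."""
    if predict(low) != 0 or predict(high) != 1:
        raise ValueError("the boundary must lie between low and high")
    while high - low > tolerance:
        mid = (low + high) / 2.0
        if predict(mid) == 1:
            high = mid
        else:
            low = mid
    return (low + high) / 2.0


predict, state = make_black_box(secret_threshold=0.637)
estimate = locate_boundary(predict, low=0.0, high=1.0)
```

The attacker recovers the boundary to within the tolerance in 12 queries. Rate limits, monitoring, and model complexity all raise that number, which is the hardening the paragraph above refers to.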
Recent Work on Attacks and Mitigations
Summary articles and Microsoft’s outline provide a list of specific attacks. This list is constantly growing. Mitigations follow the same documentation and growth, mostly reactive to new vulnerability discovery. Mitigations and attacks are being built to generalize against each other.
Generalization in both spaces pits algorithm against algorithm with the intent of creating a dominant attack or defense strategy. Defense algorithms maximize robustness against attacks by minimizing data edge cases without sacrificing model accuracy and functionality. Attack algorithms maximize error in the target model, in some cases with the intent of maximizing a specific error or failure behavior.
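One common way to write this contest down is as a min-max objective. The notation here is my own assumption for illustration: $\theta$ are the model parameters, $f_{\theta}$ is the model, $L$ is the loss, $D$ is the data distribution, $\delta$ is the attacker's perturbation, and $\epsilon$ is the attacker's budget.

```latex
\min_{\theta} \; \mathbb{E}_{(x, y) \sim D}
\Big[ \max_{\|\delta\| \le \epsilon} L\big(f_{\theta}(x + \delta),\, y\big) \Big]
```

The inner maximization is the attack algorithm searching for the most damaging input within its budget. The outer minimization is the defense algorithm training the model to perform well even on that worst case.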
Defense algorithms have two main barriers. Project timelines are limited, and accuracy must be maintained. Research into defense algorithms suggests brute force solutions. Applied research proposes improvements to reduce the effort, maintain accuracy, and increase overall model robustness. These algorithms attempt to train models on every possible data edge case. These defenses are supported by mathematical proofs and by researchers mocking up attacks.
Defensive algorithms are forced to rely on brute force because attack algorithms are diverse. There are multiple potential attack teams, and they have long timescales to perform attacks. Those teams can try multiple attack algorithms. As a result of several trials, they may develop a more effective attack algorithm. They have more sample data to train with and that is a massive advantage.
Over the last year, tools that evaluate model robustness against adversarial attacks have been publicly released. These offer a reduction in effort and knowledge required to harden algorithms. Some examples are CleverHans and Adversarial Robustness Toolbox. There are several others and no clear leader as the best choice.
This introduction takes you through 2019 and some of 2020. After reading it, you will be up to date but not on the leading edge. The leading edge is problematic because of the cat and mouse game. Malicious attackers are not part of the research community. Most of the leading-edge attack algorithms are unknown to AML researchers and security practitioners.
The concept of Red Team versus Blue Team is well known in the technical security community as a simulation for attacks and defenses. AML suffers from a lack of people who have real world malicious attack experience. AML is limited to what is discovered by benign actors. Calling anything we have leading edge omits a significant blind spot.
There is also the open question of detecting a successful AML attack. There are logging techniques that address attack detection but no unified framework. Many businesses are only now introducing model maintenance and logging, so they are not yet looking for signs of a successful attack in these early efforts.
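As one example of what attack-aware logging could look like, the sketch below flags clients whose queries cluster near the decision boundary, a pattern consistent with someone probing the model. This is a heuristic of my own for illustration, not an established framework: the class name, the thresholds, and the client IDs are all hypothetical.

```python
from collections import defaultdict


class InferenceMonitor:
    """Hypothetical per-client log that flags boundary-probing query patterns."""

    def __init__(self, margin=0.1, min_queries=20, max_boundary_share=0.5):
        self.margin = margin
        self.min_queries = min_queries
        self.max_boundary_share = max_boundary_share
        self.totals = defaultdict(int)
        self.near_boundary = defaultdict(int)

    def log(self, client_id, probability):
        """Record one inference; probability is the model's score for the positive class."""
        self.totals[client_id] += 1
        if abs(probability - 0.5) <= self.margin:
            self.near_boundary[client_id] += 1

    def suspicious_clients(self):
        """Clients with enough traffic and an unusually high share of near-boundary queries."""
        flagged = []
        for client_id, total in self.totals.items():
            share = self.near_boundary[client_id] / total
            if total >= self.min_queries and share > self.max_boundary_share:
                flagged.append(client_id)
        return sorted(flagged)


monitor = InferenceMonitor()
for i in range(30):
    monitor.log("regular_user", 0.95 if i % 2 else 0.05)  # confident predictions
    monitor.log("prober", 0.5 + 0.002 * (i - 15))         # scores hugging the boundary

flagged = monitor.suspicious_clients()
```

A flag like this is a reason to look closer, not proof of an attack. Legitimate users with genuinely ambiguous inputs would trip it too.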
These are the areas I recommend you focus on for additional reading. Explainable machine learning (XAI) and reliable machine learning overlap with AML. I briefly reference game theory in this post. In my opinion, it will be part of a more comprehensive algorithmic defense.