Subverting AI's Intellect: How to Thwart Data Poisoning

Author: Prithiv Roshan Sudhakar
Date Published: 1 January 2024

Artificial intelligence (AI) is having a profound influence on nearly every aspect of people’s lives, including driving productivity and improving efficiency. AI-powered systems use algorithms and models to process and synthesize data, learn patterns and dependencies within the data, and make decisions akin to human reasoning. Along with the substantial opportunities presented by AI comes a diverse range of risks and potential harms that can impair AI’s trustworthiness. A notable adversarial attack technique is data poisoning, which involves the manipulation of AI training data sets to introduce biases that can benefit adversaries.1


In the context of AI, data poisoning is far more than an academic abstraction; it is a ticking time bomb that can have profoundly detrimental effects in real-world situations. Imagine the repercussions of an intelligent chatbot spewing hate speech, a compromised email spam filter allowing malicious traffic to pass through or a self-driving car making a grievous misjudgment due to data tampering. These are not just hypothetical scenarios; they are real-world examples: Tay, Microsoft’s Twitter chatbot, turned offensive after a coordinated attack,2 a spammer attempted to skew the Gmail spam classifier,3 and a research study on autonomous driving systems found that fake traffic signs can deceive AI.4

Figure 1

Figure 1 illustrates how poisoning attacks can inject malicious training data and alter the results of an AI model. In the scenario of an autonomous driving system, for example, an attacker may insert falsified data-label pairs, such as a stop sign labeled as a speed limit sign. This trains the AI model on incorrect decision boundaries, causing the vehicle to continue driving through an intersection instead of stopping.

As AI systems proliferate and gain more widespread adoption, safeguards to thwart data poisoning attacks and ensure decisions that reflect human values become more important. Enterprises that fail to effectively manage the risk may be putting themselves at a competitive disadvantage.

Data Poisoning Explained

Figure 2 illustrates adversary behaviors and techniques in a typical data poisoning attack scenario and the measures to counteract the attack.

Figure 2

Attack Motives
An AI poisoning attack is any intentional effort by an adversary to corrupt training data or manipulate AI algorithms. The adversary could be an insider, such as a disgruntled current or former employee, or an external hacker. Motives may include causing reputational and brand harm, tampering with the trustworthiness of AI decisions or slowing down and disrupting the AI-powered system. In some cases, a negligent user can inadvertently put the AI model at risk by accidentally injecting malicious data, perhaps collected from public sources, that affects the AI model’s behavior.

Attack Types
Advancements in AI have created new frontiers in business while rapidly altering the threat landscape. This widened attack surface attracts adversaries with sophisticated tools and attack methods. Key attack types related to data poisoning include:

  • Split-view poisoning—AI-powered systems, particularly generative AI language models, rely on sizable data sets collected through large-scale web scraping. Split-view poisoning takes advantage of the change in the data snapshot between the time of collection and the time the AI model is trained. An attacker who gains control of domain names from which the training data was originally collected can inject malicious content, so that future redownloads retrieve poisoned data.
  • Label poisoning—Label-switching attacks focus on changing the labels of data points rather than the data points themselves.5 This technique aims to mislead machine learning models by assigning incorrect labels and misclassifying data, leading to erroneous model outcomes and decisions. For example, a malevolent actor infiltrates the training data set and manipulates the labels assigned to a subset of illegitimate emails, changing their classification from phishing to legitimate. After this surreptitious label manipulation, the AI model undergoes retraining, potentially assimilating the poisoned data set into its decision-making process. A brief sketch of this label-flipping scenario follows this list.
  • Model inversion attacks—An attacker can exploit the AI model’s responses and reverse-engineer them using an inversion model to deduce personal information about the data subject the AI was trained on. For example, an attacker can use a data subject’s medical condition or biomarker as input to deduce the individual’s personal details.
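To make the label poisoning scenario concrete, the following is a minimal sketch that flips a fraction of training labels and compares the resulting accuracy with that of a model trained on clean labels. The synthetic data set, the logistic regression classifier and the 10 percent flip rate are illustrative assumptions, not details taken from the incidents cited above.

```python
# Minimal sketch of a label-flipping (label poisoning) attack on a binary
# classifier. The data set, model and flip rate are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic "email" features: label 1 = phishing, label 0 = legitimate.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

def train_and_score(labels):
    """Train on the given training labels and report accuracy on clean test data."""
    model = LogisticRegression(max_iter=1000).fit(X_train, labels)
    return accuracy_score(y_test, model.predict(X_test))

# The adversary flips the labels of 10 percent of the training points,
# e.g., phishing emails relabeled as legitimate.
poisoned = y_train.copy()
flip_idx = rng.choice(len(poisoned), size=len(poisoned) // 10, replace=False)
poisoned[flip_idx] = 1 - poisoned[flip_idx]

print("accuracy with clean labels:   ", train_and_score(y_train))
print("accuracy with poisoned labels:", train_and_score(poisoned))
```

In practice, adversaries choose the flipped points to maximize their impact with as few modifications as possible, an assumption that the outlier-based defenses discussed later in this article rely on.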

Attack Intensity
An attacker’s knowledge of the classifiers used by an AI-powered system can increase the intensity of an attack and help bypass the system’s security defenses. Data poisoning attacks can be classified into white-box, gray-box and black-box attack types based on the extent of the attacker’s knowledge of the targeted AI-powered system:

  • White-box attacks—In these attacks, the adversary has full knowledge of the AI-powered system, including the training data, the learning algorithm and the training parameters. With these insights, the adversary can launch a targeted, worst-case attack and corrupt the model’s training process.
  • Gray-box attacks—In these attacks, the adversary has partial knowledge of the AI-powered system, such as surrogate training data sampled from a similar distribution and a surrogate classifier used to approximate the learning algorithm and its parameters. This represents a more realistic attack scenario.
  • Black-box attacks—In these attacks, the adversary has minimal knowledge of the AI-powered system and can only query the target system and observe the model outputs (such as a probability or confidence score).

Countermeasures
Data poisoning attacks are dynamic, and continuous monitoring is required to identify imminent and emerging threats and develop the proper defenses against them. Countermeasures against data poisoning attacks can be employed to identify and remove malicious data that degrades the model’s performance.6 Because it is exceedingly difficult to isolate and examine patterns in high-volume, high-velocity data, probabilistic or statistical defense techniques are recommended, including:

  • Outlier elimination—When carrying out a data poisoning attack, adversaries attempt to pollute or modify as few data points as possible while maximizing their impact on classification. Therefore, it is reasonable to assume that the points altered by an adversary will be outliers in the data set.7 By eliminating outliers, it is possible to remove a considerable portion of the tampered data from the data set. Alternatively, the existing data set can be augmented to mitigate data poisoning. This proactive approach intentionally introduces controlled variations into the training data set, such as random rotations, translations and noise additions. Data augmentation has the benefit of expanding the data set while maintaining its inherent characteristics. In addition, the increase in sample size dilutes any attempt to poison the data set. A brief sketch of outlier elimination and noise-based augmentation follows this list.
  • Ensemble modeling—This process involves creating multiple diverse models to predict an outcome by using either many different modeling algorithms or different training data sets. The general principle is that models with slight differences are less likely to make the same errors, leading to better overall performance and security. There are several common approaches to ensemble modeling, all of which are sketched after this list:8
    • Model averaging—Although AI models are capable of learning incredibly complex relationships between variables, they suffer from high levels of variance. The classifications made by any given model depend on the random initial weights assigned during training. These variations lead to different predictions, even among identical models trained on the same data set. To reduce the level of variance, the predictions of several models trained on the same data can be added up; then the class with the highest combined prediction score is selected, resulting in an ensemble prediction.
    • Weighted model averaging—One limitation of the model averaging method is that it considers each model to be equally adept at solving the problem. However, in many cases, there are both very good models and others that are less successful. To make the average reflect this, weights can be introduced for each model’s vote. These weights can be derived by evaluating all models against a separate validation data set and assigning each a weight based on its performance.
    • Model stacking—This method builds on weighted model averaging by using a higher-order model to generalize ensemble modeling. A set of m ensembled models will produce m classifications. A higher-order model trained on another data set attempts to learn how to use the ensembled classifications to predict the correct output. Using the trained higher-order model provides an effective and accurate way to combine multiple classifications and reduce variance.
  • Data partitioning—Data partitioning is the process of separating the training data into discrete subsets for the purpose of training and testing the learning models. The goal is to ensure that the models are not all trained on the same data, thereby limiting the influence of any poisoned data points. There are two common types of data partitioning, both sketched after this list:
    • K-fold cross-validation—This technique involves partitioning the data set into k equally sized subsets, or folds. Then k different models are trained, with each fold serving as the validation set once while the remaining k-1 folds are used for training. Once all the models have been trained, their classifications can be combined through averaging to produce a cumulative prediction.
    • Random splits—A variation of the k-fold cross-validation method is to randomly split the data set into training data and validation data. Similar to the previous approach, this makes the model more resistant to data poisoning by varying training data and reducing the influence of poisoned data points. This approach may be preferable when it is suspected that large sequential portions of the data set are poisoned.
  • De-Pois method—The De-Pois method was designed to defend against data poisoning attacks of various types.9 It operates by creating a mimic model that imitates the behavior of the original model and is trained with clean data. To obtain the clean data, the De-Pois method employs generative adversarial networks (GANs) to generate synthetic data that closely resembles clean training data. This synthetic data effectively increases the size of the training set and helps teach the mimic model how the original model should behave. Once the mimic model is trained, it can be used as a reference to evaluate new data. If the mimic model’s predictions differ significantly from those of the original model, this signals a potential data poisoning attempt and prevents the model from training on tampered data. A simplified sketch of this screening step follows.
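The following is a minimal sketch of outlier elimination and noise-based augmentation, assuming numeric feature arrays. The use of scikit-learn’s IsolationForest, the 5 percent contamination estimate and the Gaussian noise scale are illustrative choices, not requirements of the approach.

```python
# Minimal sketch: drop suspected outliers before training, then dilute any
# remaining poisoned points by augmenting the data set with noisy copies.
import numpy as np
from sklearn.ensemble import IsolationForest

def remove_outliers(X, y, contamination=0.05):
    """Drop training points that an anomaly detector flags as outliers."""
    detector = IsolationForest(contamination=contamination, random_state=0)
    inlier_mask = detector.fit_predict(X) == 1  # 1 = inlier, -1 = outlier
    return X[inlier_mask], y[inlier_mask]

def augment_with_noise(X, y, copies=2, scale=0.01):
    """Expand the data set with noisy copies to dilute any poisoned samples."""
    rng = np.random.default_rng(0)
    noisy = [X + rng.normal(0.0, scale, X.shape) for _ in range(copies)]
    return np.vstack([X] + noisy), np.concatenate([y] * (copies + 1))

# Typical use: clean first, then augment, then train as usual.
# X_clean, y_clean = remove_outliers(X_train, y_train)
# X_aug, y_aug = augment_with_noise(X_clean, y_clean)
```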
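The next sketch shows model averaging, weighted model averaging and model stacking side by side, using scikit-learn’s built-in ensemble classes. The base models and the placeholder weights are illustrative assumptions; in practice the weights would come from evaluating each model on held-out data.

```python
# Minimal sketch of the three ensemble approaches described above.
from sklearn.ensemble import (RandomForestClassifier, StackingClassifier,
                              VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

base = [("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=100)),
        ("svc", SVC(probability=True))]

# Model averaging: sum the predicted probabilities and pick the highest class.
averaging = VotingClassifier(estimators=base, voting="soft")

# Weighted model averaging: weight each model's vote by its validation skill
# (the weights shown here are placeholders that an evaluation step would produce).
weighted = VotingClassifier(estimators=base, voting="soft", weights=[2, 3, 1])

# Model stacking: a higher-order model learns to combine the base predictions.
stacked = StackingClassifier(estimators=base,
                             final_estimator=LogisticRegression(max_iter=1000))

# Each ensemble is then trained with .fit(X_train, y_train) like any estimator.
```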
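The following is a minimal sketch of data partitioning with k-fold splits: k models are each trained on a different subset of the training data, and their averaged class probabilities form the final prediction, so no single poisoned region of the data set dominates the outcome. The logistic regression base model and k=5 are illustrative assumptions, and the inputs are assumed to be NumPy arrays.

```python
# Minimal sketch: train one model per k-fold partition and average their votes.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

def kfold_ensemble_predict(X_train, y_train, X_test, k=5):
    """Train k models on different partitions and average their predictions."""
    probs = []
    splitter = KFold(n_splits=k, shuffle=True, random_state=0)
    for train_idx, _ in splitter.split(X_train):
        model = LogisticRegression(max_iter=1000)
        model.fit(X_train[train_idx], y_train[train_idx])
        probs.append(model.predict_proba(X_test))
    # Average the k models' class probabilities and take the most likely class.
    return np.mean(probs, axis=0).argmax(axis=1)
```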
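Finally, the following is a simplified sketch of the screening step described for the De-Pois method: candidate training samples on which the mimic model (assumed here to be already trained on clean or GAN-generated data) disagrees strongly with the original model are held out of training. The disagreement measure and threshold are illustrative assumptions, and the GAN-based synthetic data generation that the full method relies on is omitted.

```python
# Minimal sketch of a De-Pois-style screening step: flag candidate samples
# where the mimic model and the original model disagree strongly.
import numpy as np

def filter_suspected_poison(mimic_model, original_model, X_candidate, threshold=0.4):
    """Keep only candidate samples on which the mimic and original models agree."""
    # Both models are assumed to expose predict_proba(), as scikit-learn classifiers do.
    p_mimic = mimic_model.predict_proba(X_candidate)
    p_original = original_model.predict_proba(X_candidate)
    # A large gap between the two probability distributions signals possible tampering.
    disagreement = np.abs(p_mimic - p_original).max(axis=1)
    keep_mask = disagreement < threshold
    return X_candidate[keep_mask], keep_mask
```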

Conclusion

In today’s era of exponentially increasing amounts of data, AI-powered systems and ubiquitous digitalization, the risk of data poisoning is more real and significant than ever. It serves as a stark reminder of the vulnerabilities inherent in AI systems. Therefore, the importance of vigilance and innovation in protecting data-driven systems cannot be overstated.

Although organizations are challenged to find new ways of sensing and managing data poisoning risk, a successful risk mitigation strategy starts with understanding the evolving threat landscape, recognizing the ways that the defenses of AI-powered systems can be compromised and applying safeguards such as outlier elimination, ensemble modeling, data partitioning and the De-Pois method to minimize the damage an attacker can inflict.10

Endnotes

1 Schwartz, R.; A. Vassilev; K. Greene; et al.; US National Institute of Standards and Technology (NIST) Special Publication 1270 Towards a Standard for Identifying and Managing Bias in Artificial Intelligence, NIST, USA, March 2022, http://www.nist.gov/publications/towards-standard-identifying-and-managing-bias-artificial-intelligence
2 Day, M.; “Microsoft Takes AI Bot Offline After Offensive Remarks,” The Seattle Times, 24 March 2016, http://www.seattletimes.com/business/microsoft/microsoft-takes-chatbot-offline-after-it-repeats-offensive-remarks/
3 Bursztein, E.; “Attacks Against Machine Learning—An Overview,” Elie, May 2018, http://elie.net/blog/ai/attacks-against-machine-learning-an-overview/
4 Sitawarin, C.; A. Bhagoji; A. Mosenia; et al.; “DARTS: Deceiving Autonomous Cars With Toxic Signs,” Cornell University arXiv, 31 May 2018, http://doi.org/10.48550/arXiv.1802.06430
5 Jebreel, N.; J. Domingo-Ferrer; D. Sánchez; A. Blanco-Justicia; “Defending Against the Label-Flipping Attack in Federated Learning,” Cornell University arXiv, 5 July 2022, http://doi.org/10.48550/arXiv.2207.01982
6 Nelson, B.; M. Barreno; F. J. Chi; et al.; “Exploiting Machine Learning to Subvert Your Spam Filter,” Proceedings of the 1st USENIX Workshop on Large-Scale Exploits and Emergent Threats, April 2008, http://dl.acm.org/doi/10.5555/1387709.1387716
7 Paudice, A.; L. Muñoz-González; A. Gyorgy; E. Lupu; “Detection of Adversarial Training Examples in Poisoning Attacks Through Anomaly Detection,” Cornell University arXiv, 2018, http://doi.org/10.48550/arXiv.1802.03041
8 Brownlee, J.; “Ensemble Learning Methods for Deep Learning Neural Networks,” Machine Learning Mastery, 6 August 2019, http://machinelearningmastery.com/ensemble-methods-for-deep-learning-neural-networks/
9 Chen, J.; X. Zhang; R. Zhang; et al.; “De-Pois: An Attack-Agnostic Defense Against Data Poisoning Attacks,” Cornell University arXiv, 2021, http://doi.org/10.48550/arXiv.2105.03592
10 ISACA, The Promise and Peril of the AI Revolution: Managing Risk, USA, 2023, http://www.isaca.org/resources/white-papers/the-promise-and-peril-of-the-ai-revolution

PRITHIV ROSHAN SUDHAKAR

Is a freelance developer with a passion for artificial intelligence (AI) and a focus on creating optimized and scalable models in the domains of computer vision, metaheuristic algorithms and generative AI. Sudhakar can be contacted at prithivskr@gmail.com.