Attacking State-of-the-art NLP Systems


The continuing advancements in artificial intelligence have enabled us to push innovation to new heights. In this blog, we will discuss one such AI-driven technology that is being rapidly employed across all verticals: natural language processing, and the vulnerabilities it possesses.

Let’s start with the basics

Natural language processing, or NLP, is a technology that combines computational linguistics and artificial intelligence to make human language comprehensible to machines. Its aim is to enable seamless communication between humans and machines.

NLP is underpinned by machine learning, which amplifies a model's learning and processing capabilities so that it can perform tasks that once required human intelligence. Moreover, ML algorithms have become highly capable at text-related tasks owing to advances in deep learning. Despite the freeform nature of human language, state-of-the-art NLP systems are used in a myriad of applications: chatbots, spam filters, voice assistants, and the list goes on.

Attacks on NLP models

It is quite evident that NLP has established an indispensable position in the current digital revolution. But at the same time, we cannot overlook the precipitous rise in the number of text-based attacks in recent years. Malicious actors are expertly exploiting the limitations of NLP models to craft attacks and manipulate these systems. Several openly available research papers and open-source tools demonstrate the unique ways in which NLP models can be attacked.

Minute changes to the inputs of an NLP model can drive it to change its behavior. For instance, applying small changes to the text of an email might cause a spam classifier to label it incorrectly. Slight changes in the text, or background noise in speech, sometimes indecipherable to humans, can confuse an NLP model while leaving it statistically confident in its wrong classification. This also highlights the differences between AI and human intelligence. Such changes are, for the most part, unintentional, but that does not mean this characteristic of NLP models will not be exploited by a hacker.

State-of-the-art NLP systems, like conventional software, can also be embedded with backdoors. But unlike a conventional application, a neural network model consists only of a set of parameters, with no readable source code, making a backdoor highly difficult to detect. A recent example of such filter deception in NLP is the glitch detected in Gmail's email filters.
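As a toy illustration of this kind of input perturbation (the keyword filter and substitution table below are invented for this sketch, not a real production system), consider how a few character swaps that a human barely notices can slip a message past a naive spam classifier:

```python
# Hypothetical toy spam filter, fooled by small character-level edits.
SPAM_KEYWORDS = {"free", "winner", "prize", "claim"}

def naive_spam_score(text: str) -> float:
    """Fraction of tokens that match known spam keywords."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in SPAM_KEYWORDS)
    return hits / len(tokens)

def is_spam(text: str, threshold: float = 0.2) -> bool:
    return naive_spam_score(text) >= threshold

def perturb(text: str) -> str:
    """Adversarial edit: swap visually similar characters so keywords
    no longer match, while a human still reads the same message."""
    substitutions = {"e": "3", "i": "1", "o": "0"}
    return "".join(substitutions.get(c, c) for c in text)

original = "claim your free prize now"
evaded = perturb(original)  # "cla1m y0ur fr33 pr1z3 n0w"

print(is_spam(original))  # True  - the filter flags the spam
print(is_spam(evaded))    # False - tiny edits evade the filter
```

Real attacks target statistical models rather than keyword lists, but the principle is the same: the input moves only slightly in human terms, yet far enough in the model's feature space to flip its decision.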

Data poisoning, whether through bad-quality samples or carefully crafted samples injected into the training data, is another way malicious actors can take hold of a system. To explain this better, let's take the infamous example of Microsoft's AI chatbot, Tay: an experimental chatbot released via Twitter, originally meant to mimic the language patterns of a 19-year-old American girl. But because it used anonymized public data as training material, Tay turned into a racist, misogynistic, and antisemitic speech-spouting machine in less than a day.
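A minimal sketch of the mechanism (the word-count "classifier" and the samples here are made up for illustration): if an attacker can submit training samples, a handful of deliberately mislabeled examples is enough to flip what the model learns about a word.

```python
# Hypothetical sketch of data poisoning on a toy word-count classifier.
from collections import defaultdict

def train(samples):
    """Count how often each word appears under each label."""
    counts = defaultdict(lambda: defaultdict(int))
    for text, label in samples:
        for word in text.lower().split():
            counts[word][label] += 1
    return counts

def predict_word(counts, word):
    """Predict the majority label seen for this word during training."""
    labels = counts.get(word.lower())
    if not labels:
        return "unknown"
    return max(labels, key=labels.get)

clean_data = [
    ("you are awful", "toxic"),
    ("awful hateful words", "toxic"),
    ("have a nice day", "benign"),
]

# Attacker floods the feedback channel with mislabeled samples.
poison = [("awful", "benign")] * 5

model_clean = train(clean_data)
model_poisoned = train(clean_data + poison)

print(predict_word(model_clean, "awful"))     # toxic
print(predict_word(model_poisoned, "awful"))  # benign
```

Tay's failure was essentially this loop running live: users supplied the poisoned samples, and the bot retrained on them in real time.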

Repurposing old models has become one of the latest trends in the NLP domain, but many are unaware of the security implications of fine-tuning pre-existing models. In this case, an adversary can feed nonsensical, randomly sampled sequences of words to a model, record the labels it predicts, and reverse-engineer their own copy of the model from those outputs. Not only is this theft of intellectual property, but the stolen copy could then be used to craft attacks in the future.
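The extraction loop can be sketched in a few lines. Everything below is invented for illustration: the "victim" is a stand-in for a deployed black-box API, and the copy is a crude vote-counting student rather than a real fine-tuned model.

```python
# Hypothetical sketch of model extraction: query a black-box model with
# random word sequences, record its labels, and distil a local copy.
import random
from collections import Counter, defaultdict

random.seed(0)

def victim_model(text: str) -> str:
    """Stand-in for a deployed API whose internals the attacker cannot see."""
    return "positive" if "good" in text.lower().split() else "negative"

VOCAB = ["good", "bad", "cat", "runs", "blue", "fast"]

# 1. Probe the victim with nonsensical, randomly sampled sequences.
queries = [" ".join(random.choices(VOCAB, k=4)) for _ in range(200)]
stolen = [(q, victim_model(q)) for q in queries]

# 2. Build per-word label counts from the stolen input/label pairs.
word_votes = defaultdict(Counter)
for text, label in stolen:
    for word in text.split():
        word_votes[word][label] += 1

def copy_model(text: str) -> str:
    """Crude copy: predict 'positive' if any word almost always
    co-occurred with a 'positive' label in the stolen data."""
    for word in text.lower().split():
        votes = word_votes.get(word)
        if votes and votes["positive"] / sum(votes.values()) > 0.9:
            return "positive"
    return "negative"

# The copy now mimics the victim on inputs it never queried.
print(copy_model("good cat"))
print(copy_model("bad blue"))
```

The attacker never sees the victim's parameters; the stolen labels alone are enough to approximate its decision boundary.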

NLP-based models have infiltrated every corner of our existence, but we should not undervalue the possibility of these very systems being compromised and turned against us. Scanta is constantly innovating and researching new approaches to mitigate attacks on ML-based models. Our featured product, VA Shield, a chatbot security solution, is a testament to that research. VA Shield analyzes requests, responses, and conversations to and from the system to provide an enhanced layer of supervision. Companies can put customizable policies in place to guard against online trolls, check malicious use of language, and prevent compromised models from exposing confidential data.

Closing thoughts

NLP is a fairly novel domain, and not much is understood about how these algorithms work internally. This makes it challenging, if not currently impossible, to detect whether an NLP model has been compromised or is simply underperforming. As these systems are integrated into more products and services, concern about the cyber-security threats they pose is also growing. These types of AI-driven technologies will foster a new era of cyber attacks, and we need to act before it is too late.