
A young child’s ability to acquire language is nothing short of astonishing. But occasionally a toddler will pick up an inappropriate word, often with no comprehension of its meaning. Fortunately, with the right guidance, the child can be steered away from using the offensive term.

A similar situation can occur with natural language generation (NLG) in machine learning, even though the process is very different from how people use words. NLG systems like OpenAI’s GPT-3 language model are neural networks that have been pre-trained on a corpus, an enormous collection of writings. Using deep learning methods, the model then generates human-like text from a simple input prompt. The results can be incredibly realistic and at times difficult to distinguish from something written by an actual person.

Unfortunately, the approach also frequently leads to toxic language generation, making it difficult to trust such systems for automated business uses. Like the young child, the system doesn’t understand the words it’s using; it only knows that people have used them in a previous similar context.

Now researchers at the Allen Institute for Artificial Intelligence (AI2) and the University of Washington have developed a novel way to guide these machine learning systems, including reducing their potential for toxic language.

“DExperts: On-the-Fly Controlled Text Generation with Experts and Anti-Experts” is a new paper by a group of researchers working on this problem. The team wanted to understand whether they could control attributes of text generated by language models (LMs) at decoding time, without sacrificing fluency or diversity. The result was an approach that uses two smaller LMs, one modeling text with desirable attributes and one modeling text with undesirable attributes, to “steer” larger LMs like GPT-3.

“These fine-tuning based methods are modifying the original language model themselves. So, at decoding time they can generate from the model directly,” said Alisa Liu, lead author on the paper. “We’re not modifying the original language model at all. Instead, we’re fine-tuning these smaller experts.”

The team’s approach builds on a traditional machine learning technique known as a “product of experts,” which allows a series of simpler outputs to be combined to determine the output of a larger, more complex system. This method allows each smaller model to specialize in analyzing one particular aspect of the problem.
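
To make the idea concrete, here is a toy sketch in Python of a product of experts; the distributions, values and helper function are invented for illustration and are not code from the paper:

```python
import numpy as np

# Toy "product of experts": each expert assigns a probability distribution
# over the same candidate outcomes. The combined model multiplies the
# distributions together and renormalizes, so an outcome is only likely
# overall if every expert finds it plausible.

def product_of_experts(distributions):
    combined = np.ones_like(distributions[0])
    for dist in distributions:
        combined *= dist              # elementwise product of expert opinions
    return combined / combined.sum()  # renormalize into a valid distribution

# Two hypothetical experts scoring four candidate next words.
expert_a = np.array([0.50, 0.30, 0.15, 0.05])  # strongly favors word 0
expert_b = np.array([0.10, 0.40, 0.40, 0.10])  # favors words 1 and 2

print(product_of_experts([expert_a, expert_b]))
# Word 1 ends up most likely: it's the option both experts find reasonable.
```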

Instead of using only experts, the DExperts team added an anti-expert to the mix. The study’s researchers believe theirs may be the first use of a combination of expert and anti-expert LMs: two contrasting language models, each fine-tuned on text exhibiting a particular attribute. By combining the two, the domain-specific signal they share cancels out, while the difference between them steers the target LM toward a desired attribute, such as positive sentiment, or away from an undesired one, such as toxic language.
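
In the paper, the three models are combined over their next-token predictions, with the expert and anti-expert pulling the base model’s output in opposite directions. The rough sketch below, with made-up logits and a hypothetical steering weight, shows one way that combination could look; it is not the authors’ code:

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Hypothetical next-token logits over a four-word vocabulary.
base_logits   = np.array([2.0, 1.5, 0.5, -1.0])  # large pretrained LM
expert_logits = np.array([1.0, 2.5, 0.0, -2.0])  # tuned on desirable text
anti_logits   = np.array([1.0, -1.0, 0.0, 2.5])  # tuned on undesirable text

alpha = 2.0  # steering strength (illustrative value)

# The expert/anti-expert difference nudges the base distribution toward
# tokens the expert prefers and away from tokens the anti-expert prefers,
# while signal common to both smaller models cancels out.
steered = softmax(base_logits + alpha * (expert_logits - anti_logits))
print(steered)  # token 1 now dominates; token 3 is suppressed
```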

This approach occurs at decoding time, which has a number of advantages. End users can’t simply download an enormous model like GPT-3 to run on their own computer or device. Instead, these deep learning language models operate on large computer clusters and are typically accessed via an API (application programming interface). Because of this, the large LM can’t be altered directly, which is where the smaller LMs come in.

“We’re operating in this space of language models that are so big that we can’t even run them ourselves,” said Maarten Sap, one of the paper’s authors. “And yet surprisingly, our method still works on that big of a model. So, in using smaller experts, we can actually steer a model like GPT-3, which is really cool.”

According to the researchers, DExperts outperforms existing sentiment steering methods, as well as existing detoxification methods. Though the study only explored steering toward or away from a single attribute, the framework is general enough to extend to a number of experts and anti-experts. Presumably, this would allow it to benefit further from the multi-attribute “product of experts” method that is widely used in machine learning.

While numerous businesses are currently developing products that utilize GPT-3, the potential for inaccurate or inappropriate output still makes it challenging to rely on them for dependable results, especially in certain commercial settings. Improvements like DExperts could lead to much greater trust and utility when using these models. From drafting a letter or proposal to automating movie and book reviews to building virtual assistants, being able to more purposefully direct natural language generation can only benefit its many applications.

But while developments like DExperts will probably lead to many new advances and benefits, there is also a potential for misuse that shouldn’t be ignored. The researchers acknowledge that their methods could be used to automatically generate hateful or extremist texts. As automated natural language generation becomes more powerful, it could also be used by scammers and hackers to manipulate the unwary. Text-based chatbots and AI-driven email scams have already become increasingly prevalent in recent years, and extending such capabilities to more sophisticated interactions, including synthesized voice communications, may not be far behind.

These concerns aren’t novel. Since every new technology results in unexpected uses and unanticipated consequences, it’s good to think about how safeguards might be incorporated early in the development cycle.

How might natural language generation become more resilient and reliable in the future? Looking ahead, we could see a method like neural algorithmic reasoning play a role. Recently described in a paper by researchers at DeepMind, this approach fuses neural networks like these language models with algorithmic, rule-based computation to build a more dependable reasoning pipeline that benefits from the strengths of both.

DExperts’ ability to steer powerful LMs like GPT-3 at decoding time could have huge potential for businesses and consumers, automating many repetitive administrative tasks and simplifying everyday routines. It also has the potential to make these applications more environmentally friendly.

“Because this approach works at generation time and you’re not retraining the entire model, you’re doing a lot less compute,” Sap noted. “So, it reduces the carbon footprint and is in the spirit of green AI, which is something that we’re also really interested in at AI2.”
