It’s late afternoon, and the clock is quickly ticking towards close of business. Before you head back to your desk, you open your social media platform of choice for a quick skim. As you scroll through your feed, your eye catches a post that is drawing serious attention. One of your connections has posted an angry outburst about a recent world event. The post appears to be very offensive to some of your other connections, and the comments have quickly descended into increasingly combative discourse. It seems unlikely there will be an easing of feelings. Should we send in the robots?

On 1 July 2022, the Alan Turing Institute (Institute) launched a new research project seeking to better understand and utilise “counterspeech” to combat online hate speech. The project aims to use AI models to automatically detect and generate counterspeech in English.

Taking a somewhat different road to a similar destination, Google-funded research has found promising ways to ‘psychologically inoculate’ readers against misinformation (an approach called ‘prebunking’) that could be translated to a hate speech context.

Hate speech

Hate speech is at once a straightforward concept and one that remains deeply contested and context-dependent. Some people say you know it when you see it, while for others it’s not so simple. For example, some (lawyers, jurisprudents and political theorists among them) may seek to find the finer edges and bounds of hate speech, and try to position it within a particular geographic, historical and sociological frame.

With that in mind, let’s defer to the United Nations for a fairly foundational definition:

“…hate speech is understood as any kind of communication in speech, writing or behaviour, that attacks or uses pejorative or discriminatory language with reference to a person or a group on the basis of who they are, in other words, based on their religion, ethnicity, nationality, race, colour, descent, gender or other identity factor.”

Many jurisdictions around the world also have legal definitions or principles-based frameworks that give meaning to this harm for the purpose of community redress and accountability. All Australian jurisdictions achieve this to varying degrees through measures built around concepts of anti-discrimination, vilification, and incitement, effected through standalone statutes, as well as through criminal and telecommunications laws. Some countries specifically call out ‘hate speech’ in their legislative responses, however most also achieve this through approaches based in public order, social harmony and anti-discrimination.

But even with the most considered definitions, identifying and agreeing on what constitutes hate speech in the real world often proves challenging and contested. Hate speech does not extend to include all upsetting or offensive speech, nor is it speech that is critical of corporations, religious institutions or governments. This ambiguity, and the potential for hate speech protections to be weaponised by bad faith actors, makes the already difficult task of dealing with hate speech even more vexed.

Online hate speech

Hate speech is of course by no means a new phenomenon. However, it has adopted an additional and rather unique quality in the internet age. Online hate speech is commonplace today, particularly across social media and other many-to-many platforms.

Online hate speech tends to be more networked, compounded and global than most pre-digital equivalents ever could be. These features increase the potential virality of online hate speech, but the problem is also one of scale:

  • from January to March 2022, TikTok removed approximately 1,636,888 videos for hateful behaviour, including either hateful ideology or attacks and slurs on the basis of protected attributes. 77% of these removals were proactive, meaning they occurred before the videos in question were reported, and 68% were removed prior to attracting any views;
  • from January to March 2021, YouTube removed 85,247 videos that violated its hate speech policy, and from July to December 2020, Twitter removed 1,628,281 pieces of content deemed to violate its hate speech policy.

Despite the proactive work of many social media platforms to combat online hate speech, many see it as incontrovertible that, as the UN Secretary-General puts it, “social media provides a global megaphone for hate”.

The Institute notes that online hate speech mitigation has so far focused on developing technological solutions to automatically detect and moderate harmful content at scale, as is the case with the above social media platforms.

Removing material or merely moving material

Removing content remains a key weapon in the arsenal against hate speech. However, commentators query whether the response passively band-aids over hate without doing much to actively respond to or counter it. Straightforward moderation can do little to change the beliefs of hate speakers, who may instead ‘go underground’, reposting hate speech in alternative online communities, including those with more permissive content moderation policies, or that actively (or tacitly) endorse the exchange of ideologically aligned hate speech.

For example, the shift toward greater moderation of mass-market social media platforms, like Twitter, has coincided with the emergence of new and dedicated platforms that unapologetically purport to be rooted in free speech. Consider Gab, “a social network that champions free speech, individual liberty and the free flow of information online. All are welcome”, or Parler, “Where Free Speech Thrives”.

Taken at face value, both platforms appear to espouse a fairly innocuous vision. However, in practice, each has proven to be a highly radical hotbed for online hate speech, as well as an organising hub for numerous examples of real-world crime and violence. Gab was where the Pittsburgh synagogue shooter spread antisemitic hate speech and publicly pre-empted his violence, prior to killing eleven people and wounding a further six.

So, being too heavy-handed with content removal on mass-market digital platforms may serve to further fracture and polarise the internet, undermining good-faith attempts at lowering rates of hate speech wherever it appears in the community. But what else can we do about it?

What is counterspeech and why is it needed?

Digital counterspeech has been considered a key alternative to simply removing online hate speech. Part of the larger toolkit against hate speech generally, counterspeech advocates for the empowerment and defence of victims of hate speech through counteracting speech – i.e. offering alternative viewpoints, platforming victims’ voices and creating opportunities for community (and hate speaker) education.

A subset of the broader category of counterspeech, digital counterspeech seeks to apply counterspeech approaches to the unique challenges of online hate speech. Whereas content moderation utilises blocking or removal to reduce the amount of hate speech online, counterspeech instead seeks to more directly present an alternative and non-aggressive response, which remains visible.

Consider this real-world example, taken from research published in the Proceedings of the National Academy of Sciences:


Hate speech: Illegal immigrants don’t belong here.

Counterspeech: Using language like this is just unnecessarily hurtful towards immigrants. Remember that those you care about can see this post too.

In the above example, the counterspeech utilises empathy-based remarks (sentence 1) and a warning of consequences (sentence 2) to combat hate speech. The study found that empathy-based counterspeech encouraged users to delete their posts, albeit this was found to have a ‘relatively small’ effect.

Counterspeech is framed as being able to better balance any legitimate rights to free expression for the hate speaker, the potential harm to victims of the hate speech, and the interests of the broader online community that bears witness to such exchanges. It is claimed to actively support victims by creating productive, visible objections to hate speech, rather than simply removing the material. Indeed, the European Commission against Racism and Intolerance recently concluded that counterspeech is much more likely to be effective in countering intolerance.

So, where do the robots come in?

Using AI to counter hate speech

To help address the current lack of research and resources on digital counterspeech, the Institute’s Online Safety Team has launched a new project aiming to use AI computer models to automatically detect and generate counterspeech in English.

The Institute will first study how online hate is responded to ‘in the wild’ by collecting datasets of abusive social media posts and the spontaneous responses they attract, which will form the input data used to train the Institute’s models. The Institute will then train its AI models to evaluate how similar new content looks to known counterspeech.

The Institute considers that, once they know what counterspeech looks like, the models could be used to automatically generate their own examples from scratch, which would significantly help improve the ease of addressing online hate.
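To make the ‘how similar content looks to known counterspeech’ step more concrete, here is a deliberately simple sketch. The Institute’s actual models and datasets are not public, so the scoring method (bag-of-words cosine similarity) and the example phrases below are purely hypothetical stand-ins, not the Institute’s approach:

```python
# Illustrative sketch only: score a reply by its similarity to a small set of
# known counterspeech examples, using bag-of-words cosine similarity.
# Real systems would learn these patterns from large labelled datasets.
from collections import Counter
import math

def bag_of_words(text: str) -> Counter:
    """Lowercase the text, split on whitespace and strip basic punctuation."""
    return Counter(word.strip(".,!?;:").lower() for word in text.split())

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors (0.0 to 1.0)."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical examples of counterspeech collected 'in the wild'.
known_counterspeech = [
    "Language like this is hurtful towards immigrants.",
    "Remember that those you care about can see this post too.",
]

def counterspeech_score(reply: str) -> float:
    """Highest similarity between the reply and any known counterspeech example."""
    reply_bow = bag_of_words(reply)
    return max(cosine_similarity(reply_bow, bag_of_words(ex))
               for ex in known_counterspeech)
```

In this toy setup, a reply that reuses the empathetic framing of known counterspeech scores close to 1, while an unrelated or hostile reply scores near 0; the generation step the Institute describes would then go further, producing new counterspeech rather than merely recognising it.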

Barriers to be overcome

Digital counterspeech, as well as its motivating forces, is not without its critics. Elevating counterspeech ahead of removal-based or punitive measures may be argued to pull too much emphasis away from victimised groups, and risks pandering to free speech fundamentalists.

There are also practical challenges to digital counterspeech, and in particular, the automated counterspeech being investigated by the Institute. Although automation will remove some of the burden from victims, it remains to be seen whether automated counterspeech will prove effective in generating a genuine empathetic response from hate speakers. Facebook has recently noted that counterspeech is only effective if it comes from credible voices.

Another potential challenge exists in the propensity for online communities to take good-faith technological innovations and churn them through the merciless meme machine (see: the Tay AI chatbot debacle). The far-right is particularly noted for its ability to leverage meme-culture as part of hate-centred campaigns, so one has to query how automated counterspeech would fare in the ruthless online environment.

An alternative approach

Given these challenges, a potentially complementary response to online hate speech could be to leverage “prebunking” (or digital “inoculation theory”), which has seen recent success in a misinformation context. In a somewhat analogous way to its medical namesake, digital inoculation theory uses deliberate media literacy campaigns to educate netizens about common misinformation tropes and methods of persuasion, thereby developing a healthy scepticism of, and resistance to, actual misinformation they encounter online.

Given how a lot of hate speech relies on perpetuating stereotypes, capitalising on latent biases or building upon pre-existing misconceptions, prebunking common methods of hate speech may be a useful weapon in the digital counterspeech arsenal. It may also help address the harm that hate speech has on a more macro level, fostering general community resistance to the spread of hateful ideology.


Takedown and blocking approaches, while needed, will never stem the tide of harms like online hate speech or disinformation and misinformation, nor do they necessarily support the kind of community awareness and education required to respond to these issues on a more comprehensive level.

Approaches like digital counterspeech and prebunking could be important tools in assisting us to navigate the precarious information maelstrom we face in today’s online environment. Of course, new tools like these will inevitably raise their own set of issues: who develops, moderates and uses them, who decides what qualifies as an online harm, and how we ensure they are used for good.

Transparent public research goes some way to addressing these kinds of concerns. The Institute will seek to make all of its work open source, sharing the guidelines, datasets and other resources it creates. Equally, the recent Google-backed research into prebunking has been published in considerable detail in Science Advances.


Read more: The Alan Turing Institute | Counterspeech: a better way of tackling online hate?