How a new language code is being used online to beat the algorithm

A new language code is being used online to dodge algorithms that spot violence and hate speech. Can developers catch up?


Manav Mishra believes that algospeak terms are used only to say something problematic, to have fun or to be sarcastic. Pic/Ajay Kumar

Did you hear about “leg booty”, “seggs”, “cornucopia” or “being unalive” during the “panoramic” or “panini press”? This is not gibberish, bizarre though the words may be and incomprehensible though the sentence seems. Translated literally, it reads: Did you hear about LGBTQ, sex, homophobia or being dead during the pandemic?

But why not just say it straight out?

Algospeak, derived from ‘algorithm’ and ‘speak’, is the practice of deliberately misspelling or swapping out words that would otherwise raise red flags online and get content censored. Using a code of symbols, emojis and sound-alike words, it slips past the algorithms that filter content.

Arunima J

“In 2017-18,” says 21-year-old Manas Mishra, “when I joined Twitter, I observed people tweaking words to say something problematic or to make fun of someone. You can’t directly tell someone to ‘go kill yourself’ as a joke, so instead you say ‘kys’ or ‘K!ll yourself’.”

While Algospeak is most visible on TikTok globally, in India it is found mainly on Twitter and Instagram. A cyber-security consultant, who plays an integral part in developing Artificial Intelligence (AI) and algorithms to detect emotions on social media, explains, “Algospeak is mainly used by content creators who want to reach a larger audience. Take, for instance, a gamer who wants to say ‘I killed that guy’ or ‘look at this headshot’. To the AI, this would be promoting violence.”

Some holdouts refuse to adopt the code, such as Bengaluru-based illustrator Akshara Ashok. She makes comics supporting body positivity and sex education, and doesn’t shy away from writing sex, penis, vagina and vulva as the dictionary intended, instead of “seggs,” “$ex,” or “S3x” like her peers. “My whole stand is that I won’t censor anything,” she says. Ashok, who has over two lakh followers on Instagram, admits that her posts have been removed and that she has been shadow-banned, but she has “stopped caring how many people see my work”. “Many people don’t understand the difference between sex education and erotica,” she shrugs.

Conversely, Mansi Pandey, a 22-year-old social media marketer and content creator, uses “seggs” instead of sex while texting because it’s cute and less awkward. “It makes the conversation lighthearted,” she says.

Mansi Pandey, Akshara Ashok, Yash Agrawal and Brijesh Singh

Mishra, who recently graduated in mass media, tweaks words when he wants to have fun or say something sarcastic without being direct. He replaces ‘a’ with “@” or ‘i’ with “!”. “People only use such words either in a problematic context or while being light-hearted. On Discord [a social media platform for communication and streaming music], many words are banned,” he says, “so people use Algospeak when a new person joins, and it is usually taken in good spirit.” The words also come in handy while sharing spoilers. “If I want to comment on a trending topic, but don’t want my tweets to be grouped together [and highlighted under hashtags] to avoid being targeted by fan groups, I use keywords and symbols.” For example, when fan accounts dedicated to Korean boy band BTS began reporting accounts that tweeted negatively, Mishra “would type BT$”.

Juhu-based Arunima J replaces the words ‘suicide’ and ‘kill myself’ with “Sudoku” and “kms” on Twitter. “Sad or depressed people vent on Twitter without worrying about judgement,” she explains. “But they can be reported out of concern. Sudoku wouldn’t ring a bell, and they would continue scrolling.”

Yash Agrawal, a data scientist at Jio, believes this hack is short-lived. “If programmers are smart enough to build such astute algorithms, they are smart enough to detect these changes,” he says. Agrawal helps build foundational natural language processing (NLP) models; NLP is the branch of computer science and AI that gives computers the ability to understand text and spoken words the way humans do. It is simply a matter of adding these coded words to the training dataset, he says. “At the border of evolution, these models may fail, but as more and more people start using such words, they will become mainstream.”
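Agrawal's point that coded words can simply be added to a dataset can be illustrated with a small keyword filter. The Python sketch below is a hypothetical illustration, not Jio's or any platform's actual system: the substitution table, the lexicon of coded words and the blocklist are assumptions built only from examples quoted in this article. It compares a plain keyword match with one that first undoes symbol swaps such as “@” for ‘a’ and then expands known coded words.

```python
import re

# Hypothetical substitution table, built only from swaps quoted in this article
# ("$ex", "S3x", "De@d", "K!ll"); a real system would need a far larger one.
CHAR_MAP = str.maketrans({"$": "s", "3": "e", "@": "a", "!": "i"})

# Hypothetical lexicon of coded words and what they stand for.
CODED_TERMS = {
    "seggs": "sex",
    "unalive": "dead",
    "sudoku": "suicide",
    "kys": "kill yourself",
    "kms": "kill myself",
}

# Hypothetical list of words the filter is meant to catch.
BLOCKLIST = {"sex", "dead", "suicide", "kill"}


def naive_flagged(text: str) -> set[str]:
    """Plain keyword match: only catches words spelt exactly as listed."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w in BLOCKLIST}


def normalise(text: str) -> str:
    """Lower-case, undo symbol swaps, then expand known coded words."""
    text = text.lower().translate(CHAR_MAP)
    tokens = re.findall(r"[a-z]+", text)
    return " ".join(CODED_TERMS.get(tok, tok) for tok in tokens)


def flagged(text: str) -> set[str]:
    """The same keyword match, run on the normalised text instead."""
    return {w for w in normalise(text).split() if w in BLOCKLIST}


post = "Did you hear about S3x and being unalive?"
print(naive_flagged(post))  # prints set(): the coded spellings slip through
print(flagged(post))        # catches both "sex" and "dead"
```

The weakness is visible too: because the lexicon maps every “sudoku” to “suicide”, a puzzle fan's post would be flagged as well, which is one reason word lists alone struggle without context.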

In fact, he says, many researchers are now working on models in which the AI considers the context of a whole sentence, rather than a single word, to identify hate speech. This is called sentence modeling, and the AI also quantifies its confidence, assigning each post a higher or lower probability of being hate speech. “So if it sees k!ll, it won’t label it as hate speech, but admit a much smaller percentage of confidence that it isn’t,” he says. “The systems are in tandem with a human who tracks these unique terms. The framework is called active learning, and a human can feed ‘k!ll’ into the data set to train the model to pick at it.”
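The active-learning loop Agrawal describes can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the model he works on: it swaps in a tiny word-level Naive Bayes classifier (a deliberately simple stand-in that, unlike a sentence model, ignores word order and context) to output a probability, picks the post it is least sure about (uncertainty sampling, one common active-learning strategy), and retrains once a human has labelled that post. The class and function names, seed posts and labels are all hypothetical.

```python
import math
from collections import Counter


class TinyNaiveBayes:
    """Toy word-level Naive Bayes returning P(hate | text). Illustration only."""

    def fit(self, texts, labels):
        self.word_counts = {0: Counter(), 1: Counter()}
        self.doc_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        return self

    def prob_hate(self, text):
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in (0, 1):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)  # Laplace smoothing
            score = math.log(self.doc_counts[label] / total_docs)
            for word in text.lower().split():
                score += math.log((counts[word] + 1) / denom)
            log_scores[label] = score
        # Turn the two log-scores into a probability (numerically stable softmax).
        top = max(log_scores.values())
        e0 = math.exp(log_scores[0] - top)
        e1 = math.exp(log_scores[1] - top)
        return e1 / (e0 + e1)


def most_uncertain(model, pool, k=1):
    """Uncertainty sampling: the k posts whose score is closest to 0.5."""
    return sorted(pool, key=lambda t: abs(model.prob_hate(t) - 0.5))[:k]


# Hypothetical seed data: 1 = hate speech, 0 = benign.
texts = ["go kill yourself", "i will kill you", "have a nice day", "what a great game"]
labels = [1, 1, 0, 0]
model = TinyNaiveBayes().fit(texts, labels)

# Hypothetical unlabelled posts waiting in the moderation queue.
pool = ["k!ll yourself", "have a great day", "nice game"]

query = most_uncertain(model, pool)[0]
before = model.prob_hate(query)

# A human moderator reviews the queried post and labels it; simulated here.
texts.append(query)
labels.append(1)
model = TinyNaiveBayes().fit(texts, labels)
after = model.prob_hate(query)

print(f"queried {query!r}: P(hate) {before:.2f} -> {after:.2f}")
```

In this toy run the unfamiliar spelling “k!ll” is the post the model is least sure about, so it is the one sent to the human; once the labelled example is added and the model retrained, its score for that post rises.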

Online hate-mongering is the crux of this issue, and Vaibhav Vishal, screenwriter of Scam 1992 and Inside Edge, has been on the receiving end of it. He believes that virulent elements adapt abuses in vernacular languages so they can still use them effectively. “Whenever I write something against mainstream ideologies, the hatred is spewed in the form of these random spellings that cannot be reported.” He holds that moderation isn’t working at all. “AI and the algorithm have to be smarter,” he says. “Developers have to be one step ahead.”

These words mask dark and illegal intentions too. “Cheese pizza” refers to accounts that trade in pornographic pictures of children, and “touch the ceiling” to coaxing young girls into sharing sexually explicit photographs. “PDF file” stands for paedophile, and the corn emoji or “pron” means porn. Additional Director General of Police Brijesh Singh agrees that such terms slip under the radar, and says honest conversations between parents and children can help. “Anything illegal offline is illegal online as well,” he reminds us, adding that methods beyond keyword and symbol detection will have to be enlisted. “It is time to build appropriate safe protocols for Internet and social media usage.”

A senior cyber-security consultant believes that there is no real way to clean up the ever-evolving Internet. “Policy makers should draft censorship and moderation policies such that they allow both pro and anti factions to express themselves freely,” he says.

Sex: $ex, Seggs, S3x
Dead: De@d, Unalive
Suicide: Sudoku
Lesbian: Le$bian