Over a video call, Ganesh Birua excitedly flips the camera to show a tree sitting in front of the school. This is the only place in his entire village where his phone gets network service. When he is not at work, this is where he spends much of his time, obsessively pursuing his life’s calling: spreading the Adivasi language of Ho on the internet.
Birua’s face breaks into a grin when he is asked to describe his journey. He adjusts the oversized earpads of his black headphones sitting snug against his wavy hair and begins. The year was 2014. He was staying in a hostel in Baripada, a town 100 kilometres from his village of Dighiabeda in Odisha, doing an arts course. At the hostel, a mention of Facebook by a friend intrigued him. Nobody from his village had ever been on the internet, he says.
Birua created a profile and promptly joined the Facebook group Ho Society of India. From the group, he learned something nobody had told him before – that his tribe, Ho, has a script called Warang Citi. The hitch was, “I couldn’t see our script anywhere online,” the 23-year-old recounted in Hindi.
An inspired Birua took it upon himself to redress the problem. A child of farmers, he learned the script and created a Facebook page on which he posts Ho letters, along with Ho words and their closest translation in Odia and English.
For the first few years, the effort was barely noticed. On a good day, his post would get five or ten likes. But, usually, there was no response. “Even though nobody noticed what I was doing, I kept on going,” said Birua. In 2018, he stepped up his efforts, opening accounts on Twitter, YouTube and Instagram and starting a blog titled Elabu Etona Warang Citi (Let’s Learn Warang Citi).
The days of being an arts student were behind him but the passion sparked during those days was not lost. While working at a photo studio in Baripada, he began learning Bengali, Hindi and Santhali, all to translate Ho words into these languages on social media. Nothing else seemed to hold his interest. Asked to name his hobbies, he came up blank. But at least his work was beginning to pay dividends.
A student in Silicon Valley created a Braille script for Ho and sought Birua’s approval on Facebook in April 2021. Not much later, a Mexican American graphic designer made a Coca-Cola can with a logo in Ho and shared it with Birua. The biggest encouragement, however, came in November, when Birua’s dictionary project was discovered by Subhashish Panigrahi nearly 2,000 kilometres away in Bengaluru.
Panigrahi is a digital language researcher who has built a career by fostering digital integration of fading languages. When he came across Birua’s “digital activism”, he was working with a volunteer community on creating the building blocks of foundational technologies in Santhali and Ho (categorised as a vulnerable language by Unesco).
Panigrahi recognised that anyone wanting to evangelise about Ho on the internet had their work cut out. There were logistical obstacles in the path that couldn’t be nudged away. For instance, he explains, if you want to create a spell check for a language, you need a large word list, usually drawn from digitally published content. But, as Panigrahi found out when he began scraping data, “Birua’s handle and one other site were the only available primary sources for Ho.”
Panigrahi was nevertheless able to publish a list of 5,000 words. Now, he says, it is up to the “Ho community to build the repository over time”.
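The word-list step Panigrahi describes can be illustrated with a minimal sketch. It is not his pipeline: it assumes the scraped text is already available as plain strings, treats any run of characters from the Unicode Warang Citi block (U+118A0 to U+118FF) as a word, and uses hypothetical function names (`build_word_list`, `is_known`).

```python
import re
from collections import Counter

# Assumption: a "word" is any unbroken run of characters from the
# Warang Citi block (U+118A0..U+118FF). Real tokenisation may differ.
WORD_RE = re.compile("[\U000118A0-\U000118FF]+")


def build_word_list(texts, min_count=1):
    """Collect distinct Warang Citi tokens seen at least min_count times."""
    counts = Counter()
    for text in texts:
        counts.update(WORD_RE.findall(text))
    return sorted(word for word, n in counts.items() if n >= min_count)


def is_known(word, word_list):
    """Simplest possible spell check: is the word in the list?"""
    return word in set(word_list)
```

With only two primary sources to scrape, most tokens would appear once or twice, which is why a list of this kind depends on the community adding to it over time.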
This meandering journey shows the struggle indigenous Indian languages face in making digital inroads. As per the last Census, there are as many as 19,500 mother tongues spoken in India, but on the internet, it is dominant languages like Hindi, Bengali and Telugu that reign supreme. At least 100 languages spoken by more than 10,000 speakers are languishing on the sidelines of the internet because of the absence of digital “tools and technologies”, says the Indian information technology ministry.
“I think it’s a chicken and egg problem,” said Kalika Bali, a researcher at Microsoft Research. “Is there enough content for there to be tools? And if you create the tools, who is going to use them? I think that’s a problem that most low-resourced, marginalised and endangered languages have. For a script to get accepted, the community [requires] not only activism but also resource generation.”
For languages like Ho, that resource generation has been mostly organic, with enthusiasts like Birua toiling selflessly to make word lists, fonts, keyboards and codes in the hope of leveraging their mother tongues into the online age.
Ho is spoken by a little over one million speakers, mostly belonging to the Ho tribe, in Odisha, Jharkhand and West Bengal. It falls under the Munda language family, which also includes Santhali and Mundari. Along with Vietnamese and Khasi, the Munda classification is part of the larger Austroasiatic family, unlike the majority of Indian languages, which are Dravidian or Indo-Aryan.
Asoka Kumar Sen has closely studied the Ho language and its roots for years. A retired professor in Chaibasa, Jharkhand, Sen says the Ho community descended from the highlands into Jharkhand’s Singhbhum roughly 1,000 years ago, cleaving off from the rest of the Munda family into a distinct culture and language. For a millennium or so, Ho remained scriptless until a community leader named Lako Bodra designed Warang Citi (also spelt Varang Kshiti) in the 1950s.
The next big steps in Ho’s journey came relatively recently. On the one hand, Ho intellectuals began strengthening the language base to seek its inclusion in the Eighth Schedule of the Constitution. On the other, a linguist in Scotland launched a campaign to get the Warang Citi script included in the international Unicode system.
The Unicode system is the great standardiser of the internet. By assigning a unique code to each character, it ensures that emojis, symbols for scientific notation and scripts used by languages look the same on all devices and in all countries. For a small language, being included in the Unicode Standard is important but not easy. The complexity of encoding a tongue requires communities to accede to the standardisation process, which is often led by an outsider technologist-linguist. And the final nod comes from the technology companies and governments that are members of the Unicode Consortium.
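What “a unique code to each character” means in practice can be seen in a short sketch. It relies on the Warang Citi block occupying U+118A0 to U+118FF in the Unicode Standard. The helper names are illustrative, and the character names printed depend on the Unicode data bundled with the Python build in use.

```python
import unicodedata

# Warang Citi block in the Unicode Standard.
WARANG_CITI_START = 0x118A0
WARANG_CITI_END = 0x118FF


def code_point_label(ch: str) -> str:
    """Format a character's code point in the usual U+XXXX notation."""
    return f"U+{ord(ch):04X}"


def in_warang_citi(ch: str) -> bool:
    """True if the character falls inside the Warang Citi block."""
    return WARANG_CITI_START <= ord(ch) <= WARANG_CITI_END


def describe(ch: str) -> str:
    """Code point plus the official character name, if this build knows it."""
    name = unicodedata.name(ch, "<name not available in this Python build>")
    return f"{code_point_label(ch)} {name}"


if __name__ == "__main__":
    for ch in ("A", chr(0x118A0)):
        print(describe(ch), "| Warang Citi:", in_warang_citi(ch))
```

The code point is the same on every device. Whether the letter is actually displayed still depends on a font that covers the block being installed.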
Michael Everson, a linguist and type encoder living in Scotland, recognised the significance of the Unicode Standard early on, when personal computers were still not common around the world. He first petitioned the Unicode Consortium on behalf of Ho in 1999, but the non-profit wasn’t convinced.
Several more pleas were filed over the years. In a proposal in 2009, Everson said, “Even if only 10% [of Ho speakers] were users of Warang Citi, that is still many times more people than most languages of the world have as their total number of speakers. The absence of Warang Chiti in the UCS [Universal Coded Character Set] presents a true barrier for Ho to enter the digital age with its own unique writing system intact.”
Everson’s wish was finally granted in 2014 with the script getting integrated into Unicode. But, in 2021, it was excluded because of “the absence of a modern native user community that would be able to use these scripts for useful mnemonic identifiers in a familiar language” and the “problematic and little understood nature of these scripts”.
Linguists had feared that something like this might happen. In 2007, linguists from Swarthmore College in Pennsylvania had warned that Everson’s proposal needed more “consultation” and “fact-checking” by the “user community”.
“As a practical and ethical matter, we urge the Unicode consortium to accept only proposals that emerge from or are formulated in close consultation with native speaker communities,” the Swarthmore College linguists wrote. “To do otherwise is to espouse a kind of linguistic colonialism that will only widen the digital divide.”
Since then, calls to decolonise the language technology world have become louder. Critics say the power structure views communities as commodities, easily exploited for data. But Bali, who has worked on bringing many languages online, including Mundari, says the process is not straightforward. “Everyone says the community should have more power,” she said, “but the community is not a homogenous entity.”
Bypassing these institutional hazards, individuals from the Ho community have independently made eager strides in pushing their language online. One of Birua’s happiest moments was when he met Mangu Purty, who gave him the ability to write in the script from his mobile instead of a computer.
Purty, who is doing a master’s in Chinese at Jawaharlal Nehru University in Delhi, is a member of the Ho tribe. The first time he learned about Unicode was in the Class XII computer science class in Chakradharpur, Jharkhand, in 2015. “I discovered that Warang Citi was included in Unicode, but the real problem was elsewhere,” said Purty, who spoke only in Ho till he was eight. “We still didn’t have a keyboard. I knew we had to make this, but I didn’t have any resources.”
When Purty came to Jawaharlal Nehru University for his bachelor’s degree, his identity sharpened. Students who heard he was from Jharkhand assumed he must be “similar to Biharis and speak Bhojpuri”. “Our identities are not reflected – that’s why I went about making this,” the 23-year-old said, referring to his mobile keyboard.
But first came the question of the font. “Creating a font is not easy,” said Purty. “I researched a lot about typography and learned that letters need a common width and height, a uniform space between them, and standard angles.” The older fonts for Warang Citi, he noticed, didn’t follow these rules.
In 2020, Google’s Noto project, which draws fonts for marginalised languages, released a version called Noto Sans Warang Citi. “It looked very crude,” said Purty. “And for some reason, next to Latin letters, the Warang Citi letters look quite small.” But with no other viable fonts available, Purty went ahead with the Google font and, taking the help of online resources, made a mobile keyboard that rolled out with the Android Version 11 operating system in September 2021. Until then, a keyboard was only accessible through a computer.
Birua was elated when Purty sent him the tool. “It’s very difficult to access a computer,” said Birua. “Now, my WhatsApp group of 50 Warang Citi writers can exclusively chat in our own script from our phones.”
In another domino event, Hercules Munda, who belongs to the Munda tribe and is studying linguistics at the School of Oriental and African Studies in London, developed an online app using Purty’s keyboard to create language games for Ho and other Munda languages. After the app went live, he found that many of its users were Adivasi youth whose parents had left their villages to raise their children in urban centres.
“Our languages need to be integrated into technology,” said Munda. “Saying that we won’t adopt technology won’t work. I believe we can be indigenous while progressing into the modern world.”
Purty still wishes for more. He is working on standardising the language, which, he realises, won’t be easy. Ho, unlike Santhali and Hindi, doesn’t have aspirated consonants (like ka versus kha). Aspirated letters, he says, sound “unnatural” when spoken in Ho.
“When Bengali, Odia, Santhali borrow words from Hindi to fill in for the lack of terminology, it works because they have a similar sound system,” said Purty. “But we can’t pronounce them properly.” For example, a word like adhikar (rights) doesn’t have an equivalent in Ho and would be pronounced adikar.
“When I speak, I have no choice, so I mix Hindi,” Purty said. “But if I want to write content online, like my blog, I find it difficult to express many things because we don’t have the terminology, and foreign words sound unnatural.”
Like other Ho writers online, Purty has started creating new words after poring over the Ho lexicon in the 1950 book Encyclopaedia Mundarica. For inspiration, he often looks to other tongues. For instance, he found that the word for government in Urdu, hukumat, comes from the word for order, hukm. So, when thinking up the word for government in Ho, he reached for the word for order (achu). ‘To hold’ in Ho is ‘sab’, and holding, managing or controlling is sanab (Ho uses infixes).
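The sab to sanab step can be mimicked with a toy sketch. The rule used here (insert “n” plus a copy of the first vowel right after that vowel) is an assumption inferred only from this single example, not a description of Ho grammar. The function names are hypothetical, and the text is romanised rather than written in Warang Citi.

```python
VOWELS = "aeiou"


def nominalise(stem: str) -> str:
    """Toy infixation: after the first vowel, insert 'n' plus a copy of it.

    Assumed rule, inferred from the one pair sab -> sanab.
    """
    for i, ch in enumerate(stem):
        if ch in VOWELS:
            return stem[: i + 1] + "n" + ch + stem[i + 1 :]
    return stem  # no vowel found: leave the stem unchanged


def coin_compound(first: str, second: str) -> str:
    """Join two stems into one candidate word."""
    return first + second


if __name__ == "__main__":
    managing = nominalise("sab")
    print(managing)
    print(coin_compound("achu", managing))
```

Joining the word for order to the derived form gives a single candidate term, which, like any coinage, only becomes a word once speakers adopt it.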
“Achusanab can be government or administration in Ho,” said Purty. “This is my personal creation. Unless it is accepted by everyone, it won’t be an established word.” Still, Purty believes it’s a start that hopefully can take Ho to the stage its cousin Santhali has reached online.
Santhali, spoken by over seven million people and recognised in the Eighth Schedule of the Constitution, has a repertoire of textbooks, news portals and television channels. It was accepted into the Unicode system in 2008 and got its own Wikipedia in 2018. In contrast, Ho’s Wikipedia remains in incubation.
Nishaant Choksi, an academic who studies Santhali, says the language’s increased reach online led to a proliferation of publications in the offline world. Those who earlier had to travel to a city such as Jamshedpur to find a printing press with the Ol Chiki script could now use the digital font and find anyone nearby with computer training to print publications.
“The films also were somewhat homogenous earlier,” said Choksi. “But when people started filming locally, you saw more Santhali identity with local signs, styles and festivals. It gave another layer of heterogeneity as the technology became available.”
Ho is some way off from this dynamism. The next big thing Birua hopes for is an online search in Ho. “We get everything from search nowadays,” he said. “We should be able to search in Warang Citi so that my people can learn whatever they want through the internet.”
Karishma Mehrotra is an independent journalist. She is a Kalpalata Fellow for Technology Writings for 2021.