"Vodka is good, but the meat is rotten": that's the machine-translated version of the Biblical saying "the spirit is willing, but the flesh is weak" rendered into Russian. It is an old joke, but it drives home the difficulties of translation. The complexities of human language are still a hard nut for software to crack. But that's hardly a deterrent as India's Internet gets ready to reach the "next billion".

The Centre for Development of Advanced Computing (C-DAC) already has a repertoire of software for 12 Indian languages: Assamese, Bengali, Gujarati, Hindi, Kannada, Marathi, Malayalam, Oriya, Punjabi, Sanskrit, Tamil and Telugu. With English remaining the dominant language, adoption of regional languages is emerging as one of the strongest levers for widening the Internet user base. The aim is to extend access to the more than five billion people who have no access to the Internet at all.

Internet gurus at the United Nations-sponsored gathering on Internet governance issues that wrapped up in Hyderabad last week seemed more or less unanimous on the need for an Indian touch to the multilingual Internet. Work has been under way ever since C-DAC launched its multilingual advanced news automation system (Manas). Developed indigenously in collaboration with the department of information technology and Doordarshan, it was the first news automation system of its kind in India. N Ravi Shanker, joint secretary, department of IT, ministry of IT, confirms, "The role of the Internet can be widened only if the linguistic barriers can be overcome. In fact, work is being carried out on linguistic data resources, content creation, language processing tools, optical character recognition, text-to-speech and speech recognition, and machine translation of Indian languages. Besides, we are also working to enable country code top-level domains (ccTLDs) and to operationalise .in in local languages."

The challenges, however, are not easy to surmount. As Ajit Balakrishnan, CEO, Rediff.com, puts it, the Internet has created a revolution in bridging the digital divide. However, there are challenges when it comes to implementation, such as translation tools. While there is great diversity of content, there are still cultural issues, he adds.

To begin with, there is no unanimity on what "local" means. John Klensin, an independent consultant, asks, "Does it mean localisation to a country, a village or a cultural group? Will a multilingual Internet become international and multicultural?" From domain names to URLs and conversion gateways, interoperability with multilingual systems needs local context and content with proper navigation systems, he says. However, the process must not be politicised through claims of sovereignty over languages, cultures and scripts.

According to the United Nations department of public information, in 2007 there were about 13.5 million Internet subscribers in the country, representing 1.15 per 100 inhabitants, and over 3.1 million of them were broadband subscribers; still, 81 million people did not have online access. Globally, too, only about 1.7% of the world's population had access to the Internet.

Lynn St Amour, CEO, Geneva-based Internet Society, an independent not-for-profit organisation set up by Internet pioneers Vint Cerf and Bob Kahn, says that to encourage broader participation, the Internet has to embrace openness, enable creativity and empower communities.

Manas was the trigger that revolutionised the concept of multilingual automated news production in India. After Manas, C-DAC has over the years evolved its graphics and intelligence-based script technology (GIST) to extend the benefits of IT to India's diverse multilingual population. GIST facilitates the use of Indian languages in IT and has led to the proliferation of computers and their applications in all major Indian languages, with a remarkable increase in the number of users.

India, like other territories that use multiple languages, has to work hard at supporting all of them, explains Patrik Fältström, member of the board, Internet Society. The challenges are the same as for territories that use only one language, but of course more complicated. India's problems are not limited to its myriad languages; there are also many scripts, written both left-to-right and right-to-left. Creating applications that can be used in all scripts and all languages is hard.
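The point about mixed writing directions can be made concrete with Python's standard `unicodedata` module. The sketch below is an illustration only, not a tool mentioned by any of the people quoted: it classifies a word by the bidirectional class of its first strongly directional character, which is the heuristic the Unicode Bidirectional Algorithm uses to pick a paragraph direction. The sample words, including the Urdu spelling of "Bharat" in the right-to-left Perso-Arabic script, are chosen purely for illustration.

```python
import unicodedata

def script_direction(text: str) -> str:
    """Return 'RTL', 'LTR' or 'neutral' based on the first strongly directional character.

    This mirrors the first-strong-character rule (P2) of the Unicode
    Bidirectional Algorithm, UAX #9; it is not a full bidi implementation.
    """
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):   # Hebrew-type or Arabic-type letters
            return "RTL"
        if bidi == "L":           # Latin, Devanagari, Tamil and other LTR letters
            return "LTR"
    return "neutral"              # only digits, punctuation or spaces

samples = {
    "Hindi (Devanagari)": "भारत",
    "Urdu (Perso-Arabic)": "بھارت",
    "Tamil": "இந்தியா",
    "English (Latin)": "India",
}
for language, word in samples.items():
    print(f"{language}: {script_direction(word)}")
```

An application that renders or edits text in all of India's scripts has to make this kind of decision for every paragraph, and then handle mixed runs inside it, which is part of why such software is hard to build.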

But the basic question still remains: in a truly global Internet, how much localisation can happen? Can language and culture interoperate? Work is currently being done in three specific areas: creation of content (including translation into all languages), creation of localised tools for managing this content and, of course, the implementation of internationalised domain names for the .in top-level domain. However, a lot of work remains to be done, as most of it still focuses on the major languages of the world, specifically the six official UN languages, he says.

"Most work is done by individuals who have an interest in their own local language and culture. They translate open source software and tools. Some examples are the Linux operating system in Kiswahili, content management systems like Wikipedia in Hindi, or the Internet Corporation for Assigned Names and Numbers (Icann) initiative on internationalised domain names."

Another aspect of a multilingual Internet is character encoding. Character encoding, like the other technical standards used, must be truly internationalised. Global standards must work for all scripts and all languages. Local character encoding systems and local standards create only local solutions.
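What a global standard means in practice can be seen with Python's built-in codecs. The sketch below, an illustration rather than any system described in the article, takes a domain label and shows its Unicode code points, its size in UTF-8 and the ASCII-compatible "xn--" form that the domain name system actually carries for internationalised domain names. The function name `describe_label` is hypothetical. Python's built-in `idna` codec follows the older IDNA 2003 rules; newer IDNA 2008 tooling exists as third-party packages, and for a simple label like the one below both give the same result.

```python
def describe_label(label: str) -> dict:
    """Summarise how a domain label is represented in Unicode, UTF-8 and DNS."""
    return {
        # Unicode code points: one universal numbering for every script
        "code_points": [f"U+{ord(ch):04X}" for ch in label],
        # UTF-8: ASCII letters take 1 byte each, Devanagari letters take 3
        "utf8_bytes": len(label.encode("utf-8")),
        # ASCII-compatible encoding (IDNA/Punycode) used on the wire by the DNS
        "ace": label.encode("idna").decode("ascii"),
    }

for label in ("India", "भारत"):
    print(label, describe_label(label))
```

Because Hindi text is stored with the same Unicode code points everywhere, the same label can be typed, stored, searched and resolved by any compliant system, which is exactly what a local, font-specific encoding cannot guarantee.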

The more local content that can be created in the local language and local script, the more interest there will be in using the Internet. Using the Internet might hold no interest for people who do not understand English. And as multilingualism is in many cases more common in urban areas, localisation helps rural areas more than urban ones and helps bridge the divide, says Fältström.

"At the same time, one should not mix IT policy issues (localisation of content, etc) with regional politics. In many cases, the divide between urban and rural areas is the same for the Internet as for roads, trains and healthcare. It is more an economic question than a purely IT-related one," he adds.

Concerted efforts to develop a multilingual Internet have also been taken up by Microsoft Research, Google, Rediff and many others, apart from telecom providers. Microsoft Research is exploring the issues in automatic translation of text between English and Indian languages using statistical machine translation technologies.

The information and models inferred from large monolingual, comparable and parallel multilingual corpora are used to translate new text accurately and fluently, with the aim of building practical and scalable systems. An instant messenger platform is also being developed.
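The idea of inferring translation models from parallel corpora can be illustrated with IBM Model 1, one classic statistical word-alignment model. The toy sketch below is not Microsoft Research's system; it runs expectation-maximisation over a tiny, hypothetical corpus of romanised Hindi and English phrase pairs to estimate word translation probabilities t(english | hindi). It omits the NULL word and every refinement a real system needs.

```python
from collections import defaultdict

def train_ibm_model1(pairs, iterations=10):
    """Estimate word translation probabilities t(e|f) with EM (simplified IBM Model 1)."""
    english_vocab = {e for _, eng in pairs for e in eng}
    t = defaultdict(lambda: 1.0 / len(english_vocab))  # uniform starting guess
    for _ in range(iterations):
        count = defaultdict(float)  # expected co-occurrence counts c(e, f)
        total = defaultdict(float)  # sum of c(e, f) over all e, per source word f
        for src, eng in pairs:
            for e in eng:
                # E-step: share each English word among the source words by current t
                norm = sum(t[(e, f)] for f in src)
                for f in src:
                    share = t[(e, f)] / norm
                    count[(e, f)] += share
                    total[f] += share
        # M-step: renormalise expected counts into probabilities
        t = defaultdict(float, {(e, f): c / total[f] for (e, f), c in count.items()})
    return t

def best_translation(t, f):
    """Return the English word with the highest t(e|f) for source word f."""
    candidates = {e: p for (e, src), p in t.items() if src == f}
    return max(candidates, key=candidates.get)

# Hypothetical toy parallel corpus (romanised Hindi, English)
pairs = [
    ("bada ghar".split(), "big house".split()),
    ("chhota ghar".split(), "small house".split()),
    ("bada kutta".split(), "big dog".split()),
]
model = train_ibm_model1(pairs)
for word in ("bada", "ghar", "chhota", "kutta"):
    print(word, "->", best_translation(model, word))
```

No word pairs are supplied in advance: the model learns that "ghar" means "house" only because the two keep appearing together across sentence pairs, which is why real systems need very large parallel corpora.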

The multilingual systems (MLS) group at Microsoft Research India focuses on research towards a truly language-neutral approach to all aspects of linguistic computing. These include language computing systems issues, such as technologies for multilingual information interfaces, organisation and access, and information retrieval, as well as computational linguistics issues, such as language understanding, summarisation, translation and cross-lingual search.

Another example comes from the systems development laboratory at IIT-Madras, which has developed a software base for building multilingual applications with local content for ICT-based projects. Its developers are confident of offering this localised content on the global Internet soon.