The Hindi Database
The Hindi Database project is the first database for Hindi, the third most spoken language in the world. It will contain a large collection of Hindi texts for scientific research.
About the project
The Hindi Database is both a resource and a tool for research on Hindi across a range of disciplines, such as linguistics and literary studies. It can also be used for information retrieval. Texts are selected from all fields of written Hindi, and the database covers the period from the beginnings of modern Hindi to the present day.
Until now, texts have had to be entered manually at the Hindi office in New Delhi. From now on, they will be uploaded with the help of an Optical Character Recognition (OCR) tool, which will substantially speed up this process.
Electronic texts from the Internet will also be added, with a focus on newspaper texts. The goal is to set up an archive for the Hindi newspaper Navbharat Times. Text database: salCORPORA.
Once enough texts have been uploaded and all research tools have been deployed, the first research project will be to build a Hindi Internet grammar in collaboration with Jawaharlal Nehru University in New Delhi and the University of Texas at Austin.