Under Resource Languages

SPECIAL SESSION ON SPEECH PROCESSING AND SPEECH TECHNOLOGY FOR UNDER-RESOURCED LANGUAGES

About the session

Are you interested in speech processing for under-resourced languages? Do you want to contribute to the development of speech-based applications that can benefit speakers of under-resourced languages? If yes, then consider submitting a paper to the Special Session on Speech Processing and Speech Technology for Under-Resourced Languages at SPECOM2023!

This session will focus on the challenges and opportunities of developing speech and NLP-based applications for languages with few or almost no resources, such as annotated corpora and lexicons. India has thousands of spoken languages used daily by millions of people; however, only a few of them, such as Hindi, Tamil, and Bengali, have adequate resources for building speech and language technology. Most spoken languages that are termed under-resourced, such as Konkani from Goa, Tulu from Karnataka, Chhattisgarhi from Chhattisgarh, Mising from Assam, and Ao from Nagaland, do not have any properly managed or open-sourced speech and linguistic resources for research and development. This lack of resources poses significant barriers to developing and evaluating speech and language technology for their speakers and to connecting those speakers globally. With this in mind, the proposed session addresses the underlying challenges and possibilities in speech and language technologies.

This session aims to showcase the latest research and innovations in speech processing for under-resourced languages and to foster collaboration and the exchange of ideas among researchers working on this topic. Authors are welcome to submit original work that addresses any aspect of speech processing for under-resourced languages, including but not limited to:

Speech and linguistic data collection and annotation methods for under-resourced languages
Speech recognition and synthesis for under-resourced languages
Technology development with under-resourced linguistic resources
Adaptation and transfer learning techniques
Applications of self-supervised learning-based pretraining for under-resourced languages
Cross-lingual training techniques
Data augmentation approaches for under-resourced languages
Low-resource speech translation and spoken language technology
Ethical and social issues in speech processing for under-resourced languages

To promote building speech technologies for under-resourced languages, we will be sharing new TTS and ASR corpora in Chhattisgarhi, the official language of the state of Chhattisgarh in Central India. Work done with these corpora should be submitted to this special session only.

Chhattisgarhi ASR and TTS datasets details

The Chhattisgarhi ASR and TTS datasets were collected and curated at IISc Bangalore under the RESPIN and SYSPIN projects, respectively.

ASR

Language: Chhattisgarhi
Train set: 238.85 hours
Dev set: 4.03 hours
Test set: 4.01 hours
Links for downloading:
  - Train set: https://tinyurl.com/3shwu2dc
  - Dev set: https://tinyurl.com/5ye4kcdx
  - Test set: https://tinyurl.com/4974kkyb
Please cite reference [1] (see the References section below) if you use this dataset.
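
As a quick sanity check on the ASR split sizes listed above, the following minimal Python sketch sums the durations and reports each split's share of the total. The names `SPLIT_HOURS` and `split_summary` are illustrative assumptions and are not part of the dataset release; only the three durations come from this page.

```python
# Durations in hours, copied from the ASR split listing on this page.
# The dictionary and function names are illustrative, not part of the release.
SPLIT_HOURS = {"train": 238.85, "dev": 4.03, "test": 4.01}


def split_summary(hours):
    """Return the total duration and each split's percentage share."""
    total = sum(hours.values())
    shares = {name: 100.0 * value / total for name, value in hours.items()}
    return total, shares


total_hours, shares = split_summary(SPLIT_HOURS)
print(f"total: {total_hours:.2f} h")
for name, pct in shares.items():
    print(f"{name}: {SPLIT_HOURS[name]:.2f} h ({pct:.1f}% of total)")
```

The dev and test sets are each a small, roughly equal slice of the corpus, so nearly all of the audio is available for training.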

TTS

Language: Chhattisgarhi
10 hours of TTS data from one male speaker and 10 hours from one female speaker (a small fraction of the corpus may have audio-text mismatches)
Links for downloading:
  - Male TTS data: https://tinyurl.com/jyusbzs5
  - Female TTS data: https://tinyurl.com/5t4zuvae
Please cite reference [2] (see the References section below) if you use this dataset.

Paper submission

If you have original, unpublished work on any of these topics or related ones, we invite you to submit a paper to this special session. The paper submission deadline is the same as for regular papers. Papers should follow the SPECOM2023 format and guidelines, which can be found on the conference website: https://www.iitdh.ac.in/specom-2023/Submissions.html. Papers will be peer-reviewed by expert reviewers, and accepted papers will be published in the conference proceedings by Springer.

We look forward to receiving your submissions and seeing you at SPECOM2023!

References

[1] Abhayjeet Singh, Arjun Singh Mehta, Ashish Khuraishi K S, Deekshitha G, Gauri Date, Jai Nanavati, Jesuraja Bandekar, Karnalius Basumatary, Karthika P, Sandhya Badiger, Sathvik Udupa, Saurabh Kumar, Prasanta Kumar Ghosh, Prashanthi V, Priyanka Pai, Raoul Nanavati, Sai Praneeth Reddy Mora, Srinivasa Raghavan, “An ASR corpus in Chhattisgarhi, an under-resourced Indian language,” submitted to SPECOM 2023.
[2] Abhayjeet Singh, Anjali Jayakumar, Deekshitha G, Hitesh Tiwari, Jesuraja Bandekar, Sandhya Badiger, Sathvik Udupa, Saurabh Kumar, Prasanta Kumar Ghosh, “An end-to-end TTS model in Chhattisgarhi, a under-resourced Indian language,” submitted to SPECOM 2023.

Session chairs

Dr. Prasanta Kumar Ghosh

Dr. Preethi Jyothi

Contact Us

Mail: specom2023@gmail.com