Plenary Speakers
Prof. Bhiksha Raj
Carnegie Mellon University, USA
Title: Learning from weak and noisy labels
Abstract: One of the key bottlenecks in training diverse, accurate classifiers is the need for "strongly labeled" training data, which provide precisely demarcated instances of the classes to be recognized. Such data are, however, difficult to obtain, particularly in bulk. The alternative is to obtain imprecisely labeled data. By "imprecise" we refer to data whose labels may be noisy, i.e. erroneous, or "weak", i.e. given at the collection level: collections of data are tagged with the classes they contain, without identifying the labels of individual instances within each collection. Such labels are particularly common for audio and image data.
In this talk, I discuss the problem of training with such imprecisely labeled data. I introduce a few simple approaches and then present an expectation-maximization framework that applies to all forms of imprecision. I show how this framework performs comparably to, or better than, current approaches that are specialized to each form of label imprecision.
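To make the noisy-label setting concrete, the following is a minimal NumPy sketch (not taken from the talk) of EM-style training under label noise, assuming a toy model of two 1-D Gaussian classes whose observed labels are flipped with some unknown probability. All variable names and the specific model are illustrative assumptions, not the speaker's method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: two 1-D Gaussian classes; observed labels are
# flipped with probability true_eps (the "noisy label" setting).
n, true_eps = 2000, 0.3
y = rng.integers(0, 2, n)                          # latent true labels
x = rng.normal(loc=np.where(y == 0, -2.0, 2.0), scale=1.0)
z = np.where(rng.random(n) < true_eps, 1 - y, y)   # noisy observed labels

# EM: alternate between inferring posteriors over the true labels
# (E-step) and re-estimating class means and the flip rate (M-step).
mu, eps = np.array([-1.0, 1.0]), 0.1               # rough initialization
for _ in range(50):
    # E-step: p(y | x, z) ∝ N(x; mu_y, 1) * p(z | y)
    lik = np.stack([np.exp(-0.5 * (x - m) ** 2) for m in mu], axis=1)
    noise = np.where(z[:, None] == np.arange(2)[None, :], 1 - eps, eps)
    post = lik * noise
    post /= post.sum(axis=1, keepdims=True)
    # M-step: posterior-weighted class means and expected flip rate
    mu = (post * x[:, None]).sum(axis=0) / post.sum(axis=0)
    eps = (post * (z[:, None] != np.arange(2)[None, :])).sum() / n

print(mu, eps)  # should recover means near (-2, 2) and eps near 0.3
```

The same E-step/M-step structure extends to the weak (collection-level) case by treating the bag tag, rather than a per-instance label, as the observed evidence.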
Prof. S. Umesh
IIT Madras, India
Title: Speech and Language Research for Indian Languages
Abstract: In this talk, I will give a brief overview of the Speech Consortium's proposal to develop various technologies as part of the National Language Translation Mission (NLTM). I will also briefly demonstrate some of the tools developed by the ASR team towards this effort, including speech-to-speech synthesis as well as video-to-video translation and transcreation.
In the second half of the talk, I will cover the various research activities carried out in our SPRING Lab (formerly Speech Lab, IITM) over the past couple of years, especially in self-supervised learning (SSL), multilingual speech recognition, speaker verification, and audio representation learning. This includes our proposed ccc-wav2vec 2.0 and data2vec-aqc SSL methods, which have outperformed other speech SSL models such as conventional wav2vec 2.0 and data2vec. In addition, I will discuss the generalized audio representations developed in our lab, as well as the Indian-language speech datasets we have recently released along with ASR recipes and source code in ESPnet and Fairseq.
Prof. Visar Berisha
Arizona State University, USA
Title: Translating clinical speech analytics from the lab to the clinic: challenges and opportunities
Abstract: The dominant paradigm in clinical speech analytics has been supervised learning with high-dimensional input features. Despite many years of work by academic and industry research labs, and thousands of publications, the translation of techniques developed under this paradigm has been slow. The focus of this talk will be on why this is the case and what we can do about it. We will discuss converging evidence, collected from multiple systematic reviews, that the traditional supervised machine learning paradigm leads to overoptimistic estimates of how well these models actually work when deployed. Next, we will discuss an alternative approach that focuses on developing a more holistic measurement model for clinical speech analytics, and provide several examples of models developed under this paradigm.
Prof. Satoshi Nakamura
Nara Institute of Science and Technology, Japan
Title: Modeling Simultaneous Speech Translation
Abstract: After many years of research, speech translation technology has reached the level of providing services on a smartphone. However, various problems remain in realizing automatic simultaneous interpretation, which produces translation output before the end of the utterance. In this talk, I will introduce recent research activities on automatic simultaneous speech translation, as well as the simultaneous speech translation system developed by the NAIST team for the IWSLT 2023 shared task. The talk also covers research on speech translation that preserves paralinguistic information.