Departmental Colloquium, 1.2.24
Dr. Hagai Aronowitz
IBM Haifa Research Lab.
Will lecture on
Speech Analysis using self-supervised speech representations
Self-supervised representation learning has led to remarkable advances in natural language processing. In my talk, I will describe how to leverage recent advances in self-supervised speech processing to create a common speech analysis engine. Such an engine can handle multiple speech processing tasks while achieving state-of-the-art accuracy. I will describe our work on language identification, where we obtain an error reduction of more than 50% compared to the state of the art. Then, I will describe our emotion recognition work and show that speaker normalization can be applied to reach state-of-the-art accuracy.
In the second part of the talk, I will describe our recent work on predicting turn-transitions in spoken conversations and show how the common engine outperforms the state-of-the-art approach, which is based on automatic speech recognition.
Finally, I will describe the current efforts in unifying text and speech processing under a single large language model framework.
Hagai Aronowitz is a Senior Research Scientist at IBM Haifa Research Lab. He received a B.Sc. degree in Computer Science, Mathematics, and Physics from the Hebrew University of Jerusalem in 1994, and an M.Sc. degree, Summa Cum Laude, and a Ph.D. degree, both in Computer Science, from Bar-Ilan University in 2000 and 2006, respectively. From 1994 to 2000, he was with the Israeli Defense Forces, where he led advanced research in speech processing and speech recognition. In 2006-2007, he was a postdoctoral fellow in the advanced LVCSR group at the IBM T. J. Watson Research Center, Yorktown Heights, NY. In 2014, he gave a tutorial on speaker diarization at Interspeech. During 2022-2023, Hagai was an elected member of the Speech and Language Technical Committee (SLTC) of the IEEE Signal Processing Society. He is part of the organizing committee of Interspeech 2024. His research interests include speech processing and deep learning.