Speaker separation in the wild, and the industry's view
Audio recordings are a valuable data source for analyzing conversations and powering digital assistants. A key step in analyzing single-channel conversational audio is identifying who said what, a task known as speaker diarization; the task becomes harder still when the number of speakers is not known a priori. In this talk we'll review the motivation for speaker diarization and verification, survey previous approaches to diarization and how they relate to "real" diarization needs (as in audio applications such as Chorus.ai's Conversation Analytics platform), and present the pipeline required to integrate these solutions, along with recent state-of-the-art end-to-end deep learning approaches to the problem.