Alexa, Tell Me How Kaldi and Deep Learning Revolutionized Automatic Speech Recognition!

This presentation will review the history of automatic speech recognition (ASR) technology and show how deep neural networks have revolutionized the field within the last five years, giving birth to Alexa, enhancing Siri, nudging Google Home to market, and generally making ASR a household phenomenon. The story will be told from the viewpoint of Kaldi, a set of open-source ASR tools widely used in academia and industry. It will touch on key milestones and seminal developments along this short yet exciting journey: suitable network architectures, novel training criteria, and scalable optimization algorithms, along with prescient research funding, realistic data sets, and competitive benchmark tests conducted by neutral entities.

Sanjeev Khudanpur, Johns Hopkins Center for Language and Speech Processing