Open Source Toolkits for Speech Recognition

Looking at CMU Sphinx, Kaldi, HTK, Julius, and ISIP  |  February 23rd, 2017
As members of the deep learning R&D team at SVDS, we are interested in comparing recurrent neural network (RNN) approaches to speech recognition with other approaches. Until a few years ago, the state of the art in speech recognition was a phonetics-based approach with separate components for the pronunciation, acoustic, and language models. Typically, this consists of n-gram language models combined with hidden Markov models (HMMs). We wanted to start with this as a baseline, and then explore ways to combine it with newer approaches such as Baidu’s Deep Speech. While summaries explaining these baseline phonetic models exist, there do not appear to be any easily digestible blog posts or papers that compare the tradeoffs of the different freely available tools.
This article reviews the main options for free speech recognition toolkits that use traditional HMM and n-gram language models. For operational, general, and

