Making Linux Offline Voice Recognition Easier

For just about any task you care to name, a Linux-based desktop computer can get the job done using applications that rival or exceed those found on other platforms. However, that doesn’t mean it’s always easy to get everything working, and speech recognition is one of those difficult setups.

A project called Voice2JSON is trying to simplify the use of voice workflows. While it doesn’t provide the actual voice recognition, it does make it easier to get things going and then use speech in a natural way.

The software can integrate with several backends to do offline speech recognition, including CMU’s pocketsphinx, Dan Povey’s Kaldi, Mozilla’s DeepSpeech 0.9, and Kyoto University’s Julius. However, the code is more than just a thin wrapper around these tools. The fast training process produces both a speech recognizer and an intent recognizer. So not only do you get a transcript that mentions the garage door, you also learn whether the speaker wants it opened or closed.

In addition, the tools are all made to work in Unix-style pipelines, which is refreshing. Here’s an example configuration from the project’s website:

[GarageDoor]
open the garage door
close the garage door

[LightState]
turn on the living room lamp
turn off the living room lamp
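
To make the idea of an intent recognizer concrete, here is a toy sketch in Python. It is not how Voice2JSON works internally (the real tool trains proper recognizers); it only shows how section names in a file like the one above can serve as intent labels for the sentences listed under them. The function names `load_intents` and `recognize` are made up for this illustration.

```python
# Toy illustration only: exact-match lookup, not Voice2JSON's actual recognizer.
SENTENCES = """\
[GarageDoor]
open the garage door
close the garage door

[LightState]
turn on the living room lamp
turn off the living room lamp
"""


def load_intents(text):
    """Map each sentence to the section (intent) name it is listed under."""
    intents = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank separator lines
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]  # a new intent section begins
        elif current is not None:
            intents[line.lower()] = current
    return intents


def recognize(transcript, intents):
    """Return the intent name for a transcript, or None if it is unknown."""
    return intents.get(transcript.strip().lower())


if __name__ == "__main__":
    table = load_intents(SENTENCES)
    print(recognize("Close the garage door", table))
    print(recognize("make me a sandwich", table))
```

An exact-match table like this breaks as soon as the speaker varies the wording, which is one reason a trained intent recognizer is worth having.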

Templating features let you specify optional words and alternative words in a single rule. Other features include mapping a spoken phrase such as “living room lamp” to something more computer-friendly.
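
As a rough picture of what such a template buys you, the Python sketch below expands one rule into every sentence it covers. It assumes a notation where square brackets mark an optional word and a parenthesized, pipe-separated list marks alternatives; check the project’s documentation for the exact syntax. The `expand` helper is hypothetical, handles only non-nested groups, and is not part of Voice2JSON.

```python
import itertools
import re

# One token is an [optional] group, an (a | b) group, or plain literal text.
# Assumed notation for illustration; nested groups are not handled.
TOKEN = re.compile(r"\[([^\[\]]*)\]|\(([^()]*)\)|([^\[\]()]+)")


def expand(template):
    """Return every sentence a simple template can produce."""
    slots = []
    for match in TOKEN.finditer(template):
        optional, alternatives, literal = match.groups()
        if optional is not None:
            # Either the word is spoken or it is left out.
            slots.append([optional, ""])
        elif alternatives is not None:
            slots.append([alt.strip() for alt in alternatives.split("|")])
        else:
            slots.append([literal])
    sentences = []
    for combo in itertools.product(*slots):
        # Join the chosen pieces and collapse any doubled spaces.
        sentences.append(" ".join(" ".join(combo).split()))
    return sentences


if __name__ == "__main__":
    for sentence in expand("turn (on | off) [the] living room lamp"):
        print(sentence)
```

One rule here stands in for four spoken variants, so the list of sentences you have to write stays short even as the phrasing you accept grows.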

Overall, this looks like a fun tool to have in your kit. If you do something interesting with it, be sure to drop us a tip so we can cover it. Meanwhile, we’ve been watching Linux speech for quite a while. Of course, what we really want is speech commands like those on the USS Enterprise, and we have to admit it is getting closer.