
3. Usage

3.1 Introduction

CVoiceControl consists of three executables: microphone_config, model_editor and cvoicecontrol.

First of all, you have to calibrate your microphone so that it can be used for speech recognition. Use microphone_config to do this. Once your sound hardware is prepared, you can use model_editor to create speaker models. These are the main objects needed for the speech recognition process. cvoicecontrol is the actual speech recognition program.

These three components are described in the next three sections. The structure of the speaker models is also described in more detail.

3.2 Microphone Calibration

Start microphone_config by entering the command

% microphone_config
at a command prompt (Linux console, xterm, kvt etc.). First, the tool generates a list of available mixer and audio devices. If the tool fails at this point, it could not find any suitable mixer or audio device on your system. In this case, make sure that your sound device is installed correctly and that your sound driver is working properly.

The microphone calibration process is divided into five steps, which are run from the main menu. A step that has been completed successfully is displayed in bold face, a step that has not yet been completed is displayed in normal face, and a step that cannot be run at this time is not displayed at all.

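The five steps are: Select Mixer Device, Select Audio Device, Adjust Mixer Levels, Calculate Recording Thresholds and Estimate Characteristics of Recording Channel.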

If microphone_config managed to detect your audio hardware automatically, the first two steps (Select Mixer Device and Select Audio Device) are displayed in bold, i.e. marked as ``completed successfully''. (The selected device files are displayed in parentheses after the menu entries.) In this case you may continue with step three. However, if you have more than one sound card installed or if you have to select a non-default mixer or audio device, press Enter on the respective menu item and select a device from the list. In case of doubt, stick with the suggested settings!

Next, run step three: Adjust Mixer Levels. Here, we try to estimate good values for the mixer channels MICROPHONE IN (MIC) and (if available) INPUT GAIN (IGAIN). You will be guided through the process by detailed information dialogs.

To succeed, this step strongly relies on your cooperation!

Initially, the MIC level is set to the maximum and the IGAIN level (if available) is set to the minimum value.

If an IGAIN channel is available then its level is increased while you speak at a conversational volume until the input signal is strong enough. Hint: Reasonable values for the IGAIN level on my system range between 1 and 8.

Next, the microphone level is reduced repeatedly while you speak at a ``maximum volume level'' until the incoming signal no longer exceeds an upper limit. Hint: Reasonable values for the MIC level on my system range between 60 and 95.

Upon successful completion of this step, the next two steps are available for selection from the main menu.

Next, select Calculate Recording Thresholds from the menu.

During this step, we try to find reasonable energy levels at which to start the automatic voice recording and at which to stop the recording. Again, you will be guided through the process by detailed information dialogs.

In the next step, Estimate Characteristics of Recording Channel, the characteristics of the recording channel (such as background noise) are estimated. Again, on-screen information guides you through the process.

If all five steps have been completed successfully, the item Write Configuration becomes available in the main menu. Select it to store all the gathered information in the file config in the directory .cvoicecontrol in your home directory. The directory .cvoicecontrol is created if necessary.
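A script that depends on the calibration can first check whether this file exists. The following is only a minimal sketch: the function names config_path and have_config are made up for illustration, and only the location .cvoicecontrol/config in the home directory is taken from the description above.

```shell
#!/bin/sh
# Print the location where microphone_config stores its results
# (path as described in the text; the function name is hypothetical).
config_path() {
  printf '%s\n' "$HOME/.cvoicecontrol/config"
}

# Succeed only if the configuration file exists as a regular file.
have_config() {
  [ -f "$(config_path)" ]
}

if have_config; then
  echo "Found $(config_path)"
else
  echo "No configuration found, please run microphone_config first"
fi
```

The check does not look at the contents of the file, so it cannot tell whether the stored values are still appropriate for the current hardware.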

If the configuration has been saved successfully you can leave the configuration tool by selecting Exit from the main menu.

Congratulations, your microphone is set up for speech recognition!

3.3 Speaker Model Editor

CVoiceControl is a template-matching based speech recognition system, i.e. for each command that can be recognized there have to be some sample utterances to which an incoming utterance can be compared. All of this is collected in a so-called speaker model.

A speaker model consists of a variable number of reference items, where each reference item corresponds to a command that can be recognized. A reference item consists of a label (a transcription of what is said), a command (a Unix command that is executed upon recognition of this reference item) and a variable number of sample utterances.

Roughly speaking, an incoming utterance is recognized by comparing it to all sample utterances of all reference items in the active speaker model. The reference item whose sample utterances are most similar to the incoming utterance (i.e. have the smallest distance score) is chosen as the recognition result.
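The selection rule can be illustrated with a small shell sketch. It assumes that a distance score has already been computed for every sample utterance; how CVoiceControl computes these scores, and how it combines the scores of several samples of one item, is not covered here. The sketch simply picks the item that owns the single closest sample, which is an assumption. The function name best_match, the input format and all labels and scores are made up.

```shell
#!/bin/sh
# Read lines of the form "<label> <distance>" (one per sample utterance)
# and print the label of the sample with the smallest distance.
# Labels must not contain spaces in this simplified format.
best_match() {
  awk '
    NR == 1 || $2 + 0 < best { best = $2 + 0; label = $1 }
    END { if (NR > 0) print label }
  '
}

# Hypothetical scores for an incoming utterance compared against
# the samples of two reference items.
printf '%s\n' "yes 4.2" "yes 3.0" "no 1.5" "no 2.8" | best_match
```

With these made-up scores the closest sample belongs to the item labelled no, so that item would be reported as the recognition result.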

To launch the speaker model editor, open a console and type:

% model_editor

From the main menu of the editor you can reset the current speaker model (New Speaker Model), load one from file (Load Speaker Model), edit the model (Edit Speaker Model), save it (Save Speaker Model) and leave the editor (Exit).

Model Editor:

The model editor shows the reference items of the current speaker model in a table view, one reference item per line. A reference item in the table can be highlighted (selected) using the up and down cursor keys. At the bottom of the dialog a brief summary of keyboard commands is displayed for your convenience. Press a to add a new reference item to the model, press d to delete the currently highlighted item, press Enter to edit the currently highlighted item and press b to return to the main menu. For example, to add and edit a new reference item, press a followed by Enter.

Edit Speaker Model Item:

Selecting a reference item by pressing Enter opens the item editor dialog. This dialog displays the label and command of the selected item as well as a list of the sample utterances recorded for it. A brief summary of keyboard commands is displayed at the bottom.

Sample utterances in the list view can be highlighted using the up and down cursor keys. To record a new sample utterance press r. The recording is then done automatically, i.e. no further keyboard interaction is required to record the utterance. Note: After pressing r you should wait a second or so before starting to talk! This is because an audio buffer needs to be filled before the actual automatic recording can be started! To delete a highlighted sample utterance press d, to play it press Enter. To edit the label string of the current item press l. To edit the command string press c. To leave the current dialog press b.

Important: Listen to every utterance you record to make sure that nothing has been cut off at the boundaries! If many utterances are cut off, please rerun the microphone configuration tool!

Note: To ensure a good recognition quality, a minimum number of sample utterances per reference item is required. By default, the minimum number is set to ``4''.

Note: Recognized commands are executed in the foreground by default. This means that the speech recognizer blocks until the executed command has finished! This behaviour is required because many sound cards do not allow recording and playing at the same time. So, if a command outputs any acoustic reaction to the sound card, the speech recognizer needs to wait until the command has finished before continuing in auto recording mode. If you want the speech recognizer to run a command in the background and continue with recognition, you have to append a ``&'' to the command!

By the way, the command may consist of a sequence of commands separated by ``;''.
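The difference between foreground and background commands can be tried out in a shell. The sketch below assumes that the command string of a reference item is handed to a shell, which the use of ``&'' and ``;'' suggests but which is not stated explicitly. The function run_like_recognizer is a hypothetical stand-in and not part of CVoiceControl.

```shell
#!/bin/sh
# Hypothetical stand-in for the way a recognized command is executed:
# the command string is passed to a shell as a whole.
run_like_recognizer() {
  sh -c "$1"
}

# Foreground: the call returns only after both commands have finished.
run_like_recognizer 'echo first; echo second'

# Background: the trailing & lets the call return at once,
# while sleep keeps running on its own.
run_like_recognizer 'sleep 1 &'
echo "the recognizer would continue listening here"
```

In a sequence such as ``a; b &'' only the last command is put into the background, so the recognizer would still wait for a.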

Important: If a reference item has been recognized by the speech recognizer, the associated command will be executed! There is no guarantee that the recognition result is correct. Also, the speech recognizer does not check whether the execution of a command would harm your system (think of commands like rm). Thus, it is the user's responsibility to define harmless commands in the speaker model and to make sure that the reference items in a speaker model are not too easily confused with one another!

Once you have finished editing the speaker model, save it to disk via Save Speaker Model from the main menu. Note that speaker model files must have the extension .cvc. If you do not specify this extension it will be appended to the file name automatically!
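The naming rule can be mirrored in scripts that build speaker model file names. The function with_cvc_extension below is a hypothetical helper, not part of CVoiceControl, and whether the editor compares the extension case-sensitively is an assumption.

```shell
#!/bin/sh
# Print the given file name, appending .cvc unless it already ends in .cvc
# (hypothetical helper mirroring the rule described above).
with_cvc_extension() {
  case "$1" in
    *.cvc) printf '%s\n' "$1" ;;
    *)     printf '%s\n' "$1.cvc" ;;
  esac
}

with_cvc_extension "desktop"      # prints desktop.cvc
with_cvc_extension "yes-no.cvc"   # prints yes-no.cvc
```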

3.4 Speech Recognizer

To start the speech recognizer, open a console and type:

% cvoicecontrol <model_file>
where <model_file> is the name of the speaker model you want to use. The speech recognizer enters auto recording mode automatically.

Note: Make sure that no other application needs access to the sound device at this time, as most sound devices only allow exclusive access!

After a command has been recognized successfully, the speech recognizer re-enters automatic recording mode and is ready for the next spoken command.

To finish the program, you have to kill the speech recognizer explicitly by pressing Ctrl-C in the console where you started the recognizer or by issuing the command killall cvoicecontrol from any command prompt.

Hint: There is also a special command name that can be used in a speaker model's reference item to finish cvoicecontrol. It is called cvoicecontrol_off.

Note: The speech recognizer can be started in a special mode by specifying the command line option --once, i.e. by starting it in the following way:

% cvoicecontrol --once <model_file>

In this case, the speech recognizer will exit automatically after the first successful recognition run. The exit code of the program is set to the id number of the reference item that has been recognized. As an example, let us consider a speaker model yes-no.cvc that contains two reference items, the first being ``Yes'' and the second being ``No''. Invoked like

% cvoicecontrol --once yes-no.cvc
the speech recognizer returns 0 if ``Yes'' was recognized and 1 if ``No'' was recognized. Using speech prompts in shell scripts is then straightforward. Example:
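  #!/usr/bin/tcsh

  # Run a single recognition pass and save the exit code right away,
  # because $status is overwritten by every following command.
  cvoicecontrol --once yes-no.cvc
  set result = $status

  if ($result == 0) then
    echo "You said yes"
  else if ($result == 1) then
    echo "You said no"
  else
    # Any other value is treated as an error. An exit code of -1
    # is seen by the shell as 255, so it cannot be tested as "-1".
    echo "Error!"
  endif

  exit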

Note: In a tcsh script the shell variable status always contains the exit code of the most recently executed command! To obtain the exit code in a bash script you have to use the special parameter $?.
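For comparison, the same prompt could be written for bash or any POSIX shell roughly as follows. This is a sketch only: it assumes, as in the example above, that ``Yes'' has the id 0 and ``No'' the id 1, and it treats every other exit code as an error. The function names and the RECOGNIZER variable are made up for illustration; RECOGNIZER merely allows the function to be tried without a microphone and defaults to the real program.

```shell
#!/bin/sh
# Translate the exit code of "cvoicecontrol --once yes-no.cvc" into a message.
describe_answer() {
  case "$1" in
    0) echo "You said yes" ;;
    1) echo "You said no" ;;
    *) echo "Error!" ;;
  esac
}

# Run one recognition pass and report the result.
# $? must be read immediately after the command, since every
# further command overwrites it.
ask_yes_no() {
  ${RECOGNIZER:-cvoicecontrol} --once yes-no.cvc
  describe_answer "$?"
}
```

Calling ask_yes_no at the end of the script starts the recognizer, waits for one utterance and prints the corresponding message.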

Have fun with CVoiceControl!

