Monday, April 16, 2007

Somehow the conversion from the little-endian wave file (Windows WAV format) to the big-endian raw format (Sun Unix box) was not done correctly with the following two commands.
To start with, remember that the initial WAV file had already been converted to a big-endian file, sampled at 44100 Hz, and that is where we start from.
Then:
1. sox for downsampling, and
2. dd for byte swapping.
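
For reference, the byte-swapping step can be sketched in Python. This is a minimal illustration, assuming the raw file holds headerless 16-bit PCM samples and that dd was run with its `conv=swab` option (which swaps each pair of bytes); the post does not record the exact command, and the function name `swap_bytes_16bit` is made up for this example.

```python
import array


def swap_bytes_16bit(raw: bytes) -> bytes:
    """Swap the two bytes of every 16-bit sample in headerless PCM data."""
    if len(raw) % 2 != 0:
        raise ValueError("expected an even number of bytes (16-bit samples)")
    samples = array.array("h")  # "h" is a signed 16-bit integer on common platforms
    if samples.itemsize != 2:
        raise RuntimeError("this sketch assumes 2-byte 'h' items")
    samples.frombytes(raw)
    samples.byteswap()  # reverses the byte order of each sample in place
    return samples.tobytes()
```

Swapping twice returns the original bytes, so applying the step to data that is already in the target byte order silently produces the wrong order.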

This experiment failed.

Hence the results of the Sphinx experiments were the way they were (very low speech recognition accuracy).
After that, the conversion steps were changed to a three-step process:
1. The existing raw file was first converted to the WAV format (again using sox), then
2. downsampled (sox), and
3. byte swapped with dd.

This worked: the recognition rate improved drastically, from 3% to 90%, under all conditions (neutral, cognitive load, physical load).
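
A byte-order mistake like this can often be spotted before running recognition at all. The sketch below is a heuristic and not something done in the original experiment: it assumes headerless 16-bit PCM, and relies on the observation that audio read with the correct byte order changes smoothly from sample to sample, while the wrong order looks like noise. The names `mean_step` and `guess_byte_order`, and the 440 Hz tone at 16000 Hz used for the demonstration, are assumptions made for this example.

```python
import array
import math
import struct
import sys


def mean_step(raw: bytes, byteorder: str) -> float:
    """Average absolute difference between neighbouring 16-bit samples."""
    samples = array.array("h")
    samples.frombytes(raw)
    if byteorder != sys.byteorder:
        samples.byteswap()  # reinterpret the data in the requested byte order
    if len(samples) < 2:
        return 0.0
    total = sum(abs(b - a) for a, b in zip(samples, samples[1:]))
    return total / (len(samples) - 1)


def guess_byte_order(raw: bytes) -> str:
    """Return 'little' or 'big', whichever reading gives the smoother signal."""
    if mean_step(raw, "little") <= mean_step(raw, "big"):
        return "little"
    return "big"


# Demonstration on a synthetic tone (values chosen only for illustration).
tone = [int(10000 * math.sin(2 * math.pi * 440 * n / 16000)) for n in range(1600)]
little_endian = struct.pack("<%dh" % len(tone), *tone)
big_endian = struct.pack(">%dh" % len(tone), *tone)
print(guess_byte_order(little_endian), guess_byte_order(big_endian))
```

Running a check of this kind on the file after each conversion step would show which step left the samples in the wrong order, rather than leaving the recognition accuracy as the only symptom.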

Lessons learned:
1. Know thy tools
2. Beware of finding what you are looking for.
