to see how the Sphinx decoder performs on UT-Scope data.
The results are not good, hence we will need to "condition-train" the Sphinx decoder as well as the aligner.
Friday, March 09, 2007
Running experiments using Sphinx Aligner.
This is done:
The script layout was prepared by Dr. Wooil Kim.
Down-converted the files from 44 kHz to 16 kHz.
Created the trs and ctl files.
scp-ed the files to the proper location.
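For reference, a throwaway sketch of how the ctl file could be generated from the feature directory. This assumes the usual Sphinx convention for control files (one utterance path per line, extension stripped) and a hypothetical directory layout; the function name and layout are illustrative, not part of Dr. Kim's actual scripts.

```python
from pathlib import Path

def write_ctl(feat_dir, ctl_path, ext=".mfc"):
    """List every feature file under feat_dir in a Sphinx-style
    control file: one utterance path per line, extension stripped.
    (Directory layout is assumed; adjust to the real structure.)"""
    entries = sorted(str(p.relative_to(feat_dir).with_suffix(""))
                     for p in Path(feat_dir).rglob("*" + ext))
    Path(ctl_path).write_text("\n".join(entries) + "\n")
    return entries
```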
Here is what happens:
1. Not all files are aligned; in fact, only about 10% of the files are aligned.
2. All the files are from male speakers.
Need to find the issue here: "why are not all files aligned?"
What could it be:
1. Is the file size too big? Not really; almost all the files are 5 seconds or less.
2. The signal amplitude may be an issue, but I am able to listen to the files, and the ones that were aligned sound no different from the ones that were not aligned.
Steps:
1. Run a smaller experiment on a couple of files that were and were not aligned, for a single speaker only [instead of running on all the files for all the speakers]. See the results. If the results repeat, check the error/log files. [You could actually do the same with the already-processed files, but forget it, let us start fresh.]
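To run the smaller experiment without touching the feature files, it should be enough to cut down the control file. A minimal sketch, assuming the control file lists one utterance path per line and the hypothetical utterance ids (e.g. "test1") are the last path component; the helper name is my own:

```python
def make_subset_ctl(src_ctl, dst_ctl, utt_ids):
    """Keep only the named utterances from a Sphinx-style control
    file (one utterance path per line), writing a smaller ctl file
    for the reduced experiment."""
    with open(src_ctl) as f:
        entries = [line.strip() for line in f if line.strip()]
    keep = [e for e in entries if e.rsplit("/", 1)[-1] in utt_ids]
    with open(dst_ctl, "w") as f:
        f.write("\n".join(keep) + "\n")
```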
Running the experiment with a smaller file set for speaker mak1_1.
With the first three speech files (renamed test2, test1, test3 respectively), the experiment gives the same result: files test2 and test3 were not aligned. Why? Need to discuss and find the reasons.
The final state NOT reached:
The error message in the .log file is -
ERROR: "main_align.c", line 902: Final state not reached; no alignment for test2
Is it the frame count? test1.mfc has 298 frames, while test2.mfc has 498 frames and test3.mfc has 348 frames: the unaligned files have more frames.
Let us check against a larger set of files...
The above premise, that alignment works only for files with 298 frames or fewer, is not true...
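To check frame counts across a larger file set without opening each file by hand, the .mfc headers can be read directly. This sketch assumes the sphinx3 feature-file convention (a single 4-byte integer header giving the number of float32 values, with 13 cepstral coefficients per frame); byte order varies by machine, so it picks whichever interpretation matches the file size:

```python
import struct

def mfc_frame_count(path, n_coeffs=13):
    """Frame count of a Sphinx feature file, assuming the header is
    one 4-byte integer holding the number of float32 values in the
    body (the sphinx3 .mfc convention). Tries both byte orders and
    keeps the one consistent with the actual file size."""
    with open(path, "rb") as f:
        header = f.read(4)
        data_bytes = len(f.read())
    for fmt in (">i", "<i"):  # big-endian, then little-endian
        (n_floats,) = struct.unpack(fmt, header)
        if n_floats * 4 == data_bytes:
            return n_floats // n_coeffs
    raise ValueError("header does not match file size: " + path)
```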
Need to understand the error message: "final state not reached". When can this happen?
The reasons, logically, are:
1. The given transcription is not correct, so the Viterbi decoding yields a very low score (lower than the threshold set internally) and the path is pruned before reaching the final state.
2. Or wrong model settings...
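The pruning mechanism behind reason 1 can be illustrated with a toy forced alignment. This is not Sphinx code, just a minimal left-to-right Viterbi with beam pruning: when the observation scores never support the final state (as with a wrong transcription), every path through it falls outside the beam and gets pruned, so the final state is never reached; a wide enough beam keeps the path alive.

```python
import math

def reaches_final_state(obs_scores, beam):
    """Toy forced alignment over a left-to-right state chain.
    obs_scores[t][s] = log-likelihood of frame t in state s.
    Each state may self-loop or advance by one; hypotheses scoring
    more than `beam` below the frame-best are pruned, mimicking
    the beam pruning an aligner applies internally."""
    n_states = len(obs_scores[0])
    active = {0: obs_scores[0][0]}           # state -> best log score
    for t in range(1, len(obs_scores)):
        nxt = {}
        for s, score in active.items():
            for s2 in (s, s + 1):            # self-loop or advance
                if s2 < n_states:
                    cand = score + obs_scores[t][s2]
                    if cand > nxt.get(s2, -math.inf):
                        nxt[s2] = cand
        best = max(nxt.values())
        active = {s: sc for s, sc in nxt.items() if sc >= best - beam}
    return (n_states - 1) in active          # did any surviving path end?
```

With observations that never match the last state, a tight beam prunes it away (alignment fails) while a loose beam lets a low-scoring path survive to the end.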
[reference: http://www.speech.cs.cmu.edu/sphinxman/FAQ.html]
Further, I will need to refer to http://www.speech.cs.cmu.edu/sphinxman/FAQ.html#9 to understand more.