Emotion Recognition

Wednesday, April 25, 2007

Accessing binary data (writing and reading, but mostly the reading part) was tedious with Perl, so I thought C++ would help me with it.
Initially, I had to deal with the last data element being stored twice. This is now resolved.

The sample code is as follows:

// writing and reading binary data
//
#include <iostream>
#include <fstream>
using namespace std;

// these give a nice explanation of reading and writing a binary file
// ref: http://www.gamedev.net/reference/articles/article1127.asp
// http://courses.cs.vt.edu/~cs2604/spring06/binio.html
//
int main() {

// data to be written to a file
//
float arr[4] = {10.234, 1.212, 0.2, 342};

ofstream outfile("test1.raw", ios::out | ios::binary);

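cout << "writing to a file " << endl;
for (int i = 0; i < 4; i++) {
float j = arr[i];
outfile.write((char *) &j, sizeof(j));
}
outfile.close();

return 0;
}

To read the data back, declare --> float data1[4];
and have --> myfile.read( (char *) &data1, sizeof(data1));
to access the data.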

Tuesday, April 24, 2007

My idea was to access binary data using Perl, specifically data of variable length.
For example, I have a data stream: 1 12 123 12 34. With Perl, can I write this data to a file? Sure, I can. Actually, writing the above variable-length data is easy: all you have to do is open the file in binary mode and dump the data.

But retrieving the data is an issue. I am not able to get hold of it...

Here is the code:

#! /usr/bin/perl

# to read and store binary files
# ref: --> http://www.herongyang.com/perl/binary.html

# gives problems with variable-length data

# part 1) to write binary data to a file
# this works correctly.

@floatvalues = (23.3, 3.2, 4.3, 5.4);
$out = "test.raw";
open(OUT, "> $out");
binmode(OUT);
foreach $value (@floatvalues) {
print OUT "$value\n";
print "$value ";
}
print "\n";
close(OUT);

# part 2) to read binary data from a file
# reading is the trouble, as the read function takes a parameter giving the
# data width.

$IN = "test.raw";
open(IN, "$IN");

while (read(IN, $value, 3)) {
print "$value \n";
}

close(IN);

How to overcome this?

Monday, April 16, 2007

Somehow the conversion from the little-endian wave file (Windows wav format) to the big-endian raw format (Sun Unix box) was not done correctly by the following two commands.
To start with, we need to remember that the initial wav file had already been converted to a big-endian one, and we start from there. This file is sampled at 44100 Hz.
Then,
1. sox for downsampling and,
2. dd for byte swapping.

This experiment failed.

Hence, the results of the Sphinx experiments were the way they were (very low speech recognition accuracy).
After this, the steps for conversion were changed to a three-step process:
1. The existing raw file was first converted to the wav format (again using sox) then,
2. downsampled (sox) and,
3. then dd for byte swapping.

This worked: the recognition rate improved drastically, from 3% to 90%, under all conditions (neutral, cognitive load, physical load).

Lessons learned:
1. Know thy tools
2. Beware of finding what you are looking for.

Friday, March 09, 2007

Goal: to see the performance of the Sphinx decoder on UT-Scope data.

The results are not good, hence we will need to "condition-train" the Sphinx decoder as well as the aligner.

Running experiments using the Sphinx aligner.
This is done:
The script layout was made ready by Dr. Wooil Kim.
Down-converted the files from 44 kHz to 16 kHz.
Created the trs and ctl files.
scp-ed the files to the proper location.

Here is what happens:
1. Not all files are aligned. Actually, only about 10% of the files are aligned.
2. All the files are from male speakers.

Need to find the issues here: why are not all files aligned?

What could it be?
1. Is the file size too big? Not really. Almost all the files are 5 seconds or shorter.
2. The signal amplitude may be an issue, but I am able to listen to the files. The files which were aligned are not different from the files which were not aligned.

Steps:
1. Run a smaller experiment on a couple of files which were and weren't aligned, for a single speaker only (instead of running all the files for all the speakers), and see the results. In case the results repeat, look at the error or log files. [You could actually do the same for the files already processed, but forget it, let us start fresh.]

Running the experiment with a smaller file set for the mak1_1 speaker.
With the first three speech files (renamed as test2, test1, test3 respectively), the experiment gives the same result: files test2 and test3 were not aligned. Why? Need to discuss and find the reasons.

Final state NOT reached:
The error message in the .log file is:
ERROR: "main_align.c", line 902: Final state not reached; no alignment for test2
Is it the number of frames? test1.mfc has 298 frames, but test2.mfc has 498 frames and test3.mfc has 348 frames, so the files that failed have more frames.

Let us see for a bigger set of files...
The above premise, that it works only with 298 frames or fewer, is not true.
Need to understand the error message "Final state not reached". When can this happen?

Logically, the reason is one of:
1. The transcription given is not correct, so the Viterbi decoding yields a very low score (lower than the threshold set internally).
2. Or wrong model settings.
[reference: http://www.speech.cs.cmu.edu/sphinxman/FAQ.html]
Further, I will need to refer to http://www.speech.cs.cmu.edu/sphinxman/FAQ.html#9 to understand more.
