== Compile HTK ==

- Enter the 32-bit chroot.
- Edit configure.in and comment out:
    bindir=${bindir}.${host_cpu}
    libdir=${libdir}.${host_cpu}
- Run autoconf:
  $ autoconf
- Run configure:
  $ export CPU=i386
  $ ./configure CC=gcc-3.3
- Compile (as root, unfortunately):
  $ sudo make
- Make chroot symlinks:
  $ cd /chroot/usr/local/bin
  $ for f in {H*,Cluster,L*}; do sudo ln -s $f ${f}32; done
  $ cd /usr/local/bin
  $ for f in /chroot/usr/local/bin/{H*,Cluster,L*}32; do sudo ln -s /usr/local/bin/do_dchroot `basename $f`; done

== Create general dictionary ==

Uses a slightly modified version of Swedish SAMPA: http://www.phon.ucl.ac.uk/home/sampa/swedish.htm
See transcription.txt.
numerals-swe.dict was written by hand.

== Create training and test utterances ==

echo 'gr -number=200 | l | wf random_utts.txt' | gf grammar/NumeralsSwe.gf
grep -v '^$' random_utts.txt | nl -v0 -n rz -s": " | head -n100 > train_utts.txt
grep -v '^$' random_utts.txt | nl -v0 -n rz -s": " | tail -n100 > test_utts.txt

== Record data ==

ghc -package hsshellscript --make -o prompt prompt.hs
cd train_data
HSLab &
../prompt ../train_utts.txt

== Create word list ==

echo 'pg -printer=fullform | wf fullform.txt' | gf grammar/NumeralsSwe.gf
perl -pe 's/([^:]*)\s:.*/$1/' fullform.txt > wordlist.txt

== Create application dictionary and monophone list ==

$ HDMan32 -m -w wordlist.txt -n monophones1 -l dlog dict.nosil numerals-swe.dict
$ echo '!SILENCE sil' > dict
$ cat dict.nosil >> dict
$ if ! grep -q '^sil$' monophones1; then echo sil >> monophones1; fi

== Create transcription file ==

- Create mkphones0.led:
    EX
    IS sil sil
    DE sp
- Create a word-level MLF file for the training utterances:
  $ ./prompts2mlf train_utts.txt > train_utts_words.mlf
- Create a phone-level MLF file for the training utterances:
  $ HLEd32 -l '*' -d dict -i train_utts_phone.mlf mkphones0.led train_utts_words.mlf

== Parametrize the data ==

- Create the file param_config:
    # Coding parameters
    TARGETKIND = MFCC_0
    TARGETRATE = 100000.0
    SAVECOMPRESSED = T
    SAVEWITHCRC = T
    WINDOWSIZE = 250000.0
    USEHAMMING = T
    PREEMCOEF = 0.97
    NUMCHANS = 26
    CEPLIFTER = 22
    NUMCEPS = 12
    ENORMALISE = F
- Create a directory for storing the parametrized inputs:
  $ mkdir train_param
- Create a script file for HCopy:
  $ for u in train_data/*; do echo $u train_param/`basename $u`.mfc; done > codetr.scp
- Do the parametrization:
  $ HCopy32 -T 1 -C param_config -S codetr.scp

== Create monophone HMMs ==

- Create the file proto.
- Create train.scp with the list of training files:
  $ ls train_param/*.mfc > train.scp
- Create a config file for the training:
  $ cp param_config train_config
- Change train_config, setting:
    TARGETKIND = MFCC_0_D_A
- Calculate the initial Gaussians:
  $ mkdir hmm0
  $ HCompV32 -C train_config -f 0.01 -m -S train.scp -M hmm0 proto
- Create the hmm0/macros file:
  $ echo '~o <VECSIZE> 39 <MFCC_0_D_A>' > hmm0/macros
  $ cat hmm0/vFloors >> hmm0/macros
- Create monophones0 (monophones1 without sp):
  $ egrep -v '^sp$' monophones1 > monophones0
- Create hmm0/hmmdefs:
  $ ./proto2hmmdefs hmm0/proto monophones0 > hmm0/hmmdefs
- Re-estimate:
  $ mkdir hmm1
  $ HERest32 -C train_config -I train_utts_phone.mlf -t 250.0 150.0 1000.0 -S train.scp -H hmm0/macros -H hmm0/hmmdefs -M hmm1 monophones0
  $ mkdir hmm2
  $ HERest32 -C train_config -I train_utts_phone.mlf -t 250.0 150.0 1000.0 -S train.scp -H hmm1/macros -H hmm1/hmmdefs -M hmm2 monophones0
  $ mkdir hmm3
  $ HERest32 -C train_config -I train_utts_phone.mlf -t 250.0 150.0 1000.0 -S train.scp -H hmm2/macros -H hmm2/hmmdefs -M hmm3 monophones0

== Fix the silence models ==

- Add the sp model:
  $ mkdir hmm4
  $ cp hmm3/macros hmm4/macros
  $ ./makespmodel.pl < hmm3/hmmdefs > hmm4/hmmdefs
- Create sil.hed:
    AT 2 4 0.2 {sil.transP}
    AT 4 2 0.2 {sil.transP}
    AT 1 3 0.3 {sp.transP}
    TI silst {sil.state[3],sp.state[2]}
- Add extra transitions and tie sp to sil:
  $ mkdir hmm5
  $ HHEd32 -H hmm4/macros -H hmm4/hmmdefs -M hmm5 sil.hed monophones1
- Re-estimate:
  $ mkdir hmm6
  $ HERest32 -C train_config -I train_utts_phone.mlf -t 250.0 150.0 1000.0 -S train.scp -H hmm5/macros -H hmm5/hmmdefs -M hmm6 monophones1
  $ mkdir hmm7
  $ HERest32 -C train_config -I train_utts_phone.mlf -t 250.0 150.0 1000.0 -S train.scp -H hmm6/macros -H hmm6/hmmdefs -M hmm7 monophones1

== Re-align the training data ==

- Re-align:
  $ HVite32 -l '*' -o SWT -b '!SILENCE' -C train_config -a -H hmm7/macros -H hmm7/hmmdefs -i train_utts_aligned.mlf -m -t 250.0 -y lab -I train_utts_words.mlf -S train.scp dict monophones1
- Generate train-aligned.scp:
  NOTE: after realignment, not every input file has an entry in the MLF file. Generate a new script file that lists only the utterances present in train_utts_aligned.mlf:
  $ perl -ne 'if (s/^"\*\/(utt\d+)\.lab"$/$1/){chomp;print "train_param/$_.mfc\n";}' train_utts_aligned.mlf > train-aligned.scp
- Re-estimate:
  $ mkdir hmm8
  $ HERest32 -C train_config -I train_utts_aligned.mlf -t 250.0 150.0 1000.0 -S train-aligned.scp -H hmm7/macros -H hmm7/hmmdefs -M hmm8 monophones1
  $ mkdir hmm9
  $ HERest32 -C train_config -I train_utts_aligned.mlf -t 250.0 150.0 1000.0 -S train-aligned.scp -H hmm8/macros -H hmm8/hmmdefs -M hmm9 monophones1

== Create tied-state triphones ==

=== Create list of all triphones in dictionary ===

- Create mktridict.ded:
    TC
- Create the triphone list from the dictionary:
  $ HDMan32 -b sp -g mktridict.ded -n triphones1 /dev/null dict

=== Annotate training data with triphones ===

- Create mktri.led:
    WB sp
    WB sil
    TC
- Create the triphone MLF:
  $ HLEd32 -l '*' -i train_utts_wintri.mlf mktri.led train_utts_aligned.mlf

=== Clone monophone models ===

- Create mktri.hed:
  $ ./maketrihed monophones1 triphones1 > mktri.hed
- Clone the monophone models to make triphone models:
  $ mkdir hmm10
  $ HHEd32 -H hmm9/macros -H hmm9/hmmdefs -M hmm10 mktri.hed monophones1

=== Re-estimate using triphone data ===

  $ mkdir hmm11
  $ HERest32 -B -C train_config -I train_utts_wintri.mlf -t 250.0 150.0 1000.0 -S train-aligned.scp -H hmm10/macros -H hmm10/hmmdefs -M hmm11 triphones1
  $ mkdir hmm12
  $ HERest32 -B -C train_config -I train_utts_wintri.mlf -t 250.0 150.0 1000.0 -S train-aligned.scp -H hmm11/macros -H hmm11/hmmdefs -M hmm12 triphones1

== Test ==
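== Sketch: prompts2mlf ==

These notes do not include the prompts2mlf script used in the "Create transcription file" step. What follows is a hypothetical sketch, not the author's script. It reads prompt lines in the "NNNNNN: word word ..." format that `nl -v0 -n rz -s": "` produces. For each line it writes a word-level HTK Master Label File entry: a quoted label pattern, one word per line, and a terminating ".". It assumes the recordings are named utt<ID>, because the re-alignment step matches `"*/utt<digits>.lab"`.

```shell
# Hypothetical prompts2mlf sketch (assumes recordings are named utt<ID>).
# Reads prompt lines from the given files (or stdin), writes an MLF to stdout.
prompts2mlf() {
  awk '
    BEGIN { print "#!MLF!#" }              # MLF header
    NF > 0 {
      id = $1
      sub(/:$/, "", id)                    # "000042:" -> "000042"
      printf "\"*/utt%s.lab\"\n", id       # label pattern for utt<ID>
      for (i = 2; i <= NF; i++) print $i   # one word label per line
      print "."                            # end of this utterance
    }' "$@"
}
```

Used as `prompts2mlf train_utts.txt > train_utts_words.mlf`, it plays the same role as `./prompts2mlf` in the notes.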
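== Sketch: proto2hmmdefs ==

The proto2hmmdefs script is not shown in these notes either. In the HTK Book's flat-start tutorial, hmm0/hmmdefs is built by copying the HCompV-updated prototype once per monophone and renaming each copy. The following is a hypothetical sketch of that step. It assumes the prototype's model starts after a `~h` line and ends at `<ENDHMM>`. The leading `~o` header is dropped, because hmm0/macros already supplies it.

```shell
# Hypothetical proto2hmmdefs sketch.
# $1: prototype HMM file (e.g. hmm0/proto), $2: phone list, one per line.
proto2hmmdefs() {
  awk '
    NR == FNR {                                   # first file: the prototype
      if ($0 ~ /^~h/) { inhmm = 1; next }         # skip the original ~h name line
      if (inhmm) body = body $0 "\n"              # collect <BEGINHMM> ... <ENDHMM>
      if (toupper($0) ~ /<ENDHMM>/) inhmm = 0
      next
    }
    NF > 0 { printf "~h \"%s\"\n%s", $1, body }   # one renamed copy per phone
  ' "$1" "$2"
}
```

Used as `proto2hmmdefs hmm0/proto monophones0 > hmm0/hmmdefs`.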
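== Sketch: makespmodel.pl ==

The HTK Book's tutorial creates the short-pause model by copying the centre state of sil into a new three-state sp model. The `TI silst {sil.state[3],sp.state[2]}` line in sil.hed then ties the two states. The sketch below is a hypothetical awk stand-in for makespmodel.pl; the real script is not shown in these notes. It passes hmmdefs through unchanged and appends an sp model built from sil's `<STATE> 3` block. The sp transition matrix values are an assumption. sil.hed then adds the 1 -> 3 skip transition.

```shell
# Hypothetical makespmodel sketch: reads hmmdefs on stdin, writes it back
# with an extra sp model whose single emitting state copies sil state 3.
makespmodel() {
  awk '
    { print }                                        # copy existing models
    /^~h/ { name = $2; gsub(/"/, "", name); grab = 0 }
    name == "sil" && /^<STATE> 4/ { grab = 0 }       # end of sil state 3
    name == "sil" && /^<TRANSP>/ { grab = 0 }
    grab { state = state $0 "\n" }                   # collect state 3 contents
    name == "sil" && /^<STATE> 3/ { grab = 1 }
    END {
      print "~h \"sp\""
      print "<BEGINHMM>"
      print "<NUMSTATES> 3"
      print "<STATE> 2"
      printf "%s", state
      print "<TRANSP> 3"                             # assumed initial values
      print " 0.0 1.0 0.0"
      print " 0.0 0.9 0.1"
      print " 0.0 0.0 0.0"
      print "<ENDHMM>"
    }'
}
```

Used as `makespmodel < hmm3/hmmdefs > hmm4/hmmdefs`.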
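== Sketch: maketrihed ==

The HTK Book describes mktri.hed as a `CL` (clone) command naming the triphone list, followed by one `TI` command per phone. Each `TI` command ties the transition matrices of all models sharing that centre phone. This hypothetical sketch of maketrihed writes that file. It assumes sil and sp stay context-independent (they are word boundaries in mktri.led), so it emits no `TI` line for them.

```shell
# Hypothetical maketrihed sketch.
# $1: monophone list, $2: triphone list file name to clone into.
maketrihed() {
  echo "CL $2"
  awk 'NF > 0 && $1 != "sil" && $1 != "sp" {
         p = $1
         # tie transP of all triphones with centre phone p (and p itself)
         printf "TI T_%s {(*-%s+*,%s+*,*-%s).transP}\n", p, p, p, p
       }' "$1"
}
```

Used as `maketrihed monophones1 triphones1 > mktri.hed`.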
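== Sketch: re-estimation helper ==

Every re-estimation step above repeats the same HERest32 invocation. Only the input/output directory, the MLF, the script file and the HMM list change. The hypothetical helper below (the name `reest` is an illustration, not part of HTK) runs one pass with the same pruning thresholds. Any extra arguments, such as `-B`, go in front of `-C`, as in the triphone steps.

```shell
# Hypothetical wrapper for one HERest32 pass from hmm<from> into hmm<to>.
# Usage: reest FROM TO MLF SCP HMMLIST [extra HERest32 options...]
reest() {
  from=$1 to=$2 mlf=$3 scp=$4 list=$5
  shift 5
  mkdir -p "hmm$to"
  HERest32 "$@" -C train_config -I "$mlf" -t 250.0 150.0 1000.0 -S "$scp" \
    -H "hmm$from/macros" -H "hmm$from/hmmdefs" -M "hmm$to" "$list"
}
```

For example, `reest 0 1 train_utts_phone.mlf train.scp monophones0` corresponds to the first monophone pass. `reest 11 12 train_utts_wintri.mlf train-aligned.scp triphones1 -B` corresponds to the last triphone pass.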