1. Create an alignme directory under the corpus's data directory to hold the data for the audio to be aligned
cd mycorpus/data
mkdir alignme
# Create the text, segments, wav.scp, utt2spk and spk2utt files for the audio to be aligned
# The procedure is the same as for the data in data/train; see http://pages.jh.edu/~echodro1/tutorial/kaldi/kaldi-training2.html
- text file
Gives the words spoken in each utterance. Format:
utt_id WORD1 WORD2 WORD3 WORD4 …
utt_id = utterance ID
Example:
110236_20091006_82330_F_0001 I'M WORRIED ABOUT THAT
110236_20091006_82330_F_0002 AT LEAST NOW WE HAVE THE BENEFIT
110236_20091006_82330_F_0003 DID YOU EVER GO ON STRIKE
…
120958_20100126_97016_M_0285 SOMETIMES LESS IS BETTER
120958_20100126_97016_M_0286 YOU MUST LOVE TO COOK
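To list the unique words in text (run inside data/alignme; this plain word list is not the same as the word-to-ID symbol table data/lang/words.txt):
cut -d ' ' -f 2- text | tr -s ' ' '\n' | sort -u > words.txt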
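Every word in text also needs an entry in the lexicon that data/lang was built from; as far as I know, the standard alignment scripts map words missing from data/lang/words.txt to the OOV symbol, so those stretches are not aligned to real pronunciations. A minimal sketch of such a check follows. The function name find_oovs and the lexicon path data/local/dict/lexicon.txt are assumptions for illustration; the only thing it relies on is that the word is the first field of each lexicon line.

```shell
# find_oovs (hypothetical helper): print each word that appears in a Kaldi
# "text" file but has no entry in the lexicon, one word per line, sorted.
# Usage: find_oovs <text> <lexicon>
find_oovs() {
  # Pass 1 (lexicon): remember every word. Pass 2 (text): skip field 1 (utt_id)
  # and print any word not seen in the lexicon.
  awk 'NR == FNR { inlex[$1] = 1; next }
       { for (i = 2; i <= NF; i++) if (!($i in inlex)) print $i }' "$2" "$1" |
    LC_ALL=C sort -u
}
```

For example, find_oovs data/alignme/text data/local/dict/lexicon.txt should print nothing when every word is covered.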
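- segments file
Gives the start and end time of each utterance within a recording. This file is optional: without it, each recording in wav.scp is treated as a single utterance and its file_id is used as the utt_id. Format: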
utt_id file_id start_time end_time
utt_id = utterance ID
file_id = file ID
start_time = start time in seconds
end_time = end time in seconds
Example:
110236_20091006_82330_F_0001 110236_20091006_82330_F 0.0 3.44
110236_20091006_82330_F_0002 110236_20091006_82330_F 4.60 8.54
110236_20091006_82330_F_0003 110236_20091006_82330_F 9.45 12.05
110236_20091006_82330_F_0004 110236_20091006_82330_F 13.29 16.13
110236_20091006_82330_F_0005 110236_20091006_82330_F 17.27 20.36
110236_20091006_82330_F_0006 110236_20091006_82330_F 22.06 25.46
110236_20091006_82330_F_0007 110236_20091006_82330_F 25.86 27.56
110236_20091006_82330_F_0008 110236_20091006_82330_F 28.26 31.24
…
120958_20100126_97016_M_0282 120958_20100126_97016_M 915.62 919.67
120958_20100126_97016_M_0283 120958_20100126_97016_M 920.51 922.69
120958_20100126_97016_M_0284 120958_20100126_97016_M 922.88 924.27
120958_20100126_97016_M_0285 120958_20100126_97016_M 925.35 927.88
120958_20100126_97016_M_0286 120958_20100126_97016_M 928.31 930.51
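Mistakes in segments (swapped columns, an end time before the start time) tend to surface only later, during feature extraction. The hypothetical check_segments function sketched below flags lines with fewer than four fields, a negative start time, or an end time that is not after the start time; it does not check that the times fall within the length of the recording.

```shell
# check_segments (hypothetical helper): print a warning for each line of a
# Kaldi "segments" file that has fewer than 4 fields, a negative start time,
# or an end time that is not after its start time.
# Usage: check_segments data/alignme/segments
check_segments() {
  awk 'NF < 4 || $3 < 0 || $4 <= $3 { print "bad segment: " $0 }' "$1"
}
```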
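- wav.scp file
Gives the location of each recording. For WAV files, use the format below (for other formats, the second field can instead be a command that writes WAV to standard output, ending with |):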
file_id path/file
Example:
110236_20091006_82330_F path/110236_20091006_82330_F.wav
111138_20091215_82636_F path/111138_20091215_82636_F.wav
111138_20091217_82636_F path/111138_20091217_82636_F.wav
…
120947_20100125_59427_F path/120947_20100125_59427_F.wav
120953_20100125_79293_F path/120953_20100125_79293_F.wav
120958_20100126_97016_M path/120958_20100126_97016_M.wav
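If all recordings sit in one directory and each file name (without .wav) is the file_id, wav.scp can be generated instead of typed. A sketch under those assumptions, using the hypothetical name make_wav_scp. The output is sorted with LC_ALL=C, the ordering Kaldi's data-directory scripts expect.

```shell
# make_wav_scp (hypothetical helper): print "file_id path" for every .wav file
# in a directory, using the basename without ".wav" as the file_id.
# Usage: make_wav_scp /abs/path/to/wavs > data/alignme/wav.scp
make_wav_scp() {
  for f in "$1"/*.wav; do
    [ -e "$f" ] || continue              # the glob matched no files
    printf '%s %s\n' "$(basename "$f" .wav)" "$f"
  done | LC_ALL=C sort
}
```

Passing an absolute directory gives absolute paths in wav.scp, so the file stays valid no matter which directory the Kaldi scripts are run from.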
- utt2spk file
Specifies the speaker of each utterance. Format:
utt_id spkr
utt_id = utterance ID
spkr = speaker ID
Example:
110236_20091006_82330_F_0001 110236
110236_20091006_82330_F_0002 110236
110236_20091006_82330_F_0003 110236
110236_20091006_82330_F_0004 110236
…
120958_20100126_97016_M_0284 120958
120958_20100126_97016_M_0285 120958
120958_20100126_97016_M_0286 120958
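In the example IDs above, the speaker ID is the first underscore-separated field of the utterance ID, so utt2spk can be derived from text. A sketch that assumes this naming scheme (the function name make_utt2spk is hypothetical):

```shell
# make_utt2spk (hypothetical helper): derive utt2spk from a Kaldi "text" file,
# assuming the speaker ID is the part of the utt_id before the first "_".
# Output is sorted in the C locale.
# Usage: make_utt2spk data/alignme/text > data/alignme/utt2spk
make_utt2spk() {
  awk '{ split($1, parts, "_"); print $1, parts[1] }' "$1" | LC_ALL=C sort
}
```

Keeping the speaker ID as a prefix of the utterance ID, as this scheme does, is what Kaldi's data-preparation documentation recommends, because it keeps utt2spk and spk2utt sorted consistently.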
- spk2utt file
The inverse of utt2spk: each line lists a speaker followed by all of that speaker's utterances. Format:
spkr utt_id1 utt_id2 utt_id3
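Once utt2spk exists, spk2utt can be generated with Kaldi's scripts, run from the mycorpus directory (where utils/ is available):
utils/utt2spk_to_spk2utt.pl data/alignme/utt2spk > data/alignme/spk2utt
utils/fix_data_dir.sh also regenerates spk2utt from utt2spk, and in addition sorts the files and removes utterances that are missing from any of them:
utils/fix_data_dir.sh data/alignme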
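The conversion itself is just grouping. The awk sketch below (the function name utt2spk_to_spk2utt is hypothetical) roughly mirrors what utils/utt2spk_to_spk2utt.pl does, listing speakers in the order they first appear in utt2spk; in practice the Kaldi script is the one to use.

```shell
# utt2spk_to_spk2utt (hypothetical helper): group utterance IDs by speaker,
# printing "spkr utt_id1 utt_id2 ..." with speakers in first-seen order.
# Usage: utt2spk_to_spk2utt data/alignme/utt2spk > data/alignme/spk2utt
utt2spk_to_spk2utt() {
  awk '{
         if (!($2 in utts)) order[++n] = $2   # first time this speaker appears
         utts[$2] = utts[$2] " " $1           # append the utterance ID
       }
       END { for (i = 1; i <= n; i++) print order[i] utts[order[i]] }' "$1"
}
```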
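2. Extract MFCC features
Use the following script to extract MFCC features for the audio in data/alignme. The feature configuration (conf/mfcc.conf by default) must match the one used to train the acoustic model: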
cd mycorpus
mfccdir=mfcc
for x in data/alignme; do
steps/make_mfcc.sh --cmd "$train_cmd" --nj 16 $x exp/make_mfcc/$x $mfccdir
utils/fix_data_dir.sh data/alignme
steps/compute_cmvn_stats.sh $x exp/make_mfcc/$x $mfccdir
utils/fix_data_dir.sh data/alignme
done
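Running the script above directly does not work. As in run.sh, you first need to source cmd.sh and path.sh (which define $train_cmd and put the Kaldi tools on the PATH) and set H and n. A version that can be run on its own, from the mycorpus directory: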
#!/bin/bash
. ./cmd.sh ## You'll want to change cmd.sh to something that will work on your system.
## This relates to the queue.
. ./path.sh
H=$(pwd) #exp home
n=11 #parallel jobs
mfccdir=mfcc
for x in data/alignme
do
  steps/make_mfcc.sh --cmd "$train_cmd" --nj $n $x exp/make_mfcc/$x $mfccdir
  utils/fix_data_dir.sh $x
  steps/compute_cmvn_stats.sh $x exp/make_mfcc/$x $mfccdir
  utils/fix_data_dir.sh $x
done
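utils/fix_data_dir.sh removes utterances that ended up without features, so a failed extraction can quietly shrink the data. One way to see what was lost is to save a copy of text before running the script above (for example with cp data/alignme/text data/alignme/text.orig, a hypothetical file name) and compare it with feats.scp afterwards. A sketch, with the hypothetical function name missing_feats:

```shell
# missing_feats (hypothetical helper): print the utt_id of every utterance in
# a "text" file that has no entry in feats.scp.
# Usage: missing_feats data/alignme/text.orig data/alignme/feats.scp
missing_feats() {
  # Pass 1 (feats.scp): remember utt_ids with features. Pass 2 (text): report the rest.
  awk 'NR == FNR { has_feats[$1] = 1; next }
       !($1 in has_feats) { print $1 }' "$2" "$1"
}
```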
3. Alignment
Run the alignment with the following script, from the mycorpus directory:
#!/bin/bash
. ./cmd.sh ## You'll want to change cmd.sh to something that will work on your system.
## This relates to the queue.
. ./path.sh
H=`pwd` #exp home
n=11 #parallel jobs
steps/align_si.sh --cmd "$train_cmd" data/alignme data/lang exp/tri1 exp/tri1_alignme || exit 1
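data/alignme is the data to be aligned; data/lang is the lang directory used to train the model (phone set, words.txt, and the lexicon compiled as L.fst), not a language model as such; exp/tri1 is the previously trained acoustic model; and exp/tri1_alignme is the directory where the alignments (ali.*.gz) are written.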
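align_si.sh runs its work as --nj parallel jobs (4 by default in the versions I have seen) and splits the data directory by speaker with utils/split_data.sh, which refuses to create more parts than there are speakers. A small hypothetical helper, pick_nj, caps a requested job count at the number of speakers in spk2utt:

```shell
# pick_nj (hypothetical helper): print the smaller of the requested job count
# and the number of speakers (lines) in <data-dir>/spk2utt.
# Usage: nj=$(pick_nj "$n" data/alignme)
pick_nj() {
  local want=$1 nspk
  nspk=$(awk 'END { print NR }' "$2/spk2utt")
  if [ "$nspk" -lt "$want" ]; then echo "$nspk"; else echo "$want"; fi
}
```

It would be used as steps/align_si.sh --nj "$(pick_nj "$n" data/alignme)" --cmd "$train_cmd" data/alignme data/lang exp/tri1 exp/tri1_alignme.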
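4. Get CTM output from the alignment files
CTM format (one line per phone):
utt_id channel_num start_time duration phone_id
Times are in seconds; note that the fourth field is the phone's duration, not its end time.
Use the following commands to produce CTM output: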
cd mycorpus
. ./path.sh  # puts ali-to-phones on the PATH
for i in exp/tri1_alignme/ali.*.gz; do
  ali-to-phones --ctm-output exp/tri1/final.mdl ark:"gunzip -c $i|" - > ${i%.gz}.ctm
done
Note that the acoustic model here must be the same one used for the alignment in step 3.
5. Merge the CTM files
cd mycorpus/exp/tri1_alignme
cat *.ctm > merged_alignment.txt
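ali-to-phones only sees individual utterances, so when the data directory uses a segments file, the start times in merged_alignment.txt are relative to each utterance rather than to the recording. A sketch that moves them onto the recording timeline by adding each segment's start time; the function name ctm_to_recording_time is hypothetical, and it relies only on the first and third CTM fields being the utt_id and the start time.

```shell
# ctm_to_recording_time (hypothetical helper): replace utt_id with file_id and
# add the segment start time to each CTM start time, using a Kaldi "segments"
# file (utt_id file_id start end). Lines whose utt_id is not in segments are
# printed unchanged.
# Usage: ctm_to_recording_time data/alignme/segments exp/tri1_alignme/merged_alignment.txt
ctm_to_recording_time() {
  awk 'NR == FNR { reco[$1] = $2; offset[$1] = $3; next }
       ($1 in reco) { printf "%s %s %.3f %.3f %s\n", reco[$1], $2, $3 + offset[$1], $4, $5; next }
       { print }' "$1" "$2"
}
```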
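Looking up each phone_id (last column) in data/lang/phones.txt then gives every phone's start time and duration; its end time is start_time + duration. These times are relative to the start of each utterance.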
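To make the merged file readable directly, the phone IDs can be replaced with their symbols from phones.txt (one "symbol id" pair per line) and the end time computed as start plus duration. A sketch, with the hypothetical function name ctm_with_phone_names; IDs not found in phones.txt are printed as-is.

```shell
# ctm_with_phone_names (hypothetical helper): print "utt_id start end phone"
# for each CTM line, mapping phone IDs to symbols via phones.txt.
# Usage: ctm_with_phone_names data/lang/phones.txt exp/tri1_alignme/merged_alignment.txt
ctm_with_phone_names() {
  awk 'NR == FNR { sym[$2] = $1; next }
       { printf "%s %.3f %.3f %s\n", $1, $3, $3 + $4, (($5 in sym) ? sym[$5] : $5) }' "$1" "$2"
}
```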