NAME
pocketsphinx_batch - Run speech recognition in batch mode
SYNOPSIS
pocketsphinx_batch -ctl ctlfile -cepdir cepdir -cepext .mfc [ options ]...
DESCRIPTION
Run speech recognition over a list of utterances in batch mode. A list of arguments follows:

-adchdr  Size of audio file header in bytes (headers are ignored)
-adcin  Input is raw audio data
-agc  Automatic gain control for c0 ('max', 'emax', 'noise', or 'none')
-agcthresh  Initial threshold for automatic gain control
-allphone  Perform phoneme decoding with phonetic LM
-allphone_ci  Perform phoneme decoding with phonetic LM and context-independent units only
-alpha  Preemphasis parameter
-argfile  File giving extra arguments
-ascale  Inverse of acoustic model scale for confidence score calculation
-aw  Inverse weight applied to acoustic scores
-backtrace  Print results and backtraces to log file
-beam  Beam width applied to every frame in Viterbi search (smaller values mean wider beam)
-bestpath  Run bestpath (Dijkstra) search over word lattice (3rd pass)
-bestpathlw  Language model probability weight for bestpath search
-build_outdirs  Create missing subdirectories in output directory
-cepdir  Input files directory (prefixed to filespecs in control file)
-cepext  Input files extension (suffixed to filespecs in control file)
-ceplen  Number of components in the input feature vector
-cmn  Cepstral mean normalization scheme ('current', 'prior', or 'none')
-cmninit  Initial values (comma-separated) for cepstral mean when 'prior' is used
-compallsen  Compute all senone scores in every frame (can be faster when there are many senones)
-ctl  Control file listing utterances to be processed
-ctlcount  Number of utterances to be processed (after skipping -ctloffset entries)
-ctlincr  Do every Nth line in the control file
-ctloffset  Number of utterances at the beginning of the -ctl file to be skipped
-ctm  Recognition output in CTM file format (may require post-sorting)
-debug  Verbosity level for debugging messages
-dict  Pronunciation dictionary (lexicon) input file
-dictcase  Dictionary is case sensitive (NOTE: case insensitivity applies to ASCII characters only)
-dither  Add 1/2-bit noise
-doublebw  Use double bandwidth filters (same center freq)
-ds  Frame GMM computation downsampling ratio
-fdict  Noise word pronunciation dictionary input file
-feat  Feature stream type, depends on the acoustic model
-featparams  File containing feature extraction parameters
-fillprob  Filler word transition probability
-frate  Frame rate
-fsg  Sphinx format finite state grammar file
-fsgctl  Control file listing FSG file to use for each utterance
-fsgdir  Directory for FSG files
-fsgext  Extension for FSG files (including leading dot)
-fsgusealtpron  Add alternate pronunciations to FSG
-fsgusefiller  Insert filler words at each state
-fwdflat  Run forward flat-lexicon search over word lattice (2nd pass)
-fwdflatbeam  Beam width applied to every frame in second-pass flat search
-fwdflatefwid  Minimum number of end frames for a word to be searched in fwdflat search
-fwdflatlw  Language model probability weight for flat lexicon (2nd pass) decoding
-fwdflatsfwin  Window of frames in lattice to search for successor words in fwdflat search
-fwdflatwbeam  Beam width applied to word exits in second-pass flat search
-fwdtree  Run forward lexicon-tree search (1st pass)
-hmm  Directory containing acoustic model files
-hyp  Recognition output file name
-hypseg  Recognition output with segmentation file name
-input_endian  Endianness of input data, big or little, ignored if NIST or MS Wav
-jsgf  JSGF grammar file
-keyphrase  Keyphrase to spot
-kws  File with keyphrases to spot, one per line
-kws_delay  Delay to wait for best detection score
-kws_plp  Phone loop probability for keyword spotting
-kws_threshold  Threshold for p(hyp)/p(alternatives) ratio
-latsize  Initial backpointer table size
-lda  File containing transformation matrix to be applied to features (single-stream features only)
-ldadim  Dimensionality of output of feature transformation (0 to use entire matrix)
-lifter  Length of sin-curve for liftering, or 0 for no liftering
-lm  Word trigram language model input file
-lmctl  Specify a set of language models
-lmname  Which language model in -lmctl to use by default
-lmnamectl  Control file listing LM name to use for each utterance
-logbase  Base in which all log-likelihoods are calculated
-logfn  File to write log messages in
-logspec  Write out logspectral files instead of cepstra
-lowerf  Lower edge of filters
-lpbeam  Beam width applied to last phone in words
-lponlybeam  Beam width applied to last phone in single-phone words
-lw  Language model probability weight
-maxhmmpf  Maximum number of active HMMs to maintain at each frame (or -1 for no pruning)
-maxwpf  Maximum number of distinct word exits at each frame (or -1 for no pruning)
-mdef  Model definition input file
-mean  Mixture Gaussian means input file
-mfclogdir  Directory to log feature files to
-min_endfr  Nodes ignored in lattice construction if they persist for fewer than N frames
-mixw  Senone mixture weights input file (uncompressed)
-mixwfloor  Senone mixture weights floor (applied to data from -mixw file)
-mllr  MLLR transformation to apply to means and variances
-mllrctl  Control file listing MLLR transforms to use for each utterance
-mllrdir  Directory for MLLR transforms
-mllrext  Extension for MLLR transforms (including leading dot)
-mmap  Use memory-mapped I/O (if possible) for model files
-nbest  Number of N-best hypotheses to write to -nbestdir (0 for no N-best)
-nbestdir  Directory for writing N-best hypothesis lists
-nbestext  Extension for N-best hypothesis list files
-ncep  Number of cep coefficients
-nfft  Size of FFT
-nfilt  Number of filter banks
-nwpen  New word transition penalty
-outlatbeam  Minimum posterior probability for output lattice nodes
-outlatdir  Directory for dumping word lattices
-outlatext  Filename extension for dumping word lattices
-outlatfmt  Format for dumping word lattices (s3 or htk)
-pbeam  Beam width applied to phone transitions
-pip  Phone insertion penalty
-pl_beam  Beam width applied to phone loop search for lookahead
-pl_pbeam  Beam width applied to phone loop transitions for lookahead
-pl_pip  Phone insertion penalty for phone loop
-pl_weight  Weight for phoneme lookahead penalties
-pl_window  Phoneme lookahead window size, in frames
-rawlogdir  Directory to log raw audio files to
-remove_dc  Remove DC offset from each frame
-remove_noise  Remove noise with spectral subtraction in mel-energies
-remove_silence  Enables VAD, removes silence frames from processing
-round_filters  Round mel filter frequencies to DFT points
-samprate  Sampling rate
-seed  Seed for random number generator; if less than zero, pick our own
-sendump  Senone dump (compressed mixture weights) input file
-senin  Input is senone score dump files
-senlogdir  Directory to log senone score files to
-senmgau  Senone to codebook mapping input file (usually not needed)
-silprob  Silence word transition probability
-smoothspec  Write out cepstral-smoothed logspectral files
-svspec  Subvector specification (e.g., 24,0-11/25,12-23/26-38 or 0-12/13-25/26-38)
-tmat  HMM state transition matrix input file
-tmatfloor  HMM state transition probability floor (applied to -tmat file)
-topn  Maximum number of top Gaussians to use in scoring
-topn_beam  Beam width used to determine top-N Gaussians (or a list, per-feature)
-toprule  Start rule for JSGF (first public rule is default)
-transform  Which type of transform to use to calculate cepstra (legacy, dct, or htk)
-unit_area  Normalize mel filters to unit area
-upperf  Upper edge of filters
-uw  Unigram weight
-vad_postspeech  Number of silence frames to keep after the transition from speech to silence
-vad_prespeech  Number of speech frames to keep before the transition from silence to speech
-vad_startspeech  Number of speech frames needed to trigger VAD from silence to speech
-vad_threshold  Threshold for decision between noise and silence frames. Log-ratio between signal level and noise level.
-var  Mixture Gaussian variances input file
-varfloor  Mixture Gaussian variance floor (applied to data from -var file)
-varnorm  Variance normalize each utterance (only if CMN == current)
-verbose  Show input filenames
-warp_params  Parameters defining the warping function
-warp_type  Warping function type (or shape)
-wbeam  Beam width applied to word exits
-wip  Word insertion penalty
-wlen  Hamming window length

To do batch mode recognition, you will need to specify a control file, using -ctl. This is a simple text file containing one entry per line. Each entry is the name of an input file relative to the -cepdir directory, and without the filename extension (which is given in the -cepext argument). If you are using acoustic feature files as input (see sphinx_fe(1) for information on how to generate these), you can also specify a subpart of a file, using the following format:

FILENAME START-FRAME END-FRAME UTTERANCE-ID
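
As an illustration of the control file format described above, the following Python sketch shows how one control file entry could be turned into an input path and an optional frame range. It is not taken from the pocketsphinx sources. The names CtlEntry and parse_ctl_line are hypothetical, and the sketch assumes that fields are separated by whitespace, that blank lines are skipped, and that a bare FILENAME entry uses the file name as its utterance ID.

```python
import posixpath
from typing import NamedTuple, Optional


class CtlEntry(NamedTuple):
    """One parsed control file entry (hypothetical helper type)."""
    path: str                   # -cepdir + FILENAME + -cepext
    start_frame: Optional[int]  # None when the whole file is used
    end_frame: Optional[int]
    utt_id: str


def parse_ctl_line(line: str, cepdir: str, cepext: str) -> Optional[CtlEntry]:
    """Parse a single control file line.

    Accepts the two forms described in the text:
        FILENAME
        FILENAME START-FRAME END-FRAME UTTERANCE-ID
    Whitespace-separated fields are an assumption of this sketch.
    """
    fields = line.split()
    if not fields:
        return None  # assumption: blank lines are ignored

    filename = fields[0]
    # The directory is prefixed and the extension is suffixed to the entry.
    path = posixpath.join(cepdir, filename + cepext)

    if len(fields) == 1:
        # Assumption: with no explicit ID, reuse the file name as the ID.
        return CtlEntry(path, None, None, filename)
    if len(fields) == 4:
        return CtlEntry(path, int(fields[1]), int(fields[2]), fields[3])
    raise ValueError(f"unsupported control file entry: {line!r}")


if __name__ == "__main__":
    print(parse_ctl_line("an4/utt01", "feats", ".mfc"))
    print(parse_ctl_line("an4/long 100 250 seg1", "feats", ".mfc"))
```

With -cepdir feats and -cepext .mfc, the entry an4/utt01 resolves to feats/an4/utt01.mfc in this sketch, and the second entry selects frames 100 through 250 of feats/an4/long.mfc under the ID seg1. The real tool may accept further entry forms or apply different defaults.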
AUTHOR
Written by numerous people at CMU from 1994 onwards. This manual page by David Huggins-Daines <dhuggins@cs.cmu.edu>.
COPYRIGHT
Copyright © 1994-2016 Carnegie Mellon University. See the file LICENSE included with this package for more information.
SEE ALSO
pocketsphinx_continuous(1), sphinx_fe(1).
2007-08-27 POCKETSPHINX_BATCH(1)