Provided by: julius_4.2.2-0ubuntu3_amd64

NAME

       julius - open source multi-purpose LVCSR engine

SYNOPSIS

       julius [-C jconffile] [options...]

DESCRIPTION

       julius is a high-performance, multi-purpose, open-source speech recognition engine for
       researchers and developers. It can perform almost real-time recognition of continuous
       speech with a 3-gram language model of over 60k words and triphone HMMs on most current
       PCs.  julius can perform recognition on audio files, live microphone input, network input
       and feature parameter files.

       The core recognition module is implemented as a C library called "JuliusLib". It can also
       be extended through its plug-in facility.

   Supported Models
       julius needs a language model and an acoustic model to run as a speech recognizer.  julius
       supports the following models.

       Acoustic model
              Sub-word HMMs (Hidden Markov Models) in HTK ascii format are supported. Phoneme
              models (monophone), context dependent phoneme models (triphone), tied-mixture and
              phonetic tied-mixture models of any unit can be used. When using context dependent
              models, inter-word context dependency is also handled. Multi-stream features and
              MSD-HMMs are also supported. You can also use the tool mkbinhmm to convert the
              ascii HMM file to a compact binary format for faster loading.

              Note that julius itself can only extract MFCC features from speech data. If you use
              an acoustic HMM trained on another feature type, you should give the input as an
              HTK parameter file of the same feature type.

       Language model: word N-gram
              Word N-gram language models, up to 10-gram, are supported. Julius uses a different
              N-gram for each pass: a left-to-right 2-gram on the 1st pass, and a right-to-left
              N-gram on the 2nd pass. It is recommended to use both an LR 2-gram and an RL N-gram
              for Julius. However, you can also use only a single LR N-gram or RL N-gram. In that
              case, an approximated LR 2-gram computed from the given N-gram will be applied at
              the first pass.

              The standard ARPA format is supported. In addition, a binary format is supported
              for efficiency. The tool mkbingram(1) can convert an ARPA format N-gram to binary
              format.

       Language model: grammar
              The grammar format is specific to Julius, and tools to create a recognition grammar
              are included in the distribution. A grammar consists of two files: a 'grammar' file
              that describes sentence structures in a BNF style, using word 'category' names as
              terminal symbols, and a 'voca' file that defines the words of each category with
              their pronunciations (i.e. phoneme sequences). They should be converted by mkdfa(1)
              to a deterministic finite automaton file (.dfa) and a dictionary file (.dict),
              respectively. You can also use multiple grammars.

       Language model: isolated word
              You can perform isolated word recognition using only a word dictionary. With this
              model type, Julius will perform rapid one-pass recognition with static context
              handling. Silence models will be added at both the head and tail of each word. You
              can also use multiple dictionaries in a process.

   Search Algorithm
       The recognition algorithm of julius is based on a two-pass strategy. A word 2-gram and a
       reverse word 3-gram are used on the respective passes. The entire input is processed on
       the first pass, and the final search is then performed on the same input, using the
       result of the first pass to narrow the search space. Specifically, the recognition
       algorithm is based on a tree-trellis heuristic search combined with left-to-right
       frame-synchronous beam search and right-to-left stack decoding search.

       When using context dependent phones (triphones), inter-word contexts are taken into
       consideration. For tied-mixture and phonetic tied-mixture models, high-speed acoustic
       likelihood calculation is possible using Gaussian pruning.

       For more details, see the related documents.

OPTIONS

       These options specify the models, system behaviors and various search parameters for
       Julius. These options can be set on the command line, but it is recommended that you
       write them in a text file, called a "jconf file", and specify it with the "-C" option.

       Applications incorporating JuliusLib also use these options to set the parameters of the
       core recognition engine. For example, a jconf file can be loaded into the engine by
       calling j_config_load_file_new() with the jconf file name as argument.

       Please note that relative paths in a jconf file should be relative to the jconf file
       itself, not the current working directory.
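
       As a rough illustration of this rule (not Julius's own code), the Python sketch below
       resolves a path the way the paragraph describes. The function name resolve_jconf_path
       and the example paths are hypothetical.

```python
from pathlib import PurePosixPath


def resolve_jconf_path(jconf_file: str, value: str) -> str:
    """Resolve a path written inside a jconf file.

    Relative paths are taken relative to the directory that holds the
    jconf file; absolute paths are returned unchanged.
    """
    path = PurePosixPath(value)
    if path.is_absolute():
        return str(path)
    return str(PurePosixPath(jconf_file).parent / path)


# Hypothetical layout: the jconf file lives in /opt/models/ja.
print(resolve_jconf_path("/opt/models/ja/main.jconf", "am/hmmdefs"))
print(resolve_jconf_path("/opt/models/ja/main.jconf", "/data/lm.bingram"))
```

       The current working directory never enters the computation, which is the point of the
       rule: a jconf file keeps working when julius is started from another directory.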

       Below are the details of all options, gathered by group.

   Julius application option
       These are application options of Julius, outside of JuliusLib. They contain parameters and
       switches for result output, character set conversion, log level, and module mode options.
       These options are specific to Julius, and cannot be used in applications using JuliusLib
       other than Julius.

        -outfile
          On file input, this option writes the recognition result of each file to a separate
          file. The output file has the same name as the input file, but with the suffix changed
          to ".out". (rev.4.0)

        -separatescore
          Output the language and acoustic scores separately.

        -callbackdebug
          Print the callback names at each call for debug. (rev.4.0)

        -charconv  from to
          Print with character set conversion.  from is the source character set used in the
          language model, and to is the target character set you want to get.

          On Linux, the arguments should be code names. You can obtain the list of available
          code names by invoking the command "iconv --list". On Windows, the arguments should be
          code names or codepage numbers. A code name should be one of "ansi", "mac", "oem",
          "utf-7", "utf-8", "sjis", "euc". Alternatively, you can specify any codepage number
          supported in your environment.

        -nocharconv
          Disable character conversion.

        -module  [port]
          Run Julius in "Server Module Mode". After startup, Julius waits for a TCP/IP connection
          from a client. Once the connection is established, Julius starts communicating with the
          client to process incoming commands, or to output recognition results, input trigger
          information and other system status to the client. The default port number is 10500.

        -record  dir
          Auto-save all input speech data into the specified directory. Each segmented input is
          recorded to its own file. The file name is generated from the system time at which the
          input ends, in the style YYYY.MMDD.HHMMSS.wav. The file format is 16-bit monaural WAV.
          Invalid for mfcfile input.

          When input rejection by -rejectshort is enabled, rejected inputs are also recorded.
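
          The naming style can be sketched in Python as follows. This is only an illustration
          of the YYYY.MMDD.HHMMSS.wav pattern; the function name record_file_name is
          hypothetical and the code is not taken from Julius.

```python
from datetime import datetime


def record_file_name(end_time: datetime) -> str:
    """Build a name in the YYYY.MMDD.HHMMSS.wav style from the input end time."""
    return end_time.strftime("%Y.%m%d.%H%M%S") + ".wav"


print(record_file_name(datetime(2009, 3, 7, 14, 5, 9)))  # 2009.0307.140509.wav
```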

        -logfile  file
          Save all log output to a file instead of standard output. (Rev.4.0)

        -nolog
          Disable all log output. (Rev.4.0)

        -help
          Output help message and exit.

   Global options
       These are model-/search-dependent options relating to audio input, sound detection, GMM,
       decoding algorithm, plugin facility, and others. Global options should be placed before
       any instance declaration (-AM, -LM, or -SR), or just after the "-GLOBAL" option.

       Audio input
               -input  {mic|rawfile|mfcfile|adinnet|stdin|netaudio|alsa|oss|esd}
                 Choose speech input source. Specify 'file' or 'rawfile' for waveform file,
                 'htkparam' or 'mfcfile' for HTK parameter file. On file input, users will be
                 prompted to enter the file name from stdin, or you can use -filelist option to
                 specify list of files to process.

                 'mic' takes audio input from the default live microphone device, and 'adinnet'
                 means receiving waveform data via a TCP/IP network from an adinnet client.
                 'netaudio' is DatLink/NetAudio input, and 'stdin' means data input from
                 standard input.

                 For waveform file input, only WAV (no compression) and RAW (no header, 16bit,
                 big endian) are supported by default. Other formats can be read when compiled
                 with libsnd library. To see which formats are actually supported, see the help
                 message using option -help. For stdin input, only WAV and RAW are supported.
                 (default: mfcfile)

                 On Linux, you can choose the API at run time by specifying alsa, oss or esd.

               -chunk_size  samples
                 Audio fragment size in number of samples. (default: 1000)

               -filelist  filename
                 (With -input rawfile|mfcfile) perform recognition on all files listed in the
                 file. The file should contain one input file per line. The engine exits when
                 all of the files have been processed.

               -notypecheck
                 By default, Julius checks whether the input parameter type matches the AM.
                 This option disables the check and forces the engine to use the input vector
                 as is.

               -48
                 Record input with 48kHz sampling, and down-sample it to 16kHz on-the-fly. This
                 option is valid for 16kHz model only. The down-sampling routine was ported from
                 sptk. (Rev. 4.0)

               -NA  devicename
                 Host name for DatLink server input (-input netaudio).

               -adport  port_number
                 With -input adinnet, specify the adinnet port number to listen on. (default: 5530)

               -nostrip
                 Julius by default removes successive zero samples in input speech data. This
                 option inhibits the removal.

               -zmean ,  -nozmean
                 This option enables/disables DC offset removal of input waveform. Offset will be
                 estimated from the whole input. For microphone / network input, zero mean of the
                 first 48000 samples (3 seconds in 16kHz sampling) will be used for the
                 estimation. (default: disabled)

                 This option uses static offset for the channel. See also -zmeansource for
                 frame-wise offset removal.
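
                 A minimal sketch of static offset removal, written in Python for illustration
                 only. The function name remove_dc_offset and its estimate_len parameter are
                 hypothetical; the sketch simply subtracts the mean of the samples used for
                 estimation, which may differ in detail from what Julius does.

```python
def remove_dc_offset(samples, estimate_len=None):
    """Subtract a static DC offset from a list of integer samples.

    estimate_len=None estimates the offset from the whole input (file input);
    a number such as 48000 uses only the first samples (live input).
    """
    head = samples if estimate_len is None else samples[:estimate_len]
    if not head:
        return list(samples)
    offset = sum(head) / len(head)
    return [s - offset for s in samples]


SAMPLE_RATE = 16000        # Hz, as in the text
ESTIMATE_SAMPLES = 48000   # first 48000 samples
print(ESTIMATE_SAMPLES / SAMPLE_RATE)         # 3.0 (seconds)
print(remove_dc_offset([110, 90, 105, 95]))   # [10.0, -10.0, 5.0, -5.0]
```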

       Speech detection by level and zero-cross
               -cutsilence ,  -nocutsilence
                 Turn on / off the speech detection by level and zero-cross. Default is on for
                 mic / adinnet input, and off for files.

               -lv  thres
                 Level threshold for speech input detection. Values should be in range from 0 to
                 32767. (default: 2000)

               -zc  thres
                 Zero crossing threshold per second. Only input that goes over the level
                 threshold (-lv) will be counted. (default: 60)

               -headmargin  msec
                 Silence margin at the start of speech segment in milliseconds. (default: 300)

               -tailmargin  msec
                 Silence margin at the end of speech segment in milliseconds. (default: 400)
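
              The interplay of the level and zero-cross thresholds can be sketched as below.
              This is a simplified Python illustration under stated assumptions, not the
              detector implemented in Julius: it counts sign changes only between samples
              whose magnitude exceeds the level threshold, then compares the rate per second
              with the zero-cross threshold. The function names are hypothetical, and the
              head and tail margins are not modelled.

```python
def count_level_zero_crossings(samples, level_thres):
    """Count sign changes between samples whose magnitude exceeds level_thres."""
    crossings = 0
    last_sign = 0  # sign of the most recent sample that exceeded the level
    for s in samples:
        if abs(s) <= level_thres:
            continue  # quiet samples are not counted
        sign = 1 if s > 0 else -1
        if last_sign != 0 and sign != last_sign:
            crossings += 1
        last_sign = sign
    return crossings


def is_speech(samples, sample_rate, level_thres=2000, zc_thres=60):
    """Return True when the crossing rate per second reaches zc_thres."""
    if not samples:
        return False
    seconds = len(samples) / sample_rate
    return count_level_zero_crossings(samples, level_thres) / seconds >= zc_thres


loud = [3000, -3000] * 800    # 0.1 s at 16 kHz, well above the level threshold
quiet = [500, -500] * 800     # same shape, but below the level threshold
print(is_speech(loud, 16000), is_speech(quiet, 16000))  # True False
```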

       Input rejection
              Two simple front-end input rejection methods are implemented, based on input length
              and average power of detected segment. The rejection by average power is
              experimental, and can be enabled by --enable-power-reject on compilation. Valid for
              MFCC feature with power coefficient and real-time input only.

              For GMM-based input rejection see the GMM section below.

               -rejectshort  msec
                 Reject input shorter than specified milliseconds. Search will be terminated and
                 no result will be output.

               -powerthres  thres
                 Reject the input segment by its average energy. If the average energy of the
                 last recognized input is below the threshold, Julius will reject the input.
                 (Rev.4.0)

                 This option is valid when --enable-power-reject is specified at compilation
                 time.

       Gaussian mixture model / GMM-VAD
              GMM will be used for input rejection by accumulated score, or for front-end
              GMM-based VAD when --enable-gmm-vad is specified.

              NOTE: You should also set the proper MFCC parameters required for the GMM, by
              specifying the acoustic parameters described in the AM section after -AM_GMM.

              When GMM-based VAD is enabled, the voice activity score will be calculated at each
              frame as front-end processing. The value will be computed as

                  \[ \max_{m \in M_v} p(x|m) - \max_{m \in M_n} p(x|m) \]

              where $M_v$ is the set of voice GMMs, and $M_n$ is the set of noise GMMs whose
              names should be specified by -gmmreject. The activity score is then averaged over
              the last N frames, where N is specified by -gmmmargin. Julius updates the averaged
              activity score at each frame, and detects a speech up-trigger when the value gets
              higher than the value specified by -gmmup, and a down-trigger when it gets lower
              than the value of -gmmdown.
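
              The scoring and triggering described above can be sketched in Python as follows.
              This is a minimal illustration, not Julius code: the function names are
              hypothetical, the per-frame GMM scores are assumed to be given, and the
              behaviour while fewer than N frames are available (averaging over what is there)
              is an assumption.

```python
from collections import deque


def activity_score(voice_scores, noise_scores):
    """Best voice GMM score minus best noise GMM score for one frame."""
    return max(voice_scores) - max(noise_scores)


def detect_triggers(frame_scores, n_frames, up_thres, down_thres):
    """Yield (frame_index, "up" | "down") events from per-frame activity scores.

    The score is averaged over the last n_frames frames. An up-trigger fires when
    the average rises above up_thres, a down-trigger when it falls below down_thres.
    """
    window = deque(maxlen=n_frames)
    in_speech = False
    for i, score in enumerate(frame_scores):
        window.append(score)
        avg = sum(window) / len(window)
        if not in_speech and avg > up_thres:
            in_speech = True
            yield i, "up"
        elif in_speech and avg < down_thres:
            in_speech = False
            yield i, "down"


scores = [-2.0, -2.0, 4.0, 4.0, 4.0, -4.0, -4.0, -4.0]
print(list(detect_triggers(scores, n_frames=2, up_thres=1.0, down_thres=-1.0)))
# [(3, 'up'), (6, 'down')]
```

              Using separate up and down thresholds gives hysteresis: a score hovering near a
              single threshold would otherwise toggle the trigger at every frame.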

               -gmm  hmmdefs_file
                 GMM definition file in HTK format. If specified, GMM-based input verification
                 will be performed concurrently with the 1st pass, and you can reject the input
                 according to the result as specified by -gmmreject. The GMM should be defined as
                 one-state HMMs.

               -gmmnum  number
                 Number of Gaussian components to be computed per frame on GMM calculation. Only
                 the N-best Gaussians will be computed for rapid calculation. The default is 10.
                 Specifying a smaller value will speed up GMM calculation, but too small a value
                 (1 or 2) may degrade identification performance.

               -gmmreject  string
                 Comma-separated list of GMM names to be rejected as invalid input. During
                 recognition, the log likelihoods of the GMMs accumulated over the entire input
                 will be computed concurrently with the 1st pass. If the name of the GMM with the
                 maximum score is in this list, the 2nd pass will not be executed and the input
                 will be rejected.
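
                 The decision rule can be sketched in Python as below. This is an illustration
                 under stated assumptions rather than Julius code: the per-frame log
                 likelihoods are assumed to be given, and the function name should_reject and
                 the GMM names are hypothetical.

```python
def should_reject(frame_logliks, reject_names):
    """Decide rejection from per-frame GMM log likelihoods.

    frame_logliks: list of dicts mapping GMM name -> log likelihood for one frame.
    reject_names: comma-separated GMM names, in the style of the -gmmreject argument.
    """
    totals = {}
    for frame in frame_logliks:
        for name, loglik in frame.items():
            totals[name] = totals.get(name, 0.0) + loglik
    if not totals:
        return False  # nothing was scored, so there is nothing to reject on
    best = max(totals, key=totals.get)
    return best in reject_names.split(",")


# Hypothetical GMM names and scores.
frames = [
    {"speech": -50.0, "noise": -48.0, "cough": -60.0},
    {"speech": -52.0, "noise": -47.0, "cough": -61.0},
]
print(should_reject(frames, "noise,cough"))  # True: "noise" has the highest total
```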

               -gmmmargin  frames
                 (GMM_VAD) Head margin in frames. When a speech trigger is detected by GMM,
                 recognition will start from the current frame minus this value. (Rev.4.0)

                 This option will be valid only if compiled with --enable-gmm-vad.

               -gmmup  value
                 (GMM_VAD) Up trigger threshold of voice activity score. (Rev.4.1)

                 This option will be valid only if compiled with --enable-gmm-vad.

               -gmmdown  value
                 (GMM_VAD) Down trigger threshold of voice activity score. (Rev.4.1)

                 This option will be valid only if compiled with --enable-gmm-vad.

       Decoding option
              Real-time processing means concurrent processing of MFCC computation and 1st pass
              decoding. By default, real-time processing on the 1st pass is on for microphone /
              adinnet / netaudio input, and off for others.

               -realtime ,  -norealtime
                 Explicitly switch on / off real-time (pipe-line) processing on the first pass.
                 The default is off for file input, and on for microphone, adinnet and NetAudio
                 input. This option affects the way CMN and energy normalization are performed:
                 if off, they are done using the average features of the whole input. If on,
                 MAP-CMN and real-time energy normalization are used instead.

       Misc. options
               -C  jconffile
                 Load a jconf file here. The content of the jconffile will be expanded at this
                 point.

               -version
                 Print version information to standard error, and exit.

               -setting
                 Print engine setting information to standard error, and exit.

               -quiet
                 Output less log. For result, only the best word sequence will be printed.

               -debug
                 (For debug) output a large amount of internal messages and debug information to
                 the log.

               -check  {wchmm|trellis|triphone}
                 For debug, enter interactive check mode.

               -plugindir  dirlist
                 Specify the directory to load plugins from. If there are several directories,
                 specify them as a colon-separated list.

   Instance declaration for multi decoding
       The following arguments will create a new configuration set with default parameters, and
       switch current set to it. Jconf parameters specified after the option will be set into the
       current set.

       To do multi-model decoding, these arguments should be specified at the beginning of each
       model / search instance, with different names. Any options before the first instance
       definition will be IGNORED.

       When no instance definition is found (as in older versions of Julius), all the options are
       assigned to a default instance named _default.

       Please note that decoding with a single LM and multiple AMs is not fully supported. For
       example, you may want to write a jconf file in which several search instances use
       different AMs but refer to one shared LM. This type of model sharing is not supported yet,
       since some parts of LM processing depend on the assigned AM. Instead, you can get the same
       result by defining the same LM once for each AM.
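
       A possible layout with one LM declaration per AM is sketched below as jconf text held in
       a Python string, together with a small checker for the rule just described. This is a
       hypothetical illustration: the instance names, the file names and the check_instances
       helper are assumptions, and only the -AM, -LM, -SR, -h, -d and -v options documented on
       this page are used.

```python
JCONF_TEXT = """\
-AM am_1
-h am1.binhmm
-AM am_2
-h am2.binhmm
-LM lm_1
-d lm.bingram
-v words.dict
-LM lm_2
-d lm.bingram
-v words.dict
-SR search1 am_1 lm_1
-SR search2 am_2 lm_2
"""


def check_instances(text):
    """Return the (search, am, lm) triples, rejecting an LM shared by several AMs."""
    ams, lms, searches = set(), set(), []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "-AM":
            ams.add(fields[1])
        elif fields[0] == "-LM":
            lms.add(fields[1])
        elif fields[0] == "-SR":
            name, am, lm = fields[1:4]
            if am not in ams or lm not in lms:
                raise ValueError(f"{name}: undeclared AM or LM")
            searches.append((name, am, lm))
    am_per_lm = {}
    for name, am, lm in searches:
        # Remember the first AM seen for each LM and compare later uses with it.
        if am_per_lm.setdefault(lm, am) != am:
            raise ValueError(f"{name}: LM {lm} is shared by several AMs")
    return searches


print(check_instances(JCONF_TEXT))
# [('search1', 'am_1', 'lm_1'), ('search2', 'am_2', 'lm_2')]
```

       Both LM instances load the same model files, so the recognition result matches what a
       shared LM would give, at the cost of loading the LM twice.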

        -AM  name
          Create a new AM configuration set, and switch current to the new one. You should give a
          unique name. (Rev.4.0)

        -LM  name
          Create a new LM configuration set, and switch current to the new one. You should give a
          unique name. (Rev.4.0)

        -SR  name am_name lm_name
          Create a new search configuration set, and switch current to the new one. The specified
          AM and LM will be assigned to it. The am_name and lm_name can be either name or ID
          number. You should give a unique name. (Rev.4.0)

        -AM_GMM
          When using GMM for front-end processing, you can specify GMM-specific acoustic
          parameters after this option. If you do not specify -AM_GMM with GMM, the GMM will
          share the same parameter vector as the last AM. The current AM will be switched to the
          GMM one, so be careful not to confuse it with normal AM configurations. (Rev.4.0)

        -GLOBAL
          Start a global section. The global options should be placed before any instance
          declaration, or after this option on multiple model recognition. This can be used
          multiple times. (Rev.4.1)

        -nosectioncheck ,  -sectioncheck
          Disable / enable the option location check in multi-model decoding. When enabled, the
          options between instance declarations are treated as "sections", and only options of
          the matching type can be written in each. For example, after an -AM option, only
          AM-related options can be placed until another declaration is found. Also, global
          options should be placed at the top, before any instance declaration. This is enabled
          by default. (Rev.4.1)

   Language model (-LM)
       This group contains options for model definition of each language model type. When using
       multiple LMs, one instance can have only one LM.

       Only one type of LM can be specified for an LM configuration. If you want to use multiple
       models, you should define each one as a new LM.

       N-gram
               -d  bingram_file
                 Use binary format N-gram. An ARPA N-gram file can be converted to Julius binary
                 format by mkbingram.

               -nlr  arpa_ngram_file
                 A forward, left-to-right N-gram language model in standard ARPA format. When
                 both a forward N-gram and backward N-gram are specified, Julius uses this
                 forward 2-gram for the 1st pass, and the backward N-gram for the 2nd pass.

                 Since ARPA files are often huge and take a long time to load, it may be
                 better to convert the ARPA file to Julius binary format by mkbingram. Note that
                 if both forward and backward N-grams are used for recognition, they will be
                 converted together into a single binary.

                 When only a forward N-gram is specified by this option and no backward N-gram is
                 specified by -nrl, Julius performs recognition with only the forward N-gram. The
                 1st pass will use the 2-gram entries in the given N-gram, and the 2nd pass will
                 use the given N-gram, converting forward probabilities to backward
                 probabilities by Bayes rule. (Rev.4.0)

               -nrl  arpa_ngram_file
                 A backward, right-to-left N-gram language model in standard ARPA format. When
                 both a forward N-gram and backward N-gram are specified, Julius uses the forward
                 2-gram for the 1st pass, and this backward N-gram for the 2nd pass.

                 Since ARPA files are often huge and take a long time to load, it may be
                 better to convert the ARPA file to Julius binary format by mkbingram. Note that
                 if both forward and backward N-grams are used for recognition, they will be
                 converted together into a single binary.

                 When only a backward N-gram is specified by this option and no forward N-gram is
                 specified by -nlr, Julius performs recognition with only the backward N-gram.
                 The 1st pass will use the forward 2-gram probability computed from the backward
                 2-gram using Bayes rule. The 2nd pass fully uses the given backward N-gram.
                 (Rev.4.0)

               -v  dict_file
                 Word dictionary file.

               -silhead  word_string  -siltail  word_string
                 Silence word defined in the dictionary, for silences at the beginning of
                 sentence and end of sentence. (default: "<s>", "</s>")

               -mapunk  word_string
                 Specify unknown word. Default is "<unk>" or "<UNK>". This will be used to assign
                 word probability on unknown words, i.e. words in dictionary that are not in
                 N-gram vocabulary.

               -iwspword
                 Add a word entry to the dictionary that should correspond to inter-word pauses.
                 This may improve recognition accuracy with language models that have no
                 explicit inter-word pause modeling. The word entry to be added can be changed by
                 -iwspentry.

               -iwspentry  word_entry_string
                 Specify the word entry that will be added by -iwspword. (default: "<UNK> [sp] sp
                 sp")

               -sepnum  number
                 Number of high frequency words to be isolated from the lexicon tree, to ease
                 approximation error that may be caused by the one-best approximation on 1st
                 pass. (default: 150)

       Grammar
              Multiple grammars can be specified by repeating -gram and -gramlist. Note that this
              differs from other options (for a normal Julius option, the last one overrides
              previous ones). You can use -nogram to reset the grammars already specified before
              that point.

               -gram  gramprefix1[,gramprefix2[,gramprefix3,...]]
                 Comma-separated list of grammars to be used. The argument should be the prefix
                 of a grammar, i.e. if you have foo.dfa and foo.dict, you should specify them
                 with the single argument foo. Multiple grammars can be specified at a time as a
                 comma-separated list.

               -gramlist  list_file
                 Specify a grammar list file that contains the list of grammars to be used. The
                 list file should contain the prefixes of the grammars, one per line. A relative
                 path in the list file will be treated as relative to the list file, not the
                 current path or configuration file.
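
                 How prefixes map to files can be sketched in Python as follows. The helper
                 names grammar_files and prefixes_from_list and the example paths are
                 hypothetical, and the code only illustrates the naming and path rules stated
                 for -gram and -gramlist.

```python
from pathlib import PurePosixPath


def grammar_files(prefix):
    """Map a grammar prefix such as "foo" to its .dfa and .dict file names."""
    return prefix + ".dfa", prefix + ".dict"


def prefixes_from_list(list_file, lines):
    """Resolve the prefixes in a grammar list file relative to that file."""
    base = PurePosixPath(list_file).parent
    result = []
    for line in lines:
        entry = line.strip()
        if entry:  # skip empty lines
            result.append(str(base / entry))
    return result


# An argument in the style of "-gram foo,bar" names two grammars.
for prefix in "foo,bar".split(","):
    print(grammar_files(prefix))

# Hypothetical list file /home/user/grammars/all.list with two entries.
print(prefixes_from_list("/home/user/grammars/all.list", ["fruit", "digits/number"]))
```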

               -dfa  dfa_file  -v  dict_file
                 An old way of specifying grammar files separately. This is obsolete, and should
                 not be used any more.

               -nogram
                 Remove the current list of grammars already specified by -gram, -gramlist, -dfa
                 and -v.

       Isolated word
              Dictionaries can be specified by using -w and -wlist. When these are specified
              multiple times, all of the dictionaries will be read at startup. You can use
              -nogram to reset the dictionaries already specified at that point.

               -w  dict_file
                 Word dictionary for isolated word recognition. File format is the same as other
                 LM. (Rev.4.0)

               -wlist  list_file
                 Specify a dictionary list file that contains the list of dictionaries to be
                 used. The list file should contain the file names of the dictionaries, one per
                 line. A relative path in the list file will be treated as relative to the list
                 file, not the current path or configuration file. (Rev.4.0)

               -nogram
                 Remove the current list of dictionaries already specified by -w and -wlist.

               -wsil  head_sil_model_name tail_sil_model_name sil_context_name
                 On isolated word recognition, silence models will be appended to the head and
                 tail of each word at recognition. This option specifies the silence models to be
                 appended.  sil_context_name is the name used for the head and tail silence
                 models when they act as context of the word's head phone and tail phone. For
                 example, if you specify -wsil silB silE sp, a word with phone sequence b eh t
                 will be translated as silB sp-b+eh b-eh+t eh-t+sp silE. (Rev.4.0)
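
                 The expansion in this example can be reproduced with the short Python sketch
                 below. It only illustrates the naming scheme shown above (left-center+right
                 triphones framed by the two silence models); the function name
                 expand_isolated_word is hypothetical and this is not Julius code.

```python
def expand_isolated_word(phones, head_sil, tail_sil, sil_context):
    """Expand a word's phone sequence into triphones framed by silence models."""
    # The silence context name stands in for the neighbours of the first and last phone.
    padded = [sil_context] + list(phones) + [sil_context]
    triphones = [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]
    return [head_sil] + triphones + [tail_sil]


print(" ".join(expand_isolated_word(["b", "eh", "t"], "silB", "silE", "sp")))
# silB sp-b+eh b-eh+t eh-t+sp silE
```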

       User-defined LM
               -userlm
                 Declare to use user LM functions in the program. This option should be specified
                 if you use user-defined LM functions. (Rev.4.0)

       Misc. LM options
               -forcedict
                 Skip error words in dictionary and force running.

   Acoustic model and feature analysis (-AM) (-AM_GMM)
       This section is about options for acoustic model, feature extraction, feature
       normalizations and spectral subtraction.

       After -AM name, an acoustic model and related specification should be written. You can use
       multiple AMs trained with different MFCC types. For GMM, the required parameter condition
       should be specified after -AM_GMM in the same way as for AMs.

       When using multiple AMs, the values of -smpPeriod, -smpFreq, -fsize and -fshift should be
       the same among all AMs.

       Acoustic HMM
               -h  hmmdef_file
                 Acoustic HMM definition file. It should be in HTK ascii format, or Julius binary
                 format. You can convert HTK ascii format to Julius binary format using mkbinhmm.

               -hlist  hmmlist_file
                 HMMList file for phone mapping. This file provides mapping between logical
                 triphone names generated in the dictionary and the defined HMM names in hmmdefs.
                 This option should be specified for context-dependent model.

               -tmix  number
                 Specify the number of top Gaussians to be calculated in a mixture codebook.
                 Small number will speed up the acoustic computation, but AM accuracy may get
                 worse with too small value. See also -gprune. (default: 2)

               -spmodel  name
                 Specify HMM model name that corresponds to short-pause in an utterance. The
                 short-pause model name will be used in recognition: short-pause skipping on
                 grammar recognition, word-end short-pause model insertion with -iwsp on N-gram,
                 or short-pause segmentation (-spsegment). (default: "sp")

               -multipath
                 Enable multi-path mode. To make decoding faster, Julius by default imposes a
                 limit on HMM transitions: each model should have only one transition from the
                 initial state and one to the end state. In multi-path mode, Julius does extra
                 handling of inter-model transitions to allow model-skipping transitions and
                 multiple output/input transitions. Note that specifying this option will make
                 Julius a bit slower, and a larger beam width may be required.

                 This function was a compilation-time option in Julius 3.x, and is now a
                 run-time option. By default (without this option), Julius checks the transition
                 type of the specified HMMs, and enables multi-path mode if required. You can
                 force multi-path mode with this option. (rev.4.0)

               -gprune  {safe|heuristic|beam|none|default}
                 Set the Gaussian pruning algorithm to use. For tied-mixture models, Julius
                 performs Gaussian pruning to reduce acoustic computation, by calculating only
                 the top N Gaussians in each codebook at each frame. The default setting is
                 chosen according to the model type and engine setting.  default forces the
                 default setting to be used. Set this to none to disable pruning and perform
                 full computation.  safe guarantees that the top N Gaussians are computed.
                 heuristic and beam reduce the computational cost more aggressively, but may
                 result in a small loss of accuracy. (default: safe (standard) or beam (fast)
                 for tied-mixture models, none for non tied-mixture models)

               -iwcd1  {max|avg|best number}
                 Select method to approximate inter-word triphone on the head and tail of a word
                 in the first pass.

                 max will apply the maximum likelihood of the same context triphones.  avg will
                 apply the average likelihood of the same context triphones.  best number will
                 apply the average of top N-best likelihoods of the same context triphone.

                 Default is best 3 for use with N-gram, and avg for grammar and word. When this
                 AM is shared by LMs of both types, the latter will be chosen.
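
                 The three approximation methods can be compared with the small Python sketch
                 below. It is only an illustration of the arithmetic: the function name
                 approximate_iwcd and the score values are hypothetical, and the scores are
                 assumed to be the likelihoods of the triphones that share the same context.

```python
def approximate_iwcd(likelihoods, method, n_best=3):
    """Combine likelihoods of same-context triphones into one approximate score."""
    if method == "max":
        return max(likelihoods)
    if method == "avg":
        return sum(likelihoods) / len(likelihoods)
    if method == "best":
        # Average of the n_best highest likelihoods.
        top = sorted(likelihoods, reverse=True)[:n_best]
        return sum(top) / len(top)
    raise ValueError(f"unknown method: {method}")


scores = [-10.0, -12.0, -14.0, -20.0]        # hypothetical log likelihoods
print(approximate_iwcd(scores, "max"))       # -10.0
print(approximate_iwcd(scores, "avg"))       # -14.0
print(approximate_iwcd(scores, "best", 3))   # -12.0
```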

               -iwsppenalty  float
                 Insertion penalty for word-end short pauses appended by -iwsp.

               -gshmm  hmmdef_file
                 If this option is specified, Julius performs Gaussian Mixture Selection for
                 efficient decoding. The hmmdefs should be a monophone model generated from an
                 ordinary monophone HMM model, using mkgshmm.

               -gsnum  number
                 When GMS is used, specify the number of monophone states for which the
                 corresponding triphones are computed in detail. (default: 24)

       Speech analysis
              Only MFCC feature extraction is supported in current Julius. Thus, when recognizing
              waveform input from a file or microphone, the AM must be trained with MFCC
              features. The analysis parameters should also be set exactly the same as the
              training condition, using the options below.

              When you give input as an HTK parameter file, you can use any parameter type for
              the AM. In this case Julius does not care about the type of the input feature and
              AM; it just reads them as vector sequences and matches them to the given AM. Julius
              only checks whether the parameter types are the same. If this check causes
              problems, you can disable it with -notypecheck.

              In Julius, the parameter kind and qualifiers (as TARGETKIND in HTK) and the number
              of cepstral parameters (NUMCEPS) will be set automatically from the content of the
              AM header, so you need not specify them by options.

              Other parameters should be set exactly the same as the training condition. You can
              also give Julius the HTK Config file that you used to train the AM, with -htkconf.
              When this option is given, Julius will parse the Config file and set the
              appropriate parameters.

              You can further embed those analysis parameter settings to a binary HMM file using
              mkbinhmm.

              If options are specified in several ways, they will be evaluated in the order
              below. The AM embedded parameters will be loaded first, if any. Then the HTK config
              file given by -htkconf will be parsed; if a value was already set by the AM
              embedded parameters, the HTK config value will override it. Finally, the direct
              options will be loaded, overriding any settings loaded before. Note that when the
              same option is specified several times, the later one overrides the earlier one,
              except that -htkconf is evaluated before the direct options, as described above.
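
              The evaluation order can be pictured as three layers merged in sequence. The
              sketch below is only an illustration of that precedence: the function name and
              the dictionary keys and values are hypothetical and do not reflect how Julius
              stores its parameters.

```python
def resolve_analysis_params(am_embedded, htk_config, direct_options):
    """Merge three parameter sources; later sources override earlier ones."""
    params = {}
    # Order matters: AM embedded values, then -htkconf, then direct options.
    for source in (am_embedded, htk_config, direct_options):
        params.update(source)
    return params


# Hypothetical values, chosen only to show which layer wins.
am_embedded = {"fbank": 24, "preemph": 0.97}
htk_config = {"fbank": 26}
direct_options = {"preemph": 0.95}

resolved = resolve_analysis_params(am_embedded, htk_config, direct_options)
print(resolved)  # {'fbank': 26, 'preemph': 0.95}
```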

               -smpPeriod  period
                 Sampling period of input speech, in units of 100 nanoseconds. The sampling rate
                 can also be specified by -smpFreq. Please note that the input frequency should
                 be set equal to the training conditions of the AM. (default: 625, corresponds to
                 16,000Hz)

                 This option corresponds to the HTK Option SOURCERATE. The same value can be
                 given to this option.

                 When using multiple AMs, this value should be the same among all AMs.

               -smpFreq  Hz
                 Set the sampling frequency of input speech in Hz. The sampling rate can also be
                 specified using -smpPeriod. Please note that this frequency should be set equal
                 to the training conditions of the AM. (default: 16,000)

                 When using multiple AMs, this value should be the same among all AMs.
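
                 Since -smpPeriod is given in units of 100 nanoseconds, the two options are
                 related by a simple reciprocal. A minimal sketch (the helper names are
                 hypothetical):

```python
UNITS_PER_SECOND = 10_000_000  # one unit is 100 ns, so 10^7 units per second


def period_to_freq(period):
    """Convert a sampling period in 100 ns units to a frequency in Hz."""
    return UNITS_PER_SECOND / period


def freq_to_period(freq_hz):
    """Convert a sampling frequency in Hz to a period in 100 ns units."""
    return UNITS_PER_SECOND / freq_hz


print(period_to_freq(625))    # 16000.0
print(freq_to_period(16000))  # 625.0
```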

               -fsize  sample_num
                 Window size in number of samples. (default: 400)

                 This option corresponds to the HTK Option WINDOWSIZE, but the value should be
                 given in samples (HTK value / smpPeriod).

                 When using multiple AMs, this value should be the same among all AMs.

               -fshift  sample_num
                 Frame shift in number of samples. (default: 160)

                 This option corresponds to the HTK Option TARGETRATE, but the value should be
                 given in samples (HTK value / smpPeriod).

                 When using multiple AMs, this value should be the same among all AMs.
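
                 The conversion "HTK value / smpPeriod" used by -fsize and -fshift can be
                 sketched as below. The helper name is hypothetical, and the HTK values 250000
                 (25 ms window) and 100000 (10 ms shift) are assumed here only because they
                 reproduce the Julius defaults at the default sampling period of 625.

```python
def htk_to_samples(htk_value, smp_period=625):
    """Convert an HTK duration (100 ns units) to a number of samples."""
    return round(htk_value / smp_period)


print(htk_to_samples(250000))  # 400, usable as -fsize
print(htk_to_samples(100000))  # 160, usable as -fshift
```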

               -preemph  float
                 Pre-emphasis coefficient. (default: 0.97)

                 This option corresponds to the HTK Option PREEMCOEF. The same value can be given
                 to this option.

               -fbank  num
                 Number of filterbank channels. (default: 24)

                 This option corresponds to the HTK Option NUMCHANS. The same value can be given
                 to this option. Be aware that the default value is not the same as in HTK (22).

               -ceplif  num
                 Cepstral liftering coefficient. (default: 22)

                 This option corresponds to the HTK Option CEPLIFTER. The same value can be given
                 to this option.

               -rawe ,  -norawe
                 Enable/disable using raw energy before pre-emphasis (default: disabled)

                 This option corresponds to the HTK Option RAWENERGY. Be aware that the default
                 value differs from HTK (enabled in HTK, disabled in Julius).

               -enormal ,  -noenormal
                 Enable/disable normalizing log energy. On live input, this normalization will be
                 approximated from the average of the last input. (default: disabled)

                 This option corresponds to the HTK Option ENORMALISE. Be aware that the default
                 value differs from HTK (enabled in HTK, disabled in Julius).

               -escale  float_scale
                 Scaling factor of log energy when normalizing log energy. (default: 1.0)

                 This option corresponds to the HTK Option ESCALE. Be aware that the default
                 value differs from HTK (0.1).

               -silfloor  float
                 Energy silence floor in dB when normalizing log energy. (default: 50.0)

                 This option corresponds to the HTK Option SILFLOOR.

               -delwin  frame
                 Delta window size in number of frames. (default: 2)

                 This option corresponds to the HTK Option DELTAWINDOW. The same value can be
                 given to this option.

               -accwin  frame
                 Acceleration window size in number of frames. (default: 2)

                 This option corresponds to the HTK Option ACCWINDOW. The same value can be given
                 to this option.

               -hifreq  Hz
                 Enable band-limiting for MFCC filterbank computation: set upper frequency
                 cut-off. Value of -1 will disable it. (default: -1)

                 This option corresponds to the HTK Option HIFREQ. The same value can be given to
                 this option.

               -lofreq  Hz
                 Enable band-limiting for MFCC filterbank computation: set lower frequency
                 cut-off. Value of -1 will disable it. (default: -1)

                 This option corresponds to the HTK Option LOFREQ. The same value can be given to
                 this option.

               -zmeanframe ,  -nozmeanframe
                 With speech input, this option enables/disables frame-wise DC offset removal.
                 This corresponds to HTK configuration ZMEANSOURCE. This cannot be used together
                 with -zmean. (default: disabled)

               -usepower
                 Use power instead of magnitude on filterbank analysis. (default: disabled)

       Normalization
              Julius can perform cepstral mean normalization (CMN) for inputs. CMN will be
              activated when the given AM was trained with CMN (i.e. has "_Z" qualifier in the
              header).

              The cepstral mean is estimated in different ways according to the input type. On
              file input, the mean is computed from the whole input. On live input such as
              microphone and network input, the cepstral mean of the input is unknown at the
              start, so MAP-CMN is used. With MAP-CMN, an initial mean vector is applied at the
              beginning, and the mean vector is gradually shifted toward the mean of the
              accumulating input vectors as input proceeds. The options below control the
              behavior of MAP-CMN.

               -cvn
                 Enable cepstral variance normalization. On file input, the variance of the whole
                 input will be calculated and then applied. On live microphone input, the
                 variance of the last input will be applied. CVN is only supported for audio
                 input.

               -vtln  alpha lowcut hicut
                 Do frequency warping, typically for vocal tract length normalization (VTLN).
                 Arguments are the warping factor, the low-frequency cut-off and the
                 high-frequency cut-off, in that order. They correspond to the HTK Config values
                 WARPFREQ, WARPLCUTOFF and WARPHCUTOFF, respectively.

               -cmnload  file
                 Load initial cepstral mean vector from file on startup. The file should be one
                 saved by -cmnsave. Loading an initial cepstral mean enables Julius to better
                 recognize the first utterance on a real-time input. When used together with
                 -cmnnoupdate, this initial value will be used for all input.

               -cmnsave  file
                 Save the calculated cepstral mean vector into file. The parameters will be saved
                 at each input end. If the output file already exists, it will be overwritten.

               -cmnupdate ,  -cmnnoupdate
                 Control whether to update the cepstral mean at each input on real-time input.
                 Disabling this and specifying -cmnload will make the engine always use the
                 loaded static initial cepstral mean.

               -cmnmapweight  float
                 Specify the weight of the initial cepstral mean for MAP-CMN. Specify a larger
                 value to retain the initial cepstral mean for a longer period, and a smaller
                 value to make the cepstral mean rely more on the current input. (default: 100.0)
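
                 The effect of the weight can be illustrated with a common MAP formulation of
                 the running mean. This is a sketch under that assumption; this manual does not
                 state the exact update rule Julius uses, and the function name is hypothetical.

```python
def map_cmn_mean(initial_mean, frames, weight=100.0):
    """Return a MAP estimate of the cepstral mean after seeing `frames`.

    Assumed rule (not confirmed by this manual):
        mean = (weight * initial_mean + sum(frames)) / (weight + len(frames))
    """
    dim = len(initial_mean)
    sums = [0.0] * dim
    for frame in frames:
        for i in range(dim):
            sums[i] += frame[i]
    n = len(frames)
    return [(weight * initial_mean[i] + sums[i]) / (weight + n)
            for i in range(dim)]


# With no input yet, the estimate equals the initial mean.
print(map_cmn_mean([0.0], []))                 # [0.0]
# After 100 frames of value 2.0 and weight 100, it is halfway there.
print(map_cmn_mean([0.0], [[2.0]] * 100))      # [1.0]
```

                 Under this assumption, a larger weight keeps the estimate close to the initial
                 mean for more frames, which matches the behavior described for -cmnmapweight.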

       Front-end processing
              Julius can perform spectral subtraction to reduce stationary noise in audio input.
              Although it is not a powerful method, it may work in some situations. Julius has
              two ways to estimate the noise spectrum. One way is to assume that the first short
              segment of a speech input is a noise segment, and to estimate the noise spectrum as
              the average of that segment. The other way is to calculate the average spectrum
              from noise-only input using the separate tool mkss, and load it in Julius. The
              former is popular for speech file input, and the latter should be used for live
              input. The options below switch and control the behavior.

               -sscalc
                 Perform spectral subtraction using the head part of each file as the silence
                 part. The head part length should be specified by -sscalclen. Valid only for
                 file input. Conflicts with -ssload.

               -sscalclen  msec
                 With -sscalc, specify the length of head silence for noise spectrum estimation
                 in milliseconds. (default: 300)

               -ssload  file
                 Perform spectral subtraction for speech input using a pre-estimated noise
                 spectrum loaded from file. The noise spectrum file can be made by mkss. Valid
                 for all speech input. Conflicts with -sscalc.

               -ssalpha  float
                 Alpha coefficient of spectral subtraction for -sscalc and -ssload. Noise is
                 subtracted more strongly as this value gets larger, but distortion of the
                 resulting signal also becomes more noticeable. (default: 2.0)

               -ssfloor  float
                 Flooring coefficient of spectral subtraction. Spectral power that goes below
                 zero after subtraction will be replaced by the source signal multiplied by this
                 coefficient. (default: 0.5)
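
                 The roles of -ssalpha and -ssfloor can be sketched per spectral bin as follows.
                 This follows the wording of the option descriptions only; the function name is
                 hypothetical and the actual Julius implementation may differ in detail.

```python
def spectral_subtract(power, noise, alpha=2.0, floor=0.5):
    """Subtract a scaled noise spectrum from a power spectrum, bin by bin."""
    result = []
    for p, n in zip(power, noise):
        s = p - alpha * n
        if s < 0.0:
            # Below zero after subtraction: fall back to the floored source.
            s = floor * p
        result.append(s)
    return result


print(spectral_subtract([10.0, 1.0], [2.0, 1.0]))  # [6.0, 0.5]
```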

       Misc. AM options
               -htkconf  file
                 Parse the given HTK Config file, and set corresponding parameters to Julius.
                 When using this option, the default parameter values are switched from Julius
                 defaults to HTK defaults.

   Recognition process and search (-SR)
       This section contains options for search parameters on the 1st / 2nd pass such as beam
       width and LM weights, configurations for short-pause segmentation, switches for word
       lattice output and confusion network output, forced alignments, and other options relating
       to the recognition process and result output.

       Default values for beam width and LM weights will change according to the compile-time
       setup of JuliusLib, the AM model type, and the LM size. Please see the startup log for the
       actual values.

       1st pass parameters
               -lmp  weight penalty
                 (N-gram) Language model weights and word insertion penalties for the first pass.

               -penalty1  penalty
                 (Grammar) word insertion penalty for the first pass. (default: 0.0)

               -b  width
                 Beam width in number of HMM nodes for rank beaming on the first pass. This value
                 defines the search width on the 1st pass, and has a dominant effect on the total
                 processing time. A smaller width will speed up decoding, but too small a value
                 will result in a substantial increase of recognition errors due to search
                 failure. A larger value will make the search stable and will lead to
                 failure-free search, but processing time will grow in proportion to the width.

                 The default value is dependent on acoustic model type: 400 (monophone), 800
                 (triphone), or 1000 (triphone, setup=v2.1)

               -nlimit  num
                 Upper limit of tokens per node. This option is valid when --enable-wpair and
                 --enable-wpair-nlimit are enabled at compilation time.

               -progout
                 Enable progressive output of the partial results on the first pass.

               -proginterval  msec
                 Set the time interval for -progout in milliseconds. (default: 300)

       2nd pass parameters
               -lmp2  weight penalty
                 (N-gram) Language model weights and word insertion penalties for the second
                 pass.

               -penalty2  penalty
                 (Grammar) word insertion penalty for the second pass. (default: 0.0)

               -b2  width
                 Envelope beam width (number of hypotheses) on the second pass. If the count of
                 word expansions at a certain hypothesis length reaches this limit during the
                 search, shorter hypotheses are not expanded further. This prevents the search
                 from falling into a breadth-first-like situation, stacking on the same position,
                 and reduces search failures, mostly under large vocabulary conditions.
                 (default: 30)

               -sb  float
                 Score envelope width for enveloped scoring. When calculating the score of each
                 generated hypothesis, its trellis expansion and Viterbi operation will be pruned
                 in the middle of the speech if the score on a frame goes under the width. Giving
                 a small value makes the second pass faster, but computation errors may occur.
                 (default: 80.0)

               -s  num
                 Stack size, i.e. the maximum number of hypotheses that can be stored on the
                 stack during the search. A larger value may give more stable results, but
                 increases the amount of memory required. (default: 500)

               -m  count
                 Number of expanded hypotheses required to discontinue the search. If the number
                 of expanded hypotheses is greater than this threshold, the search is
                 discontinued at that point. The larger this value is, the longer Julius takes
                 to give up the search. (default: 2000)

               -n  num
                 The number of candidates Julius tries to find. The search continues until this
                 number of sentence hypotheses have been found. The obtained sentence hypotheses
                 are sorted by score, and the final result is displayed in that order (see also
                 -output). The possibility that the optimum hypothesis is correctly found
                 increases as this value is increased, but the processing time also becomes
                 longer. The default value depends on the engine setup at compilation time: 10
                 (standard) or 1 (fast or v2.1)

               -output  num
                 The number of top sentence hypotheses to output at the end of the search. Use
                 with -n. (default: 1)
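
                 The relation between -n and -output can be pictured as "find n candidates, sort
                 them by score, print the top few". A minimal sketch, in which the function name
                 and the (sentence, score) tuples are hypothetical:

```python
def select_output(hypotheses, output=1):
    """Sort found hypotheses by score (higher is better) and keep the top ones."""
    ranked = sorted(hypotheses, key=lambda h: h[1], reverse=True)
    return ranked[:output]


# Three candidates as they might be found with -n 3 (scores are made up).
found = [("b", -120.5), ("a", -100.0), ("c", -130.0)]
print(select_output(found, output=2))  # [('a', -100.0), ('b', -120.5)]
```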

               -lookuprange  frame
                 Set the number of frames before and after to look up next word hypotheses in the
                 word trellis on the second pass. This prevents the omission of short words, but
                 with a large value, the number of expanded hypotheses increases and system
                 becomes slow. (default: 5)

               -looktrellis
                 (Grammar) Expand only the words that survived the first pass instead of
                 expanding all the words predicted by the grammar. This option makes second pass
                 decoding faster, especially under large vocabulary conditions, but may increase
                 deletion errors of short words. (default: disabled)

       Short-pause segmentation / decoder-VAD
              When compiled with --enable-decoder-vad, the short-pause segmentation will be
              extended to support decoder-based VAD.

               -spsegment
                 Enable short-pause segmentation mode. Input will be segmented when a short pause
                 word (a word with only a silence model in its pronunciation) gets the highest
                 likelihood for a certain number of successive frames on the first pass. When a
                 segment end is detected, Julius stops the 1st pass at that point, performs the
                 2nd pass, and continues with the next segment. The word context will be
                 considered across segments. (Rev.4.0)

                 When compiled with --enable-decoder-vad, this option enables decoder-based VAD,
                 to skip long silence.

               -spdur  frame
                 Short pause duration length to detect end of input segment, in number of frames.
                 (default: 10)
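
                 The end-of-segment decision described for -spsegment and -spdur can be sketched
                 as a run-length check over first-pass frames. This is an illustration only: the
                 function name is hypothetical, and the input is assumed to be one boolean per
                 frame telling whether a pause word had the highest likelihood.

```python
def find_segment_end(best_is_pause, spdur=10):
    """Return the frame index where a segment end is detected, or None.

    A segment end is declared once a pause word has been the best word
    for `spdur` successive frames.
    """
    run = 0
    for t, is_pause in enumerate(best_is_pause):
        run = run + 1 if is_pause else 0
        if run >= spdur:
            return t
    return None


frames = [False] * 5 + [True] * 10
print(find_segment_end(frames))         # 14
print(find_segment_end([True] * 9))     # None
```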

               -pausemodels  string
                 A comma-separated list of pause model names to be used for short-pause
                 segmentation. A word whose pronunciation consists only of the pause models will
                 be treated as a "pause word" and used for pause detection. If not specified,
                 the names given by -spmodel, -silhead and -siltail will be used. (Rev.4.0)

               -spmargin  frame
                 Back step margin at trigger up for decoder-based VAD. When a speech up-trigger
                 is found by decoder-VAD, Julius will rewind the input parameters by this value
                 and start recognition at that point. (Rev.4.0)

                 This option will be valid only if compiled with --enable-decoder-vad.

               -spdelay  frame
                 Trigger decision delay frame at trigger up for decoder-based VAD. (Rev.4.0)

                 This option will be valid only if compiled with --enable-decoder-vad.

       Word lattice / confusion network output
               -lattice ,  -nolattice
                 Enable / disable generation of a word graph. The search algorithm is also
                 changed to optimize for better word graph generation, so the sentence result may
                 not be the same as normal N-best recognition. (Rev.4.0)

               -confnet ,  -noconfnet
                 Enable / disable generation of a confusion network. Enabling this will also
                 activate -lattice internally. (Rev.4.0)

               -graphrange  frame
                 Merge same words at neighboring positions during graph generation. If the
                 beginning time and ending time of two word candidates of the same word are
                 within the specified range, they will be merged. The default is 0 (allow merging
                 same words on exactly the same location), and specifying a larger value will
                 result in smaller graph output. Setting this value to -1 will disable merging;
                 in that case, same words on the same location with different scores will be left
                 as they are. (default: 0)
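
                 The merge condition can be sketched as below. The function name and the
                 (word, begin_frame, end_frame) tuples are hypothetical, and "within the
                 specified range" is assumed to mean that both boundaries differ by at most the
                 given number of frames.

```python
def should_merge(a, b, graphrange=0):
    """Decide whether two word candidates would be merged.

    `a` and `b` are (word, begin_frame, end_frame) tuples.
    """
    if graphrange < 0:
        # -1 disables merging entirely
        return False
    same_word = a[0] == b[0]
    begin_close = abs(a[1] - b[1]) <= graphrange
    end_close = abs(a[2] - b[2]) <= graphrange
    return same_word and begin_close and end_close


print(should_merge(("hello", 10, 30), ("hello", 10, 30)))                # True
print(should_merge(("hello", 10, 30), ("hello", 12, 31)))                # False
print(should_merge(("hello", 10, 30), ("hello", 12, 31), graphrange=2))  # True
```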

               -graphcut  depth
                 Cut the resulting graph by its word depth at post-processing stage. The depth
                 value is the number of words to be allowed at a frame. Setting to -1 disables
                 this feature. (default: 80)

               -graphboundloop  count
                 Limit the number of boundary adjustment loops at the post-processing stage. This
                 parameter prevents Julius from blocking in an infinite adjustment loop caused by
                 short word oscillation. (default: 20)

               -graphsearchdelay ,  -nographsearchdelay
                 When this option is enabled, Julius modifies its graph generation algorithm on
                 the 2nd pass not to terminate search by graph merging, until the first sentence
                 candidate is found. This option may improve graph accuracy, especially when you
                 are going to generate a huge word graph by setting broad search. Namely, it may
                 result in better graph accuracy when you set wide beams on both 1st pass -b and
                 2nd pass -b2, and large number for -n. (default: disabled)

       Multi-gram / multi-dic recognition
               -multigramout ,  -nomultigramout
                 On grammar recognition using multiple grammars, Julius will output only the best
                 result among all grammars. Enabling this option will make Julius output a
                 result for each grammar. (default: disabled)

       Forced alignment
               -walign
                 Do Viterbi alignment per word unit for the recognition result. The word
                 boundary frames and the average acoustic scores per frame will be calculated.

               -palign
                 Do Viterbi alignment per phone unit for the recognition result. The phone
                 boundary frames and the average acoustic scores per frame will be calculated.

               -salign
                 Do Viterbi alignment per state for the recognition result. The state boundary
                 frames and the average acoustic scores per frame will be calculated.

       Misc. search options
               -inactive
                 Start this recognition process instance in an inactive state. (Rev.4.0)

               -1pass
                 Perform only the first pass.

               -fallback1pass
                 When the 2nd pass fails, Julius finishes the recognition with no result. This
                 option tells Julius to output the 1st pass result as the final result when the
                 2nd pass fails. Note that some score output (confidence etc.) may not be useful.
                 This was the default behavior of Julius-3.x.

               -no_ccd ,  -force_ccd
                 Explicitly switch phone context handling at search. Normally Julius determines
                 whether the AM in use is a context-dependent model or not from the model names,
                 i.e., whether the names contain the characters + and -. This option will
                 override the automatic detection.
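
                 The name-based detection can be sketched as a check on the model names. This is
                 an illustration only: the function name is hypothetical, and the sketch treats
                 a name containing either character as evidence of context dependency, which is
                 an assumption (this manual does not say whether both are required).

```python
def looks_context_dependent(model_names):
    """Guess whether an AM is context dependent from its model names."""
    return any("+" in name or "-" in name for name in model_names)


print(looks_context_dependent(["a", "i", "sp"]))       # False
print(looks_context_dependent(["a-k+i", "k+i", "sp"])) # True
```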

               -cmalpha  float
                 Smoothing parameter for confidence scoring. (default: 0.05)

               -iwsp
                 (Multi-path mode only) Enable inter-word context-free short pause insertion.
                 This option appends a skippable short pause model for every word end. The
                 short-pause model can be specified by -spmodel.

               -transp  float
                 Additional insertion penalty for transparent words. (default: 0.0)

               -demo
                 Equivalent to -progout -quiet.

ENVIRONMENT VARIABLES

        ALSADEV
          (When using mic input with an ALSA device) Specify a capture device name. If not
          specified, "default" will be used.

        AUDIODEV
          (When using mic input with an OSS device) Specify a capture device path. If not
          specified, "/dev/dsp" will be used.

        LATENCY_MSEC
          Try to set the input latency of microphone input in milliseconds. A smaller value will
          shorten latency but may sometimes make the process unstable. The default value depends
          on the running OS.

EXAMPLES

       For examples of system usage, refer to the tutorial section in the Julius documents.

NOTICE

       Note about jconf files: relative paths in a jconf file are interpreted as relative to the
       jconf file itself, not to the current directory.
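
       The path rule above can be sketched as follows. This is an illustration of the rule only,
       not Julius code: the function name and the example paths are hypothetical, and POSIX-style
       paths are assumed.

```python
import posixpath


def resolve_jconf_path(jconf_file, path_in_jconf):
    """Resolve a path written inside a jconf file.

    Relative paths are taken relative to the directory of the jconf
    file itself, not to the current working directory.
    """
    if posixpath.isabs(path_in_jconf):
        return path_in_jconf
    base_dir = posixpath.dirname(jconf_file)
    return posixpath.normpath(posixpath.join(base_dir, path_in_jconf))


print(resolve_jconf_path("/opt/models/main.jconf", "am/hmmdefs"))
# /opt/models/am/hmmdefs
print(resolve_jconf_path("conf/main.jconf", "../lm/words.bin"))
# lm/words.bin
```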

SEE ALSO

       julian(1), jcontrol(1), adinrec(1), adintool(1), mkbingram(1), mkbinhmm(1), mkgshmm(1),
       wav2mfcc(1), mkss(1)

       http://julius.sourceforge.jp/en/

DIAGNOSTICS

       Julius normally returns the exit status 0. If an error occurs, Julius exits abnormally
       with exit status 1. If an input file cannot be found or cannot be loaded for some reason,
       Julius will skip processing for that file.

BUGS

       There are some restrictions on the type and size of the models Julius can use. For a
       detailed explanation, refer to the Julius documentation. For bug reports, inquiries and
       comments, please contact julius-info at lists.sourceforge.jp.

       Copyright (c) 1991-2008 Kawahara Lab., Kyoto University

       Copyright (c) 1997-2000 Information-technology Promotion Agency, Japan

       Copyright (c) 2000-2008 Shikano Lab., Nara Institute of Science and Technology

       Copyright (c) 2005-2008 Julius project team, Nagoya Institute of Technology

AUTHORS

       Rev.1.0 (1998/02/20)
          Designed by Tatsuya KAWAHARA and Akinobu LEE (Kyoto University)

          Development by Akinobu LEE (Kyoto University)

       Rev.1.1 (1998/04/14), Rev.1.2 (1998/10/31), Rev.2.0 (1999/02/20), Rev.2.1 (1999/04/20),
       Rev.2.2 (1999/10/04), Rev.3.0 (2000/02/14), Rev.3.1 (2000/05/11)
          Development of above versions by Akinobu LEE (Kyoto University)

       Rev.3.2 (2001/08/15), Rev.3.3 (2002/09/11), Rev.3.4 (2003/10/01), Rev.3.4.1 (2004/02/25),
       Rev.3.4.2 (2004/04/30)
          Development of above versions by Akinobu LEE (Nara Institute of Science and Technology)

       Rev.3.5 (2005/11/11), Rev.3.5.1 (2006/03/31), Rev.3.5.2 (2006/07/31), Rev.3.5.3
       (2006/12/29), Rev.4.0 (2007/12/19), Rev.4.1 (2008/10/03)
          Development of above versions by Akinobu LEE (Nagoya Institute of Technology)

THANKS TO

       From rev.3.2, Julius is released by the "Information Processing Society, Continuous Speech
       Consortium".

       The Windows DLL version was developed and released by Hideki BANNO (Nagoya University).

       The Windows Microsoft Speech API compatible version was developed by Takashi SUMIYOSHI
       (Kyoto University).

                                            02/11/2009                                  JULIUS(1)