Ubuntu Manpage: sphinx_fe - Convert audio files to acoustic feature files

Provided by: sphinxbase-utils_0.8+5prealpha+1-16_amd64

NAME

       sphinx_fe - Convert audio files to acoustic feature files

SYNOPSIS

       sphinx_fe [ options ]...

DESCRIPTION

       This program converts audio files (in Microsoft WAV, NIST Sphere, or raw format) to
       acoustic feature files for input to batch-mode speech recognition. The resulting files
       are also useful for various other things. A list of options follows:

       -alpha Preemphasis parameter

       -argfile
              File (e.g. feat.params from an acoustic model) to read parameters from. This will
              override anything set in other command-line arguments.

       -blocksize
              Number of samples to read at a time.

       -build_outdirs
              Create missing subdirectories in output directory

       -c     Control file for batch processing

       -cep2spec
              Input is cepstral files, output is log spectral files

       -di    Input directory; input file names are relative to this, if defined

       -dither
              Add 1/2-bit noise

       -do    Output directory; output files are relative to this

       -doublebw
              Use double bandwidth filters (same center freq)

       -ei    extension to be applied to all input files

       -eo    extension to be applied to all output files

       -example
              Shows example of how to use the tool

       -frate Frame rate

       -help  Shows the usage of the tool

       -i     audio input file

       -input_endian
              Endianness of input data, big or little, ignored if NIST or MS Wav

       -lifter
              Length of sine curve for liftering, or 0 for no liftering.

       -logspec
              Write out logspectral files instead of cepstra

       -lowerf
              Lower edge of filters

       -mach_endian
              Endianness of machine, big or little

       -mswav Defines input format as Microsoft Wav (RIFF)

       -ncep  Number of cep coefficients

       -nchans
              Number of channels of data (interleaved samples assumed)

       -nfft  Size of FFT

       -nfilt Number of filter banks

       -nist  Defines input format as NIST sphere

       -npart Number of parts to run in (supersedes -nskip and -runlen if non-zero)

       -nskip If a control file was specified, the number of utterances to skip at  the  head  of
              the file

       -o     cepstral output file

       -ofmt  Format of output files - one of sphinx, htk, text.

       -part  Index of the part to run (supersedes -nskip and -runlen if non-zero)

       -raw   Defines input format as raw binary data

       -remove_dc
              Remove DC offset from each frame

       -remove_noise
              Remove noise with spectral subtraction in mel-energies

       -remove_silence
              Enables VAD, removes silence frames from processing

       -round_filters
              Round mel filter frequencies to DFT points

       -runlen
              If a control file was specified, the number of utterances to process, or -1 for all

       -samprate
              Sampling rate

       -seed  Seed for random number generator; if less than zero, pick our own

       -smoothspec
              Write out cepstral-smoothed logspectral files

       -spec2cep
              Input is log spectral files, output is cepstral files

       -sph2pipe
              Input is NIST sphere (possibly with Shorten), use sph2pipe to convert

       -transform
              Which type of transform to use to calculate cepstra (legacy, dct, or htk)

       -unit_area
              Normalize mel filters to unit area

       -upperf
              Upper edge of filters

       -vad_postspeech
              Number of silence frames to keep after the transition from speech to silence.

       -vad_prespeech
              Number of speech frames to keep before the transition from silence to speech.

       -vad_startspeech
              Number of speech frames needed to trigger the VAD transition from silence to speech.

       -vad_threshold
              Threshold for decision between noise and silence frames. Log-ratio between signal
              level and noise level.

       -verbose
              Show input filenames

       -warp_params
              Parameters defining the warping function

       -warp_type
              Warping function type (or shape)

       -whichchan
              Channel to process (numbered from 1), or 0 to mix all channels

       -wlen  Hamming window length
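
       The interaction of -nskip and -runlen with a control file can be pictured with a short
       Python sketch. This is only an illustration of the semantics described above, not code
       taken from sphinx_fe: the function name select_utterances is hypothetical, and treating
       every negative -runlen like -1 is an assumption.

```python
def select_utterances(entries, nskip=0, runlen=-1):
    """Return the control-file entries that would be processed.

    entries: utterance names, one per control-file line (hypothetical input)
    nskip:   number of utterances to skip at the head of the file
    runlen:  number of utterances to process, or -1 for all
    """
    remaining = entries[nskip:]
    if runlen < 0:
        # Assumption: any negative value means "process everything that is left".
        return remaining
    return remaining[:runlen]


if __name__ == "__main__":
    control = ["utt1", "utt2", "utt3", "utt4"]
    print(select_utterances(control, nskip=1, runlen=2))
```

       The -npart and -part options supersede this selection when non-zero; how they divide
       the control file is not described here, so the sketch leaves them out.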

       Currently, the only features supported are MFCCs (mel-frequency cepstral coefficients).
       Numerous options control the properties of the output features. It is VERY important
       that you document the specific set of flags used to create any given set of feature
       files, since this information is NOT recorded in the files themselves, and any mismatch
       between the parameters used to extract features for recognition and those used to
       extract features for training will cause recognition to fail.
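
       One way to follow this advice is to keep the flag set in a small file next to the
       feature files and to build every sphinx_fe command line from it. The Python sketch
       below assumes a JSON sidecar file; the helper names, the file layout, and the flag
       values shown are illustrative assumptions, not part of sphinx_fe. Only flags listed
       above are used.

```python
import json


def build_sphinx_fe_command(flags, infile, outfile):
    """Build an argv list for sphinx_fe from a dict of flag names to values."""
    argv = ["sphinx_fe", "-i", infile, "-o", outfile]
    for name in sorted(flags):
        argv += ["-" + name, str(flags[name])]
    return argv


def save_flags(flags, path):
    """Record the flag set in a JSON sidecar file (hypothetical convention)."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(flags, fh, indent=2, sort_keys=True)


def load_flags(path):
    """Read back a flag set written by save_flags."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def diff_flags(train_flags, decode_flags):
    """Return {flag: (training value, decoding value)} for every mismatch."""
    names = set(train_flags) | set(decode_flags)
    return {
        name: (train_flags.get(name), decode_flags.get(name))
        for name in names
        if train_flags.get(name) != decode_flags.get(name)
    }


if __name__ == "__main__":
    # Illustrative values only; use the ones that match your acoustic model.
    flags = {"samprate": 16000, "nfilt": 40, "ncep": 13, "transform": "dct"}
    print(build_sphinx_fe_command(flags, "utt1.wav", "utt1.mfc"))
```

       Checking that diff_flags returns an empty dictionary before decoding is a cheap guard
       against the training and recognition mismatch described above. The resulting argv list
       can be passed to subprocess.run to invoke the tool.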

AUTHOR

       Written by numerous people at CMU from 1994 onwards. This manual page by David Huggins-
       Daines <dhuggins@cs.cmu.edu>.

COPYRIGHT

       Copyright © 1994-2007 Carnegie Mellon University.  See the file COPYING included with this
       package for more information.

                                            2007-08-27                               SPHINX_FE(1)