Provided by: fitsh_0.9.4-1_amd64

NAME

       lfit - general purpose evaluation and regression analysis tool

SYNOPSIS

       lfit [method of analysis] [options] <input> [-o, --output <output>]

DESCRIPTION

       The program `lfit` is a standalone command line driven tool designed for both interactive and batch
       processed data analysis and regression. In principle, the program may run in two modes. First, `lfit`
       supports numerous regression analysis methods that can be used to search for the "best fit" parameters
       of model functions in order to model the input data (which are read from one or more input files in
       tabulated form). Second, `lfit` is capable of reading input data and performing various arithmetic
       operations as specified by the user. This second mode is basically used to evaluate the model functions
       with the parameters presumably derived by the actual regression methods (and in order to complete this
       evaluation, only slight changes are needed in the command line invocation arguments).

OPTIONS

   General options:
       -h, --help
               Gives a general summary of the command line options.

       --long-help, --help-long
              Gives a detailed list of command line options.

       --wiki-help, --help-wiki, --mediawiki-help, --help-mediawiki
              Gives a detailed list of command line options in Mediawiki format.

       --version, --version-short, --short-version
               Gives version information about the program.

       --functions, --list-functions, --function-list
              Lists the available arithmetic operations and built-in functions supported by the program.

       --wiki-functions, --functions-wiki
              Lists the available arithmetic operations and built-in  functions  supported  by  the  program  in
              Mediawiki format.

       --examples
               Prints some very basic examples of program invocation.

   Common options for regression analysis:
       -v, --variable, --variables <list-of-variables>
               Comma-separated list of regression variables. In the case of non-linear regression analysis, all
               of these fit variables are expected to have initial values (specified as <name>=<value>);
               otherwise the initial values are set to zero. Note that in the case of some of the
               regression/analysis methods, additional parameters should be assigned to these fit/regression
               variables. See the section "Regression analysis methods" for additional details.

       -c, --column, --columns <independent>[:<column index>],...
               Comma-separated list of independent variable names as read from the subsequent columns of the
               primary input data file. If the independent variables are not in sequential order in the input
               file, the optional column indices should be defined for each variable by separating the column
               index with a colon after the name of the variable. In the case of multiple input files and data
               blocks, the user should assign the individual independent variables and the respective column
               names and definitions for each file (see later, Sec. "Multiple data blocks").

       -f, --function <model function>
              Model function of the analysis in a symbolic form. This expression for the model  function  should
              contain  built-in arithmetic operators, built-in functions, user-defined macros (see -x, --define)
              or functions provided by the dynamically loaded external modules (see -d,  --dynamic).  The  model
              function can depend on both the fit/regression variables (see -v, --variables) and the independent
              variables read from the input file (see -c, --columns). In the case of multiple  input  files  and
              data  blocks,  the  user  should  assign  the  respective model functions for each data block (see
               later). Note that some of the analysis methods expect the model function to be either
               differentiable or linear in the fit/regression variables. See the section "Regression analysis
               methods" for more details.

       -y, --dependent <dependent expression>
               The dependent variable of the regression analysis, in the form of an arithmetic expression. This
              expression  for  the  dependent variable can depend only on the variables read from the input file
              (see -c, --columns). In the case of multiple input files and data blocks, the user  should  assign
              the respective dependent expressions for each data block (see later).

       -o, --output <output file>
              Name  of  the output file into which the fit results (the values for the fit/regression variables)
              are written.

   Common options for function evaluation:
       -f, --function <function to evaluate>[...]
               List of functions to be evaluated. Multiple expressions can be specified either by separating
               the subsequent expressions with commas or by giving multiple -f, --function options on the
               command line.

        Note that the two basic modes of `lfit` are distinguished only by the presence or absence of the -y,
        --dependent command line argument. In other words, there is no explicit command line argument that
        specifies the mode of `lfit`. If the -y, --dependent command line argument is omitted, `lfit` runs in
        function evaluation mode; otherwise the program runs in regression analysis mode.

       -o, --output <output file>
              Name of the output file in which the results of the function evaluation are written.

   Regression analysis methods:
       -L, --clls, --linear
               The default mode of `lfit`, the classical linear least squares (CLLS) method. The model
               functions specified after -f, --function are expected to be both differentiable and linear with
               respect to the fit/regression variables. Otherwise, `lfit` detects the non-differentiable or
               non-linear property of the model function(s) and refuses the analysis. In this case, other
               types of regression analysis methods can be applied depending on one's needs, for instance the
               Levenberg-Marquardt algorithm (NLLM, see -N, --nllm) or the downhill simplex minimization
               (DHSX, see -D, --dhsx).
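        To illustrate what a CLLS step does internally, the following sketch (independent of `lfit` itself,
        with made-up data) solves the normal equations of the linear model a*x+b directly:

```python
# made-up data, roughly y = 2*x + 1
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.1, 4.9, 7.0]

n = len(xs)
sx = sum(xs); sy = sum(ys)
sxx = sum(x * x for x in xs)
sxy = sum(x * y for x, y in zip(xs, ys))

# normal equations of the model a*x + b:
#   [sxx sx] [a]   [sxy]
#   [sx  n ] [b] = [sy ]
det = sxx * n - sx * sx
a = (sxy * n - sx * sy) / det
b = (sxx * sy - sx * sxy) / det
print(round(a, 3), round(b, 3))   # prints: 1.98 1.03
```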

       -N, --nllm, --nonlinear
              This  option  implies a regression involving the nonlinear Levenberg-Marquardt (NLLM) minimization
              algorithm. The model function(s) specified after -f, --function are expected to be  differentiable
              with  respect  to  the  fit/regression variables. Otherwise, `lfit` detects the non-differentiable
               property and refuses the analysis. The Levenberg-Marquardt algorithm has some fine-tuning
               parameters; see the section "Fine-tuning of regression analysis methods" for more details on
               how these additional regression parameters can be set. Note that all of the fit/regression
               variables should have a proper initial value, defined in the command line argument -v,
               --variable (see also there).
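        The essence of the Levenberg-Marquardt iteration (damped normal equations, controlled by the "lambda"
        and "multiply" fine-tuning parameters discussed below) can be sketched as follows; the model, data and
        starting values here are made up for illustration:

```python
import math

# illustrative data, roughly y = 2*exp(0.6*x)
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [2.0, 2.7, 3.6, 4.9, 6.6]

def model(a, b, x):
    return a * math.exp(b * x)

a, b = 1.0, 0.0     # initial values, as they would be given after -v
lam = 0.001         # the "lambda" fine-tuning parameter
for _ in range(100):
    r = [y - model(a, b, x) for x, y in zip(xs, ys)]
    # analytic partial derivatives of the model w.r.t. a and b
    J = [(math.exp(b * x), a * x * math.exp(b * x)) for x in xs]
    g00 = sum(j[0] * j[0] for j in J)
    g11 = sum(j[1] * j[1] for j in J)
    g01 = sum(j[0] * j[1] for j in J)
    r0 = sum(j[0] * ri for j, ri in zip(J, r))
    r1 = sum(j[1] * ri for j, ri in zip(J, r))
    # damped normal equations: (JtJ + lam*diag(JtJ)) * step = Jt*r
    h00, h11 = g00 * (1.0 + lam), g11 * (1.0 + lam)
    det = h00 * h11 - g01 * g01
    da = (r0 * h11 - g01 * r1) / det
    db = (h00 * r1 - g01 * r0) / det
    old = sum(ri * ri for ri in r)
    new = sum((y - model(a + da, b + db, x)) ** 2 for x, y in zip(xs, ys))
    if new < old:
        a, b, lam = a + da, b + db, lam / 10.0  # accept, decrease lambda
    else:
        lam *= 10.0   # reject, increase lambda (the "multiply" parameter)
print(round(a, 2), round(b, 2))
```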

       -U, --lmnd
               Levenberg-Marquardt minimization with numerical partial derivatives (LMND). Same as the NLLM
               method, except that the partial derivatives of the model function(s) are calculated
               numerically. Therefore, the model function(s) may contain functions whose partial derivatives
               are not known in an analytic form. The differences used in the computations of the partial
              derivatives should be declared by the user, see also the command line option -q, --differences.
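        A central-difference approximation of the kind used by LMND can be sketched as follows (the model and
        the difference value, cf. -q a=0.001, are hypothetical):

```python
def partial(f, params, name, diff):
    # symmetric difference quotient around the current parameter values
    up = dict(params); up[name] += diff
    dn = dict(params); dn[name] -= diff
    return (f(up) - f(dn)) / (2.0 * diff)

# hypothetical model: f = a^2 + 3*b, so df/da = 2*a
model = lambda p: p["a"] ** 2 + 3.0 * p["b"]
d_a = partial(model, {"a": 2.0, "b": 1.0}, "a", 0.001)   # cf. -q a=0.001
print(round(d_a, 6))   # close to the exact value 4
```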

       -D, --dhsx, --downhill
              This option implies a regression involving the  nonlinear  downhill  simplex  (DHSX)  minimization
               algorithm. The user should specify the proper initial values and their uncertainties as
              <name>=<initial>:<uncertainty>, unless the "fisher" option  is  passed  to  the  -P,  --parameters
              command  line argument (see later in the section "Fine-tuning of regression analysis methods"). In
              the first case, the initial size of the simplex is based on the uncertainties provided by the user
              while  in the second case, the initial simplex is derived from the eigenvalues and eigenvectors of
              the Fisher covariance matrix. Note that the model functions must be differentiable in  the  latter
              case.

       -M, --mcmc
               This option implies the method of Markov Chain Monte-Carlo (MCMC). The model function(s) can be
               arbitrary with respect to differentiability. However, each of the fit/regression variables must
               have an initial assumption for its uncertainty, which must be specified via the command line
               argument -v, --variable. The user should specify the proper initial values and uncertainties
               as <name>=<initial>:<uncertainty>. In the actual implementation of `lfit`, each variable has
               an uncorrelated Gaussian a priori distribution with the specified uncertainty. The MCMC
               algorithm has some fine-tuning parameters; see the section "Fine-tuning of regression analysis
               methods" for more details.
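        A minimal Metropolis sketch of such a chain (one fit variable, a Gaussian proposal with the specified
        uncertainty, counting accepted transitions as with the "accepted" fine-tuning parameter; data and
        values are made up) might look like:

```python
import math
import random

random.seed(0)      # cf. -s, --seed: the default seed 0 is reproducible
ys = [4.8, 5.1, 5.3, 4.9, 5.2]   # made-up measurements of a constant

def chi2(a):
    return sum((y - a) ** 2 for y in ys)

a, step = 0.0, 0.5  # initial value and uncertainty, as in -v a=0.0:0.5
chain = []
while len(chain) < 2000:         # counting accepted transitions only
    trial = random.gauss(a, step)
    d = chi2(a) - chi2(trial)
    if d >= 0.0 or random.random() < math.exp(d / 2.0):
        a = trial
        chain.append(a)

mean = sum(chain) / len(chain)
print(round(mean, 2))            # close to the sample mean, 5.06
```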

       -K, --mchi, --chi2
               With this option one can perform a "brute force" Chi^2 minimization by evaluating the Chi^2
               merit function on a grid of the fit/regression variables. In this case the grid size and
               resolution must be specified in a specific form after the -v, --variable command line argument.
               Namely, each of the fit/regression variables intended to be varied on a grid must have the
               format <name>=[<min>:<step>:<max>], while the other ones, specified as <name>=<value>, are
               kept fixed. The
              output  of  this  analysis  will  be  a  series  of  lines  with  N+1 columns, where the values of
              fit/regression variables are followed by the value of the merit function. Note  that  all  of  the
              declared  fit/regression  variables  are written to the output, including the ones which are fixed
              (therefore the output is somewhat redundant).
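        The grid evaluation itself is straightforward; a sketch with a hypothetical two-variable model,
        varying "a" on the grid [0:0.1:2] and keeping "b" fixed:

```python
pts = [(0.0, 1.0), (1.0, 1.5), (2.0, 2.0)]    # (x, y) pairs, y = 0.5*x + 1

def chi2(a, b):
    return sum((y - (a * x + b)) ** 2 for x, y in pts)

b = 1.0                                       # kept fixed: -v ...,b=1.0
grid = [round(0.1 * i, 1) for i in range(21)] # a=[0:0.1:2]
rows = [(a, b, chi2(a, b)) for a in grid]     # N+1 columns per output line
best = min(rows, key=lambda r: r[2])
print(best)                                   # minimum at a = 0.5
```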

       -E, --emce
               This option implies the method of "refitting to synthetic data sets", also known as "error
               Monte-Carlo estimation" (EMCE). This method must have a primary minimization algorithm assigned
               (which can be any of the CLLS, NLLM or DHSX methods). First, the program searches for the best
               fit values of the
              fit/regression  variables  involving the assigned primary minimization algorithm and reports these
              best fit variables. Then, additional synthetic data sets are generated around this set of best fit
              variables  and  the minimization is repeated involving the same primary method. The synthetic data
              sets are generated independently for each input data block, taking into account the fit residuals.
              The noise added to the best fit data is generated from the power spectrum of the residuals.
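        The EMCE idea can be sketched as follows (simplified: here the synthetic noise is drawn by resampling
        the residuals rather than from their power spectrum, and the data are made up):

```python
import random

random.seed(0)
xs = [float(i) for i in range(8)]
ys = [2.0 * x + 1.0 + random.gauss(0.0, 0.3) for x in xs]   # made-up data

def clls(xv, yv):
    # classical linear least squares for the model a*x + b
    n = len(xv); sx = sum(xv); sy = sum(yv)
    sxx = sum(x * x for x in xv)
    sxy = sum(x * y for x, y in zip(xv, yv))
    det = sxx * n - sx * sx
    return (sxy * n - sx * sy) / det, (sxx * sy - sx * sxy) / det

a0, b0 = clls(xs, ys)                   # primary minimization, reported
res = [y - (a0 * x + b0) for x, y in zip(xs, ys)]
slopes = []
for _ in range(200):                    # cf. -i, --emce-iterations
    synth = [a0 * x + b0 + random.choice(res) for x in xs]
    slopes.append(clls(xs, synth)[0])   # refit each synthetic data set
mean_a = sum(slopes) / len(slopes)
spread = (sum((s - mean_a) ** 2 for s in slopes) / len(slopes)) ** 0.5
print(round(a0, 2), round(spread, 3))   # best fit slope and its scatter
```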

       -X, --xmmc
              This  option implies an improved/extended version of the Markov Chain Monte-Carlo analysis (XMMC).
              The major differences between the classic  MCMC  and  XMMC  methods  are  the  following.  1/  The
              transition  distribution  is derived from the Fisher covariance matrix. 2/ The program performs an
              initial minimization of the merit function involving the method of downhill  simplex.  3/  Various
              sanity checks are performed in order to verify the convergence of the Markov chains (including the
              comparison of the  actual  and  theoretical  transition  probabilities,  the  computation  of  the
              autocorrelation  lengths  of  each  fit/regression  variable  series  and  the  comparison  of the
              statistical and Fisher covariance).

       -A, --fima
              Fisher information matrix analysis  (FIMA).  With  this  analysis  method  one  can  estimate  the
              uncertainties  and  correlations  of  the  fit/regression variables involving the method of Fisher
               matrix analysis. This method does not minimize the merit function by adjusting the
               fit/regression variables; instead, the initial values (specified after the -v, --variables
               option) are expected to be the "best fit" ones.
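        For a linear model the Fisher matrix can be written down directly; a sketch for the model a*x+b with
        a uniform uncertainty (the abscissae and sigma are made up):

```python
# abscissae and uncertainty are made up; model is f(x) = a*x + b
xs = [0.0, 1.0, 2.0, 3.0]
sigma = 0.5

# Fisher matrix F_ij = sum_k (df/dp_i)(df/dp_j) / sigma^2 with df/da = x,
# df/db = 1; the covariance of (a, b) is the inverse of F
f00 = sum(x * x for x in xs) / sigma ** 2
f01 = sum(xs) / sigma ** 2
f11 = len(xs) / sigma ** 2
det = f00 * f11 - f01 * f01
var_a = f11 / det      # diagonal elements of F^-1
var_b = f00 / det
print(round(var_a ** 0.5, 3), round(var_b ** 0.5, 3))   # 0.224 0.418
```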

   Fine-tuning of regression analysis methods:
       -e, --error <error expression>
              Expression for the uncertainties. Note that zero or negative uncertainty  is  equivalent  to  zero
              weight, i.e. input lines with zero or negative errors are discarded from the fit.

       -w, --weight <weight expression>
              Expression  for  the  weights. The weight is simply the reciprocal of the uncertainty. The default
              error/uncertainty (and therefore the weight) is unity. Note that most of  the  analysis/regression
              methods are rather sensitive to the uncertainties since the merit function also depends on these.

       -P, --parameters <regression parameters>
              This  option  is  followed  by  a set of optional fine-tune parameters, that is different for each
              primary regression analysis method:

       default, defaults
              Use the default fine-tune parameters for the given regression method.

       clls, linear
              Use the classic linear least squares method as the primary  minimization  algorithm  of  the  EMCE
              method.  Like  in the case of the CLLS regression analysis (see -L, --clls), the model function(s)
              must be both differentiable and linear with respect to the fit/regression variables.

       nllm, nonlinear
              Use  the  non-linear  Levenberg-Marquardt  minimization  algorithm  as  the  primary  minimization
              algorithm  of  the EMCE method. Like in the case of the NLLM regression analysis (see -N, --nllm),
              the model function(s) must be differentiable with respect to the fit/regression variables.

       lmnd   Use  the  non-linear  Levenberg-Marquardt  minimization  algorithm  as  the  primary  minimization
              algorithm  of  the  EMCE  method. Like in the case of -U, --lmnd regression method, the parametric
              derivatives of the model function(s) are calculated by a numerical  approximation  (see  also  -U,
              --lmnd and -q, --differences for additional details).

       dhsx, downhill
              Use  the  downhill  simplex  (DHSX) minimization as the primary minimization algorithm of the EMCE
               method. Unless the additional 'fisher' option is also specified, the user should (as in the
               default case of the DHSX regression method) specify the uncertainties of the fit/regression
               variables, which are used as the initial size of the simplex.

       mc, montecarlo
              Use a primitive Monte-Carlo diffusion minimization technique as the primary minimization algorithm
              of  the  EMCE  method.  The  user should specify the uncertainties of the fit/regression variables
              which are then used to generate the Monte-Carlo transitions. This primary  minimization  technique
              is rather nasty (very slow), so its usage is not recommended.

       fisher In  the  case  of  the  DHSX  regression method or in the case of the EMCE method when the primary
              minimization is the downhill simplex algorithm, the initial size of the simplex  is  derived  from
              the  Fisher  covariance  approximation evaluated at the point represented by the initial values of
              the fit/regression variables. Since the derivation of the Fisher covariance requires the knowledge
              of  the partial derivatives of the model function(s) with respect to the fit/regression variables,
               the(se) model function(s) must be differentiable. On the other hand, the user does not have to
               specify the initial uncertainties after the -v, --variables option, since these uncertainties
               are derived automatically from the Fisher covariance.

        skip   In the case of the EMCE and XMMC methods, the initial minimization is skipped.

       lambda=<value>
              Initial value for the "lambda" parameter of the Levenberg-Marquardt algorithm.

       multiply=<value>
              Value of the "lambda multiplicator" parameter of the Levenberg-Marquardt algorithm.

       iterations=<max.iterations>
               Maximum number of iterations of the Levenberg-Marquardt algorithm.

       accepted
              Count the accepted transitions in the MCMC and XMMC methods (default).

       nonaccepted
              Count the total (accepted plus non-accepted) transitions in the MCMC and XMMC methods.

       gibbs  Use the Gibbs sampler in the MCMC method.

       adaptive
              Use the adaptive XMMC algorithm (i.e. the Fisher covariance is  re-computed  after  each  accepted
              transition).

       window=<window size>
              Window   size   for   calculating  the  autocorrelation  lengths  for  the  Markov  chains  (these
              autocorrelation lengths are reported only in the case of XMMC method). The default  value  is  20,
              which  is fine in the most cases since the typical autocorrelation lengths are between 1 and 2 for
              nice convergent chains.

       -q, --difference <variablename>=<difference>[,...]
              The analysis method of LMND (Levenberg-Marquardt minimization using numerical derivatives, see -U,
              --lmnd)  requires the differences that are used during the computations of the partial derivatives
              of the model function(s). With this option, one can specify these differences.

       -k, --separate <variablename>[,...]
               In the case of non-linear regression methods (for instance, DHSX or XMMC), the fit/regression
               variables in which the model functions are linear can be separated from the nonlinear part,
               thereby making the minimization process more robust and reliable. Since the set of variables in
              which  the  model  functions  are  linear  is  ambiguous,  the user should explicitly specify this
              supposedly  linear  subset  of  regression  variables.   (For   instance,   the   model   function
              "a*b*x+a*cos(x)+b*sin(x)+c*x^2"  is linear in both "(a,c)" and "(b,c)" parameter vectors but it is
              non-linear in "(a,b,c)".) The program checks whether the specified subset of regression  variables
              is  a  linear  subset  and  reports  a  warning  if  not. Note that the subset of separated linear
              variables (defined here) and the  subset  of  the  fit/regression  variables  affected  by  linear
              constraints (see also section "Constraints") must be disjoint.
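        The linearity of a candidate subset can also be checked numerically by testing additivity, in the
        spirit of the check `lfit` performs; a sketch using the example model above (the parameter values are
        arbitrary test points):

```python
import math

def model(a, b, c, x):
    return a * b * x + a * math.cos(x) + b * math.sin(x) + c * x * x

def is_linear(f):
    # f maps a parameter list to a value; f is linear (affine) in these
    # parameters iff f - f(0) is additive
    p, q = [0.3, -0.7], [1.1, 0.4]
    s = [pi + qi for pi, qi in zip(p, q)]
    f0 = f([0.0, 0.0])
    return abs((f(s) - f0) - ((f(p) - f0) + (f(q) - f0))) < 1e-9

x, b = 1.234, 0.5
ac_linear = is_linear(lambda v: model(v[0], b, v[1], x))    # vary (a, c)
ab_linear = is_linear(lambda v: model(v[0], v[1], 0.0, x))  # vary (a, b)
print(ac_linear, ab_linear)   # True False: the a*b*x term is bilinear
```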

       --perturbations <noise level>, --perturbations <key>=<noise level>[,...]
               Additional white noise to be added to each EMCE synthetic data set. Each data block (referred
               to here by the appropriate data block keys, see also section "Multiple data blocks") may have
              different  white  noise  levels.  If  there  is only one data block, this command line argument is
              followed only by a single number specifying the white noise level.

   Additional parameters for Monte-Carlo analysis:
       -s, --seed <random seed>
              Seed for the random number generator. By default this seed is  0,  thus  all  of  the  Monte-Carlo
              regression  analyses  (EMCE,  MCMC,  XMMC and the optional generator for the FIMA method) generate
              reproducible parameter distributions. A positive value after this option yields alternative random
               seeds, while any negative value results in an automatic random seed (derived from various
               available sources, such as /dev/[u]random, the system time, the hardware MAC address and so
               on); therefore, distributions generated with this kind of automatic random seed are not
               reproducible.

       -i, --[mcmc,emce,xmmc,fima]-iterations <iterations>
               The actual number of Monte-Carlo iterations for the MCMC, EMCE and XMMC methods. Additionally,
               the FIMA method is capable of generating a mock Gaussian distribution of the parameters with
               the same covariance as derived by the Fisher analysis; the number of points in this mock
               distribution is also specified by this command line option.

   Clipping outlier data points:
       -r, --sigma, --rejection-level <level>
              Rejection level in the units of standard deviations.

       -n, --iterations <number of iterations>
              Maximum number of iterations in the outlier clipping cycles. The actual number of  outlier  points
              can be traced by increasing the verbosity of the program (see -V, --verbose).

       --[no-]weighted-sigma
               During the derivation of the standard deviation, the contribution of the data points can be
               weighted by the respective weights/error bars (see also -w, --weight or -e, --error in the
               section "Fine-tuning of regression analysis methods"). If no weights/error bars are associated
               with the data points (i.e. both the -w, --weight and -e, --error options are omitted), this
               option has no practical effect.

        Note that in the current version of `lfit`, only the CLLS, NLLM and LMND regression methods support
        the outlier clipping scheme described above.
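        The clipping cycle can be sketched as follows (a slope-through-origin fit on made-up data with one
        gross outlier, rejection level 2.5):

```python
xs = list(range(10))
ys = [float(x) for x in xs]
ys[4] = 40.0                         # one gross outlier

def fit_slope(pts):                  # least squares slope through the origin
    return sum(x * y for x, y in pts) / sum(x * x for x, _ in pts)

pts = list(zip(xs, ys))
for _ in range(5):                   # cf. -n, --iterations
    a = fit_slope(pts)
    res = [y - a * x for x, y in pts]
    sig = (sum(r * r for r in res) / len(res)) ** 0.5
    kept = [p for p, r in zip(pts, res) if abs(r) <= 2.5 * sig]  # -r 2.5
    if len(kept) == len(pts):        # nothing rejected: converged
        break
    pts = kept
print(len(pts), round(fit_slope(pts), 3))   # 9 points kept, slope 1.0
```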

   Multiple data blocks:
       -i<key> <input file name>
              Input file name for the data block named as <key>.

       -c<key> <independent>[:<column index>],...
              Column definitions (see also -c, --columns) for the given data block named as <key>.

       -f<key> <model function>
              Expression for the model function assigned to the data block named as <key>.

       -y<key> <dependent expression>
              Expression of the dependent variable for the data block named as <key>.

       -e<key> <errors>
              Expression of the uncertainties for the data block named as <key>.

       -w<key> <weights>
               Expression of the weights for the data block named as <key>. Note that, as with -e, --error
               and -w, --weight, only one of the -e<key>, -w<key> arguments should be specified.

   Constraints:
       -t, --constraint, --constraints <expression>{=<>}<expression>[,...]
              List  of  fit  and  domain  constraints  between  the  regression  variables.  Each fit constraint
              expression must be linear in the fit/regression variables. The program checks the linearity of the
               fit constraints and reports an error if any of the constraints is non-linear. A domain
               constraint can be any expression involving an arbitrary binary arithmetic relation (such as
               strictly greater than: '>', strictly less than: '<', greater than or equal to: '>=' and less
               than or equal to: '<=').
              Constraints can be specified either by a comma-separated list after a single command line argument
              of -t, --constraints or by multiple of these command line arguments.

       -v, --variable <name>:=<value>
              Another  form of specifying constraints. The variable specifications after -v, --variable can also
              be used to define constraints by writing ":=" instead of "=" between the variable name and initial
              value. Thus, -v <name>:=<value> is equivalent to -v <name>=<value> -t <name>=<value>.

   User-defined functions:
       -x, --define, --macro <name>(<parameters>)=<definition expression>
               With this option, the user can define additional functions (also called macros) on top of the
               built-in functions and operators, dynamically loaded functions and previously defined macros.
              Note  that  each  such user-defined function must be stand-alone, i.e. external variables (such as
              fit/regression variables and independent variables) cannot be part of the  definition  expression,
              only the parameters of these functions.

   Dynamically loaded extensions and functions:
       -d, --dynamic <library>:<array>[,...]
              Load  the  dynamically  linked  library  (shared  object)  named  <library>  and import the global
              `lfit`-compatible set of functions defined in the arrays specified after the name of the  library.
               The arrays must be declared with the type 'lfitfunction', as defined in the file
              "lfit.h". Each record in this array contains information about a certain imported function, namely
              the  actual  name of this function, flags specifying whether the function is differentiable and/or
              linear in its regression parameters, the number of regression variables and independent  variables
               and the actual C subroutine that implements the evaluation of the function (and the optional
               computation of the partial derivatives). The modules 'linear.c' and 'linear.so' provide a
               simple
              example that implements the "line(a,b,x)=a*x+b" function. This example function has two regression
              variables ("a" and "b") and one independent variable ("x") and the function itself  is  linear  in
              the regression variables.

   More on outputs:
       -z, --columns-output <column indices>
              Column  indices  where  the results are written in evaluation mode. If this option is omitted, the
               results of the function evaluation are written sequentially. Otherwise, the input file is
               written
              to  the output and the appropriate columns (specified here) are replaced by the respective results
              of the function evaluation. Thus, although the default column order  is  sequential,  there  is  a
              significant  difference  between  omitting this option and specifying "-z 1,2,...,N". In the first
              case, the output file contains only the results of the function evaluations, while in  the  latter
              case, the first N columns of the original file are replaced with the results.
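        The distinction can be sketched on a single hypothetical input line with two evaluated expressions:

```python
line = "1.0 2.0 3.0"            # a hypothetical input line
results = ["9.9", "8.8"]        # results of two evaluated expressions

# with "-z 1,3": write the input line, replacing the 1st and 3rd columns
cols = line.split()
for idx, val in zip([1, 3], results):
    cols[idx - 1] = val
replaced = " ".join(cols)

# without -z: only the results themselves are written, sequentially
sequential = " ".join(results)
print(replaced)                 # 9.9 2.0 8.8
print(sequential)               # 9.9 8.8
```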

       --errors, --error-line, --error-columns
              Print the uncertainties of the fit/regression variables.

       -F, --format <variable name>=<format>[,...]
               Format of the output in printf-style for each fit/regression variable (see printf(3)). The
               default format is %12.6g (6 significant figures).

       -F, --format <format>[,...]
               Format of the output in evaluation mode. The default format is %12.6g (6 significant figures).

       -C, --correlation-format <format>
              Format of the correlation matrix elements. The default format is %6.3f (3 significant figures).

       -g, --derived-variable[s] <variable name>=<expression>[,...]
               Some of the regression and analysis methods are capable of computing the uncertainties and
              correlations  for  derived  regression variables. These additional (and therefore not independent)
              variables can be defined with this command line option. In the definition  expression  one  should
              use  only  the fit/regression variables (as defined by the -v, --variables command line argument).
              The output format of these variables can also be  specified  by  the  -F,  --format  command  line
              argument.

       -u, --output-fitted <filename>
               Name of an output file into which those lines of the input are written that were involved in
               the
              final regression. This option is useful in the case of outlier clipping in order to see  what  was
              the  actual  subset  of input data that was used in the fit (see also the -n, --iterations and -r,
              --sigma options).

       -j, --output-rejected <filename>
               Name of an output file into which those lines of the input are written that were rejected from
               the
              final  regression.  This option is useful in the case of outlier clipping in order to see what was
              the actual subset of input data where the dependent variable represented outlier points (see  also
              the -n, --iterations and -r, --sigma options).

       -a, --output-all <filename>
              File  containing  the  lines  of  the  input  file  that  were involved in the complete regression
              analysis. This file is simply the original file, only the commented and empty lines are omitted.

       -p, --output-expression <filename>
              In this file the model function is written in which the fit/regression variables are  replaced  by
              their best-fit values.

       -l, --output-variables <filename>
              List  of the names and values of the fit/regression variables in the same format as used after the
              -v, --variables command line argument. The content  of  this  file  can  therefore  be  passed  to
              subsequent invocations of `lfit`.

       --delta
               Write the individual differences between the dependent variables and the evaluated best fit
              model function values for each line in the output files specified by the -u, --output-fitted,  -j,
              --output-rejected and -a, --output-all command line options.

       --delta-comment
              Same  as  --delta, but the differences are written as a comment (i.e. separated by a '##' from the
              original input lines).

       --residual
              Write the final fit residual to the output file (after the list of the  best-fit  values  for  the
              fit/regression variables).

REPORTING BUGS

       Report bugs to <apal@szofi.net>, see also https://fitsh.net/.

COPYRIGHT

       Copyright © 1996, 2002, 2004-2008, 2009-2020; Pal, Andras <apal@szofi.net>