Provided by: fitsh_0.9.4-1_amd64

NAME

       lfit - general purpose evaluation and regression analysis tool

SYNOPSIS

       lfit [method of analysis] [options] <input> [-o, --output <output>]

DESCRIPTION

       The  program `lfit` is a standalone command line driven tool designed for both interactive
       and batch processed data analysis and regression. In principle, the program may run in two
       modes.  First,  `lfit`  supports  numerous regression analysis methods that can be used to
       search for "best fit" parameters of model functions in  order  to  model  the  input  data
       (which are read from one or more input files in tabulated form). Second, `lfit` can read
       input data and perform various arithmetic operations as specified by the
       user.  Basically  this  second  mode  is  used  to  evaluate  the model functions with the
       parameters presumably derived by the actual regression methods (and in order  to  complete
       this evaluation, only slight changes are needed in the command line invocation arguments).

OPTIONS

   General options:
       -h, --help
              Gives a general summary of the command line options.

       --long-help, --help-long
              Gives a detailed list of command line options.

       --wiki-help, --help-wiki, --mediawiki-help, --help-mediawiki
              Gives a detailed list of command line options in Mediawiki format.

       --version, --version-short, --short-version
              Gives some version information about the program.

       --functions, --list-functions, --function-list
              Lists  the  available arithmetic operations and built-in functions supported by the
              program.

       --wiki-functions, --functions-wiki
              Lists the available arithmetic operations and built-in functions supported  by  the
              program in Mediawiki format.

       --examples
              Prints some very basic examples for the program invocation.

   Common options for regression analysis:
       -v, --variable, --variables <list-of-variables>
              Comma-separated  list  of  regression  variables.  In case of non-linear regression
              analysis, all of these fit variables are  expected  to  have  some  initial  values
              (specified  as  <name>=<value>),  otherwise  the initial values are set to be zero.
              Note that in the case  of  some  of  the  regression/analysis  methods,  additional
              parameters  should  be  assigned to these fit/regression variables. See the section
              "Regression analysis methods" for additional details.

       -c, --column, --columns <independent>[:<column index>],...
              Comma-separated list of independent variable names as read from the subsequent
              columns  of  the  primary  input data file. If the independent variables are not in
              sequential order in the input file, the optional column indices should  be  defined
              for  each  variable,  by separating the column index with a colon after the name of
              the variable. In the case of multiple input files and data blocks, the user  should
              assign  the  individual  independent  variables and the respective column names and
              definitions for each file (see later, Sec. "Multiple data blocks").

       -f, --function <model function>
              Model function of the analysis in a symbolic form. This expression  for  the  model
              function   should   contain  built-in  arithmetic  operators,  built-in  functions,
              user-defined macros (see -x, --define) or functions  provided  by  the  dynamically
              loaded  external modules (see -d, --dynamic). The model function can depend on both
              the fit/regression variables (see -v, --variables) and  the  independent  variables
              read  from  the input file (see -c, --columns). In the case of multiple input files
              and data blocks, the user should assign the respective  model  functions  for  each
              data block (see later). Note that some of the analysis methods expect the model
              function to be either differentiable or linear in the fit/regression variables. See
              "Regression analysis methods" later on for more details.

       -y, --dependent <dependent expression>
              The  dependent  variable  of  the  regression  analysis, in a form of an arithmetic
              expression. This expression for the dependent  variable  can  depend  only  on  the
              variables  read  from  the  input file (see -c, --columns). In the case of multiple
              input files and data blocks,  the  user  should  assign  the  respective  dependent
              expressions for each data block (see later).

       -o, --output <output file>
              Name  of  the  output  file  into  which  the  fit  results  (the  values  for  the
              fit/regression variables) are written.

   Common options for function evaluation:
       -f, --function <function to evaluate>[...]
              List of functions to be evaluated. More expressions  can  be  specified  by  either
              separating  the  subsequent  expressions  by  a  comma  or  by  specifying more -f,
              --function options in the command line.

       Note that the two basic modes of `lfit` are distinguished only  by  the  presence  or  the
       absence of the -y, --dependent command line argument. In other words, there is no
       explicit command line argument that specifies the mode of `lfit`. If the -y, --dependent
       command  line  argument is omitted, `lfit` runs in function evaluation mode, otherwise the
       program runs in regression analysis mode.

       -o, --output <output file>
              Name of the output file in  which  the  results  of  the  function  evaluation  are
              written.

   Regression analysis methods:
       -L, --clls, --linear
              The  default  mode of `lfit`, the classical linear least squares (CLLS) method. The
              model  functions  specified  after  -f,  --function  are  expected   to   be   both
              differentiable  and linear with respect to the fit/regression variables. Otherwise,
              `lfit`  detects  the  non-differentiable  and  non-linear  property  of  the  model
              function(s)  and  refuses  the  analysis.  In  this case, other types of regression
              analysis methods can be applied depending on the user's needs, for instance the
              Levenberg-Marquardt algorithm (NLLM, see -N, --nllm) or the downhill simplex
              minimization (DHSX, see -D, --dhsx).
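
              As a rough illustration of what a linear least squares fit computes, the
              following Python sketch fits the model "a*x+b" by solving the 2x2 normal
              equations with unit weights. It is not the code used by `lfit`; the function
              name fit_line and the sample data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b via the 2x2 normal equations (unit weights)."""
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx  # zero when all x values coincide
    if det == 0:
        raise ValueError("degenerate input: all x values are equal")
    a = (n * sxy - sx * sy) / det
    b = (sxx * sy - sx * sxy) / det
    return a, b


# Hypothetical data lying exactly on y = 2*x + 1.
a, b = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(a, b)
```

              Built only from the options described on this page, a comparable request
              would presumably look like `lfit -c x,y -v a,b -f "a*x+b" -y y data.txt`,
              where the file name data.txt is hypothetical.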

       -N, --nllm, --nonlinear
              This option implies a regression involving the nonlinear Levenberg-Marquardt (NLLM)
              minimization  algorithm.  The  model function(s) specified after -f, --function are
              expected to  be  differentiable  with  respect  to  the  fit/regression  variables.
              Otherwise, `lfit` detects the non-differentiable property and refuses the analysis.
              There are some fine-tune parameters of the Levenberg-Marquardt algorithm; see the
              section "Fine-tuning of regression analysis methods" for details on how these
              additional regression parameters can be set. Note that all of the fit/regression
              variables  should have a proper initial value, defined in the command line argument
              -v, --variable (see also there).

       -U, --lmnd
              Levenberg-Marquardt minimization with numerical partial derivatives (LMND). Same as
              the NLLM method, except that the partial derivatives of the model
              function(s) are  calculated  numerically.  Therefore,  the  model  function(s)  may
              contain  functions  of which partial derivatives are not known in an analytic form.
              The differences used in the computations  of  the  partial  derivatives  should  be
              declared by the user, see also the command line option -q, --differences.
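
              The role of the user-supplied differences can be sketched in Python as
              follows. The sketch assumes a central-difference scheme; this page does not
              say which scheme `lfit` uses, and the names numerical_partials and model are
              hypothetical.

```python
import math


def numerical_partials(model, params, x, differences):
    """Central-difference estimate of d(model)/d(param) for each listed parameter."""
    partials = {}
    for name, h in differences.items():
        up = dict(params)
        up[name] += h
        down = dict(params)
        down[name] -= h
        partials[name] = (model(up, x) - model(down, x)) / (2.0 * h)
    return partials


def model(p, x):
    # Hypothetical model function: a * exp(-b * x)
    return p["a"] * math.exp(-p["b"] * x)


# The differences play the role of the values given after -q, --differences.
grads = numerical_partials(model, {"a": 2.0, "b": 0.5}, 1.0, {"a": 1e-4, "b": 1e-4})
print(grads)
```

              Differences that are too large bias the derivative, while differences that
              are too small amplify rounding errors, which is why they are left to the user.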

       -D, --dhsx, --downhill
              This  option  implies  a regression involving the nonlinear downhill simplex (DHSX)
              minimization algorithm. The user should specify the proper initial values and their
              uncertainties  as  <name>=<initial>:<uncertainty>,  unless  the  "fisher" option is
              passed to the -P, --parameters command line argument  (see  later  in  the  section
              "Fine-tuning  of regression analysis methods"). In the first case, the initial size
              of the simplex is based on the uncertainties provided by  the  user  while  in  the
              second  case,  the initial simplex is derived from the eigenvalues and eigenvectors
              of  the  Fisher  covariance  matrix.  Note  that  the  model  functions   must   be
              differentiable in the latter case.

       -M, --mcmc
              This  option  implies  the  method  of  Markov  Chain Monte-Carlo (MCMC). The model
              function(s) can be arbitrary in the point of differentiability.  However,  each  of
              the   fit/regression   variables   must   have  an  initial  assumption  for  their
              uncertainties which must be specified via the command line argument -v, --variable.
              The user should specify the proper initial values and uncertainties of these as
              <name>=<initial>:<uncertainty>.  In  the  actual  implementation  of  `lfit`,  each
              variable  has  an  uncorrelated  Gaussian  a priori distribution with the specified
              uncertainty. The MCMC algorithm has some  fine-tune  parameters,  see  the  section
              "Fine-tuning of regression analysis methods" for more details.

       -K, --mchi, --chi2
              With  this  option one can perform a "brute force" Chi^2 minimization by evaluating
              the value of the merit function of Chi^2 on a grid of the fit/regression variables.
              In  this  case  the  grid  size and resolution must be specified in a specific form
              after the -v, --variable command line argument. Namely each of  the  fit/regression
              variables   intended   to   be   varied   on   a   grid   must  have  a  format  of
              <name>=[<min>:<step>:<max>] while the other ones specified  as  <name>=<value>  are
              kept fixed. The output of this analysis will be a series of lines with N+1 columns,
              where the values of fit/regression variables are followed by the value of the merit
              function. Note that all of the declared fit/regression variables are written to the
              output, including the ones which  are  fixed  (therefore  the  output  is  somewhat
              redundant).
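
              The grid evaluation described above can be sketched in Python as follows.
              The sketch assumes an unweighted Chi^2, a model "a*x+b" and a grid that
              includes its upper end point; all names and data are hypothetical and the
              code is not taken from `lfit`.

```python
def grid_values(lo, step, hi):
    """Grid points lo, lo+step, ..., hi (assumes hi - lo is a multiple of step)."""
    count = int(round((hi - lo) / step))
    return [lo + i * step for i in range(count + 1)]


def chi2(a, b, data):
    """Unweighted sum of squared residuals for the model a*x + b."""
    return sum((y - (a * x + b)) ** 2 for x, y in data)


data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
b_fixed = 1.0  # corresponds to a variable given as <name>=<value>

# One output row per grid point: the N variables followed by the merit value.
rows = [(a, b_fixed, chi2(a, b_fixed, data)) for a in grid_values(1.0, 0.5, 3.0)]
best = min(rows, key=lambda row: row[2])
for row in rows:
    print(*row)
```

              Here "a" is scanned as a=[1:0.5:3] while "b" stays fixed, yet "b" still
              appears in every row, which mirrors the redundancy noted above.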

       -E, --emce
              This option implies the method of "refitting to synthetic data sets", or "error
              Monte-Carlo estimation" (EMCE). This method requires an assigned primary
              minimization algorithm (that can be any of the CLLS, NLLM or DHSX methods). First,
              the program searches the best fit values for the fit/regression variables involving
              the  assigned  primary minimization algorithm and reports these best fit variables.
              Then, additional synthetic data sets are generated around  this  set  of  best  fit
              variables  and  the minimization is repeated involving the same primary method. The
              synthetic data sets are generated independently for each input data  block,  taking
              into  account  the fit residuals. The noise added to the best fit data is generated
              from the power spectrum of the residuals.

       -X, --xmmc
              This option implies an improved/extended version of the  Markov  Chain  Monte-Carlo
              analysis  (XMMC).  The  major differences between the classic MCMC and XMMC methods
              are the following. 1/ The  transition  distribution  is  derived  from  the  Fisher
              covariance  matrix.  2/  The  program performs an initial minimization of the merit
              function involving the method of downhill simplex. 3/  Various  sanity  checks  are
              performed  in  order  to verify the convergence of the Markov chains (including the
              comparison of the actual and theoretical transition probabilities, the  computation
              of  the  autocorrelation  lengths  of  each  fit/regression variable series and the
              comparison of the statistical and Fisher covariance).

       -A, --fima
              Fisher information matrix analysis  (FIMA).  With  this  analysis  method  one  can
              estimate  the  uncertainties  and  correlations  of  the  fit/regression  variables
              involving the method of Fisher matrix analysis. This method does not  minimize  the
              merit  functions  by  adjusting  the fit/regression variables, instead, the initial
              values (specified after the -v, --variables option) are expected to  be  the  "best
              fit" ones.

   Fine-tuning of regression analysis methods:
       -e, --error <error expression>
              Expression  for  the  uncertainties.  Note  that  zero  or  negative uncertainty is
              equivalent to zero weight, i.e. input  lines  with  zero  or  negative  errors  are
              discarded from the fit.

       -w, --weight <weight expression>
              Expression for the weights. The weight is simply the reciprocal of the uncertainty.
              The default error/uncertainty (and therefore the weight) is unity. Note  that  most
              of  the analysis/regression methods are rather sensitive to the uncertainties since
              the merit function also depends on these.
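
              The relation between errors, weights and the merit function can be sketched
              in Python as follows. The sketch assumes the usual Chi^2 form, the sum of
              (residual/error)^2; the exact merit function of `lfit` is not spelled out on
              this page, and the name weighted_chi2 is hypothetical.

```python
def weighted_chi2(residuals, errors):
    """Sum of (residual/error)^2, skipping points with non-positive errors."""
    total = 0.0
    used = 0
    for r, e in zip(residuals, errors):
        if e <= 0:
            continue  # zero or negative error: zero weight, point discarded
        w = 1.0 / e  # the weight is the reciprocal of the uncertainty
        total += (w * r) ** 2
        used += 1
    return total, used


# Third point has zero error and is therefore ignored.
total, used = weighted_chi2([1.0, 2.0, 3.0], [1.0, 0.5, 0.0])
print(total, used)
```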

       -P, --parameters <regression parameters>
              This option is followed  by  a  set  of  optional  fine-tune  parameters,  that  is
              different for each primary regression analysis method:

       default, defaults
              Use the default fine-tune parameters for the given regression method.

       clls, linear
              Use  the  classic linear least squares method as the primary minimization algorithm
              of the EMCE method. Like in the case of  the  CLLS  regression  analysis  (see  -L,
              --clls),  the model function(s) must be both differentiable and linear with respect
              to the fit/regression variables.

       nllm, nonlinear
              Use the  non-linear  Levenberg-Marquardt  minimization  algorithm  as  the  primary
              minimization  algorithm of the EMCE method. Like in the case of the NLLM regression
              analysis (see -N, --nllm),  the  model  function(s)  must  be  differentiable  with
              respect to the fit/regression variables.

       lmnd   Use  the  non-linear  Levenberg-Marquardt  minimization  algorithm  as  the primary
              minimization algorithm of  the  EMCE  method.  Like  in  the  case  of  -U,  --lmnd
              regression  method,  the  parametric  derivatives  of  the  model  function(s)  are
              calculated by a numerical approximation (see also -U, --lmnd and -q,  --differences
              for additional details).

       dhsx, downhill
              Use  the downhill simplex (DHSX) minimization as the primary minimization algorithm
              of the EMCE method. Unless the additional 'fisher' option  is  specified  directly,
              like in the default case of the DHSX regression method, the user should specify the
              uncertainties of the fit/regression variables that are used as an initial  size  of
              the simplex.

       mc, montecarlo
              Use  a  primitive  Monte-Carlo  diffusion  minimization  technique  as  the primary
              minimization  algorithm  of  the  EMCE  method.  The  user   should   specify   the
              uncertainties  of  the fit/regression variables which are then used to generate the
              Monte-Carlo transitions. This primary minimization technique is rather nasty  (very
              slow), so its usage is not recommended.

       fisher In  the  case  of the DHSX regression method or in the case of the EMCE method when
              the primary minimization is the downhill simplex algorithm, the initial size of the
              simplex  is derived from the Fisher covariance approximation evaluated at the point
              represented by the initial  values  of  the  fit/regression  variables.  Since  the
              derivation  of  the  Fisher  covariance  requires  the  knowledge  of  the  partial
              derivatives of the model function(s) with respect to the fit/regression  variables,
              the(se) model function(s) must be differentiable. On the other hand, the user does
              not have to specify the initial uncertainties after the -v, --variables option
              since these uncertainties are derived automatically from the Fisher covariance.

       skip   In the case of EMCE and XMMC method, the initial minimization is skipped.

       lambda=<value>
              Initial value for the "lambda" parameter of the Levenberg-Marquardt algorithm.

       multiply=<value>
              Value of the "lambda multiplicator" parameter of the Levenberg-Marquardt algorithm.

       iterations=<max.iterations>
              Number of iterations during the Levenberg-Marquardt algorithm.

       accepted
              Count the accepted transitions in the MCMC and XMMC methods (default).

       nonaccepted
              Count  the  total  (accepted  plus  non-accepted)  transitions in the MCMC and XMMC
              methods.

       gibbs  Use the Gibbs sampler in the MCMC method.

       adaptive
              Use the adaptive XMMC algorithm (i.e. the Fisher covariance  is  re-computed  after
              each accepted transition).

       window=<window size>
              Window  size  for  calculating  the  autocorrelation  lengths for the Markov chains
              (these autocorrelation lengths are reported only in the case of XMMC  method).  The
              default value is 20, which is fine in most cases since the typical
              autocorrelation lengths are between 1 and 2 for nice convergent chains.

       -q, --difference <variablename>=<difference>[,...]
              The analysis method  of  LMND  (Levenberg-Marquardt  minimization  using  numerical
              derivatives,  see  -U,  --lmnd)  requires  the differences that are used during the
              computations of the partial derivatives of the model function(s). With this option,
              one can specify these differences.

       -k, --separate <variablename>[,...]
              In  the  case  of  non-linear  regression  methods (for instance, DHSX or XMMC) the
              fit/regression variables in which the model functions are linear can  be  separated
              from the nonlinear part and therefore make the minimization process more robust and
              reliable. Since the set of variables in which the model  functions  are  linear  is
              ambiguous,  the  user  should  explicitly  specify this supposedly linear subset of
              regression     variables.     (For      instance,      the      model      function
              "a*b*x+a*cos(x)+b*sin(x)+c*x^2"  is  linear  in  both "(a,c)" and "(b,c)" parameter
              vectors but it  is  non-linear  in  "(a,b,c)".)  The  program  checks  whether  the
              specified  subset  of regression variables is a linear subset and reports a warning
              if not. Note that the subset of separated linear variables (defined here)  and  the
              subset  of  the  fit/regression  variables affected by linear constraints (see also
              section "Constraints") must be disjoint.
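
              The ambiguity of the linear subset can be checked numerically. The Python
              sketch below probes the example model function with a midpoint test: if the
              function is linear in the varied parameters, its value at the midpoint of two
              parameter vectors equals the mean of the two end values. A single probe is
              only a spot check, not a proof, and it is not the test `lfit` performs; the
              name is_linear_in and the probe values are hypothetical.

```python
import math


def model(a, b, c, x):
    return a * b * x + a * math.cos(x) + b * math.sin(x) + c * x ** 2


def is_linear_in(names, x=0.7, tol=1e-9):
    """Midpoint probe: vary only the parameters in `names`, keep the rest fixed."""
    p = {"a": 1.3, "b": -0.8, "c": 2.1}
    q = dict(p)
    for i, name in enumerate(names):
        q[name] = p[name] + 1.0 + i  # move only the parameters under test
    mid = {k: 0.5 * (p[k] + q[k]) for k in p}
    lhs = model(x=x, **mid)
    rhs = 0.5 * (model(x=x, **p) + model(x=x, **q))
    return abs(lhs - rhs) < tol


print(is_linear_in(["a", "c"]), is_linear_in(["b", "c"]), is_linear_in(["a", "b", "c"]))
```

              The product term "a*b*x" is what breaks linearity once "a" and "b" are
              varied together.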

       --perturbations <noise level>, --perturbations <key>=<noise level>[,...]
              Additional white noise to be added to each EMCE synthetic data set. Each data
              block (referred here by the appropriate data block keys, see also section "Multiple
              data blocks") may have different white noise levels. If  there  is  only  one  data
              block,  this  command  line argument is followed only by a single number specifying
              the white noise level.

   Additional parameters for Monte-Carlo analysis:
       -s, --seed <random seed>
              Seed for the random number generator. By default this seed is 0, thus  all  of  the
              Monte-Carlo  regression  analyses  (EMCE, MCMC, XMMC and the optional generator for
              the FIMA method) generate reproducible parameter distributions.  A  positive  value
              after  this option yields alternative random seeds while all negative values result
              in an automatic random seed  (derived  from  various  available  sources,  such  as
              /dev/[u]random, system time, hardware MAC address and so on), therefore distributions
              generated involving this kind of automatic random seed are not reproducible.
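
              Why a fixed seed yields reproducible distributions can be seen in a minimal
              random-walk Metropolis sketch in Python. This is a generic textbook sampler
              with Gaussian proposals, not the MCMC or XMMC implementation of `lfit`; the
              toy merit function and all names are hypothetical.

```python
import math
import random


def metropolis(log_prob, start, step, iterations, seed=0):
    """Random-walk Metropolis chain with Gaussian proposals of width `step`."""
    rng = random.Random(seed)  # fixed seed: the chain is reproducible
    x, lp = start, log_prob(start)
    chain = []
    for _ in range(iterations):
        trial = x + rng.gauss(0.0, step)
        lp_trial = log_prob(trial)
        # Accept with probability min(1, exp(lp_trial - lp)).
        if lp_trial >= lp or rng.random() < math.exp(lp_trial - lp):
            x, lp = trial, lp_trial
        chain.append(x)  # every step is recorded, accepted or not
    return chain


def log_prob(a):
    # Toy log-probability, -Chi^2/2 for one parameter: best value 2.0, width 0.1.
    return -0.5 * ((a - 2.0) / 0.1) ** 2


chain = metropolis(log_prob, start=2.0, step=0.1, iterations=5000)
mean = sum(chain) / len(chain)
print(len(chain), mean)
```

              Running the function twice with the same seed returns the identical chain,
              whereas a seed drawn from system entropy would not.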

       -i, --[mcmc,emce,xmmc,fima]-iterations <iterations>
              The actual number of Monte-Carlo iterations  for  the  MCMC,  EMCE,  XMMC  methods.
              Additionally, the FIMA method can generate a mock Gaussian distribution
              of the parameters with the same covariance as derived by the Fisher analysis. The
              number  of  points in this mock distribution is also specified by this command line
              option.

   Clipping outlier data points:
       -r, --sigma, --rejection-level <level>
              Rejection level in the units of standard deviations.

       -n, --iterations <number of iterations>
              Maximum number of iterations in the outlier clipping cycles. The actual  number  of
              outlier  points  can  be traced by increasing the verbosity of the program (see -V,
              --verbose).

       --[no-]weighted-sigma
              During the derivation of the standard deviation, the contribution of the data
              points can be weighted by the respective weights/error bars (see also
              -w, --weight or -e, --error in the section "Fine-tuning of regression analysis
              methods"). If no weights/error bars are associated to the data points (i.e. both
              -w, --weight and -e, --error options are omitted), this option will have no
              practical effect.

       Note that in the actual version of `lfit`, only the CLLS, NLLM and LMND regression methods
       support the above discussed way of outlier clipping.
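
       The interplay of the rejection level and the iteration limit can be sketched in Python
       as follows. For brevity the sketch clips values around their unweighted mean, whereas a
       regression would clip the fit residuals and refit in each cycle; it is not the code used
       by `lfit`, and the name sigma_clip is hypothetical.

```python
def sigma_clip(values, level, max_iterations):
    """Iteratively drop points farther than level*sigma from the mean."""
    kept = list(values)
    for _ in range(max_iterations):
        mean = sum(kept) / len(kept)
        sigma = (sum((v - mean) ** 2 for v in kept) / len(kept)) ** 0.5
        survivors = [v for v in kept if abs(v - mean) <= level * sigma]
        if not survivors or len(survivors) == len(kept):
            break  # nothing (or everything) would be rejected: stop here
        kept = survivors
    return kept


# level corresponds to -r, --sigma and max_iterations to -n, --iterations.
kept = sigma_clip([1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 10.0], level=2.0, max_iterations=5)
print(kept)
```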

   Multiple data blocks:
       -i<key> <input file name>
              Input file name for the data block named as <key>.

       -c<key> <independent>[:<column index>],...
              Column definitions (see also -c, --columns) for  the  given  data  block  named  as
              <key>.

       -f<key> <model function>
              Expression for the model function assigned to the data block named as <key>.

       -y<key> <dependent expression>
              Expression of the dependent variable for the data block named as <key>.

       -e<key> <errors>
              Expression of the uncertainties for the data block named as <key>.

       -w<key> <weights>
              Expression  of the weights for the data block named as <key>. Note that like in the
              case of -e, --errors and -w, --weights, only one of the -e<key>, -w<key>  arguments
              should be specified.

   Constraints:
       -t, --constraint, --constraints <expression>{=<>}<expression>[,...]
              List  of  fit  and  domain  constraints  between the regression variables. Each fit
              constraint expression must be linear in the fit/regression variables.  The  program
              checks  the  linearity  of  the  fit constraints and reports an error if any of the
              constraints are non-linear.  A domain constraint can be  any  expression  involving
              arbitrary binary arithmetic relation (such as strict greater than: '>', strict less
              than: '<', greater or equal to: '>=' and less or equal to: '<='). Constraints can
              be  specified either by a comma-separated list after a single command line argument
              of -t, --constraints or by multiple of these command line arguments.

       -v, --variable <name>:=<value>
              Another form of specifying  constraints.  The  variable  specifications  after  -v,
              --variable  can  also  be used to define constraints by writing ":=" instead of "="
              between the variable name and initial value. Thus, -v <name>:=<value> is equivalent
              to -v <name>=<value> -t <name>=<value>.
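
              The item forms accepted after -v, --variable on this page (a bare name, an
              initial value, an initial value with uncertainty, a grid and a ":="
              constraint) can be told apart as in the following Python sketch. It is an
              illustration of the syntax only, not the parser of `lfit`; the function name
              and the kind labels are hypothetical.

```python
def parse_variable(spec):
    """Classify one item of a -v list; returns (name, kind, numbers)."""
    if ":=" in spec:
        name, value = spec.split(":=", 1)
        return name, "constrained", (float(value),)
    if "=" not in spec:
        return spec, "initial", (0.0,)  # no value given: starts at zero
    name, rhs = spec.split("=", 1)
    if rhs.startswith("[") and rhs.endswith("]"):
        lo, step, hi = (float(t) for t in rhs[1:-1].split(":"))
        return name, "grid", (lo, step, hi)
    if ":" in rhs:
        initial, uncertainty = (float(t) for t in rhs.split(":"))
        return name, "initial+uncertainty", (initial, uncertainty)
    return name, "initial", (float(rhs),)


parsed = [parse_variable(s) for s in "a=1.5,b=2:0.1,c=[0:0.5:3],d:=4,e".split(",")]
for item in parsed:
    print(item)
```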

   User-defined functions:
       -x, --define, --macro <name>(<parameters>)=<definition expression>
              With  this option, the user can define additional functions (also called macros) on
              the top of the built-in functions and operators, dynamically loaded functions and
              previously  defined  macros.  Note  that  each  such  user-defined function must be
              stand-alone,  i.e.  external  variables  (such  as  fit/regression  variables   and
              independent  variables)  cannot  be  part  of  the  definition expression, only the
              parameters of these functions.

   Dynamically loaded extensions and functions:
       -d, --dynamic <library>:<array>[,...]
              Load the dynamically linked library (shared object) named <library> and import  the
              global `lfit`-compatible set of functions defined in the arrays specified after the
              name of the library. The arrays must be declared with the type of
              'lfitfunction',  as  it  is defined in the file "lfit.h". Each record in this array
              contains information about a certain imported function, namely the actual  name  of
              this  function,  flags  specifying  whether  the  function is differentiable and/or
              linear in its  regression  parameters,  the  number  of  regression  variables  and
              independent variables and the actual C subroutine that implements the evaluation of
              the function (and the optional computation of the partial derivatives). The files
              'linear.c' and 'linear.so' provide a simple example that implements the
              "line(a,b,x)=a*x+b" function. This example function has  two  regression  variables
              ("a"  and "b") and one independent variable ("x") and the function itself is linear
              in the regression variables.

   More on outputs:
       -z, --columns-output <column indices>
              Column indices where the results are written in evaluation mode. If this option  is
              omitted, the results of the function evaluation are written sequentially. Otherwise,
              the input file is written to the output  and  the  appropriate  columns  (specified
              here)  are  replaced  by  the  respective results of the function evaluation. Thus,
              although the default column order is sequential, there is a significant  difference
              between  omitting this option and specifying "-z 1,2,...,N". In the first case, the
              output file contains only the results of the function  evaluations,  while  in  the
              latter  case,  the  first  N  columns  of  the  original file are replaced with the
              results.
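
              The difference between the two output layouts can be sketched in Python as
              follows. The sketch assumes that column indices are counted from 1 and works
              on one already split input line; it is not the code used by `lfit`, and the
              function names are hypothetical.

```python
def sequential_output(results):
    """Without -z: only the evaluated results, in order."""
    return [f"{r:g}" for r in results]


def replace_columns(fields, results, column_indices):
    """With -z: copy the input fields and overwrite the listed 1-based columns."""
    out = list(fields)
    for index, r in zip(column_indices, results):
        out[index - 1] = f"{r:g}"
    return out


fields = ["1", "2", "3", "4"]  # one input line with four columns
results = [10.0, 20.0]  # two evaluated functions

print(sequential_output(results))
print(replace_columns(fields, results, [1, 2]))
```

              The first call yields two columns, the second keeps all four columns of the
              input line and overwrites the first two.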

       --errors, --error-line, --error-columns
              Print the uncertainties of the fit/regression variables.

       -F, --format <variable name>=<format>[,...]
              Format of the output in printf-style for each fit/regression variable (see
              printf(3)). The default format is %12.6g (6 significant figures).

       -F, --format <format>[,...]
              Format of the output in evaluation mode. The default format is %12.6g (6 significant
              figures).

       -C, --correlation-format <format>
              Format of the correlation matrix elements. The default format is %6.3f (3
              decimal places).
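
              The effect of the two default formats can be tried in Python, whose "%"
              operator follows the printf(3) conventions for these conversions. The sample
              values are hypothetical.

```python
value = 3.14159265358979

fit_field = "%12.6g" % value  # 6 significant figures, right-aligned in 12 characters
corr_field = "%6.3f" % 0.5  # 3 digits after the decimal point, 6 characters wide

print(repr(fit_field))
print(repr(corr_field))
```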

       -g, --derived-variable[s] <variable name>=<expression>[,...]
              Some of the regression and analysis methods can compute the
              uncertainties and correlations for derived regression variables.  These  additional
              (and  therefore  not  independent)  variables can be defined with this command line
              option. In the  definition  expression  one  should  use  only  the  fit/regression
              variables  (as  defined  by  the -v, --variables command line argument). The output
              format of these variables can also be specified by the -F,  --format  command  line
              argument.

       -u, --output-fitted <filename>
              Name of an output file into which those lines of the input are written that were
              involved in the final regression. This option is useful  in  the  case  of  outlier
              clipping  in order to see what was the actual subset of input data that was used in
              the fit (see also the -n, --iterations and -r, --sigma options).

       -j, --output-rejected <filename>
              Name of an output file into which those lines of the input are written that were
              rejected  from  the  final regression. This option is useful in the case of outlier
              clipping in order to see what was  the  actual  subset  of  input  data  where  the
              dependent  variable  represented  outlier points (see also the -n, --iterations and
              -r, --sigma options).

       -a, --output-all <filename>
              File containing the lines of the input file that  were  involved  in  the  complete
              regression  analysis. This file is simply the original file, only the commented and
              empty lines are omitted.

       -p, --output-expression <filename>
              In this file the model function is written in which  the  fit/regression  variables
              are replaced by their best-fit values.

       -l, --output-variables <filename>
              List  of the names and values of the fit/regression variables in the same format as
              used after the -v, --variables command line argument. The content of this file  can
              therefore be passed to subsequent invocations of `lfit`.

       --delta
              Write the individual differences between the dependent variables and the
              evaluated best fit model  function  values  for  each  line  in  the  output  files
              specified  by  the  -u, --output-fitted, -j, --output-rejected and -a, --output-all
              command line options.

       --delta-comment
              Same as --delta, but the differences are written as a comment (i.e. separated by  a
              '##' from the original input lines).

       --residual
              Write  the  final  fit  residual to the output file (after the list of the best-fit
              values for the fit/regression variables).

REPORTING BUGS

       Report bugs to <apal@szofi.net>, see also https://fitsh.net/.

       Copyright © 1996, 2002, 2004-2008, 2009-2020; Pal, Andras <apal@szofi.net>