
NAME
lfit - general purpose evaluation and regression analysis tool
SYNOPSIS
lfit [method of analysis] [options] <input> [-o, --output <output>]
DESCRIPTION
The program `lfit` is a standalone command line tool designed for both interactive and batch
data analysis and regression. In principle, the program may run in two modes. First, `lfit`
supports numerous regression analysis methods that can be used to search for the "best fit" parameters
of model functions in order to model the input data (which are read from one or more input files in
tabulated form). Second, `lfit` is capable of reading input data and performing various arithmetic
operations as specified by the user. This second mode is basically used to evaluate the model functions
with the parameters presumably derived by the actual regression methods (and in order to complete this
evaluation, only slight changes are needed in the command line invocation arguments).
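For instance, assuming an input file data.dat whose first two columns contain the values of x and y (all file names and numerical values here are illustrative), a straight line can be fitted and then evaluated as follows:

     lfit -c x,y -v a,b -f "a*x+b" -y y -o coeff.dat data.dat
     lfit -c x,y -f "0.75*x+1.20" -o model.dat data.dat

The first invocation runs in regression mode (note the presence of -y, --dependent) and writes the best fit values of a and b to coeff.dat, while the second one evaluates the model function with fixed parameter values since -y, --dependent is omitted.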
OPTIONS
General options:
-h, --help
Gives general summary about the command line options.
--long-help, --help-long
Gives a detailed list of command line options.
--wiki-help, --help-wiki, --mediawiki-help, --help-mediawiki
Gives a detailed list of command line options in Mediawiki format.
--version, --version-short, --short-version
Gives some version information about the program.
--functions, --list-functions, --function-list
Lists the available arithmetic operations and built-in functions supported by the program.
--examples
Prints some very basic examples for the program invocation.
Common options for regression analysis:
-v, --variable, --variables <list-of-variables>
Comma-separated list of regression variables. In the case of non-linear regression analysis, each
of these fit variables is expected to have an initial value (specified as <name>=<value>);
otherwise, the initial values are set to zero. Note that in the case of some of the
regression/analysis methods, additional parameters should be assigned to these fit/regression
variables. See the section "Regression analysis methods" for additional details.
-c, --column, --columns <independent>[:<column index>],...
Comma-separated list of independent variable names as read from the subsequent columns of the
primary input data file. If the independent variables are not in sequential order in the input
file, the optional column indices should be defined for each variable, by separating the column
index with a colon after the name of the variable. In the case of multiple input files and data
blocks, the user should assign the individual independent variables and the respective column
names and definitions for each file (see later, Sec. "Multiple data blocks").
-f, --function <model function>
Model function of the analysis in a symbolic form. This expression for the model function should
contain built-in arithmetic operators, built-in functions, user-defined macros (see -x, --define)
or functions provided by the dynamically loaded external modules (see -d, --dynamic). The model
function can depend on both the fit/regression variables (see -v, --variables) and the independent
variables read from the input file (see -c, --columns). In the case of multiple input files and
data blocks, the user should assign the respective model functions for each data block (see
later). Note that some of the analysis methods expect the model function to be either
differentiable or linear in the fit/regression variables. See the section "Regression analysis
methods" later on for more details.
-y, --dependent <dependent expression>
The dependent variable of the regression analysis, in the form of an arithmetic expression. This
expression for the dependent variable can depend only on the variables read from the input file
(see -c, --columns). In the case of multiple input files and data blocks, the user should assign
the respective dependent expressions for each data block (see later).
-o, --output <output file>
Name of the output file into which the fit results (the values for the fit/regression variables)
are written.
Common options for function evaluation:
-f, --function <function to evaluate>[...]
List of functions to be evaluated. Multiple expressions can be specified, either by separating the
subsequent expressions with commas or by giving multiple -f, --function options on the command
line.
Note that the two basic modes of `lfit` are distinguished only by the presence or absence of the -y,
--dependent command line argument. In other words, there is no explicit command line argument that
specifies the mode of `lfit`. If the -y, --dependent command line argument is omitted, `lfit` runs in
function evaluation mode, otherwise the program runs in regression analysis mode.
-o, --output <output file>
Name of the output file in which the results of the function evaluation are written.
Regression analysis methods:
-L, --clls, --linear
The default mode of `lfit`, the classical linear least squares (CLLS) method. The model functions
specified after -f, --function are expected to be both differentiable and linear with respect to
the fit/regression variables. Otherwise, `lfit` detects the non-differentiable or non-linear
property of the model function(s) and refuses the analysis. In this case, other types of
regression analysis methods can be applied, depending on one's needs, for instance the
Levenberg-Marquardt algorithm (NLLM, see -N, --nllm) or the downhill simplex minimization (DHSX,
see -D, --dhsx).
-N, --nllm, --nonlinear
This option implies a regression involving the nonlinear Levenberg-Marquardt (NLLM) minimization
algorithm. The model function(s) specified after -f, --function are expected to be differentiable
with respect to the fit/regression variables. Otherwise, `lfit` detects the non-differentiable
property and refuses the analysis. There are some fine-tuning parameters of the
Levenberg-Marquardt algorithm; see the section "Fine-tuning of regression analysis methods" for
more details on how these additional regression parameters can be set. Note that all of the
fit/regression variables should have a proper initial value, defined via the command line argument
-v, --variable (see also there).
-U, --lmnd
Levenberg-Marquardt minimization with numerical partial derivatives (LMND). Same as the NLLM
method, except that the partial derivatives of the model function(s) are calculated numerically.
Therefore, the model function(s) may contain functions whose partial derivatives
are not known in an analytic form. The differences used in the computations of the partial
derivatives should be declared by the user, see also the command line option -q, --differences.
-D, --dhsx, --downhill
This option implies a regression involving the nonlinear downhill simplex (DHSX) minimization
algorithm. The user should specify the proper initial values and their uncertainties as
<name>=<initial>:<uncertainty>, unless the "fisher" option is passed to the -P, --parameters
command line argument (see later in the section "Fine-tuning of regression analysis methods"). In
the first case, the initial size of the simplex is based on the uncertainties provided by the user
while in the second case, the initial simplex is derived from the eigenvalues and eigenvectors of
the Fisher covariance matrix. Note that the model functions must be differentiable in the latter
case.
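For instance, a downhill simplex fit of an illustrative nonlinear model, with the initial simplex size set from user-supplied uncertainties (all values and file names are illustrative), may be invoked as:

     lfit -D -v a=1.0:0.1,b=0.5:0.05 -c x,y -f "a*sin(b*x)" -y y -o coeff.dat data.dat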
-M, --mcmc
This option implies the method of Markov Chain Monte-Carlo (MCMC). The model function(s) can be
arbitrary from the point of view of differentiability. However, each of the fit/regression
variables must have an initial assumption for its uncertainty, which must be specified via the
command line argument -v, --variable. The user should specify the proper initial values and
uncertainties as <name>=<initial>:<uncertainty>. In the actual implementation of `lfit`, each variable has
an uncorrelated Gaussian a priori distribution with the specified uncertainty. The MCMC algorithm
has some fine-tune parameters, see the section "Fine-tuning of regression analysis methods" for
more details.
-K, --mchi, --chi2
With this option one can perform a "brute force" Chi^2 minimization by evaluating the value of the
merit function of Chi^2 on a grid of the fit/regression variables. In this case the grid size and
resolution must be specified in a specific form after the -v, --variable command line argument.
Namely, each of the fit/regression variables intended to be varied on a grid must have the format of
<name>=[<min>:<step>:<max>] while the other ones specified as <name>=<value> are kept fixed. The
output of this analysis will be a series of lines with N+1 columns, where the values of
fit/regression variables are followed by the value of the merit function. Note that all of the
declared fit/regression variables are written to the output, including the ones which are fixed
(therefore the output is somewhat redundant).
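For instance, to scan the variable a on a grid while keeping b fixed (all values are illustrative; the quotes protect the brackets from the shell):

     lfit -K -v "a=[0.0:0.05:2.0],b=1.25" -c x,y -f "a*sin(b*x)" -y y -o grid.dat data.dat

Here each output line contains the values of a and b followed by the value of the merit function.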
-E, --emce
This option implies the method of "refitting to synthetic data sets", or "error Monte-Carlo
estimation" (EMCE). This method must have a primarily assigned minimization algorithm (that can be
any of the CLLS, NLLM or DHSX methods). First, the program searches the best fit values for the
fit/regression variables involving the assigned primary minimization algorithm and reports these
best fit variables. Then, additional synthetic data sets are generated around this set of best fit
variables and the minimization is repeated involving the same primary method. The synthetic data
sets are generated independently for each input data block, taking into account the fit residuals.
The noise added to the best fit data is generated from the power spectrum of the residuals.
-X, --xmmc
This option implies an improved/extended version of the Markov Chain Monte-Carlo analysis (XMMC).
The major differences between the classic MCMC and XMMC methods are the following. 1/ The
transition distribution is derived from the Fisher covariance matrix. 2/ The program performs an
initial minimization of the merit function involving the method of downhill simplex. 3/ Various
sanity checks are performed in order to verify the convergence of the Markov chains (including the
comparison of the actual and theoretical transition probabilities, the computation of the
autocorrelation lengths of each fit/regression variable series and the comparison of the
statistical and Fisher covariance).
-A, --fima
Fisher information matrix analysis (FIMA). With this analysis method one can estimate the
uncertainties and correlations of the fit/regression variables involving the method of Fisher
matrix analysis. This method does not minimize the merit functions by adjusting the fit/regression
variables, instead, the initial values (specified after the -v, --variables option) are expected
to be the "best fit" ones.
Fine-tuning of regression analysis methods:
-e, --error <error expression>
Expression for the uncertainties. Note that zero or negative uncertainty is equivalent to zero
weight, i.e. input lines with zero or negative errors are discarded from the fit.
-w, --weight <weight expression>
Expression for the weights. The weight is simply the reciprocal of the uncertainty. The default
error/uncertainty (and therefore the weight) is unity. Note that most of the analysis/regression
methods are rather sensitive to the uncertainties since the merit function also depends on these.
-P, --parameters <regression parameters>
This option is followed by a set of optional fine-tune parameters, that is different for each
primary regression analysis method:
default, defaults
Use the default fine-tune parameters for the given regression method.
clls, linear
Use the classic linear least squares method as the primary minimization algorithm of the EMCE
method. Like in the case of the CLLS regression analysis (see -L, --clls), the model function(s)
must be both differentiable and linear with respect to the fit/regression variables.
nllm, nonlinear
Use the non-linear Levenberg-Marquardt minimization algorithm as the primary minimization
algorithm of the EMCE method. Like in the case of the NLLM regression analysis (see -N, --nllm),
the model function(s) must be differentiable with respect to the fit/regression variables.
lmnd Use the non-linear Levenberg-Marquardt minimization algorithm as the primary minimization
algorithm of the EMCE method. Like in the case of -U, --lmnd regression method, the parametric
derivatives of the model function(s) are calculated by a numerical approximation (see also -U,
--lmnd and -q, --differences for additional details).
dhsx, downhill
Use the downhill simplex (DHSX) minimization as the primary minimization algorithm of the EMCE
method. Unless the additional 'fisher' option is also specified, the user should, like in the
default case of the DHSX regression method, specify the uncertainties of the fit/regression
variables, which are used as the initial size of the simplex.
mc, montecarlo
Use a primitive Monte-Carlo diffusion minimization technique as the primary minimization algorithm
of the EMCE method. The user should specify the uncertainties of the fit/regression variables
which are then used to generate the Monte-Carlo transitions. This primary minimization technique
is rather nasty (very slow), so its usage is not recommended.
fisher In the case of the DHSX regression method or in the case of the EMCE method when the primary
minimization is the downhill simplex algorithm, the initial size of the simplex is derived from
the Fisher covariance approximation evaluated at the point represented by the initial values of
the fit/regression variables. Since the derivation of the Fisher covariance requires the knowledge
of the partial derivatives of the model function(s) with respect to the fit/regression variables,
the(se) model function(s) must be differentiable. On the other hand, the user does not have to
specify the initial uncertainties after the -v, --variables option, since these uncertainties are
derived automatically from the Fisher covariance.
skip In the case of the EMCE and XMMC methods, the initial minimization is skipped.
lambda=<value>
Initial value for the "lambda" parameter of the Levenberg-Marquardt algorithm.
multiply=<value>
Value of the "lambda multiplicator" parameter of the Levenberg-Marquardt algorithm.
iterations=<max.iterations>
Number of iterations during the Levenberg-Marquardt algorithm.
accepted
Count the accepted transitions in the MCMC and XMMC methods (default).
nonaccepted
Count the total (accepted plus non-accepted) transitions in the MCMC and XMMC methods.
gibbs Use the Gibbs sampler in the MCMC method.
adaptive
Use the adaptive XMMC algorithm (i.e. the Fisher covariance is re-computed after each accepted
transition).
window=<window size>
Window size for calculating the autocorrelation lengths for the Markov chains (these
autocorrelation lengths are reported only in the case of XMMC method). The default value is 20,
which is fine in most cases since the typical autocorrelation lengths are between 1 and 2 for
nice convergent chains.
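As an illustrative example, an EMCE analysis that uses the downhill simplex as its primary minimization algorithm, with the initial simplex derived from the Fisher covariance (assuming the fine-tuning parameters are given as a comma-separated list; all values and file names are illustrative), may be invoked as:

     lfit -E -P dhsx,fisher -v a=1.0,b=0.5 -c x,y -f "a*sin(b*x)" -y y --emce-iterations 200 -o coeff.dat data.dat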
-q, --difference <variablename>=<difference>[,...]
The analysis method of LMND (Levenberg-Marquardt minimization using numerical derivatives, see -U,
--lmnd) requires the differences that are used during the computations of the partial derivatives
of the model function(s). With this option, one can specify these differences.
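For instance, assuming myfunc() is provided by a dynamically loaded module and has no analytically known partial derivatives (the function, module and array names here are hypothetical), the differences could be set as:

     lfit -U -v a=1.0,b=0.0 -q a=1e-4,b=1e-4 -d ./mymodule.so:myfuncs -c x,y -f "myfunc(a,b,x)" -y y data.dat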
-k, --separate <variablename>[,...]
In the case of non-linear regression methods (for instance, DHSX or XMMC), the fit/regression
variables in which the model functions are linear can be separated from the nonlinear part, which
makes the minimization process more robust and reliable. Since the set of variables in
which the model functions are linear is ambiguous, the user should explicitly specify this
supposedly linear subset of regression variables. (For instance, the model function
"a*b*x+a*cos(x)+b*sin(x)+c*x^2" is linear in both "(a,c)" and "(b,c)" parameter vectors but it is
non-linear in "(a,b,c)".) The program checks whether the specified subset of regression variables
is a linear subset and reports a warning if not. Note that the subset of separated linear
variables (defined here) and the subset of the fit/regression variables affected by linear
constraints (see also section "Constraints") must be disjoint.
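For instance, using the model function quoted above, the supposedly linear subset (a,c) can be separated as follows (initial values, uncertainties and file names are illustrative):

     lfit -D -v a=1.0:0.1,b=1.0:0.1,c=0.0:0.1 -k a,c -c x,y -f "a*b*x+a*cos(x)+b*sin(x)+c*x^2" -y y data.dat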
--perturbations <noise level>, --perturbations <key>=<noise level>[,...]
Additional white noise to be added to each EMCE synthetic data set. Each data block (referred to
here by the appropriate data block keys, see also the section "Multiple data blocks") may have
different white noise levels. If there is only one data block, this command line argument is
followed only by a single number specifying the white noise level.
Additional parameters for Monte-Carlo analysis:
-s, --seed <random seed>
Seed for the random number generator. By default this seed is 0, thus all of the Monte-Carlo
regression analyses (EMCE, MCMC, XMMC and the optional generator for the FIMA method) generate
reproducible parameter distributions. A positive value after this option yields alternative random
seeds, while all negative values result in an automatic random seed (derived from various available
sources, such as /dev/[u]random, the system time or the hardware MAC address), therefore
distributions generated with this kind of automatic random seed are not reproducible.
-i, --[mcmc,emce,xmmc,fima]-iterations <iterations>
The actual number of Monte-Carlo iterations for the MCMC, EMCE and XMMC methods. Additionally, the
FIMA method is capable of generating a mock Gaussian distribution of the parameters with the same
covariance as derived from the Fisher analysis. The number of points in this mock distribution is
also specified by this command line option.
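For instance, an MCMC analysis with 10000 iterations and a non-reproducible automatic random seed (all values and file names are illustrative) may be invoked as:

     lfit -M -v a=1.0:0.1,b=0.5:0.05 -c x,y -f "a*sin(b*x)" -y y -s -1 --mcmc-iterations 10000 -o chain.dat data.dat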
Clipping outlier data points:
-r, --sigma, --rejection-level <level>
Rejection level in the units of standard deviations.
-n, --iterations <number of iterations>
Maximum number of iterations in the outlier clipping cycles. The actual number of outlier points
can be traced by increasing the verbosity of the program (see -V, --verbose).
--[no-]weighted-sigma
During the derivation of the standard deviation, the contribution of the data points
can be weighted by the respective weights/error bars (see also -w, --weight or -e, --error in the
section "Fine-tuning of regression analysis methods"). If no weights/error bars are associated to
the data points (i.e. both -w, --weight or -e, --error options are omitted), this option will have
no practical effect.
Note that in the current version of `lfit`, only the CLLS, NLLM and LMND regression methods support the
above-discussed way of outlier clipping.
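For instance, a linear fit with 3-sigma clipping and at most 5 clipping iterations, saving the retained and rejected input lines separately (file names are illustrative; see also the section "More on outputs"), may be invoked as:

     lfit -L -v a,b -c x,y -f "a*x+b" -y y -r 3 -n 5 -u fitted.dat -j rejected.dat -o coeff.dat data.dat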
Multiple data blocks:
-i<key> <input file name>
Input file name for the data block named as <key>.
-c<key> <independent>[:<column index>],...
Column definitions (see also -c, --columns) for the given data block named as <key>.
-f<key> <model function>
Expression for the model function assigned to the data block named as <key>.
-y<key> <dependent expression>
Expression of the dependent variable for the data block named as <key>.
-e<key> <errors>
Expression of the uncertainties for the data block named as <key>.
-w<key> <weights>
Expression of the weights for the data block named as <key>. Note that, like in the case of -e,
--error and -w, --weight, only one of the -e<key>, -w<key> arguments should be specified.
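For instance, to fit a common slope a with block-dependent offsets b1 and b2 (the keys 1 and 2 as well as the file and variable names are illustrative):

     lfit -v a,b1,b2 -i1 first.dat -c1 x,y -f1 "a*x+b1" -y1 y -i2 second.dat -c2 x,y -f2 "a*x+b2" -y2 y -o coeff.dat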
Constraints:
-t, --constraint, --constraints <expression>{=<>}<expression>[,...]
List of fit and domain constraints between the regression variables. Each fit constraint
expression must be linear in the fit/regression variables. The program checks the linearity of the
fit constraints and reports an error if any of the constraints are non-linear. A domain
constraint can be any expression involving an arbitrary binary arithmetic relation (such as
strictly greater than: '>', strictly less than: '<', greater than or equal to: '>=' and less than
or equal to: '<=').
Constraints can be specified either by a comma-separated list after a single command line argument
of -t, --constraints or by multiple of these command line arguments.
-v, --variable <name>:=<value>
Another form of specifying constraints. The variable specifications after -v, --variable can also
be used to define constraints by writing ":=" instead of "=" between the variable name and initial
value. Thus, -v <name>:=<value> is equivalent to -v <name>=<value> -t <name>=<value>.
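For instance, to impose the linear fit constraint a+b=1 on an otherwise unconstrained fit (the model and file names are illustrative):

     lfit -L -v a,b -t "a+b=1" -c x,y -f "a*sin(x)+b*cos(x)" -y y -o coeff.dat data.dat

Domain constraints are formed analogously with the relations '>', '<', '>=' or '<='.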
User-defined functions:
-x, --define, --macro <name>(<parameters>)=<definition expression>
With this option, the user can define additional functions (also called macros) on top of the
built-in functions and operators, dynamically loaded functions and previously defined macros.
Note that each such user-defined function must be stand-alone, i.e. external variables (such as
fit/regression variables and independent variables) cannot be part of the definition expression,
only the parameters of these functions.
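For instance, a stand-alone macro sq() (the name is illustrative; note that it depends only on its own parameter) can be defined and used in the model function as:

     lfit -x "sq(t)=t*t" -v a,b -c x,y -f "a*sq(x)+b" -y y data.dat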
Dynamically loaded extensions and functions:
-d, --dynamic <library>:<array>[,...]
Load the dynamically linked library (shared object) named <library> and import the global
`lfit`-compatible set of functions defined in the arrays specified after the name of the library.
The arrays must be declared with the type 'lfitfunction', as it is defined in the file
"lfit.h". Each record in this array contains information about a certain imported function, namely
the actual name of this function, flags specifying whether the function is differentiable and/or
linear in its regression parameters, the number of regression variables and independent variables
and the actual C subroutine that implements the evaluation of the function (and the optional
computation of the partial derivatives). The modules 'linear.c' and 'linear.so' provide a simple
example that implements the "line(a,b,x)=a*x+b" function. This example function has two regression
variables ("a" and "b") and one independent variable ("x") and the function itself is linear in
the regression variables.
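For instance, assuming the example module linear.so exports its 'lfitfunction' array under the name 'linear' (this array name is an assumption, as are the file names), the line() function can be used as:

     lfit -d ./linear.so:linear -v a,b -c x,y -f "line(a,b,x)" -y y -o coeff.dat data.dat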
More on outputs:
-z, --columns-output <column indices>
Column indices where the results are written in evaluation mode. If this option is omitted, the
results of the function evaluation are written sequentially. Otherwise, the input file is written
to the output and the appropriate columns (specified here) are replaced by the respective results
of the function evaluation. Thus, although the default column order is sequential, there is a
significant difference between omitting this option and specifying "-z 1,2,...,N". In the first
case, the output file contains only the results of the function evaluations, while in the latter
case, the first N columns of the original file are replaced with the results.
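For instance, assuming a two-column input file, the following invocation copies the input to the output while replacing the second column by the evaluated expression (file names are illustrative):

     lfit -c x:1,y:2 -f "y-x" -z 2 -o out.dat data.dat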
--errors, --error-line, --error-columns
Print the uncertainties of the fit/regression variables.
-F, --format <variable name>=<format>[,...]
Format of the output in printf-style for each fit/regression variable (see printf(3)). The default
format is %12.6g (6 significant figures).
-F, --format <format>[,...]
Format of the output in evaluation mode. The default format is %12.6g (6 significant figures).
-C, --correlation-format <format>
Format of the correlation matrix elements. The default format is %6.3f (3 digits after the decimal point).
-g, --derived-variable[s] <variable name>=<expression>[,...]
Some of the regression and analysis methods are capable of computing the uncertainties and
correlations for derived regression variables. These additional (and therefore not independent)
variables can be defined with this command line option. In the definition expression one should
use only the fit/regression variables (as defined by the -v, --variables command line argument).
The output format of these variables can also be specified by the -F, --format command line
argument.
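For instance, assuming a method that supports derived variables (such as FIMA), the uncertainty of the derived quantity a+b can be estimated as (all values and file names are illustrative):

     lfit -A -v a=1.0,b=0.5 -c x,y -f "a*sin(b*x)" -y y -g "s=a+b" -o fima.dat data.dat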
-u, --output-fitted <filename>
Name of an output file into which those lines of the input are written that were involved in the
final regression. This option is useful in the case of outlier clipping in order to see what was
the actual subset of input data that was used in the fit (see also the -n, --iterations and -r,
--sigma options).
-j, --output-rejected <filename>
Name of an output file into which those lines of the input are written that were rejected from the
final regression. This option is useful in the case of outlier clipping in order to see what was
the actual subset of input data where the dependent variable represented outlier points (see also
the -n, --iterations and -r, --sigma options).
-a, --output-all <filename>
File containing the lines of the input file that were involved in the complete regression
analysis. This file is simply the original file, only the commented and empty lines are omitted.
-p, --output-expression <filename>
The model function is written to this file with the fit/regression variables replaced by their
best-fit values.
-l, --output-variables <filename>
List of the names and values of the fit/regression variables in the same format as used after the
-v, --variables command line argument. The content of this file can therefore be passed to
subsequent invocations of `lfit`.
--delta
Write the individual differences between the dependent variables and the evaluated best fit
model function values for each line in the output files specified by the -u, --output-fitted, -j,
--output-rejected and -a, --output-all command line options.
--delta-comment
Same as --delta, but the differences are written as a comment (i.e. separated by a '##' from the
original input lines).
--residual
Write the final fit residual to the output file (after the list of the best-fit values for the
fit/regression variables).
REPORTING BUGS
Report bugs to <apal@szofi.net>, see also http://fitsh.net/.
COPYRIGHT
Copyright © 1996, 2002, 2004-2008, 2009; Pal, Andras <apal@szofi.net>
lfit 7.9 (2009.05.25) September 2016 LFIT(1)