Ubuntu Manpage: gmtregress - Linear regression of 1-D data sets

name
synopsis
description
required arguments
optional arguments
ascii format precision
examples
references
see also
copyright

Provided by: gmt-common_5.4.3+dfsg-1_all

NAME

       gmtregress - Linear regression of 1-D data sets

SYNOPSIS

       gmtregress  [ table ] [  -Amin/max/inc ] [  -Clevel ] [  -Ex|y|o|r ] [  -Fflags ] [  -N1|2|r|w ] [  -S[r]
       ] [  -Tmin/max/inc |  -Tn ] [  -W[w][x][y][r] ] [  -V[level] ] [ -aflags ] [ -bbinary ] [  -dnodata  ]  [
       -eregexp ] [ -ggaps ] [ -hheaders ] [ -iflags ] [ -oflags ]

       Note: No space is allowed between the option flag and the associated arguments.

DESCRIPTION

       gmtregress reads one or more data tables [or stdin] and determines the best linear regression model y = a
       + b* x for each segment using the  chosen  parameters.   The  user  may  specify  which  data  and  model
       components  should  be  reported.   By  default,  the  model  will  be evaluated at the input points, but
       alternatively you can specify an equidistant range  over  which  to  evaluate  the  model,  or  turn  off
       evaluation  completely.   Instead  of  determining  the  best  fit  we can perform a scan of all possible
       regression lines (for a range of slope angles) and examine how the  chosen  misfit  measure  varies  with
       slope.   This  is particularly useful when analyzing data with many outliers.  Note: If you actually need
       to work with log10 of x or y you can accomplish that transformation during read by using the -i option.

REQUIRED ARGUMENTS

       None

OPTIONAL ARGUMENTS

       table  One or more ASCII (or binary, see -bi[ncols][type]) data table file(s) holding a  number  of  data
              columns.  If  no  tables  are  given  then  we read from standard input. The first two columns are
              expected to contain the required x and y data.  Depending on your -W and -E settings we may expect
              an  additional  1-3  columns with error estimates of one of both of the data coordinates, and even
              their correlation.

       -Amin/max/inc
              Instead of determining a best-fit regression we explore the full range  of  regressions.   Examine
              all  possible  regression  lines with slope angles between min and max, using steps of inc degrees
              [-90/+90/1].  For each slope the optimum intercept is determined based  on  your  regression  type
              (-E)  and misfit norm (-N) settings.  For each segment we report the four columns angle, E, slope,
              intercept, for the range of specified angles. The best model  parameters  within  this  range  are
              written into the segment header and reported in verbose mode (-V).

       -Clevel
              Set  the  confidence  level  (in %) to use for the optional calculation of confidence bands on the
              regression [95].  This is only used if -F includes the output column c.

       -Ex|y|o|r
              Type of linear regression, i.e., select the type of misfit we should  calculate.   Choose  from  x
              (regress  x on y; i.e., the misfit is measured horizontally from data point to regression line), y
              (regress y on x; i.e., the misfit is measured vertically  [Default]),  o  (orthogonal  regression;
              i.e.,  the  misfit  is  measured  from data point orthogonally to nearest point on the line), or r
              (Reduced Major Axis regression; i.e., the misfit is the product of both  vertical  and  horizontal
              misfits) [y].

       -Fflags
              Append  a  combination  of  the  columns  you wish returned; the output order will match the order
              specified.  Choose from x (observed x), y (observed y), m (model prediction), r (residual  =  data
              minus  model),  c  (symmetrical  confidence  interval on the regression; see -C for specifying the
              level), z (standardized residuals or so-called z-scores) and w (outlier weights 0 or  1;  for  -Nw
              these  are  the  Reweighted Least Squares weights) [xymrczw].  As an alternative to evaluating the
              model, just give -Fp and we instead write a single record with the model parameters npoints  xmean
              ymean angle misfit slope intercept sigma_slope sigma_intercept.

       -N1|2|r|w
              Selects  the norm to use for the misfit calculation.  Choose among 1 (L-1 measure; the mean of the
              absolute residuals), 2 (Least-squares; the mean of the  squared  residuals),  r  (LMS;  The  least
              median  of  the  squared  residuals), or w (RLS; Reweighted Least Squares: the mean of the squared
              residuals after outliers identified via LMS  have  been  removed)  [Default  is  2].   Traditional
              regression  uses  L-2 while L-1 and in particular LMS are more robust in how they handle outliers.
              As alluded to, RLS implies an initial LMS regression which is then used to  identify  outliers  in
              the data, assign these a zero weight, and then redo the regression using a L-2 norm.

       -S[r]  Restricts  which records will be output.  By default all data records will be output in the format
              specified by -F.  Use -S to  exclude  data  points  identified  as  outliers  by  the  regression.
              Alternatively, use -Sr to reverse this and only output the outlier records.

       -Tmin/max/inc | -Tn
              Evaluate the best-fit regression model at the equidistant points implied by the arguments.  If -Tn
              is given instead we will reset min and max to the extreme x-values for each segment and  determine
              inc  so  that  there  are  exactly n output values for each segment.  To skip the model evaluation
              entirely, simply provide -T0.

       -W[w][x][y][r]
              Specifies weighted regression and which weights will be provided.   Append  x  if  giving  1-sigma
              uncertainties  in  the  x-observations,  y  if  giving 1-sigma uncertainties in y, and r if giving
              correlations between x and y observations, in the order these columns appear in the  input  (after
              the  two  required  and  leading x, y columns).  Giving both x and y (and optionally r) implies an
              orthogonal  regression,  otherwise  giving  x  requires  -Ex  and  y  requires  -Ey.   We  convert
              uncertainties  in x and y to regression weights via the relationship weight = 1/sigma.  Use -Ww if
              the we should interpret the input columns to have precomputed weights  instead.   Note:  residuals
              with  respect  to  the  regression line will be scaled by the given weights.  Most norms will then
              square this weighted residual (-N1 is the only exception).

       -V[level] (more …)
              Select verbosity level [c].

       -acol=name[…] (more …)
              Set aspatial column associations col=name.

       -bi[ncols][t] (more …)
              Select native binary input.

       -bo[ncols][type] (more …)
              Select native binary output. [Default is same as input].

       -d[i|o]nodata (more …)
              Replace input columns that equal nodata with NaN and do the reverse on output.

       -e[~]”pattern” | -e[~]/regexp/[i] (more …)
              Only accept data records that match the given pattern.

       -g[a]x|y|d|X|Y|D|[col]z[+|-]gap[u] (more …)
              Determine data gaps and line breaks.

       -h[i|o][n][+c][+d][+rremark][+rtitle] (more …)
              Skip or produce header record(s).

       -icols[+l][+sscale][+ooffset][,…] (more …)
              Select input columns and transformations (0 is first column).

       -ocols[,…] (more …)
              Select output columns (0 is first column).

       -^ or just -
              Print a short message about the syntax of the command, then exits (NOTE: on Windows just use -).

       -+ or just +
              Print an extensive usage (help) message, including the explanation of any  module-specific  option
              (but not the GMT common options), then exits.

       -? or no arguments
              Print a complete usage (help) message, including the explanation of all options, then exits.

ASCII FORMAT PRECISION

       The  ASCII output formats of numerical data are controlled by parameters in your gmt.conf file. Longitude
       and latitude  are  formatted  according  to  FORMAT_GEO_OUT,  absolute  time  is  under  the  control  of
       FORMAT_DATE_OUT  and  FORMAT_CLOCK_OUT,  whereas general floating point values are formatted according to
       FORMAT_FLOAT_OUT. Be aware that the format in effect can lead to loss of precision in ASCII output, which
       can  lead  to  various  problems downstream. If you find the output is not written with enough precision,
       consider  switching  to  binary  output  (-bo  if  available)  or  specify  more   decimals   using   the
       FORMAT_FLOAT_OUT setting.

EXAMPLES

       To  do  a  standard  least-squares  regression  on  the x-y data in points.txt and return x, y, and model
       prediction with 99% confidence intervals, try

              gmt regress points.txt -Fxymc -C99 > points_regressed.txt

       To just get the slope for the above regression, try

              slope=`gmt regress points.txt -Fp -o5`

       To do a reweighted least-squares regression on the data rough.txt and return x, y, model  prediction  and
       the RLS weights, try

              gmt regress rough.txt -Fxymw > points_regressed.txt

       To do an orthogonal least-squares regression on the data crazy.txt but first take the logarithm of both x
       and y, then return x, y, model prediction and the normalized residuals (z-scores), try

              gmt regress crazy.txt -Eo -Fxymz -i0-1l > points_regressed.txt

       To examine how the orthogonal LMS misfits vary with angle between 0 and 90 in steps of  0.2  degrees  for
       the same file, try

              gmt regress points.txt -A0/90/0.2 -Eo -Nr > points_analysis.txt

REFERENCES

       Draper,  N.  R.,  and H. Smith, 1998, Applied regression analysis, 3rd ed., 736 pp., John Wiley and Sons,
       New York.

       Rousseeuw, P. J., and A. M. Leroy, 1987, Robust regression and outlier detection, 329 pp., John Wiley and
       Sons, New York.

       York, D., N. M. Evensen, M. L. Martinez, and J. De Basebe Delgado, 2004, Unified equations for the slope,
       intercept, and standard errors of the best straight line, Am. J. Phys., 72(3), 367-375.

COPYRIGHT

       2018, P. Wessel, W. H. F. Smith, R. Scharroo, J. Luis, and F. Wobbe

NAME

SYNOPSIS

DESCRIPTION

REQUIRED ARGUMENTS

OPTIONAL ARGUMENTS

ASCII FORMAT PRECISION

EXAMPLES

REFERENCES

SEE ALSO

COPYRIGHT