Provided by: gmt-common_5.4.3+dfsg-1_all

NAME

       trend1d - Fit a [weighted] [robust] polynomial [and/or Fourier] model for y = f(x) to xy[w] data

SYNOPSIS

       trend1d  [ table ]  -Fxymrw|p|P|c  -Nparams [ xy[w]file ] [  -Ccondition_number ] [  -I[confidence_level]
       ] [  -V[level] ] [  -W ] [ -bbinary ] [ -dnodata ] [ -eregexp ] [ -fflags ] [ -hheaders ] [ -iflags  ]  [
       -:[i|o] ]

       Note: No space is allowed between the option flag and the associated arguments.

DESCRIPTION

       trend1d  reads x,y [and w] values from the first two [three] columns on standard input [or file] and fits
       a regression model y = f(x) + e by [weighted] least squares. The functional form of f(x) may be chosen as
       polynomial or Fourier or a mix of the two, and the fit may be made robust by iterative reweighting of the
       data. The user may also search for the number of terms in f(x) which significantly reduce the variance in
       y.

REQUIRED ARGUMENTS

       -Fxymrw|p|P|c
              Specify up to five letters from the set {x y m r w} in any order to create columns  of  ASCII  [or
              binary]  output.  x  =  x,  y = y, m = model f(x), r = residual y - m, w = weight used in fitting.
              Alternatively, choose just the single selection p to output a record  with  the  polynomial  model
              coefficients,  P  for  the  normalized  polynomial  model  coefficients,  or  c for the normalized
              Chebyshev model coefficients.

       -N[p|P|f|F|c|C|s|S|x]n[,…][+llength][+oorigin][+r]
              Specify the components of the (possibly mixed) model.  Append one or  more  comma-separated  model
              components.   Each  component  is  of  the  form  Tn,  where  T indicates the basis function and n
              indicates the polynomial degree or how many terms in  the  Fourier  series  we  want  to  include.
              Choose  T  from  p  (polynomial with intercept and powers of x up to degree n), P (just the single
              term x^n), f (Fourier series with n terms), c (cosine series with n terms), s (sine series with n
              terms), F (single Fourier component of order n), C (single cosine component of order n), and S
              (single sine component of order n).  By default the x-origin and fundamental period are set to the
              mid-point and data range, respectively.  Change this using the +oorigin and +llength modifiers.
              We normalize x before evaluating the basis functions: the trigonometric bases all use the
              normalized x' = 2*pi*(x-origin)/length, while the polynomials use x' = 2*(x-x_mid)/(xmax - xmin)
              for stability. Finally, append +r for a robust solution [Default gives a least squares fit].
              Use -V to see a plain-text representation of the y(x) model specified in -N.
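
              The two normalizations described above can be sketched in a few lines of Python. This is an
              illustration only, not GMT source code; the function names normalize_poly and normalize_trig
              and the sample x values are hypothetical.

```python
import math

def normalize_poly(x, xmin, xmax):
    """Map x from [xmin, xmax] onto [-1, 1] for the polynomial terms."""
    x_mid = 0.5 * (xmin + xmax)
    return 2.0 * (x - x_mid) / (xmax - xmin)

def normalize_trig(x, origin, length):
    """Map x to a phase angle for the trigonometric terms."""
    return 2.0 * math.pi * (x - origin) / length

# Hypothetical sample abscissae.
xs = [10.0, 12.5, 15.0, 20.0]
xmin, xmax = min(xs), max(xs)

# Defaults stated in the text: origin = mid-point, length = data range.
origin = 0.5 * (xmin + xmax)
length = xmax - xmin

poly_x = [normalize_poly(x, xmin, xmax) for x in xs]
trig_x = [normalize_trig(x, origin, length) for x in xs]
```

              With these defaults the polynomial abscissae span [-1, 1] and the trigonometric phase spans
              [-pi, pi]; passing a different origin or length corresponds to the +o and +l modifiers.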

OPTIONAL ARGUMENTS

       table  One or more ASCII [or binary, see -bi] files containing x,y [w] values in the first 2 [3] columns.
              If no files are specified, trend1d will read from standard input.

       -Ccondition_number
              Set the maximum allowed condition number for the matrix solution.  trend1d  fits  a  damped  least
              squares model, retaining only that part of the eigenvalue spectrum such that the ratio of the
              largest eigenvalue to the smallest retained eigenvalue does not exceed condition_number.
              [Default: condition_number = 1.0e06.]
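
              As a rough sketch of the truncation rule only (the solver itself is not shown, and this is
              not GMT source code), the eigenvalues kept for a given condition number could be selected as
              below. The function name retained_eigenvalues and the sample spectrum are hypothetical.

```python
def retained_eigenvalues(eigenvalues, condition_number=1.0e6):
    """Keep eigenvalues whose ratio to the largest one stays within the limit."""
    largest = max(eigenvalues)
    return [ev for ev in eigenvalues
            if ev > 0.0 and largest / ev <= condition_number]

# Hypothetical spectrum: the last eigenvalue is too small relative to the first.
spectrum = [4.0, 1.0, 1.0e-3, 1.0e-9]
kept = retained_eigenvalues(spectrum)
```

              Lowering the condition number discards more of the small eigenvalues, which damps the
              solution more strongly.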

       -I[confidence_level]
              Iteratively increase the number of model parameters, starting at one, until the full number of
              terms given in -N is reached or the reduction in variance of the model is not significant at
              the given confidence_level. You may set -I only, without an attached number; in this case the
              fit will be iterative with a default confidence level of 0.51. Or choose your own level
              between 0 and 1. See the REMARKS section. Note that the model terms are added in the order
              they were given in -N, so you should place the most important terms first.

       -V[level] (more …)
              Select verbosity level [c].

       -W     Weights  are  supplied  in  input  column  3. Do a weighted least squares fit [or start with these
              weights when doing the iterative robust fit]. [Default reads only the first 2 columns.]
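
              For the simplest case of a straight line, a weighted least squares fit can be sketched as
              follows. This is a minimal illustration, not GMT source code: it assumes each weight
              multiplies the squared residual of its point, and the name weighted_line_fit and the sample
              data are hypothetical.

```python
def weighted_line_fit(x, y, w=None):
    """Return (intercept, slope) minimizing sum(w * (y - a - b*x)**2)."""
    if w is None:
        w = [1.0] * len(x)  # unweighted fit when no third column is given
    sw = sum(w)
    sx = sum(wi * xi for wi, xi in zip(w, x))
    sy = sum(wi * yi for wi, yi in zip(w, y))
    sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    denom = sw * sxx - sx * sx
    slope = (sw * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / sw
    return intercept, slope

# Hypothetical data: the last point is an outlier and is given zero weight.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 100.0]
w = [1.0, 1.0, 1.0, 0.0]
intercept, slope = weighted_line_fit(x, y, w)
```

              With the outlier weighted out, the fit recovers the line y = 1 + 2x through the first three
              points.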

       -bi[ncols][t] (more …)
              Select native binary input. [Default is 2 (or 3 if -W is set) columns].

       -bo[ncols][type] (more …)
              Select native binary output. [Default is 1-5 columns as given by -F].

       -d[i|o]nodata (more …)
              Replace input columns that equal nodata with NaN and do the reverse on output.

       -e[~]"pattern" | -e[~]/regexp/[i] (more …)
              Only accept data records that match the given pattern.

       -f[i|o]colinfo (more …)
              Specify data types of input and/or output columns.

       -h[i|o][n][+c][+d][+rremark][+ttitle] (more …)
              Skip or produce header record(s).

       -icols[+l][+sscale][+ooffset][,…] (more …)
              Select input columns and transformations (0 is first column).

       -:[i|o] (more …)
              Swap 1st and 2nd column on input and/or output.

       -^ or just -
              Print a short message about the syntax of the command, then exit (NOTE: on Windows just use -).

       -+ or just +
              Print an extensive usage (help) message, including the explanation of any  module-specific  option
              (but not the GMT common options), then exit.

       -? or no arguments
              Print a complete usage (help) message, including the explanation of all options, then exit.

ASCII FORMAT PRECISION

       The  ASCII output formats of numerical data are controlled by parameters in your gmt.conf file. Longitude
       and latitude  are  formatted  according  to  FORMAT_GEO_OUT,  absolute  time  is  under  the  control  of
       FORMAT_DATE_OUT  and  FORMAT_CLOCK_OUT,  whereas general floating point values are formatted according to
       FORMAT_FLOAT_OUT. Be aware that the format in effect can lead to loss of precision in ASCII output, which
       can lead to various problems downstream. If you find the output is not  written  with  enough  precision,
       consider   switching   to   binary  output  (-bo  if  available)  or  specify  more  decimals  using  the
       FORMAT_FLOAT_OUT setting.

REMARKS

       If a polynomial model is included, then the domain of x will be shifted and scaled to [-1, 1] and the
       basis functions will be Chebyshev polynomials provided the polynomial is of full order (otherwise we
       stay with powers of x). The Chebyshev polynomials have a numerical advantage in the form of the matrix
       which must be inverted and allow more accurate solutions. The Chebyshev polynomial of degree n has n+1
       extrema in [-1, 1], at all of which its value is either -1 or +1. Therefore the magnitude of the
       polynomial model coefficients can be directly compared. NOTE: The stable model coefficients are
       Chebyshev coefficients. The corresponding polynomial coefficients in a + bx + cx^2 + … are also given
       in Verbose mode but users must realize that they are NOT stable beyond degree 7 or 8. See Numerical
       Recipes for more discussion. For evaluating Chebyshev polynomials, see gmtmath.
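
       The extrema property quoted above can be checked with the standard three-term recurrence
       T_0 = 1, T_1 = x, T_(k+1) = 2x T_k - T_(k-1). The sketch below is an illustration in Python, not
       GMT source code; the function name chebyshev and the choice n = 5 are arbitrary.

```python
import math

def chebyshev(n, x):
    """Evaluate the Chebyshev polynomial T_n(x) by the three-term recurrence."""
    if n == 0:
        return 1.0
    t_prev, t_curr = 1.0, x
    for _ in range(n - 1):
        t_prev, t_curr = t_curr, 2.0 * x * t_curr - t_prev
    return t_curr

n = 5
# The n+1 extrema of T_n on [-1, 1] lie at x_k = cos(k*pi/n), k = 0..n.
extrema_x = [math.cos(k * math.pi / n) for k in range(n + 1)]
extrema_values = [chebyshev(n, x) for x in extrema_x]
```

       Every entry of extrema_values has magnitude 1 (to rounding), alternating in sign, which is why
       coefficients of different Chebyshev terms can be compared directly.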

       The -N…+r (robust) and -I (iterative) options evaluate the significance of the improvement in model
       misfit Chi-Squared by an F test. The default confidence limit is set at 0.51; it can be changed with
       the -I option. The user may be surprised to find that in most cases the reduction in variance achieved
       by increasing the number of terms in a model is not significant at a very high degree of confidence.
       For example, with 120 degrees of freedom, Chi-Squared must decrease by 26% or more to be significant at
       the 95% confidence level. If you want to keep iterating as long as Chi-Squared is decreasing, set
       confidence_level to zero.
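
       The 26% figure can be reproduced under one assumption that this page does not state explicitly: that
       the F statistic is the ratio of the old to the new Chi-Squared with equal degrees of freedom. The
       critical value 1.35 used below is the value commonly tabulated for F(120, 120) at the 95% level; the
       function name required_reduction is hypothetical.

```python
def required_reduction(f_critical):
    """Fractional drop in Chi-Squared needed so that old/new reaches f_critical."""
    # old / new >= f_critical  <=>  (old - new) / old >= 1 - 1 / f_critical
    return 1.0 - 1.0 / f_critical

# Assumed tabulated critical value for F(120, 120) at 95% confidence.
f_crit_95 = 1.35
reduction = required_reduction(f_crit_95)
percent = round(100.0 * reduction)
```

       A critical value of 1 (no significance demanded) gives a required reduction of zero, consistent with
       iterating for as long as Chi-Squared decreases.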

       A low confidence limit (such as the default value of 0.51) is needed to make the robust method work.
       This method iteratively reweights the data to reduce the influence of outliers. The weight is based on
       the Median Absolute Deviation and a formula from Huber [1964], and is 95% efficient when the model
       residuals have an outlier-free normal distribution. This means that the influence of outliers is
       reduced only slightly at each iteration; consequently the reduction in Chi-Squared is not very
       significant. If the procedure needs a few iterations to successfully attenuate their effect, the
       significance level of the F test must be kept low.
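
       One reweighting step of this kind might look like the sketch below. It is not GMT source code: the
       tuning constant k = 1.345 (the usual choice for about 95% efficiency under normal errors) and the
       factor 1.4826 that converts the Median Absolute Deviation to a standard deviation estimate are
       assumptions, as are the function name huber_weights and the sample residuals.

```python
import statistics

def huber_weights(residuals, k=1.345):
    """Return Huber-type weights: 1 inside k robust sigmas, k/u beyond."""
    med = statistics.median(residuals)
    mad = statistics.median([abs(r - med) for r in residuals])
    scale = 1.4826 * mad  # MAD scaled to estimate sigma for normal data
    if scale == 0.0:
        return [1.0] * len(residuals)
    weights = []
    for r in residuals:
        u = abs(r) / scale  # residual in units of the robust scale
        weights.append(1.0 if u <= k else k / u)
    return weights

# Hypothetical residuals with one obvious outlier at the end.
residuals = [-0.2, 0.1, 0.0, 0.3, -0.1, 8.0]
weights = huber_weights(residuals)
```

       The well-behaved residuals keep full weight while the outlier is strongly down-weighted; refitting
       with these weights and repeating gives the iterative scheme described above.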

EXAMPLES

       To remove a linear trend from data.xy by ordinary least squares, use:

              gmt trend1d data.xy -Fxr -Np1 > detrended_data.xy

       To make the above linear trend robust with respect to outliers, use:

              gmt trend1d data.xy -Fxr -Np1+r > detrended_data.xy

       To fit the model y(x) = a + bx^2 + c * cos(2*pi*3*(x/l)) + d * sin(2*pi*3*(x/l)), with l the
       fundamental period (here l = 15), try:

              gmt trend1d data.xy -Fxm -NP0,P2,F3+l15 > model.xy

       To find out how many terms (up to 20, say) in a robust Fourier interpolant are significant in fitting
       data.xy, use:

              gmt trend1d data.xy -Nf20+r -I -V

REFERENCES

       Huber, P. J., 1964, Robust estimation of a location parameter, Ann.  Math. Stat., 35, 73-101.

       Menke, W., 1989, Geophysical Data Analysis: Discrete Inverse Theory, Revised Edition, Academic Press, San
       Diego.

COPYRIGHT

       2018, P. Wessel, W. H. F. Smith, R. Scharroo, J. Luis, and F. Wobbe

5.4.3                                             Jan 03, 2018                                     TREND1D(1gmt)