Provided by: fitsh_0.9.4-1_amd64

NAME

       grmatch - pair lines by identifier or by coordinate cross-matching

SYNOPSIS

       grmatch [options] -r <reference> -i <input> [-o <output>]

DESCRIPTION

       The program `grmatch` matches lines read from two input files: a reference file and an
       input file. All implemented algorithms are symmetric, in the sense that the result is
       the same if the two files are swapped. The order of the files matters only when a
       geometrical transformation is also returned (see point matching below); in that case,
       swapping the files yields the inverse of the original transformation. The lines (rows)
       can be matched using various criteria.

       1. Lines can be matched by identifier, where the identifier can be any concatenation of
          arbitrary, space-separated columns found in the files. Generally, the identifier is a
          single column (e.g. an astronomical catalog identifier). The behaviour of the program
          can be tuned for the cases when more than one row has the same identifier.

       2. Lines can be matched using a 2-dimensional point matching algorithm. In this method,
          the program expects two columns from each of the reference and input files, which are
          treated as X and Y coordinates. Given both point lists, the program tries to find the
          geometrical transformation which maps the points from the frame of the reference list
          to the frame of the input list and, simultaneously, tries to find as many pairs as
          possible. The parameters of the geometrical transformation and of the whole algorithm
          can be fine-tuned.

       3. Lines can be matched using an arbitrary- (N-) dimensional coordinate matching
          algorithm. This method expects N columns from each of the reference and input files,
          which are treated as X_1, ..., X_N Cartesian coordinates, and it assumes that both
          point sets are in the same reference frame. The point 'A' from the reference list and
          the point 'P' from the input list form a pair if the closest point to 'A' in the input
          list is 'P' and vice versa.
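
       The pairing rule of the coordinate matching method (two points form a pair only if
       each is the other's closest point) can be illustrated with a small brute-force Python
       sketch. This is only an illustration of the rule as described here, not the
       implementation used by grmatch; the function name and the optional max_distance
       argument (mimicking the --max-distance option) are hypothetical.

```python
import math

def mutual_nearest_pairs(ref, inp, max_distance=None):
    """Return index pairs (i, j) where ref[i] and inp[j] are each other's nearest point."""
    def nearest(point, points):
        # Index of the element of `points` closest to `point` (first one wins on ties).
        return min(range(len(points)), key=lambda k: math.dist(point, points[k]))

    pairs = []
    if not ref or not inp:
        return pairs
    for i, a in enumerate(ref):
        j = nearest(a, inp)
        if nearest(inp[j], ref) != i:
            continue  # 'A' is not the closest reference point to 'P': no pair
        if max_distance is not None and math.dist(a, inp[j]) > max_distance:
            continue  # valid symmetric pair, but farther apart than the cutoff
        pairs.append((i, j))
    return pairs

# Points may have any dimension N, as long as it is the same in both lists.
ref = [(0.0, 0.0), (10.0, 10.0), (50.0, 50.0)]
inp = [(0.1, 0.0), (10.0, 10.2), (11.0, 11.0)]
pairs = mutual_nearest_pairs(ref, inp)
close_pairs = mutual_nearest_pairs(ref, inp, max_distance=0.15)
```

       In this example the third reference point has no partner: its closest input point is
       (11, 11), but that point is closer to the second reference point.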

OPTIONS

   General options:
       -h, --help
              Gives a general summary of the command line options.

       --long-help, --help-long
              Gives a detailed list of command line options.

       --wiki-help, --help-wiki, --mediawiki-help, --help-mediawiki
              Gives a detailed list of command line options in Mediawiki format.

       --version, --version-short, --short-version
              Gives some version information about the program.

       -C, --comment
              Comment the output (both the transformation file and the match file).

   Options for input/output specifications:
       -r <file>, --reference <file>, --input-reference <file>
              Mandatory, name of the reference file.

       <inputfile>, -i <inputfile>, --input <inputfile>
              Name of the input file. If this switch is omitted, the input is read from stdin
              (specifying some input is mandatory).

       --separator-reference <char>|space, --separator-input <char>|space
              Character separating the fields of the reference and the input files,
              respectively. By default, fields are separated by whitespace; this can be made
              explicit by specifying 'space' here. Otherwise, <char> must be a single
              character. For instance, use '--separator-reference ,' and/or
              '--separator-input ,' to process CSV files.

       -o <output>, --output <output>, --output-matched <output>
              Name of the output file, containing the matched lines. The matched lines are pasted
              lines, the first part is from the reference file and the second part  is  from  the
              input  file,  these  two  parts are concatenated by a TAB character. This switch is
              optional, if it is not specified, no such output will be generated.

       --output-matched-reference <out>, --output-matched-input <out>
              Name of the output file, containing the lines corresponding  to  matches  but  only
              from the reference file or from the input file, respectively.

       --output-excluded-reference <out>, --output-excluded-input <out>
              Names of the files which contain the valid but excluded lines from the reference
              and from the input. These outputs are disjoint from the previous outputs, and
              together with them they contain all valid lines.

       --output-id <out>
              Name of the file which contains only the identifiers of the matched lines. If the
              primary matching method is not identifier matching, one should also specify the
              column indices of the identifiers with --col-ref-id and --col-inp-id.

       --output-transformation <output-transformation-file>
              Name of the output file containing the geometrical transformation, in
              human-readable format, if the matching method was point matching (otherwise,
              this option has no effect). The commented version of this file includes some
              statistics about the matching (the total number of lines used and matched, the
              required CPU time, the final triangulation level, the fit residuals and similar
              quantities).

       In all of the above input/output file specifications, replacing the file name with "-"
       (a single minus sign) forces reading from stdin or writing to stdout. Note that
       everything after a "#" (hash mark) on any line is treated as a comment and is therefore
       ignored.
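
       The conventions above (comments starting with "#", whitespace or single-character
       separators, column indices starting at 1) can be mimicked in a few lines of Python,
       for example when preparing or checking files for grmatch. The helper names below are
       hypothetical, and details such as quoting in CSV files are not covered by this page,
       so they are not handled here.

```python
def split_fields(line, separator="space"):
    """Drop everything after '#' and split the rest of the line into columns."""
    content = line.split("#", 1)[0].rstrip("\n")
    if separator == "space":
        # Default behaviour: any run of whitespace separates two fields.
        return content.split()
    # Otherwise `separator` is assumed to be a single character such as ','.
    return [field.strip() for field in content.split(separator)]

def column(fields, index):
    """Return the column with the given 1-based index, as used by the --col-* options."""
    return fields[index - 1]

fields = split_fields("star-17  101.25 203.50   # a comment")
csv_fields = split_fields("star-17,101.25,203.50 # a comment", separator=",")
```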

   General options for point matching:
       --match-points
              This   switch   forces   the  usage  of the point matching method. By default, this
              method is  assumed  to  be  used,  therefore  this switch can be omitted.

       --col-ref <x>,<y>, --col-inp <x>,<y>
              The column indices containing the X and Y coordinates, for the reference and for
              the input file, respectively. The index of the first column is always 1, the
              index of the second is 2 and so on. Lines in which these columns do not contain
              valid real numbers are omitted.

       -a <order>, --order <order>
              This switch specifies the polynomial order of the resulting geometrical
              transformation. It can be an arbitrary positive integer. Note that if the order is
              A, at least (A+1)*(A+2)/2 valid points are needed from both the reference and
              the input file to fit the transformation.
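
              The expression (A+1)*(A+2)/2 is the number of monomials x^i*y^j with
              i+j <= A, i.e. the number of coefficients of a two-variable polynomial of
              order A. A short Python sketch (the function name is hypothetical) evaluates
              it for a few orders:

```python
def min_points_for_order(order):
    """Minimum number of valid points needed to fit a polynomial of the given order."""
    if order < 1:
        raise ValueError("the order must be a positive integer")
    # (A+1)*(A+2) is always even, so integer division is exact.
    return (order + 1) * (order + 2) // 2

for order in (1, 2, 3, 4):
    print(order, min_points_for_order(order))
```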

       --max-distance <maxdist>
              The maximal accepted distance between the matched points, in the coordinate frame
              of the input coordinate list (and not in the coordinate frame of the reference
              coordinate list). Possible pairs (which are valid pairs according to the symmetric
              coordinate matching algorithms) are excluded if their Euclidean distance is larger
              than maxdist. Note that this option has no default value; therefore, if omitted,
              all possible pairs found by the symmetric matching are returned, which can lead
              to unexpected behaviour in practice. One should always specify a reasonable
              maximal distance, which can be estimated only from knowledge of the physics
              behind the input files.

       See more options concerning point matching in the section "Fine-tuning of point
       matching" below. That section also describes the tuning of the triangulation used by
       the point matching algorithm. For a more detailed description of the point matching
       algorithms based on pattern and triangle matching see [1], [2] or [3].

   General options for coordinate matching:
       --match-coord, --match-coords
              This   switch forces the usage of the coordinate matching method. Note that because
              of the common options with the point  matching  method,  one  should  specify  this
              switch  to force the usage of the coordinate matching method (the default method is
              point  matching, see above).

       --col-ref <x>[,<y>,[<z>...]], --col-inp <x>[,<y>,[<z>...]]
              The column indices containing the spatial coordinates, for the reference and for
              the input file, respectively. The index of the first column is always 1, the
              index of the second is 2 and so on. Lines in which these columns do not contain
              valid real numbers are omitted. Note that the dimension of the coordinate
              matching space is specified indirectly, by the number of column indices listed
              here. Because of this, the number of column indices must be the same for the
              reference and the input; otherwise, when the dimensions are mismatched, the
              program exits unsuccessfully.

       --max-distance <maxdist>
              The maximal accepted distance between the matched points. Possible pairs (which
              are valid pairs according to the symmetric coordinate matching algorithms) are
              excluded if their Euclidean distance is larger than maxdist. Note that this
              option has no default value; therefore, if omitted, all possible pairs found by
              the symmetric matching are returned (see also point matching, above).

   General options for identifier matching:
       --match-id, --match-identifiers
              This switch forces the usage of the identifier matching  method.

       --col-ref-id <i>[,<j>,[<k>...]], --col-inp-id <i>[,<j>,[<k>...]]
              Column  index  or  indices  containing the identifiers, from the reference and from
              the input file, respectively.

       --no-ambiguity, --first-ambiguity, --any-ambiguity, --full-ambiguity
              These options tune the behaviour of the matching when there is more than one
              occurrence of a given identifier in the reference and/or input file. If
              --no-ambiguity is specified, these identifiers are discarded; this is the default
              method. If --first-ambiguity is specified, only the first occurrence is treated as
              a matched line, independently of the number of occurrences. If the switch
              --any-ambiguity is specified, the lines are paired sequentially, as long as there
              are lines left from both the reference and the input. For example, if there are 4
              occurrences in the reference and 6 in the input file of a given identifier, 4
              matched pairs are returned. Otherwise, if --full-ambiguity is specified, all
              possible combinations of the lines are treated as matched lines. For example, if
              there are 4 occurrences in the reference and 6 in the input file of a given
              identifier, all 4*6=24 combinations are returned as matched pairs.
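
              The four policies can be summarised with a short Python sketch that pairs the
              lines sharing one identifier. It follows the description above and is not taken
              from the grmatch sources; the function name and the mode strings are
              hypothetical.

```python
from itertools import product

def pair_ambiguous(ref_lines, inp_lines, mode="no"):
    """Pair the reference and input lines that share a single identifier."""
    if not ref_lines or not inp_lines:
        return []  # the identifier is missing from one of the files
    if mode == "no":
        # Discard identifiers occurring more than once in either file.
        if len(ref_lines) > 1 or len(inp_lines) > 1:
            return []
        return [(ref_lines[0], inp_lines[0])]
    if mode == "first":
        # Only the first occurrences form a matched line.
        return [(ref_lines[0], inp_lines[0])]
    if mode == "any":
        # Sequential pairing; zip() stops when the shorter list is exhausted.
        return list(zip(ref_lines, inp_lines))
    if mode == "full":
        # Every combination of a reference line and an input line.
        return list(product(ref_lines, inp_lines))
    raise ValueError("unknown mode: " + mode)

ref_lines = ["r1", "r2", "r3", "r4"]
inp_lines = ["i1", "i2", "i3", "i4", "i5", "i6"]
counts = {m: len(pair_ambiguous(ref_lines, inp_lines, m))
          for m in ("no", "first", "any", "full")}
```

              With 4 reference and 6 input occurrences, this gives 0, 1, 4 and 24 pairs for
              the four modes, in agreement with the examples above.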

   Fine-tuning of point matching:
       --triangulation <parameters>
              This  switch   is   followed   by   comma-separated   directives, which specify the
              parameters of the triangulation-based point matching algorithm:

       delaunay, level=<level>, full, auto, unitarity=<U>
              These directives specify the triangulation level used for point matching.
              "delaunay" forces the usage of Delaunay triangles only. This is the fastest
              method; however, it only works if the points in the reference and input lists
              overlap almost completely and describe almost the same point sets (with a ratio
              of common points above 60-70%). The "level" directive specifies the level of
              the expansion of the Delaunay triangulation (see [1] for more details). In
              practice, the lower the ratio of common points and/or the ratio of the
              overlap, the higher the level that should be used. Specifying "level=1" or
              "level=2" gives a robust but still fast method for general usage. The directive
              "full" forces full triangulation. This can be overwhelmingly slow and requires
              tons of memory if there are more than 40-50 points (the time and memory
              requirements are proportional to the 6th(!) and 3rd power of the number of
              points, respectively). The directive "auto" increases the level of the
              triangulation expansion automatically until a proper match is found. A match is
              considered a good match if the unitarity of the transformation is less than the
              unitarity U specified by the "unitarity=U" directive (see also the notes on
              unitarity below).
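
              The scaling of the "full" triangulation can be made concrete with a small
              Python sketch. The number of distinct triangles formed by n points is n choose
              3, which grows as the 3rd power of n. The 6th power for the run time is
              reproduced here under the assumption (not stated on this page) that every
              reference triangle is compared with every input triangle; the function names
              are hypothetical.

```python
from math import comb

def full_triangle_count(n_points):
    """Number of distinct triangles that can be formed from n_points points."""
    return comb(n_points, 3)

def triangle_pair_count(n_ref, n_inp):
    """Number of (reference triangle, input triangle) pairs, assuming all are compared."""
    return full_triangle_count(n_ref) * full_triangle_count(n_inp)

for n in (20, 50, 100):
    print(n, full_triangle_count(n), triangle_pair_count(n, n))
```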

       mixed, conformable, reverse
              These directives define the chirality of the triangle spaces to be used. In
              practice, this means the following. If it is not known whether the input and
              reference lists are inverted with respect to each other, one should use the
              "mixed" triangle space. If the input and reference lists are certainly not
              inverted, the "conformable" triangle space can be used. If the input and
              reference lists are known to be inverted, the "reverse" space can be used. Note
              that although the "mixed" triangle space can always yield a good match, it is
              wise to fix the chirality by specifying "conformable" or "reverse" when it is
              really known whether the point sets are inverted with respect to each other. If
              the chirality is fixed, the program yields more matched pairs, the appropriate
              triangulation level can be smaller and, in "auto" mode, the program returns the
              match definitely faster.

       maxnumber=<max>, maxref=<mr>, maxinp=<mi>
              These directives specify the maximal number of points which are used for
              triangulation (for any type of triangulation). Specifying "maxnumber" is
              equivalent to defining "maxref" and "maxinp" with the same value. Then, the
              first <mr> points from the reference and the first <mi> points from the input
              list are used to generate the triangle sets. The "first" points are selected
              using the optional information found in one of the columns, see the following
              switches.

       (Note that there should be only one --triangulation switch, all desired directives  should
       be  written in the same argument, separated by commas.)

       --col-ref-ordering [-]<w>, --col-inp-ordering [-]<w>
              These switches each specify one column index, from the reference and from the
              input file respectively, which is used to order these lists and select the first
              "maxref" and "maxinp" points (see above) for the generation of the two triangle
              meshes. Both columns should contain valid real numbers, otherwise the whole(!)
              line is excluded (not only from sorting but from the whole matching procedure).
              If there is no negative sign before the column index, the data are sorted in
              descending(!) order, therefore the lines with the highest(!) values are selected
              for triangulation. If there is a negative sign before the index, the data are
              sorted in ascending order by these values, therefore the lines with the
              smallest(!) values are selected for triangulation. For example, when matching
              star lists, one might want to use only the brightest stars to generate the
              triangle sets. If the brightnesses of the stars are given by their fluxes, the
              negative sign should not be used (the list should be sorted in descending order
              to select the first few lines as the brightest stars), and if the brightness is
              given as a magnitude, the negative sign has to be used.
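
              The sign convention can be illustrated with a Python sketch that selects the
              rows used for triangulation. It only mirrors the ordering rule described above
              (the exclusion of invalid lines from the rest of the matching is not modelled),
              and the function and argument names are hypothetical.

```python
def select_for_triangulation(rows, order_col, max_points):
    """Select up to max_points rows, ordered by the signed 1-based column order_col."""
    descending = order_col > 0      # no negative sign: the highest values come first
    index = abs(order_col) - 1      # convert the 1-based column index to 0-based
    valid = []
    for row in rows:
        try:
            valid.append((float(row[index]), row))
        except (ValueError, IndexError):
            continue                # the ordering column is not a valid real number
    valid.sort(key=lambda item: item[0], reverse=descending)
    return [row for _, row in valid[:max_points]]

# Brightness given as magnitude (smaller is brighter): use a negative index.
by_magnitude = [["s1", "12.5"], ["s2", "9.1"], ["s3", "bad"], ["s4", "10.3"]]
brightest_mag = select_for_triangulation(by_magnitude, order_col=-2, max_points=2)

# Brightness given as flux (larger is brighter): use a positive index.
by_flux = [["a", "100"], ["b", "2500"], ["c", "40"]]
brightest_flux = select_for_triangulation(by_flux, order_col=2, max_points=2)
```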

       --fit iterations=<N>,firstrejection=<F>,sigma=<S>
              Like --triangulation, this switch is followed by some directives. These
              directives specify the number <N> of iterations ("iterations=<N>") for point
              matching. The "firstrejection" directive specifies the serial number <F> of the
              first iteration in which points farther than the <S> "sigma" level are excluded
              from the next iteration. Note that in practice this type of iteration is not
              really important (due to, for instance, the limitation of the outliers by the
              --max-distance switch); however, some suspicious users can be convinced by such
              arguments.
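
              The interplay of the three directives can be sketched in Python. To stay short,
              the sketch fits only a constant one-dimensional shift instead of a
              two-dimensional polynomial transformation, and it assumes that "sigma" refers
              to the scatter of the fit residuals; it is an illustration of iterative outlier
              rejection in general, not the grmatch implementation. All names are
              hypothetical.

```python
import statistics

def fit_shift_with_rejection(ref, inp, iterations=3, firstrejection=1, sigma=3.0):
    """Fit inp[k] - ref[k] = shift, excluding outliers from the following iterations."""
    keep = list(range(len(ref)))
    shift = 0.0
    for iteration in range(1, iterations + 1):
        diffs = [inp[k] - ref[k] for k in keep]
        shift = statistics.fmean(diffs)          # the "fit" of this iteration
        if iteration < firstrejection or iteration == iterations:
            continue                             # no rejection in this iteration
        scatter = statistics.pstdev(diffs)
        if scatter > 0:
            # Points farther than sigma * scatter are left out of the next iteration.
            keep = [k for k in keep if abs(inp[k] - ref[k] - shift) <= sigma * scatter]
    return shift, keep

ref = [float(k) for k in range(10)]
inp = [x + 2.0 for x in ref]
inp[9] += 18.0                                   # one gross outlier
shift, used = fit_shift_with_rejection(ref, inp, iterations=3, firstrejection=1, sigma=2.0)
```

              Here the outlier is rejected after the first iteration, so the final shift is
              computed from the remaining nine points.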

       --weight reference|input,column=<wi>,[magnitude],[power=<p>]
              These directives specify the weights which are used during the fit of the
              geometrical transformation. In practice, this is useful in the following
              situation. When matching star lists, the fainter stars are believed to have
              larger astrometric errors, therefore they should have a smaller influence on
              the fit. The weights can be taken from the reference (specify "reference") or
              from the input (specify "input"), from the column with the index <wi>. The
              weights can be derived from stellar magnitudes; if so, specify "magnitude" to
              convert the values read from magnitude to flux. The actual weight is then the
              "power"th power of the flux. The default value of "power" is 1; however, for
              the maximum-likelihood estimation of an assumed Gaussian distribution, the
              weights should be the second power of the fluxes.
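
              A minimal Python sketch of the weight computation is given below. The
              magnitude-to-flux conversion uses the standard relation flux = 10^(-0.4*m); the
              zero point (flux 1 at magnitude 0) is an arbitrary assumption, which is
              harmless because a common scale factor of all weights does not change a
              weighted least-squares fit. The function name is hypothetical.

```python
def fit_weight(value, magnitude=False, power=1.0):
    """Weight of one point, from a flux or (with magnitude=True) from a magnitude."""
    # Assumed zero point: a star of magnitude 0 has flux 1.
    flux = 10.0 ** (-0.4 * value) if magnitude else value
    return flux ** power

# A star 5 magnitudes brighter has 100 times the flux, hence 100 times the weight.
ratio_power1 = fit_weight(10.0, magnitude=True) / fit_weight(15.0, magnitude=True)
# With power=2 the same pair of stars differs by a factor of 100**2 in weight.
ratio_power2 = (fit_weight(10.0, magnitude=True, power=2.0)
                / fit_weight(15.0, magnitude=True, power=2.0))
```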

       Some notes on unitarity. The unitarity of a geometrical transformation measures how it
       differs from the closest transformation which is affine and a combination of dilation,
       rotation and shift. For such a transformation the unitarity is 0, and if the
       second-order terms in a transformation distort such a unitary transformation, the
       unitarity will have the same magnitude as this second-order effect. For example,
       mapping a part of a sphere with a size of d degrees will have a unitarity of 1-cos(d).
       Therefore, for astrometric purposes, a reasonable value of the critical unitarity in
       "auto" triangulation mode can be estimated as 2 or 3 times 1-cos(d/2), where d is the
       size of the field in which astrometry should be performed.
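
       The rule of thumb above is easy to evaluate numerically. The following Python sketch
       (with a hypothetical function name) computes the suggested critical unitarity for a
       field of a given size in degrees; the result could then be passed to the "unitarity=U"
       directive.

```python
import math

def critical_unitarity(field_size_deg, factor=2.0):
    """Estimate factor * (1 - cos(d/2)) for a field of size d, given in degrees."""
    half_size = math.radians(field_size_deg / 2.0)
    return factor * (1.0 - math.cos(half_size))

for size in (0.5, 1.0, 5.0, 10.0):
    print(size, critical_unitarity(size), critical_unitarity(size, factor=3.0))
```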

REPORTING BUGS

       Report bugs to <apal@szofi.net>, see also https://fitsh.net/.

       Copyright © 1996, 2002, 2004-2008, 2010-2016, 2018-2020; Pal, Andras <apal@szofi.net>