
Provided by: stilts_3.4.7-4_all

NAME

       stilts-tmatch2 - Crossmatches 2 tables using flexible criteria

SYNOPSIS

       stilts tmatch2 [in1=<table1>] [ifmt1=<in-format>] [in2=<table2>] [ifmt2=<in-format>]
                      [icmd1=<cmds>] [icmd2=<cmds>] [ocmd=<cmds>]
                      [omode=out|meta|stats|count|checksum|cgi|discard|topcat|samp|tosql|gui]
                      [out=<out-table>] [ofmt=<out-format>] [matcher=<matcher-name>]
                      [values1=<expr-list>] [values2=<expr-list>] [params=<match-params>]
                      [tuning=<tuning-params>] [join=1and2|1or2|all1|all2|1not2|2not1|1xor2]
                      [find=all|best|best1|best2] [fixcols=none|dups|all] [suffix1=<label>]
                      [suffix2=<label>] [scorecol=<col-name>] [progress=none|log|time|profile]
                      [runner=parallel|parallel<n>|parallel-all|sequential|classic|partest]

DESCRIPTION

       tmatch2 is an efficient and highly configurable tool for crossmatching pairs of tables. It
       can match rows between tables on the basis of their  relative  position  in  the  sky,  or
       alternatively  using  many  other  criteria  such  as  separation  in  some  isotropic  or
       anisotropic Cartesian space, identity of a key value, or some combination  of  these;  the
       full  range  of match criteria is discussed in SUN/256. You can choose whether you want to
       identify all the matches or only the closest, and what form the output  table  takes,  for
       instance  matched  rows  only,  or all rows from one or both tables, or only the unmatched
       rows.

       If you simply want to match two  tables  based  on  sky  position  with  a  fixed  maximum
       separation, you may find the tskymatch2 command easier to use.

       Note: the duptag1 and duptag2 parameters were replaced at version 1.4 by suffix1 and
       suffix2 for consistency with other table join tasks.
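
       As a hedged illustration of how these parameters combine, the following Python sketch
       assembles (but does not run) a tmatch2 command line for a positional match. The file
       names and column names are hypothetical placeholders, and the "sky" matcher with its
       maximum separation in arcseconds is taken from SUN/256 rather than from this page.

```python
import shlex

# Hypothetical inputs: file names and column names are placeholders.
args = [
    "stilts", "tmatch2",
    "in1=survey_a.fits",
    "in2=survey_b.fits",
    "matcher=sky",        # sky matcher as described in SUN/256 (assumption here)
    "values1=RA DEC",     # expression list for table 1, whitespace-separated
    "values2=RA DEC",     # expression list for table 2
    "params=2",           # assumed to be the maximum separation in arcsec
    "join=1and2",         # matched rows only
    "find=best",          # symmetric best match
    "out=matched.fits",
]

# shlex.join quotes any argument containing whitespace for a POSIX shell.
command_line = shlex.join(args)
print(command_line)
```

       Quoting each name=value argument as a whole keeps multi-word values such as the
       expression lists intact when the line is pasted into a shell.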

OPTIONS

       in1=<table1>
              The location of the first input table. This may take one of the following forms:

                * A filename.

                * A URL.

                * The special value "-", meaning standard input. In this case  the  input  format
                  must  be  given explicitly using the ifmt1 parameter. Note that not all formats
                  can be streamed in this way.

                * A scheme specification of the form :<scheme-name>:<scheme-args>.

                * A system command line with either a "<"  character  at  the  start,  or  a  "|"
                  character at the end ("<syscmd" or "syscmd|"). This executes the given pipeline
                  and reads from its standard output. This will probably only work  on  unix-like
                  systems.

              In any case, compressed data in one of the supported compression formats (gzip,
              Unix compress or bzip2) will be decompressed transparently.
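
              The forms listed above can be told apart by simple inspection of the string. The
              Python sketch below is only an illustration of that classification; the function
              name is hypothetical, the "://" test for URLs is a heuristic assumption, and the
              way STILTS itself distinguishes the forms is not documented on this page.

```python
def classify_location(location: str) -> str:
    """Classify a table location string into one of the documented forms."""
    if location == "-":
        return "stdin"                      # special value: standard input
    if location.startswith(":"):
        return "scheme"                     # :<scheme-name>:<scheme-args>
    if location.startswith("<") or location.endswith("|"):
        return "system command"             # "<syscmd" or "syscmd|"
    if "://" in location:
        return "url"                        # heuristic, not the STILTS rule
    return "filename"


# Hypothetical example values.
for loc in ["-", ":myscheme:args", "<cat table.csv", "zcat table.csv.gz|",
            "https://example.org/cat.vot", "catalogue.fits.gz"]:
    print(loc, "->", classify_location(loc))
```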

       ifmt1=<in-format>
              Specifies the format of the first input table as specified by  parameter  in1.  The
              known  formats are listed in SUN/256. This flag can be used if you know what format
              your table is in. If it has the special value (auto) (the default), then an attempt
              will be made to detect the format of the table automatically. This cannot always be
              done correctly however,  in  which  case  the  program  will  exit  with  an  error
              explaining  which  formats  were  attempted.  This parameter is ignored for scheme-
              specified tables.

       in2=<table2>
              The location of the second input table. This may take one of the following forms:

                * A filename.

                * A URL.

                * The special value "-", meaning standard input. In this case  the  input  format
                  must  be  given explicitly using the ifmt2 parameter. Note that not all formats
                  can be streamed in this way.

                * A scheme specification of the form :<scheme-name>:<scheme-args>.

                * A system command line with either a "<"  character  at  the  start,  or  a  "|"
                  character at the end ("<syscmd" or "syscmd|"). This executes the given pipeline
                  and reads from its standard output. This will probably only work  on  unix-like
                  systems.

              In any case, compressed data in one of the supported compression formats (gzip,
              Unix compress or bzip2) will be decompressed transparently.

       ifmt2=<in-format>
              Specifies the format of the second input table as specified by parameter  in2.  The
              known  formats are listed in SUN/256. This flag can be used if you know what format
              your table is in. If it has the special value (auto) (the default), then an attempt
              will be made to detect the format of the table automatically. This cannot always be
              done correctly however,  in  which  case  the  program  will  exit  with  an  error
              explaining  which  formats  were  attempted.  This parameter is ignored for scheme-
              specified tables.

       icmd1=<cmds>
              Specifies processing to be performed on the  first  input  table  as  specified  by
              parameter  in1,  before  any  other  processing  has taken place. The value of this
              parameter is one or more of the filter commands described in SUN/256. If more  than
              one  is given, they must be separated by semicolon characters (";"). This parameter
              can be repeated multiple times on the same command line  to  build  up  a  list  of
              processing steps. The sequence of commands given in this way defines the processing
              pipeline which is performed on the table.

              Commands may alternatively be supplied in an external file, by using the indirection
              character  '@'. Thus a value of "@filename" causes the file filename to be read for
              a list of filter commands to execute. The commands in the file may be separated  by
              newline characters and/or semicolons, and lines which are blank or which start with
              a '#' character are ignored.
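
              The file format described above can be sketched in Python as follows. This is an
              illustration under stated assumptions, not the STILTS parser: the function name is
              hypothetical, leading whitespace before '#' is tolerated, and semicolons inside
              quoted arguments are not handled. The filter commands and column names in the
              sample are placeholders.

```python
def parse_filter_commands(text: str) -> list[str]:
    """Split the contents of a filter-command file into individual commands."""
    commands = []
    for line in text.splitlines():
        stripped = line.strip()
        # Blank lines and lines starting with '#' are ignored.
        if not stripped or stripped.startswith("#"):
            continue
        # Commands on one line may be separated by semicolons.
        for part in stripped.split(";"):
            part = part.strip()
            if part:
                commands.append(part)
    return commands


sample = """# bright sources only
select 'mag < 20'; keepcols 'ra dec mag'

head 1000
"""
print(parse_filter_commands(sample))
```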

       icmd2=<cmds>
              Specifies processing to be performed on the second  input  table  as  specified  by
              parameter  in2,  before  any  other  processing  has taken place. The value of this
              parameter is one or more of the filter commands described in SUN/256. If more  than
              one  is given, they must be separated by semicolon characters (";"). This parameter
              can be repeated multiple times on the same command line  to  build  up  a  list  of
              processing steps. The sequence of commands given in this way defines the processing
              pipeline which is performed on the table.

              Commands may alternatively be supplied in an external file, by using the indirection
              character  '@'. Thus a value of "@filename" causes the file filename to be read for
              a list of filter commands to execute. The commands in the file may be separated  by
              newline characters and/or semicolons, and lines which are blank or which start with
              a '#' character are ignored.

       ocmd=<cmds>
              Specifies processing  to  be  performed  on  the  output  table,  after  all  other
              processing  has  taken  place.  The  value  of this parameter is one or more of the
              filter commands described in SUN/256. If more than  one  is  given,  they  must  be
              separated  by  semicolon  characters (";"). This parameter can be repeated multiple
              times on the same command line to build up a list of processing steps. The sequence
              of commands given in this way defines the processing pipeline which is performed on
              the table.

              Commands may alternatively be supplied in an external file, by using the indirection
              character  '@'. Thus a value of "@filename" causes the file filename to be read for
              a list of filter commands to execute. The commands in the file may be separated  by
              newline characters and/or semicolons, and lines which are blank or which start with
              a '#' character are ignored.

       omode=out|meta|stats|count|checksum|cgi|discard|topcat|samp|tosql|gui
              The mode in which the result table will be output. The default mode is  out,  which
              means  that  the  result  will  be  written as a new table to disk or elsewhere, as
              determined by the out and ofmt parameters. However, there are other  possibilities,
              which correspond to uses to which a table can be put other than outputting it, such
              as displaying metadata, calculating statistics, or populating a  table  in  an  SQL
              database.  For  some  values of this parameter, additional parameters (<mode-args>)
              are required to determine the exact behaviour.

              Possible values are

                * out

                * meta

                * stats

                * count

                * checksum

                * cgi

                * discard

                * topcat

                * samp

                * tosql

                * gui

              Use the help=omode flag or see SUN/256 for more information.

       out=<out-table>
              The location of the output table. This is usually a filename to write to. If it  is
              equal  to  the  special value "-" (the default) the output table will be written to
              standard output.

              This parameter may be given only if omode has its default value of "out".

       ofmt=<out-format>
              Specifies the format in which the output table will be written (one of the ones  in
              SUN/256 - matching is case-insensitive and you can use just the first few letters).
              If it has the special value "(auto)" (the default), then the output filename will
              be examined to try to guess what sort of file is required, usually by looking at
              the extension. If it's not obvious from the filename what output format is intended,
              an error will result.

              This parameter may be given only if omode has its default value of "out".

       matcher=<matcher-name>
              Defines  the  nature  of the matching that will be performed. Depending on the name
              supplied, this may be positional matching using celestial or Cartesian coordinates,
              exact  matching  on  the  value  of  a  string  column, or other things. A list and
              explanation of the available matching algorithms is given  in  SUN/256.  The  value
              supplied  for  this parameter determines the meanings of the values required by the
              params, values* and tuning parameter(s).

       values1=<expr-list>
              Defines the values from table 1 which are used to determine  whether  a  match  has
              occurred.  These will typically be coordinate values such as RA and Dec and perhaps
              some per-row error values as well, though exactly which values are required, and
              their number and type, depend on the kind of match selected by matcher. Multiple
              values  should  be  separated  by  whitespace; if whitespace occurs within a single
              value it must be 'quoted' or "quoted". Elements of the expression list are commonly
              just  column  names,  but may be algebraic expressions calculated from zero or more
              columns as explained in SUN/256.

       values2=<expr-list>
              Defines the values from table 2 which are used to determine  whether  a  match  has
              occurred.  These will typically be coordinate values such as RA and Dec and perhaps
              some per-row error values as well, though exactly which values are required, and
              their number and type, depend on the kind of match selected by matcher. Multiple
              values  should  be  separated  by  whitespace; if whitespace occurs within a single
              value it must be 'quoted' or "quoted". Elements of the expression list are commonly
              just  column  names,  but may be algebraic expressions calculated from zero or more
              columns as explained in SUN/256.
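
              The quoting rule for expression lists resembles shell-style tokenisation. As a rough
              sketch, Python's shlex.split behaves similarly for simple cases; this is an analogy
              only, and the STILTS tokenizer may treat escapes and nested quotes differently. The
              function name and column names are hypothetical.

```python
import shlex


def split_expr_list(value: str) -> list[str]:
    """Split a whitespace-separated expression list, honouring '...' and "..." quotes."""
    return shlex.split(value)


# An expression containing spaces must be quoted to stay a single element.
print(split_expr_list("ra_deg 'dec_deg + 0.001' err"))
print(split_expr_list('ra_deg "dec_deg + 0.001" err'))
```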

       params=<match-params>
              Determines the parameters of this match. This is typically one or  more  tolerances
              such  as  error  radii.  It  may  contain  zero or more values; the values that are
              required depend on the match type selected by the matcher parameter. If it contains
              multiple values, they must be separated by spaces; values which contain a space can
              be 'quoted' or "quoted".

       tuning=<tuning-params>
              Tuning values for the matching process, if appropriate. It may contain zero or more
              values;  the  values  that  are  permitted depend on the match type selected by the
              matcher parameter. If it contains  multiple  values,  they  must  be  separated  by
              spaces;  values which contain a space can be 'quoted' or "quoted". If this optional
              parameter is not supplied, sensible defaults will be chosen.

       join=1and2|1or2|all1|all2|1not2|2not1|1xor2
              Determines which rows are included in the  output  table.  The  matching  algorithm
              determines which of the rows from the first table correspond to which rows from the
              second. This parameter determines what to do with  that  information.  Perhaps  the
              most obvious thing is to write out a table containing only rows which correspond to
              a row in both of the two input tables. However,  you  may  also  want  to  see  the
              unmatched  rows  from  one  or  both input tables, or rows present in one table but
              unmatched in the other, or other possibilities. The options are:

                * 1and2: An output row for each row represented in both input tables (INNER JOIN)

                * 1or2: An output row for each row represented in either or  both  of  the  input
                  tables (FULL OUTER JOIN)

                * all1:  An  output  row for each matched or unmatched row in table 1 (LEFT OUTER
                  JOIN)

                * all2: An output row for each matched or unmatched row in table 2  (RIGHT  OUTER
                  JOIN)

                * 1not2:  An output row only for rows which appear in the first table but are not
                  matched in the second table

                * 2not1: An output row only for rows which appear in the second table but are not
                  matched in the first table

                * 1xor2:  An  output row only for rows represented in one of the input tables but
                  not the other one
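
              The join options above can be illustrated with a small Python sketch. It assumes the
              matching step has already produced a list of (row1, row2) pairs; the function name
              and row labels are hypothetical, None stands for blank cells, and the row order that
              STILTS actually writes may differ.

```python
def join_rows(rows1, rows2, pairs, join):
    """Return output rows as (row1, row2) tuples for the given join mode."""
    matched1 = {r1 for r1, _ in pairs}
    matched2 = {r2 for _, r2 in pairs}
    both = list(pairs)                                        # matched pairs
    only1 = [(r, None) for r in rows1 if r not in matched1]   # unmatched, table 1
    only2 = [(None, r) for r in rows2 if r not in matched2]   # unmatched, table 2
    modes = {
        "1and2": both,
        "1or2": both + only1 + only2,
        "all1": both + only1,
        "all2": both + only2,
        "1not2": only1,
        "2not1": only2,
        "1xor2": only1 + only2,
    }
    return modes[join]


rows1, rows2 = ["a", "b", "c"], ["x", "y"]
pairs = [("a", "x")]                     # only a and x were matched
for mode in ["1and2", "1or2", "all1", "all2", "1not2", "2not1", "1xor2"]:
    print(mode, join_rows(rows1, rows2, pairs, mode))
```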

       find=all|best|best1|best2
              Determines what happens when a row in one table can be matched by more than one row
              in the other table. The options are:

                * all: All matches. Every match between the two tables is included in the result.
                  Rows from both of the input tables may appear multiple times in the result.

                * best: Best match, symmetric. The best pairs are selected in a way which  treats
                  the two tables symmetrically. Any input row which appears in one result pair is
                  disqualified from appearing in any other result pair, so  each  row  from  both
                  input tables will appear in at most one row in the result.

                * best1:  Best match for each Table 1 row. For each row in table 1, only the best
                  match from table 2 will appear in the result. Each row from table 1 will appear
                  a  maximum  of  once  in  the result, but rows from table 2 may appear multiple
                  times.

                * best2: Best match for each Table 2 row. For each row in table 2, only the  best
                  match from table 1 will appear in the result. Each row from table 2 will appear
                  a maximum of once in the result, but rows from  table  1  may  appear  multiple
                  times.

              The differences between best, best1 and best2 are a bit subtle. In cases where
              it's obvious which object in each table is the best match for which object in the
              other, choosing between these options will not affect the result. However, in
              crowded fields (where the distance between objects within one  or  both  tables  is
              typically  similar  to  or  smaller than the specified match radius) it will make a
              difference. In this case one of the asymmetric options (best1 or best2) is  usually
              more appropriate than best, but you'll have to think about which of them suits your
              requirements. The performance (time and memory usage) of the match may also  differ
              between these options, especially if one table is much bigger than the other.
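
              A toy one-dimensional example may make the distinction concrete. The Python sketch
              below is not the STILTS implementation: the function names and data are
              hypothetical, the score is taken to be absolute separation, and the symmetric
              "best" mode is approximated by greedily accepting pairs in order of increasing score.

```python
def candidate_pairs(t1, t2, radius):
    """All (score, id1, id2) within the radius, sorted by ascending score."""
    return sorted(
        (abs(x1 - x2), id1, id2)
        for id1, x1 in t1.items()
        for id2, x2 in t2.items()
        if abs(x1 - x2) <= radius
    )


def find_matches(t1, t2, radius, find):
    """Select matched (id1, id2) pairs according to the find mode."""
    if find not in ("all", "best", "best1", "best2"):
        raise ValueError(f"unknown find mode: {find}")
    pairs = candidate_pairs(t1, t2, radius)
    if find == "all":
        return sorted((a, b) for _, a, b in pairs)
    used1, used2, result = set(), set(), []
    for _, a, b in pairs:                       # best-scoring pairs come first
        if find in ("best", "best1") and a in used1:
            continue                            # table 1 row already has its match
        if find in ("best", "best2") and b in used2:
            continue                            # table 2 row already has its match
        used1.add(a)
        used2.add(b)
        result.append((a, b))
    return sorted(result)


t1 = {"A": 0.0, "B": 1.0, "C": 10.0}            # A and B crowd around X
t2 = {"X": 0.6, "Y": 9.8, "Z": 10.3}            # Y and Z crowd around C
print(find_matches(t1, t2, 1.0, "all"))    # [('A','X'), ('B','X'), ('C','Y'), ('C','Z')]
print(find_matches(t1, t2, 1.0, "best"))   # [('B','X'), ('C','Y')]
print(find_matches(t1, t2, 1.0, "best1"))  # [('A','X'), ('B','X'), ('C','Y')]
print(find_matches(t1, t2, 1.0, "best2"))  # [('B','X'), ('C','Y'), ('C','Z')]
```

              In this crowded toy case each mode gives a different result, whereas with
              well-separated points all three best options would agree.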

       fixcols=none|dups|all
              Determines  how  input  columns  are  renamed  before  use in the output table. The
              choices are:

                * none: columns are not renamed

                * dups: columns which would otherwise have duplicate names in the output will  be
                  renamed to indicate which table they came from

                * all: all columns will be renamed to indicate which table they came from

              If columns are renamed, the new names are determined by the suffix* parameters.

       suffix1=<label>
              If  the  fixcols  parameter  is set so that input columns are renamed for insertion
              into the output table, this parameter determines how the renaming is done. It gives
              a suffix which is appended to all renamed columns from table 1.

       suffix2=<label>
              If  the  fixcols  parameter  is set so that input columns are renamed for insertion
              into the output table, this parameter determines how the renaming is done. It gives
              a suffix which is appended to all renamed columns from table 2.
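
              The interaction of fixcols with suffix1 and suffix2 can be sketched in Python as
              below. The function name and column names are hypothetical, the defaults shown
              ("dups", "_1", "_2") are assumptions not stated on this page, and duplicate
              detection here uses exact name comparison, which may differ from STILTS.

```python
def output_column_names(cols1, cols2, fixcols="dups", suffix1="_1", suffix2="_2"):
    """Return the output column names for the joined table."""
    if fixcols not in ("none", "dups", "all"):
        raise ValueError(f"unknown fixcols value: {fixcols}")
    shared = set(cols1) & set(cols2)            # names present in both inputs

    def rename(cols, suffix):
        if fixcols == "none":
            return list(cols)                   # columns are not renamed
        if fixcols == "all":
            return [c + suffix for c in cols]   # every column gets its suffix
        # "dups": only names that would clash are renamed
        return [c + suffix if c in shared else c for c in cols]

    return rename(cols1, suffix1) + rename(cols2, suffix2)


cols1 = ["id", "ra", "dec"]
cols2 = ["ra", "dec", "mag"]
for mode in ["none", "dups", "all"]:
    print(mode, output_column_names(cols1, cols2, fixcols=mode))
```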

       scorecol=<col-name>
              Gives  the  name  of  a column in the output table to contain the "match score" for
              each pairwise match. The meaning of this column is dependent on the chosen matcher,
              but  it  typically  represents  a  distance  of  some kind between the two matching
              points. If a null value is chosen, no score column will be inserted in  the  output
              table. The default value of this parameter depends on matcher.

       progress=none|log|time|profile
              Determines  whether  information  on  progress of the match should be output to the
              standard error stream as it progresses.  For  lengthy  matches  this  is  a  useful
              reassurance  and  can give guidance about how much longer it will take. It can also
              be useful as a performance diagnostic.

              The options are:

                * none: no progress is shown

                * log: progress information is shown

                * time: progress information and some time profiling information are shown

                * profile: progress information and limited time/memory profiling information are
                  shown

       runner=parallel|parallel<n>|parallel-all|sequential|classic|partest
              Selects the threading implementation. The options are currently:

                * parallel:  uses  multithreaded  implementation  for  large tables, with default
                  parallelism, which is the smaller of 6 and the number of available processors

                * parallel<n>:  uses  multithreaded  implementation  for   large   tables,   with
                  parallelism given by the supplied value <n>

                * parallel-all:  uses  multithreaded  implementation  for  large  tables,  with a
                  parallelism given by the number of available processors

                * sequential: uses multithreaded implementation but with only a single thread

                * classic: uses legacy sequential implementation

                * partest: uses multithreaded implementation even when tables are small

              The parallel* options should normally run faster than sequential or classic (which
              are provided mainly for testing purposes), at least for large matches and where
              multiple processing cores are available.

              The default value "parallel" is currently limited  to  a  parallelism  of  6  since
              larger  values  yield  diminishing  returns  given  that some parts of the matching
              algorithms run  sequentially  (Amdahl's  Law),  and  using  too  many  threads  can
              sometimes  end  up  doing  more  work  or impacting on other operations on the same
              machine. But you can experiment with other concurrencies, e.g. "parallel16" to  run
              on 16 cores (if available) or "parallel-all" to run on all available cores.

              The  value  of this parameter should make no difference to the matching results. If
              you notice any discrepancies please report them.
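
              The parallelism rules and the diminishing-returns argument can be sketched in
              Python. The function names are hypothetical, only runner values whose thread count
              is stated above are handled, and the parallel fraction of 0.8 is an arbitrary
              illustrative figure, not a measured property of STILTS.

```python
import os
import re


def runner_parallelism(runner, n_processors=None):
    """Thread count implied by a runner value, per the descriptions above."""
    if n_processors is None:
        n_processors = os.cpu_count() or 1
    if runner == "parallel":
        return min(6, n_processors)        # default: smaller of 6 and core count
    if runner == "parallel-all":
        return n_processors
    m = re.fullmatch(r"parallel(\d+)", runner)
    if m:
        return int(m.group(1))             # parallel<n>: supplied value
    if runner in ("sequential", "classic"):
        return 1
    raise ValueError(f"parallelism not specified for runner: {runner}")


def amdahl_speedup(parallel_fraction, n_threads):
    """Amdahl's Law: speedup when only part of the work runs in parallel."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_threads)


# With an assumed parallel fraction of 0.8, going from 6 to 16 threads
# raises the ideal speedup only from about 3.0 to about 4.0.
for runner in ["parallel", "parallel16"]:
    n = runner_parallelism(runner, n_processors=16)
    print(runner, n, round(amdahl_speedup(0.8, n), 2))
```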

SEE ALSO

       stilts(1)

       If the package stilts-doc is installed, the full documentation  SUN/256  is  available  in
       HTML format:
       file:///usr/share/doc/stilts/sun256/index.html

VERSION

       STILTS version 3.4.7-debian

       This is the Debian version of Stilts, which lacks support for some file formats and
       network protocols. For differences see
       file:///usr/share/doc/stilts/README.Debian

AUTHOR

       Mark Taylor (Bristol University)

                                             Mar 2017                           STILTS-TMATCH2(1)