Provided by: msort_8.53-2.3build1_amd64

NAME

       msort - sort records in complex ways

SYNOPSIS

       msort <options> [<input file>]

DESCRIPTION

       msort  is  a  program  for  sorting  text  files  in sophisticated ways.  It was developed
       initially for alphabetizing dictionaries of languages in which the ordering may  be  quite
       different from that of English, but it has many other uses.

       msort  allows  you  to  sort blocks of text delimited in a number of ways rather than just
       lines and to specify particular fields of  a  record  as  sort  keys  using  either  their
       position, counted from either end, or by matching regular expressions to their tags.

       msort is capable of sorting on multiple keys, so that when two records tie on one key, the
       tie may be broken on another. Any or all keys may be optional.  How absent  optional  keys
       are ordered with respect to present keys may be set separately for each key.
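
       The tie-breaking behaviour described above can be  illustrated  with  a  short  Python
       sketch. It is not msort's implementation; the record layout, the key names "surname" and
       "year", and the helper functions are all hypothetical.

```python
from functools import cmp_to_key

def compare_key(a, b, absent_first=True):
    """Compare two key values, either of which may be None (absent)."""
    if a is None and b is None:
        return 0
    if a is None:
        return -1 if absent_first else 1
    if b is None:
        return 1 if absent_first else -1
    return (a > b) - (a < b)

def compare_records(r1, r2):
    # First key: surname (always present in this sketch).
    result = compare_key(r1["surname"], r2["surname"])
    if result == 0:
        # Second key: year (optional); absent values sort after present ones.
        result = compare_key(r1.get("year"), r2.get("year"), absent_first=False)
    return result

records = [
    {"surname": "Poser", "year": 2005},
    {"surname": "Adams"},
    {"surname": "Adams", "year": 1999},
]
ordered = sorted(records, key=cmp_to_key(compare_records))
```

       Here the two "Adams" records tie on the first key, so the second  key  decides,  and  the
       record lacking a year is placed after the one that has it.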

       msort  allows  you  to  specify  arbitrary  sort  orders and to define virtually unlimited
       numbers of multigraphs of effectively unlimited length.  The sort  order  and  multigraphs
       are  defined  separately for each key. If your system has locale support, you can also use
       locale collation rules instead of specifying your own sort order.
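
       As a rough illustration of what a custom sort order  with  multigraphs  means,  the
       following Python sketch treats "ch" and "ll" as single letters that follow "c" and "l".
       The order list and function names are assumptions made for this example and do not
       reflect msort's sort order file format or internals.

```python
def tokenize(word, multigraphs):
    """Split word into collation units, preferring the longest multigraph."""
    units, i = [], 0
    by_length = sorted(multigraphs, key=len, reverse=True)
    while i < len(word):
        for m in by_length:
            if word.startswith(m, i):
                units.append(m)
                i += len(m)
                break
        else:
            # No multigraph starts here: take a single character.
            units.append(word[i])
            i += 1
    return units

# Hypothetical order: "ch" is one letter after "c", "ll" one letter after "l".
ORDER = ["a", "b", "c", "ch", "d", "e", "h", "i", "l", "ll", "m", "n", "o", "u", "z"]
RANK = {unit: n for n, unit in enumerate(ORDER)}

def sort_key(word):
    return [RANK[u] for u in tokenize(word, ["ch", "ll"])]

words = ["chico", "cuna", "luz", "llama", "coche"]
ordered = sorted(words, key=sort_key)
```

       With this order, "chico" sorts after "cuna" and "llama" after "luz", which  plain
       character-by-character comparison would not produce.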

       msort provides twelve types of key comparison:  lexicographic,  numeric,  numeric  string,
       hybrid, by string length, by angle, by date, by domain name, by time, by ISO8601 date/time
       stamp, by month name, and random.

       Which month names are recognized depends on the -s flag and on locale support.  If  the  -s
       flag  is used on the same key and its argument is the name of a file, the month names are
       read from the file, which should
       be in the same format as a sort order definition file. If the -s  flag  is  used  and  its
       argument  is  a  locale  name,  the  month  names  recognized  will be the month names and
       abbreviations associated with the specified locale. If the -s flag is not used  the  month
       names  recognized  will  be  the month names and abbreviations associated with the current
       locale. If your system does not have locale support and you do not use the -s flag to read
       the  month  names  from a file, the month names recognized will be the English month names
       and abbreviations.

       msort can reverse the characters in a key, allowing it to  be  used  to  generate  reverse
       dictionaries.
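
       The idea of a reverse dictionary can be sketched in Python as sorting on each word read
       from its last character to its first. The word list is made up for illustration, and the
       sketch says nothing about how msort itself reverses keys.

```python
words = ["cat", "bat", "dog", "hot"]

# Sort on the reversed spelling, so words are grouped by their endings.
reverse_sorted = sorted(words, key=lambda w: w[::-1])
```

       Words ending in the same letters ("bat", "cat", "hot") end up next to one another.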

       A choice of sorting algorithms is provided.

       msort  fully supports Unicode. The text to be sorted, and all specifications, should be in
       UTF-8 Unicode. (If you have plain ASCII text, this is not a problem as ASCII is  a  subset
       of  Unicode.)  Full  Unicode case-folding is available, in Turkic and non-Turkic variants.
       Unicode normalization is performed before sorting.

       For usage information, execute msort with no arguments.

       Full information about msort is currently to be found in the reference  manual,  which  is
       distributed  as a PDF (Portable Document Format) file. If a copy is not available locally,
       you can download it from msort's home page:
       http://billposer.org/Software/msort.html

OPTIONS

   Informational options
       -h,--help
              Print usage message

       -v,--version
              Print version message

       -D,--defaults
              List defaults

       -F,--general-options
              List general command line options

       -G,--gnu-equivalences
              List equivalents for GNU sort command line options.

       -H,--informational-options
              List informational command line options

       -K,--key-specific-options
              List key-specific command line options

       -L,--limits
              List limits

       -N,--number-systems
              List the supported number systems.

   General options
       -b,--block
              A record is terminated by two or more newlines

       -l,--line
              A record consists of a single line

       -r,--record-separator <separator>
              A record is terminated by the specified separator character

       -O,--fixed-size-record <bytes>
              A record consists of the specified number of bytes.

       -d,--field-separators <character>+
              Fields are delimited by the named character(s)

       -w,--whole
              Sort on the entire text of the record

       -a,--algorithm <algorithm>
              Use the specified sort algorithm. The choices  are:  I(nsertionSort),  M(ergeSort),
              Q(uickSort),  and  S(hellSort).   Note that InsertionSort and MergeSort are stable,
              while QuickSort and ShellSort are unstable. The default is QuickSort.
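
              What stability means can be shown with  Python's  built-in  sorted(),  which  is
              guaranteed to be stable. The sketch illustrates the property only; it does not use
              or reproduce any of msort's algorithms, and the records are invented.

```python
records = [("pear", 2), ("fig", 1), ("plum", 2), ("kiwi", 1)]

# Sort on the number only. A stable sort keeps records with equal keys
# in the order in which they appeared in the input.
by_count = sorted(records, key=lambda r: r[1])
```

              An unstable algorithm would be free to emit "kiwi" before "fig" or  "plum"  before
              "pear", since those pairs compare as equal.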

       -M,--initial-maximum-records <records>
              Set initial maximum number of records

       -m,--line-end-carriage-return
              End-of-line in the input data is  marked  by  Carriage  Return  (0x0D)  as  on  the
              Macintosh rather than by Line Feed (0x0A) as on Unix systems.

       -I,--invert-globally
              Invert sense of comparisons globally

       -B,--BMP
              No characters fall outside the Basic Multilingual Plane (that is, none has a  value
              greater than 0xFFFF).

       -Z,--skip-first-record
              Copy the first record in the input to the output without sorting it. This is useful
              for sorting files with a header.

       -p,--reserve-private-use-area
              Do  not  make  internal  use  of the Private Use areas. By default, multigraphs are
              assigned internally to codepoints in the Supplementary Private Use  areas  if  full
              Unicode  is  in use or to codepoints in the Private Use area if input is restricted
              to the Basic Multilingual Plane by means of the -B option. If your input makes  use
              of  the  Private  Use  areas, this option prevents interference with your input. In
              this case, multigraphs will be  assigned  to  the  Low  and  High  Surrogate  areas
              (0xD800-0xDFFF). Note that this limits the number of multigraphs to 2,048.
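
              The limit of 2,048 follows directly from the size of the surrogate range, as this
              small Python check shows (the variable names are chosen for the example):

```python
LOW, HIGH = 0xD800, 0xDFFF

# Number of codepoints in the inclusive range 0xD800-0xDFFF.
surrogate_codepoints = HIGH - LOW + 1
```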

       -P,--random-seed <seed>
              Set the seed for the random number generator. If not set here, it is set to a value
              determined by the time. The seed used is reported in the log.  This  option  allows
              runs to be replicated.

       -Q,--check-only
              Check whether the input is already sorted. Do not generate any output.  Exit status
              is 0 if input is already sorted, 11 if not sorted.

       -1,--in <input file name>

       -2,--out <output file name>
              If the output file is  the  same  as  the  input  file,  the  input  file  will  be
              overwritten. The input file will not be overwritten if the run is unsuccessful.

       -j,--suppress-log
              Suppress output to the log. If this flag is given before there is any output to the
              log from a command line flag, nothing will be written to the log and the  log  file
              will  not  be  created.  If a command line flag generates a log message before this
              flag is processed, the log file will be created but no log messages will be written
              to  it  once  this  flag is processed. To guarantee that no attempt will be made to
              open a log file, give this flag first.

       -q,--quiet
              Be quiet - do not chat while working

       -u,--unicode-normalization <mode>
              Select Unicode normalization mode. The choices of mode  are:  c  for  normalization
              form C (NFC), d for normalization form D (NFD), C for normalization form KC (NFKC),
              D for normalization form KD (NFKD), and n for no normalization. The default is NFC.
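
              The effect of the four normalization forms can be tried out with Python's standard
              unicodedata module. The mapping from the mode letters to form names follows the
              list above; the function itself is a hypothetical helper, not part of msort.

```python
import unicodedata

# Mode letters as listed for -u; "n" leaves the text unchanged.
FORMS = {"c": "NFC", "d": "NFD", "C": "NFKC", "D": "NFKD"}

def normalize(text, mode="c"):
    if mode == "n":
        return text
    return unicodedata.normalize(FORMS[mode], text)

decomposed = "e\u0301"                  # "e" followed by a combining acute accent
composed = normalize(decomposed, "c")   # the single codepoint U+00E9
ligature = normalize("\ufb01", "C")     # compatibility form of the "fi" ligature
```

              Forms C and D differ only in whether characters are composed or  decomposed;  the
              K forms additionally replace compatibility characters such as ligatures.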

   Key specific options
       -e,--character-range <m,n>
              Sort on characters m through n. Positive indices start from one.  Negative  indices
              indicate  position  with  respect to the end of the record.  For example, the range
              3,-2 consists of the third character through the next-to-last character.
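
              The indexing rule can be written out as a short Python sketch. The function name
              is hypothetical, and the sketch assumes that -1 denotes the last character, which
              is consistent with -2 denoting the next-to-last one.

```python
def char_range(record, m, n):
    """Return characters m through n inclusive; indices are 1-based,
    and negative indices count back from the end of the record."""
    length = len(record)
    start = m if m > 0 else length + m + 1
    end = n if n > 0 else length + n + 1
    return record[start - 1:end]

key = char_range("abcdef", 3, -2)
```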

       -n,--position <POS>(,<POS>)
              Sort on the specified POS or contiguous range of POSs, where a POS is of  the  form
              <field  number>(.<character  number>). Both counts begin at one.  Field numbers but
              not character numbers may be negative, in which case  they  are  counted  from  the
              right.  Thus,  1.2  is  the  second character of the first field; -2.1 is the first
              character of the next to last field.
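
              A minimal Python sketch of how a single POS might be resolved is given below.  It
              assumes space-separated fields and a hypothetical helper named resolve; it covers
              only single POSs, not ranges, and is not taken from msort's source.

```python
def resolve(fields, pos):
    """Resolve a POS of the form <field number>(.<character number>)."""
    field_part, _, char_part = pos.partition(".")
    field_no = int(field_part)
    # Positive field numbers count from the left, starting at one;
    # negative field numbers count from the right.
    field = fields[field_no - 1] if field_no > 0 else fields[field_no]
    if not char_part:
        return field
    # Character numbers are always positive and start at one.
    return field[int(char_part) - 1]

fields = "alpha beta gamma delta".split()
second_char_of_first = resolve(fields, "1.2")
first_char_of_next_to_last = resolve(fields, "-2.1")
```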

       -t,--tag <tag regexp>
              Sort on the field with the specified tag

       -o,--optional <comparison>
              Mark the key as optional. The argument (<, =, or >) specifies how  a  record  that
              lacks the key compares to a record in which the key is present.

       -C,--fold-case
              Fold case

       -z,--fold-case-turkic
              Fold case with additional Turkic conversions.
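
              The difference between the two kinds of folding can be sketched in Python. Standard
              folding uses str.casefold(); the Turkic variant below first applies the  two
              Turkic mappings from the Unicode case folding data (dotted capital I to "i", plain
              capital I to dotless "ı"). The function is a hypothetical helper for illustration.

```python
def fold(text, turkic=False):
    if turkic:
        # Turkic languages pair dotted İ with i and plain I with dotless ı.
        text = text.replace("\u0130", "i").replace("I", "\u0131")
    return text.casefold()

plain = fold("ISTANBUL")
turkic = fold("ISTANBUL", turkic=True)
```

              Without the Turkic conversions "I" folds to "i"; with them it folds to dotless "ı".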

       -c,--comparison-type <comparison type>
              a(ngle), l(exicographic), i(so8601 date/time), t(ime), D(omain name/email address),
              d(ate), m(onth name), n(umeric), N(umeric string), s(ize), h(ybrid), r(andom)

       -y,--number-system <number system>
              Specifies  the  number  system expected for this key. This affects only numeric and
              numeric string keys. There are two special values. If the number system  is  "all",
              records  may  contain any number system that msort can interpret. Different records
              may contain different number systems.  If the number system is "any",  records  may
              contain  any  number system that msort can interpret, but all records must make use
              of the same number system.  msort sets the number system on the basis of the  first
              record.

       -f,--date-format <date format>
              Permutation of ymd with separators, e.g. y-m-d for international date format, m/d/y
              for American date format, or a permutation of yd with  separators,  e.g.  y-d,  for
              day-of-year  dates.  All  three  components  may be numbers in any available number
              system. The month field may also be a month name, determined by the same devices as
              independent month name fields.
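
              How a format string can drive date comparison is sketched below in Python. The
              helper is hypothetical and deliberately simplified: it handles numeric components
              only (no month names) and does no validation.

```python
import re

def date_key(text, fmt):
    """Turn a date into a (year, month, day) tuple according to a format
    such as "y-m-d", "m/d/y" or "y-d"."""
    letters = re.findall(r"[ymd]", fmt)
    numbers = [int(n) for n in re.findall(r"\d+", text)]
    parts = dict(zip(letters, numbers))
    # Day-of-year formats have no month; use 0 so the tuple still compares.
    return (parts["y"], parts.get("m", 0), parts["d"])

dates = ["12/25/2003", "3/15/2004", "1/2/2004"]
ordered = sorted(dates, key=lambda d: date_key(d, "m/d/y"))
```

              Comparing the tuples orders dates chronologically regardless of the order in which
              the components are written.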

       -W,--sort-order-file-separators <file name>
              Read  the  list  of  characters  to  be  treated  as  separators  in the sort order
              definition file.

       -S,--substitutions <file name>
              Read substitutions from named file

       -s,--sort-order <file name>|<locale name>|"locale"
              If the argument is a file name, it is taken to be a sort order file  and  the  sort
              order  for  the  key  is  read from the file. If the argument is a locale name, the
              collation rules for that  locale  are  used.  If  the  argument  is  "locale",  the
              collation rules for the current locale are used.

       -T,--transformations <(d)(e)(s)>
              Apply  the  specified  transformations.   d  specifies  that  diacritics  are to be
              stripped. Separately encoded combining  diacritics  are  removed.  Characters  with
              diacritics  represented  by  single  codepoints are replaced with the corresponding
              ASCII character without the diacritics, if there is one.  e specifies that enclosed
              characters,  that  is, characters within circles or parentheses, are to be replaced
              with the corresponding plain ASCII character if there is  one.   s  specifies  that
              characters  in special styles are to be replaced with the corresponding plain ASCII
              character if there is one. Stylistic  equivalents  include:  small  capitals  (e.g.
              U+1D04),  script  forms  (e.g.  U+212C),  black  letter forms (e.g. U+212D), Arabic
              presentation  forms  (e.g.  U+FE81),  Hebrew  presentation  forms  (e.g.   U+FB1D),
              fullwidth  forms (e.g. U+FF01), halfwidth forms (e.g. U+FF7B), and the mathematical
              alphanumeric symbols (e.g. U+1D400).
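
              Two of these transformations can be approximated with Python's unicodedata module.
              This is only an approximation under stated assumptions: diacritics are stripped by
              decomposing and dropping combining marks, and styled forms are replaced by way of
              compatibility normalization (NFKC), which covers forms such as fullwidth and
              mathematical alphanumerics but not, for example, small capitals. msort's own
              tables may differ.

```python
import unicodedata

def strip_diacritics(text):
    """Decompose characters, then drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def plain_style(text):
    """Replace compatibility (styled) forms with their plain equivalents."""
    return unicodedata.normalize("NFKC", text)

stripped = strip_diacritics("caf\u00e9")
bold_a = plain_style("\U0001D400")      # mathematical bold capital A
fullwidth = plain_style("\uff01")       # fullwidth exclamation mark
```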

       -x,--exclusion-file <file name>
              Read exclusions from named file

       -X,--exclude-characters <exclusions>
              Exclude specified characters

       -i,--invert-locally
              Invert sense of comparisons

       -R,--reverse-key
              Reverse characters of key

       -A,--first-character-only
              Ignore all but the first character of the field, after  substitutions,  exclusions,
              etc.

       Note: long options may not be available on your system.

SEE ALSO

       sort(1), uninum(3)

AUTHOR

       Bill Poser (billposer@alum.mit.edu)

LICENSE

       GNU General Public License (http://www.gnu.org/licenses/gpl.html), version 3.