lunar (1) agrep.1.gz

Provided by: glimpse_4.18.7-6_amd64 bug

NAME

       agrep  -  search  a  file  for  a  string or regular expression, with approximate matching
       capabilities

SYNOPSIS

       agrep [ -#cdehiklnpstvwxBDGIS ] pattern [ -f patternfile ] [ filename... ]

DESCRIPTION

       agrep searches the input filenames (standard input is the default, but see a warning under
       LIMITATIONS)  for records containing strings which either exactly or approximately match a
       pattern.  A record is by default a line, but it can be defined differently  using  the  -d
       option  (see  below).   Normally,  each  record  found  is  copied to the standard output.
       Approximate matching allows finding records that contain the pattern with  several  errors
       including  substitutions,  insertions,  and  deletions.  For example, Massechusets matches
       Massachusetts with two errors (one substitution and  one  insertion).   Running  agrep  -2
       Massechusets foo outputs all lines in foo containing any string with at most 2 errors from
       Massechusets.

       agrep supports many kinds of queries including arbitrary wild cards, sets of patterns, and
       in  general,  regular  expressions.   See PATTERNS below.  It supports most of the options
       supported by the grep family plus several more (but it is not 100% compatible with  grep).
       For  more  information  on  the  algorithms  used  by  agrep see Wu and Manber, "Fast Text
       Searching  With  Errors,"  Technical  report  #91-11,  Department  of  Computer   Science,
       University  of  Arizona,  June  1991  (available  by  anonymous ftp from cs.arizona.edu in
       agrep/agrep.ps.1), and Wu and Manber, "Agrep  --  A  Fast  Approximate  Pattern  Searching
       Tool",  To  appear  in  USENIX  Conference  1992  January (available by anonymous ftp from
       cs.arizona.edu in agrep/agrep.ps.2).

       As with the rest of the grep family, the characters `$', `^', `',  `[',  `]',  `^',  `|',
       `(', `)', `!', and `\' can cause unexpected results when included in the pattern, as these
       characters are also meaningful to the shell.  To avoid these problems, one  should  always
       enclose  the entire pattern argument in single quotes, i.e., 'pattern'.  Do not use double
       quotes (").

       When agrep is applied to more than one input file, the  name  of  the  file  is  displayed
       preceding  each  line  which  matches  the  pattern.   The  filename is not displayed when
       processing a single file, so if you actually want the filename to appear, use /dev/null as
       a second file in the list.

OPTIONS

       -#     #  is  a  non-negative  integer (at most 8) specifying the maximum number of errors
              permitted in finding the approximate matches (defaults to zero).   Generally,  each
              insertion, deletion, or substitution counts as one error.  It is possible to adjust
              the relative cost of insertions, deletions and substitutions  (see  -I  -D  and  -S
              options).

       -c     Display only the count of matching records.

       -d 'delim'
              Define  delim  to  be the separator between two records.  The default value is '$',
              namely a record is by default a line.  delim can be a string  of  size  at  most  8
              (with  possible  use  of  ^ and $), but not a regular expression.  Text between two
              delim's, before the first delim, and after the last  delim  is  considered  as  one
              record.  For example, -d '$$' defines paragraphs as records and -d '^From ' defines
              mail messages as records.  agrep matches each record separately.  This option  does
              not currently work with regular expressions.

       -e pattern
              Same as a simple pattern argument, but useful when the pattern begins with a `-'.

       -f patternfile
              patternfile  contains  a  set  of  (simple) patterns.  The output is all lines that
              match at least one of the patterns in patternfile.  Currently, the -f option  works
              only  for  exact match and for simple patterns (any meta symbol is interpreted as a
              regular character); it is compatible only with -c, -h, -i, -l, -s, -v, -w,  and  -x
              options.  see LIMITATIONS for size bounds.

       -h     Do not display filenames.

       -i     Case-insensitive search — e.g., "A" and "a" are considered equivalent.

       -k     No  symbol  in  the  pattern is treated as a meta character.  For example, agrep -k
              'a(b|c)*d' foo  will  find  the  occurrences  of  a(b|c)*d  in  foo  whereas  agrep
              'a(b|c)*d'  foo  will  find  substrings  in  foo  that match the regular expression
              'a(b|c)*d'.

       -l     List only the files that contain a match.  This option is useful  for  looking  for
              files  containing a certain pattern.  For example, " agrep -l 'wonderful'  * " will
              list the  names  of  those  files  in  current  directory  that  contain  the  word
              'wonderful'.

       -n     Each line that is printed is prefixed by its record number in the file.

       -p     Find records in the text that contain a supersequence of the pattern.  For example,
               agrep -p DCS foo will match "Department of Computer Science."

       -s     Work  silently, that is, display nothing except error messages.  This is useful for
              checking the error status.

       -t     Output the record starting from the end of delim to (and including) the next delim.
              This is useful for cases where delim should come at the end of the record.

       -v     Inverse mode — display only those records that do not contain the pattern.

       -w     Search for the pattern as a word — i.e., surrounded by non-alphanumeric characters.
              The non-alphanumeric must surround the match;  they cannot be  counted  as  errors.
              For example, agrep -w -1 car will match cars, but not characters.

       -x     The pattern must match the whole line.

       -y     Used  with  -B  option.  When  -y  is on, agrep will always output the best matches
              without giving a prompt.

       -B     Best match mode.  When -B is specified and no exact matches are found,  agrep  will
              continue to search until the closest matches (i.e., the ones with minimum number of
              errors) are found, at which point the following message will be  shown:  "the  best
              match  contains  x  errors, there are y matches, output them? (y/n)" The best match
              mode is not supported for standard input, e.g., pipeline input.  When the  -#,  -c,
              or  -l  options  are  specified,  the  -B option is ignored.  In general, -B may be
              slower than -#, but not by very much.

       -Dk    Set the cost of a deletion to k (k is a positive integer).  This  option  does  not
              currently work with regular expressions.

       -G     Output the files that contain a match.

       -Ik    Set  the cost of an insertion to k (k is a positive integer).  This option does not
              currently work with regular expressions.

       -Sk    Set the cost of a substitution to k (k is a positive integer).   This  option  does
              not currently work with regular expressions.

PATTERNS

       agrep supports a large variety of patterns, including simple strings, strings with classes
       of characters, sets of strings, wild cards, and regular expressions.

       Strings
              any sequence of characters, including the special symbols `^' for beginning of line
              and `$' for end of line.  The special characters listed above ( `$', `^', `', `[',
              `^', `|', `(', `)', `!', and `\' ) should be preceded by `\'  if  they  are  to  be
              matched  as  regular  characters.   For  example, \^abc\\ corresponds to the string
              ^abc\, whereas ^abc corresponds to the string abc at the beginning of a line.

       Classes of characters
              a list of characters inside [] (in order) corresponds to  any  character  from  the
              list.   For  example, [a-ho-z] is any character between a and h or between o and z.
              The symbol `^' inside [] complements the list.   For  example,  [^i-n]  denote  any
              character  in  the  character set except character 'i' to 'n'.  The symbol `^' thus
              has two meanings, but this is consistent with egrep.  The symbol `.'  (don't  care)
              stands for any symbol (except for the newline symbol).

       Boolean operations
              agrep  supports  an  `and'  operation  `;'  and  an  `or'  operation `,', but not a
              combination  of  both.   For  example,  'fast;network'  searches  for  all  records
              containing both words.

       Wild cards
              The  symbol  '#'  is  used  to denote a wild card.  # matches zero or any number of
              arbitrary characters.   For  example,  ex#e  matches  example.   The  symbol  #  is
              equivalent  to  .*  in  egrep.   In  fact,  .* will work too, because it is a valid
              regular expression (see below), but unless  this  is  part  of  an  actual  regular
              expression, # will work faster.

       Combination of exact and approximate matching
              any  pattern inside angle brackets <> must match the text exactly even if the match
              is with errors.  For example, <mathemat>ics matches  mathematical  with  one  error
              (replacing  the last s with an a), but mathe<matics> does not match mathematical no
              matter how many errors we allow.

       Regular expressions
              The syntax of regular expressions in agrep is in  general  the  same  as  that  for
              egrep.   The  union  operation  `|', Kleene closure `*', and parentheses () are all
              supported.  Currently '+' is not  supported.   Regular  expressions  are  currently
              limited to approximately 30 characters (generally excluding meta characters).  Some
              options (-d, -w, -f, -t, -x, -D,  -I,  -S)  do  not  currently  work  with  regular
              expressions.   The maximal number of errors for regular expressions that use '*' or
              '|' is 4.

EXAMPLES

       agrep -2 -c ABCDEFG foo
              gives the number of lines in file foo that contain ABCDEFG within two errors.

       agrep -1 -D2 -S2 'ABCD#YZ' foo
              outputs the lines containing ABCD followed, within arbitrary distance, by YZ,  with
              up  to  one  additional insertion (-D2 and -S2 make deletions and substitutions too
              "expensive").

       agrep -5 -p abcdefghij /path/to/dictionary/words
              outputs the list of all words containing at least 5 of the first 10 letters of  the
              alphabet  in  order.   (Try  it:   any  list starting with academia and ending with
              sacrilegious must mean something!)

       agrep -1 'abc[0-9](de|fg)*[x-z]' foo
              outputs the lines containing, within up to one error, the string that  starts  with
              abc followed by one digit, followed by zero or more repetitions of either de or fg,
              followed by either x, y, or z.

       agrep -d '^From ' 'breakdown;internet' mbox
              outputs all mail messages (the pattern '^From ' separates mail messages in  a  mail
              file) that contain keywords 'breakdown' and 'internet'.

       agrep -d '$$' -1 '<word1> <word2>' foo
              finds  all  paragraphs that contain word1 followed by word2 with one error in place
              of the blank.  In particular, if word1 is the last word in a line and word2 is  the
              first word in the next line, then the space will be substituted by a newline symbol
              and it will match.  Thus, this is a way to overcome separation by a newline.   Note
              that  -d  '$$'  (or  another  delim  which  spans more than one line) is necessary,
              because otherwise agrep searches only one line at a time.

       agrep '^agrep' <this manual>
              outputs all the examples of the use of agrep in this man pages.

SEE ALSO

       ed(1), ex(1), grep(1V), sh(1), csh(1).

BUGS/LIMITATIONS

       Any bug reports or comments will be appreciated!  Please mail them to sw@cs.arizona.edu or
       udi@cs.arizona.edu

       Regular  expressions  do  not  support  the '+' operator (match 1 or more instances of the
       preceding token).  These can be searched for by using this syntax in the pattern:

          'pattern(pattern)*'

       (search for strings containing one  instance  of  the  pattern,  followed  by  0  or  more
       instances of the pattern).

       The following can cause an infinite loop: agrep pattern * > output_file.  If the number of
       matches is high, they may be deposited in output_file before it is completely read leading
       to  more  matches  of  the  pattern  within output_file (the matches are against the whole
       directory).  It's not clear whether this is a "bug"  (grep  will  do  the  same),  but  be
       warned.

       The  maximum  size  of  the  patternfile is limited to be 250Kb, and the maximum number of
       patterns is limited to be 30,000.

       Standard input is the default if no input file is given.  However, if  standard  input  is
       keyed  in directly (as opposed to through a pipe, for example) agrep may not work for some
       non-simple patterns.

       There is no size limit for simple  patterns.   More  complicated  patterns  are  currently
       limited  to  approximately  30 characters.  Lines are limited to 1024 characters.  Records
       are limited to 48K, and may be truncated if they are  larger  than  that.   The  limit  of
       record length can be changed by modifying the parameter Max_record in agrep.h.

DIAGNOSTICS

       Exit  status is 0 if any matches are found, 1 if none, 2 for syntax errors or inaccessible
       files.

AUTHORS

       Sun Wu and Udi Manber, Department of Computer Science, University of Arizona,  Tucson,  AZ
       85721.  {sw|udi}@cs.arizona.edu.

                                           Jan 17, 1992                                  AGREP(1)