lunar (1) pdfgrep.1.gz

Provided by: pdfgrep_2.1.2-1build1_amd64 bug

NAME

       pdfgrep - search PDF files for a regular expression

SYNOPSIS

       pdfgrep [OPTION...] PATTERN [FILE...]
       pdfgrep [OPTION...] [-e PATTERN | -f FILE] [FILE...]

DESCRIPTION

       Search for PATTERN in each PDF FILE and print matching lines. By default, PATTERN is an
       extended regular expression.

       pdfgrep tries to be mostly compatible with GNU grep with some PDF-specific distinctions
       and additional options. Most notably, -n prints page instead of line numbers.

OPTIONS

   General Information
       --help
           Print a short summary of the options.

       -V, --version
           Show version information.

   Pattern Interpretation
       -F, --fixed-strings
           Interpret PATTERN as a list of fixed strings separated by newlines, any of which is to
           be matched.

       -P, --perl-regexp
           Interpret PATTERN as a Perl compatible regular expression (PCRE). See pcresyntax(3)
           for a quick overview.

   Matching Control
       -e PATTERN, --regexp=PATTERN
           Use PATTERN as the pattern to search for. If this option is specified multiple times
           or combined with --file, all patterns are tried in turn until one of them matches.

       -f FILE, --file=FILE
           Read patterns from FILE, one per line. If FILE contains multiple patterns or if this
           option is applied multiple times or combined with -e, all patterns are tried in turn
           until one of them matches. An empty pattern list matches nothing.

       -i, --ignore-case
           Ignore case distinctions in both the PATTERN and the input files.

   General Output Control
       -c, --count
           Suppress normal output. Instead print the number of matches for each input file. Note
           that unlike grep, multiple matches on the same page will be counted individually.

       -p, --page-count
           Like -c, but prints the number of matches per page. Implies -n.

       --color WHEN
           Surround file names, page numbers and matched text with escape sequences to display
           them in color on the terminal.  WHEN can be:

           always   Always use colors, even when
                    stdout is not a terminal.
           never    Do not use colors.
           auto     Use colors only when stdout is a
                    terminal (this is the default).

       -L, --files-without-match
           Suppress normal output. Instead print the name of each input file that doesn’t contain
           a match. This works well with -Z, but many other output options like -n or -c are
           ignored when -L is specified.

       -l, --files-with-matches
           Suppress normal output. Instead print the name of each input file that contains a
           match. This works well with -Z, but many other output options like -n or -c are
           ignored when -l is specified.

       -m, --max-count NUM
           Stop reading a file after NUM matches. When the -c or --count option is also used,
           pdfgrep does not output a count greater than NUM.

       -o, --only-matching
           Print only the matched part of a line without any surrounding context.

       -q, --quiet
           Suppress all normal output to stdout. Exit immediately with exit status 0 if a match
           is found, even in case of errors. Use this if you only care about the presence of
           matches, not their number or content.

   Line Prefix Control
       -H, --with-filename
           Print the file name for each match. This is the default setting when there is more
           than one file to search.

       -h, --no-filename
           Suppress the prefixing of file name on output. This is the default setting when there
           is only one file to search.

       -n, --page-number
           Prefix each match with the number of the page where it was found.

       -Z, --null
           Output a null byte (called NUL in ASCII and '\0' in C) instead of the colon that
           usually separates a filename from the rest of the line. This option makes the output
           unambiguous in the presence of colons, spaces or newlines in the filename. It can be
           used in conjunction with commands such as xargs -0 or perl -0.

       --match-prefix-separator SEP
           Changes the colon used to separate filename, line number and text in the output to
           SEP, which can be an arbitrary string. This is useful when filenames contain colons,
           but only for interactive usage. For scripting, --null should be used.

   Context Control
       -A NUM, --after-context=NUM
           Print NUM lines of context after matching lines. Contiguous groups of matches are
           separated by a line containing --. With -o, this option has no effect.

       -B NUM, --before-context=NUM
           Print NUM lines of context before matching lines. Contiguous groups of matches are
           separated by a line containing --. With -o, this option has no effect.

       -C NUM, --context=NUM
           Print NUM lines of context before and after matching lines. Contiguous groups of
           matches are separated by a line containing --. With -o, this option has no effect.

   File Selection
       -r, --recursive
           Recursively search all files (restricted by --include and --exclude) under each
           directory, following symlinks only if they are on the command line.

       -R, --dereference-recursive
           Same as -r, but follows all symlinks.

       --exclude=GLOB
           Skip files whose base name matches GLOB. See glob(7) for wildcards you can use. You
           can use this option multiple times to exclude more patterns. It takes precedence over
           --include. Note, that in- and excludes apply only to files found via --recursive and
           not to the argument list.

       --include=GLOB
           Only search files whose base name matches GLOB. See --exclude for details. The default
           is *.pdf.

   Other Options
       --cache
           Use a cache for the rendered text to speed up the operation on large files.

       --password=PASSWORD
           Use PASSWORD to decrypt the PDF-files. Can be specified multiple times; all passwords
           will be tried on all PDFs.  Note that this password will show up in your command
           history and the output of ps(1). So please do not use this if the security of PASSWORD
           is important.

       --page-range=RANGE
           Limit search to a specified set of pages.  RANGE is a comma separated list of either a
           single page number or a range expression of the form PAGE1-PAGE2. Example: 2-3,5,7-10.

       --debug
           Enable debug output.  Note: Due to limitations of poppler before version 0.30.0, some
           debug output is also printed without --debug when using such a poppler version.

       --warn-empty
           Print a warning to stderr if a PDF contains no searchable text. This is the case for
           PDFs that consist only of images, for example scanned documents.

       --unac
           Remove accents and ligatures from both the search pattern and the PDF documents. This
           is useful if you want to search for a word containing "ae", but the PDF uses the
           single character "æ" instead. See unac(3) and unaccent(1) for details.

           This option is experimental and only available if pdfgrep is compiled with unac
           support.

EXIT STATUS

       Normally, the exit status is 0 if at least one match is found, 1 if no match is found and
       2 if an error occurred. But if the --quiet or -q option is used and a match was found,
       pdfgrep will return 0 regardless of errors.

ENVIRONMENT VARIABLES

       The behavior of pdfgrep is affected by the following environment variable.

       GREP_COLORS
           Specifies the colors and other attributes used to highlight various parts of the
           output. The syntax and values are like GREP_COLORS of grep. See grep(1) for more
           details. Currently only the capabilities mt, ms, mc, fn, ln and se are used by
           pdfgrep, where mt, ms and mc have the same effect.

FILES

       ${XDG_CACHE_HOME}/pdfgrep/*
           Cache files written and used when --cache is enabled. At most 200 cache entries older
           than a day are retained.

EXAMPLES

       Print the first ten lines matching pattern and print their page number:

               pdfgrep -n --max-count 10 pattern foo.pdf

       Search all .pdf files whose names begin with foo recursively in the current directory:

               pdfgrep -r --include "foo*.pdf" pattern

       Search all PDFs in the current directory for foo that also contain bar:

               pdfgrep -Z --files-with-matches "bar" *.pdf | xargs -0 pdfgrep -H foo

       Search all .pdf files that are smaller than 12M recursively in the current directory:

               find . -name "*.pdf" -size -12M -print0 | xargs -0 pdfgrep pattern

           Note that in contrast to the previous examples, this task could not be solved with
           pdfgrep alone, but the Unix tools find(1) and xargs(1) had to be used. That’s because
           pdfgrep itself doesn’t include options to exclude files by their size. But as you see,
           it doesn’t have to!

BUGS

   Reporting Bugs
       Bugs can either be reportet to the mailing list (pdfgrep-users@pdfgrep.org) or to the
       bugtracker on gitlab (https://gitlab.com/pdfgrep/pdfgrep/issues).

AUTHORS

       pdfgrep is maintained by Hans-Peter Deifel.

       See the AUTHORS file in the source for a full list of contributors.

SEE ALSO

       grep(1), pcre(3), regex(7)

       See pdfgrep’s website https://pdfgrep.org for more information, downloads, git repository
       and more.