Provided by: pdfsandwich_0.1.4-1build2_amd64

NAME

       pdfsandwich - A generator for sandwich OCR pdfs from scanned pdf files

SYNOPSIS

       pdfsandwich [options] inputfile.pdf

DESCRIPTION

       pdfsandwich generates "sandwich" OCR pdf files: pdf files which contain only images (no
       text) are processed by optical character recognition (OCR), and the recognized text is
       added invisibly "behind" the images on each page. Note that pdfsandwich needs the
       following programs: unpaper, convert, gs, hocr2pdf (for tesseract < 3.03), and tesseract.
       As tesseract >= 3.03 can write pdf files itself, hocr2pdf is only needed for older
       versions of tesseract. Please visit http://www.tobias-elze.de/pdfsandwich.

OPTIONS

       -convert
              -convert filename : name of convert binary (default: convert)

       -coo   -coo options : additional convert options; make sure to quote; e.g. -coo
              "-normalize -black-threshold 75%"; call convert --help or see man convert for all
              convert options
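
              Why the quotes matter: the shell splits an unquoted option string into separate
              arguments, so -coo would receive only its first word. As an illustration, Python's
              standard shlex module applies POSIX shell word-splitting rules; scan.pdf is a
              hypothetical input file name.

```python
import shlex

# Hypothetical command lines; scan.pdf is an illustrative file name.
quoted = 'pdfsandwich -coo "-normalize -black-threshold 75%" scan.pdf'
unquoted = "pdfsandwich -coo -normalize -black-threshold 75% scan.pdf"

# With quotes, all convert options reach -coo as a single argument.
print(shlex.split(quoted))
# ['pdfsandwich', '-coo', '-normalize -black-threshold 75%', 'scan.pdf']

# Without quotes, -coo gets only '-normalize'; the rest become separate arguments.
print(shlex.split(unquoted))
# ['pdfsandwich', '-coo', '-normalize', '-black-threshold', '75%', 'scan.pdf']
```

              The same applies to the option strings passed via -hoo, -tesso and -unpo.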

       -debug keep all temporary files in /tmp (for debugging)

       -enforcehocr2pdf
              use hocr2pdf even if tesseract >= 3.03

       -first_page
              -first_page number : number of page to start OCR from (default: 1)

       -grayfilter
              enable unpaper's gray filter; further options can be set by -unpo

       -gs    -gs filename : name of gs binary (default: gs)

       -hocr2pdf
              -hocr2pdf filename : name of  hocr2pdf  binary  (default:  hocr2pdf);  ignored  for
              tesseract >= 3.03 unless option -enforcehocr2pdf is set

       -hoo   -hoo options : additional hocr2pdf options; make sure to quote

       -identify
              -identify filename : name of identify binary (default: identify)

       -last_page
              -last_page  number  : number of page up to which to process OCR (default: number of
              pages in inputfile)
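
              A minimal sketch of the page selection made by -first_page and -last_page,
              assuming 1-based, inclusive page numbers as the defaults suggest. The helper
              pages_to_ocr is hypothetical, and its range check is an assumption of the sketch
              rather than documented pdfsandwich behavior.

```python
def pages_to_ocr(num_pages, first_page=1, last_page=None):
    """Hypothetical helper: 1-based, inclusive list of pages to OCR."""
    if last_page is None:
        last_page = num_pages  # documented default: number of pages in inputfile
    # Assumption of this sketch: reject ranges outside the document.
    if not 1 <= first_page <= last_page <= num_pages:
        raise ValueError("invalid page range")
    return list(range(first_page, last_page + 1))


print(pages_to_ocr(10, first_page=3, last_page=5))  # [3, 4, 5]
```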

       -lang  -lang language : language of the text; option to tesseract (default: eng), e.g.
              eng, deu, deu-frak, fra, rus, swe, spa, ita, ...; see option -list_langs. Multiple
              languages may be specified, separated by plus characters.

       -layout
              -layout { single | double | none } : layout of the scanned pages; requires unpaper
                 single: one page per sheet
                 double: two pages per sheet
                 none:   no auto-layout (default)

       -list_langs
              list currently available languages and exit; when using a custom tesseract binary,
              place this after the -tesseract option

       -maxpixels
              -maxpixels NUM : maximum number of pixels allowed for a page of the input file; if
              (resolution/72)^2 * width * height > maxpixels, the page is scaled down prior to
              OCR so that its size in pixels corresponds to maxpixels; default: 17415167 (A3 @
              300 dpi)
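
              A worked sketch of the -maxpixels rule. It assumes width and height are the page
              dimensions in PostScript points (1/72 inch), which is what the factor resolution/72
              suggests, and that both dimensions are scaled by the same factor; the helper
              ocr_scale is hypothetical. Because the pixel count grows with the square of the
              linear size, reaching exactly maxpixels requires a linear scale factor of
              sqrt(maxpixels / pixels).

```python
import math


def ocr_scale(width_pt, height_pt, resolution=300, maxpixels=17415167):
    """Hypothetical helper: linear scale factor applied to a page before OCR.

    Assumes width/height in points (1/72 inch) and uniform scaling.
    """
    pixels = (resolution / 72) ** 2 * width_pt * height_pt
    if pixels <= maxpixels:
        return 1.0  # page is small enough; no downscaling
    # Pixel count scales quadratically with side length, so take the square root.
    return math.sqrt(maxpixels / pixels)


# An A4 page (595 x 842 pt) at the default 300 dpi stays unscaled.
print(ocr_scale(595, 842))  # 1.0
# At 600 dpi the same page has roughly twice maxpixels,
# so each side is scaled by about 0.71.
print(ocr_scale(595, 842, resolution=600))
```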

       -noimage
              do  not  place  the  image  over  the  text  (requires  hocr2pdf;  ignored  without
              -enforcehocr2pdf option)

       -nopreproc
              do not preprocess with unpaper

       -nthreads
              -nthreads  number : number of parallel threads (default: guessed number of CPUs; if
              guessing fails: 1)
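
              The documented default (guessed number of CPUs, falling back to 1) can be mirrored
              with Python's os.cpu_count(), which returns None when the count cannot be
              determined. This is a sketch of the fallback rule only, not pdfsandwich's own
              detection code; the helper default_nthreads is hypothetical.

```python
import os


def default_nthreads():
    """Hypothetical helper: number of CPUs if known, otherwise 1."""
    n = os.cpu_count()  # None if the CPU count cannot be determined
    return n if n else 1


print(default_nthreads())
```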

       -o     -o filename : output file; default: inputfile_ocr.pdf (if the input file's
              extension is different from .pdf, the original extension is kept)
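
              A sketch of the default output name, under the interpretation that _ocr is
              inserted before whatever extension the input file has. The helper
              default_output_name is hypothetical, and edge cases such as input files without
              an extension are not covered by this man page.

```python
import os.path


def default_output_name(inputfile):
    """Hypothetical helper: insert '_ocr' before the input file's extension.

    An interpretation of the documented rule, not pdfsandwich's own code.
    """
    root, ext = os.path.splitext(inputfile)
    return root + "_ocr" + ext


print(default_output_name("scan.pdf"))  # scan_ocr.pdf
print(default_output_name("scan.PDF"))  # scan_ocr.PDF
```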

       -pagesize
              -pagesize { original | NUMxNUM } : set page size of output pdf
                 original: same as input file (default)
                 NUMxNUM:  width x height in pixels (e.g. for A4: -pagesize 595x842)
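
              A sketch of how the two accepted forms of the -pagesize argument could be parsed.
              The helper parse_pagesize is hypothetical, and it assumes NUM means a non-negative
              integer, as in the A4 example.

```python
def parse_pagesize(value):
    """Hypothetical parser for the -pagesize argument.

    Returns None for 'original' (keep the input page size),
    otherwise a (width, height) tuple for NUMxNUM.
    """
    if value == "original":
        return None
    width, sep, height = value.partition("x")
    # Assumption: NUM is a plain non-negative integer.
    if not sep or not width.isdigit() or not height.isdigit():
        raise ValueError("expected 'original' or NUMxNUM, got %r" % value)
    return int(width), int(height)


print(parse_pagesize("595x842"))  # (595, 842)
```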

       -resolution
              -resolution NUM : resolution (dpi) used for OCR (default: 300)

       -rgb   use  RGB  color  space for images (default: black and white); use with care: causes
              problems with some color spaces

       -sloppy_text
              sloppily place text, group words, do not draw single glyphs; ignored for  tesseract
              >= 3.03 unless option -enforcehocr2pdf is set

       -tesseract
              -tesseract filename : name of tesseract binary (default: tesseract)

       -tesso -tesso options : additional tesseract options; make sure to quote

       -unpaper
              -unpaper filename : name of unpaper binary (default: unpaper)

       -unpo  -unpo options : additional unpaper options; make sure to quote

       -quiet suppress output

       -verbose
              produce more output

       -version
              print version and quit

       -help  Display this list of options

       --help Display this list of options

LANGUAGES

       Via Tesseract, numerous language packages are available; follow this link
       http://code.google.com/p/tesseract-ocr/downloads/list for a complete list. Here is an
       incomplete selection of supported languages and their abbreviations:

       ara (Arabic), aze (Azerbaijani), bul (Bulgarian), cat (Catalan), ces (Czech), chi_sim
       (Simplified Chinese), chi_tra (Traditional Chinese), chr (Cherokee), dan (Danish),
       dan-frak (Danish (Fraktur)), deu (German), ell (Greek), eng (English), enm (Middle
       English), epo (Esperanto), est (Estonian), fin (Finnish), fra (French), frm (Middle
       French), glg (Galician), heb (Hebrew), hin (Hindi), hrv (Croatian), hun (Hungarian),
       ind (Indonesian), ita (Italian), jpn (Japanese), kor (Korean), lav (Latvian), lit
       (Lithuanian), nld (Dutch), nor (Norwegian), pol (Polish), por (Portuguese), ron
       (Romanian), rus (Russian), slk (Slovakian), slv (Slovenian), sqi (Albanian), spa
       (Spanish), srp (Serbian), swe (Swedish), tam (Tamil), tel (Telugu), tgl (Tagalog), tha
       (Thai), tur (Turkish), ukr (Ukrainian), vie (Vietnamese)

       Multiple languages  may  be  specified,  separated  by  plus  characters.  Note  that  the
       respective tesseract language package needs to be installed on your system to be usable by
       pdfsandwich. Option -list_langs lists the languages which are available on your system.
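
       A small sketch of assembling a -lang value: codes are joined with plus characters after
       checking each one against the installed language packages. The helper lang_option and the
       installed set are hypothetical; on a real system, the available languages are what
       -list_langs reports.

```python
def lang_option(codes, installed):
    """Hypothetical helper: build a '+'-separated -lang value.

    Raises ValueError if a requested language package is not installed.
    """
    missing = [c for c in codes if c not in installed]
    if missing:
        raise ValueError("language package(s) not installed: " + ", ".join(missing))
    return "+".join(codes)


# Hypothetical set of installed language packages, for illustration only.
installed = {"eng", "deu", "fra"}
print(lang_option(["eng", "deu"], installed))  # eng+deu
```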

AVAILABILITY

       Sources and packages as well as comprehensive help can be found at
       http://www.tobias-elze.de/pdfsandwich.

AUTHOR

       Tobias Elze <sourceforge@tobias-elze.de>

                                         12 February 2016                          PDFSANDWICH(1)