Ubuntu Manpage: gscan2pdf - A GUI to produce PDFs or DjVus from scanned documents

NAME

       gscan2pdf - A GUI to produce PDFs or DjVus from scanned documents

USAGE

       1. Scan one or several pages with File/Scan
       2. Create a PDF of the selected pages with File/Save

REQUIRED ARGUMENTS

       None

OPTIONS

       gscan2pdf has the following command-line options:

       --device=<device>
           Specifies the device to use, instead of getting the list of devices via the SANE API. This can be
           useful if the scanner is on a remote computer which is not broadcasting its existence.

       --help
           Displays this help page and exits.

       --log=<log file>
           Specifies a file to store logging messages.

       --(debug|info|warn|error|fatal)
           Defines the log level. If a log file is specified, this defaults to 'debug', otherwise 'warn'.

       --version
           Displays the program version and exits.
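       The log-level rule above can be restated as a small sketch. It is written in Python purely for
       illustration. It is not gscan2pdf's own option parser, which is part of a Perl program. The function
       name parse_args is hypothetical, and only the options listed above are modelled.

```python
import argparse

LEVELS = ("debug", "info", "warn", "error", "fatal")


def parse_args(argv):
    """Hypothetical model of the documented gscan2pdf command-line options."""
    parser = argparse.ArgumentParser(prog="gscan2pdf", add_help=False, allow_abbrev=False)
    parser.add_argument("--device")   # device name, bypassing SANE device discovery
    parser.add_argument("--log")      # path of the log file
    parser.add_argument("--help", action="store_true")
    parser.add_argument("--version", action="store_true")
    # At most one log-level flag may be given; all of them share one destination.
    group = parser.add_mutually_exclusive_group()
    for level in LEVELS:
        group.add_argument(f"--{level}", dest="level", action="store_const", const=level)
    args = parser.parse_args(argv)
    if args.level is None:
        # Documented default: 'debug' if a log file is given, otherwise 'warn'.
        args.level = "debug" if args.log else "warn"
    return args
```

       Under this model, "gscan2pdf --log=file.log" logs at 'debug'. Adding --error restricts the log to
       errors and fatal messages.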

       Scanning is handled with SANE via scanimage. PDF conversion is done by PDF::API2. TIFF export is
       handled by libtiff (faster and with a smaller memory footprint for multipage files).

DIAGNOSTICS

       To diagnose a possible error, start gscan2pdf from the command line with logging enabled:

       "gscan2pdf --log=file.log"

       and check file.log.

EXIT STATUS

       None

CONFIGURATION

       gscan2pdf creates a text resource file called .gscan2pdf in the user's home directory. Generally,
       however, preferences should be changed via the Edit/Preferences menu, or are captured automatically
       during normal usage of the program.

INCOMPATIBILITIES

       None known.

BUGS AND LIMITATIONS

       Whilst it is possible to import PDFs, this is only intended to allow files created by gscan2pdf to be
       round-tripped. Hence, only the images are imported, and all text is ignored.

Download

       gscan2pdf is available on Sourceforge (<http://sourceforge.net/projects/gscan2pdf/files/gscan2pdf/>).

   Debian-based
       If you are using Debian, you should find that sid has the latest version already packaged.

       If you are using an Ubuntu-based system, you can automatically keep up to date with the latest version
       via the PPA:

       "sudo apt-add-repository ppa:jeffreyratcliffe/ppa"

       If you are using Synaptic, then use menu Edit/Reload Package Information, search for gscan2pdf in the
       package list, and lo and behold, you can install the nice shiny new version.

       From the command line:

       "sudo apt-get update"

       "sudo apt-get install gscan2pdf"

   RPMs
       Download the rpm from Sourceforge, and then install it with "rpm -i gscan2pdf-version.rpm"

   From source
       The source is hosted in the files section of the gscan2pdf project on Sourceforge
       (<http://sourceforge.net/projects/gscan2pdf/files/>).

   From the repository
       gscan2pdf uses Git for its Revision Control System. You can browse the tree at
       <http://sourceforge.net/p/gscan2pdf/code/>.

       Git users can clone the complete tree with "git clone git://git.code.sf.net/p/gscan2pdf/code"

Building gscan2pdf from source

       Having downloaded the source either from a Sourceforge file release or from the Git repository, unpack
       it if necessary with "tar xvfz gscan2pdf-x.x.x.tar.gz" and change into the new directory with
       "cd gscan2pdf-x.x.x".

       "perl Makefile.PL" will create the Makefile. There is a "make test", but this is not machine-dependent,
       and is therefore really just for my benefit, to make sure I haven't broken the device-dependent options
       parsing routine.

       You can install directly from the source with "make install", but building the appropriate package for
       your distribution should be as straightforward as "make debdist" or "make rpmdist". However, you will
       additionally need the rpm, devscripts, fakeroot, debhelper and gettext packages.

Dependencies

The list below looks daunting, but all packages are available from any reasonably up-to-date
distribution. If you are using Synaptic, once gscan2pdf is installed, locate the gscan2pdf entry in
Synaptic and right-click it; you can then install the dependencies listed under Recommends. Note also that
the library names given below are the Debian/Ubuntu ones. Distributions using RPM typically use
perl(module) where Debian has libmodule-perl.

Required
libgtk2.0-0 (>= 2.4)
The GTK+ graphical user interface library.

libglib-perl (>= 1.100-1)
Perl interface to the GLib and GObject libraries

libgtk2-perl (>= 1:1.043-1)
Perl interface to the 2.x series of the Gimp Toolkit library

libgtk2-imageview-perl
Perl bindings to the gtkimageview widget. See <http://trac.bjourne.webfactional.com/>

libgtk2-ex-simple-list-perl
A simple interface to Gtk2's complex MVC list widget

liblocale-gettext-perl (>= 1.05)
Using libc functions for internationalisation in Perl

libpdf-api2-perl
provides the functions for creating PDF documents in Perl

libsane
API library for scanners

libsane-perl
Perl bindings for libsane.

libset-intspan-perl
manages sets of integers

libtiff-tools
TIFF manipulation and conversion tools

Imagemagick
Image manipulation programs

perlmagick
A perl interface to the libMagick graphics routines

sane-utils
API library for scanners -- utilities.

Optional
sane
scanner graphical frontends. Only required for the scanadf frontend.

libgtk2-ex-podviewer-perl
Perl Gtk2 widget for displaying Plain Old Documentation (POD). Not required if you don't need the
gscan2pdf documentation (which is also available on the website).

unpaper
post-processing tool for scanned pages. See <http://unpaper.berlios.de/>.

xdg-utils
Desktop integration utilities from freedesktop.org. Required for Email as PDF. See
<http://portland.freedesktop.org/wiki/>

djvulibre-bin
Utilities for the DjVu image format. See <http://djvu.sourceforge.net/>

gocr
A command line OCR. See <http://jocr.sourceforge.net/>.

tesseract
A command line OCR. See <http://code.google.com/p/tesseract-ocr/>

ocropus
A command line OCR. See <http://code.google.com/p/ocropus/>

cuneiform
A command line OCR. See <http://launchpad.net/cuneiform-linux>

Support

       There are two mailing lists for gscan2pdf:

       gscan2pdf-announce
           A low-traffic list for announcements, mostly of new releases. You can subscribe at
           <http://lists.sourceforge.net/lists/listinfo/gscan2pdf-announce>

       gscan2pdf-help
           General support, questions, etc. You can subscribe at
           <http://lists.sourceforge.net/lists/listinfo/gscan2pdf-help>

Reporting bugs

       Before reporting bugs, please read the "FAQs" section.

       Please report any bugs found, preferably against the Debian package[1][2]. You do not need to be a
       Debian user, or set up an account to do this.

       1. http://packages.debian.org/sid/gscan2pdf
       2. http://www.debian.org/Bugs/

       Alternatively, there is a bug tracker for the gscan2pdf project on Sourceforge
       (<http://sourceforge.net/p/gscan2pdf/_list/tickets?source=navbar>).

       Please include the log file created by "gscan2pdf --log=log" with any new bug report.

Translations

       gscan2pdf has already been partly translated into several languages. If you would like to contribute to
       an existing or new translation, please check out Rosetta: <https://translations.launchpad.net/gscan2pdf>

       Note that the translations for the scanner options are taken directly from sane-backends. If you would
       like to contribute to these, you can do so by contacting the sane-devel mailing list
       (sane-devel@lists.alioth.debian.org) and by having a look at the po/ directory in the source code
       (<http://www.sane-project.org/cvs.html>).

       Alternatively, Ubuntu has its own translation project. For the 9.04 release, the translations are
       available at <https://translations.launchpad.net/ubuntu/jaunty/+source/sane-backends/+pots/sane-backends>

DESCRIPTION

File
New

Clears the page list.

Open

Opens any format that imagemagick supports. PDFs will have their embedded images extracted and imported
one per page.

Scan

Sets options before scanning via SANE.

Device

Chooses between available scanners.

# Pages

Selects the number of pages to scan, or all pages.

Source document

Selects between single-sided and double-sided pages.

This affects the page numbering. Single-sided scans are numbered consecutively. Double-sided scans are
incremented (or decremented, see below) by 2, i.e. 1, 3, 5, etc.

Side to scan

If double sided is selected above, assuming a non-duplex scanner, i.e. a scanner that cannot
automatically scan both sides of a page, this determines whether the page number is incremented or
decremented by 2.

To scan both sides of three pages, i.e. 6 sides:

1. Select:
   # Pages = 3 (or "all" if your scanner can detect when it is out of paper)
   Double sided
   Facing side
2. Scan sides 1, 3 & 5.
3. Put the pile back with the scanner ready to scan the back of the last page.
4. Select:
   # Pages = 3 (or "all" if your scanner can detect when it is out of paper)
   Double sided
   Reverse side
5. Scan sides 6, 4 & 2.
6. gscan2pdf automatically sorts the pages so that they appear in the correct order.
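The following Python sketch shows the numbering scheme above. It is illustrative only. The function names are hypothetical, strings stand in for page images, and both passes are assumed to contain the same number of sheets.

```python
def side_numbers(n_sheets, side):
    """Page numbers assigned to one pass of a double-sided scan on a non-duplex scanner."""
    if side == "facing":
        # Facing sides are incremented by 2: 1, 3, 5, ...
        return [1 + 2 * i for i in range(n_sheets)]
    if side == "reverse":
        # The pile is turned over, so the back of the last sheet comes first
        # and numbers are decremented by 2: 2n, 2n - 2, ..., 2
        return [2 * n_sheets - 2 * i for i in range(n_sheets)]
    raise ValueError(f"unknown side: {side}")


def collate(facing_pass, reverse_pass):
    """Merge the two passes into reading order by sorting on page number."""
    if len(facing_pass) != len(reverse_pass):
        raise ValueError("both passes must cover the same number of sheets")
    n = len(facing_pass)
    numbered = list(zip(side_numbers(n, "facing"), facing_pass))
    numbered += zip(side_numbers(n, "reverse"), reverse_pass)
    # Page numbers are unique, so sorting the pairs orders them by number alone.
    return [page for _, page in sorted(numbered)]
```

For three sheets, the facing pass is numbered 1, 3, 5 and the reverse pass 6, 4, 2. Collating the two passes then gives sides 1 to 6 in order.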

Device-dependent options

These, naturally, depend on your scanner. They can include:

Page size
Mode (colour/black & white/greyscale)
Resolution (in PPI)
Batch-scan
Guarantees that a "no documents" condition will be returned after the last scanned page, to prevent
endless flatbed scans after a batch scan.

Wait-for-button/Button-wait
After sending the scan command, wait until the button on the scanner is pressed before actually
starting the scan process.

Source
Selects the document source. Possible options can include Flatbed or ADF. On some scanners, this is
the only way of generating an out-of-documents signal.

Save

Saves the selected or all pages as a PDF, DjVu, TIFF, PNG, JPEG, PNM or GIF.

PDF Metadata

Metadata is information that is not visible when viewing the PDF but is embedded in the file, so it is
searchable and can be examined, typically with the "Properties" option of the PDF viewer.

The metadata is completely optional, but can also be used to generate the filename; see Preferences for
details.

DjVu

Both black and white and colour images compress better in DjVu than in PDF. See
<http://www.djvuzone.org/> for more details.

Email as PDF

Attaches the selected or all pages as a PDF to a blank email. This requires xdg-email, which is in the
xdg-utils package. If this is not present, the option is ghosted out.

Print

Prints the selected or all pages.

Compress temporary files

If your temporary ($TMPDIR) directory is getting full, this function can be useful: it compresses all
images as LZW-compressed TIFFs. These require much less space than the PNM files that are typically
produced by SANE or by importing a PDF.

Edit
Delete

Deletes the selected page.

Renumber

Renumbers the pages from 1..n.

Note that the page order can also be changed by drag and drop in the thumbnail view.

Select

The select menus can be used to select all, even, odd, blank, dark or modified pages. Selecting blank or
dark pages runs imagemagick to make the decision. Selecting modified pages selects those which have been
modified by threshold, unsharp, etc., since the last OCR run was made.
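The documentation does not say which image statistics gscan2pdf asks imagemagick for. The sketch below is a hedged illustration only and makes these assumptions:

- A page is blank when its grey levels barely vary, i.e. it has a low standard deviation.
- A page is dark when its mean grey level is low.
- The thresholds are made-up values.
- Pages are modelled as flat lists of 8-bit grey values rather than as real images.

```python
from statistics import mean, pstdev

# Hypothetical thresholds on grey levels normalised to 0..1.
BLANK_STDDEV = 0.005
DARK_MEAN = 0.35


def normalise(pixels):
    return [p / 255 for p in pixels]


def is_blank(pixels):
    # A blank page is nearly uniform, so its grey levels barely vary.
    return pstdev(normalise(pixels)) < BLANK_STDDEV


def is_dark(pixels):
    # A dark page has a low average grey level.
    return mean(normalise(pixels)) < DARK_MEAN


def select_pages(pages, predicate):
    """Return 1-based numbers of the pages for which predicate is true."""
    return [n for n, pixels in enumerate(pages, start=1) if predicate(pixels)]
```

Under this rule, a uniformly dark page also counts as blank. This is one reason to keep the two tests separate.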

Preferences

The preferences menu item allows the control of the default behaviour of various functions. Most of these
are self-explanatory.

Frontend

gscan2pdf supports two frontends, scanimage and scanadf. scanadf support was added when it was realised
that scanadf works better than scanimage with some scanners. On Debian-based systems, scanadf is in the
sane package, not, like scanimage, in sane-utils. If scanadf is not present, the option is obviously
ghosted out.

In 0.9.27, Perl bindings for SANE were introduced, and two further frontends, scanimage-perl and
scanadf-perl (scanimage and scanadf transliterated from C into Perl), were added.

Before 1.2.0, options available through CLI frontends like scanimage were made visible as users asked for
them. In 1.2.0, all options can be shown or hidden via Edit/Preferences, along with the ability to
specify which options trigger a reload.

Default filename for PDF files

The following variables are available, which are replaced by the corresponding metadata:

%a author
%t title
%y document's year
%Y today's year
%m document's month
%M today's month
%d document's day
%D today's day
%h document's hour
%i document's minute
%s document's second
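This Python sketch shows how such a template might be expanded. The function name is hypothetical. Two details are assumptions not stated above:

- Numeric fields are zero-padded.
- The document date falls back to today's date when none is set.

```python
import re
from datetime import datetime


def expand_filename(template, author="", title="", doc_date=None, today=None):
    """Replace the %-variables of a default-filename template (hypothetical helper)."""
    today = today or datetime.now()
    doc_date = doc_date or today  # assumed fallback when the document has no date
    values = {
        "a": author,
        "t": title,
        "y": f"{doc_date.year:04d}", "Y": f"{today.year:04d}",
        "m": f"{doc_date.month:02d}", "M": f"{today.month:02d}",
        "d": f"{doc_date.day:02d}", "D": f"{today.day:02d}",
        "h": f"{doc_date.hour:02d}",
        "i": f"{doc_date.minute:02d}",
        "s": f"{doc_date.second:02d}",
    }
    # One pass over the template, so substituted text is never re-expanded.
    return re.sub(r"%([atyYmMdDhis])", lambda m: values[m.group(1)], template)
```

For example, take a document dated 7 March 2015, saved on 6 March 2016 with the title "Invoice" and the author "Jane". The template "%Y-%M-%D %t by %a (%y%m%d)" then expands to "2016-03-06 Invoice by Jane (20150307)".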

View
Zoom 100%

Zooms to 1:1. How this appears depends on the desktop resolution.

Zoom to fit

Scales the view so that the whole page is visible.
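Zoom to fit amounts to choosing the largest scale at which both the width and the height of the page fit in the view. Here is a minimal Python sketch, with a hypothetical function name and sizes in pixels:

```python
def zoom_to_fit(image_width, image_height, view_width, view_height):
    """Largest scale factor at which the whole image fits inside the view."""
    return min(view_width / image_width, view_height / image_height)


# An A4 page scanned at 300 PPI (2480 x 3508 pixels) in an 800 x 600 view
# is limited by its height, so the scale is 600 / 3508.
scale = zoom_to_fit(2480, 3508, 800, 600)
```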

Zoom in

Zoom out

Rotate 90 clockwise

The rotate options require the package imagemagick and, if this is not present, are ghosted out.

Rotate 180

Rotate 90 anticlockwise

Tools
Threshold

Changes all pixels darker than the given value to black; all others become white.
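This Python sketch applies the threshold operation to a row of 8-bit grey values (0 = black, 255 = white). The scale of the "given value" is not stated above, so the sketch assumes a percentage of full white. A pixel exactly at the cutoff is treated as not darker, so it becomes white.

```python
def threshold(pixels, percent):
    """Map grey values 0..255 to pure black or white around a percentage cutoff."""
    cutoff = percent / 100 * 255
    # Darker than the cutoff becomes black (0); everything else becomes white (255).
    return [0 if p < cutoff else 255 for p in pixels]
```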

Unsharp mask

The unsharp option sharpens an image. The image is convolved with a Gaussian operator of the given radius
and standard deviation (sigma). For reasonable results, radius should be larger than sigma. Use a radius
of 0 to have the method select a suitable radius.
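The description above corresponds to the usual unsharp-masking recipe: blur with a normalised Gaussian kernel, then add back the difference between the original and the blurred copy. The one-dimensional Python sketch below illustrates this on a single row of grey values. It is not imagemagick's implementation. The amount parameter, the clamping at the edges, and the rule of thumb used when the radius is 0 (about three sigma) are all assumptions.

```python
import math


def gaussian_kernel(radius, sigma):
    """Normalised 1-D Gaussian weights; sigma must be positive."""
    if radius == 0:
        # Assumed rule of thumb: about 3 sigma covers almost all of the Gaussian's weight.
        radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(x * x) / (2 * sigma * sigma)) for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def unsharp_1d(row, radius, sigma, amount=1.0):
    """Sharpen one row of 0..255 grey values by unsharp masking."""
    kernel = gaussian_kernel(radius, sigma)
    r = len(kernel) // 2
    n = len(row)
    out = []
    for i in range(n):
        # Gaussian blur at position i, clamping indices at the row edges.
        blurred = sum(w * row[min(max(i + k - r, 0), n - 1)] for k, w in enumerate(kernel))
        # Add back the detail removed by the blur, then clamp to the valid range.
        sharpened = row[i] + amount * (row[i] - blurred)
        out.append(min(255, max(0, round(sharpened))))
    return out
```

A flat row is left unchanged. Across an edge, the dark side gets darker and the light side gets lighter, which is what makes the edge look sharper.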

Crop

unpaper

unpaper (see <http://unpaper.berlios.de/>) is a utility for cleaning up a scan.

OCR (Optical Character Recognition)

The gocr, tesseract, ocropus or cuneiform utilities are used to produce text from an image.

Each page has an OCR output buffer, which is embedded as plain text behind the scanned image in the
PDF produced. This way, Beagle can index (i.e. search) the plain text.

In DjVu files, the OCR output buffer is embedded in the hidden text layer. Thus these can also be
indexed by Beagle.

There is an interesting review of OCR software at
<http://web.archive.org/web/20080529012847/http://groundstate.ca/ocr>. An important conclusion was that
400ppi is necessary for decent results.

Up to v2.04, the only way to tell which languages were available to tesseract was to look for the
language files. Therefore, gscan2pdf checks the path returned by:

tesseract '' '' -l ''

If there are no language files in the above location, then gscan2pdf assumes that tesseract v1.0 is
installed, which had no language files.

Variables for user-defined tools

The following variables are available:

%i input filename
%o output filename
%r resolution

An image can be modified in-place by just specifying %i.
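This Python sketch shows how these variables might be substituted. It assumes two things not stated above:

- The command is split into arguments before substitution, so filenames containing spaces stay single arguments.
- A command without %o is treated as modifying the input file in place.

The function name and the example commands are illustrative.

```python
import re
import shlex


def expand_tool_command(template, input_file, output_file, resolution):
    """Substitute %i, %o and %r in a user-defined tool command (hypothetical helper)."""
    values = {"i": input_file, "o": output_file, "r": str(resolution)}
    # Split first, then substitute in one pass per argument, so a filename
    # containing spaces (or a literal "%o") is never re-split or re-expanded.
    argv = [re.sub(r"%([ior])", lambda m: values[m.group(1)], token)
            for token in shlex.split(template)]
    # Without %o there is nowhere else to write, so the tool edits %i in place.
    in_place = "%o" not in template
    return argv, in_place
```

For example, "mogrify -density %r %i" would be expanded into a single in-place command on the input file.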

FAQs

   Why isn't option xyz available in the scan window?
       Possibly because SANE or your scanner doesn't support it.

       If  an option listed in the output of "scanimage --help" that you would like to use isn't available, send
       me the output and I will look at implementing it.

   I've only got an old flatbed scanner with no automatic sheetfeeder. How do I scan a multipage document?
       If you are lucky, you have an option like Wait-for-button or Button-wait, where the scanner will wait for
       you to press the scan button on the device before it starts the scan, allowing you to scan multiple pages
       without touching the computer.

       Otherwise, you have to set the number of pages to scan to 1 and hit the scan button on  the  scan  window
       for each page.

   Why is option xyz ghosted out?
       Probably  because the package required for that option is not installed.  Email as PDF requires xdg-email
       (xdg-utils), unpaper and the rotate options require imagemagick.

   Why can I not scan from the flatbed of my HP scanner?
       Generally for HP scanners with an ADF, to scan from the flatbed, you should set "# Pages" to "1", and
       possibly "Batch scan" to "No".

   When I update gscan2pdf using the Update Manager in Ubuntu, why is the list of changes never displayed?
       As  far  as I can tell, this is pulled from changelogs.ubuntu.com, and therefore only the changelogs from
       official Ubuntu builds are displayed.

   Why can gscan2pdf not find my scanner?
       If your scanner is not connected directly to the machine on which you are running gscan2pdf and you have
       not installed the SANE daemon, saned, gscan2pdf cannot automatically find it. In this case, you can
       specify the scanner device on the command line:

       "gscan2pdf --device <device>"

   How can I search for text in the OCR layer of the finished PDF or DjVu file?
       pdftotext or djvutxt can extract the text layer from PDF or DjVu files. See the respective man pages for
       details.
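       The same idea can be scripted. The Python sketch below runs pdftotext or djvutxt and reports which
       pages contain a word. The function names are hypothetical. The sketch assumes that pages in the
       extracted text are separated by form feeds. This is pdftotext's default behaviour but has not been
       checked for djvutxt.

```python
import subprocess
from pathlib import Path


def text_layer_command(path):
    """Command that writes the text layer of a PDF or DjVu file to stdout."""
    suffix = Path(path).suffix.lower()
    if suffix == ".pdf":
        return ["pdftotext", path, "-"]  # "-" sends the text to stdout
    if suffix in (".djvu", ".djv"):
        return ["djvutxt", path]         # no output file given, so text goes to stdout
    raise ValueError(f"unsupported file type: {path}")


def find_pages(text, needle):
    """1-based numbers of the form-feed-separated pages containing needle (case-insensitive)."""
    needle = needle.lower()
    return [n for n, page in enumerate(text.split("\f"), start=1) if needle in page.lower()]


def search(path, needle):
    # Requires pdftotext (poppler-utils) or djvutxt (djvulibre-bin) to be installed.
    result = subprocess.run(text_layer_command(path), capture_output=True, text=True, check=True)
    return find_pages(result.stdout, needle)
```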

       If you open a PDF or DjVu file in evince or Acrobat Reader, its search function will typically find the
       page with the requested text and highlight it.

       There are various tools for searching or indexing files, including PDF and DjVu:

       •   (meta) Tracker (<https://projects.gnome.org/tracker/>)

       •   plone (<http://plone.org/>)

       •   pdfgrep (<http://pdfgrep.sourceforge.net/>)

       •   swish-e (<http://www.swish-e.org/>)

       •   recoll (<http://www.lesbonscomptes.com/recoll/>)

       •   terrier (<http://terrier.org/>)

Author

       Jeffrey Ratcliffe (ra28145 at users dot sf dot net)

Thanks to

       •   all the people who have sent patches, translations, bugs and feedback.

       •   the GTK2 project for a most excellent graphics toolkit.

       •   the Gtk2-Perl project for their superb Perl bindings for GTK2.

       •   The SANE project for scanner access

       •   Björn Lindqvist for the gtkimageview widget

       •   Sourceforge for hosting the project.

LICENSE AND COPYRIGHT

       Copyright (C) 2006--2016 Jeffrey Ratcliffe <Jeffrey.Ratcliffe@gmail.com>

       This  program is free software: you can redistribute it and/or modify it under the terms of the version 3
       GNU General Public License as published by the Free Software Foundation.

       This program is distributed in the hope that it will be useful, but WITHOUT ANY  WARRANTY;  without  even
       the  implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
       License for more details.

       You should have received a copy of the GNU General Public License along with this program.  If  not,  see
       <http://www.gnu.org/licenses/>.

perl v5.22.1                                       2016-03-06                                      GSCAN2PDF(1p)