Provided by: texlive-binaries_2024.20240313.70630+ds-5_amd64

NAME

       pdftosrc - extract source file or stream from PDF file

SYNOPSIS

       pdftosrc PDF-file [stream-object-number]

DESCRIPTION

       If only PDF-file is given as argument, pdftosrc extracts the embedded source file from the
       first stream object found with /Type /SourceFile in the PDF-file and writes it to a file
       named by the /SourceName entry of that stream object (see EXAMPLES below).

       If both PDF-file and stream-object-number are given as arguments, and stream-object-number
       is positive, pdftosrc extracts and uncompresses the PDF stream of the object with that
       number and writes it to a file named PDF-file.stream-object-number, with the ending .pdf
       or .PDF stripped from the original PDF-file name.

       A special case applies to XRef streams (cross-reference streams), which are part of the
       PDF standard from PDF-1.5 onward: if stream-object-number equals -1, pdftosrc decompresses
       the XRef stream from the PDF file and writes it in human-readable PDF cross-reference
       table format to a file named PDF-file.xref. These XRef streams cannot be extracted just by
       giving their object number.

       In all cases, an existing file with the output file name is overwritten.
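
       The naming rules above can be sketched in Python. This is an illustrative helper, not
       part of pdftosrc: the function name expected_output_name is hypothetical, and it assumes
       that the .pdf or .PDF ending is stripped for the .xref output as well, which the text
       above states only for stream objects.

```python
def expected_output_name(pdf_file: str, stream_object_number: int) -> str:
    """Predict the output file name described in DESCRIPTION (illustrative only)."""
    base = pdf_file
    # Strip a trailing .pdf or .PDF, as described for stream objects.
    if base.endswith(".pdf") or base.endswith(".PDF"):
        base = base[:-4]
    if stream_object_number == -1:
        # Assumption: the same stripped base name is used for the XRef output.
        return base + ".xref"
    if stream_object_number > 0:
        return f"{base}.{stream_object_number}"
    # Other values are not described in this page, so the sketch rejects them.
    raise ValueError("stream-object-number must be positive or -1")


if __name__ == "__main__":
    print(expected_output_name("paper.pdf", 12))
    print(expected_output_name("paper.pdf", -1))
```

       Because pdftosrc overwrites an existing output file, a caller could use the predicted
       name to check for a file of that name before running the program.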

OPTIONS

       None.

FILES

       Just the executable pdftosrc.

ENVIRONMENT

       None.

DIAGNOSTICS

       On success, the exit code of pdftosrc is 0; otherwise it is 1.

       All messages go to stderr. On invocation, pdftosrc prints the version number of xpdf, the
       program on which pdftosrc is based:

              pdftosrc version 3.01

       When pdftosrc has written the output file successfully, it issues one of the following
       messages:

              Source file extracted to source-file-name

       or

              Stream object extracted to PDF-file.stream-object-number

       or

              Cross-reference table extracted to PDF-file.xref

       When  the  object  given  by  the stream-object-number does not contain a stream, pdftosrc
       issues the following error message:

              Not a Stream object

       When the PDF-file can't be opened, the error message is:

              Error: Couldn't open file 'PDF-file'.

       When pdftosrc encounters an invalid PDF file, the error message (several lines) is:

              Error: May not be a PDF file (continuing anyway)
              (more lines)
              Invalid PDF file

       pdftosrc can issue further error messages for various kinds of broken PDF files.
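
       As a rough sketch of how a script might act on these diagnostics, the Python code below
       runs pdftosrc, checks the exit code, and looks for one of the three success messages on
       stderr. The helper names run_pdftosrc and extracted_path are hypothetical, and the code
       assumes that pdftosrc is on the PATH and that the messages appear exactly as listed
       above.

```python
import subprocess

# Success message prefixes as listed in DIAGNOSTICS.
SUCCESS_PREFIXES = (
    "Source file extracted to ",
    "Stream object extracted to ",
    "Cross-reference table extracted to ",
)


def extracted_path(stderr_text):
    """Return the output file name from a success message, or None if absent."""
    for line in stderr_text.splitlines():
        line = line.strip()
        for prefix in SUCCESS_PREFIXES:
            if line.startswith(prefix):
                return line[len(prefix):]
    return None


def run_pdftosrc(pdf_file, stream_object_number=None):
    """Run pdftosrc and return (exit_code, output_file_or_None)."""
    cmd = ["pdftosrc", pdf_file]
    if stream_object_number is not None:
        cmd.append(str(stream_object_number))
    # All messages go to stderr, so that is the stream to inspect.
    proc = subprocess.run(cmd, capture_output=True, text=True, errors="replace")
    return proc.returncode, extracted_path(proc.stderr)
```

       A caller would treat a non-zero exit code as failure and fall back to the raw stderr
       text, since not every error message is listed here.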

NOTES

       An embedded source file is written out unchanged, i.e. it is not uncompressed in the
       process.

       Only the stream of the object is written, i.e. not the dictionary of that object.

       Knowing which stream-object-number to query requires information about the PDF file that
       has to be obtained elsewhere, e.g. by inspecting the PDF file in an editor.
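
       Instead of inspecting the file by hand, candidate object numbers can be collected with a
       small script. The following Python sketch is a naive byte scan, not a PDF parser: the
       function name list_stream_objects is hypothetical, it does not see objects stored inside
       compressed object streams, and binary stream data that happens to contain PDF keywords
       can mislead it.

```python
import re

# Matches "<num> <gen> obj ... endobj" and captures the text in between.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)endobj", re.DOTALL)
# The "stream" keyword followed by an end of line; \b avoids matching "endstream".
STREAM_RE = re.compile(rb"\bstream\r?\n")


def list_stream_objects(pdf_bytes):
    """Return (object_number, generation_number) pairs for objects that hold a stream."""
    found = []
    for match in OBJ_RE.finditer(pdf_bytes):
        if STREAM_RE.search(match.group(3)):
            found.append((int(match.group(1)), int(match.group(2))))
    return found


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as handle:
        for number, generation in list_stream_objects(handle.read()):
            print(number, generation)
```

       The generation number is reported so that objects with a generation other than 0 can be
       skipped.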

       The stream extraction capabilities of pdftosrc (e.g. the supported PDF versions and
       filter types) follow the capabilities of the underlying xpdf program version.

       Currently, specifying the generation number of the stream object is not supported; the
       default value 0 (zero) is used.

       The term stream-object-number is unrelated to the `object streams' described in the Adobe
       PDF Reference, 5th edition, version 1.6.

EXAMPLES

       When using pdftex, a source file can be embedded into a PDF-file by using pdftex
       primitives, as illustrated by the following example:

       \immediate\pdfobj
           stream attr {/Type /SourceFile /SourceName (myfile.zip)}
           file{myfile.zip}
       \pdfcatalog{/SourceObject \the\pdflastobj\space 0 R}

       The zip file can then be extracted from the PDF-file by calling pdftosrc PDF-file.

BUGS

       Only the first embedded source file found is extracted; any further ones are ignored.

       Email bug reports to pdftex@tug.org.

SEE ALSO

       xpdf(1), pdfimages(1), pdftotext(1), pdftex(1)

AUTHORS

       pdftosrc was written by Han The Thanh, using xpdf functionality from Derek Noonburg.

       The man page was written by Hartmut Henkel.

COPYRIGHT

       Copyright (c) 1996-2006 Han The Thanh, <thanh@pdftex.org>

       This file is part of pdfTeX.

       pdfTeX is free software; you can redistribute it and/or modify it under the terms  of  the
       GNU  General Public License as published by the Free Software Foundation; either version 2
       of the License, or (at your option) any later version.

       pdfTeX is distributed in the hope that it  will  be  useful,  but  WITHOUT  ANY  WARRANTY;
       without  even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
       See the GNU General Public License for more details.

       You should have received a copy of the GNU General Public License along  with  pdfTeX;  if
       not,  write  to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
       02111-1307  USA