Provided by: docknot_8.0.1-1_all

NAME

       App::DocKnot::Spin::Text - Convert some particular text formats into HTML

SYNOPSIS

           use App::DocKnot::Spin::Text;

           my $text = App::DocKnot::Spin::Text->new({style => '/styles/faq.css'});
           $text->spin_text_file('/path/to/input', '/path/to/output.html');

REQUIREMENTS

       Perl 5.24 or later and the modules List::SomeUtils, Path::Tiny, and Sort::Versions,
       available from CPAN.

DESCRIPTION

       This is another of that odd breed of partially functional beasts, a text to HTML
       converter.

       This is not truly possible in general; people do too many varied things with their text to
       intuit document structure from it.  This is therefore a converter that will translate
       documents written the way I write.  It may or may not work for you.  The chances that it
       will work for you are directly proportional to how much your writing looks like mine.

       App::DocKnot::Spin::Text understands digest separators (lines of exactly thirty hyphens,
       from the minimal digest standard) and will treat a "Subject" header immediately after them
       as a section header.  Beyond that, headings must either be outdented, underlined on the
       following line, or in all caps to be recognized as section headers.  (Outdenting means
       that the regular text is indented by a few spaces, but headers start in column 0, or at
       least in a column farther to the left than the regular text.)
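
       The digest separator rule can be sketched as follows.  This is only an illustration of
       the rule as described above, not the module's own code: the function name is
       hypothetical, and allowing blank lines between the separator and the "Subject" header
       is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of App::DocKnot::Spin::Text.  Given the lines
# of a document (already chomped), return the Subject of each digest entry.
sub digest_subjects {
    my (@lines) = @_;
    my @subjects;
    for my $i (0 .. $#lines) {
        # A digest separator is a line of exactly thirty hyphens.
        next unless $lines[$i] =~ /\A-{30}\z/;

        # Assumption: blank lines may sit between the separator and header.
        my $next = $i + 1;
        $next++ while $next <= $#lines && $lines[$next] =~ /\A\s*\z/;
        next if $next > $#lines;

        # Only a Subject header at that position counts as a section header.
        if ($lines[$next] =~ /\ASubject:\s*(.*\S)/) {
            push(@subjects, $1);
        }
    }
    return @subjects;
}

my @doc = (
    'Intro text.',
    '-' x 30,
    '',
    'Subject: 1. Introduction',
    '',
    'Body of the first section.',
    '-' x 29,
    'Subject: not preceded by a real separator',
);
print "$_\n" for digest_subjects(@doc);
```

       Only the first "Subject" line is reported here, since the second one follows a line of
       twenty-nine hyphens rather than thirty.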

       Section headers that begin with numbers (with any number of periods) will be given "<a
       id>" tags containing that number prefixed with "S".  As a special case of the parsing,
       any section with a header containing "contents" will have lines beginning with numbers
       turned into links to the appropriate "<a id>" tags in the same document.  You can use this
       to turn the table of contents of your minimal digest format FAQ into a real table of
       contents with links in the HTML version.
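
       As a rough sketch of how numbered headings and contents lines relate, the following
       derives an anchor ID from a leading section number and builds a link to it.  Both
       function names are hypothetical, and dropping the trailing period from the ID is an
       assumption; the exact ID the module generates may differ.

```perl
use strict;
use warnings;

# Hypothetical helpers; the names and the exact ID format are assumptions.

# Return an anchor ID such as "S2.1" for a heading that starts with a section
# number, or undef for an unnumbered heading.
sub section_id {
    my ($heading) = @_;
    if ($heading =~ /\A\s*(\d+(?:\.\d+)*)\.?(?:\s|\z)/) {
        return 'S' . $1;
    }
    return undef;
}

# Turn a numbered table-of-contents line into a link to that section.
# Lines that do not begin with a number are returned unchanged.
sub contents_link {
    my ($line) = @_;
    my $id = section_id($line);
    return $line unless defined($id);
    my ($indent, $text) = $line =~ /\A(\s*)(.*)\z/;
    return qq{$indent<a href="#$id">$text</a>};
}

print contents_link(' 2.1. Installing the software'), "\n";
```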

       Text with embedded runs of whitespace (anything beyond a single space, or a couple of
       spaces at a sentence boundary or after a colon) and any text with literal tabs will be
       wrapped in "<pre>" tags.  So will any indented text that doesn't look like English
       paragraphs.  URLs surrounded by "<...>" or "<URL:...>" will be turned into links.  Other
       URLs will not be turned into links, nor is any effort made to turn random body text into
       links because it happens to look like a link.
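
       The URL rule amounts to linking only URLs that are explicitly delimited.  A minimal
       sketch of that idea, assuming a hypothetical function name and a simplified notion of
       what counts as a URL (the module's own matching and its HTML output may differ):

```perl
use strict;
use warnings;

# Hypothetical helper, not the module's own code.  HTML escaping of the
# surrounding text is deliberately left out to keep the sketch short.
sub link_urls {
    my ($text) = @_;

    # Simplified URL pattern: a scheme, "://", then no whitespace or brackets.
    my $url = qr{[a-z][a-z0-9+.-]*://[^\s<>]+}i;

    # Link only URLs wrapped in <...> or <URL:...>; bare URLs are untouched.
    $text =~ s{<(?:URL:)?($url)>}{<a href="$1">$1</a>}g;
    return $text;
}

print link_urls('See <URL:https://www.eyrie.org/~eagle/> or https://example.org/.'), "\n";
```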

       Bullet lists and numbered lists will be turned into the appropriate HTML structures.  Some
       attempt is also made to recognize description lists, but App::DocKnot::Spin::Text was
       written by someone who writes a lot of technical documentation and therefore tends to
       prefer "<pre>" if unsure whether something is a description list or preformatted text.
       Description lists are therefore only going to work if the description titles aren't
       indented relative to the surrounding text.

       Regular indented paragraphs or paragraphs quoted with a consistent non-alphanumeric quote
       character are recognized and turned into HTML block quotes.

       It's worthwhile paying attention to the headers at the top of your document so that
       App::DocKnot::Spin::Text can get a few things right.  If you use RCS or CVS, put the RCS
       "Id" keyword as the first line of your document; it will be stripped out of the resulting
       output and App::DocKnot::Spin::Text will use it to determine the document revision.  This
       should be followed by regular message headers and news.answers subheaders if the document
       is an actual FAQ, and App::DocKnot::Spin::Text will use the "From" and "Subject" headers
       to figure out a title and headings to use.  As a special case, an HTML-title header in the
       subheaders will override any other title that App::DocKnot::Spin::Text thinks it should
       use for the document.

       App::DocKnot::Spin::Text expects your document to have an "<h1>" title, and will add one
       from the Subject header if it doesn't find one.  It will also add subheaders
       (class="subheading") giving the author (from the "From" header) and the last modified
       time and revision (from the RCS "Id" string) if there are no subheadings already.  If
       there's a subheading that contains RCS identifiers, it will be replaced by a nicely
       formatted heading generated from the RCS "Id" information in the HTML output.
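
       To illustrate what an RCS "Id" string carries, the sketch below pulls the revision,
       date, and author out of one.  The function name and the returned hash layout are
       hypothetical, and this is not how the module itself formats the resulting heading.

```perl
use strict;
use warnings;

# Hypothetical helper: extract fields from an RCS or CVS Id keyword string.
# Returns a hash reference, or nothing if the line holds no Id string.
sub parse_rcs_id {
    my ($line) = @_;
    return unless $line =~ m{
        \$Id: \s+ (\S+),v \s+              # file name
        ([\d.]+) \s+                       # revision
        (\d{4}) [/-] (\d\d) [/-] (\d\d) \s+  # date, with / or - separators
        \S+ \s+                            # time of day (ignored)
        (\S+)                              # author
    }x;
    return { file => $1, revision => $2, date => "$3-$4-$5", author => $6 };
}

my $info = parse_rcs_id('$Id: faq.txt,v 1.12 2004/03/01 12:34:56 eagle Exp $');
printf "revision %s, modified %s by %s\n", @{$info}{qw(revision date author)};
```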

       Text marked as "*bold*" using the standard asterisk notation will be surrounded by
       "<strong>" tags, if the asterisks appear to be marking bold text rather than serving as
       wildcards or some other function.
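
       One way to tell emphasis from wildcards is to require that each asterisk hug a word
       character on the inside and not touch one on the outside.  The sketch below uses that
       rule; it is an assumed heuristic with a hypothetical function name, not the module's
       actual test, and it can still be fooled by unusual input.

```perl
use strict;
use warnings;

# Hypothetical helper using an assumed heuristic: the opening asterisk must
# be followed by a word character, the closing one preceded by a word
# character, and neither may be glued to a word or another asterisk outside.
sub mark_bold {
    my ($text) = @_;
    $text =~ s{(?<![\w*]) \* (\w (?:[^*\n]*\w)?) \* (?![\w*])}{<strong>$1</strong>}gx;
    return $text;
}

print mark_bold('This is *very* important; then run rm *.o *.a'), "\n";
```

       Here "*very*" is wrapped in "<strong>" while the shell wildcards are left alone, since
       each of their asterisks is followed by a period rather than a word character.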

       App::DocKnot::Spin::Text produces output (at least in the absence of any lurking bugs)
       which complies with the XHTML 1.0 Transitional standard.  The input and output character
       set is assumed to be UTF-8.

CLASS METHODS

       new(ARGS)
           Create a new App::DocKnot::Spin::Text object.  A single converter object can be reused
           to convert multiple files provided that they have the same options.  ARGS should be a
           hash reference with one or more of the following keys, all of which are optional:

           output
               The path to the root of the output tree when converting a tree of files.  This
               will be used to calculate relative path names for generating inter-page links
               using the provided "sitemap" argument.  If "sitemap" is given, this option should
               also always be given.

           modified
               Add a last modified subheader to the document.  This will always be done if an RCS
               "Id" string is present in the input.  Otherwise, a last modified subheader based
               on the last modification date of the input file will be added if the input is a
               file and this option is set to a true value.  The default is false.

           sitemap
               An App::DocKnot::Spin::Sitemap object.  This will be used to create inter-page
               links.  For inter-page links, the "output" argument must also be provided.

           style
               The URL to the style sheet to use.  The appropriate HTML will be added to the
               "<head>" section of the resulting document.

           title
               The HTML page title to use.  This will also be used as the "<h1>" heading if the
               document doesn't contain one, but will not override a heading found in the
               document (only the HTML "<title>" element).
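
           As an illustration, the following sketch sets a few of these options and reuses
           one converter for several files.  The style sheet URL and file paths are
           placeholders, not real locations.

```perl
use strict;
use warnings;

use App::DocKnot::Spin::Text;

# All keys are optional; the values here are placeholders.
my $text = App::DocKnot::Spin::Text->new({
    style    => '/styles/faq.css',
    modified => 1,    # add a last modified subheader from the file's mtime
});

# The same object can convert several files that share these options.
for my $name (qw(faq install)) {
    $text->spin_text_file("/path/to/$name.txt", "/path/to/$name.html");
}
```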

INSTANCE METHODS

       spin_text_file([INPUT[, OUTPUT]])
           Convert a single text file to HTML.  INPUT is the path of the input file and OUTPUT is
           the path of the output file.  If OUTPUT is omitted, the HTML is written to standard
           output.  If INPUT is omitted as well, the text is read from standard input.

           If OUTPUT is omitted, App::DocKnot::Spin::Text will not be able to obtain sitemap
           information even if a sitemap was provided, and therefore will not add inter-page
           links.
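
           For example, a small filter script might look like the sketch below, which reads
           an optional input path from the command line.  The script structure and the style
           sheet URL are illustrative assumptions.

```perl
use strict;
use warnings;

use App::DocKnot::Spin::Text;

my $text = App::DocKnot::Spin::Text->new({ style => '/styles/faq.css' });

if (@ARGV) {
    # INPUT given, OUTPUT omitted: the HTML goes to standard output.
    $text->spin_text_file($ARGV[0]);
} else {
    # Both omitted: read standard input and write standard output.
    $text->spin_text_file();
}
```

           Because OUTPUT is omitted in both calls, no inter-page links would be added even
           if a sitemap had been passed to new().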

NOTES

       I wrote this program because every other text to HTML converter that I've seen made
       specific assumptions about the document format and wanted you to write like it wanted you
       to write rather than like the way you wanted to write.  This program instead wants you to
       write like I write, which from my perspective is an improvement.

       I don't claim that this is the be-all and end-all of text to HTML converters, as I don't
       believe such a beast exists.  I do believe it's pretty close to being the be-all and
       end-all of text to HTML converters for text that I personally have written, since I've
       written into it a lot of knowledge of the sorts of text formatting conventions that I use.
       If you happen to use the same ones, you may be delighted with this module.  If you don't,
       you'll probably be very frustrated with it.

       In any case, I approached this project with the view that whenever there was something
       this program couldn't handle, I wanted to make it smarter rather than change the input.
       I've mostly been successful at that, so far.

CAVEATS

       This program attempts to intuit structure from an unstructured markup format.  It
       therefore relies on a whole bunch of fussy heuristics, poorly-understood assumptions, and
       sheer blind luck.  To fully document the boundary cases of this program would take more
       time and patience than I care to invest; see the source code if you're curious.  This is
       not a predictable or easily documentable program.  Instead, it attempts to do what I mean
       without bugging me about it.

       There is therefore, at least currently, no way to control or adjust parameters in this
       program without editing it.  I may someday add that, but I'm leery of it, since the code
       complexity would start increasing exponentially if I tried to let people tweak everything.
       I've given up on more than one text to HTML converter because it had more options than ls
       and expected you to try to figure out which ones should be used for a document yourself.

       English month names are used for the last modification dates, and the resulting HTML
       always declares that the document is in English.  This could be made configurable if
       anyone wishes.

AUTHOR

       Russ Allbery <rra@cpan.org>

COPYRIGHT AND LICENSE

       Copyright 1999-2002, 2004-2005, 2008, 2010, 2013, 2021-2024 Russ Allbery <rra@cpan.org>

       Permission is hereby granted, free of charge, to any person obtaining a copy of this
       software and associated documentation files (the "Software"), to deal in the Software
       without restriction, including without limitation the rights to use, copy, modify, merge,
       publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons
       to whom the Software is furnished to do so, subject to the following conditions:

       The above copyright notice and this permission notice shall be included in all copies or
       substantial portions of the Software.

       THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
       INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR
       PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE
       FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
       OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
       DEALINGS IN THE SOFTWARE.

SEE ALSO

       docknot(1), App::DocKnot::Spin, App::DocKnot::Spin::Sitemap

       This module is part of the App-DocKnot distribution.  The current version of DocKnot is
       available from CPAN, or directly from its web site at
       <https://www.eyrie.org/~eagle/software/docknot/>.