Provided by: libsoldout1-dev_1.3-6_amd64 

NAME
markdown - markdown documents parsing
SYNOPSIS
#include <markdown.h>
void markdown ( struct buf *ob,
struct buf *ib,
const struct mkd_renderer *rndr);
DESCRIPTION
`markdown()` is the only exported function in libsoldout; it starts the parsing of a markdown document.
More information about the markdown language is available on John Gruber's website:
http://daringfireball.net/projects/markdown/
Libsoldout only performs the parsing of markdown input; the construction of the output is left to a
*renderer*, which is a set of callback functions called when markdown elements are encountered. Pointers
to these functions are gathered into a `struct mkd_renderer` along with some renderer-related data. I
think the struct declaration is pretty obvious:
    struct mkd_renderer {
        /* document level callbacks */
        void (*prolog)(struct buf *ob, void *opaque);
        void (*epilog)(struct buf *ob, void *opaque);

        /* block level callbacks - NULL skips the block */
        void (*blockcode)(struct buf *ob, struct buf *text, void *opaque);
        void (*blockquote)(struct buf *ob, struct buf *text, void *opaque);
        void (*blockhtml)(struct buf *ob, struct buf *text, void *opaque);
        void (*header)(struct buf *ob, struct buf *text,
                       int level, void *opaque);
        void (*hrule)(struct buf *ob, void *opaque);
        void (*list)(struct buf *ob, struct buf *text, int flags,
                     void *opaque);
        void (*listitem)(struct buf *ob, struct buf *text,
                         int flags, void *opaque);
        void (*paragraph)(struct buf *ob, struct buf *text, void *opaque);
        void (*table)(struct buf *ob, struct buf *head_row,
                      struct buf *rows, void *opaque);
        void (*table_cell)(struct buf *ob, struct buf *text, int flags,
                           void *opaque);
        void (*table_row)(struct buf *ob, struct buf *cells, int flags,
                          void *opaque);

        /* span level callbacks - NULL or return 0
           prints the span verbatim */
        int (*autolink)(struct buf *ob, struct buf *link,
                        enum mkd_autolink type, void *opaque);
        int (*codespan)(struct buf *ob, struct buf *text, void *opaque);
        int (*double_emphasis)(struct buf *ob, struct buf *text,
                               char c, void *opaque);
        int (*emphasis)(struct buf *ob, struct buf *text,
                        char c, void *opaque);
        int (*image)(struct buf *ob, struct buf *link, struct buf *title,
                     struct buf *alt, void *opaque);
        int (*linebreak)(struct buf *ob, void *opaque);
        int (*link)(struct buf *ob, struct buf *link, struct buf *title,
                    struct buf *content, void *opaque);
        int (*raw_html_tag)(struct buf *ob, struct buf *tag, void *opaque);
        int (*triple_emphasis)(struct buf *ob, struct buf *text,
                               char c, void *opaque);

        /* low level callbacks - NULL copies input directly
           into the output */
        void (*entity)(struct buf *ob, struct buf *entity, void *opaque);
        void (*normal_text)(struct buf *ob, struct buf *text,
                            void *opaque);

        /* renderer data */
        int max_work_stack;     /* prevent arbitrarily deep recursion */
        const char *emph_chars; /* chars that trigger emphasis rendering */
        void *opaque;           /* opaque data sent to every rendering callback */
    };
The first argument of a renderer function is always the output buffer, where the function is supposed to
write its output. It's not necessarily related to the output buffer given to `markdown()` because in some
cases rendering into a temporary buffer is needed.
The last argument of a renderer function is always an opaque pointer, which is equal to the `opaque`
member of `struct mkd_renderer`. The name "opaque" might not be well-chosen, but it means a pointer
*opaque for the parser, **not** for the renderer*. That is, my parser blindly passes around a pointer
to data that only you know about, in case you need to store some internal state. I have not found
anything to put in this pointer in my example renderers, so it is set to NULL in the structure and never
looked at in the callbacks.
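As a rough sketch of how a renderer could use the `opaque` pointer, the callback below counts headers in a structure owned by the caller. The names `struct outline_state` and `count_header` are hypothetical and not part of libsoldout; only the callback signature is taken from the `header` member above.

```c
#include <assert.h>
#include <stddef.h>

struct buf; /* left incomplete here; the real definition comes with libsoldout */

/* Hypothetical renderer state, owned by whoever calls markdown(). */
struct outline_state {
    int header_count;  /* number of headers seen so far */
    int deepest_level; /* largest header level seen so far */
};

/* Same signature as the `header` member of struct mkd_renderer. */
void
count_header(struct buf *ob, struct buf *text, int level, void *opaque)
{
    struct outline_state *st = opaque;

    /* this sketch requires the state to be provided through opaque */
    assert(st != NULL);
    (void)ob;   /* nothing is written, so the header itself is dropped */
    (void)text;

    st->header_count += 1;
    if (level > st->deepest_level)
        st->deepest_level = level;
}
```

To try it, one would store `count_header` in the `header` member and the address of a `struct outline_state` in the `opaque` member before calling `markdown()`. A real renderer would also write some markup into `ob`.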
`emph_chars` is a zero-terminated string which contains the set of characters that trigger emphasis. In
regular markdown, emphasis is only triggered by '_' and '*', but in some extensions it might be useful
to add other characters to this list. For example in my extension to handle `<ins>` and `<del>` spans,
delimited respectively by "++" and "--", I have added '+' and '-' to `emph_chars`. The character that
triggered the emphasis is then passed to `emphasis`, `double_emphasis` and `triple_emphasis` through the
parameter `c`.
Function pointers in `struct mkd_renderer` can be NULL, but the meaning differs depending on whether the
callback is block-level or span-level. A null block-level callback will make the corresponding block
disappear from the output, as if the callback was an empty function. A null span-level callback will
cause the corresponding element to be treated as normal characters, copied verbatim to the output.
So for example, to disable links and images (e.g. because you consider them dangerous), just put a null
pointer in `rndr.link` and `rndr.image` and the bracketed text will be present as-is in the output,
whereas a null pointer in `header` will remove all header-looking blocks. If you want an otherwise
standard markdown-to-XHTML conversion, you can take the example `mkd_xhtml` struct, copy it into your own
`struct mkd_renderer` and then assign NULL to the `link` and `image` members.
Moreover, span-level callbacks return an integer, which tells whether the renderer agrees to render the
item (non-zero return value) or whether it should be copied verbatim (zero return value). This allows you
to only accept some specific inputs. For example, my extension for `<ins>` and `<del>` spans requires
*exactly* two '-' or '+' as delimiters, so when `emphasis` and `triple_emphasis` are called with '-' or
'+', they return 0.
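The return-value convention can be sketched as follows. This is not the code of the bundled renderers: `example_emph_chars`, `render_em` and `strict_emphasis` are hypothetical names, and `render_em` is only a placeholder for whatever would really write the emphasized text into `ob`.

```c
#include <assert.h>
#include <string.h>

struct buf; /* left incomplete here; defined by libsoldout */

/* Hypothetical value for the emph_chars member: the regular markdown
 * characters plus '+' and '-' for <ins> and <del> spans. */
const char example_emph_chars[] = "*_+-";

/* Placeholder standing in for code that writes <em>...</em> into ob. */
int
render_em(struct buf *ob, struct buf *text, void *opaque)
{
    (void)ob;
    (void)text;
    (void)opaque;
    return 1; /* non-zero: the span is considered rendered */
}

/* Candidate for the `emphasis` member (single delimiters). */
int
strict_emphasis(struct buf *ob, struct buf *text, char c, void *opaque)
{
    /* assumption: the parser only passes characters listed in emph_chars */
    assert(c != '\0' && strchr(example_emph_chars, c) != NULL);

    /* '+' and '-' only make sense doubled ("++", "--"), so refuse them
     * here and let the parser copy the span verbatim */
    if (c == '+' || c == '-')
        return 0;
    return render_em(ob, text, opaque);
}
```

A `triple_emphasis` callback could apply the same test, while `double_emphasis` would be the one that accepts '+' and '-'.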
Special care should be taken when writing `autolink`, `link` and `image` callbacks, because the arguments
`link`, `title` and `alt` are unsanitized data taken directly from the input file. It is up to the
renderer to escape whatever needs escaping to prevent bad things from happening. To help you write
renderers, the function `lus_attr_escape()` escapes all problematic characters in (X)HTML: `'<'`, `'>'`,
`'&'` and `'"'`.
The `normal_text` callback should also perform whatever escaping is needed for the output to look like
the input data.
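To make the escaping concrete, here is a minimal stand-alone sketch of the four replacements listed above. It is not `lus_attr_escape()` itself: `escape_attr` is a hypothetical helper that works on plain C strings and a caller-provided array, so that it can be compiled without the library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst (capacity dst_size, terminating
 * zero included), replacing '<', '>', '&' and '"' with entities.
 * Returns 0 on success, -1 when dst is too small. */
int
escape_attr(char *dst, size_t dst_size, const char *src)
{
    size_t used = 0;

    assert(dst != NULL && src != NULL);
    for (; *src != '\0'; src++) {
        const char *rep;
        char single[2];
        size_t len;

        switch (*src) {
        case '<': rep = "&lt;";   break;
        case '>': rep = "&gt;";   break;
        case '&': rep = "&amp;";  break;
        case '"': rep = "&quot;"; break;
        default:
            /* any other character is copied unchanged */
            single[0] = *src;
            single[1] = '\0';
            rep = single;
            break;
        }
        len = strlen(rep);
        if (used + len + 1 > dst_size) /* keep room for the final zero */
            return -1;
        memcpy(dst + used, rep, len);
        used += len;
    }
    if (used + 1 > dst_size)
        return -1;
    dst[used] = '\0';
    return 0;
}
```

A `normal_text` callback for (X)HTML output would typically need a similar treatment of at least `'<'`, `'>'` and `'&'`.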
PHP-MARKDOWN-LIKE TABLES
Tables are one of the few extensions that are quite difficult and/or hacky to implement using a vanilla
Markdown parser and a renderer. Thus support for them has been introduced into the parser, using
dedicated callbacks:
- `table_cell`, which is called with the span-level contents of the cell;
- `table_row`, which is called with data returned by `table_cell`;
- `table`, which is called with data returned by `table_row`.
The input format to describe tables is taken from PHP-Markdown, and looks like this:
    header 1    | header 2      | header 3      | header 4
    ------------|:-------------:|--------------:|:--------------
    first line  |   centered    | right-aligned | left-aligned
    second line |   centered    |:   centered  :| left-aligned
    third line  |: left-aligned | right-aligned | right-aligned :
    column-separator | don't need | to be | aligned in the source
    | extra separators | are allowed | at both ends | of the line |
    | correct number of cells per row is not enforced |
    | pipe characters can be embedded in cell text by escaping it: \| |
Each row of the input text is a single row in the output, except the header rule, which is purely
syntactic.
Each cell in a row is delimited by a pipe (`|`) character. Optionally, a pipe character can also be
present at the beginning and/or at the end of the line. Column separators don't have to be aligned in the
input, but aligning them makes the input more readable.
There is no check of "squareness" of the table: `table_cell` is called once for each cell provided in the
input, so the number of calls may differ from one row to the next. If the output *has* to respect
a given number of cells per row, it's up to the renderer to enforce it, using state transmitted through
the `opaque` pointer.
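One possible way to track row widths through `opaque` is sketched below. It assumes that `table_row` is called once after the `table_cell` calls of that row, which is what the callback list above suggests. `struct table_state` and both function names are hypothetical, and the sketch only detects ragged rows; padding or truncating them is left out.

```c
#include <assert.h>
#include <stddef.h>

struct buf; /* left incomplete here; defined by libsoldout */

/* Hypothetical state reached through the opaque pointer. */
struct table_state {
    int cells_in_row;   /* cells seen since the last table_row call */
    int expected_cells; /* width of the first row, 0 until known */
    int ragged_rows;    /* rows whose width differs from the first row */
};

/* Same signature as the `table_cell` member. */
void
counting_table_cell(struct buf *ob, struct buf *text, int flags, void *opaque)
{
    struct table_state *st = opaque;

    assert(st != NULL);
    (void)ob;
    (void)text;
    (void)flags;
    st->cells_in_row += 1;
}

/* Same signature as the `table_row` member. */
void
counting_table_row(struct buf *ob, struct buf *cells, int flags, void *opaque)
{
    struct table_state *st = opaque;

    assert(st != NULL);
    (void)ob;
    (void)cells;
    (void)flags;
    if (st->expected_cells == 0)
        st->expected_cells = st->cells_in_row; /* first row sets the width */
    else if (st->cells_in_row != st->expected_cells)
        st->ragged_rows += 1;
    st->cells_in_row = 0; /* get ready for the next row */
}
```

A real renderer would also emit markup in both callbacks, and would probably reset the state in its `table` callback so that each table is checked independently.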
The header rule is a line containing only horizontal blanks (space and tab), dashes (`-`), colons (`:`)
and separators. Moreover, it *must* be the second line of the table. In case such a header rule is
detected, the first line of the table is considered as a header, and passed as the `head_row` argument to
the `table` callback. Moreover `table_row` and `table_cell` are called for that specific row with the
`MKD_CELL_HEAD` flag.
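The character test described above can be written down as a small predicate. This is only an illustration of the rule as stated here, with a hypothetical function name; the parser's actual check may well be stricter (for instance by requiring at least one dash).

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 when line[0..len) contains only spaces, tabs, dashes,
 * colons and pipe separators, 0 otherwise. The line is expected
 * without its trailing newline. */
int
looks_like_header_rule(const char *line, size_t len)
{
    size_t i;

    assert(line != NULL || len == 0);
    for (i = 0; i < len; i++) {
        switch (line[i]) {
        case ' ':
        case '\t':
        case '-':
        case ':':
        case '|':
            break;    /* allowed character, keep scanning */
        default:
            return 0; /* anything else means a regular row */
        }
    }
    return 1;
}
```

Applied to the example table, the second line passes the test while the `first line | centered | ...` row does not.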
Alignment is defined on a per-cell basis, and specified by a colon (`:`) at the very beginning of the
input span (i.e. directly after the `|` separator, or as the first character on the line) and/or at the
very end of it (i.e. directly before the separator, or as the last character on the line). A cell with
such a leading colon only is left-aligned (`MKD_CELL_ALIGN_LEFT`), one with a trailing colon only is
right-aligned (`MKD_CELL_ALIGN_RIGHT`), and one with both is centered (`MKD_CELL_ALIGN_CENTER`).
A column-wise default alignment can be specified with the same syntax on the header rule.
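The per-cell rule can be sketched as a function of the raw cell text, i.e. everything between two separators. The enum below is a local stand-in for the `MKD_CELL_ALIGN_*` flags with illustrative values, and `cell_alignment` is a hypothetical name. The column-wise default taken from the header rule is not handled here.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for the MKD_CELL_ALIGN_* flags; the numeric values
 * are illustrative and not those of markdown.h. */
enum cell_align {
    CELL_ALIGN_DEFAULT,
    CELL_ALIGN_LEFT,
    CELL_ALIGN_RIGHT,
    CELL_ALIGN_CENTER
};

/* cell[0..len) is the raw text between two separators. */
enum cell_align
cell_alignment(const char *cell, size_t len)
{
    int leading, trailing;

    assert(cell != NULL || len == 0);
    leading = len > 0 && cell[0] == ':';
    /* a cell made of a single ':' is treated as a leading colon only */
    trailing = len > 1 && cell[len - 1] == ':';

    if (leading && trailing)
        return CELL_ALIGN_CENTER;
    if (leading)
        return CELL_ALIGN_LEFT;
    if (trailing)
        return CELL_ALIGN_RIGHT;
    return CELL_ALIGN_DEFAULT;
}
```

Treating a lone `:` as left-aligned is an arbitrary choice of this sketch; the text above does not say how the parser handles that case.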
RENDERER EXAMPLES
While libsoldout is designed to perform only the parsing of markdown files, and to let you provide the
renderer callbacks, a few renderers have been included, both to illustrate how to write a set of renderer
functions and to allow anybody who does not need special extensions to use libsoldout without hassle.
All the examples provided here come in two flavors, `_html` producing HTML code (self-closing tags are
rendered like this: `<hr>`), and `_xhtml` producing XHTML code (self-closing tags like `<hr />`).
STANDARD MARKDOWN RENDERER
`mkd_html` and `mkd_xhtml` implement standard Markdown to (X)HTML translation without any extension.
DISCOUNT-ISH
`discount_html` and `discount_xhtml` implement on top of the standard markdown *some* of the extensions
found in Discount.
Actually, all Discount extensions that are not provided here cannot be easily implemented in libsoldout
without touching the parsing code, hence they do not belong strictly to the renderer realm. However
some (maybe all, not sure about tables) of them can be implemented fairly easily with libsoldout by
using both a dedicated renderer and some preprocessing to make the extension look like something closer
to the original markdown syntax.
Here is a list of all extensions included in these renderers:
- image size specification, by appending " =(width)x(height)" to
the link,
- pseudo-protocols in links:
* abbr:_description_ for `<abbr title="`_description_`">...</abbr>`
* class:_name_ for `<span class="`_name_`">...</span>`
* id:_name_ for `<a id="`_name_`">...</a>`
* raw:_text_ for verbatim unprocessed _text_ inclusion
- class blocks: blockquotes beginning with %_class_% will be
rendered as a `div` of the given class(es).
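As an illustration of the pseudo-protocols listed above, a `link` callback could start by classifying the link target by its prefix. The sketch below is not taken from the Discount-ish renderers; the enum and `classify_link` are hypothetical, and it works on a zero-terminated string for simplicity.

```c
#include <assert.h>
#include <string.h>

enum pseudo_proto {
    PSEUDO_NONE,
    PSEUDO_ABBR,
    PSEUDO_CLASS,
    PSEUDO_ID,
    PSEUDO_RAW
};

/* Classify a zero-terminated link target. *rest is set to the text that
 * follows the "name:" prefix, or to link itself when nothing matches. */
enum pseudo_proto
classify_link(const char *link, const char **rest)
{
    static const struct {
        const char *prefix;
        enum pseudo_proto kind;
    } known[] = {
        { "abbr:",  PSEUDO_ABBR },
        { "class:", PSEUDO_CLASS },
        { "id:",    PSEUDO_ID },
        { "raw:",   PSEUDO_RAW }
    };
    size_t i;

    assert(link != NULL && rest != NULL);
    for (i = 0; i < sizeof known / sizeof known[0]; i++) {
        size_t n = strlen(known[i].prefix);

        if (strncmp(link, known[i].prefix, n) == 0) {
            *rest = link + n; /* description, class name, id or raw text */
            return known[i].kind;
        }
    }
    *rest = link;
    return PSEUDO_NONE;
}
```

Since the `link` callback receives a `struct buf` rather than a C string, a real renderer would compare against the buffer contents and length instead, and would still have to escape whatever it copies into the output.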
NATASHA'S OWN EXTENSIONS
`nat_html` and `nat_xhtml` implement on top of Discount extensions some things that I need to losslessly
convert my existing HTML into extended markdown.
Here is a list of these extensions:
- id attribute for headers, using the syntax _id_#_Header text_
- class attribute for paragraphs, by putting class name(s) between
parentheses at the very beginning of the paragraph
- `<ins>` and `<del>` spans, using respectively `++` and `--` as
delimiters (with emphasis-like restrictions, i.e. an opening
delimiter cannot be followed by a whitespace, and a closing
delimiter cannot be preceded by a whitespace).
- plain `<span>` without attribute, using emphasis-like delimiter `|`
Here is an example using all of them:
    ###atx_id#ID was chosen to look nice in atx-style headers ###

    setext_id#Though it will also work in setext-style headers
    ----------------------------------------------------------

    Here is a paragraph with --deleted-- and ++inserted++ text.

    I use CSS rules to render poetry and other verses, using a plain
    `<span>` for each verse, and enclosing each group of verses in
    a `<p class="verse">`. Here is what it would look like:

    (verse)|And on the pedestal these words appear:|
    |"My name is Ozymandias, king of kings:|
    |Look on my works, ye Mighty, and despair!"|
COPYRIGHT
Copyright © 2009 Natasha Porté <natbsd@instinctive.eu>
SEE ALSO
John Gruber's website http://daringfireball.net/projects/markdown/
Natacha Porté's website http://fossil.instinctive.eu/
2009 MARKDOWN(3)