Provided by: libsoldout1-dev_1.3-6_amd64
NAME
markdown - markdown documents parsing
SYNOPSIS
    #include <markdown.h>

    void
    markdown(struct buf *ob, struct buf *ib, const struct mkd_renderer *rndr);
DESCRIPTION
`markdown()` is the only exported function in libsoldout and starts the parsing process of a markdown document. More information about the markdown language is available on John Gruber's website: http://daringfireball.net/projects/markdown/

Libsoldout only performs the parsing of markdown input; the construction of the output is left to a *renderer*, which is a set of callback functions called when markdown elements are encountered. Pointers to these functions are gathered into a `struct mkd_renderer` along with some renderer-related data. I think the struct declaration is pretty obvious:

    struct mkd_renderer {
        /* document level callbacks */
        void (*prolog)(struct buf *ob, void *opaque);
        void (*epilog)(struct buf *ob, void *opaque);

        /* block level callbacks - NULL skips the block */
        void (*blockcode)(struct buf *ob, struct buf *text, void *opaque);
        void (*blockquote)(struct buf *ob, struct buf *text, void *opaque);
        void (*blockhtml)(struct buf *ob, struct buf *text, void *opaque);
        void (*header)(struct buf *ob, struct buf *text,
                       int level, void *opaque);
        void (*hrule)(struct buf *ob, void *opaque);
        void (*list)(struct buf *ob, struct buf *text,
                     int flags, void *opaque);
        void (*listitem)(struct buf *ob, struct buf *text,
                         int flags, void *opaque);
        void (*paragraph)(struct buf *ob, struct buf *text, void *opaque);
        void (*table)(struct buf *ob, struct buf *head_row,
                      struct buf *rows, void *opaque);
        void (*table_cell)(struct buf *ob, struct buf *text,
                           int flags, void *opaque);
        void (*table_row)(struct buf *ob, struct buf *cells,
                          int flags, void *opaque);

        /* span level callbacks - NULL or return 0 prints the span verbatim */
        int (*autolink)(struct buf *ob, struct buf *link,
                        enum mkd_autolink type, void *opaque);
        int (*codespan)(struct buf *ob, struct buf *text, void *opaque);
        int (*double_emphasis)(struct buf *ob, struct buf *text,
                               char c, void *opaque);
        int (*emphasis)(struct buf *ob, struct buf *text,
                        char c, void *opaque);
        int (*image)(struct buf *ob, struct buf *link, struct buf *title,
                     struct buf *alt, void *opaque);
        int (*linebreak)(struct buf *ob, void *opaque);
        int (*link)(struct buf *ob, struct buf *link, struct buf *title,
                    struct buf *content, void *opaque);
        int (*raw_html_tag)(struct buf *ob, struct buf *tag, void *opaque);
        int (*triple_emphasis)(struct buf *ob, struct buf *text,
                               char c, void *opaque);

        /* low level callbacks - NULL copies input directly into the output */
        void (*entity)(struct buf *ob, struct buf *entity, void *opaque);
        void (*normal_text)(struct buf *ob, struct buf *text, void *opaque);

        /* renderer data */
        int max_work_stack;     /* prevent arbitrarily deep recursion */
        const char *emph_chars; /* chars that trigger emphasis rendering */
        void *opaque;           /* opaque data sent to every rendering callback */
    };

The first argument of a renderer function is always the output buffer, where the function is supposed to write its output. It is not necessarily the output buffer given to `markdown()`, because in some cases rendering into a temporary buffer is needed.

The last argument of a renderer function is always an opaque pointer, which is equal to the `opaque` member of `struct mkd_renderer`. The name "opaque" might not be well chosen, but it means a pointer *opaque for the parser, **not** for the renderer*: my parser blindly passes around a pointer to data that only you know about, in case you need to store an internal state or whatever. I have not found anything to put in this pointer in my example renderers, so it is set to NULL in the structure and never looked at in the callbacks.

`emph_chars` is a zero-terminated string which contains the set of characters that trigger emphasis. In regular markdown, emphasis is only triggered by '\_' and '\*', but in some extensions it might be useful to add other characters to this list. For example, in my extension to handle `<ins>` and `<del>` spans, delimited respectively by "++" and "--", I have added '+' and '-' to `emph_chars`.
The character that triggered the emphasis is then passed to `emphasis`, `double_emphasis` and `triple_emphasis` through the parameter `c`.

Function pointers in `struct mkd_renderer` can be NULL, but NULL has a different meaning depending on whether the callback is block-level or span-level. A null block-level callback makes the corresponding block disappear from the output, as if the callback were an empty function. A null span-level callback causes the corresponding element to be treated as normal characters, copied verbatim to the output. So, for example, to disable links and images (e.g. because you consider them dangerous), just put a null pointer in `rndr.link` and `rndr.image`, and the bracketed stuff will be present as-is in the output, whereas a null pointer in `header` removes all header-looking blocks. If you want an otherwise standard markdown-to-XHTML conversion, you can take the example `mkd_xhtml` struct, copy it into your own `struct mkd_renderer` and then assign NULL to its `link` and `image` members.

Moreover, span-level callbacks return an integer, which tells whether the renderer agrees to render the item (non-zero return value) or whether it should be copied verbatim (zero return value). This allows you to accept only some specific inputs. For example, my extension for `<ins>` and `<del>` spans requires *exactly* two '-' or '+' as delimiters, so when `emphasis` and `triple_emphasis` are called with '-' or '+', they return 0.

Special care should be taken when writing `autolink`, `link` and `image` callbacks, because the arguments `link`, `title` and `alt` are unsanitized data taken directly from the input file. It is up to the renderer to escape whatever needs escaping to prevent bad things from happening. To help you write renderers, the function `lus_attr_escape()` escapes all problematic characters in (X)HTML: `'<'`, `'>'`, `'&'` and `'"'`. The `normal_text` callback should also perform whatever escaping is needed for the output to look like the input data.
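To make these callback conventions concrete, here is a minimal sketch of two callbacks shaped like `normal_text` and `double_emphasis`. It is not libsoldout code: `struct sbuf`, `sbuf_put()`, `sbuf_puts()` and `struct render_state` are hypothetical stand-ins so that the example compiles without the library, whereas real callbacks receive `struct buf *` arguments and append to them with the library's buffer functions. The sketch also assumes that the `text` handed to a span-level callback has already been rendered (and therefore escaped by `normal_text`), so only `render_normal_text()` escapes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size stand-in for libsoldout's struct buf, so the
 * sketch compiles on its own. It truncates instead of growing. */
struct sbuf {
    char   data[256];
    size_t size;
};

static void sbuf_put(struct sbuf *b, const char *s, size_t n)
{
    size_t room = sizeof b->data - 1 - b->size;

    if (n > room)
        n = room;
    memcpy(b->data + b->size, s, n);
    b->size += n;
    b->data[b->size] = '\0';
}

static void sbuf_puts(struct sbuf *b, const char *s)
{
    sbuf_put(b, s, strlen(s));
}

/* Hypothetical renderer-private state, reached through the opaque pointer. */
struct render_state {
    int rejected;   /* spans handed back to the parser to copy verbatim */
};

/* Shaped like normal_text: escape the four characters that
 * lus_attr_escape() is documented to handle, copy everything else. */
static void render_normal_text(struct sbuf *ob, const struct sbuf *text,
    void *opaque)
{
    (void)opaque;   /* this callback needs no state */
    for (size_t i = 0; i < text->size; i++) {
        switch (text->data[i]) {
        case '<': sbuf_puts(ob, "&lt;"); break;
        case '>': sbuf_puts(ob, "&gt;"); break;
        case '&': sbuf_puts(ob, "&amp;"); break;
        case '"': sbuf_puts(ob, "&quot;"); break;
        default:  sbuf_put(ob, &text->data[i], 1); break;
        }
    }
}

/* Shaped like double_emphasis: '*' and '_' give <strong>, '+' and '-'
 * give <ins> and <del> (assuming both are listed in emph_chars).
 * Any other delimiter is refused: nothing is written and 0 is returned,
 * so the parser copies the span verbatim. */
static int render_double_emphasis(struct sbuf *ob, const struct sbuf *text,
    char c, void *opaque)
{
    struct render_state *st = opaque;
    const char *tag;

    switch (c) {
    case '*': case '_': tag = "strong"; break;
    case '+': tag = "ins"; break;
    case '-': tag = "del"; break;
    default:
        if (st != NULL)
            st->rejected++;
        return 0;
    }
    sbuf_puts(ob, "<");
    sbuf_puts(ob, tag);
    sbuf_puts(ob, ">");
    sbuf_put(ob, text->data, text->size);   /* already rendered: no re-escaping */
    sbuf_puts(ob, "</");
    sbuf_puts(ob, tag);
    sbuf_puts(ob, ">");
    return 1;
}
```

To plug the same logic into libsoldout, the functions would take `struct buf *` arguments and be stored in the `normal_text` and `double_emphasis` members of a `struct mkd_renderer`, with a pointer to a `struct render_state` in its `opaque` member. An `emphasis` callback written in the same style would simply return 0 for '+' and '-', which yields the "exactly two delimiters" behavior described above.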
PHP-MARKDOWN-LIKE TABLES
Tables are one of the few extensions that are quite difficult and/or hacky to implement using a vanilla Markdown parser and a renderer. Thus support has been introduced into the parser, using dedicated callbacks:

- `table_cell`, which is called with the span-level contents of the cell;
- `table_row`, which is called with the output of the `table_cell` calls;
- `table`, which is called with the output of the `table_row` calls.

The input format to describe tables is taken from PHP-Markdown, and looks like this:

    header 1    | header 2      | header 3      | header 4
    ------------|:-------------:|--------------:|:--------------
    first line  | centered      | right-aligned | left-aligned
    second line | centered      |: centered    :| left-aligned
    third line  |: left-aligned | right-aligned | right-aligned :
    column-separator | don't need | to be | aligned in the source
    | extra separators | are allowed | at both ends | of the line |
    | correct number of cells per row is not enforced |
    | pipe characters can be embedded in cell text by escaping it: \| |

Each row of the input text is a single row in the output, except the header rule, which is purely syntactic. Each cell in a row is delimited by a pipe (`|`) character. Optionally, a pipe character can also be present at the beginning and/or at the end of the line. Column separators do not have to be aligned in the input, but aligning them makes the input more readable.

There is no check of "squareness" of the table: `table_cell` is called once for each cell provided in the input, which can be a different number of times from one row to the next. If the output *has* to respect a given number of cells per row, it is up to the renderer to enforce it, using state transmitted through the `opaque` pointer.

The header rule is a line containing only horizontal blanks (space and tab), dashes (`-`), colons (`:`) and separators (`|`). Moreover, it *must* be the second line of the table.
In case such a header rule is detected, the first line of the table is considered as a header, and passed as the `head_row` argument to the `table` callback. Moreover, `table_row` and `table_cell` are called for that specific row with the `MKD_CELL_HEAD` flag.

Alignment is defined on a per-cell basis, and specified by a colon (`:`) at the very beginning of the input span (i.e. directly after the `|` separator, or as the first character on the line) and/or at the very end of it (i.e. directly before the separator, or as the last character on the line). A cell with only a leading colon is left-aligned (`MKD_CELL_ALIGN_LEFT`), one with only a trailing colon is right-aligned (`MKD_CELL_ALIGN_RIGHT`), and one with both is centered (`MKD_CELL_ALIGN_CENTER`).

A column-wise default alignment can be specified with the same syntax on the header rule.
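The alignment rules fit in a few lines of C. The sketch below only illustrates the rules as stated; it is not the parser's code. `enum cell_align` and its `ALIGN_*` values are hypothetical stand-ins for the `MKD_CELL_ALIGN_*` flags, whose actual values come from `markdown.h`, and treating the header-rule alignment as a fallback that a cell's own colons override is an assumption drawn from the word "default".

```c
#include <stddef.h>

/* Hypothetical stand-ins for the MKD_CELL_ALIGN_* flags. */
enum cell_align {
    ALIGN_DEFAULT = 0,  /* no colon: fall back to the column default */
    ALIGN_LEFT,
    ALIGN_RIGHT,
    ALIGN_CENTER
};

/* Classify one raw cell span (the bytes between two separators) by a
 * colon in its very first position, its very last position, or both.
 * Nothing is trimmed, so " :x" carries no alignment. */
static enum cell_align cell_alignment(const char *span, size_t len)
{
    int lead, trail;

    if (len == 0)
        return ALIGN_DEFAULT;
    lead = span[0] == ':';
    trail = len > 1 && span[len - 1] == ':';
    if (lead && trail)
        return ALIGN_CENTER;
    if (lead)
        return ALIGN_LEFT;
    if (trail)
        return ALIGN_RIGHT;
    return ALIGN_DEFAULT;
}

/* A cell's own alignment wins; otherwise use the alignment that the
 * header rule gave to its column (assumed precedence). */
static enum cell_align effective_alignment(enum cell_align cell,
    enum cell_align column)
{
    return cell != ALIGN_DEFAULT ? cell : column;
}
```

Applied to the sample table, `": centered    :"` is centered, while the unmarked `" right-aligned "` cell in the third column inherits right alignment from `--------------:` in the header rule.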
RENDERER EXAMPLES
While libsoldout is designed to perform only the parsing of markdown files and to let you provide the renderer callbacks, a few renderers have been included, both to illustrate how to write a set of renderer functions and to allow anybody who does not need special extensions to use libsoldout without hassle.

All the examples provided here come in two flavors: `_html`, producing HTML code (self-closing tags are rendered like this: `<hr>`), and `_xhtml`, producing XHTML code (self-closing tags like `<hr />`).
STANDARD MARKDOWN RENDERER
`mkd_html` and `mkd_xhtml` implement standard Markdown to (X)HTML translation without any extension.
DISCOUNT-ISH
`discount_html` and `discount_xhtml` implement, on top of standard markdown, *some* of the extensions found in Discount. Actually, all the Discount extensions that are not provided here cannot be easily implemented in libsoldout without touching the parsing code, hence they do not belong strictly to the renderer realm. However, some (maybe all, not sure about tables) extensions can be implemented fairly easily with libsoldout by using both a dedicated renderer and some preprocessing to make the extension look like something closer to the original markdown syntax.

Here is a list of all extensions included in these renderers:

- image size specification, by appending " =(width)x(height)" to the link,
- pseudo-protocols in links:
    * abbr:_description_ for `<abbr title="`_description_`">...</abbr>`
    * class:_name_ for `<span class="`_name_`">...</span>`
    * id:_name_ for `<a id="`_name_`">...</a>`
    * raw:_text_ for verbatim unprocessed _text_ inclusion
- class blocks: blockquotes beginning with %_class_% will be rendered as a `div` of the given class(es).
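The image size syntax amounts to peeling a trailing " =WIDTHxHEIGHT" off the link before the URL is escaped and emitted. A minimal sketch follows; `split_image_size()` and `parse_digits()` are hypothetical helpers, not functions of libsoldout. The exact rules these renderers apply (for instance, whether one of the two numbers may be omitted) are not specified here, so the sketch simply requires both and does not guard against numeric overflow.

```c
#include <stddef.h>

/* Read a run of decimal digits starting at *pos; advance *pos past it.
 * Returns non-zero if at least one digit was read. */
static int parse_digits(const char *s, size_t len, size_t *pos,
    unsigned long *out)
{
    size_t start = *pos;
    unsigned long v = 0;

    while (*pos < len && s[*pos] >= '0' && s[*pos] <= '9') {
        v = v * 10 + (unsigned long)(s[*pos] - '0');
        (*pos)++;
    }
    *out = v;
    return *pos > start;
}

/* If link ends with " =WIDTHxHEIGHT", store the length of the URL part
 * and both sizes, then return 1. Otherwise return 0 and leave the
 * output parameters untouched. */
static int split_image_size(const char *link, size_t len, size_t *url_len,
    unsigned long *width, unsigned long *height)
{
    size_t eq, pos;
    unsigned long w, h;

    /* eq ends up just past the last " =" pair, or below 2 if there is none */
    for (eq = len; eq >= 2; eq--)
        if (link[eq - 2] == ' ' && link[eq - 1] == '=')
            break;
    if (eq < 2)
        return 0;

    pos = eq;
    if (!parse_digits(link, len, &pos, &w) || pos >= len || link[pos] != 'x')
        return 0;
    pos++;                      /* skip the 'x' */
    if (!parse_digits(link, len, &pos, &h) || pos != len)
        return 0;               /* missing height, or trailing garbage */

    *url_len = eq - 2;          /* the URL stops before the blank */
    *width = w;
    *height = h;
    return 1;
}
```

With `"pic.png =100x50"`, this yields a 7-byte URL (`pic.png`), a width of 100 and a height of 50; a plain `"pic.png"` is left alone.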
NATASHA'S OWN EXTENSIONS
`nat_html` and `nat_xhtml` implement, on top of the Discount extensions, some things that I need to losslessly convert my existing HTML into extended markdown. Here is a list of these extensions:

- id attribute for headers, using the syntax _id_#_Header text_
- class attribute for paragraphs, by putting class name(s) between parentheses at the very beginning of the paragraph
- `<ins>` and `<del>` spans, using respectively `++` and `--` as delimiters (with emphasis-like restrictions, i.e. an opening delimiter cannot be followed by a whitespace, and a closing delimiter cannot be preceded by a whitespace)
- plain `<span>` without attribute, using emphasis-like delimiter `|`

Here is an example using all of them:

    ###atx_id#ID was chosen to look nice in atx-style headers ###

    setext_id#Though it will also work in setext-style headers
    ----------------------------------------------------------

    Here is a paragraph with --deleted-- and ++inserted++ text.

    I use CSS rules to render poetry and other verses, using a plain
    `<span>` for each verse, and enclosing each group of verses in
    a `<p class="verse">`. Here is how it would look:

    (verse)|And on the pedestal these words appear:|
    |"My name is Ozymandias, king of kings:|
    |Look on my works, ye Mighty, and despair!"|
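The two prefix syntaxes, `id#` on headers and `(class)` on paragraphs, can be recognized with a simple scan of the `text` a `header` or `paragraph` callback receives. The helpers below are hypothetical sketches, not the code of `nat_html`; in particular, rejecting ids that contain blanks and treating an empty `()` as "no class" are assumptions.

```c
#include <stddef.h>

/* "id#Header text": if text starts with a blank-free run ending in '#',
 * return the id length and store where the visible text begins in *body.
 * Return 0 (and leave *body alone) when there is no id. */
static size_t header_id(const char *text, size_t len, size_t *body)
{
    size_t i;

    for (i = 0; i < len && text[i] != '#'; i++)
        if (text[i] == ' ' || text[i] == '\t')
            return 0;   /* blank before any '#': plain header */
    if (i == 0 || i == len)
        return 0;       /* empty id, or no '#' at all */
    *body = i + 1;
    return i;           /* id is text[0 .. i-1] */
}

/* "(class names)text": return the length of the class list and store
 * where the paragraph text begins in *body. Return 0 (and leave *body
 * alone) when the paragraph does not start with a non-empty "(...)". */
static size_t paragraph_class(const char *text, size_t len, size_t *body)
{
    size_t i;

    if (len == 0 || text[0] != '(')
        return 0;
    for (i = 1; i < len && text[i] != ')'; i++)
        ;
    if (i == len || i == 1)
        return 0;       /* unterminated, or empty "()" */
    *body = i + 1;
    return i - 1;       /* class names are text[1 .. i-1] */
}
```

On the sample above, `header_id()` finds the 6-byte id `atx_id` and `paragraph_class()` finds the 5-byte class `verse`, leaving the rest of each line as the visible content.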
COPYRIGHT
Copyright © 2009 Natacha Porté <natbsd@instinctive.eu>
SEE ALSO
John Gruber's website: http://daringfireball.net/projects/markdown/
Natacha Porté's website: http://fossil.instinctive.eu/

2009 MARKDOWN(3)