Provided by: sympa_6.2.70~dfsg-2_amd64
NAME
Sympa::Tools::Text - Text-related functions
DESCRIPTION
This package provides some text-related functions.

Functions
addrencode ( $addr, [ $phrase, [ $charset, [ $comment ] ] ] )
    Returns formatted (and encoded) name-addr as defined in RFC 5322, section 3.4.

canonic_email ( $email )
    Function. Returns the canonical form of an e-mail address.

    Leading and trailing whitespace is removed. Latin letters without accents are lower-cased.

    For malformed inputs, returns "undef".

canonic_message_id ( $message_id )
    Returns the canonical form of a message ID, without leading or trailing whitespace or "<", ">".

canonic_text ( $text )
    Canonicalizes text. $text should be a binary string encoded in the UTF-8 character set, or a Unicode string. Forbidden sequences in a binary string will be replaced by U+FFFD REPLACEMENT CHARACTERs, and Normalization Form C (NFC) will be applied.

clip ( $string, $length )
    Function. Clips $string according to $length in bytes, respecting the boundaries of grapheme clusters. If $string is a bytestring, UTF-8 is assumed.

decode_filesystem_safe ( $str )
    Function. Decodes a string encoded by encode_filesystem_safe().

    Parameter:
    $str
        String to be decoded.

    Returns:
    Decoded string, with the "utf8" flag stripped, if any.

decode_html ( $str )
    Function. Decodes HTML entities in a string encoded in UTF-8 or in a Unicode string.

    Parameter:
    $str
        String to be decoded.

    Returns:
    Decoded string, with the "utf8" flag stripped, if any.

encode_filesystem_safe ( $str )
    Function. Encodes a string $str so that it is suitable for use in the filesystem.

    Parameter:
    $str
        String to be encoded.

    Returns:
    Encoded string, with the "utf8" flag stripped, if any. All bytes except '-', '+', '.', '@' and alphanumeric characters are encoded as sequences of '_' followed by two hexdigits.

    Note that '/' will also be encoded.

encode_html ( $str, [ $additional_unsafe ] )
    Function. Encodes characters in a string $str to HTML entities. By default '<', '>', '&' and '"' are encoded.

    Parameters:
    $str
        String to be encoded.
    $additional_unsafe
        Character or range of characters additionally encoded as entity references.
        This optional parameter was introduced in Sympa 6.2.37b.3.

    Returns:
    Encoded string. The "utf8" flag, if any, is not stripped.

encode_uri ( $str, [ omit => $chars ] )
    Function. Encodes potentially unsafe characters in the string using "percent" encoding suitable for URIs.

    Parameters:
    $str
        String to be encoded.
    omit => $chars
        By default, all characters except those defined as "unreserved" in RFC 3986 are encoded, that is, "[^-A-Za-z0-9._~]". If this parameter is given, the additional characters in $chars are also left unencoded.

    Returns:
    Encoded string, with the "utf8" flag stripped, if any.

escape_chars ( $str )
    Deprecated. Use "encode_filesystem_safe".

    Escapes weird characters.

escape_url ( $str )
    DEPRECATED. It would be better to use "encode_uri" or "mailtourl".

foldcase ( $str )
    Function. Returns a "fold-case" string suitable for case-insensitive matching. For example, the code below looks for a needle in a haystack regardless of case, even if they are non-ASCII UTF-8 strings.

        $haystack = Sympa::Tools::Text::foldcase($HayStack);
        $needle   = Sympa::Tools::Text::foldcase($NeedLe);
        if (index($haystack, $needle) >= 0) {
            ...
        }

    Parameter:
    $str
        A string.

guessed_to_utf8( $text, [ lang, ... ] )
    Function. Guesses the charset of the text, considering language context, and returns the text re-encoded in UTF-8.

    Parameters:
    $text
        Text to be re-encoded.
    lang, ...
        Language tag(s), which may be given by "implicated_langs" in Sympa::Language.

    Returns:
    Re-encoded text. If no charset could be guessed, "iso-8859-1" will be used as the last resort, simply because it covers the full range of 8-bit values.

mailtourl ( $email, [ decode_html => 1 ], [ query => {key => val, ...} ] )
    Function. Constructs a "mailto:" URL for the given e-mail address.

    Parameters:
    $email
        E-mail address.
    decode_html => 1
        If set, arguments are assumed to include HTML entities.
    query => {key => val, ...}
        Optional query.

    Returns:
    Constructed URL.

pad ( $str, $width )
    Pads a string with spaces so that the result will not be narrower than the given width.

    Parameters:
    $str
        A string.
    $width
        If $width is a false value, or the width of $str is not less than $width, does nothing. If $width is less than 0, pads right. Otherwise, pads left.

    Returns:
    Padded string.

qdecode_filename ( $filename )
    Q-Decodes a web file name.

    ToDo: This should be made obsolete in a future release. It would be better to use "decode_filesystem_safe".

qencode_filename ( $filename )
    Q-Encodes a web file name.

    ToDo: This should be made obsolete in a future release. It would be better to use "encode_filesystem_safe".

slurp ( $file )
    Gets the entire content of the file. Normalization by canonic_text() is applied. $file is the path to a text file.

unescape_chars ( $str )
    Deprecated. Use "decode_filesystem_safe".

    Unescapes weird characters.

valid_email ( $string )
    Basic check of an e-mail address.

weburl ( $base, \@paths, [ decode_html => 1 ], [ fragment => $fragment ], [ query => \%query ] )
    Constructs a "http:" or "https:" URL under the given base URI.

    Parameters:
    $base
        Base URI.
    \@paths
        Additional path components.
    decode_html => 1
        If set, arguments are assumed to include HTML entities. The exception is $base: it is assumed not to include entities.
    fragment => $fragment
        Optional fragment.
    query => \%query
        Optional query.

    Returns:
    A URI.

wrap_text ( $text, [ $init_tab, [ $subsequent_tab, [ $cols ] ] ] )
    Function. Returns line-wrapped text.

    Parameters:
    $text
        The text to be folded.
    $init_tab
        Indentation prepended to the first line of a paragraph. Default is '', no indentation.
    $subsequent_tab
        Indentation prepended to each subsequent line of a folded paragraph. Default is '', no indentation.
    $cols
        Maximum number of columns of folded text. Default is 78.
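
Example. The byte-level rule stated for encode_filesystem_safe() and decode_filesystem_safe() (every byte other than '-', '+', '.', '@' and alphanumerics becomes '_' plus two hexdigits) can be illustrated with a minimal standalone sketch. It is not Sympa's implementation: the names sketch_encode_fs_safe and sketch_decode_fs_safe are hypothetical, and the use of lowercase hexdigits is an assumption, since the text above does not specify the case.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper, not part of Sympa: encodes every byte outside
# [-+.@0-9A-Za-z] as '_' followed by two hexdigits.
# Lowercase hexdigits are an assumption of this sketch.
sub sketch_encode_fs_safe {
    my ($str) = @_;
    return '' unless defined $str and length $str;

    # Work on UTF-8 bytes so that the result carries no "utf8" flag.
    $str = Encode::encode_utf8($str) if Encode::is_utf8($str);
    $str =~ s/([^-+.\@0-9A-Za-z])/sprintf('_%02x', ord $1)/eg;
    return $str;
}

# Hypothetical inverse: turns each '_XX' sequence back into one byte.
# Because '_' itself is always encoded, decoding is unambiguous.
sub sketch_decode_fs_safe {
    my ($str) = @_;
    return '' unless defined $str and length $str;

    $str = Encode::encode_utf8($str) if Encode::is_utf8($str);
    $str =~ s/_([0-9A-Fa-f]{2})/chr(hex($1))/eg;
    return $str;
}

my $encoded = sketch_encode_fs_safe('my/list name@example.org');
print "$encoded\n";                             # my_2flist_20name@example.org
print sketch_decode_fs_safe($encoded), "\n";    # my/list name@example.org
```

In this sketch '/' and the space are replaced while '@' and '.' pass through unchanged, so the result can be used as a single path component. Code running under Sympa would normally call the package's own functions rather than a reimplementation like this one.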
HISTORY
Sympa::Tools::Text appeared in Sympa 6.2a.41. decode_filesystem_safe() and encode_filesystem_safe() were added in Sympa 6.2.10. decode_html(), encode_html(), encode_uri() and mailtourl() were added in Sympa 6.2.14, and escape_url() was deprecated. guessed_to_utf8() and pad() were added in Sympa 6.2.17. canonic_text() and slurp() were added in Sympa 6.2.53b. clip() was added in Sympa 6.2.61b.