Ubuntu Manpage: string - String processing functions.

Provided by: erlang-manpages_20.2.2+dfsg-1ubuntu2_all

NAME

       string - String processing functions.

DESCRIPTION

       This module provides functions for string processing.

       A  string  in  this  module is represented by unicode:chardata(), that is, a list of codepoints, binaries
       with UTF-8-encoded codepoints (UTF-8 binaries), or a mix of the two.

       "abcd"               is a valid string
       <<"abcd">>           is a valid string
       ["abcd"]             is a valid string
       <<"abc..åäö"/utf8>>  is a valid string
       <<"abc..åäö">>       is NOT a valid string,
                            but a binary with Latin-1-encoded codepoints
       [<<"abc">>, "..åäö"] is a valid string
       [atom]               is NOT a valid string

       This module operates on grapheme clusters. A grapheme cluster is a user-perceived character, which can be
       represented by several codepoints.

       "å"  [229] or [97, 778]
       "e̊"  [101, 778]

       The string length of "ß↑e̊" is 3, even though it is represented by the codepoints  [223,8593,101,778]  or
       the UTF-8 binary <<195,159,226,134,145,101,204,138>>.

       Grapheme  clusters  for  codepoints of class prepend and non-modern (or decomposed) Hangul is not handled
       for performance reasons in find/3, replace/3, split/2, split/2 and trim/3.

       Splitting and appending strings is to be done on grapheme clusters borders. There is no verification that
       the results of appending strings are valid or normalized.

       Most  of  the  functions  expect  all  input  to  be  normalized   to   one   form,   see   for   example
       unicode:characters_to_nfc_list/1.

       Language or locale specific handling of input is not considered in any function.

       The functions can crash for non-valid input strings. For example, the functions expect UTF-8 binaries but
       not all functions verify that all binaries are encoded correctly.

       Unless  otherwise  specified  the  return value type is the same as the input type. That is, binary input
       returns binary output, list input returns a list output, and mixed input can return a mixed output.

       1> string:trim("  sarah  ").
       "sarah"
       2> string:trim(<<"  sarah  ">>).
       <<"sarah">>
       3> string:lexemes("foo bar", " ").
       ["foo","bar"]
       4> string:lexemes(<<"foo bar">>, " ").
       [<<"foo">>,<<"bar">>]

       This module has been reworked in Erlang/OTP 20 to  handle  unicode:chardata()  and  operate  on  grapheme
       clusters.  The  old functions that only work on Latin-1 lists as input are still available but should not
       be used. They will be deprecated in Erlang/OTP 21.

DATA TYPES

       direction() = leading | trailing

       grapheme_cluster() = char() | [char()]

              A user-perceived character, consisting of one or more codepoints.

EXPORTS

       casefold(String :: unicode:chardata()) -> unicode:chardata()

              Converts String to a case-agnostic  comparable  string.  Function  casefold/1  is  preferred  over
              lowercase/1 when two strings are to be compared for equality. See also equal/4.

              Example:

              1> string:casefold("Ω and ẞ SHARP S").
              "ω and ss sharp s"

       chomp(String :: unicode:chardata()) -> unicode:chardata()

              Returns a string where any trailing \n or \r\n have been removed from String.

              Example:

              182> string:chomp(<<"\nHello\n\n">>).
              <<"\nHello">>
              183> string:chomp("\nHello\r\r\n").
              "\nHello\r"

       equal(A, B) -> boolean()

       equal(A, B, IgnoreCase) -> boolean()

       equal(A, B, IgnoreCase, Norm) -> boolean()

              Types:

                 A = B = unicode:chardata()
                 IgnoreCase = boolean()
                 Norm = none | nfc | nfd | nfkc | nfkd

              Returns true if A and B are equal, otherwise false.

              If IgnoreCase is true the function does casefolding on the fly before the equality test.

              If  Norm is not none the function applies normalization on the fly before the equality test. There
              are four available normalization forms: nfc, nfd, nfkc, and nfkd.

              By default, IgnoreCase is false and Norm is none.

              Example:

              1> string:equal("åäö", <<"åäö"/utf8>>).
              true
              2> string:equal("åäö", unicode:characters_to_nfd_binary("åäö")).
              false
              3> string:equal("åäö", unicode:characters_to_nfd_binary("ÅÄÖ"), true, nfc).
              true

       find(String, SearchPattern) -> unicode:chardata() | nomatch

       find(String, SearchPattern, Dir) -> unicode:chardata() | nomatch

              Types:

                 String = SearchPattern = unicode:chardata()
                 Dir = direction()

              Removes anything before SearchPattern in String and returns the remainder of the string or nomatch
              if SearchPattern is not found. Dir, which  can  be  leading  or  trailing,  indicates  from  which
              direction characters are to be searched.

              By default, Dir is leading.

              Example:

              1> string:find("ab..cd..ef", ".").
              "..cd..ef"
              2> string:find(<<"ab..cd..ef">>, "..", trailing).
              <<"..ef">>
              3> string:find(<<"ab..cd..ef">>, "x", leading).
              nomatch
              4> string:find("ab..cd..ef", "x", trailing).
              nomatch

       is_empty(String :: unicode:chardata()) -> boolean()

              Returns true if String is the empty string, otherwise false.

              Example:

              1> string:is_empty("foo").
              false
              2> string:is_empty(["",<<>>]).
              true

       length(String :: unicode:chardata()) -> integer() >= 0

              Returns the number of grapheme clusters in String.

              Example:

              1> string:length("ß↑e̊").
              3
              2> string:length(<<195,159,226,134,145,101,204,138>>).
              3

       lexemes(String :: unicode:chardata(),
               SeparatorList :: [grapheme_cluster()]) ->
                  [unicode:chardata()]

              Returns a list of lexemes in String, separated by the grapheme clusters in SeparatorList.

              Notice that, as shown in this example, two or more adjacent separator graphemes clusters in String
              are treated as one. That is, there are no empty strings in the resulting list of lexemes. See also
              split/3 which returns empty strings.

              Notice that [$\r,$\n] is one grapheme cluster.

              Example:

              1> string:lexemes("abc de̊fxxghix jkl\r\nfoo", "x e" ++ [[$\r,$\n]]).
              ["abc","de̊f","ghi","jkl","foo"]
              2> string:lexemes(<<"abc de̊fxxghix jkl\r\nfoo"/utf8>>, "x e" ++ [$\r,$\n]).
              [<<"abc">>,<<"de̊f"/utf8>>,<<"ghi">>,<<"jkl\r\nfoo">>]

       lowercase(String :: unicode:chardata()) -> unicode:chardata()

              Converts String to lowercase.

              Notice that function casefold/1 should be used when converting a string to be tested for equality.

              Example:

              2> string:lowercase(string:uppercase("Michał")).
              "michał"

       next_codepoint(String :: unicode:chardata()) ->
                         maybe_improper_list(char(), unicode:chardata()) |
                         {error, unicode:chardata()}

              Returns the first codepoint in String and the rest of String in the tail. Returns an empty list if
              String is empty or an {error, String} tuple if the next byte is invalid.

              Example:

              1> string:next_codepoint(unicode:characters_to_binary("e̊fg")).
              [101|<<"̊fg"/utf8>>]

       next_grapheme(String :: unicode:chardata()) ->
                        maybe_improper_list(grapheme_cluster(),
                                            unicode:chardata()) |
                        {error, unicode:chardata()}

              Returns  the first grapheme cluster in String and the rest of String in the tail. Returns an empty
              list if String is empty or an {error, String} tuple if the next byte is invalid.

              Example:

              1> string:next_grapheme(unicode:characters_to_binary("e̊fg")).
              ["e̊"|<<"fg">>]

       nth_lexeme(String, N, SeparatorList) -> unicode:chardata()

              Types:

                 String = unicode:chardata()
                 N = integer() >= 0
                 SeparatorList = [grapheme_cluster()]

              Returns lexeme number N in String, where  lexemes  are  separated  by  the  grapheme  clusters  in
              SeparatorList.

              Example:

              1> string:nth_lexeme("abc.de̊f.ghiejkl", 3, ".e").
              "ghi"

       pad(String, Length) -> unicode:charlist()

       pad(String, Length, Dir) -> unicode:charlist()

       pad(String, Length, Dir, Char) -> unicode:charlist()

              Types:

                 String = unicode:chardata()
                 Length = integer()
                 Dir = direction() | both
                 Char = grapheme_cluster()

              Pads  String  to  Length with grapheme cluster Char. Dir, which can be leading, trailing, or both,
              indicates where the padding should be added.

              By default, Char is $\s and Dir is trailing.

              Example:

              1> string:pad(<<"He̊llö"/utf8>>, 8).
              [<<72,101,204,138,108,108,195,182>>,32,32,32]
              2> io:format("'~ts'~n",[string:pad("He̊llö", 8, leading)]).
              3> io:format("'~ts'~n",[string:pad("He̊llö", 8, both)]).

       prefix(String :: unicode:chardata(), Prefix :: unicode:chardata()) ->
                 nomatch | unicode:chardata()

              If Prefix is the prefix of String, removes it and  returns  the  remainder  of  String,  otherwise
              returns nomatch.

              Example:

              1> string:prefix(<<"prefix of string">>, "pre").
              <<"fix of string">>
              2> string:prefix("pre", "prefix").
              nomatch

       replace(String, SearchPattern, Replacement) ->
                  [unicode:chardata()]

       replace(String, SearchPattern, Replacement, Where) ->
                  [unicode:chardata()]

              Types:

                 String = SearchPattern = Replacement = unicode:chardata()
                 Where = direction() | all

              Replaces  SearchPattern  in String with Replacement. Where, default leading, indicates whether the
              leading, the trailing or all encounters of SearchPattern are to be replaced.

              Can be implemented as:

              lists:join(Replacement, split(String, SearchPattern, Where)).

              Example:

              1> string:replace(<<"ab..cd..ef">>, "..", "*").
              [<<"ab">>,"*",<<"cd..ef">>]
              2> string:replace(<<"ab..cd..ef">>, "..", "*", all).
              [<<"ab">>,"*",<<"cd">>,"*",<<"ef">>]

       reverse(String :: unicode:chardata()) -> [grapheme_cluster()]

              Returns the reverse list of the grapheme clusters in String.

              Example:

              1> Reverse = string:reverse(unicode:characters_to_nfd_binary("ÅÄÖ")).
              [[79,776],[65,776],[65,778]]
              2> io:format("~ts~n",[Reverse]).
              ÖÄÅ

       slice(String, Start) -> Slice

       slice(String, Start, Length) -> Slice

              Types:

                 String = unicode:chardata()
                 Start = integer() >= 0
                 Length = infinity | integer() >= 0
                 Slice = unicode:chardata()

              Returns a substring of String of at most Length grapheme clusters, starting at position Start.

              By default, Length is infinity.

              Example:

              1> string:slice(<<"He̊llö Wörld"/utf8>>, 4).
              <<"ö Wörld"/utf8>>
              2> string:slice(["He̊llö ", <<"Wörld"/utf8>>], 4,4).
              "ö Wö"
              3> string:slice(["He̊llö ", <<"Wörld"/utf8>>], 4,50).
              "ö Wörld"

       split(String, SearchPattern) -> [unicode:chardata()]

       split(String, SearchPattern, Where) -> [unicode:chardata()]

              Types:

                 String = SearchPattern = unicode:chardata()
                 Where = direction() | all

              Splits String where SearchPattern is encountered and return the remaining  parts.  Where,  default
              leading, indicates whether the leading, the trailing or all encounters of SearchPattern will split
              String.

              Example:

              0> string:split("ab..bc..cd", "..").
              ["ab","bc..cd"]
              1> string:split(<<"ab..bc..cd">>, "..", trailing).
              [<<"ab..bc">>,<<"cd">>]
              2> string:split(<<"ab..bc....cd">>, "..", all).
              [<<"ab">>,<<"bc">>,<<>>,<<"cd">>]

       take(String, Characters) -> {Leading, Trailing}

       take(String, Characters, Complement) -> {Leading, Trailing}

       take(String, Characters, Complement, Dir) -> {Leading, Trailing}

              Types:

                 String = unicode:chardata()
                 Characters = [grapheme_cluster()]
                 Complement = boolean()
                 Dir = direction()
                 Leading = Trailing = unicode:chardata()

              Takes  characters  from  String  as  long  as  the characters are members of set Characters or the
              complement of set Characters. Dir,  which  can  be  leading  or  trailing,  indicates  from  which
              direction characters are to be taken.

              Example:

              5> string:take("abc0z123", lists:seq($a,$z)).
              {"abc","0z123"}
              6> string:take(<<"abc0z123">>, lists:seq($0,$9), true, leading).
              {<<"abc">>,<<"0z123">>}
              7> string:take("abc0z123", lists:seq($0,$9), false, trailing).
              {"abc0z","123"}
              8> string:take(<<"abc0z123">>, lists:seq($a,$z), true, trailing).
              {<<"abc0z">>,<<"123">>}

       titlecase(String :: unicode:chardata()) -> unicode:chardata()

              Converts String to titlecase.

              Example:

              1> string:titlecase("ß is a SHARP s").
              "Ss is a SHARP s"

       to_float(String) -> {Float, Rest} | {error, Reason}

              Types:

                 String = unicode:chardata()
                 Float = float()
                 Rest = unicode:chardata()
                 Reason = no_float | badarg

              Argument  String  is  expected  to start with a valid text represented float (the digits are ASCII
              values). Remaining characters in the string after the float are returned in Rest.

              Example:

              > {F1,Fs} = string:to_float("1.0-1.0e-1"),
              > {F2,[]} = string:to_float(Fs),
              > F1+F2.
              0.9
              > string:to_float("3/2=1.5").
              {error,no_float}
              > string:to_float("-1.5eX").
              {-1.5,"eX"}

       to_integer(String) -> {Int, Rest} | {error, Reason}

              Types:

                 String = unicode:chardata()
                 Int = integer()
                 Rest = unicode:chardata()
                 Reason = no_integer | badarg

              Argument String is expected to start with a valid text represented integer (the digits  are  ASCII
              values). Remaining characters in the string after the integer are returned in Rest.

              Example:

              > {I1,Is} = string:to_integer("33+22"),
              > {I2,[]} = string:to_integer(Is),
              > I1-I2.
              11
              > string:to_integer("0.5").
              {0,".5"}
              > string:to_integer("x=2").
              {error,no_integer}

       to_graphemes(String :: unicode:chardata()) -> [grapheme_cluster()]

              Converts String to a list of grapheme clusters.

              Example:

              1> string:to_graphemes("ß↑e̊").
              [223,8593,[101,778]]
              2> string:to_graphemes(<<"ß↑e̊"/utf8>>).
              [223,8593,[101,778]]

       trim(String) -> unicode:chardata()

       trim(String, Dir) -> unicode:chardata()

       trim(String, Dir, Characters) -> unicode:chardata()

              Types:

                 String = unicode:chardata()
                 Dir = direction() | both
                 Characters = [grapheme_cluster()]

              Returns  a string, where leading or trailing, or both, Characters have been removed. Dir which can
              be leading, trailing, or both, indicates from which direction characters are to be removed.

              Default  Characters  are   the   set   of   nonbreakable   whitespace   codepoints,   defined   as
              Pattern_White_Space in Unicode Standard Annex #31. By default, Dir is both.

              Notice that [$\r,$\n] is one grapheme cluster according to the Unicode Standard.

              Example:

              1> string:trim("\t Hello \n").
              "Hello"
              2> string:trim(<<"\t Hello \n">>, leading).
              <<"Hello  \n">>
              3> string:trim(<<".Hello.\n">>, trailing, "\n.").
              <<".Hello">>

       uppercase(String :: unicode:chardata()) -> unicode:chardata()

              Converts String to uppercase.

              See also titlecase/1.

              Example:

              1> string:uppercase("Michał").
              "MICHAŁ"

OBSOLETE API FUNCTIONS

       Here follows the function of the old API. These functions only work on a list of Latin-1 characters.

   Note:
       The  functions  are  kept for backward compatibility, but are not recommended. They will be deprecated in
       Erlang/OTP 21.

       Any undocumented functions in string are not to be used.

EXPORTS

       centre(String, Number) -> Centered

       centre(String, Number, Character) -> Centered

              Types:

                 String = Centered = string()
                 Number = integer() >= 0
                 Character = char()

              Returns a string, where String is centered in the string and surrounded by  blanks  or  Character.
              The resulting string has length Number.

              This function is obsolete. Use pad/3.

       chars(Character, Number) -> String

       chars(Character, Number, Tail) -> String

              Types:

                 Character = char()
                 Number = integer() >= 0
                 Tail = String = string()

              Returns  a  string  consisting of Number characters Character. Optionally, the string can end with
              string Tail.

              This function is obsolete. Use lists:duplicate/2.

       chr(String, Character) -> Index

              Types:

                 String = string()
                 Character = char()
                 Index = integer() >= 0

              Returns the index of the first occurrence of Character in String. Returns 0 if Character does  not
              occur.

              This function is obsolete. Use find/2.

       concat(String1, String2) -> String3

              Types:

                 String1 = String2 = String3 = string()

              Concatenates String1 and String2 to form a new string String3, which is returned.

              This   function   is   obsolete.   Use   [String1,   String2]   as   Data   argument,   and   call
              unicode:characters_to_list/2 or unicode:characters_to_binary/2 to flatten the output.

       copies(String, Number) -> Copies

              Types:

                 String = Copies = string()
                 Number = integer() >= 0

              Returns a string containing String repeated Number times.

              This function is obsolete. Use lists:duplicate/2.

       cspan(String, Chars) -> Length

              Types:

                 String = Chars = string()
                 Length = integer() >= 0

              Returns the length of the maximum initial segment of String, which consists entirely of characters
              not from Chars.

              This function is obsolete. Use take/3.

              Example:

              > string:cspan("\t    abcdef", " \t").
              0

       join(StringList, Separator) -> String

              Types:

                 StringList = [string()]
                 Separator = String = string()

              Returns a string with the elements of StringList separated by the string in Separator.

              This function is obsolete. Use lists:join/2.

              Example:

              > join(["one", "two", "three"], ", ").
              "one, two, three"

       left(String, Number) -> Left

       left(String, Number, Character) -> Left

              Types:

                 String = Left = string()
                 Number = integer() >= 0
                 Character = char()

              Returns String with the length adjusted in accordance with Number. The left margin  is  fixed.  If
              length(String) < Number, then String is padded with blanks or Characters.

              This function is obsolete. Use pad/2 or pad/3.

              Example:

              > string:left("Hello",10,$.).
              "Hello....."

       len(String) -> Length

              Types:

                 String = string()
                 Length = integer() >= 0

              Returns the number of characters in String.

              This function is obsolete. Use length/1.

       rchr(String, Character) -> Index

              Types:

                 String = string()
                 Character = char()
                 Index = integer() >= 0

              Returns  the  index of the last occurrence of Character in String. Returns 0 if Character does not
              occur.

              This function is obsolete. Use find/3.

       right(String, Number) -> Right

       right(String, Number, Character) -> Right

              Types:

                 String = Right = string()
                 Number = integer() >= 0
                 Character = char()

              Returns String with the length adjusted in accordance with Number. The right margin is  fixed.  If
              the length of (String) < Number, then String is padded with blanks or Characters.

              This function is obsolete. Use pad/3.

              Example:

              > string:right("Hello", 10, $.).
              ".....Hello"

       rstr(String, SubString) -> Index

              Types:

                 String = SubString = string()
                 Index = integer() >= 0

              Returns  the  position  where  the  last  occurrence  of  SubString begins in String. Returns 0 if
              SubString does not exist in String.

              This function is obsolete. Use find/3.

              Example:

              > string:rstr(" Hello Hello World World ", "Hello World").
              8

       span(String, Chars) -> Length

              Types:

                 String = Chars = string()
                 Length = integer() >= 0

              Returns the length of the maximum initial segment of String, which consists entirely of characters
              from Chars.

              This function is obsolete. Use take/2.

              Example:

              > string:span("\t    abcdef", " \t").
              5

       str(String, SubString) -> Index

              Types:

                 String = SubString = string()
                 Index = integer() >= 0

              Returns the position where the first occurrence of  SubString  begins  in  String.  Returns  0  if
              SubString does not exist in String.

              This function is obsolete. Use find/2.

              Example:

              > string:str(" Hello Hello World World ", "Hello World").
              8

       strip(String :: string()) -> string()

       strip(String, Direction) -> Stripped

       strip(String, Direction, Character) -> Stripped

              Types:

                 String = Stripped = string()
                 Direction = left | right | both
                 Character = char()

              Returns  a  string,  where leading or trailing, or both, blanks or a number of Character have been
              removed. Direction, which can be left, right, or both, indicates from which direction  blanks  are
              to be removed. strip/1 is equivalent to strip(String, both).

              This function is obsolete. Use trim/3.

              Example:

              > string:strip("...Hello.....", both, $.).
              "Hello"

       sub_string(String, Start) -> SubString

       sub_string(String, Start, Stop) -> SubString

              Types:

                 String = SubString = string()
                 Start = Stop = integer() >= 1

              Returns  a  substring  of  String,  starting at position Start to the end of the string, or to and
              including position Stop.

              This function is obsolete. Use slice/3.

              Example:

              sub_string("Hello World", 4, 8).
              "lo Wo"

       substr(String, Start) -> SubString

       substr(String, Start, Length) -> SubString

              Types:

                 String = SubString = string()
                 Start = integer() >= 1
                 Length = integer() >= 0

              Returns a substring of String, starting at position Start, and ending at the end of the string  or
              at length Length.

              This function is obsolete. Use slice/3.

              Example:

              > substr("Hello World", 4, 5).
              "lo Wo"

       sub_word(String, Number) -> Word

       sub_word(String, Number, Character) -> Word

              Types:

                 String = Word = string()
                 Number = integer()
                 Character = char()

              Returns the word in position Number of String. Words are separated by blanks or Characters.

              This function is obsolete. Use nth_lexeme/3.

              Example:

              > string:sub_word(" Hello old boy !",3,$o).
              "ld b"

       to_lower(String) -> Result

       to_lower(Char) -> CharResult

       to_upper(String) -> Result

       to_upper(Char) -> CharResult

              Types:

                 String = Result = io_lib:latin1_string()
                 Char = CharResult = char()

              The  specified  string  or character is case-converted. Notice that the supported character set is
              ISO/IEC 8859-1 (also called Latin 1); all values outside this set are unchanged

              This function is obsolete use lowercase/1, uppercase/1, titlecase/1 or casefold/1.

       tokens(String, SeparatorList) -> Tokens

              Types:

                 String = SeparatorList = string()
                 Tokens = [Token :: nonempty_string()]

              Returns a list of tokens in String, separated by the characters in SeparatorList.

              Example:

              > tokens("abc defxxghix jkl", "x ").
              ["abc", "def", "ghi", "jkl"]

              Notice that, as shown in this example, two or more adjacent separator  characters  in  String  are
              treated as one. That is, there are no empty strings in the resulting list of tokens.

              This function is obsolete. Use lexemes/2.

       words(String) -> Count

       words(String, Character) -> Count

              Types:

                 String = string()
                 Character = char()
                 Count = integer() >= 1

              Returns the number of words in String, separated by blanks or Character.

              This function is obsolete. Use lexemes/2.

              Example:

              > words(" Hello old boy!", $o).
              4

NOTES

       Some  of  the  general  string  functions  can seem to overlap each other. The reason is that this string
       package is the combination of two earlier packages and all functions of both packages have been retained.

Ericsson AB                                       stdlib 3.4.3                                      string(3erl)