Provided by: erlang-manpages_20.2.2+dfsg-1ubuntu2_all

NAME

       string - String processing functions.

DESCRIPTION

       This module provides functions for string processing.

       A  string  in  this  module is represented by unicode:chardata(), that is, a list of codepoints, binaries
       with UTF-8-encoded codepoints (UTF-8 binaries), or a mix of the two.

       "abcd"               is a valid string
       <<"abcd">>           is a valid string
       ["abcd"]             is a valid string
       <<"abc..åäö"/utf8>>  is a valid string
       <<"abc..åäö">>       is NOT a valid string,
                            but a binary with Latin-1-encoded codepoints
       [<<"abc">>, "..åäö"] is a valid string
       [atom]               is NOT a valid string

       This module operates on grapheme clusters. A grapheme cluster is a user-perceived character, which can be
       represented by several codepoints.

       "å"  [229] or [97, 778]
       "e̊"  [101, 778]

       The  string  length of "ß↑e̊" is 3, even though it is represented by the codepoints [223,8593,101,778] or
       the UTF-8 binary <<195,159,226,134,145,101,204,138>>.
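
       The difference between codepoints, bytes, and grapheme clusters can be illustrated with a minimal
       sketch. The module and function names (grapheme_len_demo, lengths/0) are hypothetical and only
       serve the illustration.

```erlang
-module(grapheme_len_demo).
-export([lengths/0]).

%% Measure the same text three ways: codepoints, UTF-8 bytes, grapheme clusters.
lengths() ->
    Chars = [223, 8593, 101, 778],              % codepoints from the text above
    Bin = unicode:characters_to_binary(Chars),  % the same text as a UTF-8 binary
    #{codepoints => erlang:length(Chars),
      bytes => byte_size(Bin),
      graphemes => string:length(Bin)}.
```

       Only string:length/1 reports the number of user-perceived characters; the other two values depend
       on the representation.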

       Grapheme clusters for codepoints of class prepend and non-modern (or decomposed) Hangul are not handled
       for performance reasons in find/3, replace/3, split/2, split/3 and trim/3.

       Splitting and appending strings is to be done on grapheme cluster boundaries. There is no verification
       that the results of appending strings are valid or normalized.

       Most  of  the  functions  expect  all  input  to  be  normalized   to   one   form,   see   for   example
       unicode:characters_to_nfc_list/1.
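
       One possible way to follow this advice is to normalize both operands before comparing them. The
       sketch below assumes NFC is the wanted form; the module and function names (nfc_demo, same_text/2)
       are hypothetical.

```erlang
-module(nfc_demo).
-export([same_text/2]).

%% Normalize both arguments to NFC, then compare them without further processing.
same_text(A, B) ->
    string:equal(unicode:characters_to_nfc_list(A),
                 unicode:characters_to_nfc_list(B)).
```

       With this helper, [229] and [97,778] (both "å") compare as equal, whereas string:equal/2 on the
       raw input returns false. Passing nfc as the Norm argument of equal/4 achieves a similar effect.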

       Language or locale specific handling of input is not considered in any function.

       The functions can crash for non-valid input strings. For example, the functions expect UTF-8 binaries but
       not all functions verify that all binaries are encoded correctly.

       Unless otherwise specified the return value type is the same as the input type.  That  is,  binary  input
       returns binary output, list input returns a list output, and mixed input can return a mixed output.

       1> string:trim("  sarah  ").
       "sarah"
       2> string:trim(<<"  sarah  ">>).
       <<"sarah">>
       3> string:lexemes("foo bar", " ").
       ["foo","bar"]
       4> string:lexemes(<<"foo bar">>, " ").
       [<<"foo">>,<<"bar">>]

       This  module  has  been  reworked  in  Erlang/OTP 20 to handle unicode:chardata() and operate on grapheme
       clusters. The old functions that only work on Latin-1 lists as input are still available but  should  not
       be used. They will be deprecated in Erlang/OTP 21.

DATA TYPES

       direction() = leading | trailing

       grapheme_cluster() = char() | [char()]

              A user-perceived character, consisting of one or more codepoints.

EXPORTS

       casefold(String :: unicode:chardata()) -> unicode:chardata()

              Converts  String  to  a  case-agnostic  comparable  string.  Function casefold/1 is preferred over
              lowercase/1 when two strings are to be compared for equality. See also equal/4.

              Example:

              1> string:casefold("Ω and ẞ SHARP S").
              "ω and ss sharp s"

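              A minimal sketch of a case-insensitive comparison built on casefold/1 follows. The module and
              function names (casefold_demo, same_ignoring_case/2) are hypothetical, and no normalization is
              applied.

```erlang
-module(casefold_demo).
-export([same_ignoring_case/2]).

%% Fold both strings to their case-agnostic form and compare the results.
same_ignoring_case(A, B) ->
    string:equal(string:casefold(A), string:casefold(B)).
```

              Calling equal/3 with IgnoreCase set to true is the more direct route; the sketch only makes the
              folding step explicit.
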
       chomp(String :: unicode:chardata()) -> unicode:chardata()

              Returns a string where any trailing \n or \r\n have been removed from String.

              Example:

              1> string:chomp(<<"\nHello\n\n">>).
              <<"\nHello">>
              2> string:chomp("\nHello\r\r\n").
              "\nHello\r"

       equal(A, B) -> boolean()

       equal(A, B, IgnoreCase) -> boolean()

       equal(A, B, IgnoreCase, Norm) -> boolean()

              Types:

                 A = B = unicode:chardata()
                 IgnoreCase = boolean()
                 Norm = none | nfc | nfd | nfkc | nfkd

              Returns true if A and B are equal, otherwise false.

              If IgnoreCase is true the function does casefolding on the fly before the equality test.

              If Norm is not none the function applies normalization on the fly before the equality test.  There
              are four available normalization forms: nfc, nfd, nfkc, and nfkd.

              By default, IgnoreCase is false and Norm is none.

              Example:

              1> string:equal("åäö", <<"åäö"/utf8>>).
              true
              2> string:equal("åäö", unicode:characters_to_nfd_binary("åäö")).
              false
              3> string:equal("åäö", unicode:characters_to_nfd_binary("ÅÄÖ"), true, nfc).
              true

       find(String, SearchPattern) -> unicode:chardata() | nomatch

       find(String, SearchPattern, Dir) -> unicode:chardata() | nomatch

              Types:

                 String = SearchPattern = unicode:chardata()
                 Dir = direction()

              Removes anything before SearchPattern in String and returns the remainder of the string or nomatch
              if SearchPattern is not found. Dir, which  can  be  leading  or  trailing,  indicates  from  which
              direction characters are to be searched.

              By default, Dir is leading.

              Example:

              1> string:find("ab..cd..ef", ".").
              "..cd..ef"
              2> string:find(<<"ab..cd..ef">>, "..", trailing).
              <<"..ef">>
              3> string:find(<<"ab..cd..ef">>, "x", leading).
              nomatch
              4> string:find("ab..cd..ef", "x", trailing).
              nomatch
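
              Because find/2,3 returns nomatch when the pattern is absent, it can serve as a building block
              for small helpers. The sketch below is one possible approach; the names find_demo, contains/2,
              and after_last/2 are hypothetical, and a non-empty SearchPattern is assumed.

```erlang
-module(find_demo).
-export([contains/2, after_last/2]).

%% True if Pattern occurs anywhere in String.
contains(String, Pattern) ->
    string:find(String, Pattern) =/= nomatch.

%% The text following the last occurrence of Pattern, or nomatch.
after_last(String, Pattern) ->
    case string:find(String, Pattern, trailing) of
        nomatch -> nomatch;
        Found -> string:prefix(Found, Pattern)  % drop the pattern itself
    end.
```

              For example, find_demo:after_last("ab..cd..ef", "..") returns "ef".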

       is_empty(String :: unicode:chardata()) -> boolean()

              Returns true if String is the empty string, otherwise false.

              Example:

              1> string:is_empty("foo").
              false
              2> string:is_empty(["",<<>>]).
              true

       length(String :: unicode:chardata()) -> integer() >= 0

              Returns the number of grapheme clusters in String.

              Example:

              1> string:length("ß↑e̊").
              3
              2> string:length(<<195,159,226,134,145,101,204,138>>).
              3

       lexemes(String :: unicode:chardata(),
               SeparatorList :: [grapheme_cluster()]) ->
                  [unicode:chardata()]

              Returns a list of lexemes in String, separated by the grapheme clusters in SeparatorList.

              Notice that, as shown in the example below, two or more adjacent separator grapheme clusters in
              String are treated as one. That is, there are no empty strings in the resulting list of lexemes.
              See also split/3, which returns empty strings.

              Notice that [$\r,$\n] is one grapheme cluster.

              Example:

              1> string:lexemes("abc de̊fxxghix jkl\r\nfoo", "x e" ++ [[$\r,$\n]]).
              ["abc","de̊f","ghi","jkl","foo"]
              2> string:lexemes(<<"abc de̊fxxghix jkl\r\nfoo"/utf8>>, "x e" ++ [$\r,$\n]).
              [<<"abc">>,<<"de̊f"/utf8>>,<<"ghi">>,<<"jkl\r\nfoo">>]
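
              The difference between lexemes/2 and split/3 matters when empty fields are significant. The
              following sketch contrasts the two on comma-separated input; the names fields_demo, fields/1,
              and words/1 are hypothetical.

```erlang
-module(fields_demo).
-export([fields/1, words/1]).

%% Keep empty fields: split/3 yields an empty string between adjacent separators.
fields(Line) ->
    string:split(Line, ",", all).

%% Drop empty fields: lexemes/2 treats adjacent separators as one.
words(Line) ->
    string:lexemes(Line, ",").
```

              For the input "a,,b", fields/1 returns ["a",[],"b"] and words/1 returns ["a","b"].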

       lowercase(String :: unicode:chardata()) -> unicode:chardata()

              Converts String to lowercase.

              Notice that function casefold/1 should be used when converting a string to be tested for equality.

              Example:

              1> string:lowercase(string:uppercase("Michał")).
              "michał"

       next_codepoint(String :: unicode:chardata()) ->
                         maybe_improper_list(char(), unicode:chardata()) |
                         {error, unicode:chardata()}

              Returns the first codepoint in String and the rest of String in the tail. Returns an empty list if
              String is empty or an {error, String} tuple if the next byte is invalid.

              Example:

              1> string:next_codepoint(unicode:characters_to_binary("e̊fg")).
              [101|<<"̊fg"/utf8>>]

       next_grapheme(String :: unicode:chardata()) ->
                        maybe_improper_list(grapheme_cluster(),
                                            unicode:chardata()) |
                        {error, unicode:chardata()}

              Returns the first grapheme cluster in String and the rest of String in the tail. Returns an  empty
              list if String is empty or an {error, String} tuple if the next byte is invalid.

              Example:

              1> string:next_grapheme(unicode:characters_to_binary("e̊fg")).
              ["e̊"|<<"fg">>]
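
              The three possible return shapes (an empty list, an error tuple, or a cluster with the rest in
              the tail) lend themselves to a simple recursive walk. A minimal sketch, with the hypothetical
              names grapheme_walk_demo and count/1:

```erlang
-module(grapheme_walk_demo).
-export([count/1]).

%% Count grapheme clusters one at a time, stopping at the first invalid byte.
count(String) ->
    count(String, 0).

count(String, N) ->
    case string:next_grapheme(String) of
        [] -> {ok, N};                          % end of input
        {error, Rest} -> {error, N, Rest};      % invalid data starts at Rest
        [_Cluster | Rest] -> count(Rest, N + 1)
    end.
```

              For valid input the count agrees with length/1; the error branch reports how many clusters were
              read before the invalid data.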

       nth_lexeme(String, N, SeparatorList) -> unicode:chardata()

              Types:

                 String = unicode:chardata()
                 N = integer() >= 0
                 SeparatorList = [grapheme_cluster()]

              Returns  lexeme  number  N  in  String,  where  lexemes  are separated by the grapheme clusters in
              SeparatorList.

              Example:

              1> string:nth_lexeme("abc.de̊f.ghiejkl", 3, ".e").
              "ghi"

       pad(String, Length) -> unicode:charlist()

       pad(String, Length, Dir) -> unicode:charlist()

       pad(String, Length, Dir, Char) -> unicode:charlist()

              Types:

                 String = unicode:chardata()
                 Length = integer()
                 Dir = direction() | both
                 Char = grapheme_cluster()

              Pads String to Length with grapheme cluster Char. Dir, which can be leading,  trailing,  or  both,
              indicates where the padding should be added.

              By default, Char is $\s and Dir is trailing.

              Example:

              1> string:pad(<<"He̊llö"/utf8>>, 8).
              [<<72,101,204,138,108,108,195,182>>,32,32,32]
              2> io:format("'~ts'~n",[string:pad("He̊llö", 8, leading)]).
              '   He̊llö'
              3> io:format("'~ts'~n",[string:pad("He̊llö", 8, both)]).
              ' He̊llö  '
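
              Since pad/2,3,4 returns a possibly nested unicode:charlist(), the result usually needs to be
              flattened or printed with ~ts. The sketch below builds a fixed-width two-column row; the names
              pad_demo and row/2 and the column widths are assumptions made for the illustration.

```erlang
-module(pad_demo).
-export([row/2]).

%% Left-align Name in 10 columns and right-align Value in 6 columns.
row(Name, Value) ->
    unicode:characters_to_list(
      [string:pad(Name, 10),             % trailing padding (the default)
       string:pad(Value, 6, leading)]).  % leading padding
```

              Widths are counted in grapheme clusters, so a combining sequence occupies a single column.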

       prefix(String :: unicode:chardata(), Prefix :: unicode:chardata()) ->
                 nomatch | unicode:chardata()

              If  Prefix  is  the  prefix  of  String, removes it and returns the remainder of String, otherwise
              returns nomatch.

              Example:

              1> string:prefix(<<"prefix of string">>, "pre").
              <<"fix of string">>
              2> string:prefix("pre", "prefix").
              nomatch

       replace(String, SearchPattern, Replacement) ->
                  [unicode:chardata()]

       replace(String, SearchPattern, Replacement, Where) ->
                  [unicode:chardata()]

              Types:

                 String = SearchPattern = Replacement = unicode:chardata()
                 Where = direction() | all

              Replaces SearchPattern in String with Replacement. Where, default leading, indicates whether the
              leading, the trailing or all occurrences of SearchPattern are to be replaced.

              Can be implemented as:

              lists:join(Replacement, split(String, SearchPattern, Where)).

              Example:

              1> string:replace(<<"ab..cd..ef">>, "..", "*").
              [<<"ab">>,"*",<<"cd..ef">>]
              2> string:replace(<<"ab..cd..ef">>, "..", "*", all).
              [<<"ab">>,"*",<<"cd">>,"*",<<"ef">>]
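
              As the examples show, the result is a list of parts rather than a flat string. One way to obtain
              a flat binary is sketched below; the names replace_demo and replace_all/3 are hypothetical.

```erlang
-module(replace_demo).
-export([replace_all/3]).

%% Replace every occurrence of Pattern and flatten the parts into one UTF-8 binary.
replace_all(String, Pattern, Replacement) ->
    unicode:characters_to_binary(
      string:replace(String, Pattern, Replacement, all)).
```

              For example, replace_demo:replace_all(<<"ab..cd..ef">>, "..", "*") returns <<"ab*cd*ef">>.
              When the result is only written to an I/O device, the flattening step can often be skipped.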

       reverse(String :: unicode:chardata()) -> [grapheme_cluster()]

              Returns the reverse list of the grapheme clusters in String.

              Example:

              1> Reverse = string:reverse(unicode:characters_to_nfd_binary("ÅÄÖ")).
              [[79,776],[65,776],[65,778]]
              2> io:format("~ts~n",[Reverse]).
              ÖÄÅ

       slice(String, Start) -> Slice

       slice(String, Start, Length) -> Slice

              Types:

                 String = unicode:chardata()
                 Start = integer() >= 0
                 Length = infinity | integer() >= 0
                 Slice = unicode:chardata()

              Returns a substring of String of at most Length grapheme clusters, starting at position Start.

              By default, Length is infinity.

              Example:

              1> string:slice(<<"He̊llö Wörld"/utf8>>, 4).
              <<"ö Wörld"/utf8>>
              2> string:slice(["He̊llö ", <<"Wörld"/utf8>>], 4,4).
              "ö Wö"
              3> string:slice(["He̊llö ", <<"Wörld"/utf8>>], 4,50).
              "ö Wörld"

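              A common use of slice/3 is to shorten text for display. The following is a minimal sketch
              under the assumption that Max is a non-negative integer; the names slice_demo and truncate/2
              are hypothetical.

```erlang
-module(slice_demo).
-export([truncate/2]).

%% Keep at most Max grapheme clusters and mark the cut with three dots.
truncate(String, Max) ->
    case string:length(String) > Max of
        true -> [string:slice(String, 0, Max), "..."];  % chardata, not flattened
        false -> String
    end.
```

              Because the cut is made on grapheme cluster boundaries, a base character is never separated
              from its combining marks.
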
       split(String, SearchPattern) -> [unicode:chardata()]

       split(String, SearchPattern, Where) -> [unicode:chardata()]

              Types:

                 String = SearchPattern = unicode:chardata()
                 Where = direction() | all

              Splits String where SearchPattern is encountered and returns the remaining parts. Where, default
              leading, indicates whether the leading, the trailing or all occurrences of SearchPattern will
              split String.

              Example:

              1> string:split("ab..bc..cd", "..").
              ["ab","bc..cd"]
              2> string:split(<<"ab..bc..cd">>, "..", trailing).
              [<<"ab..bc">>,<<"cd">>]
              3> string:split(<<"ab..bc....cd">>, "..", all).
              [<<"ab">>,<<"bc">>,<<>>,<<"cd">>]

       take(String, Characters) -> {Leading, Trailing}

       take(String, Characters, Complement) -> {Leading, Trailing}

       take(String, Characters, Complement, Dir) -> {Leading, Trailing}

              Types:

                 String = unicode:chardata()
                 Characters = [grapheme_cluster()]
                 Complement = boolean()
                 Dir = direction()
                 Leading = Trailing = unicode:chardata()

              Takes characters from String as long as the characters are members of set Characters or the
              complement of set Characters. Dir, which can be leading or trailing, indicates from which
              direction characters are to be taken.

              By default, Complement is false and Dir is leading.

              Example:

              1> string:take("abc0z123", lists:seq($a,$z)).
              {"abc","0z123"}
              2> string:take(<<"abc0z123">>, lists:seq($0,$9), true, leading).
              {<<"abc">>,<<"0z123">>}
              3> string:take("abc0z123", lists:seq($0,$9), false, trailing).
              {"abc0z","123"}
              4> string:take(<<"abc0z123">>, lists:seq($a,$z), true, trailing).
              {<<"abc0z">>,<<"123">>}
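
              As a further illustration, take/2 can separate a numeric prefix from a suffix. The sketch below
              only considers ASCII digits; the names take_demo and number_and_unit/1 are hypothetical.

```erlang
-module(take_demo).
-export([number_and_unit/1]).

%% Split the leading ASCII digits from whatever follows, for example a unit suffix.
number_and_unit(String) ->
    string:take(String, lists:seq($0, $9)).
```

              For example, take_demo:number_and_unit("150ms") returns {"150","ms"}.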

       titlecase(String :: unicode:chardata()) -> unicode:chardata()

              Converts String to titlecase.

              Example:

              1> string:titlecase("ß is a SHARP s").
              "Ss is a SHARP s"

       to_float(String) -> {Float, Rest} | {error, Reason}

              Types:

                 String = unicode:chardata()
                 Float = float()
                 Rest = unicode:chardata()
                 Reason = no_float | badarg

              Argument String is expected to start with a valid text representation of a float (the digits are
              ASCII values). Remaining characters in the string after the float are returned in Rest.

              Example:

              > {F1,Fs} = string:to_float("1.0-1.0e-1"),
              > {F2,[]} = string:to_float(Fs),
              > F1+F2.
              0.9
              > string:to_float("3/2=1.5").
              {error,no_float}
              > string:to_float("-1.5eX").
              {-1.5,"eX"}

       to_integer(String) -> {Int, Rest} | {error, Reason}

              Types:

                 String = unicode:chardata()
                 Int = integer()
                 Rest = unicode:chardata()
                 Reason = no_integer | badarg

              Argument String is expected to start with a valid text representation of an integer (the digits
              are ASCII values). Remaining characters in the string after the integer are returned in Rest.

              Example:

              > {I1,Is} = string:to_integer("33+22"),
              > {I2,[]} = string:to_integer(Is),
              > I1-I2.
              11
              > string:to_integer("0.5").
              {0,".5"}
              > string:to_integer("x=2").
              {error,no_integer}
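
              Because to_integer/1 accepts trailing characters, callers that need the whole string to be a
              number have to inspect Rest themselves. A minimal sketch of such a check follows; the names
              int_demo and parse_int/1 and the error reason trailing_garbage are hypothetical.

```erlang
-module(int_demo).
-export([parse_int/1]).

%% Accept String only if it consists of an integer and nothing else.
parse_int(String) ->
    case string:to_integer(String) of
        {Int, Rest} when is_integer(Int) ->
            case string:is_empty(Rest) of
                true -> {ok, Int};
                false -> {error, trailing_garbage}
            end;
        {error, Reason} ->
            {error, Reason}
    end.
```

              With this helper, "42" yields {ok,42}, while "0.5" is rejected because ".5" remains after the
              integer.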

       to_graphemes(String :: unicode:chardata()) -> [grapheme_cluster()]

              Converts String to a list of grapheme clusters.

              Example:

              1> string:to_graphemes("ß↑e̊").
              [223,8593,[101,778]]
              2> string:to_graphemes(<<"ß↑e̊"/utf8>>).
              [223,8593,[101,778]]

       trim(String) -> unicode:chardata()

       trim(String, Dir) -> unicode:chardata()

       trim(String, Dir, Characters) -> unicode:chardata()

              Types:

                 String = unicode:chardata()
                 Dir = direction() | both
                 Characters = [grapheme_cluster()]

              Returns a string, where leading or trailing, or both, Characters have been removed. Dir, which
              can be leading, trailing, or both, indicates from which direction characters are to be removed.

              Default  Characters  are   the   set   of   nonbreakable   whitespace   codepoints,   defined   as
              Pattern_White_Space in Unicode Standard Annex #31. By default, Dir is both.

              Notice that [$\r,$\n] is one grapheme cluster according to the Unicode Standard.

              Example:

              1> string:trim("\t Hello \n").
              "Hello"
              2> string:trim(<<"\t Hello \n">>, leading).
              <<"Hello \n">>
              3> string:trim(<<".Hello.\n">>, trailing, "\n.").
              <<".Hello">>

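              Two small helpers built on trim/1 and trim/3 are sketched below. The names trim_demo,
              is_blank/1, and unquote/1 are hypothetical.

```erlang
-module(trim_demo).
-export([is_blank/1, unquote/1]).

%% True when Line holds nothing but the default whitespace characters.
is_blank(Line) ->
    string:is_empty(string:trim(Line)).

%% Remove any double quotes from both ends; the third argument is a list of
%% grapheme clusters, here the single character $".
unquote(String) ->
    string:trim(String, both, [$"]).
```

              Note that unquote/1 removes every consecutive quote at either end, not just one pair.
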
       uppercase(String :: unicode:chardata()) -> unicode:chardata()

              Converts String to uppercase.

              See also titlecase/1.

              Example:

              1> string:uppercase("Michał").
              "MICHAŁ"

OBSOLETE API FUNCTIONS

       Here follow the functions of the old API. These functions only work on a list of Latin-1 characters.

   Note:
       The  functions  are  kept for backward compatibility, but are not recommended. They will be deprecated in
       Erlang/OTP 21.

       Any undocumented functions in string are not to be used.

EXPORTS

       centre(String, Number) -> Centered

       centre(String, Number, Character) -> Centered

              Types:

                 String = Centered = string()
                 Number = integer() >= 0
                 Character = char()

              Returns a string, where String is centered in the string and surrounded by  blanks  or  Character.
              The resulting string has length Number.

              This function is obsolete. Use pad/3.

       chars(Character, Number) -> String

       chars(Character, Number, Tail) -> String

              Types:

                 Character = char()
                 Number = integer() >= 0
                 Tail = String = string()

              Returns  a  string  consisting of Number characters Character. Optionally, the string can end with
              string Tail.

              This function is obsolete. Use lists:duplicate/2.

       chr(String, Character) -> Index

              Types:

                 String = string()
                 Character = char()
                 Index = integer() >= 0

              Returns the index of the first occurrence of Character in String. Returns 0 if Character does  not
              occur.

              This function is obsolete. Use find/2.

       concat(String1, String2) -> String3

              Types:

                 String1 = String2 = String3 = string()

              Concatenates String1 and String2 to form a new string String3, which is returned.

              This   function   is   obsolete.   Use   [String1,   String2]   as   Data   argument,   and   call
              unicode:characters_to_list/2 or unicode:characters_to_binary/2 to flatten the output.
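
              A minimal sketch of the suggested replacement follows, using unicode:characters_to_list/1 (which
              assumes UTF-8 encoded binaries). The names concat_demo and concat/2 are hypothetical.

```erlang
-module(concat_demo).
-export([concat/2]).

%% Join two pieces of chardata and flatten the result to a list of codepoints.
concat(A, B) ->
    unicode:characters_to_list([A, B]).
```

              Unlike the obsolete function, this version also accepts binaries and nested lists as input.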

       copies(String, Number) -> Copies

              Types:

                 String = Copies = string()
                 Number = integer() >= 0

              Returns a string containing String repeated Number times.

              This function is obsolete. Use lists:duplicate/2.

       cspan(String, Chars) -> Length

              Types:

                 String = Chars = string()
                 Length = integer() >= 0

              Returns the length of the maximum initial segment of String, which consists entirely of characters
              not from Chars.

              This function is obsolete. Use take/3.

              Example:

              > string:cspan("\t    abcdef", " \t").
              0

       join(StringList, Separator) -> String

              Types:

                 StringList = [string()]
                 Separator = String = string()

              Returns a string with the elements of StringList separated by the string in Separator.

              This function is obsolete. Use lists:join/2.

              Example:

              > string:join(["one", "two", "three"], ", ").
              "one, two, three"

       left(String, Number) -> Left

       left(String, Number, Character) -> Left

              Types:

                 String = Left = string()
                 Number = integer() >= 0
                 Character = char()

              Returns String with the length adjusted in accordance with Number. The left margin is fixed. If
              length(String) < Number, then String is padded with blanks or Character.

              This function is obsolete. Use pad/2 or pad/3.

              Example:

              > string:left("Hello",10,$.).
              "Hello....."

       len(String) -> Length

              Types:

                 String = string()
                 Length = integer() >= 0

              Returns the number of characters in String.

              This function is obsolete. Use length/1.

       rchr(String, Character) -> Index

              Types:

                 String = string()
                 Character = char()
                 Index = integer() >= 0

              Returns the index of the last occurrence of Character in String. Returns 0 if Character  does  not
              occur.

              This function is obsolete. Use find/3.

       right(String, Number) -> Right

       right(String, Number, Character) -> Right

              Types:

                 String = Right = string()
                 Number = integer() >= 0
                 Character = char()

              Returns String with the length adjusted in accordance with Number. The right margin is fixed. If
              length(String) < Number, then String is padded with blanks or Character.

              This function is obsolete. Use pad/3.

              Example:

              > string:right("Hello", 10, $.).
              ".....Hello"

       rstr(String, SubString) -> Index

              Types:

                 String = SubString = string()
                 Index = integer() >= 0

              Returns the position where the last occurrence  of  SubString  begins  in  String.  Returns  0  if
              SubString does not exist in String.

              This function is obsolete. Use find/3.

              Example:

              > string:rstr(" Hello Hello World World ", "Hello World").
              8

       span(String, Chars) -> Length

              Types:

                 String = Chars = string()
                 Length = integer() >= 0

              Returns the length of the maximum initial segment of String, which consists entirely of characters
              from Chars.

              This function is obsolete. Use take/2.

              Example:

              > string:span("\t    abcdef", " \t").
              5

       str(String, SubString) -> Index

              Types:

                 String = SubString = string()
                 Index = integer() >= 0

              Returns the position where the first occurrence of  SubString  begins  in  String.  Returns  0  if
              SubString does not exist in String.

              This function is obsolete. Use find/2.

              Example:

              > string:str(" Hello Hello World World ", "Hello World").
              8
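
              Code that relies on the returned index can derive a comparable value from find/2. The sketch
              below is one possible approach, not a drop-in replacement: it counts grapheme clusters rather
              than Latin-1 characters, and the names str_demo and index_of/2 are hypothetical.

```erlang
-module(str_demo).
-export([index_of/2]).

%% 1-based position of the first occurrence of SubString, or 0 if it is absent.
index_of(String, SubString) ->
    case string:find(String, SubString) of
        nomatch -> 0;
        Rest -> string:length(String) - string:length(Rest) + 1
    end.
```

              For the example above, str_demo:index_of(" Hello Hello World World ", "Hello World") also
              returns 8.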

       strip(String :: string()) -> string()

       strip(String, Direction) -> Stripped

       strip(String, Direction, Character) -> Stripped

              Types:

                 String = Stripped = string()
                 Direction = left | right | both
                 Character = char()

              Returns  a  string,  where leading or trailing, or both, blanks or a number of Character have been
              removed. Direction, which can be left, right, or both, indicates from which direction  blanks  are
              to be removed. strip/1 is equivalent to strip(String, both).

              This function is obsolete. Use trim/3.

              Example:

              > string:strip("...Hello.....", both, $.).
              "Hello"

       sub_string(String, Start) -> SubString

       sub_string(String, Start, Stop) -> SubString

              Types:

                 String = SubString = string()
                 Start = Stop = integer() >= 1

              Returns  a  substring  of  String,  starting at position Start to the end of the string, or to and
              including position Stop.

              This function is obsolete. Use slice/3.

              Example:

              > string:sub_string("Hello World", 4, 8).
              "lo Wo"

       substr(String, Start) -> SubString

       substr(String, Start, Length) -> SubString

              Types:

                 String = SubString = string()
                 Start = integer() >= 1
                 Length = integer() >= 0

              Returns a substring of String, starting at position Start, and ending at the end of the string
              or after Length characters.

              This function is obsolete. Use slice/3.

              Example:

              > string:substr("Hello World", 4, 5).
              "lo Wo"

       sub_word(String, Number) -> Word

       sub_word(String, Number, Character) -> Word

              Types:

                 String = Word = string()
                 Number = integer()
                 Character = char()

              Returns the word in position Number of String. Words are separated by blanks or Character.

              This function is obsolete. Use nth_lexeme/3.

              Example:

              > string:sub_word(" Hello old boy !",3,$o).
              "ld b"

       to_lower(String) -> Result

       to_lower(Char) -> CharResult

       to_upper(String) -> Result

       to_upper(Char) -> CharResult

              Types:

                 String = Result = io_lib:latin1_string()
                 Char = CharResult = char()

              The specified string or character is case-converted. Notice that the supported character set is
              ISO/IEC 8859-1 (also called Latin 1); all values outside this set are unchanged.

              This function is obsolete. Use lowercase/1, uppercase/1, titlecase/1 or casefold/1.

       tokens(String, SeparatorList) -> Tokens

              Types:

                 String = SeparatorList = string()
                 Tokens = [Token :: nonempty_string()]

              Returns a list of tokens in String, separated by the characters in SeparatorList.

              Example:

              > string:tokens("abc defxxghix jkl", "x ").
              ["abc","def","ghi","jkl"]

              Notice that, as shown in this example, two or more adjacent separator  characters  in  String  are
              treated as one. That is, there are no empty strings in the resulting list of tokens.

              This function is obsolete. Use lexemes/2.

       words(String) -> Count

       words(String, Character) -> Count

              Types:

                 String = string()
                 Character = char()
                 Count = integer() >= 1

              Returns the number of words in String, separated by blanks or Character.

              This function is obsolete. Use lexemes/2.

              Example:

              > string:words(" Hello old boy!", $o).
              4

NOTES

       Some  of  the  general  string  functions  can seem to overlap each other. The reason is that this string
       package is the combination of two earlier packages and all functions of both packages have been retained.