Ubuntu Manpage: string - String processing functions.

Provided by: erlang-manpages_20.2.2+dfsg-1ubuntu2_all

NAME

       string - String processing functions.

DESCRIPTION

       This module provides functions for string processing.

       A  string  in  this  module  is  represented  by  unicode:chardata(),  that  is, a list of
       codepoints, binaries with UTF-8-encoded codepoints (UTF-8 binaries), or a mix of the two.

       "abcd"               is a valid string
       <<"abcd">>           is a valid string
       ["abcd"]             is a valid string
       <<"abc..åäö"/utf8>>  is a valid string
       <<"abc..åäö">>       is NOT a valid string,
                            but a binary with Latin-1-encoded codepoints
       [<<"abc">>, "..åäö"] is a valid string
       [atom]               is NOT a valid string

       This module operates  on  grapheme  clusters.  A  grapheme  cluster  is  a  user-perceived
       character, which can be represented by several codepoints.

       "å"  [229] or [97, 778]
       "e̊"  [101, 778]

       The  string  length  of  "ß↑e̊"  is  3,  even  though  it is represented by the codepoints
       [223,8593,101,778] or the UTF-8 binary <<195,159,226,134,145,101,204,138>>.

       Grapheme clusters for codepoints of class prepend and non-modern (or decomposed) Hangul are
       not handled, for performance reasons, in find/3, replace/3, split/2, split/3, and trim/3.

       Splitting and appending strings must be done on grapheme cluster boundaries. There is no
       verification that the results of appending strings are valid or normalized.
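
       As a small illustration, assuming the decomposed form [101,778] of "e̊" followed by "f",
       counting or cutting list elements directly works on codepoints and can split a cluster,
       whereas the functions in this module work on whole clusters:

       1> erlang:length([101,778,102]).
       3
       2> string:length([101,778,102]).
       2
       3> string:to_graphemes([101,778,102]).
       [[101,778],102]
       4> lists:sublist([101,778,102], 1).
       "e"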

       Most of the functions expect all input to be normalized  to  one  form,  see  for  example
       unicode:characters_to_nfc_list/1.
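
       For example, "å" can be written as the single codepoint 229 or as the decomposed sequence
       [97,778]. A sketch of how the two forms compare before and after normalization:

       1> string:equal([229], [97,778]).
       false
       2> string:equal([229], unicode:characters_to_nfc_list([97,778])).
       true
       3> string:equal([229], [97,778], false, nfc).
       true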

       Language or locale specific handling of input is not considered in any function.

       The  functions  can  crash  for non-valid input strings. For example, the functions expect
       UTF-8 binaries but not all functions verify that all binaries are encoded correctly.

       Unless otherwise specified, the return value type is the same as the input type. That is,
       binary input returns binary output, list input returns list output, and mixed input can
       return mixed output.

       1> string:trim("  sarah  ").
       "sarah"
       2> string:trim(<<"  sarah  ">>).
       <<"sarah">>
       3> string:lexemes("foo bar", " ").
       ["foo","bar"]
       4> string:lexemes(<<"foo bar">>, " ").
       [<<"foo">>,<<"bar">>]

       This module has been reworked in Erlang/OTP 20 to handle unicode:chardata() and operate on
       grapheme  clusters.  The  old functions that only work on Latin-1 lists as input are still
       available but should not be used. They will be deprecated in Erlang/OTP 21.

DATA TYPES

       direction() = leading | trailing

       grapheme_cluster() = char() | [char()]

              A user-perceived character, consisting of one or more codepoints.

EXPORTS

       casefold(String :: unicode:chardata()) -> unicode:chardata()

              Converts String to  a  case-agnostic  comparable  string.  Function  casefold/1  is
              preferred  over  lowercase/1  when two strings are to be compared for equality. See
              also equal/4.

              Example:

              1> string:casefold("Ω and ẞ SHARP S").
              "ω and ss sharp s"

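              A brief sketch of why casefolding is preferred for comparisons: "ß" folds to "ss"
              but is left unchanged by lowercase/1. The input words are illustrative only.

              2> string:casefold("Straße") =:= string:casefold("STRASSE").
              true
              3> string:lowercase("Straße") =:= string:lowercase("STRASSE").
              false
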
       chomp(String :: unicode:chardata()) -> unicode:chardata()

              Returns a string where any trailing \n or \r\n have been removed from String.

              Example:

              1> string:chomp(<<"\nHello\n\n">>).
              <<"\nHello">>
              2> string:chomp("\nHello\r\r\n").
              "\nHello\r"

       equal(A, B) -> boolean()

       equal(A, B, IgnoreCase) -> boolean()

       equal(A, B, IgnoreCase, Norm) -> boolean()

              Types:

                 A = B = unicode:chardata()
                 IgnoreCase = boolean()
                 Norm = none | nfc | nfd | nfkc | nfkd

              Returns true if A and B are equal, otherwise false.

              If IgnoreCase is true the function does casefolding on the fly before the  equality
              test.

              If  Norm  is  not  none  the  function  applies normalization on the fly before the
              equality test. There are four available normalization forms: nfc,  nfd,  nfkc,  and
              nfkd.

              By default, IgnoreCase is false and Norm is none.

              Example:

              1> string:equal("åäö", <<"åäö"/utf8>>).
              true
              2> string:equal("åäö", unicode:characters_to_nfd_binary("åäö")).
              false
              3> string:equal("åäö", unicode:characters_to_nfd_binary("ÅÄÖ"), true, nfc).
              true

       find(String, SearchPattern) -> unicode:chardata() | nomatch

       find(String, SearchPattern, Dir) -> unicode:chardata() | nomatch

              Types:

                 String = SearchPattern = unicode:chardata()
                 Dir = direction()

              Removes  anything  before  SearchPattern in String and returns the remainder of the
              string or nomatch if SearchPattern is not found.  Dir,  which  can  be  leading  or
              trailing, indicates from which direction characters are to be searched.

              By default, Dir is leading.

              Example:

              1> string:find("ab..cd..ef", ".").
              "..cd..ef"
              2> string:find(<<"ab..cd..ef">>, "..", trailing).
              <<"..ef">>
              3> string:find(<<"ab..cd..ef">>, "x", leading).
              nomatch
              4> string:find("ab..cd..ef", "x", trailing).
              nomatch

       is_empty(String :: unicode:chardata()) -> boolean()

              Returns true if String is the empty string, otherwise false.

              Example:

              1> string:is_empty("foo").
              false
              2> string:is_empty(["",<<>>]).
              true

       length(String :: unicode:chardata()) -> integer() >= 0

              Returns the number of grapheme clusters in String.

              Example:

              1> string:length("ß↑e̊").
              3
              2> string:length(<<195,159,226,134,145,101,204,138>>).
              3

       lexemes(String :: unicode:chardata(),
               SeparatorList :: [grapheme_cluster()]) ->
                  [unicode:chardata()]

              Returns  a  list  of  lexemes  in  String,  separated  by  the grapheme clusters in
              SeparatorList.

              Notice that, as shown in this example, two or more adjacent separator grapheme
              clusters in String are treated as one. That is, there are no empty strings in the
              resulting list of lexemes. See also split/3, which returns empty strings.

              Notice that [$\r,$\n] is one grapheme cluster.

              Example:

              1> string:lexemes("abc de̊fxxghix jkl\r\nfoo", "x e" ++ [[$\r,$\n]]).
              ["abc","de̊f","ghi","jkl","foo"]
              2> string:lexemes(<<"abc de̊fxxghix jkl\r\nfoo"/utf8>>, "x e" ++ [$\r,$\n]).
              [<<"abc">>,<<"de̊f"/utf8>>,<<"ghi">>,<<"jkl\r\nfoo">>]
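
              For comparison, a sketch of how split/3 treats the adjacent separators that
              lexemes/2 collapses (the input is illustrative only):

              3> string:lexemes("a,,b", ",").
              ["a","b"]
              4> string:split("a,,b", ",", all).
              ["a",[],"b"]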

       lowercase(String :: unicode:chardata()) -> unicode:chardata()

              Converts String to lowercase.

              Notice that function casefold/1 should be used  when  converting  a  string  to  be
              tested for equality.

              Example:

              1> string:lowercase(string:uppercase("Michał")).
              "michał"

       next_codepoint(String :: unicode:chardata()) ->
                         maybe_improper_list(char(), unicode:chardata()) |
                         {error, unicode:chardata()}

              Returns  the  first codepoint in String and the rest of String in the tail. Returns
              an empty list if String is empty or an {error, String} tuple if the  next  byte  is
              invalid.

              Example:

              1> string:next_codepoint(unicode:characters_to_binary("e̊fg")).
              [101|<<"̊fg"/utf8>>]

       next_grapheme(String :: unicode:chardata()) ->
                        maybe_improper_list(grapheme_cluster(),
                                            unicode:chardata()) |
                        {error, unicode:chardata()}

              Returns  the  first  grapheme cluster in String and the rest of String in the tail.
              Returns an empty list if String is empty or an {error, String} tuple  if  the  next
              byte is invalid.

              Example:

              1> string:next_grapheme(unicode:characters_to_binary("e̊fg")).
              ["e̊"|<<"fg">>]
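
              These functions can be used to walk a string one cluster at a time. A minimal
              sketch, where the module name gc_walk and the function count/1 are hypothetical
              and chosen only for illustration:

              -module(gc_walk).
              -export([count/1]).

              %% Counts grapheme clusters by repeatedly taking the head cluster.
              %% Returns the {error, Rest} tuple unchanged if invalid input is met.
              count(String) ->
                  count(String, 0).

              count(String, N) ->
                  case string:next_grapheme(String) of
                      [] -> N;
                      {error, _} = Error -> Error;
                      [_Cluster | Rest] -> count(Rest, N + 1)
                  end.

              For valid input, gc_walk:count/1 is expected to agree with length/1.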

       nth_lexeme(String, N, SeparatorList) -> unicode:chardata()

              Types:

                 String = unicode:chardata()
                 N = integer() >= 0
                 SeparatorList = [grapheme_cluster()]

              Returns  lexeme  number  N  in  String, where lexemes are separated by the grapheme
              clusters in SeparatorList.

              Example:

              1> string:nth_lexeme("abc.de̊f.ghiejkl", 3, ".e").
              "ghi"

       pad(String, Length) -> unicode:charlist()

       pad(String, Length, Dir) -> unicode:charlist()

       pad(String, Length, Dir, Char) -> unicode:charlist()

              Types:

                 String = unicode:chardata()
                 Length = integer()
                 Dir = direction() | both
                 Char = grapheme_cluster()

              Pads String to Length with grapheme  cluster  Char.  Dir,  which  can  be  leading,
              trailing, or both, indicates where the padding should be added.

              By default, Char is $\s and Dir is trailing.

              Example:

              1> string:pad(<<"He̊llö"/utf8>>, 8).
              [<<72,101,204,138,108,108,195,182>>,32,32,32]
              2> io:format("'~ts'~n",[string:pad("He̊llö", 8, leading)]).
              '   He̊llö'
              3> io:format("'~ts'~n",[string:pad("He̊llö", 8, both)]).
              ' He̊llö  '
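
              The padded result can be a nested list. A sketch of flattening it with the
              unicode module (the inputs are illustrative only):

              4> unicode:characters_to_binary(string:pad(<<"abc">>, 5)).
              <<"abc  ">>
              5> unicode:characters_to_list(string:pad("abc", 5, leading, $.)).
              "..abc"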

       prefix(String :: unicode:chardata(), Prefix :: unicode:chardata()) ->
                 nomatch | unicode:chardata()

              If  Prefix is the prefix of String, removes it and returns the remainder of String,
              otherwise returns nomatch.

              Example:

              1> string:prefix(<<"prefix of string">>, "pre").
              <<"fix of string">>
              2> string:prefix("pre", "prefix").
              nomatch

       replace(String, SearchPattern, Replacement) ->
                  [unicode:chardata()]

       replace(String, SearchPattern, Replacement, Where) ->
                  [unicode:chardata()]

              Types:

                 String = SearchPattern = Replacement = unicode:chardata()
                 Where = direction() | all

              Replaces SearchPattern in String with Replacement. Where, which defaults to leading,
              indicates whether the leading, the trailing, or all occurrences of SearchPattern are
              to be replaced.

              Can be implemented as:

              lists:join(Replacement, split(String, SearchPattern, Where)).

              Example:

              1> string:replace(<<"ab..cd..ef">>, "..", "*").
              [<<"ab">>,"*",<<"cd..ef">>]
              2> string:replace(<<"ab..cd..ef">>, "..", "*", all).
              [<<"ab">>,"*",<<"cd">>,"*",<<"ef">>]
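
              As the examples show, the result is generally a nested list. A sketch of
              flattening it with the unicode module:

              3> unicode:characters_to_list(string:replace("a.b.c", ".", "-", all)).
              "a-b-c"
              4> unicode:characters_to_binary(string:replace(<<"ab..cd">>, "..", "*")).
              <<"ab*cd">>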

       reverse(String :: unicode:chardata()) -> [grapheme_cluster()]

              Returns the reverse list of the grapheme clusters in String.

              Example:

              1> Reverse = string:reverse(unicode:characters_to_nfd_binary("ÅÄÖ")).
              [[79,776],[65,776],[65,778]]
              2> io:format("~ts~n",[Reverse]).
              ÖÄÅ

       slice(String, Start) -> Slice

       slice(String, Start, Length) -> Slice

              Types:

                 String = unicode:chardata()
                 Start = integer() >= 0
                 Length = infinity | integer() >= 0
                 Slice = unicode:chardata()

              Returns a substring of String of at most  Length  grapheme  clusters,  starting  at
              position Start.

              By default, Length is infinity.

              Example:

              1> string:slice(<<"He̊llö Wörld"/utf8>>, 4).
              <<"ö Wörld"/utf8>>
              2> string:slice(["He̊llö ", <<"Wörld"/utf8>>], 4,4).
              "ö Wö"
              3> string:slice(["He̊llö ", <<"Wörld"/utf8>>], 4,50).
              "ö Wörld"

       split(String, SearchPattern) -> [unicode:chardata()]

       split(String, SearchPattern, Where) -> [unicode:chardata()]

              Types:

                 String = SearchPattern = unicode:chardata()
                 Where = direction() | all

              Splits String where SearchPattern is encountered and returns the resulting parts.
              Where, which defaults to leading, indicates whether the leading, the trailing, or all
              occurrences of SearchPattern split String.

              Example:

              1> string:split("ab..bc..cd", "..").
              ["ab","bc..cd"]
              2> string:split(<<"ab..bc..cd">>, "..", trailing).
              [<<"ab..bc">>,<<"cd">>]
              3> string:split(<<"ab..bc....cd">>, "..", all).
              [<<"ab">>,<<"bc">>,<<>>,<<"cd">>]

       take(String, Characters) -> {Leading, Trailing}

       take(String, Characters, Complement) -> {Leading, Trailing}

       take(String, Characters, Complement, Dir) -> {Leading, Trailing}

              Types:

                 String = unicode:chardata()
                 Characters = [grapheme_cluster()]
                 Complement = boolean()
                 Dir = direction()
                 Leading = Trailing = unicode:chardata()

              Takes characters from String as long as the characters are members of set
              Characters or, if Complement is true, of the complement of set Characters. Dir, which
              can be leading or trailing, indicates from which direction characters are to be taken.

              By default, Complement is false and Dir is leading.

              Example:

              1> string:take("abc0z123", lists:seq($a,$z)).
              {"abc","0z123"}
              2> string:take(<<"abc0z123">>, lists:seq($0,$9), true, leading).
              {<<"abc">>,<<"0z123">>}
              3> string:take("abc0z123", lists:seq($0,$9), false, trailing).
              {"abc0z","123"}
              4> string:take(<<"abc0z123">>, lists:seq($a,$z), true, trailing).
              {<<"abc0z">>,<<"123">>}

       titlecase(String :: unicode:chardata()) -> unicode:chardata()

              Converts String to titlecase.

              Example:

              1> string:titlecase("ß is a SHARP s").
              "Ss is a SHARP s"

       to_float(String) -> {Float, Rest} | {error, Reason}

              Types:

                 String = unicode:chardata()
                 Float = float()
                 Rest = unicode:chardata()
                 Reason = no_float | badarg

              Argument String is expected to start with a valid textual representation of a float
              (the digits are ASCII values). Remaining characters in the string after the float are
              returned in Rest.

              Example:

              > {F1,Fs} = string:to_float("1.0-1.0e-1"),
              > {F2,[]} = string:to_float(Fs),
              > F1+F2.
              0.9
              > string:to_float("3/2=1.5").
              {error,no_float}
              > string:to_float("-1.5eX").
              {-1.5,"eX"}

       to_integer(String) -> {Int, Rest} | {error, Reason}

              Types:

                 String = unicode:chardata()
                 Int = integer()
                 Rest = unicode:chardata()
                 Reason = no_integer | badarg

              Argument String is expected to start with a valid textual representation of an
              integer (the digits are ASCII values). Remaining characters in the string after the
              integer are returned in Rest.

              Example:

              > {I1,Is} = string:to_integer("33+22"),
              > {I2,[]} = string:to_integer(Is),
              > I1-I2.
              11
              > string:to_integer("0.5").
              {0,".5"}
              > string:to_integer("x=2").
              {error,no_integer}

       to_graphemes(String :: unicode:chardata()) -> [grapheme_cluster()]

              Converts String to a list of grapheme clusters.

              Example:

              1> string:to_graphemes("ß↑e̊").
              [223,8593,[101,778]]
              2> string:to_graphemes(<<"ß↑e̊"/utf8>>).
              [223,8593,[101,778]]

       trim(String) -> unicode:chardata()

       trim(String, Dir) -> unicode:chardata()

       trim(String, Dir, Characters) -> unicode:chardata()

              Types:

                 String = unicode:chardata()
                 Dir = direction() | both
                 Characters = [grapheme_cluster()]

              Returns a string where leading, trailing, or both leading and trailing Characters
              have been removed. Dir, which can be leading, trailing, or both, indicates from which
              direction characters are to be removed.

              Default  Characters  are  the set of nonbreakable whitespace codepoints, defined as
              Pattern_White_Space in Unicode Standard Annex #31. By default, Dir is both.

              Notice that [$\r,$\n] is one grapheme cluster according to the Unicode Standard.

              Example:

              1> string:trim("\t Hello \n").
              "Hello"
              2> string:trim(<<"\t Hello \n">>, leading).
              <<"Hello \n">>
              3> string:trim(<<".Hello.\n">>, trailing, "\n.").
              <<".Hello">>

       uppercase(String :: unicode:chardata()) -> unicode:chardata()

              Converts String to uppercase.

              See also titlecase/1.

              Example:

              1> string:uppercase("Michał").
              "MICHAŁ"

OBSOLETE API FUNCTIONS

       The functions of the old API follow. These functions only work on a list of Latin-1
       characters.

   Note:
       The  functions  are kept for backward compatibility, but are not recommended. They will be
       deprecated in Erlang/OTP 21.

       Any undocumented functions in string are not to be used.
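
       As a rough migration sketch (the inputs are illustrative only), the following comparisons
       show some obsolete calls next to their replacements. Notice that slice/3 counts positions
       from 0 whereas substr/3 counts from 1, and that trim/3 takes a list of characters whereas
       strip/3 takes a single character:

       1> string:tokens("a b  c", " ") =:= string:lexemes("a b  c", " ").
       true
       2> string:strip("..Hi..", both, $.) =:= string:trim("..Hi..", both, ".").
       true
       3> string:substr("Hello World", 4, 5) =:= string:slice("Hello World", 3, 5).
       true
       4> string:len("abc") =:= string:length("abc").
       true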

EXPORTS

       centre(String, Number) -> Centered

       centre(String, Number, Character) -> Centered

              Types:

                 String = Centered = string()
                 Number = integer() >= 0
                 Character = char()

              Returns a string, where String is centered in the string and surrounded  by  blanks
              or Character. The resulting string has length Number.

              This function is obsolete. Use pad/3.

       chars(Character, Number) -> String

       chars(Character, Number, Tail) -> String

              Types:

                 Character = char()
                 Number = integer() >= 0
                 Tail = String = string()

              Returns  a string consisting of Number characters Character. Optionally, the string
              can end with string Tail.

              This function is obsolete. Use lists:duplicate/2.

       chr(String, Character) -> Index

              Types:

                 String = string()
                 Character = char()
                 Index = integer() >= 0

              Returns the index of the first occurrence of Character  in  String.  Returns  0  if
              Character does not occur.

              This function is obsolete. Use find/2.

       concat(String1, String2) -> String3

              Types:

                 String1 = String2 = String3 = string()

              Concatenates String1 and String2 to form a new string String3, which is returned.

              This  function  is  obsolete.  Use  [String1,  String2]  as Data argument, and call
              unicode:characters_to_list/2  or  unicode:characters_to_binary/2  to  flatten   the
              output.

       copies(String, Number) -> Copies

              Types:

                 String = Copies = string()
                 Number = integer() >= 0

              Returns a string containing String repeated Number times.

              This function is obsolete. Use lists:duplicate/2.

       cspan(String, Chars) -> Length

              Types:

                 String = Chars = string()
                 Length = integer() >= 0

              Returns  the  length  of  the  maximum  initial  segment  of String, which consists
              entirely of characters not from Chars.

              This function is obsolete. Use take/3.

              Example:

              > string:cspan("\t    abcdef", " \t").
              0

       join(StringList, Separator) -> String

              Types:

                 StringList = [string()]
                 Separator = String = string()

              Returns a string with the  elements  of  StringList  separated  by  the  string  in
              Separator.

              This function is obsolete. Use lists:join/2.

              Example:

              > string:join(["one", "two", "three"], ", ").
              "one, two, three"

       left(String, Number) -> Left

       left(String, Number, Character) -> Left

              Types:

                 String = Left = string()
                 Number = integer() >= 0
                 Character = char()

              Returns String with the length adjusted in accordance with Number. The left margin
              is fixed. If length(String) < Number, then String is padded with blanks or
              Character.

              This function is obsolete. Use pad/2 or pad/3.

              Example:

              > string:left("Hello",10,$.).
              "Hello....."

       len(String) -> Length

              Types:

                 String = string()
                 Length = integer() >= 0

              Returns the number of characters in String.

              This function is obsolete. Use length/1.

       rchr(String, Character) -> Index

              Types:

                 String = string()
                 Character = char()
                 Index = integer() >= 0

              Returns  the  index  of  the  last  occurrence of Character in String. Returns 0 if
              Character does not occur.

              This function is obsolete. Use find/3.

       right(String, Number) -> Right

       right(String, Number, Character) -> Right

              Types:

                 String = Right = string()
                 Number = integer() >= 0
                 Character = char()

              Returns String with the length adjusted in accordance with Number. The right margin
              is fixed. If length(String) < Number, then String is padded with blanks or
              Character.

              This function is obsolete. Use pad/3.

              Example:

              > string:right("Hello", 10, $.).
              ".....Hello"

       rstr(String, SubString) -> Index

              Types:

                 String = SubString = string()
                 Index = integer() >= 0

              Returns the position where the last  occurrence  of  SubString  begins  in  String.
              Returns 0 if SubString does not exist in String.

              This function is obsolete. Use find/3.

              Example:

              > string:rstr(" Hello Hello World World ", "Hello World").
              8

       span(String, Chars) -> Length

              Types:

                 String = Chars = string()
                 Length = integer() >= 0

              Returns  the  length  of  the  maximum  initial  segment  of String, which consists
              entirely of characters from Chars.

              This function is obsolete. Use take/2.

              Example:

              > string:span("\t    abcdef", " \t").
              5

       str(String, SubString) -> Index

              Types:

                 String = SubString = string()
                 Index = integer() >= 0

              Returns the position where the first occurrence  of  SubString  begins  in  String.
              Returns 0 if SubString does not exist in String.

              This function is obsolete. Use find/2.

              Example:

              > string:str(" Hello Hello World World ", "Hello World").
              8

       strip(String :: string()) -> string()

       strip(String, Direction) -> Stripped

       strip(String, Direction, Character) -> Stripped

              Types:

                 String = Stripped = string()
                 Direction = left | right | both
                 Character = char()

              Returns  a  string,  where  leading  or  trailing,  or  both, blanks or a number of
              Character have been  removed.  Direction,  which  can  be  left,  right,  or  both,
              indicates  from  which direction blanks are to be removed. strip/1 is equivalent to
              strip(String, both).

              This function is obsolete. Use trim/3.

              Example:

              > string:strip("...Hello.....", both, $.).
              "Hello"

       sub_string(String, Start) -> SubString

       sub_string(String, Start, Stop) -> SubString

              Types:

                 String = SubString = string()
                 Start = Stop = integer() >= 1

              Returns a substring of String, starting at position Start to the end of the string,
              or to and including position Stop.

              This function is obsolete. Use slice/3.

              Example:

              > string:sub_string("Hello World", 4, 8).
              "lo Wo"

       substr(String, Start) -> SubString

       substr(String, Start, Length) -> SubString

              Types:

                 String = SubString = string()
                 Start = integer() >= 1
                 Length = integer() >= 0

              Returns a substring of String, starting at position Start and extending either to
              the end of the string or for Length characters.

              This function is obsolete. Use slice/3.

              Example:

              > string:substr("Hello World", 4, 5).
              "lo Wo"

       sub_word(String, Number) -> Word

       sub_word(String, Number, Character) -> Word

              Types:

                 String = Word = string()
                 Number = integer()
                 Character = char()

              Returns the word in position Number of String. Words are separated by blanks or
              Character.

              This function is obsolete. Use nth_lexeme/3.

              Example:

              > string:sub_word(" Hello old boy !",3,$o).
              "ld b"

       to_lower(String) -> Result

       to_lower(Char) -> CharResult

       to_upper(String) -> Result

       to_upper(Char) -> CharResult

              Types:

                 String = Result = io_lib:latin1_string()
                 Char = CharResult = char()

              The specified string or character is case-converted. Notice that the supported
              character set is ISO/IEC 8859-1 (also called Latin 1); all values outside this set
              are unchanged.

              This function is obsolete. Use lowercase/1, uppercase/1, titlecase/1, or casefold/1.

       tokens(String, SeparatorList) -> Tokens

              Types:

                 String = SeparatorList = string()
                 Tokens = [Token :: nonempty_string()]

              Returns a list of tokens in String, separated by the characters in SeparatorList.

              Example:

              > string:tokens("abc defxxghix jkl", "x ").
              ["abc","def","ghi","jkl"]

              Notice that, as shown in this example, two or more adjacent separator characters in
              String are treated as one. That is, there are no empty  strings  in  the  resulting
              list of tokens.

              This function is obsolete. Use lexemes/2.

       words(String) -> Count

       words(String, Character) -> Count

              Types:

                 String = string()
                 Character = char()
                 Count = integer() >= 1

              Returns the number of words in String, separated by blanks or Character.

              This function is obsolete. Use lexemes/2.

              Example:

              > string:words(" Hello old boy!", $o).
              4

NOTES

       Some  of  the  general string functions can seem to overlap each other. The reason is that
       this string package is the combination of two earlier packages and all functions  of  both
       packages have been retained.