Provided by: erlang-manpages_25.2.3+dfsg-1_all

NAME

       uri_string - URI processing functions.

DESCRIPTION

       This  module  contains  functions  for  parsing  and  handling  URIs  (RFC 3986) and form-
       urlencoded query strings (HTML 5.2).

       Parsing and serializing non-UTF-8 form-urlencoded query strings are also  supported  (HTML
       5.0).

       A  URI  is  an  identifier consisting of a sequence of characters matching the syntax rule
       named URI in RFC 3986.

       The generic URI syntax consists of a hierarchical sequence of components  referred  to  as
       the scheme, authority, path, query, and fragment:

           URI         = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
           hier-part   = "//" authority path-abempty
                          / path-absolute
                          / path-rootless
                          / path-empty
           scheme      = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
           authority   = [ userinfo "@" ] host [ ":" port ]
           userinfo    = *( unreserved / pct-encoded / sub-delims / ":" )

           reserved    = gen-delims / sub-delims
           gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"
           sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
                       / "*" / "+" / "," / ";" / "="

           unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"

       The  interpretation  of  a  URI  depends  only on the characters used and not on how those
       characters are represented in a network protocol.

       The functions implemented by this module cover the following use cases:

         * Parsing a URI into its components and returning a map
           parse/1

         * Recomposing a map of URI components into a URI string
           recompose/1

         * Changing inbound binary and percent-encoding of URIs
           transcode/2

         * Transforming URIs into a normalized form
           normalize/1
           normalize/2

         * Composing form-urlencoded query strings from a list of key-value pairs
           compose_query/1
           compose_query/2

         * Dissecting form-urlencoded query strings into a list of key-value pairs
           dissect_query/1

         * Decoding percent-encoded triplets in a URI map or in a specific component of a URI
           percent_decode/1

         * Preparing and retrieving application-specific data included in URI components
           quote/1
           quote/2
           unquote/1

       There are four different encodings present during the handling of URIs:

         * Inbound binary encoding in binaries

         * Inbound percent-encoding in lists and binaries

         * Outbound binary encoding in binaries

         * Outbound percent-encoding in lists and binaries

       Functions with a uri_string() argument accept lists, binaries and mixed lists (lists with
       binary elements) as input. All functions except transcode/2 expect input as lists of
       unicode codepoints, UTF-8 encoded binaries and UTF-8 percent-encoded URI parts
       ("%C3%B6" corresponds to the unicode character "ö").

       Unless otherwise specified, the return value type and encoding are the same as the input
       type and encoding. That is, binary input returns binary output and list input returns
       list output. Mixed input also returns list output.

       For lists there is only percent-encoding. For binaries, however, both binary encoding
       and percent-encoding must be considered. transcode/2 provides the means to convert
       between the supported encodings. It takes a uri_string() and a list of options
       specifying the inbound and outbound encodings.

       RFC 3986 does not mandate any specific character encoding; the encoding is usually
       defined by the protocol or the surrounding text. This library makes the same assumption:
       binary encoding and percent-encoding are handled as one configuration unit and cannot be
       set to different values.

       Quoting functions are intended to be used by URI-producing applications during the
       component preparation or retrieval phase, to avoid conflicts between data and the
       characters used in URI syntax. Quoting functions use percent-encoding, but with different
       rules than, for example, recompose/1. It is the user's responsibility to pass only
       application data to the quoting functions and to use their output when building a URI
       component.
       Quoting functions can, for instance, be used to construct a path component with a segment
       containing a '/' character that must not collide with the '/' used as the general
       delimiter in the path component.
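
       As a minimal sketch of the case above, the following hypothetical module quotes an
       application-supplied identifier before placing it in a path. The module name, the
       function item_uri/1, the host "example.com" and the "/items/" prefix are assumptions
       made for illustration only; quote/1 and recompose/1 are the functions documented in
       this manual page.

```erlang
-module(quote_path_example).
-export([item_uri/1]).

%% Id is application data (a string) that may itself contain "/".
%% The host and the "/items/" prefix are hypothetical.
item_uri(Id) ->
    %% quote/1 percent-encodes every character outside the unreserved
    %% set, so a "/" inside Id becomes "%2F".
    Segment = uri_string:quote(Id),
    uri_string:recompose(#{scheme => "http",
                           host => "example.com",
                           path => "/items/" ++ Segment}).
```

       Only the identifier is quoted, so the '/' characters that delimit path segments stay
       intact while the '/' inside the data is carried as "%2F".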

DATA TYPES

       error() = {error, atom(), term()}

              Error tuple indicating the type of error. Possible values of the second component:

                * invalid_character

                * invalid_encoding

                * invalid_input

                * invalid_map

                * invalid_percent_encoding

                * invalid_scheme

                * invalid_uri

                * invalid_utf8

                * missing_value

              The third component is a term providing additional information about the  cause  of
              the error.
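
              Because functions in this module return the error tuple instead of raising an
              exception, callers typically match on it. The following is a minimal sketch; the
              module name, the function host_of/1 and the no_host atom are hypothetical and
              not part of uri_string.

```erlang
-module(parse_error_example).
-export([host_of/1]).

%% Returns {ok, Host}, or an error describing why no host is available.
host_of(URI) ->
    case uri_string:parse(URI) of
        {error, Reason, Detail} ->
            %% Reason is one of the atoms listed for error().
            {error, {Reason, Detail}};
        #{host := Host} ->
            {ok, Host};
        #{} ->
            %% Parsed correctly, but the URI has no authority part.
            {error, no_host}
    end.
```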

       uri_map() =
           #{fragment => unicode:chardata(),
             host => unicode:chardata(),
             path => unicode:chardata(),
             port => integer() >= 0 | undefined,
             query => unicode:chardata(),
             scheme => unicode:chardata(),
             userinfo => unicode:chardata()}

              Map holding the main components of a URI.

       uri_string() = iodata()

              List  of  unicode  codepoints,  a  UTF-8  encoded  binary,  or  a  mix  of the two,
              representing an RFC 3986 compliant URI (percent-encoded form). A URI is a  sequence
              of  characters  from  a  very limited set: the letters of the basic Latin alphabet,
              digits, and a few special characters.

EXPORTS

       allowed_characters() -> [{atom(), list()}]

              This is a utility function meant to be used in the shell for printing the allowed
              characters in each major URI component, and also in the most important character
              sets. Note that this function does not replace the ABNF rules defined by the
              standards; these character sets are derived directly from those aforementioned
              rules. For more information see the Uniform Resource Identifiers chapter in
              stdlib's User's Guide.
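
              A small sketch of how the returned list could be printed one entry per line. The
              module and function names are hypothetical, and the code relies only on the
              documented return type, not on any particular set of names in the list.

```erlang
-module(allowed_chars_example).
-export([print/0]).

%% Prints each {Name, Characters} pair on its own line.
print() ->
    lists:foreach(
      fun({Name, Chars}) ->
              io:format("~p: ~ts~n", [Name, Chars])
      end,
      uri_string:allowed_characters()).
```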

       compose_query(QueryList) -> QueryString

              Types:

                 QueryList = [{unicode:chardata(), unicode:chardata() | true}]
                 QueryString = uri_string() | error()

              Composes a form-urlencoded QueryString based on a QueryList, a list of non-percent-
              encoded key-value pairs. Form-urlencoding is defined in section  4.10.21.6  of  the
              HTML  5.2  specification and in section 4.10.22.6 of the HTML 5.0 specification for
              non-UTF-8 encodings.

              See also the opposite operation dissect_query/1.
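
              The composed query string can be used as the query component of a uri_map(). The
              following is a minimal sketch under that assumption; the module name, the
              function search_uri/1, the host "example.com" and the "/search" path are
              hypothetical.

```erlang
-module(compose_query_example).
-export([search_uri/1]).

%% Params is a QueryList as accepted by compose_query/1.
search_uri(Params) ->
    case uri_string:compose_query(Params) of
        {error, _, _} = Error ->
            Error;
        Query ->
            %% The query is already form-urlencoded at this point.
            uri_string:recompose(#{scheme => "https",
                                   host => "example.com",
                                   path => "/search",
                                   query => Query})
    end.
```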

              Example:

              1> uri_string:compose_query([{"foo bar","1"},{"city","örebro"}]).
              "foo+bar=1&city=%C3%B6rebro"
              2> uri_string:compose_query([{<<"foo bar">>,<<"1">>},
              2> {<<"city">>,<<"örebro"/utf8>>}]).
              <<"foo+bar=1&city=%C3%B6rebro">>

       compose_query(QueryList, Options) -> QueryString

              Types:

                 QueryList = [{unicode:chardata(), unicode:chardata() | true}]
                 Options = [{encoding, atom()}]
                 QueryString = uri_string() | error()

              Same as compose_query/1 but with an additional Options parameter that controls the
              encoding ("charset") used by the encoding algorithm. There are two supported
              encodings: utf8 (or unicode) and latin1.

              Each character in the entry's name and value that cannot be expressed using the
              selected character encoding is replaced by a string consisting of a U+0026
              AMPERSAND character (&), a "#" (U+0023) character, one or more ASCII digits
              representing the Unicode code point of the character in base ten, and finally a ";"
              (U+003B) character.

              Bytes outside the set 0x2A, 0x2D, 0x2E, 0x30 to 0x39, 0x41 to 0x5A, 0x5F and
              0x61 to 0x7A are percent-encoded (a U+0025 PERCENT SIGN character (%) followed by
              uppercase ASCII hex digits representing the hexadecimal value of the byte).

              See also the opposite operation dissect_query/1.

              Example:

              1> uri_string:compose_query([{"foo bar","1"},{"city","örebro"}],
              1> [{encoding, latin1}]).
              "foo+bar=1&city=%F6rebro"
              2> uri_string:compose_query([{<<"foo bar">>,<<"1">>},
              2> {<<"city">>,<<"東京"/utf8>>}], [{encoding, latin1}]).
              <<"foo+bar=1&city=%26%2326481%3B%26%2320140%3B">>

       dissect_query(QueryString) -> QueryList

              Types:

                 QueryString = uri_string()
                 QueryList =
                     [{unicode:chardata(), unicode:chardata() | true}] | error()

              Dissects a urlencoded QueryString and returns a QueryList, a list of non-percent-
              encoded key-value pairs. Form-urlencoding is defined in section 4.10.21.6 of the
              HTML 5.2 specification and in section 4.10.22.6 of the HTML 5.0 specification for
              non-UTF-8 encodings.

              See also the opposite operation compose_query/1.

              Example:

              1> uri_string:dissect_query("foo+bar=1&city=%C3%B6rebro").
              [{"foo bar","1"},{"city","örebro"}]
              2> uri_string:dissect_query(<<"foo+bar=1&city=%26%2326481%3B%26%2320140%3B">>).
              [{<<"foo bar">>,<<"1">>},
               {<<"city">>,<<230,157,177,228,186,172>>}]

       normalize(URI) -> NormalizedURI

              Types:

                 URI = uri_string() | uri_map()
                 NormalizedURI = uri_string() | error()

              Transforms a URI into a normalized form using Syntax-Based Normalization as
              defined by RFC 3986.

              This function implements case normalization, percent-encoding  normalization,  path
              segment normalization and scheme based normalization for HTTP(S) with basic support
              for FTP, SSH, SFTP and TFTP.
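
              One common use of normalization is comparing two URIs that differ only in case,
              default port or dot segments. The following is a minimal sketch; the module name
              and the function equivalent/2 are hypothetical. Syntax-based normalization
              cannot detect every pair of equivalent URIs, and a list input never compares
              equal to a binary input here because the return type follows the input type.

```erlang
-module(normalize_compare_example).
-export([equivalent/2]).

%% True if both inputs normalize to exactly the same URI string.
equivalent(A, B) ->
    case {uri_string:normalize(A), uri_string:normalize(B)} of
        {{error, _, _}, _} -> false;
        {_, {error, _, _}} -> false;
        {Same, Same} -> true;       % both sides bound to the same term
        _ -> false
    end.
```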

              Example:

              1> uri_string:normalize("/a/b/c/./../../g").
              "/a/g"
              2> uri_string:normalize(<<"mid/content=5/../6">>).
              <<"mid/6">>
              3> uri_string:normalize("http://localhost:80").
              "http://localhost/"
              4> uri_string:normalize(#{scheme => "http",port => 80,path => "/a/b/c/./../../g",
              4> host => "localhost-örebro"}).
              "http://localhost-%C3%B6rebro/a/g"

       normalize(URI, Options) -> NormalizedURI

              Types:

                 URI = uri_string() | uri_map()
                 Options = [return_map]
                 NormalizedURI = uri_string() | uri_map() | error()

              Same as normalize/1 but with an additional Options parameter that controls whether
              the normalized URI is returned as a uri_map(). There is one supported
              option: return_map.

              Example:

              1> uri_string:normalize("/a/b/c/./../../g", [return_map]).
              #{path => "/a/g"}
              2> uri_string:normalize(<<"mid/content=5/../6">>, [return_map]).
              #{path => <<"mid/6">>}
              3> uri_string:normalize("http://localhost:80", [return_map]).
              #{scheme => "http",path => "/",host => "localhost"}
              4> uri_string:normalize(#{scheme => "http",port => 80,path => "/a/b/c/./../../g",
              4> host => "localhost-örebro"}, [return_map]).
              #{scheme => "http",path => "/a/g",host => "localhost-örebro"}

       parse(URIString) -> URIMap

              Types:

                 URIString = uri_string()
                 URIMap = uri_map() | error()

              Parses an RFC 3986 compliant uri_string() into a uri_map() that holds the parsed
              components of the URI. If parsing fails, an error tuple is returned.

              See also the opposite operation recompose/1.

              Example:

              1> uri_string:parse("foo://user@example.com:8042/over/there?name=ferret#nose").
              #{fragment => "nose",host => "example.com",
                path => "/over/there",port => 8042,query => "name=ferret",
                scheme => "foo",userinfo => "user"}
              2> uri_string:parse(<<"foo://user@example.com:8042/over/there?name=ferret">>).
              #{host => <<"example.com">>,path => <<"/over/there">>,
                port => 8042,query => <<"name=ferret">>,scheme => <<"foo">>,
                userinfo => <<"user">>}
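
              The query component of the returned map is still a form-urlencoded string and
              can be passed on to dissect_query/1. A minimal sketch of that combination
              follows; the module name and the function params/1 are hypothetical.

```erlang
-module(query_params_example).
-export([params/1]).

%% Returns the key-value pairs of the query component, [] if the URI
%% has no query, or the error tuple if parsing or dissecting fails.
params(URI) ->
    case uri_string:parse(URI) of
        {error, _, _} = Error ->
            Error;
        #{query := Query} ->
            uri_string:dissect_query(Query);
        #{} ->
            []
    end.
```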

       percent_decode(URI) -> Result

              Types:

                 URI = uri_string() | uri_map()
                 Result =
                     uri_string() |
                     uri_map() |
                     {error, {invalid, {atom(), {term(), term()}}}}

              Decodes all percent-encoded triplets in the input, which can be either a
              uri_string() or a uri_map(). Note that this function performs raw decoding and
              is meant to be used on already parsed URI components. Applying this function
              directly to a complete URI can change its meaning.

              If the input encoding is not UTF-8, an error tuple is returned.
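
              A minimal sketch of the intended order of operations: parse the URI first, then
              decode the resulting map, so that decoded delimiters cannot be mistaken for URI
              syntax. The module name and the function decode_components/1 are hypothetical.

```erlang
-module(percent_decode_example).
-export([decode_components/1]).

%% Parses URI and decodes the components of the resulting map.
decode_components(URI) ->
    case uri_string:parse(URI) of
        {error, _, _} = Error ->
            Error;
        Map ->
            %% Returns a uri_map() or an error tuple.
            uri_string:percent_decode(Map)
    end.
```

              The result should be kept as a map of components; recomposing it would turn a
              decoded "/" in the host or in a path segment back into a delimiter.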

              Example:

              1> uri_string:percent_decode(#{host => "localhost-%C3%B6rebro",path => [],
              1> scheme => "http"}).
              #{host => "localhost-örebro",path => [],scheme => "http"}
              2> uri_string:percent_decode(<<"%C3%B6rebro">>).
              <<"örebro"/utf8>>

          Warning:
              Using uri_string:percent_decode/1 directly on a URI is not safe. This example
              shows that each consecutive application of the function changes the resulting
              URI. None of these URIs refer to the same resource.

              3> uri_string:percent_decode(<<"http://local%252Fhost/path">>).
              <<"http://local%2Fhost/path">>
              4> uri_string:percent_decode(<<"http://local%2Fhost/path">>).
              <<"http://local/host/path">>

       quote(Data) -> QuotedData

              Types:

                 Data = QuotedData = unicode:chardata()

              Replaces characters outside the unreserved set with their percent-encoded equivalents.

              Unreserved characters defined in RFC 3986 are not quoted.

              Example:

              1> uri_string:quote("SomeId/04").
              "SomeId%2F04"
              2> uri_string:quote(<<"SomeId/04">>).
              <<"SomeId%2F04">>

          Warning:
              The function is not aware of any URI component context and should not be used on
              a whole URI. Applying it more than once to the same data might produce unexpected
              results.

       quote(Data, Safe) -> QuotedData

              Types:

                 Data = unicode:chardata()
                 Safe = string()
                 QuotedData = unicode:chardata()

              Same as quote/1, but Safe allows the user to provide a list of characters to be
              protected from encoding.

              Example:

              1> uri_string:quote("SomeId/04", "/").
              "SomeId/04"
              2> uri_string:quote(<<"SomeId/04">>, "/").
              <<"SomeId/04">>

          Warning:
              The function is not aware of any URI component context and should not be used on
              a whole URI. Applying it more than once to the same data might produce unexpected
              results.

       recompose(URIMap) -> URIString

              Types:

                 URIMap = uri_map()
                 URIString = uri_string() | error()

              Creates  an RFC 3986 compliant URIString (percent-encoded), based on the components
              of URIMap. If the URIMap is invalid, an error tuple is returned.

              See also the opposite operation parse/1.

              Example:

              1> URIMap = #{fragment => "nose", host => "example.com", path => "/over/there",
              1> port => 8042, query => "name=ferret", scheme => "foo", userinfo => "user"}.
              #{fragment => "nose",host => "example.com",
                path => "/over/there",port => 8042,query => "name=ferret",
                scheme => "foo",userinfo => "user"}

              2> uri_string:recompose(URIMap).
              "foo://user@example.com:8042/over/there?name=ferret#nose"

       resolve(RefURI, BaseURI) -> TargetURI

              Types:

                 RefURI = BaseURI = uri_string() | uri_map()
                 TargetURI = uri_string() | error()

              Resolves RefURI, a URI reference that might be relative, against the base URI
              BaseURI and returns the target URI.

              Example:

              1> uri_string:resolve("/abs/ol/ute", "http://localhost/a/b/c?q").
              "http://localhost/abs/ol/ute"
              2> uri_string:resolve("../relative", "http://localhost/a/b/c?q").
              "http://localhost/a/relative"
              3> uri_string:resolve("http://localhost/full", "http://localhost/a/b/c?q").
              "http://localhost/full"
              4> uri_string:resolve(#{path => "path", query => "xyz"}, "http://localhost/a/b/c?q").
              "http://localhost/a/b/path?xyz"

       resolve(RefURI, BaseURI, Options) -> TargetURI

              Types:

                 RefURI = BaseURI = uri_string() | uri_map()
                 Options = [return_map]
                 TargetURI = uri_string() | uri_map() | error()

              Same as resolve/2 but with an additional Options parameter that controls whether
              the target URI is returned as a uri_map(). There is one supported option:
              return_map.

              Example:

              1> uri_string:resolve("/abs/ol/ute", "http://localhost/a/b/c?q", [return_map]).
              #{host => "localhost",path => "/abs/ol/ute",scheme => "http"}
              2> uri_string:resolve(#{path => "/abs/ol/ute"}, #{scheme => "http",
              2> host => "localhost", path => "/a/b/c?q"}, [return_map]).
              #{host => "localhost",path => "/abs/ol/ute",scheme => "http"}

       transcode(URIString, Options) -> Result

              Types:

                 URIString = uri_string()
                 Options =
                     [{in_encoding, unicode:encoding()} |
                      {out_encoding, unicode:encoding()}]
                 Result = uri_string() | error()

              Transcodes an RFC 3986 compliant URIString, where Options is a list of tagged
              tuples specifying the inbound (in_encoding) and outbound (out_encoding) encodings.
              in_encoding and out_encoding specify both the binary encoding and the
              percent-encoding of the input and output data. Mixed encoding, where the binary
              encoding is not the same as the percent-encoding, is not supported. If an argument
              is invalid, an error tuple is returned.

              Example:

              1> uri_string:transcode(<<"foo%00%00%00%F6bar"/utf32>>,
              1> [{in_encoding, utf32},{out_encoding, utf8}]).
              <<"foo%C3%B6bar"/utf8>>
              2> uri_string:transcode("foo%F6bar", [{in_encoding, latin1},
              2> {out_encoding, utf8}]).
              "foo%C3%B6bar"

       unquote(QuotedData) -> Data

              Types:

                 QuotedData = Data = unicode:chardata()

              Decodes percent-encoded characters.

              Example:

              1> uri_string:unquote("SomeId%2F04").
              "SomeId/04"
              2> uri_string:unquote(<<"SomeId%2F04">>).
              <<"SomeId/04">>

          Warning:
              The function is not aware of any URI component context and should not be used on
              a whole URI. Applying it more than once to the same data might produce unexpected
              results.