Ubuntu Manpage: binary - Library for handling binary data

Provided by: erlang-manpages_16.b.3-dfsg-1ubuntu2.2_all

NAME

       binary - Library for handling binary data

DESCRIPTION

       This  module  contains  functions  for  manipulating  byte-oriented binaries. Although the
       majority of functions could be implemented using bit-syntax, the functions in this library
       are  highly optimized and are expected to either execute faster or consume less memory (or
       both) than a counterpart written in pure Erlang.

       The module is implemented according to the EEP (Erlang Enhancement Proposal) 31.

   Note:
       The library handles byte-oriented data. Bitstrings that are not binaries (that is, do not
       contain whole octets of bits) will result in a badarg exception being raised from any of
       the functions in this module.

DATA TYPES

       cp()

              Opaque data type representing a compiled search pattern. Guaranteed to be a tuple()
              to allow programs to distinguish it from non-precompiled search patterns.

       part() = {Start :: integer() >= 0, Length :: integer()}

              A representation of a part (or range) in a binary. Start is a zero-based offset into
              a binary() and Length is the length of that part. As input  to  functions  in  this
              module,  a  reverse  part  specification  is  allowed,  constructed with a negative
              Length, so that the part of the binary begins at Start  +  Length  and  is  -Length
              long. This is useful for referencing the last N bytes of a binary as {size(Binary),
              -N}. The functions in this module always return part()s with a positive Length.
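
              As a small illustration of the reverse part specification, the sketch below uses an
              arbitrary example binary and part/2 from this module. Each line is a match that
              succeeds only if the call returns the value on the left:

              ```erlang
              %% Forward part: start at offset 3, take 3 bytes.
              <<"ang">> = binary:part(<<"erlang">>, {3, 3}).
              %% Reverse part: start at the end and reach 3 bytes backwards.
              <<"ang">> = binary:part(<<"erlang">>, {byte_size(<<"erlang">>), -3}).
              %% Reverse part in the middle: begins at 4 + (-2) = 2 and is 2 bytes long.
              <<"la">> = binary:part(<<"erlang">>, {4, -2}).
              ```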

EXPORTS

       at(Subject, Pos) -> byte()

              Types:

                 Subject = binary()
                 Pos = integer() >= 0

              Returns the byte at position Pos (zero-based) in the binary Subject as an  integer.
              If Pos >= byte_size(Subject), a badarg exception is raised.
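
              A minimal sketch of the behavior described above, using an arbitrary example binary.
              The atom out_of_range is only an illustrative marker, not part of the module:

              ```erlang
              %% 101 is the byte value of the character e.
              101 = binary:at(<<"erlang">>, 0).
              $g = binary:at(<<"erlang">>, 5).
              %% Position 6 is one past the last byte, so badarg is raised.
              out_of_range = try binary:at(<<"erlang">>, 6)
                             catch error:badarg -> out_of_range
                             end.
              ```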

       bin_to_list(Subject) -> [byte()]

              Types:

                 Subject = binary()

              The same as bin_to_list(Subject,{0,byte_size(Subject)}).

       bin_to_list(Subject, PosLen) -> [byte()]

              Types:

                 Subject = binary()
                 PosLen = part()

              Converts Subject to a list of byte()s, each representing the value of one byte. The
              part() denotes which part of the binary() to convert. Example:

              1> binary:bin_to_list(<<"erlang">>,{1,3}).
              "rla"
              %% or [114,108,97] in list notation.

              If PosLen in any way references outside the binary, a badarg exception is raised.

       bin_to_list(Subject, Pos, Len) -> [byte()]

              Types:

                 Subject = binary()
                 Pos = integer() >= 0
                 Len = integer()

              The same as bin_to_list(Subject,{Pos,Len}).

       compile_pattern(Pattern) -> cp()

              Types:

                 Pattern = binary() | [binary()]

              Builds an internal structure representing a compilation of a search-pattern,  later
              to  be  used  in  the  match/3, matches/3, split/3 or replace/4 functions. The cp()
              returned is guaranteed to be a tuple() to allow programs to distinguish it from
              non-precompiled search patterns.

              When  a  list  of  binaries  is  given, it denotes a set of alternative binaries to
              search for. That is, if [<<"functional">>,<<"programming">>] is given as Pattern, this
              means  "either  <<"functional">>  or  <<"programming">>".  The  pattern is a set of
              alternatives; when only a single binary is given, the set has only one element. The
              order of alternatives in a pattern is not significant.

              The list of binaries used for search alternatives shall be flat and proper.

              If  Pattern  is  not  a binary or a flat proper list of binaries with length > 0, a
              badarg exception will be raised.
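
              The typical use is to compile a set of alternatives once and reuse it for several
              searches. The following sketch assumes arbitrary example subjects; the atom invalid
              is only an illustrative marker:

              ```erlang
              %% Compile the set of alternatives once ...
              CP = binary:compile_pattern([<<"functional">>, <<"programming">>]).
              true = is_tuple(CP).
              %% ... and reuse it for several searches.
              {10, 10} = binary:match(<<"erlang is functional">>, CP).
              {0, 11} = binary:match(<<"programming in erlang">>, CP).
              nomatch = binary:match(<<"object oriented">>, CP).
              %% An empty list is not a valid pattern.
              invalid = try binary:compile_pattern([])
                        catch error:badarg -> invalid
                        end.
              ```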

       copy(Subject) -> binary()

              Types:

                 Subject = binary()

              The same as copy(Subject, 1).

       copy(Subject, N) -> binary()

              Types:

                 Subject = binary()
                 N = integer() >= 0

              Creates a binary with the content of Subject duplicated N times.

              This function will always create a new binary, even if N = 1. By using copy/1 on  a
              binary referencing a larger binary, one might free up the larger binary for garbage
              collection.

          Note:
              By deliberately copying a single binary to avoid referencing a larger  binary,  one
              might, instead of freeing up the larger binary for later garbage collection, create
              much more binary data than needed. Sharing binary data is  usually  good.  Only  in
              special cases, when small parts reference large binaries and the large binaries are
              no longer used in any process, deliberate copying might be a good idea.

              If N < 0, a badarg exception is raised.
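
              A short sketch of the duplication behavior, using an arbitrary two-byte binary:

              ```erlang
              <<"ababab">> = binary:copy(<<"ab">>, 3).
              %% N = 0 yields an empty binary.
              <<>> = binary:copy(<<"ab">>, 0).
              %% copy/1 returns a binary with equal content.
              <<"ab">> = binary:copy(<<"ab">>).
              ```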

       decode_unsigned(Subject) -> Unsigned

              Types:

                 Subject = binary()
                 Unsigned = integer() >= 0

              The same as decode_unsigned(Subject, big).

       decode_unsigned(Subject, Endianness) -> Unsigned

              Types:

                 Subject = binary()
                 Endianness = big | little
                 Unsigned = integer() >= 0

              Converts the binary digit representation, in big-endian or little-endian byte order,
              of a non-negative integer in Subject to an Erlang integer().

              Example:

              1> binary:decode_unsigned(<<169,138,199>>,big).
              11111111
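
              To make the effect of the byte order concrete, the sketch below decodes the same
              value from both byte orders (the input binaries are arbitrary examples):

              ```erlang
              %% The same integer stored in both byte orders.
              11111111 = binary:decode_unsigned(<<169,138,199>>, big).
              11111111 = binary:decode_unsigned(<<199,138,169>>, little).
              %% Byte order matters: <<1,0>> is 256 big-endian but 1 little-endian.
              256 = binary:decode_unsigned(<<1,0>>, big).
              1 = binary:decode_unsigned(<<1,0>>, little).
              %% decode_unsigned/1 defaults to big-endian.
              256 = binary:decode_unsigned(<<1,0>>).
              ```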

       encode_unsigned(Unsigned) -> binary()

              Types:

                 Unsigned = integer() >= 0

              The same as encode_unsigned(Unsigned, big).

       encode_unsigned(Unsigned, Endianness) -> binary()

              Types:

                 Unsigned = integer() >= 0
                 Endianness = big | little

              Converts a non-negative integer to the smallest possible binary digit
              representation, in either big-endian or little-endian byte order.

              Example:

              1> binary:encode_unsigned(11111111,big).
              <<169,138,199>>
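
              A brief sketch of the little-endian form and of a round trip through
              decode_unsigned/2, using arbitrary example values:

              ```erlang
              <<199,138,169>> = binary:encode_unsigned(11111111, little).
              %% Only as many bytes as needed are produced.
              <<255>> = binary:encode_unsigned(255).
              <<1,0>> = binary:encode_unsigned(256).
              %% Encoding and decoding with the same byte order round-trips.
              65535 = binary:decode_unsigned(binary:encode_unsigned(65535, little), little).
              ```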

       first(Subject) -> byte()

              Types:

                 Subject = binary()

              Returns the first byte of the binary Subject as an integer. If the size of  Subject
              is zero, a badarg exception is raised.

       last(Subject) -> byte()

              Types:

                 Subject = binary()

              Returns  the  last byte of the binary Subject as an integer. If the size of Subject
              is zero, a badarg exception is raised.
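
              first/1 and last/1 can be sketched together as follows. The binaries are arbitrary
              examples and the atom empty is only an illustrative marker:

              ```erlang
              $e = binary:first(<<"erlang">>).
              $g = binary:last(<<"erlang">>).
              %% For a one-byte binary, first/1 and last/1 agree.
              7 = binary:first(<<7>>).
              7 = binary:last(<<7>>).
              %% An empty binary has no first byte.
              empty = try binary:first(<<>>)
                      catch error:badarg -> empty
                      end.
              ```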

       list_to_bin(ByteList) -> binary()

              Types:

                 ByteList = iodata()

              Works exactly as erlang:list_to_binary/1, added for completeness.
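
              A small sketch with arbitrary example input, showing that nested lists and binaries
              are flattened in the same way as by the BIF:

              ```erlang
              %% A plain list of bytes.
              <<1,2,3>> = binary:list_to_bin([1,2,3]).
              %% Nested lists and binaries are flattened.
              <<"abcde">> = binary:list_to_bin([<<"ab">>, $c, ["d", [$e]]]).
              %% Same result as the BIF.
              true = binary:list_to_bin("erlang") =:= list_to_binary("erlang").
              ```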

       longest_common_prefix(Binaries) -> integer() >= 0

              Types:

                 Binaries = [binary()]

              Returns the length of the longest  common  prefix  of  the  binaries  in  the  list
              Binaries. Example:

              1> binary:longest_common_prefix([<<"erlang">>,<<"ergonomy">>]).
              2
              2> binary:longest_common_prefix([<<"erlang">>,<<"perl">>]).
              0

              If Binaries is not a flat list of binaries, a badarg exception is raised.

       longest_common_suffix(Binaries) -> integer() >= 0

              Types:

                 Binaries = [binary()]

              Returns  the  length  of  the  longest  common  suffix  of the binaries in the list
              Binaries. Example:

              1> binary:longest_common_suffix([<<"erlang">>,<<"fang">>]).
              3
              2> binary:longest_common_suffix([<<"erlang">>,<<"perl">>]).
              0

              If Binaries is not a flat list of binaries, a badarg exception is raised.

       match(Subject, Pattern) -> Found | nomatch

              Types:

                 Subject = binary()
                 Pattern = binary() | [binary()] | cp()
                 Found = part()

              The same as match(Subject, Pattern, []).

       match(Subject, Pattern, Options) -> Found | nomatch

              Types:

                 Subject = binary()
                 Pattern = binary() | [binary()] | cp()
                 Found = part()
                 Options = [Option]
                 Option = {scope, part()}
                 part() = {Start :: integer() >= 0, Length :: integer()}

              Searches for the first occurrence of Pattern in Subject and  returns  the  position
              and length.

              The function will return {Pos, Length} for the binary in Pattern starting at the
              lowest position in Subject. Example:

              1> binary:match(<<"abcde">>, [<<"bcde">>,<<"cd">>],[]).
              {1,4}

              Even though <<"cd">>  ends  before  <<"bcde">>,  <<"bcde">>  begins  first  and  is
              therefore  the  first match. If two overlapping matches begin at the same position,
              the longest is returned.

              Summary of the options:

                {scope, {Start, Length}}:
                  Only the given part is searched. Return values  still  have  offsets  from  the
                  beginning  of  Subject.  A  negative Length is allowed as described in the DATA
                  TYPES section of this manual.

              If none of the strings in Pattern is found, the atom nomatch is returned.

              For a description of Pattern, see compile_pattern/1.

              If {scope, {Start,Length}} is given in the options such that Start is  larger  than
              the  size  of Subject, Start + Length is less than zero or Start + Length is larger
              than the size of Subject, a badarg exception is raised.
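
              The scope option and its error case can be sketched as follows. The subject is an
              arbitrary example and the atom bad_scope is only an illustrative marker:

              ```erlang
              %% Without options the whole subject is searched.
              {1, 2} = binary:match(<<"banana">>, <<"an">>).
              %% Restrict the search to bytes 2..5; the offset is still relative to Subject.
              {3, 2} = binary:match(<<"banana">>, <<"an">>, [{scope, {2, 4}}]).
              %% The same scope written as a reverse part.
              {3, 2} = binary:match(<<"banana">>, <<"an">>, [{scope, {6, -4}}]).
              nomatch = binary:match(<<"banana">>, <<"x">>).
              %% A scope reaching outside Subject (4 + 5 > 6) raises badarg.
              bad_scope = try binary:match(<<"banana">>, <<"an">>, [{scope, {4, 5}}])
                          catch error:badarg -> bad_scope
                          end.
              ```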

       matches(Subject, Pattern) -> Found

              Types:

                 Subject = binary()
                 Pattern = binary() | [binary()] | cp()
                 Found = [part()]

              The same as matches(Subject, Pattern, []).

       matches(Subject, Pattern, Options) -> Found

              Types:

                 Subject = binary()
                 Pattern = binary() | [binary()] | cp()
                 Found = [part()]
                 Options = [Option]
                 Option = {scope, part()}
                 part() = {Start :: integer() >= 0, Length :: integer()}

              Works like match/2, but the Subject is searched until exhausted and a list  of  all
              non-overlapping parts matching Pattern is returned (in order).

              The  first and longest match is preferred to a shorter, which is illustrated by the
              following example:

              1> binary:matches(<<"abcde">>,
                                [<<"bcde">>,<<"bc">>,<<"de">>],[]).
              [{1,4}]

              The result shows that <<"bcde">> is selected instead of the shorter match <<"bc">>
              (which would have given rise to one more match, <<"de">>). This corresponds to the
              behavior of POSIX regular expressions (and programs like awk), but is not
              consistent with alternative matches in re (and Perl), where instead lexical
              ordering in the search pattern selects which string matches.

              If none of the strings in Pattern is found, an empty list is returned.

              For a description of Pattern,  see  compile_pattern/1  and  for  a  description  of
              available options, see match/3.

              If  {scope,  {Start,Length}} is given in the options such that Start is larger than
              the size of Subject, Start + Length is less than zero or Start + Length  is  larger
              than the size of Subject, a badarg exception is raised.
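
              A minimal sketch of the non-overlapping rule, using an arbitrary example subject. The
              last expression shows one way to turn the returned part()s back into binaries with
              part/2:

              ```erlang
              Subject = <<"banana">>.
              [{1, 2}, {3, 2}] = binary:matches(Subject, <<"an">>).
              %% "ana" occurs at offsets 1 and 3, but the two occurrences overlap,
              %% so only the first is reported.
              [{1, 3}] = binary:matches(Subject, <<"ana">>).
              [] = binary:matches(Subject, <<"x">>).
              %% Each returned part() can be passed to part/2 to extract the bytes.
              [<<"an">>, <<"an">>] =
                  [binary:part(Subject, P) || P <- binary:matches(Subject, <<"an">>)].
              ```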

       part(Subject, PosLen) -> binary()

              Types:

                 Subject = binary()
                 PosLen = part()

              Extracts the part of the binary Subject described by PosLen.

              Negative length can be used to extract bytes at the end of a binary:

              1> Bin = <<1,2,3,4,5,6,7,8,9,10>>.
              2> binary:part(Bin,{byte_size(Bin), -5}).
              <<6,7,8,9,10>>

          Note:
              part/2 and part/3 are also available in the erlang module under the names
              binary_part/2 and binary_part/3. Those BIFs are allowed in guard tests.

              If PosLen in any way references outside the binary, a badarg exception is raised.

       part(Subject, Pos, Len) -> binary()

              Types:

                 Subject = binary()
                 Pos = integer() >= 0
                 Len = integer()

              The same as part(Subject, {Pos, Len}).

       referenced_byte_size(Binary) -> integer() >= 0

              Types:

                 Binary = binary()

              If a binary references a larger binary (often described as being a sub-binary),  it
              can be useful to get the size of the actual referenced binary. This function can be
              used in a program to trigger the use of copy/1. By  copying  a  binary,  one  might
              dereference  the  original,  possibly  large,  binary  which  a smaller binary is a
              reference to.

              Example:

              store(Binary, GBSet) ->
                NewBin =
                    case binary:referenced_byte_size(Binary) of
                        Large when Large > 2 * byte_size(Binary) ->
                           binary:copy(Binary);
                        _ ->
                           Binary
                    end,
                gb_sets:insert(NewBin,GBSet).

              In this example, we chose to copy the binary content before inserting it in the
              gb_set() if it references a binary more than twice the size of the data we're going
              to keep. Of course, different rules for when to copy will apply to different
              programs.

              Binary sharing occurs whenever binaries are taken apart. This is the fundamental
              reason why binaries are fast: decomposition can always be done with O(1)
              complexity. In rare circumstances this data sharing is undesirable, which is why
              this function, together with copy/1, might be useful when optimizing for memory
              use.

              Example of binary sharing:

              1> A = binary:copy(<<1>>,100).
              <<1,1,1,1,1 ...
              2> byte_size(A).
              100
              3> binary:referenced_byte_size(A).
              100
              4> <<_:10/binary,B:10/binary,_/binary>> = A.
              <<1,1,1,1,1 ...
              5> byte_size(B).
              10
              6> binary:referenced_byte_size(B).
              100

          Note:
              Binary  data  is  shared  among  processes. If another process still references the
              larger binary, copying the part this process uses only  consumes  more  memory  and
              will not free up the larger binary for garbage collection. Use intrusive functions
              of this kind with extreme care, and only if a real problem is detected.

       replace(Subject, Pattern, Replacement) -> Result

              Types:

                 Subject = binary()
                 Pattern = binary() | [binary()] | cp()
                 Replacement = Result = binary()

              The same as replace(Subject,Pattern,Replacement,[]).

       replace(Subject, Pattern, Replacement, Options) -> Result

              Types:

                 Subject = binary()
                 Pattern = binary() | [binary()] | cp()
                 Replacement = binary()
                 Options = [Option]
                 Option = global | {scope, part()} | {insert_replaced, InsPos}
                 InsPos = OnePos | [OnePos]
                 OnePos = integer() >= 0
                   An integer() =< byte_size(Replacement)
                 Result = binary()

              Constructs a new binary by replacing the parts in Subject matching Pattern with the
              content of Replacement.

              If the matching sub-part of Subject giving rise to the replacement is to be
              inserted in the result,  the  option  {insert_replaced,  InsPos}  will  insert  the
              matching part into Replacement at the given position (or positions) before actually
              inserting Replacement into the Subject. Example:

              1> binary:replace(<<"abcde">>,<<"b">>,<<"[]">>,[{insert_replaced,1}]).
              <<"a[b]cde">>
              2> binary:replace(<<"abcde">>,[<<"b">>,<<"d">>],<<"[]">>,
                               [global,{insert_replaced,1}]).
              <<"a[b]c[d]e">>
              3> binary:replace(<<"abcde">>,[<<"b">>,<<"d">>],<<"[]">>,
                               [global,{insert_replaced,[1,1]}]).
              <<"a[bb]c[dd]e">>
              4> binary:replace(<<"abcde">>,[<<"b">>,<<"d">>],<<"[-]">>,
                               [global,{insert_replaced,[1,2]}]).
              <<"a[b-b]c[d-d]e">>

              If any position given in InsPos is greater than the size of the replacement binary,
              a badarg exception is raised.

              The  options  global  and  {scope,  part()} work as for split/3. The return type is
              always a binary().

              For a description of Pattern, see compile_pattern/1.
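
              The global and scope options can be sketched as follows, using an arbitrary example
              subject:

              ```erlang
              %% Without global, only the first match is replaced.
              <<"bonana">> = binary:replace(<<"banana">>, <<"a">>, <<"o">>).
              <<"bonono">> = binary:replace(<<"banana">>, <<"a">>, <<"o">>, [global]).
              %% The replacement may be empty, which deletes the matches.
              <<"bnn">> = binary:replace(<<"banana">>, <<"a">>, <<>>, [global]).
              %% A scope limits where matches are searched for; bytes outside it are kept.
              <<"banono">> = binary:replace(<<"banana">>, <<"a">>, <<"o">>,
                                            [global, {scope, {2, 4}}]).
              ```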

       split(Subject, Pattern) -> Parts

              Types:

                 Subject = binary()
                 Pattern = binary() | [binary()] | cp()
                 Parts = [binary()]

              The same as split(Subject, Pattern, []).

       split(Subject, Pattern, Options) -> Parts

              Types:

                 Subject = binary()
                 Pattern = binary() | [binary()] | cp()
                 Options = [Option]
                 Option = {scope, part()} | trim | global
                 Parts = [binary()]

              Splits Subject into a list of binaries based on Pattern. If the  option  global  is
              not  given,  only  the  first  occurrence of Pattern in Subject will give rise to a
              split.

              The parts of Pattern actually found in Subject are not included in the result.

              Example:

              1> binary:split(<<1,255,4,0,0,0,2,3>>, [<<0,0,0>>,<<2>>],[]).
              [<<1,255,4>>, <<2,3>>]
              2> binary:split(<<0,1,0,0,4,255,255,9>>, [<<0,0>>, <<255,255>>],[global]).
              [<<0,1>>,<<4>>,<<9>>]

              Summary of options:

                {scope, part()}:
                  Works as in match/3 and matches/3. Note that this only defines the scope of the
                  search for matching strings; it does not cut the binary before splitting. The
                  bytes before and after the scope will be kept in the result. See example below.

                trim:
                  Removes trailing empty parts of the result (as does trim in re:split/3).

                global:
                  Repeats the split until the  Subject  is  exhausted.  Conceptually  the  global
                  option  makes  split  work  on  the  positions  returned by matches/3, while it
                  normally works on the position returned by match/3.

              Example of the difference between a  scope  and  taking  the  binary  apart  before
              splitting:

              1> binary:split(<<"banana">>,[<<"a">>],[{scope,{2,3}}]).
              [<<"ban">>,<<"na">>]
              2> binary:split(binary:part(<<"banana">>,{2,3}),[<<"a">>],[]).
              [<<"n">>,<<"n">>]

              The return type is always a list of binaries that are all referencing Subject. This
              means that the data in Subject is not actually copied  to  new  binaries  and  that
              Subject  cannot  be  garbage collected until the results of the split are no longer
              referenced.

              For a description of Pattern, see compile_pattern/1.
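
              How the global and trim options interact can be sketched as follows, using an
              arbitrary comma-separated example subject:

              ```erlang
              %% Without global, only the first separator splits the subject.
              [<<"a">>, <<"b,,">>] = binary:split(<<"a,b,,">>, <<",">>).
              %% With global, trailing separators produce empty parts ...
              [<<"a">>, <<"b">>, <<>>, <<>>] = binary:split(<<"a,b,,">>, <<",">>, [global]).
              %% ... which trim removes.
              [<<"a">>, <<"b">>] = binary:split(<<"a,b,,">>, <<",">>, [global, trim]).
              %% No match: the subject is returned as the only part.
              [<<"abc">>] = binary:split(<<"abc">>, <<",">>).
              ```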