Provided by: lmbench_3.0-a9-1_amd64

NAME

       bw_mem - time memory bandwidth

SYNOPSIS

       bw_mem   [   -P   <parallelism>   ]  [  -W  <warmups>  ]  [  -N  <repetitions>  ]  size
       rd|wr|rdwr|cp|fwr|frd|fcp|bzero|bcopy [align]

DESCRIPTION

       bw_mem allocates twice the specified amount of  memory,  zeros  it,  and  then  times  the
       copying of the first half to the second half.  Results are reported in megabytes moved per
       second.
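
       The procedure described above can be sketched in Python as follows.  This is only an
       illustration of the allocate, zero, copy, and time sequence, not the benchmark's own
       code.  The function name time_copy, the repetitions parameter, the choice of keeping
       the fastest run, and the use of 1024 * 1024 bytes per megabyte are all assumptions
       made for the example.

```python
import time


def time_copy(size, repetitions=5):
    """Time copying the first half of a 2*size buffer into the second half.

    Returns (megabytes, megabytes_per_second). Hypothetical helper, not part
    of lmbench.
    """
    buf = bytearray(2 * size)      # bytearray() is zero-filled on creation
    view = memoryview(buf)
    best = float("inf")
    for _ in range(repetitions):
        start = time.perf_counter()
        view[size:] = view[:size]  # copy first half into second half
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    best = max(best, 1e-9)         # guard against a zero timer reading
    megabytes = size / (1024 * 1024)   # assumption: 1 MB = 1024 * 1024 bytes
    return megabytes, megabytes / best


if __name__ == "__main__":
    mb, mb_per_sec = time_copy(8 * 1024 * 1024)
    print("%0.2f %.2f" % (mb, mb_per_sec))
```

       An interpreted copy like this measures the Python runtime as much as the memory
       system, so the numbers it prints are not comparable to those of bw_mem.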

       The size specification may end with ``k'' or ``m'' to mean kilobytes (* 1024) or megabytes
       (* 1024 * 1024).
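
       The suffix rule can be written as a small conversion function.  This is a minimal
       sketch; the name parse_size is hypothetical, and accepting upper-case suffixes is an
       assumption that the text above does not state.

```python
def parse_size(spec):
    """Convert a size string such as "512", "64k" or "8m" to a byte count."""
    multipliers = {"k": 1024, "m": 1024 * 1024}
    suffix = spec[-1:].lower()     # assumption: "K" and "M" are accepted too
    if suffix in multipliers:
        return int(spec[:-1]) * multipliers[suffix]
    return int(spec)               # no suffix: the value is already in bytes
```

       For example, parse_size("8m") gives 8388608 and parse_size("64k") gives 65536.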

OUTPUT

       Output format is "%0.2f %.2f\n", megabytes, megabytes_per_second, for example:

       8.00 25.33
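
       The same line can be produced from a byte count and an elapsed time.  The sketch
       below is illustrative only: format_result is a hypothetical name, and the conversion
       factor of 1024 * 1024 bytes per megabyte is an assumption chosen to match the size
       suffixes described earlier.

```python
def format_result(nbytes, seconds):
    """Format a result line as "%0.2f %.2f\\n" (megabytes, megabytes/second)."""
    megabytes = nbytes / (1024 * 1024)   # assumption: 1 MB = 1024 * 1024 bytes
    return "%0.2f %.2f\n" % (megabytes, megabytes / seconds)
```

       With these assumptions, moving 8 * 1024 * 1024 bytes in 0.5 seconds yields the line
       "8.00 16.00".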

       There  are  nine  different  memory  benchmarks  in  bw_mem.   They  each measure slightly
       different methods for reading, writing or copying data.

       rd     measures the time to read data into the processor.  It computes the sum of an array
              of integer values.  It accesses every fourth word.

       wr     measures  the  time  to  write data to memory.  It assigns a constant value to each
              element of an array of integer values.  It accesses every fourth word.

       rdwr   measures the time to read data into the processor and then write data back to the
              same memory location.  For each element in an array it adds the current value to a
              running sum before assigning a new (constant) value to the element.  It accesses
              every fourth word.

       cp     measures  the  time  to  copy  data from one location to another.  It does an array
              copy: dest[i] = source[i].  It accesses every fourth word.

       frd    measures the time to read data into the processor.  It computes the sum of an array
              of integer values.

       fwr    measures  the  time  to  write data to memory.  It assigns a constant value to each
              element of an array of integer values.

       fcp    measures the time to copy data from one location to  another.   It  does  an  array
              copy: dest[i] = source[i].

       bzero  measures how fast the system can bzero memory.

       bcopy  measures how fast the system can bcopy data.
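
       The difference between the strided benchmarks (rd, wr, cp) and their full
       counterparts (frd, fcp) can be shown with a few lines of Python.  This sketch only
       illustrates which elements each variant touches, as described in the list above; it
       is not the benchmark's implementation and is not suitable for timing.  The constant
       STRIDE and the default value written by wr are assumptions made for the example.

```python
from array import array

STRIDE = 4  # "every fourth word"


def rd(words):
    """Sum every fourth element."""
    return sum(words[::STRIDE])


def frd(words):
    """Sum every element."""
    return sum(words)


def wr(words, value=1):
    """Assign a constant to every fourth element."""
    for i in range(0, len(words), STRIDE):
        words[i] = value


def cp(dest, source):
    """Copy every fourth element: dest[i] = source[i]."""
    for i in range(0, len(source), STRIDE):
        dest[i] = source[i]


def fcp(dest, source):
    """Copy every element: dest[i] = source[i]."""
    dest[:] = source


if __name__ == "__main__":
    data = array("i", range(16))
    print(rd(data), frd(data))
```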

MEMORY UTILIZATION

       This  benchmark can move up to three times the requested memory.  Bcopy will use 2-3 times
       as much memory bandwidth:  there  is  one  read  from  the  source  and  a  write  to  the
       destination.  The write usually results in a cache line read and then a write back of the
       cache line at some later point.  Memory  utilization  might  be  reduced  by  1/3  if  the
       processor   architecture   implemented  ``load  cache  line''  and  ``store  cache  line''
       instructions (as well as ``getcachelinesize'').
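
       The arithmetic behind the factor of three and the 1/3 saving can be made explicit.
       The sketch below simply counts the three transfers named above; the function name
       bcopy_traffic and the write_allocate flag are hypothetical, and real hardware may
       behave differently.

```python
def bcopy_traffic(nbytes, write_allocate=True):
    """Estimate bytes moved over the memory bus when copying nbytes.

    With write_allocate, each destination cache line is read before it is
    written, which adds a third transfer of nbytes.
    """
    source_read = nbytes                              # read from the source
    dest_fill = nbytes if write_allocate else 0       # cache line read on write
    dest_writeback = nbytes                           # later write back
    return source_read + dest_fill + dest_writeback
```

       For an 8-byte copy this gives 24 bytes of traffic with the destination read and 16
       without it, a reduction of one third.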

SEE ALSO

       lmbench(8).

AUTHOR

       Carl Staelin and Larry McVoy

       Comments, suggestions, and bug reports are always welcome.

(c)1994-2000 Larry McVoy and Carl Staelin     $Date$                                    BW_MEM(8)