Ubuntu Manpage: lmbench - benchmarking toolbox

Provided by: lmbench_3.0-a9+debian.1-6build3_amd64

NAME

       lmbench - benchmarking toolbox

SYNOPSIS

       #include "lmbench.h"

       typedef u_long iter_t

       typedef void (*benchmp_f)(iter_t iterations, void* cookie)

       void benchmp(benchmp_f initialize, benchmp_f benchmark, benchmp_f cleanup, int enough, int
       parallel, int warmup, int repetitions, void* cookie)

       uint64    get_n()

       void milli(char *s, uint64 n)

       void micro(char *s, uint64 n)

       void nano(char *s, uint64 n)

       void mb(uint64 bytes)

       void kb(uint64 bytes)

DESCRIPTION

       Creating benchmarks using the lmbench timing harness is easy.  Because it takes so
       little effort to measure performance with lmbench, questions that arise during system
       design, development, or tuning can be answered quickly.  For example, image processing ...

       Two attributes are critical for performance: latency and bandwidth.  lmbench's timing
       harness makes it easy to measure and report results for both.  Latency usually matters
       for frequently executed operations, and bandwidth usually matters when moving large
       chunks of data.

       There are a number of factors to consider when building benchmarks.

       The timing harness requires that the benchmarked operation be idempotent so that it can be
       repeated indefinitely.

       The timing subsystem, benchmp, is passed up to three function pointers.  Some benchmarks
       need only one, benchmark; initialize and cleanup may then be passed as NULL.

       void benchmp(initialize, benchmark, cleanup, enough, parallel, warmup, repetitions,
       cookie)
              measures the performance of benchmark repeatedly and reports the median result.
              benchmp creates parallel sub-processes which run benchmark in parallel.  This
              allows lmbench to measure the system's ability to scale as the number of client
              processes increases.  Each sub-process calls initialize with iterations set to 0
              before starting the benchmarking cycle.  It then calls initialize, benchmark, and
              cleanup several times, with iterations set to the number of iterations in the
              timing loop, in order to collect repetitions results.  Each call to benchmark is
              surrounded by start and stop calls that time how long it takes to perform the
              benchmarked operation iterations times.  After all the benchmark results have been
              collected, cleanup is called with iterations set to 0 to clean up any resources
              that may have been allocated by initialize or benchmark.  cookie is a void pointer
              to a block of memory that can be used to store any parameters or state needed by
              the benchmark.

       void* benchmp_getstate()
              returns a void pointer to the lmbench-internal state used during benchmarking.  The
              state is not to be used or accessed directly by clients; it should only be passed
              to benchmp_interval.

       iter_t    benchmp_interval(void* state)
              returns the number of times the benchmark should execute its benchmark loop during
              this timing interval.  It is needed only by unusual benchmarks that cannot implement
              the benchmark body as a function that returns, such as benchmarks of the page fault
              handler.  See lat_sig.c for sample usage.

       uint64    get_n()
              returns the number of times the benchmark loop body was executed during the timing
              interval.

       void milli(char *s, uint64 n)
              prints the time per operation in milliseconds, with s as the label for the result.
              n is the number of operations performed during the timing interval; it is passed as
              a parameter because each pass through the loop body can contain several operations.

       void micro(char *s, uint64 n)
              prints the time per operation in microseconds.

       void nano(char *s, uint64 n)
              prints the time per operation in nanoseconds.

       void mb(uint64 bytes)
              prints the bandwidth in megabytes per second.

       void kb(uint64 bytes)
              prints the bandwidth in kilobytes per second.

USING lmbench

       Here  is  an  example of a simple benchmark that measures the latency of the random number
       generator lrand48():

              #include "lmbench.h"

              void
              benchmark_lrand48(iter_t iterations, void* cookie) {
                   while(iterations-- > 0)
                        lrand48();
              }

              int
              main(int argc, char *argv[])
              {
                   /* No initialize or cleanup is needed, so pass NULL for both. */
                   benchmp(NULL, benchmark_lrand48, NULL, 0, 1, 0, TRIES, NULL);
                   micro("lrand48()", get_n());
                   exit(0);
              }

       Here is a simple benchmark that measures and reports the bandwidth of bcopy:

              #include "lmbench.h"

              #define MB (1024 * 1024)
              #define SIZE (8 * MB)

              struct _state {
                   int size;
                   char* a;
                   char* b;
              };

              void
              initialize_bcopy(iter_t iterations, void* cookie) {
                   struct _state* state = (struct _state*)cookie;

                   if (!iterations) return;
                   state->a = malloc(state->size);
                   state->b = malloc(state->size);
                   if (state->a == NULL || state->b == NULL)
                        exit(1);
              }

              void
              benchmark_bcopy(iter_t iterations, void* cookie) {
                   struct _state* state = (struct _state*)cookie;

                   while(iterations-- > 0)
                        bcopy(state->a, state->b, state->size);
              }

              void
              cleanup_bcopy(iter_t iterations, void* cookie) {
                   struct _state* state = (struct _state*)cookie;

                   if (!iterations) return;
                   free(state->a);
                   free(state->b);
              }

              int
              main(int argc, char *argv[])
              {
                   struct _state state;

                   state.size = SIZE;
                   benchmp(initialize_bcopy, benchmark_bcopy, cleanup_bcopy,
                        0, 1, 0, TRIES, &state);
                   mb(get_n() * state.size);
                   exit(0);
              }

       A slightly more complex version of the  bcopy  benchmark  might  measure  bandwidth  as  a
       function  of  memory  size  and  parallelism.   The main procedure in this case might look
       something like this:

              int
              main(int argc, char *argv[])
              {
                   int  size, par;
                   struct _state state;

                   for (size = 64; size <= SIZE; size <<= 1) {
                        for (par = 1; par < 32; par <<= 1) {
                             state.size = size;
                             benchmp(initialize_bcopy, benchmark_bcopy,
                                  cleanup_bcopy, 0, par, 0, TRIES, &state);
                             fprintf(stderr, "%d\t%d\t", size, par);  /* label the result */
                             mb(par * get_n() * state.size);
                        }
                   }
                   exit(0);
              }

VARIABLES

       There are three environment variables that can  be  used  to  modify  the  lmbench  timing
       subsystem: ENOUGH, TIMING_O, and LOOP_O.

FUTURES

       Development of lmbench is continuing.

AUTHOR

       Carl Staelin and Larry McVoy

       Comments, suggestions, and bug reports are always welcome.

(c)1998-2000 Larry McVoy and Carl Staelin                                           LMBENCH(3)
