Provided by: lmbench-doc_3.0-a9-1_all


       lmbench - benchmarking toolbox


       #include "lmbench.h"

       typedef u_long iter_t

       typedef void (*benchmp_f)(iter_t iterations, void* cookie)

       void benchmp(benchmp_f   initialize,   benchmp_f  benchmark,  benchmp_f
       cleanup, int enough, int parallel, int warmup, int  repetitions,  void*
       cookie)

       uint64    get_n()

       void milli(char *s, uint64 n)

       void micro(char *s, uint64 n)

       void nano(char *s, uint64 n)

       void mb(uint64 bytes)

       void kb(uint64 bytes)


       Creating benchmarks using the lmbench timing harness is easy.  Since it
       is so easy to measure performance using lmbench, it is possible to
       quickly  answer questions that arise during system design, development,
       or tuning.  For example, image processing

       There are two attributes that are critical for performance, latency and
       bandwidth, and lmbench's timing harness makes it easy to measure and
       report results for both.  Latency is usually important  for  frequently
       executed  operations,  and  bandwidth  is usually important when moving
       large chunks of data.

       There are a number of factors to consider when building benchmarks.

       The  timing  harness  requires  that  the  benchmarked   operation   be
       idempotent so that it can be repeated indefinitely.

       The timing subsystem, benchmp, is passed up to three function pointers.
       Some  benchmarks  may  need  as  few  as  one  function  pointer   (for

       void benchmp(initialize,  benchmark, cleanup, enough, parallel, warmup,
       repetitions, cookie)
              measures the performance of benchmark repeatedly and reports the
              median result.  benchmp creates parallel sub-processes which run
              benchmark in parallel.  This allows lmbench to measure the
              system's ability to scale as the number of client processes
              increases.  Each sub-process executes initialize with iterations
              set to 0 before starting the benchmarking cycle.  It then calls
              initialize, benchmark, and cleanup, each with iterations set to
              the number of iterations in the timing loop, several times in
              order to collect repetitions results.  The calls to benchmark
              are surrounded by start and stop calls to measure the time it
              takes to do the benchmarked operation iterations times.  After
              all the benchmark results have been collected, cleanup is called
              with iterations set to 0 to clean up any resources which may
              have been allocated by initialize or benchmark.  cookie is a
              void pointer to a hunk of memory that can be used to store any
              parameters or state needed by the benchmark.
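
       The calling sequence described above can be sketched in plain C.  This
       is only an illustration under stated assumptions, not lmbench's own
       code: median_usecs_per_op, median, count_ops, and the fixed limit of 64
       repetitions are hypothetical names and choices, the sketch runs in a
       single process (no parallel sub-processes or warmup), and the caller
       supplies the iteration count instead of having the harness pick one.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdlib.h>
#include <time.h>

typedef unsigned long iter_t;
typedef void (*bench_f)(iter_t iterations, void *cookie);

/* Comparison callback for qsort over doubles. */
static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Median of n values; sorts the array in place. */
double median(double *v, int n) {
    qsort(v, (size_t)n, sizeof(v[0]), cmp_double);
    return (n % 2) ? v[n / 2] : (v[n / 2 - 1] + v[n / 2]) / 2.0;
}

/* Current time in microseconds from a monotonic clock. */
static double now_usecs(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1e6 + ts.tv_nsec / 1e3;
}

/* Example benchmark body (hypothetical): bump a counter held in the cookie. */
void count_ops(iter_t iterations, void *cookie) {
    volatile long *counter = (volatile long *)cookie;
    while (iterations-- > 0)
        (*counter)++;
}

/*
 * Hypothetical stand-in for the timing loop: time `repetitions` runs of
 * `benchmark`, each doing `iterations` operations, and return the median
 * time per operation in microseconds, or -1.0 on bad arguments.
 */
double median_usecs_per_op(bench_f initialize, bench_f benchmark,
                           bench_f cleanup, iter_t iterations,
                           int repetitions, void *cookie) {
    double results[64];
    int i;

    if (repetitions < 1 || repetitions > 64 || iterations == 0)
        return -1.0;
    if (initialize) initialize(0, cookie);          /* before the cycle */
    for (i = 0; i < repetitions; i++) {
        double start, stop;
        if (initialize) initialize(iterations, cookie);
        start = now_usecs();
        benchmark(iterations, cookie);              /* the timed region */
        stop = now_usecs();
        if (cleanup) cleanup(iterations, cookie);
        results[i] = (stop - start) / (double)iterations;
    }
    if (cleanup) cleanup(0, cookie);                /* after all results */
    return median(results, repetitions);
}
```

       A caller would pass a counter as the cookie, for example
       median_usecs_per_op(NULL, count_ops, NULL, 1000, 5, &total), and read
       back the median of the five per-operation timings.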

       void* benchmp_getstate()
              returns a void pointer to the lmbench-internal state used during
              benchmarking.  The state is not to be used or accessed  directly
              by clients, but rather would be passed into benchmp_interval.

       iter_t    benchmp_interval(void* state)
              returns  the  number  of  times the benchmark should execute its
              benchmark loop during this timing interval.  This is  used  only
              for  weird  benchmarks which cannot implement the benchmark body
              in a function which can return, such as the page fault  handler.
              Please see lat_sig.c for sample usage.

       uint64    get_n()
              returns  the  number  of times loop_body was executed during the
              timing interval.

       void milli(char *s, uint64 n)
              print out the time per operation in  milli-seconds.   n  is  the
              number of operations during the timing interval, which is passed
              as a  parameter  because  each  loop_body  can  contain  several

       void micro(char *s, uint64 n)
              print the time per operation in micro-seconds.

       void nano(char *s, uint64 n)
              print the time per operation in nano-seconds.

       void mb(uint64 bytes)
              print the bandwidth in megabytes per second.

       void kb(uint64 bytes)
              print the bandwidth in kilobytes per second.
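
       The arithmetic behind these reports is simple division.  The sketch
       below assumes the elapsed time of the interval is available in
       microseconds and that one megabyte means 1,000,000 bytes; this page
       states neither, and usecs_per_op and mb_per_sec are hypothetical
       helpers, not lmbench functions.

```c
#include <stdint.h>

/* Time per operation in microseconds: elapsed time over operation count. */
double usecs_per_op(double elapsed_usecs, uint64_t n) {
    return n ? elapsed_usecs / (double)n : 0.0;
}

/* Bandwidth in megabytes per second, assuming 1 MB = 1,000,000 bytes. */
double mb_per_sec(uint64_t bytes, double elapsed_usecs) {
    if (elapsed_usecs <= 0.0)
        return 0.0;
    return ((double)bytes / 1e6) / (elapsed_usecs / 1e6);
}
```

       For instance, 1000 operations in 2000 microseconds is 2 microseconds
       per operation, and 8,000,000 bytes moved in one second is 8 MB/s.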

USING lmbench

       Here  is  an example of a simple benchmark that measures the latency of
       the random number generator lrand48():

              #include "lmbench.h"

              void benchmark_lrand48(iter_t iterations, void* cookie) {
                   while (iterations-- > 0)
                        lrand48();
              }

              int main(int argc, char *argv[]) {
                   /* cookie is unused here, so NULL is passed */
                   benchmp(NULL, benchmark_lrand48, NULL, 0, 1, 0, TRIES, NULL);
                   micro("lrand48()", get_n());
                   return 0;
              }

       Here is a simple benchmark that measures and reports the bandwidth of
       bcopy:

              #include "lmbench.h"

              #define MB (1024 * 1024)
              #define SIZE (8 * MB)

              struct _state {
                   int size;
                   char* a;
                   char* b;
              };

              void initialize_bcopy(iter_t iterations, void* cookie) {
                   struct _state* state = (struct _state*)cookie;

                   if (!iterations) return;
                   state->a = malloc(state->size);
                   state->b = malloc(state->size);
                   if (state->a == NULL || state->b == NULL)
                        exit(1);  /* allocation failed */
              }

              void benchmark_bcopy(iter_t iterations, void* cookie) {
                   struct _state* state = (struct _state*)cookie;

                   while (iterations-- > 0)
                        bcopy(state->a, state->b, state->size);
              }

              void cleanup_bcopy(iter_t iterations, void* cookie) {
                   struct _state* state = (struct _state*)cookie;

                   if (!iterations) return;
                   free(state->a);
                   free(state->b);
              }

              int main(int argc, char *argv[]) {
                   struct _state state;

                   state.size = SIZE;
                   benchmp(initialize_bcopy, benchmark_bcopy, cleanup_bcopy,
                        0, 1, 0, TRIES, &state);
                   mb(get_n() * state.size);
                   return 0;
              }

       A slightly more complex version of the bcopy  benchmark  might  measure
       bandwidth  as  a  function  of  memory  size and parallelism.  The main
       procedure in this case might look something like this:

              int main(int argc, char *argv[]) {
                   int size, par;
                   struct _state state;

                   for (size = 64; size <= SIZE; size <<= 1) {
                        for (par = 1; par < 32; par <<= 1) {
                             state.size = size;
                             benchmp(initialize_bcopy, benchmark_bcopy,
                                  cleanup_bcopy, 0, par, 0, TRIES, &state);
                             fprintf(stderr, "%d\t%d\t", size, par);
                             mb(par * get_n() * state.size);
                        }
                   }
                   return 0;
              }


       There are three environment variables that can be used  to  modify  the
       lmbench timing subsystem: ENOUGH, TIMING_O, and LOOP_O.
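
       A program that wants to honor such a variable typically reads it with
       getenv and falls back to a built-in default.  The sketch below shows
       that pattern only; env_long is a hypothetical helper, and treating the
       value as a base-10 integer is an assumption, since this page does not
       say how lmbench parses or interprets these variables.

```c
#include <stdlib.h>

/*
 * Return the integer value of environment variable `name`, or `fallback`
 * if it is unset, empty, or not a clean base-10 integer.
 */
long env_long(const char *name, long fallback) {
    const char *s = getenv(name);
    char *end;
    long v;

    if (s == NULL || *s == '\0')
        return fallback;
    v = strtol(s, &end, 10);
    return (*end == '\0') ? v : fallback;
}
```

       A call such as env_long("ENOUGH", 0) would then yield the value from
       the environment when one is set and 0 otherwise.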


       Development of lmbench is continuing.


       lmbench(8), timing(3), reporting(3), results(3).


       Carl Staelin and Larry McVoy

       Comments, suggestions, and bug reports are always welcome.

(c)1998-2000 Larry McVoy and Carl Staelin                           LMBENCH(3)