Provided by: lmbench_3.0-a9-1.1ubuntu0.1_amd64

NAME

       lat_ctx - context switching benchmark

SYNOPSIS

       lat_ctx  [  -P  <parallelism>  ]  [  -W <warmups> ] [ -N <repetitions> ] [ -s <size_in_kbytes> ] #procs [
       #procs ...  ]

DESCRIPTION

       lat_ctx measures context switching time for any reasonable number of processes of  any  reasonable  size.
       The  processes are connected in a ring of Unix pipes.  Each process reads a token from its pipe, possibly
       does some work, and then writes the token to the next process.
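
       The ring described above can be sketched in a few lines.  This is only an illustration of the token
       passing, not the lmbench source: the function name run_ring and its parameters are hypothetical, the
       token is a single byte, and no timing or work is done.  It assumes a Unix system where os.fork is
       available.

```python
import os


def run_ring(nprocs, laps):
    """Pass a one-byte token around a ring of nprocs processes, laps times.

    Returns how many times the token arrived back at the parent.
    (Hypothetical helper for illustration; not part of lmbench.)
    """
    # pipes[i] carries the token into process i from process i - 1.
    pipes = [os.pipe() for _ in range(nprocs)]
    children = []
    for i in range(1, nprocs):
        pid = os.fork()
        if pid == 0:
            try:
                rfd = pipes[i][0]
                wfd = pipes[(i + 1) % nprocs][1]
                # Close every descriptor this child does not use, so that
                # end-of-file propagates around the ring at shutdown.
                for r, w in pipes:
                    if r != rfd:
                        os.close(r)
                    if w != wfd:
                        os.close(w)
                while True:
                    token = os.read(rfd, 1)
                    if not token:  # EOF: the upstream writer has exited
                        break
                    os.write(wfd, token)
            finally:
                os._exit(0)
        children.append(pid)

    # The parent is process 0: it reads pipe 0 and writes pipe 1 (mod nprocs).
    rfd = pipes[0][0]
    wfd = pipes[1 % nprocs][1]
    for r, w in pipes:
        if r != rfd:
            os.close(r)
        if w != wfd:
            os.close(w)

    arrivals = 0
    for _ in range(laps):
        os.write(wfd, b"t")
        if os.read(rfd, 1) == b"t":
            arrivals += 1

    # Closing the parent's write end lets each child see EOF in turn.
    os.close(wfd)
    os.close(rfd)
    for pid in children:
        os.waitpid(pid, 0)
    return arrivals
```

       With one process the parent simply writes to and reads from its own pipe, which is the shape of a
       run in which no other process is involved.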

       The number of processes may vary.  Fewer processes result in faster context switches.  More than
       20 processes are not supported.

       Processes  may  vary  in  size.  A size of zero is the baseline process that does nothing except pass the
       token on to the next process.  A process size of greater than zero means that the process does some  work
       before  passing on the token.  The work is simulated as the summing up of an array of the specified size.
       The summing is an unrolled loop of about 2.7 thousand instructions.
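
       As a rough sketch of the simulated work, the following allocates an array of the requested size and
       sums it.  The names make_work_area and do_work are hypothetical, and an interpreted loop does not
       reproduce the unrolled loop or its instruction footprint; only the data side (touching "size"
       kilobytes) is shown.

```python
from array import array


def make_work_area(size_kbytes):
    """Return an integer array occupying about size_kbytes kilobytes.

    (Hypothetical helper; a size of zero yields an empty array.)
    """
    item_bytes = array("i").itemsize
    nwords = (size_kbytes * 1024) // item_bytes
    return array("i", [1]) * nwords


def do_work(area):
    """Touch every element of the array by summing it."""
    return sum(area)
```

       A process of size zero would skip the work entirely and only pass the token on.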

       The effect is that both the data and the instruction cache get polluted by some amount before  the  token
       is passed on.  The data cache gets polluted by approximately the process "size".  The instruction cache
       gets polluted by a constant amount, approximately 2.7 thousand instructions.

       The pollution of the caches results in larger context switching times for the larger processes.  This may
       be confusing because the benchmark takes pains to measure only the context switch time, not including the
       overhead of doing the work.  The subtle point is that the overhead is measured using hot caches.  As the
       number and size of the processes increase, the caches become more and more polluted until the set of
       processes no longer fits in them.  The context switch times go up because a context switch is defined as the switch
       time  plus the time it takes to restore all of the process state, including cache state.  This means that
       the switch includes the time for the cache misses on larger processes.
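
       The arithmetic behind this can be sketched as follows, assuming the reported figure is the average
       time per token pass minus the separately measured overhead.  The function name and parameters are
       hypothetical, and the exact bookkeeping in lat_ctx may differ.

```python
def context_switch_usec(total_usec, passes, overhead_usec):
    """Estimate the per-switch cost in microseconds.

    total_usec    -- elapsed time for all token passes (assumed measured)
    passes        -- number of token passes in that interval
    overhead_usec -- per-pass cost of the pipe I/O and work, measured
                     with hot caches and no context switch
    """
    per_pass = total_usec / passes
    return per_pass - overhead_usec
```

       Because the overhead is measured with hot caches, any extra cache misses a process suffers after
       being switched in stay in the per-pass time and are therefore counted as part of the context switch.
       For example, a hypothetical run of 10 passes taking 2500 microseconds in total, with an overhead of
       179 microseconds, gives 2500 / 10 - 179 = 71 microseconds per switch.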

OUTPUT

       The output format is intended as input to xgraph or a similar program.  The output spans multiple
       lines.  The first line is a title that specifies the process size and the non-context-switching
       overhead of the test.  Each subsequent line is a pair of numbers giving the number of processes and
       the cost of a context switch.  The overhead and the context switch times are in microseconds.  The
       numbers below are for a SPARCstation 2.

       "size=0 ovr=179
       2 71
       4 104
       8 134
       16 333
       20 438
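
       For post-processing without xgraph, output of this shape can be parsed with a short script.  This is
       a minimal sketch: the function name parse_lat_ctx is hypothetical, and tolerating a trailing "k" on
       the size and fractional numbers is an assumption about other versions, not something shown above.

```python
def parse_lat_ctx(text):
    """Parse lat_ctx-style output into (size, overhead_usec, points).

    points is a list of (nprocs, usec) tuples, one per data line.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty output")

    # Title line looks like: "size=0 ovr=179 (leading quote, no closing quote).
    title = lines[0].lstrip('"')
    fields = dict(item.split("=", 1) for item in title.split())
    size = int(fields["size"].rstrip("k"))  # trailing "k" is an assumption
    overhead = float(fields["ovr"])

    points = []
    for ln in lines[1:]:
        nprocs, cost = ln.split()
        points.append((int(nprocs), float(cost)))
    return size, overhead, points
```

       Applied to the sample above, the result holds a size of 0, an overhead of 179 microseconds, and five
       (processes, microseconds) pairs starting with (2, 71).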

BUGS

       The numbers produced by this benchmark are somewhat inaccurate; they vary by about 10 to 15% from run to
       run.  To compensate, a series of runs may be done and the lowest numbers reported.  The lower the
       number, the more accurate the result.
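
       Reporting the lowest numbers from a series of runs can be done as in the sketch below.  The function
       name best_of and the dictionary layout (number of processes mapped to microseconds) are assumptions
       made for illustration.

```python
def best_of(runs):
    """Return the lowest context switch time seen for each process count.

    runs is a list of dicts mapping nprocs -> microseconds, one per run.
    """
    best = {}
    for run in runs:
        for nprocs, usec in run.items():
            if nprocs not in best or usec < best[nprocs]:
                best[nprocs] = usec
    return best
```

       The minimum is taken per process count, so the reported figures may come from different runs.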

       The inaccuracies are possibly caused by interactions between the VM system and the processor caches.
       It  is  possible  that sometimes the benchmark processes are laid out in memory such that there are fewer
       TLB/cache conflicts than other times.  This is pure speculation on our part.

ACKNOWLEDGEMENT

       Funding for the development of this tool was provided by Sun Microsystems Computer Corporation.

SEE ALSO

       lmbench(8).

AUTHOR

       Carl Staelin and Larry McVoy

       Comments, suggestions, and bug reports are always welcome.

(c)1994-2000 Carl Staelin and Larry McVoy            $Date$                                           LAT_CTX(8)