Provided by: lmbench_3.0-a9+debian.1-6_amd64


       lat_ctx - context switching benchmark


       lat_ctx [ -P <parallelism> ] [ -W <warmups> ] [ -N <repetitions> ] [ -s <size_in_kbytes> ]
       #procs [ #procs ...  ]


       lat_ctx measures context switching time for any reasonable  number  of  processes  of  any
       reasonable size.  The processes are connected in a ring of Unix pipes.  Each process reads
       a token from its pipe, possibly does some work, and then writes the token to the next
       process.
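
       The ring of pipes can be sketched in a few lines of Python.  This is only an
       illustration of the topology, not the lmbench implementation: the function name
       ring_hop_time and the laps parameter are hypothetical, and the figure it returns
       includes pipe read/write overhead, which lat_ctx itself accounts for separately.  On a
       multiprocessor the processes may also run on different CPUs, so the result is not a
       pure context switch time.

```python
import os
import time


def ring_hop_time(nprocs, laps=1000):
    """Pass a one-byte token around a ring of nprocs processes.

    Returns the average wall-clock seconds per hop (one read plus one write).
    """
    # pipes[i] is read by process i and written by the process before it.
    pipes = [os.pipe() for _ in range(nprocs)]

    def keep_only(rd, wr):
        # Close every pipe end this process does not use, so EOF can propagate.
        for r, w in pipes:
            if r != rd:
                os.close(r)
            if w != wr:
                os.close(w)

    children = []
    for i in range(1, nprocs):
        pid = os.fork()
        if pid == 0:
            rd, wr = pipes[i][0], pipes[(i + 1) % nprocs][1]
            try:
                keep_only(rd, wr)
                while True:
                    token = os.read(rd, 1)
                    if not token:  # EOF: the previous process has exited
                        break
                    os.write(wr, token)
            finally:
                os._exit(0)
        children.append(pid)

    # The parent acts as process 0 and times the laps.
    rd, wr = pipes[0][0], pipes[1 % nprocs][1]
    keep_only(rd, wr)
    start = time.perf_counter()
    for _ in range(laps):
        os.write(wr, b"x")
        os.read(rd, 1)
    elapsed = time.perf_counter() - start

    # Closing the parent's write end lets EOF cascade around the ring.
    os.close(wr)
    os.close(rd)
    for pid in children:
        os.waitpid(pid, 0)
    return elapsed / (laps * nprocs)
```

       For example, ring_hop_time(4) times a token circulating among four processes.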

       Processes  may  vary  in  number.   Smaller  numbers of processes result in faster context
       switches.  More than 20 processes is not supported.

       Processes may vary in size.  A size of zero is the  baseline  process  that  does  nothing
       except  pass  the token on to the next process.  A process size of greater than zero means
       that the process does some work before passing on the token.  The work is simulated as the
       summing up of an array of the specified size.  The summing is an unrolled loop of about
       2.7 thousand instructions.
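
       A rough sketch of the simulated work follows, assuming the size is given in kilobytes
       as in the -s option.  The helper name make_work is hypothetical.  Python's sum() does
       not reproduce the unrolled loop or its instruction cache footprint; the sketch only
       shows the data side of the work, touching every element of an array of the requested
       size.

```python
from array import array


def make_work(size_kb):
    """Return a function that sums an array occupying about size_kb kilobytes."""
    itemsize = array("i").itemsize
    # A size of zero yields an empty array: the baseline process does no work.
    data = array("i", [1]) * (size_kb * 1024 // itemsize)

    def work():
        # Touch every element so the whole array is pulled through the data cache.
        return sum(data)

    return work
```

       A process in the ring would call work() after reading the token and before writing it
       on.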

       The effect is that both the data and the instruction cache get  polluted  by  some  amount
       before  the token is passed on.  The data cache gets polluted by approximately the process
       ``size''.  The instruction cache gets polluted by a  constant  amount,  approximately  2.7
       thousand instructions.

       The  pollution  of  the  caches  results  in larger context switching times for the larger
       processes.  This may be confusing because the benchmark takes pains to  measure  only  the
       context  switch  time,  not including the overhead of doing the work.  The subtle point is
       that the overhead is measured using hot caches.  As the number and size of  the  processes
       increases, the caches are more and more polluted until the set of processes no longer fits
       in the cache.
       The context switch times go up because a context switch is defined as the switch time plus
       the  time it takes to restore all of the process state, including cache state.  This means
       that the switch includes the time for the cache misses on larger processes.
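
       The accounting described above can be written out as simple arithmetic.  This sketch
       assumes that the overhead is subtracted once per token hop; the function name and the
       total of 2500 microseconds in the usage note are hypothetical and chosen only to make
       the numbers easy to follow.

```python
def context_switch_us(total_us, hops, overhead_us):
    """Per-hop time minus the separately measured (hot cache) overhead."""
    per_hop_us = total_us / hops
    # Whatever remains after removing the overhead is attributed to the switch,
    # including the cost of cache misses when the process state is restored.
    return per_hop_us - overhead_us
```

       With an overhead of 179 microseconds, a hypothetical 10 hops taking 2500 microseconds
       in total would be reported as context_switch_us(2500.0, 10, 179.0), or 71 microseconds
       per switch.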


       Output format is intended as input to xgraph or some similar program.  The format is
       multi-line.  The first line is a title that specifies the size and non-context switching
       overhead of the test.  Each subsequent line is a pair of numbers that indicates the number of
       processes and the cost of a context switch.  The overhead and the context switch times are
       in microseconds.  The numbers below are for a SPARCstation 2.

       "size=0 ovr=179
       2 71
       4 104
       8 134
       16 333
       20 438
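
       For post-processing without xgraph, the format above is simple enough to parse
       directly.  The following is a minimal sketch; parse_lat_ctx is a hypothetical helper,
       and accepting an optional "k" after the size is an assumption about other versions of
       the output, not something shown in the sample.

```python
import re


def parse_lat_ctx(text):
    """Parse lat_ctx output into (size, overhead_us, [(nprocs, usec), ...])."""
    size = overhead = None
    points = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith('"'):
            # Title line in xgraph style: an opening quote with no closing quote.
            m = re.match(r'"size=(\d+)k?\s+ovr=([0-9.]+)', line)
            if m:
                size, overhead = int(m.group(1)), float(m.group(2))
            continue
        nprocs, usec = line.split()
        points.append((int(nprocs), float(usec)))
    return size, overhead, points


sample = '''"size=0 ovr=179
2 71
4 104
8 134
16 333
20 438
'''
size, overhead, points = parse_lat_ctx(sample)
```

       Here size is 0, overhead is 179.0, and points holds the five (processes, microseconds)
       pairs from the SPARCstation 2 sample.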


       The numbers produced by this benchmark are somewhat inaccurate; they vary by about  10  to
       15%  from  run to run.  A series of runs may be done and the lowest numbers reported.  The
       lower the number the more accurate the results.
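
       Taking the lowest number over a series of runs can be done per process count.  A
       minimal sketch, assuming each run has already been reduced to a list of (processes,
       microseconds) pairs; the function name and the second run in the usage note are
       hypothetical.

```python
def lowest_per_count(runs):
    """Keep the smallest context switch time seen for each process count."""
    best = {}
    for run in runs:
        for nprocs, usec in run:
            if nprocs not in best or usec < best[nprocs]:
                best[nprocs] = usec
    return best
```

       Given the runs [(2, 71.0), (4, 104.0)] and [(2, 78.0), (4, 99.0)], the result maps 2 to
       71.0 and 4 to 99.0.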

       The inaccuracies are possibly due to interactions between the VM system and the
       processor caches.  It is possible that sometimes the benchmark processes are laid out in
       memory such that there are fewer TLB/cache conflicts  than  other  times.   This  is  pure
       speculation on our part.


       Funding for the development of this tool was provided by Sun Microsystems Computer
       Corporation.




       Carl Staelin and Larry McVoy

       Comments, suggestions, and bug reports are always welcome.

(c)1994-2000 Carl Staelin and Larry McVoy     $Date$                                   LAT_CTX(8)