Provided by: collectl-utils_4.7.1-1_all


       colmux  -  multiplex  communications  to  multiple  systems running collectl from a single


       colmux [-command "collectl-switches... [-p filespec]"]  [-address  addr1[,addr2,...]|-addr
       filename] [-cols col1[,col2...]] | [-column num]


       This utility gathers the data generated by collectl on multiple systems and multiplexes
       it into a single consolidated format.  It runs in essentially 2 distinct modes.  The first
       is known as real-time mode, because data is retrieved and displayed in real time.  The
       second is playback mode, because data is played back from existing collectl data files.

       There are also 2 general formats for the data being displayed.  The first is a multi-line
       display in which the data is shown in the native form that collectl displays it, except
       that it is sorted by a selected column, essentially allowing one to see the TOP producers
       of that data.  The second is a single-line display in which one or more selected data
       elements from each source are displayed on the same line.  This latter format is never
       sorted, but rather positionally organized by the name of the system that generated it.

       Collectl will then be executed, using any optional switches specified by -command, on
       each of the systems specified by -address, on the hosts listed in a file if the target of
       that switch is a filename rather than a list of hosts, or on the local system if -address
       is not specified.  See collectl for details of the various switches.  Some collectl
       switches do not make sense in a colmux environment and will generate an error if chosen.
       Hosts specified with -address should be individual addresses or hostnames separated by
       commas, and any of them can be written in what those familiar with pdsh would recognize
       as -w format.
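
       As an illustration of the pdsh -w style ranges mentioned above, here is a minimal Python
       sketch, not colmux's own code, that splits a host list on commas outside brackets and
       expands numeric ranges such as n[1-10,15].  The function names are hypothetical, only
       simple numeric ranges are handled, and colmux's exact parsing rules may differ.

```python
import re
from typing import List


def split_hostlist(spec: str) -> List[str]:
    """Split on commas that are not inside [...] ranges."""
    parts, current, depth = [], [], 0
    for ch in spec:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return [p for p in parts if p]


def expand_pdsh(token: str) -> List[str]:
    """Expand the first prefix[ranges]suffix group, recursing into any later groups."""
    match = re.match(r"^(.*?)\[([^\]]+)\](.*)$", token)
    if not match:
        return [token]
    prefix, ranges, suffix = match.groups()
    hosts: List[str] = []
    for item in ranges.split(","):
        if "-" in item:
            lo, hi = item.split("-", 1)
            width = len(lo)  # preserves zero padding, e.g. n[01-03]
            for i in range(int(lo), int(hi) + 1):
                hosts.extend(expand_pdsh(f"{prefix}{i:0{width}d}{suffix}"))
        else:
            hosts.extend(expand_pdsh(prefix + item + suffix))
    return hosts


def expand_addresses(spec: str) -> List[str]:
    """Turn a -address style list into individual hostnames."""
    return [host for token in split_hostlist(spec) for host in expand_pdsh(token)]


print(expand_addresses("n[1-3,15],abc"))
```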

       Colmux then executes the collectl command, gathers the results from all sources for a
       particular interval and displays them either one result per line, sorted by the specified
       column, OR all on the same line in groups specified by -cols.  The number of lines
       displayed is set to the size of the terminal window by default, but can be changed using
       -lines.  The one exception is -nosort, which applies only to the playback of existing
       collectl raw files.  In this mode all records for a particular interval are displayed and
       sorting is bypassed, making this a fast and convenient mechanism for gathering all data
       from all systems in one place for potential further processing.
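
       The per-interval sort described above can be sketched roughly as follows.  This is a
       hypothetical Python illustration, not colmux's implementation: it assumes each host's
       result is a list of fields with the host name in column 0, sorts descending on the chosen
       column and keeps only as many rows as there are display lines.  Following the note later
       on this page that colmux converts fields ending in K/M/G before sorting, it applies an
       assumed 1024-based multiplier and falls back to text ordering for non-numeric fields.

```python
from typing import List, Tuple

# Assumed multipliers for fields such as "12K" or "3.5M"; colmux's real factors may differ.
MULTIPLIERS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def sort_key(field: str) -> Tuple[int, float, str]:
    """Numeric fields (optionally ending in K/M/G) compare by value, others as text."""
    mult = MULTIPLIERS.get(field[-1:].upper(), 1)
    number = field[:-1] if mult != 1 else field
    try:
        return (1, float(number) * mult, "")
    except ValueError:
        return (0, 0.0, field)  # non-numeric fields, e.g. device or process names


def top_rows(rows: List[List[str]], column: int, lines: int) -> List[List[str]]:
    """Sort one interval's rows (host name in column 0) descending; keep `lines` rows."""
    ordered = sorted(rows, key=lambda row: sort_key(row[column]), reverse=True)
    return ordered[:lines]  # the original K/M/G strings are left untouched for display


rows = [["abc", "12"], ["def", "2K"], ["xyz", "900"]]
for row in top_rows(rows, column=1, lines=2):
    print(" ".join(row))
```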

       Colmux never modifies the size of the terminal window, so to see more or wider lines
       either enlarge the window or override the number of display lines and run it again.  If
       the number of display lines is set greater than the terminal height, or to 0, colmux no
       longer overlays the previous display and simply runs in a continuous scrolling mode.

       Common Switches

       -address list|pdsh|filename
              Specify any combination of addresses as hostnames, in pdsh -w format, OR as a
              filename containing a list of hostnames/addresses, 1 per line.  You MUST have
              passwordless ssh access to these nodes.  If a different username is required, be
              sure to specify addresses in username@host format, noting that you do not need the
              same username on each host.  If specified, these usernames override the one
              specified with the -username switch.  rsh access is not supported.
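
              A minimal Python sketch of how the -address argument might be resolved, assuming
              that an existing file is read one host per line, anything else is treated as a
              comma-separated list, and a username@host entry overrides the -username default
              for that host only.  The function names are hypothetical, and pdsh range
              expansion is deliberately left out.

```python
import os
from typing import List, Optional, Tuple


def read_addresses(arg: str) -> List[str]:
    """Read hosts from a file (one per line) if `arg` names one, else split on commas."""
    if os.path.isfile(arg):
        with open(arg) as fh:
            return [line.strip() for line in fh if line.strip()]
    # note: this naive split would break pdsh ranges such as n[1-10,15]
    return [item for item in arg.split(",") if item]


def ssh_target(address: str, default_user: Optional[str]) -> Tuple[Optional[str], str]:
    """Return (user, host); a user@host entry overrides the -username default."""
    if "@" in address:
        user, host = address.split("@", 1)
        return user, host
    return default_user, address
```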

       -command switches
              One can specify virtually any collectl command here, in either real-time or
              playback mode.  Some switches may only be used in one mode or the other, and colmux
              will usually let you know if you specify an invalid combination or an otherwise
              restricted switch.  Only those directly affecting colmux are listed below:

              --from, --thru
                     Limit the timeframe for data being played back, noting you can include both
                     the from and thru times with the --from switch if you separate them with a
                     dash.

              -o time-format
                     This is a "magic" switch in that it not only tells collectl how to display
                     dates/times (no -o options other than those from the set [dDTm] are
                     permitted), it also tells colmux how to display them.

                     In single line mode, the timestamp will either come from the host system  in
                     real-time  mode  OR  the  first host when run in playback mode.  This is the
                     most common use/need for this switch.

                     In real-time/top mode this switch is not allowed since colmux simply reports
                     the current time of the system it is running on.

                     When playing back multi-line formatted data from one or more files, a
                     timestamp is reported for each interval, consisting of the time of that
                     interval.  When this switch is included, each line is also tagged with an
                     appropriate timestamp since, on rare occasions, they may not all be
                     identical.

              -p playback-file
                     This switch tells colmux to run in playback mode.  The filename should
                     include the directory location and is usually specified with wildcards,
                     limiting the selected file(s) to a specific date.  When the files are on the
                     local host (-address is not specified), they may be for multiple hosts, but
                     when the files are on remote hosts, each host's files must all be for that
                     host only.  If the file specification includes the string TODAY or
                     YESTERDAY, it will be replaced with *yyyymmdd* for that date.

                     Run collectl in plot format.  This allows one to specify just about any
                     combination of subsystems since all data is always displayed on a single
                     line.  However, due to the lack of formatting, this also makes no sense for
                     multi-line displays and is therefore only supported in single-line format.
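
              The TODAY/YESTERDAY substitution described under -p amounts to swapping in a
              *yyyymmdd* wildcard.  Here is a hypothetical Python sketch, assuming the local date
              is used, followed by a glob to list any matching files:

```python
import datetime
import glob


def expand_playback_spec(spec: str, today: datetime.date) -> str:
    """Replace TODAY/YESTERDAY with a *yyyymmdd* wildcard for the matching date."""
    yesterday = today - datetime.timedelta(days=1)
    spec = spec.replace("YESTERDAY", f"*{yesterday:%Y%m%d}*")
    return spec.replace("TODAY", f"*{today:%Y%m%d}*")


pattern = expand_playback_spec("/var/log/collectl/YESTERDAY", datetime.date.today())
print(pattern, sorted(glob.glob(pattern)))
```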

              Show a brief help message and exit.

       -hostwidth n
              By default, colmux sets the host width to 8 unless it sees a wider hostname, and in
              most situations this is sufficient.  However, if one specifies hostnames that are
              aliases of longer hostnames, colmux has no way of knowing the real hostname lengths
              until after it starts receiving data from collectl, and the formatting will be off
              if those hostnames are longer than the default.  To overcome this problem, use this
              switch to force the hostname field to this width.

              Change the number of lines that are displayed for each interval in multi-line mode.
              The default is determined by the terminal size returned by the Linux resize
              command, if present.  If that command is not present, the size is initially set to
              24.  If -lines is greater than the terminal size or 0, top-like behavior is not
              used in real-time mode.

              In single-line format this controls the number of lines displayed between
              headers.  A value of 0 will only display the header one time.
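
              The rule above, together with the overlay/scroll behavior described earlier, can
              be summarized in a small Python sketch.  It is an illustration only: the function
              name is hypothetical, Python's shutil.get_terminal_size() stands in for the resize
              command, and its 24-line fallback mirrors the default mentioned above.

```python
import shutil
from typing import Optional, Tuple


def display_plan(requested: Optional[int], height: Optional[int] = None) -> Tuple[int, str]:
    """Return (lines, mode): 'overlay' redraws top-like, 'scroll' just appends output."""
    if height is None:
        # stand-in for the resize command; falls back to 24 lines if no terminal is found
        height = shutil.get_terminal_size(fallback=(80, 24)).lines
    lines = height if requested is None else requested
    if lines == 0 or lines > height:
        return lines, "scroll"
    return lines, "overlay"
```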

              Sometimes a remote instance of collectl is already using the default port.  This
              switch allows one to start another instance and override that value.

              This tells colmux to execute the specified collectl command either locally or on
              the first remote system specified by -address, print the associated header with the
              selected column(s) highlighted and also list each column name along with its
              ordinal number, making it fairly easy to make sure you've selected the right
              column(s).

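              To picture what -test reports, here is a hypothetical Python sketch that numbers
              the names in a made-up header line and marks the selected ones.  It assumes
              numbering starts at 0 with the host name and uses a text marker where colmux
              would highlight; the real -test output format may differ.

```python
from typing import Iterable, List


def number_columns(header: str, selected: Iterable[int]) -> List[str]:
    """List each header column with its ordinal number, flagging the selected ones."""
    chosen = set(selected)
    names = header.lstrip("#").split()
    return [f"{i:>3} {name}{' <==' if i in chosen else ''}" for i, name in enumerate(names)]


for line in number_columns("#Host cpu sys inter ctxsw", [1]):
    print(line)
```
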
       -username name
              Use this username for ALL ssh commands.  It can be overridden for specific hosts by
              specifying them in username@host format with the -address switch.

              Display the version and exit.  It will also report if  Term::ReadKey  is  installed
              and if so what its version number is.

       Playback Mode Specific

       The  following  additional  switches  only apply to playback mode.  There are no real-time
       mode specific switches.

       -delay seconds
              Introduce a delay between intervals, in seconds.  Fractional values are allowed.
              Without this switch, the output is displayed as fast as possible.

              Move the cursor to the home position (upper left-hand corner) of the display to use
              a top-like display format.

       -hostfilter addr[,addr]
              When  playing  back  files for multiple hosts on the local system, sometimes you do
              not want to play back ALL the host files.  This filter allows you to  specify  only
              those  hosts  which  you  want  to process.  The format of the list of addresses is
              specified in the same way as -address except that you cannot specify a filename.

              Intended primarily for output that would be redirected to a file, do  not  sort  or
              include any escape sequences in the output.

       Multi-Line Format

              When there is more output than will fit on the screen, colmux includes the text:
                     Displaying: lines xx thru yy out of zz
              on the right side of the top line of the display, where xx is typically 1.

              However, once colmux is running, one might want to look at subsequent lines, i.e.,
              those below the bottom of the screen and therefore invisible.  If the ReadKey
              module is installed, one can simply use the PageDown key to move down the display
              and the PageUp key to move in the other direction.  If ReadKey is not installed,
              typing the multi-key sequences pd<ENTER> or pu<ENTER> does the same thing.

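              The paging arithmetic behind the "Displaying" text can be sketched in Python as
              below.  The function names are hypothetical, and clamping the last page so that it
              stays full is an assumption, not documented colmux behavior.

```python
def page(first: int, total: int, lines: int, direction: int) -> int:
    """Move the 1-based first visible line one screen down (+1) or up (-1)."""
    last_start = max(1, total - lines + 1)  # keep the final page full
    return min(max(1, first + direction * lines), last_start)


def status(first: int, total: int, lines: int) -> str:
    """Build the status text shown on the top line of the display."""
    last = min(total, first + lines - 1)
    return f"Displaying: lines {first} thru {last} out of {total}"


print(status(page(1, 50, 20, +1), 50, 20))
```
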
       -column num
              Set the sort column to this number.  The column  numbering  is  determined  by  the
              columns  returned  by  collectl for the requested command.  Since date/time columns
              are optional for non-plot data, their inclusion will change the  numbering  of  the
              columns  so  if  you are not sure you selected the correct column, you should first
              execute your command with -test included.

              You can also change the column number interactively with the RIGHT/LEFT arrow  keys
              IF  the  ReadKey  module  is  installed  (see colmux -version) OR simply type it in
              followed by the <ENTER> key.

              Do not highlight the selected column.  This may be useful when redirecting output
              to a file and you do not want the associated escape sequences to be written to it.

              Reverse the default sort order.  You can also change the direction of the sort
              interactively with the UP/DOWN arrow keys IF the ReadKey module is installed (see
              colmux -version) OR simply type the r key and <ENTER>.

              Do  not  display  any  rows  with  0  in  the  sort  column.   You  can  also  type

       Single-Line Format

              Divide each column by 1000 before display

              Divide each column by 1024 before display

       -cols nums,...
              Group all data together for each host by column number(s).  As  with  -column,  you
              can confirm the correct column(s) have been selected by first running with -test.

              Do not show data for individual hosts, just display the totals.

              Include the totals for each column to the right.

              Set the output columns to this width, typically used in conjunction with -col1000
              or -colk to allow more hosts to fit onto the same line.  It can also be used if the
              host names are too narrow for the column headers and you have room to display wider
              columns.
       Exception Reporting Specific

       In single-line format, rather than wait for all hosts to report their data, colmux simply
       reports the last data seen when the time comes to generate a line of output.  In most
       cases this reflects the most recent values, but under load the data may reach colmux late
       and so a previous value may be reported.  If the age of that data exceeds a defined number
       of intervals (currently 2 by default), an exception value of -1 is reported instead.
       Kernel or driver bugs have also been seen to cause incorrect values to be reported as
       negative numbers, and those values are also reported as -1.  Both the age and exception
       values can be changed with the following switches.

       -age number
              When initially starting up, before all hosts have reported any data, colmux will
              display a -1 to indicate no data has been seen yet.  If during processing a host
              fails to report within -age intervals (the default is 2), colmux will also report
              a -1, indicating the data is stale.

       -negdataval val
              In some cases, there could be erroneous data reported as negative  numbers  (though
              sometimes  negative  numbers  are  valid).   When  specified,  replace any negative
              numbers with this value.

       -nodataval val
              This switch allows you to change the -1 that is normally reported  for  missing  or
              stale data to the specified value, most commonly 0.
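
       The exception rules above can be sketched as a single Python function.  This is a
       hypothetical illustration: it treats data as stale once it is more than `age` intervals
       old and, following the -negdataval description, replaces negative values only when a
       replacement value is given.

```python
from typing import Optional


def reported_value(last_value: Optional[float], last_interval: Optional[int],
                   current_interval: int, age: int = 2, nodataval: float = -1,
                   negdataval: Optional[float] = None) -> float:
    """Value to print for one host in the current single-line interval."""
    if last_value is None or last_interval is None:
        return nodataval                 # host has not reported anything yet
    if current_interval - last_interval > age:
        return nodataval                 # data is stale
    if negdataval is not None and last_value < 0:
        return negdataval                # suspect negative value replaced
    return last_value
```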


       The  following  switches  are intended more for diagnostic purposes than normal operation,
       though are also worth using on appropriate occasions.

       -debug val
              This switch generates diagnostic information at various levels.  It is actually a
              bit mask, whose values are listed at the beginning of colmux itself.  Perhaps the
              most useful value is 1, as it causes colmux to display all the remote commands
              issued to each host in the address list and can often reveal problems when things
              don't seem to be working correctly.

              This switch was initially included in an earlier version when remote host checking
              was causing problems in some cases and skipping those checks made colmux run more
              reliably.  While it is felt that as of V3.2.0 these reachability checks are reliable
              and should not be skipped, this switch has been left in place.

              By default, and when -nocheck is not specified, colmux checks the versions of all
              collectl instances against that of the first node found to be running collectl and,
              if they differ, reports the mismatch.  This switch suppresses that warning.

              By default, when a node is found to be unreachable, colmux removes it from its list
              of hosts and continues execution.  This switch tells colmux to exit instead if any
              host is not reachable.


       There are 3 switches whose descriptions don't really fit anywhere else:

       -colbin path
              On rare occasions, such as when testing a patch to collectl in a copy NOT in
              /usr/bin, you may want to tell colmux to use that copy instead of the standard one.
              Use this switch to point to that copy.  Naturally, that copy must exist in that
              location on all systems.

       -keepalive secs
              Colmux uses ssh to start collectl on each remote machine, and communications
              between collectl and colmux then occur over a socket.  Normally, ssh is configured
              to time out after an interval of inactivity, such as 30 minutes, which means a
              long-running colmux session will begin to lose connections when this interval is
              reached.  By specifying a keepalive interval, you're telling ssh to send a periodic
              keepalive to the other end so that the connection doesn't get dropped.
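
              For illustration, here is a hypothetical Python helper that builds an ssh command
              line.  OpenSSH's ServerAliveInterval option is one standard way to send such
              keepalives, but whether colmux uses exactly this option is an assumption.
              BatchMode=yes is added so that ssh fails rather than prompting for a password,
              since passwordless access is required anyway.

```python
from typing import List, Optional


def ssh_command(host: str, remote_cmd: str, user: Optional[str] = None,
                keepalive: Optional[int] = None) -> List[str]:
    """Build an argv list for running `remote_cmd` on `host` over ssh."""
    argv = ["ssh", "-o", "BatchMode=yes"]  # never prompt; passwordless ssh is required
    if keepalive:
        # request a response from the server after `keepalive` seconds without data
        argv += ["-o", f"ServerAliveInterval={keepalive}"]
    if user:
        argv += ["-l", user]
    argv += [host, remote_cmd]
    return argv


print(ssh_command("n1", "collectl -sZ", user="joe", keepalive=60))
```

              The resulting list could be handed to subprocess.Popen without going through a
              shell, which avoids quoting problems with the collectl switches.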

       -timeout secs
              By default, colmux waits up to 10 seconds for remote instances of collectl to
              connect back.  On slower networks, or when a very large number of instances have
              been started, they may fail to connect back in time.  This switch extends that
              timeout, but it requires collectl V3.6.4 or later, because earlier versions do not
              support this feature.
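
              The connect-back wait can be pictured with a Python sketch using a plain listening
              socket: accept connections until the expected number arrive or the timeout (10
              seconds by default, as above) runs out.  The function name and structure are
              assumptions for illustration, not colmux's own code.

```python
import socket
import time
from typing import List


def wait_for_connections(server: socket.socket, expected: int,
                         timeout: float = 10.0) -> List[socket.socket]:
    """Accept connect-backs until `expected` arrive or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    accepted: List[socket.socket] = []
    while len(accepted) < expected:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        server.settimeout(remaining)  # accept() gives up when the deadline passes
        try:
            conn, _addr = server.accept()
        except socket.timeout:
            break
        accepted.append(conn)
    return accepted
```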


       Users of Version 2 will find that this looks like a new utility, though in actuality only
       a couple of enhancements have been made to the functionality, which include:

       sorting of multi-line data

       Rather than simply report all the data for all hosts specified, something very few people
       actually used, only the top-n hosts now have their data reported, sorted by the column
       specified by -column.

       ability to playback data from collectl files

       Simply add -p to the collectl command and the associated file(s) for the same day will be
       played back and the data reported in either multi- or single-line format.

       new features, including -test to show which column(s) are selected

       Instead of manually counting which column(s) you wish to select for sorting or single-line
       mode, -test will show you the column numbering, which can be particularly useful for wide
       lines.  Additional switches for enhanced multi-line formatting have also been included.

       several changes to single line mode

              new  way  to  request  prefacing lines with timestamps: Simply add the desired time
              format using -o to the collectl command

              no longer need -w for non-plot data: colmux is smart  enough  to  recognize  fields
              that  end  in  K/M/G  and  convert  them  to the appropriate values before sorting.
              However it will still display them in their original forms.  Further, you can  even
              sort on non-numeric fields such as device names and many of the fields reported for
              process data.

       several switches eliminated
              Yes, it is hard to believe but a number of switches  have  been  eliminated  either
              because  their  functionality  is encompassed in other mechanisms or their function
              has been deemed obsolete.

              -date, -mmdd, -time: time formats now handled with -o in collectl command

              -hosts, -machines: use -address

              -rsh: nobody uses rsh anymore


       All logs being played back must have been collected using the same interval as colmux only
       looks at the first file/host to determine the appropriate value.

       It is assumed all clocks are reasonably well synchronized, as colmux uses time to
       determine which data is to be displayed as a set.

       All files must be in the same directory on all systems, and that directory must be
       included in the playback file specification.

       All files on a remote host must be for that host only
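
       Because colmux relies on time to decide which data belongs to the same set, the grouping
       can be pictured as bucketing timestamps by interval.  Here is a hypothetical Python
       sketch, assuming (host, timestamp, line) records and one interval taken from the first
       file/host; rounding to the nearest interval tolerates clock skew of less than half an
       interval.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_interval(records: Iterable[Tuple[str, float, str]],
                      interval: float) -> Dict[int, List[Tuple[str, str]]]:
    """Bucket (host, timestamp, line) records into display sets keyed by interval number."""
    sets: Dict[int, List[Tuple[str, str]]] = defaultdict(list)
    for host, timestamp, line in records:
        # nearest interval boundary, so small clock differences still land together
        sets[round(timestamp / interval)].append((host, line))
    return dict(sets)
```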


       Run collectl on 3 nodes, showing CPU, Disk and Network statistics once a second and sorted
       by column 1, which happens to be total cpu.

       colmux -addr abc,def,xyz

       Dynamically display top processes on nodes n1-n10 of a cluster once a  second,  sorted  by
       column 5.

       colmux -addr n[1-10] -command "-sZ :1" -column 5

       Do  the  same for yesterday, between the hours of 5AM and 6AM, being sure to stall for 1/2
       second between intervals.  Note, if you leave off -addr you could put all  the  logs  into
       /var/log/collectl on the local host and play them back from there.

       colmux  -addr  n[1-10]  -command  "-sZ  -p/var/log/collectl/YESTERDAY  -from  05:00-06:00"
       -column 5 -delay .5

       Look at the amount of mapped and slab memory consumed on nodes n1-n10  and  n15  in  real-
       time, every 2 seconds using single-line format.  Include totals and preface each line with
       the time.  Since memory sizes tend to be rather large, divide each by 1024 so  we  see  MB
       rather than KB.  Note that the column numbers are always displayed in ascending order
       regardless of their order in -cols.  To be sure, first test the column numbers.

       colmux -addr n[1-10,15] -command "-sm -i2 -oT" -cols 6,7 -coltot -colk -test
       colmux -addr n[1-10,15] -command "-sm -i2 -oT" -cols 6,7 -coltot -colk

       Display most active disks, based on KB written, on nodes n1, n4 and n5.

       colmux -addr n1,n4,n5 -command "-sD" -column 6

       Here is a cool trick.  Collectl currently lets you look at top processes  with  the  --top
       switch  and  even choose a sort column by name.  However, if you want to change the column
       you need to exit, then rerun collectl with a different sort column name.  But if  you  run
       it  like  this example, you get the power of colmux to dynamically change the sort columns
       with the arrow keys!  You can also use this technique to dynamically sort any local
       multi-line collectl data, such as slabs, or even detail data like CPU, Disk, Lustre and
       Network!  Naturally this technique works just as well when playing back data.

       colmux -command "-sZ -i:1"


       colmux requires passwordless ssh between the node it is running on and those it is
       monitoring.  Also be sure the port you are using for communications (the default is
       2655) is open.


       see source code


       This program was written by Mark Seger (
       Copyright 2010 Hewlett-Packard Development Company, L.P.