Provided by: openafs-fileserver_1.6.7-1ubuntu1.1_amd64

NAME

       fileserver - Initializes the File Server component of the fs process

SYNOPSIS

       fileserver
           [-auditlog <path to log file>]
           [-audit-interface (file | sysvmq)]
           [-d <debug level>]
           [-p <number of processes>]
           [-spare <number of spare blocks>]
           [-pctspare <percentage spare>]
           [-b <buffers>]
           [-l <large vnodes>]
           [-s <small vnodes>]
           [-vc <volume cachesize>]
           [-w <call back wait interval>]
           [-cb <number of call backs>]
           [-banner]
           [-novbc]
           [-implicit <admin mode bits: rlidwka>]
           [-readonly]
           [-hr <number of hours between refreshing the host cps>]
           [-busyat <redirect clients when queue > n>]
           [-nobusy]
           [-rxpck <number of rx extra packets>]
           [-rxdbg]
           [-rxdbge]
           [-rxmaxmtu <bytes>]
           [-nojumbo]
           [-jumbo]
           [-rxbind]
           [-allow-dotted-principals]
           [-L]
           [-S]
           [-k <stack size>]
           [-realm <Kerberos realm name>]
           [-udpsize <size of socket buffer in bytes>]
           [-sendsize <size of send buffer in bytes>]
           [-abortthreshold <abort threshold>]
           [-enable_peer_stats]
           [-enable_process_stats]
           [-syslog [< loglevel >]]
           [-mrafslogs]
           [-saneacls]
           [-help]
           [-vhandle-setaside <fds reserved for non-cache io>]
           [-vhandle-max-cachesize <max open files>]
           [-vhandle-initial-cachesize <initial open file cache>]
           [-vattachpar <number of volume attach threads>]
           [-m <min percentage spare in partition>]
           [-lock]
           [-sync <sync behavior>]

DESCRIPTION

       The fileserver command initializes the File Server component of the "fs" process. In the conventional
       configuration, its binary file is located in the /usr/lib/openafs directory on a file server machine.

       The fileserver command is not normally issued at the command shell prompt, but rather placed into a
       file server machine's /etc/openafs/BosConfig file with the bos create command. If it is ever issued
       at the command shell prompt, the issuer must be logged onto a file server machine as the local superuser
       "root".

       The File Server creates the /var/log/openafs/FileLog log file as it initializes, if the file does not
       already exist. It does not write a detailed trace by default, but the -d option may be used to increase
       the amount of detail. Use the bos getlog command to display the contents of the log file.

       The command's arguments enable the administrator to control many aspects of the File Server's
       performance, as detailed in OPTIONS.  By default the File Server sets values for many arguments that are
       suitable for a medium-sized file server machine. To set values suitable for a small or large file server
       machine, use the -S or -L flag respectively. The following list describes the parameters and
       corresponding argument for which the File Server sets default values, and the table below summarizes the
       setting for each of the three machine sizes.

       •   The  maximum  number  of  lightweight  processes  (LWPs)  or  pthreads the File Server uses to handle
           requests for data; corresponds to the -p argument. The File Server always uses a minimum of 32 KB  of
           memory for these processes.

       •   The  maximum  number  of  directory  blocks  the  File Server caches in memory; corresponds to the -b
           argument. Each cached directory block (buffer) consumes 2,092 bytes of memory.

       •   The maximum number of large vnodes the File Server caches in memory for tracking directory  elements;
           corresponds to the -l argument. Each large vnode consumes 292 bytes of memory.

       •   The  maximum  number  of  small  vnodes  the File Server caches in memory for tracking file elements;
           corresponds to the -s argument.  Each small vnode consumes 100 bytes of memory.

       •   The maximum volume cache size, which determines how many volumes the File Server can cache in  memory
           before having to retrieve data from disk; corresponds to the -vc argument.

       •   The  maximum  number  of callback structures the File Server caches in memory; corresponds to the -cb
           argument. Each callback structure consumes 16 bytes of memory.

       •   The maximum number of Rx packets the File Server uses;  corresponds  to  the  -rxpck  argument.  Each
           packet consumes 1544 bytes of memory.

       The default values are:

         Parameter (Argument)               Small (-S)     Medium   Large (-L)
         ---------------------------------------------------------------------
         Number of LWPs (-p)                        6           9          128
         Number of cached dir blocks (-b)          70          90          120
         Number of cached large vnodes (-l)       200         400          600
         Number of cached small vnodes (-s)       200         400          600
         Maximum volume cache size (-vc)          200         400          600
         Number of callbacks (-cb)             20,000      60,000       64,000
         Number of Rx packets (-rxpck)            100         150          200

       To  override  any  of the values, provide the indicated argument (which can be combined with the -S or -L
       flag).

       The amount of memory required for the File Server varies. The approximate default memory usage is 751  KB
       when the -S flag is used (small configuration), 1.1 MB when all defaults are used (medium configuration),
       and  1.4 MB when the -L flag is used (large configuration). If additional memory is available, increasing
       the value of the -cb and -vc arguments can improve File Server performance most directly.
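
       As a rough sketch of the arithmetic, the per-item sizes listed above can be multiplied by the configured
       counts. The helper name estimate_bytes is hypothetical. The estimate covers only the five sized items
       (directory buffers, large and small vnodes, callbacks, and Rx packets), so it need not match the
       approximate totals quoted above.

```shell
#!/bin/sh
# Rough memory estimate from the per-item sizes given in this page.
# estimate_bytes is a hypothetical helper, not part of OpenAFS.
# Arguments, in order: -b buffers, -l large vnodes, -s small vnodes,
#                      -cb callbacks, -rxpck Rx packets.
estimate_bytes() {
    echo $(( $1 * 2092 + $2 * 292 + $3 * 100 + $4 * 16 + $5 * 1544 ))
}

# Medium defaults from the table above.
estimate_bytes 90 400 400 60000 150   # prints 1536680
```

       With these figures the callback table is by far the largest item, which is why changing -cb has the
       biggest effect on the estimate.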

       By default, the File Server allows a volume to exceed its quota by 1 MB when an  application  is  writing
       data  to  an existing file in a volume that is full. The File Server still does not allow users to create
       new files in a full volume. To change the default, use one of the following arguments:

       •   Set the -spare argument to the number of extra kilobytes that the File Server allows  as  overage.  A
           value of 0 allows no overage.

       •   Set the -pctspare argument to the percentage of the volume's quota the File Server allows as overage.
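
       The two arguments express the same limit in different units. The following sketch converts either one
       to kilobytes of allowed overage. The helper name allowed_overage_kb is hypothetical, and the integer
       rounding is an assumption rather than documented File Server behavior.

```shell
#!/bin/sh
# allowed_overage_kb is a hypothetical helper, not part of OpenAFS.
# Usage: allowed_overage_kb spare <kilobytes>
#        allowed_overage_kb pctspare <percent> <quota in kilobytes>
allowed_overage_kb() {
    case "$1" in
        spare)    echo "$2" ;;
        pctspare) echo $(( $3 * $2 / 100 )) ;;   # assumption: integer division
        *)        echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

# 10% of a 5,000,000 KB quota.
allowed_overage_kb pctspare 10 5000000   # prints 500000
```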

       By  default,  the  File  Server  implicitly  grants  the "a" (administer) and "l" (lookup) permissions to
       system:administrators on the access control list (ACL) of every directory in the volumes  stored  on  its
       file  server machine. In other words, the group's members can exercise those two permissions even when an
       entry for the group does not appear on an ACL.  To  change  the  set  of  default  permissions,  use  the
       -implicit argument.
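
       The -implicit argument accepts either permission letters or a shorthand word, as described in OPTIONS.
       A minimal sketch of checking a proposed value before placing it in BosConfig follows. The helper name
       valid_implicit is hypothetical, and the check is only syntactic: it does not reject repeated letters
       and may differ from the File Server's own parsing.

```shell
#!/bin/sh
# valid_implicit is a hypothetical helper, not part of OpenAFS.
# Succeeds for the shorthand words or for a non-empty string made up only
# of the standard (rlidwka) and auxiliary (ABCDEFGH) permission letters.
valid_implicit() {
    case "$1" in
        all|none|read|write)   return 0 ;;
        "")                    return 1 ;;
        *[!rlidwkaABCDEFGH]*)  return 1 ;;
        *)                     return 0 ;;
    esac
}

for value in rl all rlx; do
    if valid_implicit "$value"; then
        echo "$value: accepted"
    else
        echo "$value: rejected"
    fi
done
```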

       The  File  Server  maintains  a  host current protection subgroup (host CPS) for each client machine from
       which it has received a data access request. Like the CPS for a  user,  a  host  CPS  lists  all  of  the
       Protection  Database  groups to which the machine belongs, and the File Server compares the host CPS to a
       directory's ACL to determine in what manner users on the machine are authorized to access the directory's
       contents. When the pts adduser or pts removeuser command is used to change the groups to which a  machine
       belongs, the File Server must recompute the machine's host CPS in order to notice the change. By default,
       the  File  Server contacts the Protection Server every two hours to recompute host CPSs, implying that it
       can take that long for changed group memberships to become effective. To change this frequency,  use  the
       -hr argument.

       The File Server stores volumes in partitions. A partition is a filesystem or directory on the server
       machine that is named "/vicepX" or "/vicepXX", where X is "a" through "z" and XX is "aa" through "iv".
       Up to 255 partitions are allowed. The File Server expects that the /vicepXX directories are each on a
       dedicated filesystem. The File Server will only use a /vicepXX if it's a mountpoint for another
       filesystem, unless the file "/vicepXX/AlwaysAttach" exists. A partition will not be mounted if the file
       "/vicepXX/NeverAttach" exists. If both "/vicepXX/AlwaysAttach" and "/vicepXX/NeverAttach" are present,
       then "/vicepXX/AlwaysAttach" wins. The data in the partition is in a special format that can only be
       accessed using OpenAFS commands or an OpenAFS client.
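
       The naming rule can be sketched as a shell pattern check. The helper name is_vicep_name is hypothetical.
       It tests only the name, not whether the directory is a mountpoint or contains an AlwaysAttach or
       NeverAttach file.

```shell
#!/bin/sh
# Force the C locale so that [a-z] means ASCII lowercase letters only.
LC_ALL=C; export LC_ALL

# is_vicep_name is a hypothetical helper, not part of OpenAFS.
# Succeeds when the argument is in the range /vicepa .. /vicepz or
# /vicepaa .. /vicepiv, the naming range described above.
is_vicep_name() {
    case "$1" in
        /vicep[a-z])       return 0 ;;
        /vicep[a-h][a-z])  return 0 ;;
        /vicepi[a-v])      return 0 ;;
        *)                 return 1 ;;
    esac
}

for p in /vicepa /vicepiv /vicepiw /vicep1; do
    if is_vicep_name "$p"; then
        echo "$p: valid partition name"
    else
        echo "$p: not a partition name"
    fi
done
```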

       The File Server generates the following message when a partition is nearly full:

          No space left on device

       This  command does not use the syntax conventions of the AFS command suites. Provide the command name and
       all option names in full.

CAUTIONS

       There are two strategies the File Server can use for attaching AFS volumes at startup and handling volume
       salvages.  The traditional method assumes all volumes are salvaged before  the  File  Server  starts  and
       attaches all volumes at start before serving files.  The newer demand-attach method attaches volumes only
       on  demand,  salvaging  them at that time as needed, and detaches volumes that are not in use.  A demand-
       attach File Server can also save state to disk for  faster  restarts.  The  dafileserver  implements  the
       demand-attach method, while fileserver uses the traditional method.

       The  choice  of  traditional  or  demand-attach File Server changes the required setup in BosConfig. When
       changing from a traditional File Server to demand-attach or vice versa, you will need to stop and  remove
       the "fs" or "dafs" node in BosConfig and create a new node of the appropriate type. See bos_create(8) for
       more information.

       Do not use the -k and -w arguments, which are intended for use by the OpenAFS developers only. Changing
       them from their default values can result in unpredictable File Server behavior. In any case, on many
       operating systems the File Server uses native threads rather than LWP threads, so using the -k
       argument to set the LWP stack size has no effect.

       Do not specify both the -spare and -pctspare arguments. Doing so causes the File Server to exit,  leaving
       an error message in the /var/log/openafs/FileLog file.
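
       Because the conflict is only reported in the log after the File Server exits, it can be convenient to
       check a proposed argument list first. The following is a minimal sketch; check_spare_args is a
       hypothetical helper and performs no other validation.

```shell
#!/bin/sh
# check_spare_args is a hypothetical pre-flight helper, not part of OpenAFS.
# It scans a proposed fileserver argument list and fails when both
# -spare and -pctspare are present.
check_spare_args() {
    have_spare=0
    have_pct=0
    for arg in "$@"; do
        case "$arg" in
            -spare)    have_spare=1 ;;
            -pctspare) have_pct=1 ;;
        esac
    done
    if [ "$have_spare" -eq 1 ] && [ "$have_pct" -eq 1 ]; then
        echo "error: -spare and -pctspare cannot be combined" >&2
        return 1
    fi
    return 0
}

check_spare_args -L -pctspare 10 && echo "arguments look consistent"
```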

       Options  that  are  available  only on some system types, such as the -m and -lock options, appear in the
       output generated by the -help option only on the relevant system type.

       Currently, the maximum size of a volume quota is 2 terabytes (2^41 bytes) and the maximum size of a
       /vicepX partition on a fileserver is 2^64 kilobytes. The maximum partition size in releases 1.4.7 and
       earlier is 2 terabytes (2^41 bytes). The maximum partition size for 1.5.x releases 1.5.34 and earlier is
       2 terabytes as well.

       The  maximum number of directory entries is 64,000 if all of the entries have names that are 15 octets or
       less in length. A name that is 15 octets long requires the use  of  only  one  block  in  the  directory.
       Additional  sequential  blocks  are  required to store entries with names that are longer than 15 octets.
       Each additional block provides an additional length of 32 octets for the name of the entry. Note that  if
       file names use an encoding like UTF-8, a single character may be encoded into multiple octets.

       In  real  world  use,  the  maximum  number  of objects in an AFS directory is usually between 16,000 and
       25,000, depending on the average name length.
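
       The block arithmetic above can be sketched as follows. Both helper names are hypothetical. The second
       helper rests on two labeled assumptions: that the 64,000-entry limit can be treated as a budget of
       64,000 blocks, and that every name in the directory has the same length. Real directories will differ.

```shell
#!/bin/sh
# blocks_for_name and max_entries are hypothetical helpers.
# blocks_for_name <octets>: one block for names of up to 15 octets, plus
# one more block for each further 32 octets (or part of 32 octets).
blocks_for_name() {
    if [ "$1" -le 15 ]; then
        echo 1
    else
        echo $(( 1 + ($1 - 15 + 31) / 32 ))
    fi
}

# Assumption: a budget of 64,000 blocks, all names the same length.
max_entries() {
    echo $(( 64000 / $(blocks_for_name "$1") ))
}

max_entries 15   # prints 64000
max_entries 40   # prints 32000
```

       Pass the encoded length in octets, not the character count, when names use a multi-octet encoding such
       as UTF-8.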

OPTIONS

       -auditlog <log path>
           Turns on audit logging, and sets the path for the audit log.  The audit log records information about
           RPC calls, including the name of the RPC call, the host that submitted the  call,  the  authenticated
           entity (user) that issued the call, the parameters for the call, and if the call succeeded or failed.

       -audit-interface (file | sysvmq)
           Specifies what audit interface to use. The "file" interface writes audit messages to the file passed
           to -auditlog. The "sysvmq" interface writes audit messages to a SYSV message queue (see msgget(2) and
           msgrcv(2)). The message queue the "sysvmq" interface writes to has the key "ftok(path, 1)", where
           "path" is the path specified in the -auditlog option.

           Defaults to "file".

       -d <debug level>
           Sets the detail level for the debugging trace written to the /var/log/openafs/FileLog  file.  Provide
           one  of the following values, each of which produces an increasingly detailed trace: 0, 1, 5, 25, and
           125. The default value of 0 produces only a few messages.

       -p <number of processes>
           Sets the number of threads (or LWPs) to run. Provide a positive integer.  The File Server creates and
           uses five threads for special purposes, in addition to the number specified  (but  if  this  argument
           specifies the maximum possible number, the File Server automatically uses five of the threads for its
           own purposes).

           The  maximum  number  of  threads can differ in each release of OpenAFS.  Consult the OpenAFS Release
           Notes for the current release.

       -spare <number of spare blocks>
           Specifies the number of additional kilobytes an application can store in a volume after the quota  is
           exceeded. Provide a positive integer; a value of 0 prevents the volume from ever exceeding its quota.
           Do not combine this argument with the -pctspare argument.

       -pctspare <percentage spare>
           Specifies the amount by which the File Server allows a volume to exceed its quota, as a percentage of
           the  quota. Provide an integer between 0 and 99. A value of 0 prevents the volume from ever exceeding
           its quota. Do not combine this argument with the -spare argument.

       -b <buffers>
           Sets the number of directory buffers. Provide a positive integer.

       -l <large vnodes>
           Sets the number of large vnodes available  in  memory  for  caching  directory  elements.  Provide  a
           positive integer.

       -s <small vnodes>
           Sets  the  number  of  small vnodes available in memory for caching file elements. Provide a positive
           integer.

       -vc <volume cachesize>
           Sets the number of volumes the File Server can cache in memory.  Provide a positive integer.

       -w <call back wait interval>
           Sets the interval at which the daemon spawned by the File Server performs its maintenance  tasks.  Do
           not use this argument; changing the default value can cause unpredictable behavior.

       -cb <number of callbacks>
           Sets the number of callbacks the File Server can track. Provide a positive integer.

       -banner
           Prints the following banner to /dev/console about every 10 minutes.

              File Server is running at <time>.

       -novbc
           Prevents  the  File  Server from breaking the callbacks that Cache Managers hold on a volume that the
           File Server is reattaching after the volume was offline (as a result of the vos restore command,  for
           example). Use of this flag is strongly discouraged.

       -implicit <admin mode bits>
           Defines  the  set  of permissions granted by default to the system:administrators group on the ACL of
           every directory in a volume stored on the file server machine. Provide one or more  of  the  standard
           permission letters ("rlidwka") and auxiliary permission letters ("ABCDEFGH"), or one of the shorthand
           notations  for  groups  of permissions ("all", "none", "read", and "write"). To review the meaning of
           the permissions, see the fs setacl reference page.

       -readonly
           Don't allow writes to this fileserver.

       -hr <number of hours between refreshing the host cps>
           Specifies how often the File Server refreshes its knowledge of the machines that belong to protection
           groups (refreshes the host CPSs for machines). The File Server must update this information to enable
           users from machines recently added to protection groups to access data for which those  machines  now
           have the necessary ACL permissions.

       -busyat <redirect clients when queue > n>
           Defines  the  number  of incoming RPCs that can be waiting for a response from the File Server before
           the File Server returns the error code "VBUSY" to the Cache Manager that  sent  the  latest  RPC.  In
           response,  the  Cache  Manager  retransmits  the  RPC  after  a  delay.  This  argument  prevents the
           accumulation of so many waiting RPCs that the File Server can  never  process  them  all.  Provide  a
           positive integer.  The default value is 600.

       -rxpck <number of rx extra packets>
           Controls  the  number  of  Rx packets the File Server uses to store data for incoming RPCs that it is
           currently handling, that are waiting for a response, and for  replies  that  are  not  yet  complete.
           Provide a positive integer.

       -rxdbg
           Writes a trace of the File Server's operations on Rx packets to the file /var/log/openafs/rx_dbg.

       -rxdbge
           Writes  a  trace  of  the File Server's operations on Rx events (such as retransmissions) to the file
           /var/log/openafs/rx_dbg.

       -rxmaxmtu <bytes>
           Defines the maximum size of an MTU.  The value must be between the minimum and  maximum  packet  data
           sizes for Rx.

       -jumbo
           Allows the server to send and receive jumbograms. A jumbogram is a large-size packet composed of 2 to
           4  normal  Rx  data  packets  that  share  the same header. The fileserver does not use jumbograms by
           default, as some routers are not capable of properly breaking the jumbogram into smaller packets  and
           reassembling them.

       -nojumbo
           Deprecated; jumbograms are disabled by default.

       -rxbind
           Force the fileserver to only bind to one IP address.

       -allow-dotted-principals
           By  default,  the  RXKAD security layer will disallow access by Kerberos principals with a dot in the
           first component of their name. This is  to  avoid  the  confusion  where  principals  user/admin  and
           user.admin  are both mapped to the user.admin PTS entry. Sites whose Kerberos realms don't have these
           collisions between principal names may disable this check by starting the server with this option.

       -L  Sets values for many arguments in a manner suitable for a large file  server  machine.  Combine  this
           flag  with  any  option except the -S flag; omit both flags to set values suitable for a medium-sized
           file server machine.

       -S  Sets values for many arguments in a manner suitable for a small file  server  machine.  Combine  this
           flag  with  any  option except the -L flag; omit both flags to set values suitable for a medium-sized
           file server machine.

       -k <stack size>
           Sets the LWP stack size in units of 1 kilobyte. Do not use this argument, and in  particular  do  not
           specify a value less than the default of 24.

       -realm <Kerberos realm name>
           Defines the Kerberos realm name for the File Server to use. If this argument is not provided, it uses
           the realm name corresponding to the cell listed in the local /etc/openafs/server/ThisCell file.

       -udpsize <size of socket buffer in bytes>
           Sets  the  size  of the UDP buffer, which is 64 KB by default. Provide a positive integer, preferably
           larger than the default.

       -sendsize <size of send buffer in bytes>
           Sets the size of the send buffer, which is 16384 bytes by default.

       -abortthreshold <abort threshold>
           Sets the abort threshold, which is triggered when  an  AFS  client  sends  a  number  of  FetchStatus
           requests  in  a  row  and  all of them fail due to access control or some other error. When the abort
           threshold is reached, the file server starts to slow down the responses  to  the  problem  client  in
           order to reduce the load on the file server.

           The throttling behavior can cause issues, especially for some versions of the Windows OpenAFS client.
           When using Windows Explorer to navigate the AFS directory tree, directories with only "lookup" access
           for the current user may load more slowly because of the throttling. This is because the Windows
           OpenAFS client sends FetchStatus calls one at a time instead of in bulk like the Unix OpenAFS
           client.

           Setting the threshold to 0 disables the throttling behavior. This  option  is  available  in  OpenAFS
           versions 1.4.1 and later.

       -enable_peer_stats
           Activates the collection of Rx statistics and allocates memory for their storage. For each connection
           with  a  specific  UDP  port  on  another  machine,  a  separate  record is kept for each type of RPC
           (FetchFile, GetStatus, and so on) sent or received. To display or otherwise access the  records,  use
           the Rx Monitoring API.

       -enable_process_stats
           Activates  the  collection of Rx statistics and allocates memory for their storage. A separate record
           is kept for each type of RPC (FetchFile, GetStatus, and so on) sent or received, aggregated over  all
           connections to other machines. To display or otherwise access the records, use the Rx Monitoring API.

       -syslog [<loglevel>]
           Use  syslog  instead  of  the  normal  logging location for the fileserver process.  If provided, log
           messages are at <loglevel> instead of the default LOG_USER.

       -mrafslogs
           Use MR-AFS (Multi-Resident) style logging.  This option is deprecated.

       -saneacls
           Offer the SANEACLS capability for the fileserver.  This option is currently unimplemented.

       -help
           Prints the online help for this command. All other valid options are ignored.

       -vhandle-setaside <fds reserved for non-cache io>
           Number of file handles set aside for I/O not in the cache. Defaults to 128.

       -vhandle-max-cachesize <max open files>
           Maximum number of available file handles.

       -vhandle-initial-cachesize <initial open file cache>
           Number of file handles set aside for I/O in the cache. Defaults to 128.

       -vattachpar <number of volume attach threads>
           The number of threads assigned to attach and detach volumes.  The default is 1.  Warning: many of the
           I/O parallelism features of Demand-Attach Fileserver are turned off when the number of volume  attach
           threads is only 1.

           This option is only meaningful for a file server built with pthreads support.

       -m <min percentage spare in partition>
           Specifies the percentage of each AFS server partition that the AIX version of the File Server creates
           as  a  reserve. Specify an integer value between 0 and 30; the default is 8%. A value of 0 means that
           the partition can become completely full, which can have serious negative consequences.  This  option
           is not supported on platforms other than AIX.

       -lock
           Prevents  any  portion  of  the  fileserver binary from being paged (swapped) out of memory on a file
           server machine running the IRIX operating system.  This option is not supported  on  platforms  other
           than IRIX.

       -sync <always | delayed | onclose | never>
           This  option  changes  how  hard the fileserver tries to ensure that data written to volumes actually
           hits the physical disk.

           Normally, when the fileserver writes to disk, the underlying filesystem or operating system may delay
           writes from actually going to disk, and reorder which writes hit the disk first. So, during an
           unclean shutdown of the machine (if the power goes out, or the machine crashes, etc.), or if the
           physical disk backing store becomes unavailable, file data that the server previously told clients
           was successfully written may be lost.

           To try to mitigate this, the fileserver will try to "sync" file data to the physical disk at numerous
           points during various I/O. However, this can result in significantly reduced  performance.  Depending
           on  the usage patterns, this may or may not be acceptable. This option dictates specifically what the
           fileserver does when it wants to perform a "sync".

           There are several options; pass one of these as the argument to -sync. The default is "delayed".

           always
               This causes a sync operation to always sync immediately and synchronously.  This is  the  slowest
               option that provides the greatest protection against data loss in the event of a crash or backing
               store unavailability.

               Note  that  this  is  still not a 100% guarantee that data will not be lost or corrupted during a
               crash. The underlying filesystem itself may cause data to be lost or corrupt in such a situation.
               And OpenAFS itself does not (yet) even guarantee that all data is  consistent  at  any  point  in
               time;  so  even  if  the  filesystem  and  OS  do  not  buffer or reorder any writes, you are not
               guaranteed that all data will be okay after a crash.

               This option may be appropriate if you have reason to believe a  server  is  prone  to  data  loss
               failures,  such  as  if the server encounters frequent power failures or connectivity issues with
               network attached storage. Or if the backend storage is temporarily  degraded  in  some  way  (for
               example,  a  battery  on  a  caching  controller fails), it may make sense to temporarily use the
               "always" option until the situation is fixed. Some servers may also allow for sync operations  to
               occur very quickly, such that the "always" option is not noticeably slower than any other option.
               In such a case, there is no downside to specifying "always".

               This was the only behavior allowed in OpenAFS releases prior to 1.4.5.

           delayed
               This  causes  a  sync  to  do  nothing  immediately,  but  the  sync  will happen sometime in the
               background, within approximately the next 10 seconds. This works by having a separate thread that
               goes through all open file handles every 10 seconds, and it syncs the ones that have been  marked
               as  needing  a  sync.  File  handles  flagged  for sync may also get synced on volume detachment,
               according to the same behavior as with the "onclose" option.

               This option is currently not recommended, since in the past the code implementing this option has
               caused rare data corruption during normal operation. However, it is currently the default  option
               to allow consistent behavior from previous OpenAFS releases.

               This  was  the  only behavior allowed in OpenAFS releases starting from 1.4.5 up to and including
               1.6.2. It is the default starting in OpenAFS 1.6.3. This option  will  be  removed  in  a  future
               version of OpenAFS, and the default behavior will likely change to the "onclose" behavior.

           onclose
               This  causes  a  sync  to  do  nothing immediately, but causes the relevant file to be flagged as
               potentially needing a sync. When a volume is detached, flagged volume metadata files are  synced,
               as  well  as  data  files  that have been accessed recently. Events that cause a volume to detach
               include: performing certain volume operations (restore, salvage, offline, et  al),  detection  of
               volume consistency errors, a clean shutdown of the fileserver, or during DAFS "soft detachment".

               Effectively  this  option  is  the  same as "never" while a volume is attached and actively being
               used, but if a volume is detached, there is an additional guarantee for the data's consistency.

           never
               This causes all syncs to never do  anything.  This  is  the  fastest  option,  with  the  weakest
               guarantees for data consistency.

               Depending  on  the  underlying  filesystem and Operating System, there may be guarantees that any
               data written to disk will hit the physical media after a certain amount  of  time.  For  example,
               Linux's  pdflush  process  usually  makes  this  guarantee,  and  ext3  can  make certain various
               consistency guarantees according to the options given. ZFS on Solaris can  also  provide  similar
               guarantees,  as  can  various other platforms and filesystems. Consult the documentation for your
               platform if you are unsure.

           Which option you choose is not an easy decision to make. Various  developers  and  experts  sometimes
           disagree  on  which  option  is  the  most reasonable, and it may depend on the specific scenario and
           workload involved. Some argue that  the  "always"  option  does  not  provide  significantly  greater
           guarantees  over  any  other option, whereas others argue that choosing anything besides the "always"
           option allows for an unacceptable risk of data loss. This may depend on  your  usage  patterns,  your
           hardware, your platform and filesystem, and who you talk to about this topic.

EXAMPLES

       The  following  bos  create  command  creates  a  traditional  fs  process  on  the  file  server machine
       "fs2.abc.com" that uses the large configuration size, and allows volumes to exceed their  quota  by  10%.
       Type the command on a single line:

          % bos create -server fs2.abc.com -instance fs -type fs \
                       -cmd "/usr/lib/openafs/fileserver -pctspare 10 -L" \
                       /usr/lib/openafs/volserver /usr/lib/openafs/salvager

TROUBLESHOOTING

       Sending process signals to the File Server Process can change its behavior in the following ways:

         Process          Signal       OS     Result
         ---------------------------------------------------------------------

         File Server      XCPU        Unix    Prints a list of client IP
                                              Addresses.

         File Server      USR2      Windows   Prints a list of client IP
                                              Addresses.

         File Server      POLL        HPUX    Prints a list of client IP
                                              Addresses.

         Any server       TSTP        Any     Raises the debug level to the
                                              next power of 5: 1, 5, 25, 125,
                                              etc. This has the same effect as
                                              the -d XXX command-line option.

         Any Server       HUP         Any     Resets Debug level to 0

         File Server      TERM        Any     Run minor instrumentation over
                                              the list of descriptors.

         Other Servers    TERM        Any     Causes the process to quit.

         File Server      QUIT        Any     Causes the File Server to Quit.
                                              Bos Server knows this.

       The  basic  metric of whether an AFS file server is doing well is the number of connections waiting for a
       thread, which can be found by running the following command:

          % rxdebug <server> | grep waiting_for | wc -l

       Each line returned by "rxdebug" that contains the  text  "waiting_for"  represents  a  connection  that's
       waiting for a file server thread.

       If the blocked connection count is ever above 0, the server is having problems replying to clients in a
       timely fashion.  If it rises above roughly 10, users will notice slowness.  The total number of
       connections is mostly irrelevant: it climbs essentially monotonically for as long as the server has
       been running and drops back to zero when the server is restarted.
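
       The thresholds above can be turned into a quick check.  The following is a minimal sketch and not part
       of OpenAFS: the function names count_waiting and classify_waiting and the status words are
       hypothetical, and the cutoffs of 0 and 10 are simply the rough guidance given above.  It reads rxdebug
       output on standard input, so it can also be run against saved output.

```shell
# Count lines containing "waiting_for" in rxdebug output read from stdin.
count_waiting() {
    # grep -c prints the number of matching lines; it exits non-zero
    # when that number is 0, so the exit status is ignored here.
    grep -c 'waiting_for' || true
}

# Map a blocked-connection count to a rough status word (hypothetical labels).
classify_waiting() {
    n=$1
    if [ "$n" -eq 0 ]; then
        echo ok
    elif [ "$n" -le 10 ]; then
        echo slow
    else
        echo overloaded
    fi
}

# Example (assumes rxdebug is in PATH; replace <server>):
#   classify_waiting "$(rxdebug <server> | count_waiting)"
```

       Counting with grep -c is equivalent to the "grep waiting_for | wc -l" pipeline shown earlier.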

       The most common cause of blocked connections rising on a server is some process somewhere  performing  an
       abnormal  number  of  accesses  to  that  server  and  its  volumes.   If multiple servers have a blocked
       connection count, the most likely explanation is that there is a volume replicated between those  servers
       that is absorbing an abnormally high access rate.

       To get an access count on all the volumes on a server, run:

          % vos listvol <server> -long

       and save the output in a file.  The results look like vos examine output repeated for each volume on
       the server.  Look for lines like:

          40065 accesses in the past day (i.e., vnode references)

       and look for volumes with an abnormally high number of accesses.  Anything over 10,000  is  fairly  high,
       but  some volumes like root.cell and other volumes close to the root of the cell will have that many hits
       routinely.  Anything over 100,000 is generally abnormally high.  The count resets about once a day.
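
       Scanning the saved output by hand is tedious on a large server.  The sketch below is one way to list
       the busiest volumes; it is an illustration, not an OpenAFS tool.  The function name busy_volumes is
       hypothetical, and the script assumes that each volume entry starts with a header line of the form
       "name ID type size K status" as in vos examine output, which may not hold for every release.

```shell
# Print "count volume-name" for every volume whose daily access count
# exceeds the threshold given as $1 (default 100000), busiest first.
# Reads the saved output of "vos listvol <server> -long" on stdin.
busy_volumes() {
    awk -v limit="${1:-100000}" '
        # Assumed header line: name, numeric ID, type (RW, RO, or BK), ...
        $2 ~ /^[0-9]+$/ && $3 ~ /^(RW|RO|BK)$/ { name = $1 }
        # Access-count line: "<n> accesses in the past day ..."
        $2 == "accesses" && $1 + 0 > limit + 0 { print $1, name }
    ' | sort -rn
}

# Example (listvol.out is a hypothetical file holding the saved output):
#   busy_volumes 10000 < listvol.out
```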

       Another approach that can be used to narrow the possibilities for  a  replicated  volume,  when  multiple
       servers are having trouble, is to find all replicated volumes for that server.  Run:

          % vos listvldb -server <server>

       where <server> is one of the servers having problems, to refresh the VLDB cache, and then run:

          % vos listvldb -server <server> -part <partition>

       to get a list of all volumes on that server and partition, including every other server with replicas.

       Once the volume causing the problem has been identified, the best way to deal with the problem is to move
       that volume to another server with a low load or to stop any runaway programs that are accessing that
       volume unnecessarily.  Often, knowing which volume is involved is enough to tell what's going on.

       If you still need additional information about who's hitting that server, sometimes you can guess at that
       information from the failed callbacks in the FileLog log in /var/log/afs  on  the  server,  or  from  the
       output of:

          % /usr/afsws/etc/rxdebug <server> -rxstats

       but  the best way is to turn on debugging output from the file server.  (Warning: This generates a lot of
       output into FileLog on the AFS server.)  To do this, log on to the  AFS  server,  find  the  PID  of  the
       fileserver process, and do:

           kill -TSTP <pid>

       where <pid> is the PID of the file server process.  This raises the debugging level so that you'll
       start seeing what people are actually doing on the server.  You can do this up to three more times to get
       even more output if needed.  To reset the debugging level to normal, use the following command (it
       will NOT terminate the file server):

           kill -HUP <pid>

       The debugging setting on the File Server should be reset back to  normal  when  debugging  is  no  longer
       needed.  Otherwise, the AFS server may well fill its disks with debugging output.
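
       Because a forgotten reset can fill the disk, it can help to raise and reset the level in one step.  The
       following is a minimal sketch under stated assumptions, not an OpenAFS tool: the function name
       debug_window and the KILL_CMD variable are hypothetical, and KILL_CMD exists only so the function can
       be dry-run without signalling a real process.

```shell
# Raise the debug level of process $1 once, wait $2 seconds (default 60),
# then reset the level.  The body runs in a subshell so that its
# variables and traps do not leak into the calling shell.
debug_window() (
    pid=$1
    seconds=${2:-60}
    kill_cmd=${KILL_CMD:-kill}   # hypothetical override for dry runs

    reset_level() { $kill_cmd -HUP "$pid"; }

    # If the wait is interrupted, still reset the level before exiting.
    trap 'reset_level; exit 1' INT TERM

    $kill_cmd -TSTP "$pid" || exit 1
    sleep "$seconds"
    reset_level
)

# Example: collect two minutes of debugging output (replace <pid>):
#   debug_window <pid> 120
```

       The sketch sends TSTP only once; repeat the kill -TSTP step by hand if more output is needed.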

       The lines of the debugging output that are most useful for debugging load problems are:

           SAFS_FetchStatus,  Fid = 2003828163.77154.82248, Host 171.64.15.76
           SRXAFS_FetchData, Fid = 2003828163.77154.82248

       (The example above is partly truncated to highlight the interesting information.)  The Fid identifies the
       volume and vnode within the volume; the volume is the first long number.  In this example, the volume was:

          % vos examine 2003828163
          pubsw.matlab61                   2003828163 RW    1040060 K  On-line
              afssvr5.Stanford.EDU /vicepa
              RWrite 2003828163 ROnly 2003828164 Backup 2003828165
              MaxQuota    3000000 K
              Creation    Mon Aug  6 16:40:55 2001
              Last Update Tue Jul 30 19:00:25 2002
              86181 accesses in the past day (i.e., vnode references)

              RWrite: 2003828163    ROnly: 2003828164    Backup: 2003828165
              number of sites -> 3
                 server afssvr5.Stanford.EDU partition /vicepa RW Site
                 server afssvr11.Stanford.EDU partition /vicepd RO Site
                 server afssvr5.Stanford.EDU partition /vicepa RO Site

       and from the Host information one can tell what system is accessing that volume.
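
       When FileLog contains many such lines, a tally by volume and host shows where the load comes from.  The
       sketch below is an illustration only: the function name fid_summary is hypothetical, and it assumes
       that the debugging lines contain "Fid = " followed by a dotted triple and, on some lines, "Host "
       followed by an IP address, as in the truncated sample above.

```shell
# Summarise FileLog debug lines read on stdin: print one line per
# (volume ID, host) pair with the number of requests seen, busiest first.
# Lines that carry no Host field are counted under "-".
fid_summary() {
    awk '
        /Fid = [0-9]+\.[0-9]+\.[0-9]+/ {
            # Isolate the dotted triple that follows "Fid = ".
            fid = $0
            sub(/.*Fid = /, "", fid)
            sub(/[^0-9.].*/, "", fid)
            split(fid, part, ".")   # part[1] is the volume ID
            host = "-"
            if (match($0, /Host [0-9.]+/))
                host = substr($0, RSTART + 5, RLENGTH - 5)
            count[part[1] " " host]++
        }
        END { for (k in count) print count[k], k }
    ' | sort -rn
}

# Example (the FileLog location depends on the installation):
#   fid_summary < /var/log/afs/FileLog
```

       The volume IDs in the output can then be passed to vos examine as shown above.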

       Note  that  the  output  of  vos_examine(1)  also includes the access count, so once the problem has been
       identified, vos examine can be used to see if the access count is still increasing.  Also  remember  that
       you can run vos examine on the read-only replica (e.g., pubsw.matlab61.readonly) to see the access counts
       on the read-only replica on all of the servers that it's located on.

PRIVILEGE REQUIRED

       The  issuer  must be logged in as the superuser "root" on a file server machine to issue the command at a
       command shell prompt.  It is conventional instead to create and start the  process  by  issuing  the  bos
       create command.

SEE ALSO

       BosConfig(5),  FileLog(5), bos_create(8), bos_getlog(8), fs_setacl(1), msgget(2), msgrcv(2), salvager(8),
       volserver(8), vos_examine(1)

COPYRIGHT

       IBM Corporation 2000. <http://www.ibm.com/> All Rights Reserved.

       This documentation is covered by the IBM Public License Version 1.0.  It was converted from HTML  to  POD
       by  software  written  by  Chas  Williams  and Russ Allbery, based on work by Alf Wachsmann and Elizabeth
       Cassell.

OpenAFS                                            2015-11-10                                      FILESERVER(8)