Provided by: openafs-fileserver_1.6.15-1ubuntu1.1_amd64

NAME

       fileserver - Initializes the File Server component of the fs process

SYNOPSIS

       fileserver
           [-auditlog <path to log file>]
           [-audit-interface (file | sysvmq)]
           [-d <debug level>]
           [-p <number of processes>]
           [-spare <number of spare blocks>]
           [-pctspare <percentage spare>]
           [-b <buffers>]
           [-l <large vnodes>]
           [-s <small vnodes>]
           [-vc <volume cachesize>]
           [-w <call back wait interval>]
           [-cb <number of call backs>]
           [-banner]
           [-novbc]
           [-implicit <admin mode bits: rlidwka>]
           [-readonly]
           [-hr <number of hours between refreshing the host cps>]
           [-busyat <redirect clients when queue > n>]
           [-nobusy]
           [-rxpck <number of rx extra packets>]
           [-rxdbg]
           [-rxdbge]
           [-rxmaxmtu <bytes>]
           [-nojumbo]
           [-jumbo]
           [-rxbind]
           [-allow-dotted-principals]
           [-L]
           [-S]
           [-k <stack size>]
           [-realm <Kerberos realm name>]
           [-udpsize <size of socket buffer in bytes>]
           [-sendsize <size of send buffer in bytes>]
           [-abortthreshold <abort threshold>]
           [-enable_peer_stats]
           [-enable_process_stats]
           [-syslog [< loglevel >]]
           [-mrafslogs]
           [-saneacls]
           [-help]
           [-vhandle-setaside <fds reserved for non-cache io>]
           [-vhandle-max-cachesize <max open files>]
           [-vhandle-initial-cachesize <initial open file cache>]
           [-vattachpar <number of volume attach threads>]
           [-m <min percentage spare in partition>]
           [-lock]
           [-sync <sync behavior>]
           [-offline-timeout <timeout in seconds>]
           [-offline-shutdown-timeout <timeout in seconds>]

DESCRIPTION

       The fileserver command initializes the File Server component of the "fs" process. In the
       conventional configuration, its binary file is located in the /usr/lib/openafs directory
       on a file server machine.

       The fileserver command is not normally issued at the command shell prompt, but rather
       placed into a file server machine's /etc/openafs/BosConfig file with the bos create
       command. If it is ever issued at the command shell prompt, the issuer must be logged onto
       a file server machine as the local superuser "root".

       The File Server creates the /var/log/openafs/FileLog log file as it initializes, if the
       file does not already exist. It does not write a detailed trace by default, but the -d
       option may be used to increase the amount of detail. Use the bos getlog command to display
       the contents of the log file.

       The command's arguments enable the administrator to control many aspects of the File
       Server's performance, as detailed in OPTIONS.  By default the File Server sets values for
       many arguments that are suitable for a medium-sized file server machine. To set values
       suitable for a small or large file server machine, use the -S or -L flag respectively. The
       following list describes the parameters and corresponding argument for which the File
       Server sets default values, and the table below summarizes the setting for each of the
       three machine sizes.

       •   The maximum number of lightweight processes (LWPs) or pthreads the File Server uses to
           handle requests for data; corresponds to the -p argument. The File Server always uses
           a minimum of 32 KB of memory for these processes.

       •   The maximum number of directory blocks the File Server caches in memory; corresponds
           to the -b argument. Each cached directory block (buffer) consumes 2,092 bytes of
           memory.

       •   The maximum number of large vnodes the File Server caches in memory for tracking
           directory elements; corresponds to the -l argument. Each large vnode consumes 292
           bytes of memory.

       •   The maximum number of small vnodes the File Server caches in memory for tracking file
           elements; corresponds to the -s argument.  Each small vnode consumes 100 bytes of
           memory.

       •   The maximum volume cache size, which determines how many volumes the File Server can
           cache in memory before having to retrieve data from disk; corresponds to the -vc
           argument.

       •   The maximum number of callback structures the File Server caches in memory;
           corresponds to the -cb argument. Each callback structure consumes 16 bytes of memory.

       •   The maximum number of Rx packets the File Server uses; corresponds to the -rxpck
           argument. Each packet consumes 1544 bytes of memory.

       The default values are:

         Parameter (Argument)               Small (-S)     Medium   Large (-L)
         ---------------------------------------------------------------------
         Number of LWPs (-p)                        6           9          128
         Number of cached dir blocks (-b)          70          90          120
         Number of cached large vnodes (-l)       200         400          600
         Number of cached small vnodes (-s)       200         400          600
         Maximum volume cache size (-vc)          200         400          600
         Number of callbacks (-cb)             20,000      60,000       64,000
         Number of Rx packets (-rxpck)            100         150          200

       To override any of the values, provide the indicated argument (which can be combined with
       the -S or -L flag).

       The amount of memory required for the File Server varies. The approximate default memory
       usage is 751 KB when the -S flag is used (small configuration), 1.1 MB when all defaults
       are used (medium configuration), and 1.4 MB when the -L flag is used (large
       configuration). If additional memory is available, increasing the value of the -cb and -vc
       arguments can improve File Server performance most directly.
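
       The per-item sizes and default counts above can be combined into a rough estimate. The
       following is a minimal sketch, not part of OpenAFS; the function and table names are
       hypothetical. It covers only the items for which a per-item size is documented here, so
       thread memory and the volume cache are left out and its totals should not be expected to
       reproduce the approximate figures quoted above.

```python
# Per-item memory costs in bytes, taken from the parameter list above.
BYTES_PER_ITEM = {
    "b": 2092,      # cached directory block (buffer)
    "l": 292,       # large vnode
    "s": 100,       # small vnode
    "cb": 16,       # callback structure
    "rxpck": 1544,  # Rx packet
}

# Default counts from the table above, keyed by machine size.
DEFAULTS = {
    "small":  {"b": 70, "l": 200, "s": 200, "cb": 20_000, "rxpck": 100},
    "medium": {"b": 90, "l": 400, "s": 400, "cb": 60_000, "rxpck": 150},
    "large":  {"b": 120, "l": 600, "s": 600, "cb": 64_000, "rxpck": 200},
}


def estimate_bytes(size="medium", **overrides):
    """Sum the documented per-item costs for one configuration.

    Thread memory and the volume cache are omitted because the text
    gives no per-item size for them, so this is a partial estimate.
    Overrides use the argument names without the dash, e.g. cb=120_000.
    """
    counts = dict(DEFAULTS[size], **overrides)
    return sum(counts[name] * BYTES_PER_ITEM[name] for name in BYTES_PER_ITEM)
```

       Comparing estimate_bytes("medium") with estimate_bytes("medium", cb=120_000) shows how
       much additional memory a larger -cb value would account for under these assumptions.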

       By default, the File Server allows a volume to exceed its quota by 1 MB when an
       application is writing data to an existing file in a volume that is full. The File Server
       still does not allow users to create new files in a full volume. To change the default,
       use one of the following arguments:

       •   Set the -spare argument to the number of extra kilobytes that the File Server allows
           as overage. A value of 0 allows no overage.

       •   Set the -pctspare argument to the percentage of the volume's quota the File Server
           allows as overage.
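
       The two arguments can be compared with a small calculation. This is an illustrative
       sketch only: the function name is hypothetical, the quota is assumed to be expressed in
       kilobytes, and rounding a percentage down to a whole kilobyte is an assumption rather
       than documented File Server behavior.

```python
DEFAULT_SPARE_KB = 1024  # the documented default overage of 1 MB


def allowed_overage_kb(quota_kb, spare_kb=None, pctspare=None):
    """Return how many kilobytes past its quota a volume may grow."""
    if spare_kb is not None and pctspare is not None:
        # The File Server exits if both arguments are given.
        raise ValueError("-spare and -pctspare cannot be combined")
    if pctspare is not None:
        if not 0 <= pctspare <= 99:
            raise ValueError("-pctspare must be between 0 and 99")
        return quota_kb * pctspare // 100  # rounding down is an assumption
    if spare_kb is not None:
        if spare_kb < 0:
            raise ValueError("-spare must not be negative")
        return spare_kb
    return DEFAULT_SPARE_KB
```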

       By default, the File Server implicitly grants the "a" (administer) and "l" (lookup)
       permissions to system:administrators on the access control list (ACL) of every directory
       in the volumes stored on its file server machine. In other words, the group's members can
       exercise those two permissions even when an entry for the group does not appear on an ACL.
       To change the set of default permissions, use the -implicit argument.

       The File Server maintains a host current protection subgroup (host CPS) for each client
       machine from which it has received a data access request. Like the CPS for a user, a host
       CPS lists all of the Protection Database groups to which the machine belongs, and the File
       Server compares the host CPS to a directory's ACL to determine in what manner users on the
       machine are authorized to access the directory's contents. When the pts adduser or pts
       removeuser command is used to change the groups to which a machine belongs, the File
       Server must recompute the machine's host CPS in order to notice the change. By default,
       the File Server contacts the Protection Server every two hours to recompute host CPSs,
       implying that it can take that long for changed group memberships to become effective. To
       change this frequency, use the -hr argument.

       The File Server stores volumes in partitions. A partition is a filesystem or directory on
       the server machine that is named "/vicepX" or "/vicepXX", where X is "a" through "z" and
       XX is "aa" through "iv". Up to 255 partitions are allowed. The File Server expects that the
       /vicepXX directories are each on a dedicated filesystem. The File Server will only use a
       /vicepXX if it's a mountpoint for another filesystem, unless the file
       "/vicepXX/AlwaysAttach" exists.  A partition will not be mounted if the file
       "/vicepXX/NeverAttach" exists. If both "/vicepXX/AlwaysAttach" and "/vicepXX/NeverAttach"
       are present, then "/vicepXX/AlwaysAttach" wins.  The data in the partition is stored in a
       special format that can only be accessed using OpenAFS commands or an OpenAFS client.
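
       The naming and attachment rules in this paragraph can be summarized as a short sketch.
       The function names are hypothetical and the code only restates the rules as written here;
       it does not inspect a real file system and is not how the File Server itself is
       implemented.

```python
import re

# Names "a" through "z" or "aa" through "iv", as stated in the text.
_PARTITION_RE = re.compile(r"/vicep(?:[a-z]|[a-h][a-z]|i[a-v])")


def is_partition_name(path):
    """Return True if path has the form of a server partition name."""
    return _PARTITION_RE.fullmatch(path) is not None


def will_attach(is_mountpoint, always_attach, never_attach):
    """Apply the attach rules described above to one /vicepXX directory.

    Each argument is a boolean: whether the directory is a mount point,
    and whether the AlwaysAttach and NeverAttach files exist in it.
    """
    if always_attach:      # AlwaysAttach wins, even over NeverAttach
        return True
    if never_attach:
        return False
    return is_mountpoint   # otherwise a dedicated filesystem is required
```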

       The File Server generates the following message when a partition is nearly full:

          No space left on device

       This command does not use the syntax conventions of the AFS command suites. Provide the
       command name and all option names in full.

CAUTIONS

       There are two strategies the File Server can use for attaching AFS volumes at startup and
       handling volume salvages.  The traditional method assumes all volumes are salvaged before
       the File Server starts and attaches all volumes at start before serving files.  The newer
       demand-attach method attaches volumes only on demand, salvaging them at that time as
       needed, and detaches volumes that are not in use.  A demand-attach File Server can also
       save state to disk for faster restarts. The dafileserver implements the demand-attach
       method, while fileserver uses the traditional method.

       The choice of traditional or demand-attach File Server changes the required setup in
       BosConfig. When changing from a traditional File Server to demand-attach or vice versa,
       you will need to stop and remove the "fs" or "dafs" node in BosConfig and create a new
       node of the appropriate type. See bos_create(8) for more information.

       Do not use the -k and -w arguments, which are intended for use by the OpenAFS developers
       only. Changing them from their default values can result in unpredictable File Server
       behavior.  In any case, on many operating systems the File Server uses native threads
       rather than the LWP threads, so using the -k argument to set the LWP stack size has no
       effect.

       Do not specify both the -spare and -pctspare arguments. Doing so causes the File Server to
       exit, leaving an error message in the /var/log/openafs/FileLog file.

       Options that are available only on some system types, such as the -m and -lock options,
       appear in the output generated by the -help option only on the relevant system type.

       Currently, the maximum size of a volume quota is 2 terabytes (2^41 bytes) and the maximum
       size of a /vicepX partition on a fileserver is 2^64 kilobytes. The maximum partition size
       in releases 1.4.7 and earlier is 2 terabytes (2^31 kilobytes). The maximum partition size
       for 1.5.x releases 1.5.34 and earlier is 2 terabytes as well.

       The maximum number of directory entries is 64,000 if all of the entries have names that
       are 15 octets or less in length. A name that is 15 octets long requires the use of only
       one block in the directory. Additional sequential blocks are required to store entries
       with names that are longer than 15 octets. Each additional block provides an additional
       length of 32 octets for the name of the entry. Note that if file names use an encoding
       like UTF-8, a single character may be encoded into multiple octets.

       In real world use, the maximum number of objects in an AFS directory is usually between
       16,000 and 25,000, depending on the average name length.
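
       The block arithmetic described above can be worked through as follows. This is a sketch
       under one stated assumption: that the 64,000 figure acts as a total budget of directory
       blocks, which is inferred from the one-block-per-entry case and is not stated directly.
       The function names are hypothetical.

```python
MAX_BLOCKS = 64_000      # assumed block budget (one block per short name)
FIRST_BLOCK_OCTETS = 15  # name octets that fit in an entry's first block
EXTRA_BLOCK_OCTETS = 32  # name octets added by each additional block


def blocks_for_name(name):
    """Return the number of directory blocks one entry name occupies."""
    octets = len(name.encode("utf-8"))  # UTF-8 characters may span octets
    if octets <= FIRST_BLOCK_OCTETS:
        return 1
    extra = octets - FIRST_BLOCK_OCTETS
    return 1 + -(-extra // EXTRA_BLOCK_OCTETS)  # ceiling division


def max_entries(name_octets):
    """Upper bound on entries if every name is name_octets octets long."""
    return MAX_BLOCKS // blocks_for_name("x" * name_octets)
```

       For example, a name of eight two-octet UTF-8 characters is 16 octets long and therefore
       needs two blocks, even though it has fewer than 15 characters.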

OPTIONS

       -auditlog <log path>
           Turns on audit logging, and sets the path for the audit log.  The audit log records
           information about RPC calls, including the name of the RPC call, the host that
           submitted the call, the authenticated entity (user) that issued the call, the
           parameters for the call, and if the call succeeded or failed.

       -audit-interface (file | sysvmq)
           Specifies what audit interface to use. The "file" interface writes audit messages to
           the file passed to -auditlog. The "sysvmq" interface writes audit messages to a SYSV
           message queue (see msgget(2) and msgrcv(2)). The message queue the "sysvmq" interface
           writes to has the key "ftok(path, 1)", where "path" is the path specified in the
           -auditlog option.

           Defaults to "file".

       -d <debug level>
           Sets the detail level for the debugging trace written to the /var/log/openafs/FileLog
           file. Provide one of the following values, each of which produces an increasingly
           detailed trace: 0, 1, 5, 25, and 125. The default value of 0 produces only a few
           messages.

       -p <number of processes>
           Sets the number of threads (or LWPs) to run. Provide a positive integer.  The File
           Server creates and uses five threads for special purposes, in addition to the number
           specified (but if this argument specifies the maximum possible number, the File Server
           automatically uses five of the threads for its own purposes).

           The maximum number of threads can differ in each release of OpenAFS.  Consult the
           OpenAFS Release Notes for the current release.

       -spare <number of spare blocks>
           Specifies the number of additional kilobytes an application can store in a volume
           after the quota is exceeded. Provide a positive integer; a value of 0 prevents the
           volume from ever exceeding its quota. Do not combine this argument with the -pctspare
           argument.

       -pctspare <percentage spare>
           Specifies the amount by which the File Server allows a volume to exceed its quota, as
           a percentage of the quota. Provide an integer between 0 and 99. A value of 0 prevents
           the volume from ever exceeding its quota. Do not combine this argument with the -spare
           argument.

       -b <buffers>
           Sets the number of directory buffers. Provide a positive integer.

       -l <large vnodes>
           Sets the number of large vnodes available in memory for caching directory elements.
           Provide a positive integer.

       -s <small vnodes>
           Sets the number of small vnodes available in memory for caching file elements. Provide
           a positive integer.

       -vc <volume cachesize>
           Sets the number of volumes the File Server can cache in memory.  Provide a positive
           integer.

       -w <call back wait interval>
           Sets the interval at which the daemon spawned by the File Server performs its
           maintenance tasks. Do not use this argument; changing the default value can cause
           unpredictable behavior.

       -cb <number of callbacks>
           Sets the number of callbacks the File Server can track. Provide a positive integer.

       -banner
           Prints the following banner to /dev/console about every 10 minutes.

              File Server is running at <time>.

       -novbc
           Prevents the File Server from breaking the callbacks that Cache Managers hold on a
           volume that the File Server is reattaching after the volume was offline (as a result
           of the vos restore command, for example). Use of this flag is strongly discouraged.

       -implicit <admin mode bits>
           Defines the set of permissions granted by default to the system:administrators group
           on the ACL of every directory in a volume stored on the file server machine. Provide
           one or more of the standard permission letters ("rlidwka") and auxiliary permission
           letters ("ABCDEFGH"), or one of the shorthand notations for groups of permissions
           ("all", "none", "read", and "write"). To review the meaning of the permissions, see
           the fs setacl reference page.
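
           A value for this argument can be checked before it is placed in BosConfig. The
           sketch below is a hypothetical helper, not part of OpenAFS; it only tests the
           spelling of the value against the letters and shorthand names listed above, and
           accepting repeated letters is an assumption.

```python
STANDARD = set("rlidwka")    # standard permission letters
AUXILIARY = set("ABCDEFGH")  # auxiliary permission letters
SHORTHAND = {"all", "none", "read", "write"}


def is_valid_implicit(value):
    """Return True if value looks like an acceptable -implicit argument."""
    if value in SHORTHAND:
        return True
    # Otherwise every character must be a known permission letter.
    return bool(value) and set(value) <= STANDARD | AUXILIARY
```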

       -readonly
           Don't allow writes to this fileserver.

       -hr <number of hours between refreshing the host cps>
           Specifies how often the File Server refreshes its knowledge of the machines that
           belong to protection groups (refreshes the host CPSs for machines). The File Server
           must update this information to enable users from machines recently added to
           protection groups to access data for which those machines now have the necessary ACL
           permissions.

       -busyat <redirect clients when queue > n>
           Defines the number of incoming RPCs that can be waiting for a response from the File
           Server before the File Server returns the error code "VBUSY" to the Cache Manager that
           sent the latest RPC. In response, the Cache Manager retransmits the RPC after a delay.
           This argument prevents the accumulation of so many waiting RPCs that the File Server
           can never process them all. Provide a positive integer.  The default value is 600.

       -rxpck <number of rx extra packets>
           Controls the number of Rx packets the File Server uses to store data for incoming RPCs
           that it is currently handling, that are waiting for a response, and for replies that
           are not yet complete. Provide a positive integer.

       -rxdbg
           Writes a trace of the File Server's operations on Rx packets to the file
           /var/log/openafs/rx_dbg.

       -rxdbge
           Writes a trace of the File Server's operations on Rx events (such as retransmissions)
           to the file /var/log/openafs/rx_dbg.

       -rxmaxmtu <bytes>
           Defines the maximum size of an MTU.  The value must be between the minimum and maximum
           packet data sizes for Rx.

       -jumbo
           Allows the server to send and receive jumbograms. A jumbogram is a large-size packet
           composed of 2 to 4 normal Rx data packets that share the same header. The fileserver
           does not use jumbograms by default, as some routers are not capable of properly
           breaking the jumbogram into smaller packets and reassembling them.

       -nojumbo
           Deprecated; jumbograms are disabled by default.

       -rxbind
           Force the fileserver to only bind to one IP address.

       -allow-dotted-principals
           By default, the RXKAD security layer will disallow access by Kerberos principals with
           a dot in the first component of their name. This is to avoid the confusion where
           principals user/admin and user.admin are both mapped to the user.admin PTS entry.
           Sites whose Kerberos realms don't have these collisions between principal names may
           disable this check by starting the server with this option.

       -L  Sets values for many arguments in a manner suitable for a large file server machine.
           Combine this flag with any option except the -S flag; omit both flags to set values
           suitable for a medium-sized file server machine.

       -S  Sets values for many arguments in a manner suitable for a small file server machine.
           Combine this flag with any option except the -L flag; omit both flags to set values
           suitable for a medium-sized file server machine.

       -k <stack size>
           Sets the LWP stack size in units of 1 kilobyte. Do not use this argument, and in
           particular do not specify a value less than the default of 24.

       -realm <Kerberos realm name>
           Defines the Kerberos realm name for the File Server to use. If this argument is not
           provided, it uses the realm name corresponding to the cell listed in the local
           /etc/openafs/server/ThisCell file.

       -udpsize <size of socket buffer in bytes>
           Sets the size of the UDP buffer, which is 64 KB by default. Provide a positive
           integer, preferably larger than the default.

       -sendsize <size of send buffer in bytes>
           Sets the size of the send buffer, which is 16384 bytes by default.

       -abortthreshold <abort threshold>
           Sets the abort threshold, which is triggered when an AFS client sends a number of
           FetchStatus requests in a row and all of them fail due to access control or some other
           error. When the abort threshold is reached, the file server starts to slow down the
           responses to the problem client in order to reduce the load on the file server.

           The throttling behavior can cause issues, especially for some versions of the Windows
           OpenAFS client. When using Windows Explorer to navigate the AFS directory tree,
           directories with only "lookup" access for the current user may load more slowly because
           of the throttling. This is because the Windows OpenAFS client sends FetchStatus calls
           one at a time instead of in bulk like the Unix OpenAFS client.

           Setting the threshold to 0 disables the throttling behavior. This option is available
           in OpenAFS versions 1.4.1 and later.

       -enable_peer_stats
           Activates the collection of Rx statistics and allocates memory for their storage. For
           each connection with a specific UDP port on another machine, a separate record is kept
           for each type of RPC (FetchFile, GetStatus, and so on) sent or received. To display or
           otherwise access the records, use the Rx Monitoring API.

       -enable_process_stats
           Activates the collection of Rx statistics and allocates memory for their storage. A
           separate record is kept for each type of RPC (FetchFile, GetStatus, and so on) sent or
           received, aggregated over all connections to other machines. To display or otherwise
           access the records, use the Rx Monitoring API.

       -syslog [<loglevel>]
           Use syslog instead of the normal logging location for the fileserver process.  If
           provided, log messages are at <loglevel> instead of the default LOG_USER.

       -mrafslogs
           Use MR-AFS (Multi-Resident) style logging.  This option is deprecated.

       -saneacls
           Offer the SANEACLS capability for the fileserver.  This option is currently
           unimplemented.

       -help
           Prints the online help for this command. All other valid options are ignored.

       -vhandle-setaside <fds reserved for non-cache io>
           Number of file handles set aside for I/O not in the cache. Defaults to 128.

       -vhandle-max-cachesize <max open files>
           Maximum number of available file handles.

       -vhandle-initial-cachesize <initial open file cache>
           Number of file handles set aside for I/O in the cache. Defaults to 128.

       -vattachpar <number of volume attach threads>
           The number of threads assigned to attach and detach volumes.  The default is 1.
           Warning: many of the I/O parallelism features of Demand-Attach Fileserver are turned
           off when the number of volume attach threads is only 1.

           This option is only meaningful for a file server built with pthreads support.

       -m <min percentage spare in partition>
           Specifies the percentage of each AFS server partition that the AIX version of the File
           Server creates as a reserve. Specify an integer value between 0 and 30; the default is
           8%. A value of 0 means that the partition can become completely full, which can have
           serious negative consequences.  This option is not supported on platforms other than
           AIX.

       -lock
           Prevents any portion of the fileserver binary from being paged (swapped) out of memory
           on a file server machine running the IRIX operating system.  This option is not
           supported on platforms other than IRIX.

       -sync <always | delayed | onclose | never>
           This option changes how hard the fileserver tries to ensure that data written to
           volumes actually hits the physical disk.

           Normally, when the fileserver writes to disk, the underlying filesystem or operating
           system may delay writes from actually going to disk, and reorder which writes hit the
           disk first. So, during an unclean shutdown of the machine (if the power goes out, or
           the machine crashes, etc.), or if the physical disk backing store becomes unavailable,
           file data that the server previously told clients was successfully written may be
           lost.

           To try to mitigate this, the fileserver will try to "sync" file data to the physical
           disk at numerous points during various I/O. However, this can result in significantly
           reduced performance. Depending on the usage patterns, this may or may not be
           acceptable. This option dictates specifically what the fileserver does when it wants
           to perform a "sync".

           There are several options; pass one of these as the argument to -sync. The default is
           "onclose".

           always
               This causes a sync operation to always sync immediately and synchronously.  This
               is the slowest option that provides the greatest protection against data loss in
               the event of a crash or backing store unavailability.

               Note that this is still not a 100% guarantee that data will not be lost or
               corrupted during a crash. The underlying filesystem itself may cause data to be
               lost or corrupt in such a situation. And OpenAFS itself does not (yet) even
               guarantee that all data is consistent at any point in time; so even if the
               filesystem and OS do not buffer or reorder any writes, you are not guaranteed that
               all data will be okay after a crash.

               This option may be appropriate if you have reason to believe a server is prone to
               data loss failures, such as if the server encounters frequent power failures or
               connectivity issues with network attached storage. Or if the backend storage is
               temporarily degraded in some way (for example, a battery on a caching controller
               fails), it may make sense to temporarily use the "always" option until the
               situation is fixed. Some servers may also allow for sync operations to occur very
               quickly, such that the "always" option is not noticeably slower than any other
               option. In such a case, there is no downside to specifying "always".

               This was the only behavior allowed in OpenAFS releases prior to 1.4.5.

           delayed
               This causes a sync to do nothing immediately, but the sync will happen sometime in
               the background, within approximately the next 10 seconds. This works by having a
               separate thread that goes through all open file handles every 10 seconds, and it
               syncs the ones that have been marked as needing a sync. File handles flagged for
               sync may also get synced on volume detachment, according to the same behavior as
               with the "onclose" option.

               This option is currently not recommended, since in the past the code implementing
               this option has caused rare data corruption during normal operation.

               This was the only behavior allowed in OpenAFS releases starting from 1.4.5 up to
               and including 1.6.2. It was the default starting from OpenAFS 1.6.3 up to and
               including OpenAFS 1.6.7. This option will be removed in a future version of
               OpenAFS.

           onclose
               This causes a sync to do nothing immediately, but causes the relevant file to be
               flagged as potentially needing a sync. When a volume is detached, flagged volume
               metadata files are synced, as well as data files that have been accessed recently.
               Events that cause a volume to detach include: performing certain volume operations
               (restore, salvage, offline, et al), detection of volume consistency errors, a
               clean shutdown of the fileserver, or during DAFS "soft detachment".

               Effectively this option is the same as "never" while a volume is attached and
               actively being used, but if a volume is detached, there is an additional guarantee
               for the data's consistency.

               This option is the default starting with OpenAFS 1.6.8.

           never
               This causes all syncs to never do anything. This is the fastest option, with the
               weakest guarantees for data consistency.

               Depending on the underlying filesystem and Operating System, there may be
               guarantees that any data written to disk will hit the physical media after a
               certain amount of time. For example, Linux's pdflush process usually makes this
               guarantee, and ext3 can make certain various consistency guarantees according to
               the options given. ZFS on Solaris can also provide similar guarantees, as can
               various other platforms and filesystems. Consult the documentation for your
               platform if you are unsure.

           Which option you choose is not an easy decision to make. Various developers and
           experts sometimes disagree on which option is the most reasonable, and it may depend
           on the specific scenario and workload involved. Some argue that the "always" option
           does not provide significantly greater guarantees over any other option, whereas
           others argue that choosing anything besides the "always" option allows for an
           unacceptable risk of data loss. This may depend on your usage patterns, your hardware,
           your platform and filesystem, and who you talk to about this topic.
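
           The release history given for the individual values can be condensed into a lookup.
           This is a sketch based only on the version ranges stated above; the function name
           is hypothetical, and comparing release numbers as plain (major, minor, patch)
           tuples, including for the 1.5.x series, is an assumption.

```python
def sync_policy(version):
    """Return (default_mode, selectable) for a (major, minor, patch) tuple.

    selectable is False for releases where the text says only one
    behavior was allowed.
    """
    if version < (1, 4, 5):
        return ("always", False)    # only behavior before 1.4.5
    if version <= (1, 6, 2):
        return ("delayed", False)   # only behavior from 1.4.5 to 1.6.2
    if version <= (1, 6, 7):
        return ("delayed", True)    # default from 1.6.3 to 1.6.7
    return ("onclose", True)        # default starting with 1.6.8
```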

       -offline-timeout <timeout in seconds>
           Setting this option to N means that if any clients are reading from a volume when we
           want to offline that volume (for example, as part of releasing a volume), we will wait
           N seconds for the clients' requests to finish. If the clients' requests have not
           finished, we will then interrupt the client requests and send an error to those
           clients, allowing the volume to go offline.

           If a client is interrupted, from the client's point of view, it will appear as if they
           had accessed the volume after it had gone offline. For RO volumes, this means the
           client should fail-over to other valid RO sites for that volume. This option may speed
           up volume releases if volumes are being accessed by clients that have slow or
           unreliable network connections.

           Setting this option to 0 means to interrupt clients immediately if a volume is waiting
           to go offline. Setting this option to "-1" means to wait forever for client requests
           to finish. The default value is "-1".

           For the LWP fileserver, the only valid value for this option is "-1".

       -offline-shutdown-timeout <timeout in seconds>
           This option behaves similarly to -offline-timeout but applies to volumes that are
           going offline as part of the fileserver shutdown process. If the value specified is N,
           we will interrupt any clients reading from volumes once N seconds have passed since
           we first needed to wait for a volume to go offline during the shutdown process.

           Setting this option to 0 means to interrupt all clients reading from volumes
           immediately during the shutdown process. Setting this option to "-1" means to wait
           forever for client requests to finish during the shutdown process.

           If -offline-timeout is specified, the default value of -offline-shutdown-timeout is
           the value specified for -offline-timeout. Otherwise, the default value is "-1".

           For the LWP fileserver, the only valid value for this option is "-1".
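
           The defaulting rule for the two timeout options can be sketched as a small shell
           function. This is only an illustration of the rule described above, not the
           fileserver's own code; the function name effective_shutdown_timeout is hypothetical,
           and an empty argument stands for "option not given".

```shell
#!/bin/sh
# Sketch of the documented defaulting rule (hypothetical helper, not OpenAFS code).
# $1 = value given for -offline-timeout, or empty if not specified
# $2 = value given for -offline-shutdown-timeout, or empty if not specified
effective_shutdown_timeout() {
    offline_timeout=${1:-}
    shutdown_timeout=${2:-}
    if [ -n "$shutdown_timeout" ]; then
        # An explicit -offline-shutdown-timeout always wins.
        printf '%s\n' "$shutdown_timeout"
    elif [ -n "$offline_timeout" ]; then
        # Otherwise inherit the value of -offline-timeout.
        printf '%s\n' "$offline_timeout"
    else
        # Neither option given: wait forever.
        printf '%s\n' "-1"
    fi
}
```

           For example, effective_shutdown_timeout 60 "" yields 60, because only
           -offline-timeout was specified.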

EXAMPLES

       The following bos create command creates a traditional fs process on the file server
       machine "fs2.abc.com" that uses the large configuration size, and allows volumes to exceed
       their quota by 10%. Type the command on a single line:

          % bos create -server fs2.abc.com -instance fs -type fs \
                       -cmd "/usr/lib/openafs/fileserver -pctspare 10 -L" \
                       /usr/lib/openafs/volserver /usr/lib/openafs/salvager

TROUBLESHOOTING

       Sending process signals to the File Server Process can change its behavior in the
       following ways:

         Process          Signal       OS     Result
         ---------------------------------------------------------------------

         File Server      XCPU        Unix    Prints a list of client IP
                                              Addresses.

         File Server      USR2      Windows   Prints a list of client IP
                                              Addresses.

         File Server      POLL        HPUX    Prints a list of client IP
                                              Addresses.

         Any Server       TSTP        Any     Increases Debug level by a power
                                              of 5 -- 1,5,25,125, etc.
                                              This has the same effect as the
                                              -d XXX command-line option.

         Any Server       HUP         Any     Resets Debug level to 0

         File Server      TERM        Any     Run minor instrumentation over
                                              the list of descriptors.

         Other Servers    TERM        Any     Causes the process to quit.

         File Server      QUIT        Any     Causes the File Server to Quit.
                                              Bos Server knows this.

       The basic metric of whether an AFS file server is doing well is the number of connections
       waiting for a thread, which can be found by running the following command:

          % rxdebug <server> | grep waiting_for | wc -l

       Each line returned by "rxdebug" that contains the text "waiting_for" represents a
       connection that's waiting for a file server thread.

       If the blocked connection count is ever above 0, the server is having problems replying to
       clients in a timely fashion.  If it gets above roughly 10, users will notice slowness.
       The total number of connections is a mostly irrelevant number that rises essentially
       monotonically for as long as the server has been running and then goes back down to zero
       when it's restarted.
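
       The rough thresholds above could be wrapped in a small monitoring helper. The sketch
       below is an assumption-laden illustration: count_waiting and classify_blocked are
       hypothetical names, the labels "ok", "degraded", and "slow" are invented for this
       example, and the cutoffs simply restate the rule of thumb given above.

```shell
#!/bin/sh
# Count the lines on standard input that contain "waiting_for".
# "|| true" keeps the exit status zero when grep finds no matches.
count_waiting() {
    grep -c waiting_for || true
}

# Map a blocked-connection count to a rough health label (labels are hypothetical).
classify_blocked() {
    count=$1
    if [ "$count" -eq 0 ]; then
        echo "ok"
    elif [ "$count" -le 10 ]; then
        echo "degraded"     # above 0: replies are not timely
    else
        echo "slow"         # above roughly 10: users notice slowness
    fi
}
```

       A possible use is to pipe the output of rxdebug <server> into count_waiting and pass the
       resulting number to classify_blocked.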

       The most common cause of blocked connections rising on a server is some process somewhere
       performing an abnormal number of accesses to that server and its volumes.  If multiple
       servers have a blocked connection count, the most likely explanation is that there is a
       volume replicated between those servers that is absorbing an abnormally high access rate.

       To get an access count on all the volumes on a server, run:

          % vos listvol <server> -long

       and save the output in a file.  The results will resemble vos examine output repeated
       for each volume on the server.  Look for lines like:

          40065 accesses in the past day (i.e., vnode references)

       and look for volumes with an abnormally high number of accesses.  Anything over 10,000 is
       fairly high, but some volumes like root.cell and other volumes close to the root of the
       cell will have that many hits routinely.  Anything over 100,000 is generally abnormally
       high.  The count resets about once a day.
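
       Scanning the saved output by hand is tedious on a large server. The following sketch
       filters it for volumes above a chosen access count. It assumes that each volume entry
       starts with a header line of the form "name ID type ..." (with type RW, RO, or BK), as
       in the vos examine example later in this section, and that the access count line has
       the form shown above. The function name busy_volumes is hypothetical.

```shell
#!/bin/sh
# Print "count name" for every volume whose daily access count exceeds $1.
# Reads saved "vos listvol <server> -long" output on standard input.
busy_volumes() {
    threshold=$1
    awk -v limit="$threshold" '
        # Header line: volume name, numeric volume ID, volume type.
        $2 ~ /^[0-9]+$/ && ($3 == "RW" || $3 == "RO" || $3 == "BK") { name = $1 }
        # Access-count line: "<n> accesses in the past day ...".
        $2 == "accesses" && $1 + 0 > limit { print $1, name }
    '
}
```

       For example, busy_volumes 100000 < listvol.out would list only the volumes considered
       abnormally busy by the guideline above, where listvol.out is whatever file the output
       was saved in.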

       Another approach that can be used to narrow the possibilities for a replicated volume,
       when multiple servers are having trouble, is to find all replicated volumes for that
       server.  Run:

          % vos listvldb -server <server>

       to refresh the VLDB cache, where <server> is one of the servers having problems, and
       then run:

          % vos listvldb -server <server> -part <partition>

       to get a list of all volumes on that server and partition, including every other server
       with replicas.

       Once the volume causing the problem has been identified, the best way to deal with the
       problem is to move that volume to another server with a low load or to stop any runaway
       programs that are accessing that volume unnecessarily.  Often the volume name alone is
       enough to tell what's going on.

       If you still need additional information about who's hitting that server, sometimes you
       can guess at that information from the failed callbacks in the FileLog log in /var/log/afs
       on the server, or from the output of:

          % /usr/afsws/etc/rxdebug <server> -rxstats

       but the best way is to turn on debugging output from the file server.  (Warning: This
       generates a lot of output into FileLog on the AFS server.)  To do this, log on to the AFS
       server, find the PID of the fileserver process, and do:

           kill -TSTP <pid>

       where <pid> is the PID of the file server process.  This will raise the debugging level so
       that you'll start seeing what people are actually doing on the server.  You can do this up
       to three more times to get even more output if needed.  To reset the debugging level back
       to normal, use the following command (it will NOT terminate the file server):

           kill -HUP <pid>

       The debugging setting on the File Server should be reset back to normal when debugging is
       no longer needed.  Otherwise, the AFS server may well fill its disks with debugging
       output.
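
       To keep track of where the debugging level stands after several signals, the sequence
       from the signal table (1, 5, 25, 125) can be modelled as follows. This is only an
       arithmetic sketch of the documented sequence, starting from level 0; the function name
       debug_level_after is hypothetical and the code sends no signals itself.

```shell
#!/bin/sh
# Print the debug level expected after $1 TSTP signals, starting from level 0.
debug_level_after() {
    signals=$1
    level=0
    i=0
    while [ "$i" -lt "$signals" ]; do
        if [ "$level" -eq 0 ]; then
            level=1                 # first signal: 0 -> 1
        else
            level=$((level * 5))    # later signals: multiply by 5
        fi
        i=$((i + 1))
    done
    printf '%s\n' "$level"
}
```

       Under this model, one kill -TSTP followed by three more gives level 125, and a single
       kill -HUP returns the level to 0.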

       The lines of the debugging output that are most useful for debugging load problems are:

           SAFS_FetchStatus,  Fid = 2003828163.77154.82248, Host 171.64.15.76
           SRXAFS_FetchData, Fid = 2003828163.77154.82248

       (The example above is partly truncated to highlight the interesting information.)  The Fid
       identifies the volume and the vnode within the volume; the volume is the first long number.
       So, for example, this was:

          % vos examine 2003828163
          pubsw.matlab61                   2003828163 RW    1040060 K  On-line
              afssvr5.Stanford.EDU /vicepa
              RWrite 2003828163 ROnly 2003828164 Backup 2003828165
              MaxQuota    3000000 K
              Creation    Mon Aug  6 16:40:55 2001
              Last Update Tue Jul 30 19:00:25 2002
              86181 accesses in the past day (i.e., vnode references)

              RWrite: 2003828163    ROnly: 2003828164    Backup: 2003828165
              number of sites -> 3
                 server afssvr5.Stanford.EDU partition /vicepa RW Site
                 server afssvr11.Stanford.EDU partition /vicepd RO Site
                 server afssvr5.Stanford.EDU partition /vicepa RO Site

       and from the Host information one can tell what system is accessing that volume.
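
       When the debugging output is large, it can help to tally which volumes appear most
       often. The sketch below extracts the first number of each Fid from lines shaped like
       the two samples above and counts the occurrences per volume ID. It assumes the
       "Fid = volume.x.y" layout shown in those samples; the function name fid_volume_counts
       is hypothetical.

```shell
#!/bin/sh
# Read FileLog-style debugging lines on standard input and print
# "count volume-id" pairs, busiest volume first.
fid_volume_counts() {
    # Keep only the volume ID (the digits between "Fid = " and the first dot).
    sed -n 's/.*Fid = \([0-9][0-9]*\)\..*/\1/p' |
        sort |
        uniq -c |
        sort -rn |
        awk '{ print $1, $2 }'      # normalize the spacing of uniq -c output
}
```

       Each volume ID in the result can then be passed to vos examine, as in the example
       above, to find the volume name and its sites.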

       Note that the output of vos_examine(1) also includes the access count, so once the problem
       has been identified, vos examine can be used to see if the access count is still
       increasing.  Also remember that you can run vos examine on the read-only replica (e.g.,
       pubsw.matlab61.readonly) to see the access counts on the read-only replica on all of the
       servers that it's located on.

PRIVILEGE REQUIRED

       The issuer must be logged in as the superuser "root" on a file server machine to issue the
       command at a command shell prompt.  It is conventional instead to create and start the
       process by issuing the bos create command.

SEE ALSO

       BosConfig(5), FileLog(5), bos_create(8), bos_getlog(8), fs_setacl(1), msgget(2),
       msgrcv(2), salvager(8), volserver(8), vos_examine(1)

COPYRIGHT

       IBM Corporation 2000. <http://www.ibm.com/> All Rights Reserved.

       This documentation is covered by the IBM Public License Version 1.0.  It was converted
       from HTML to POD by software written by Chas Williams and Russ Allbery, based on work by
       Alf Wachsmann and Elizabeth Cassell.