Ubuntu Manpage: cciss_vol_status - show status of logical drives attached to HP Smartarray controllers

Provided by: cciss-vol-status_1.11-3_amd64

NAME

       cciss_vol_status - show status of logical drives attached to HP Smartarray controllers

SYNOPSIS

       cciss_vol_status [OPTION] [DEVICE]...

DESCRIPTION

       Shows the status of logical drives configured on HP Smartarray controllers.

OPTIONS

-p, --persnickety
Without this option, device nodes which can't be opened, or which are not of the correct device
type, are silently ignored. This lets you use wildcards, e.g. cciss_vol_status /dev/sg*
/dev/cciss/c*d0, and the program will not complain as long as every device found to be of the
correct type is OK. However, you may wish to list explicitly the devices you expect to be there,
and be notified if they are missing (for example, a PCI slot has died and the system has rebooted,
so that what was once /dev/cciss/c1d0 is no longer there at all). With this option, the program
complains about any listed device node which does not appear to be the right device type or which
cannot be opened.

-C, --copyright
If stderr is a terminal, print out a copyright message and exit.

-q, --quiet
This option doesn't do anything. Previously, without this option and if stderr was a terminal, a
copyright message preceded the normal program output. Now, the copyright message is only printed
via the -C option.

-s
Query each physical drive for S.M.A.R.T. data and report any drives in "predictive failure" state.

-u, --try-unknown-devices
If a device has an unrecognized board ID, normally the program will not attempt to communicate
with it. In case you have some Smart Array controller which is newer than this program, the
program may not recognize it. This option permits the program to attempt to interrogate the board
even if it is unrecognized on the assumption that it is in fact a Smart Array of some kind.

-v, --version
Print the version number and exit.

-V, --verbose
Print out more information about the controllers and physical drives. For each controller, the
board ID, number of logical drives, currently running firmware revision and ROM firmware revision
are printed. For each physical drive, the location, vendor, model, serial number, and firmware
revision are printed.

-x, --exhaustive
Deprecated. Previously, this option "exhaustively" searched for logical drives because, under some
circumstances, some logical drives might otherwise be missed. It no longer does anything, as the
algorithm for finding logical drives was changed to obviate the need for it.
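As a rough illustration of how a monitoring script might combine the options above, the following Python sketch assembles an argument list. The helper name build_command is hypothetical, and the sketch assumes that -p, -s and -V may be given together, which the SYNOPSIS does not state explicitly.

```python
from typing import List, Sequence


def build_command(devices: Sequence[str], persnickety: bool = False,
                  smart: bool = False, verbose: bool = False) -> List[str]:
    """Assemble an argv list for cciss_vol_status (hypothetical helper)."""
    argv = ["cciss_vol_status"]
    if persnickety:
        argv.append("-p")  # complain about missing or wrong-type device nodes
    if smart:
        argv.append("-s")  # report drives in S.M.A.R.T. predictive failure
    if verbose:
        argv.append("-V")  # print controller and physical drive details
    argv.extend(devices)   # explicit device nodes, no shell wildcards
    return argv


if __name__ == "__main__":
    print(build_command(["/dev/cciss/c0d0", "/dev/cciss/c1d0"], persnickety=True))
```

Listing the expected device nodes explicitly together with -p is the case the -p description has in mind: a controller that has disappeared then produces a complaint instead of being silently skipped.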

DEVICE

       The DEVICE argument indicates which RAID controller is to be queried.  Note that it indicates which RAID
       controller, not which logical drive.

       For the cciss driver, the "d0" nodes matching "/dev/cciss/c*d0" are the nodes  which  correspond  to  the
       RAID  controllers.   (See note 1, below.)  It is not necessary to invoke cciss_vol_status on each logical
       drive individually, though if you do this, each time it will report the status of ALL logical  drives  on
       the controller.

       For the hpsa driver, or for fibre attached MSA1000 family devices, or for the hpahcisr software RAID
       driver which emulates Smart Arrays, the RAID controller is accessed via the scsi generic driver, and the
       device nodes will match "/dev/sg*".  Some variants of the "lsscsi" tool will easily identify which device
       node corresponds to the RAID controller.  Some variants may only report the SCSI nexus
       (controller/bus/target/lun tuple).  Some distros may not have the lsscsi tool.

       Executing the following query  to  the  /sys  filesystem  and  correlating  this  with  the  contents  of
       /proc/scsi/scsi  or  output  of  lsscsi  can  help  in  finding  the  right  /dev/sg  node  to  use  with
       cciss_vol_status:

       wumpus:/home/scameron # ls -l /sys/class/scsi_generic/*
       lrwxrwxrwx 1 root root 0 2009-11-18 12:31 /sys/class/scsi_generic/sg0 -> ../../devices/pci0000:00/0000:00:02.0/0000:02:00.0/0000:03:03.0/host0/target0:0:0/0:0:0:0/scsi_generic/sg0
       lrwxrwxrwx 1 root root 0 2009-11-18 12:31 /sys/class/scsi_generic/sg1 -> ../../devices/pci0000:00/0000:00:1f.1/host2/target2:0:0/2:0:0:0/scsi_generic/sg1
       lrwxrwxrwx 1 root root 0 2009-11-19 07:47 /sys/class/scsi_generic/sg2 -> ../../devices/pci0000:00/0000:00:05.0/0000:0e:00.0/host4/target4:3:0/4:3:0:0/scsi_generic/sg2
       wumpus:/home/scameron # cat /proc/scsi/scsi
       Attached devices:
       Host: scsi0 Channel: 00 Id: 00 Lun: 00
         Vendor: COMPAQ   Model: BD03685A24       Rev: HPB6
         Type:   Direct-Access                    ANSI  SCSI revision: 03
       Host: scsi2 Channel: 00 Id: 00 Lun: 00
         Vendor: SAMSUNG  Model: CD-ROM SC-148A   Rev: B408
         Type:   CD-ROM                           ANSI  SCSI revision: 05
       Host: scsi4 Channel: 03 Id: 00 Lun: 00
         Vendor: HP       Model: P800             Rev: 6.82
         Type:   RAID                             ANSI  SCSI revision: 00
       wumpus:/home/scameron # lsscsi
       [0:0:0:0]    disk    COMPAQ   BD03685A24       HPB6  /dev/sda
       [2:0:0:0]    cd/dvd  SAMSUNG  CD-ROM SC-148A   B408  /dev/sr0
       [4:3:0:0]    storage HP       P800             6.82  -

       From the above you can see that /dev/sg2 corresponds to SCSI nexus 4:3:0:0, which corresponds to  the  HP
       P800 RAID controller listed in /proc/scsi/scsi.
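The correlation above can be sketched in Python. This is only an illustration under stated assumptions: the function names are hypothetical, the symlink targets are assumed to end in <nexus>/scsi_generic/sgN as in the listing above, and /proc/scsi/scsi is assumed to use the "Host: ... Type: ..." layout shown.

```python
import re
from typing import Iterable, List

# Tail of a scsi_generic symlink target: .../<h:c:t:l>/scsi_generic/sgN
LINK_RE = re.compile(r"/(\d+:\d+:\d+:\d+)/scsi_generic/(sg\d+)$")
# "Host: scsi4 Channel: 03 Id: 00 Lun: 00" as printed in /proc/scsi/scsi
HOST_RE = re.compile(
    r"Host:\s+scsi(\d+)\s+Channel:\s+(\d+)\s+Id:\s+(\d+)\s+Lun:\s+(\d+)")
TYPE_RAID_RE = re.compile(r"Type:\s+RAID\b")


def raid_nexuses(proc_scsi_text: str) -> List[str]:
    """Return the SCSI nexus (h:c:t:l) of every entry whose Type is RAID."""
    found = []
    current = None
    for line in proc_scsi_text.splitlines():
        host = HOST_RE.search(line)
        if host:
            # Strip leading zeros so "03" compares equal to the "3" in sysfs.
            current = ":".join(str(int(part)) for part in host.groups())
        elif current is not None and TYPE_RAID_RE.search(line):
            found.append(current)
    return found


def raid_sg_nodes(link_targets: Iterable[str], proc_scsi_text: str) -> List[str]:
    """Map RAID entries from /proc/scsi/scsi onto /dev/sgN device nodes."""
    wanted = set(raid_nexuses(proc_scsi_text))
    nodes = []
    for target in link_targets:
        match = LINK_RE.search(target)
        if match and match.group(1) in wanted:
            nodes.append("/dev/" + match.group(2))
    return sorted(nodes)
```

On a live system the link targets could be gathered with os.readlink() on each /sys/class/scsi_generic/sg* entry and the text read from /proc/scsi/scsi, where that file exists.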

EXAMPLE

            [root@somehost]# cciss_vol_status -q /dev/cciss/c*d0
            /dev/cciss/c0d0: (Smart Array P800) RAID 0 Volume 0 status: OK.
            /dev/cciss/c0d0: (Smart Array P800) RAID 0 Volume 1 status: OK.
            /dev/cciss/c0d0: (Smart Array P800) RAID 1 Volume 2 status: OK.
            /dev/cciss/c0d0: (Smart Array P800) RAID 5 Volume 4 status: OK.
            /dev/cciss/c0d0: (Smart Array P800) RAID 5 Volume 5 status: OK.
            /dev/cciss/c0d0: (Smart Array P800) Enclosure MSA60 (S/N: USP6340B3F) on Bus 2, Physical Port 1E status: Power Supply Unit failed
            /dev/cciss/c1d0: (Smart Array P800) RAID 5 Volume 0 status: OK.
            /dev/cciss/c1d0: (Smart Array P800) RAID 5 Volume 1 status: OK.
            /dev/cciss/c1d0: (Smart Array P800) RAID 5 Volume 2 status: OK.
            /dev/cciss/c1d0: (Smart Array P800) RAID 5 Volume 3 status: OK.
            /dev/cciss/c1d0: (Smart Array P800) RAID 5 Volume 4 status: OK.
            /dev/cciss/c1d0: (Smart Array P800) RAID 5 Volume 5 status: OK.
            /dev/cciss/c1d0: (Smart Array P800) RAID 5 Volume 6 status: OK.
            /dev/cciss/c1d0: (Smart Array P800) RAID 5 Volume 7 status: OK.

            [root@someotherhost]# cciss_vol_status -q /dev/sg0 /dev/cciss/c*d0
            /dev/sg0: (MSA1000) RAID 1 Volume 0 status: OK.   At least one spare drive.
            /dev/sg0: (MSA1000) RAID 5 Volume 1 status: OK.
            /dev/cciss/c0d0: (Smart Array P800) RAID 0 Volume 0 status: OK.

            [root@localhost]# ./cciss_vol_status -s /dev/sg1
            /dev/sda: (Smart Array P410i) RAID 0 Volume 0 status: OK.
                  connector 1I box 1 bay 1                 HP      DG072A9BB7                               B365P6803PCP0633     HPD0 S.M.A.R.T. predictive failure.
            [root@localhost]# echo $?
            1

            [root@localhost]# ./cciss_vol_status -s /dev/cciss/c0d0
            /dev/cciss/c0d0: (Smart Array P800) RAID 0 Volume 0 status: OK.
                  connector 2E box 1 bay 8                 HP      DF300BB6C3                           3LM08AP700009713RXUT     HPD3 S.M.A.R.T. predictive failure.
            /dev/cciss/c0d0: (Smart Array P800) Enclosure MSA60 (S/N: USP6340B3F) on Bus 2, Physical Port 2E status: OK.

            [root@localhost cciss_vol_status]# ./cciss_vol_status --verbose /dev/sg0
            Controller: Smart Array P420i
              Board ID: 0x3354103c
              Logical drives: 1
              Running firmware: 3.42
              ROM firmware: 3.42
            /dev/sda: (Smart Array P420i) RAID 1 Volume 0 status: OK.
              Physical drives: 2
                  connector 1I box 2 bay 1                 HP      EG1200FCVBQ                                      KZG21NVD     HPD1 OK
                  connector 2I box 2 bay 5                 HP      EG1200FCVBQ                                      KZG20X7D     HPD1 OK
            /dev/sg0(Smart Array P420i:0): Non-Volatile Cache status:
                         Cache configured: Yes
                        Read cache memory: 81 MiB
                       Write cache memory: 735 MiB
                      Write cache enabled: Yes
               Flash backed cache present
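For scripts that post-process output like the samples above, a minimal Python sketch of a line parser follows. The regular expression is inferred only from the sample lines in this section, not from any documented output format, and the names StatusLine and parse_status_line are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Matches e.g. "/dev/cciss/c0d0: (Smart Array P800) RAID 0 Volume 0 status: OK."
LINE_RE = re.compile(
    r"^\s*(?P<device>/dev/\S+): \((?P<controller>[^)]*)\) "
    r"(?P<subject>.*?) status: (?P<status>.*)$"
)


class StatusLine(NamedTuple):
    device: str      # device node the line was reported for
    controller: str  # controller model shown in parentheses
    subject: str     # logical volume or enclosure description
    status: str      # status text, including any spare drive note

    @property
    def ok(self) -> bool:
        # Assumption: a healthy line starts its status text with "OK."
        return self.status.startswith("OK.")


def parse_status_line(line: str) -> Optional[StatusLine]:
    """Parse one status line; return None for lines of any other shape."""
    match = LINE_RE.match(line.rstrip())
    if match is None:
        return None
    return StatusLine(match["device"], match["controller"],
                      match["subject"], match["status"].strip())
```

Lines that do not fit this shape, such as the physical drive and cache lines printed by --verbose, are returned as None rather than guessed at.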

DIAGNOSTICS

       Normally, a logical drive in good working order should report a status of "OK."  Possible status values
       are:

       "OK." (0) - The logical drive is in good working order.

       "FAILED." (1) - The logical drive has failed, and no I/O to it is possible.
              Additionally, failed drives will be identified by connector, box and bay, as well as vendor,
              model, serial number, and firmware revision.

       "Using interim recovery mode." (3) - One or more drives have failed,
              but not so many that the logical drive can no longer operate.  The failed drives should be
              replaced as soon as possible.

       "Ready for recovery operation." (4) -  Failed drive(s) have been
              replaced, and the controller is about to begin rebuilding redundant parity data.

       "Currently recovering." (5) - Failed drive(s) have been replaced,
              and the controller is currently rebuilding redundant parity information.

       "Wrong physical drive was replaced." (6) - A drive has failed, and
              another (working) drive was replaced.

       "A physical drive is not properly connected." (7) - There is some
              cabling or backplane problem in the drive enclosure.

       (From fwspecwww.doc, see cpqarray project on sourceforge.net):
              Note: If the unit_status value is 6 (Wrong physical drive was replaced) or 7 (A physical drive  is
              not  properly connected), the unit_status of all other configured logical drives will be marked as
              1 (Logical drive failed). This is to force the user to correct the problem and to insure that once
              the problem is corrected, the data will not have been corrupted by any user action.

       "Hardware is overheating." (8) - Hardware is too hot.

       "Hardware was overheated." (9) - At some point in the past,
              the hardware got too hot.

       "Currently expannding." (10) - The controller is currently in the
              process of expanding a logical drive.

       "Not yet available." (11) - The logical drive is not yet finished
              being configured.

       "Queued for expansion." (12) - The logical drive will be expanded
              when the controller is able to begin working on it.
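The status values listed above can be collected into a lookup table, sketched here in Python. The strings are copied as they appear in this list; value 2 is not documented here and is therefore left out, and the names UNIT_STATUS and describe are hypothetical.

```python
# Status values and messages exactly as listed in the DIAGNOSTICS section.
UNIT_STATUS = {
    0: "OK.",
    1: "FAILED.",
    3: "Using interim recovery mode.",
    4: "Ready for recovery operation.",
    5: "Currently recovering.",
    6: "Wrong physical drive was replaced.",
    7: "A physical drive is not properly connected.",
    8: "Hardware is overheating.",
    9: "Hardware was overheated.",
    10: "Currently expannding.",  # spelling as given in the list above
    11: "Not yet available.",
    12: "Queued for expansion.",
}

# Reverse mapping, useful when only the message text is available.
CODE_BY_MESSAGE = {message: code for code, message in UNIT_STATUS.items()}


def describe(code: int) -> str:
    """Return the documented message, or a placeholder for unlisted values."""
    return UNIT_STATUS.get(code, f"Unknown status ({code})")
```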

       Additionally, the following messages may appear regarding spare drive status:

            "At least one spare drive designated"
            "At least one spare drive activated and currently rebuilding"
            "At least one activated on-line spare drive is completely rebuilt on this logical drive"
            "At least one spare drive has failed"
            "At least one spare drive activated"
            "At least one spare drive remains available"
       Active spares will be identified by connector, box and bay, as well
       as by vendor, model, serial number, and firmware revision.

       For each logical drive, the total number of failed physical drives, if more than zero, will  be  reported
       as:

                   "Total of n failed physical drives detected on this logical drive."

       with "n" replaced by the actual number, of course.

       "Replacement"  drives  --  newly  inserted  drives that replace a previously failed drive but are not yet
       finished rebuilding -- are also identified by connector, box and bay, as well as by vendor, model, serial
       number, and firmware revision.

       If the -s option is specified, each physical drive will be queried for S.M.A.R.T. data, and any drives in
       predictive failure state will be reported, identified by connector, box and bay, as well as vendor,
       model, serial number, and firmware revision.
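The physical drive lines shown in the EXAMPLE section can be picked apart with a sketch like the following. It assumes the column order connector, box, bay, vendor, model, serial number, firmware revision, state, and that none of vendor, model, serial number or firmware revision contains whitespace; both assumptions come from the sample lines only. The function name predictive_failures is hypothetical.

```python
import re
from typing import Dict, List

# Matches e.g. "connector 1I box 1 bay 1   HP   DG072A9BB7   B365...   HPD0 OK"
DRIVE_RE = re.compile(
    r"^\s*connector (?P<connector>\S+) box (?P<box>\d+) bay (?P<bay>\d+)\s+"
    r"(?P<vendor>\S+)\s+(?P<model>\S+)\s+(?P<serial>\S+)\s+(?P<firmware>\S+)\s+"
    r"(?P<state>.+)$"
)


def predictive_failures(output: str) -> List[Dict[str, str]]:
    """Return one dict per drive line whose state mentions predictive failure."""
    drives = []
    for line in output.splitlines():
        match = DRIVE_RE.match(line.rstrip())
        if match and "predictive failure" in match["state"]:
            drives.append(match.groupdict())
    return drives
```

Each returned dict carries the connector, box and bay needed to locate the drive physically, along with its serial number.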

       Additionally, failure conditions of disk enclosure fans, power supplies, and temperature are reported as
       follows:

            "Fan failed"
            "Temperature problem"
            "Door alert"
            "Power Supply Unit failed"

FILES

       /dev/cciss/c*d0 (Smart Array PCI controllers using the cciss driver)
       /dev/sg*  (Fibre  attached  MSA1000  controllers  and  Smart  Array  controllers using the hpsa driver or
       hpahcisr software RAID driver.)

EXIT CODES

       0 - All configured logical drives queried have status of "OK."

       1 - One or more configured logical drives queried have status other than "OK."
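A monitoring wrapper can rely on these exit codes instead of parsing text. The Python sketch below is an illustration only: the function name check_arrays is hypothetical, and the run parameter exists purely so the command runner can be replaced in tests. Note that the EXAMPLE section also shows an exit status of 1 after a S.M.A.R.T. predictive failure was reported with -s, so a non-zero status is treated here simply as "needs attention".

```python
import subprocess
from typing import Callable, Sequence, Tuple


def check_arrays(devices: Sequence[str],
                 run: Callable = subprocess.run) -> Tuple[bool, str]:
    """Run cciss_vol_status on the given device nodes.

    Returns (healthy, output): healthy is True only for exit status 0.
    """
    completed = run(["cciss_vol_status", *devices],
                    capture_output=True, text=True)
    healthy = completed.returncode == 0
    return healthy, completed.stdout
```

A caller might invoke check_arrays(["/dev/cciss/c0d0"]) from a periodic job and raise an alert, with the captured output attached, whenever the first element of the result is False.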

BUGS

       MSA500 G1 logical drive numbers may not be reported correctly.

       I've seen enclosure serial numbers contain garbage.

       Some Smart Arrays support more than 128 physical drives on a single  RAID  controller.   cciss_vol_status
       does not.

AUTHOR

       Written by Stephen M. Cameron

REPORTING BUGS

       Report bugs to <scameron@beardog.cce.hp.com>

COPYRIGHT

       Copyright © 2007 Hewlett-Packard Development Company, L.P.
       This  is  free  software;  see  the  source  for  copying conditions.  There is NO warranty; not even for
       MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

NOTE 1

       The /dev/cciss/c*d0 device nodes of the cciss driver do double duty.  They serve as an  access  point  to
       both  the  RAID  controllers,  and  to  the  first  logical drive of each RAID controller.  Notice that a
       /dev/cciss/c*d0 node will be present for each controller even if no logical drives are configured on that
       controller.  It might be cleaner if the driver had a special device node just for the controller, instead
       of making these device nodes do double duty.  It has been like that since the 2.2 Linux kernel timeframe.
       At that time, device major and minor nodes were statically allocated at compile time, and were  in  short
       supply.  Changing this behavior at this point would break lots of userland programs.

cciss_vol_status (ccissutils)                       May 2013                                 CCISS_VOL_STATUS(8)