Ubuntu Manpage: cciss_vol_status - show status of logical drives attached to HP Smartarray controllers

Provided by: cciss-vol-status_1.12-2_amd64

NAME

       cciss_vol_status - show status of logical drives attached to HP Smartarray controllers

SYNOPSIS

       cciss_vol_status [OPTION] [DEVICE]...

DESCRIPTION

       Shows the status of logical drives configured on HP Smartarray controllers.

OPTIONS

-p, --persnickety
Without this option, device nodes which can't be opened, or which are not found to
be of the correct device type are silently ignored. This lets you use wildcards,
e.g.: cciss_vol_status /dev/sg* /dev/cciss/c*d0, and the program will not complain
as long as all devices which are found to be of the correct type are found to be
ok. However, you may wish to explicitly list the devices you expect to be there,
and be notified if they are not there (e.g. perhaps a PCI slot has died, and the
system has rebooted, so that what was once /dev/cciss/c1d0 is no longer there at
all). This option will cause the program to complain about any device node listed
which does not appear to be the right device type, or is not openable.

-C, --copyright
If stderr is a terminal, Print out a copyright message, and exit.

-q, --quiet
This option doesn't do anything. Previously, without this option and if stderr is
a terminal, a copyright message precedes the normal program output. Now, the
copyright message is only printed via the -C option.

-s Query each physical drive for S.M.A.R.T data and report any drives in "predictive
failure" state.

-u, --try-unknown-devices
If a device has an unrecognized board ID, normally the program will not attempt to
communicate with it. In case you have some Smart Array controller which is newer
than this program, the program may not recognize it. This option permits the
program to attempt to interrogate the board even if it is unrecognized on the
assumption that it is in fact a Smart Array of some kind.

-v, --version
Print the version number and exit.

-V, --verbose
Print out more information about the controllers and physical drives. For each
controller, the board ID, number of logical drives, currently running firmware
revision and ROM firmware revision are printed. For each physical drive, the
location, vendor, model, serial number, and firmware revision are printed.

-x, --exhaustive
Deprecated. Previously, it "exhaustively" searched for logical drives, as, under
some circumstances some logical drives might otherwise be missed. This option no
longer does anything, as the algorithm for finding logical drives was changed to
obviate the need for it.

DEVICE

       The DEVICE argument indicates which RAID controller is  to  be  queried.   Note,  that  it
       indicates which RAID controller, not which logical drive.

       For  the  cciss  driver,  the  "d0"  nodes  matching "/dev/cciss/c*d0" are the nodes which
       correspond to the RAID controllers.  (See note 1, below.)  It is not necessary  to  invoke
       cciss_vol_status  on  each logical drive individually, though if you do this, each time it
       will report the status of ALL logical drives on the controller.

       For the hpsa driver, or for fibre attached MSA1000 family devices,  or  for  the  hpahcisr
       sotware  RAID  driver which emulates Smart Arrays, the RAID controller is accessed via the
       scsi generic driver, and the device nodes will match "/dev/sg*"    Some  variants  of  the
       "lsscsi"  tool  will easily identify which device node corresponds to the RAID controller.
       Some variants may only report the  SCSI  nexus  (controller/bus/target/lun  tuple.)   Some
       distros may not have the lsscsi tool.

       Executing  the  following  query  to  the  /sys  filesystem  and correlating this with the
       contents of /proc/scsi/scsi or output of lsscsi can help in finding the right /dev/sg node
       to use with cciss_vol_status:

       wumpus:/home/scameron # ls -l /sys/class/scsi_generic/*
       lrwxrwxrwx 1 root root 0 2009-11-18 12:31 /sys/class/scsi_generic/sg0 -> ../../devices/pci0000:00/0000:00:02.0/0000:02:00.0/0000:03:03.0/host0/target0:0:0/0:0:0:0/scsi_generic/sg0
       lrwxrwxrwx 1 root root 0 2009-11-18 12:31 /sys/class/scsi_generic/sg1 -> ../../devices/pci0000:00/0000:00:1f.1/host2/target2:0:0/2:0:0:0/scsi_generic/sg1
       lrwxrwxrwx 1 root root 0 2009-11-19 07:47 /sys/class/scsi_generic/sg2 -> ../../devices/pci0000:00/0000:00:05.0/0000:0e:00.0/host4/target4:3:0/4:3:0:0/scsi_generic/sg2
       wumpus:/home/scameron # cat /proc/scsi/scsi
       Attached devices:
       Host: scsi0 Channel: 00 Id: 00 Lun: 00
         Vendor: COMPAQ   Model: BD03685A24       Rev: HPB6
         Type:   Direct-Access                    ANSI  SCSI revision: 03
       Host: scsi2 Channel: 00 Id: 00 Lun: 00
         Vendor: SAMSUNG  Model: CD-ROM SC-148A   Rev: B408
         Type:   CD-ROM                           ANSI  SCSI revision: 05
       Host: scsi4 Channel: 03 Id: 00 Lun: 00
         Vendor: HP       Model: P800             Rev: 6.82
         Type:   RAID                             ANSI  SCSI revision: 00
       wumpus:/home/scameron # lsscsi
       [0:0:0:0]    disk    COMPAQ   BD03685A24       HPB6  /dev/sda
       [2:0:0:0]    cd/dvd  SAMSUNG  CD-ROM SC-148A   B408  /dev/sr0
       [4:3:0:0]    storage HP       P800             6.82  -

       From  the  above  you  can  see  that  /dev/sg2  corresponds  to SCSI nexus 4:3:0:0, which
       corresponds to the HP P800 RAID controller listed in /proc/scsi/scsi.

EXAMPLE

            [root@somehost]# cciss_vol_status -q /dev/cciss/c*d0
            /dev/cciss/c0d0: (Smart Array P800) RAID 0 Volume 0 status: OK.
            /dev/cciss/c0d0: (Smart Array P800) RAID 0 Volume 1 status: OK.
            /dev/cciss/c0d0: (Smart Array P800) RAID 1 Volume 2 status: OK.
            /dev/cciss/c0d0: (Smart Array P800) RAID 5 Volume 4 status: OK.
            /dev/cciss/c0d0: (Smart Array P800) RAID 5 Volume 5 status: OK.
            /dev/cciss/c0d0: (Smart Array P800) Enclosure MSA60 (S/N: USP6340B3F) on Bus 2, Physical Port 1E status: Power Supply Unit failed
            /dev/cciss/c1d0: (Smart Array P800) RAID 5 Volume 0 status: OK.
            /dev/cciss/c1d0: (Smart Array P800) RAID 5 Volume 1 status: OK.
            /dev/cciss/c1d0: (Smart Array P800) RAID 5 Volume 2 status: OK.
            /dev/cciss/c1d0: (Smart Array P800) RAID 5 Volume 3 status: OK.
            /dev/cciss/c1d0: (Smart Array P800) RAID 5 Volume 4 status: OK.
            /dev/cciss/c1d0: (Smart Array P800) RAID 5 Volume 5 status: OK.
            /dev/cciss/c1d0: (Smart Array P800) RAID 5 Volume 6 status: OK.
            /dev/cciss/c1d0: (Smart Array P800) RAID 5 Volume 7 status: OK.

            [root@someotherhost]# cciss_vol_status -q /dev/sg0 /dev/cciss/c*d0
            /dev/sg0: (MSA1000) RAID 1 Volume 0 status: OK.   At least one spare drive.
            /dev/sg0: (MSA1000) RAID 5 Volume 1 status: OK.
            /dev/cciss/c0d0: (Smart Array P800) RAID 0 Volume 0 status: OK.

            [root@localhost]# ./cciss_vol_status -s /dev/sg1
            /dev/sda: (Smart Array P410i) RAID 0 Volume 0 status: OK.
                  connector 1I box 1 bay 1                 HP      DG072A9BB7                               B365P6803PCP0633     HPD0 S.M.A.R.T. predictive failure.
            [root@localhost]# echo $?
            1

            [root@localhost]# ./cciss_vol_status -s /dev/cciss/c0d0
            /dev/cciss/c0d0: (Smart Array P800) RAID 0 Volume 0 status: OK.
                  connector 2E box 1 bay 8                 HP      DF300BB6C3                           3LM08AP700009713RXUT     HPD3 S.M.A.R.T. predictive failure.
            /dev/cciss/c0d0: (Smart Array P800) Enclosure MSA60 (S/N: USP6340B3F) on Bus 2, Physical Port 2E status: OK.

            [root@localhost cciss_vol_status]# ./cciss_vol_status --verbose /dev/sg0
            Controller: Smart Array P420i
              Board ID: 0x3354103c
              Logical drives: 1
              Running firmware: 3.42
              ROM firmware: 3.42
            /dev/sda: (Smart Array P420i) RAID 1 Volume 0 status: OK.
              Physical drives: 2
                  connector 1I box 2 bay 1                 HP      EG1200FCVBQ                                      KZG21NVD     HPD1 OK
                  connector 2I box 2 bay 5                 HP      EG1200FCVBQ                                      KZG20X7D     HPD1 OK
            /dev/sg0(Smart Array P420i:0): Non-Volatile Cache status:
                         Cache configured: Yes
                        Read cache memory: 81 MiB
                       Write cache memory: 735 MiB
                      Write cache enabled: Yes
               Flash backed cache present

DIAGNOSTICS

       Normally, a logical drive in good working order should report a status of "OK."   Possible
       status values are:

       "OK." (0) - The logical drive is in good working order.

       "FAILED." (1) - The logical drive has failed, and no i/o to it is poosible.
              Additionally,  failed  drives will be identified by connector, box and bay, as well
              as vendor, model, serial number, and firmware revision.

       "Using interim recovery mode." (3) - One or more drives has failed,
              but not so many that the logical drive can no longer operate.   The  failed  drives
              should be replaced as soon as possible.

       "Ready for recovery operation." (4) -  Failed drive(s) have been
              replaced, and the controller is about to begin rebuilding redundant parity data.

       "Currently recovering." (5) - Failed drive(s) have been replaced,
              and the controller is currently rebuilding redundant parity information.

       "Wrong physical drive was replaced." (6) - A drive has failed, and
              another (working) drive was replaced.

       "A physical drive is not properly connected." (7) - There is some
              cabling or backplane problem in the drive enclosure.

       (From fwspecwww.doc, see cpqarray project on sourceforge.net):
              Note:  If  the  unit_status  value is 6 (Wrong physical drive was replaced) or 7 (A
              physical drive is not properly connected), the unit_status of all other  configured
              logical  drives  will  be  marked as 1 (Logical drive failed). This is to force the
              user to correct the problem and to insure that once the problem is  corrected,  the
              data will not have been corrupted by any user action.

       "Hardware is overheating." (8) - Hardware is too hot.

       "Hardware was overheated." (9) - At some point in the past,
              the hardware got too hot.

       "Currently expannding." (10) - The controller is currently in the
              process of expanding a logical drive.

       "Not yet available." (11) - The logical drive is not yet finished
              being configured.

       "Queued for expansion." (12) - The logical drive will be expended
              when the controller is able to begin working on it.

       Additionally, the following messages may appear regarding spare drive status:

            "At least one spare drive designated"
            "At least one spare drive activated and currently rebuilding"
            "At least one activated on-line spare drive is completely rebuilt on this logical drive"
            "At least one spare drive has failed"
            "At least one spare drive activated"
            "At least one spare drive remains available"
       Active spares will be identified by connector, box and bay, as well
       as by vendor, model, serial number, and firmware revision.

       For  each  logical  drive,  the total number of failed physical drives, if more than zero,
       will be reported as:

                   "Total of n failed physical drives detected on this logical drive."

       with "n" replaced by the actual number, of course.

       "Replacement" drives -- newly inserted drives that replace a previously failed  drive  but
       are  not yet finished rebuilding -- are also identified by connector, box and bay, as well
       as by vendor, model, serial number, and firmware revision.

       If the -s option is specified, each physical drive will be queried for S.M.A.R.T data, any
       any  drives in predictive failure state will be reported, identified by connector, box and
       bay, as well as vendor, model, serial number, and firmware revision.

       Additionally failure conditions of disk enclosure fans, power  supplies,  and  temperature
       are reported as follows:

            "Fan failed"
            "Temperature problem"
            "Door alert"
            "Power Supply Unit failed"

FILES

       /dev/cciss/c*d0 (Smart Array PCI controllers using the cciss driver)
       /dev/sg*  (Fibre  attached  MSA1000 controllers and Smart Array controllers using the hpsa
       driver or hpahcisr software RAID driver.)

EXIT CODES

       0 - All configured logical drives queried have status of "OK."

       1 - One or more configured logical drives queried have status other than "OK."

BUGS

       MSA500 G1 logical drive numbers may not be reported correctly.

       I've seen enclosure serial numbers contain garbage.

       Some Smart Arrays support more than 128 physical  drives  on  a  single  RAID  controller.
       cciss_vol_status does not.

AUTHOR

       Written by Stephen M. Cameron

REPORTING BUGS

       Report bugs to <scameron@beardog.cce.hp.com>

COPYRIGHT

       Copyright © 2007 Hewlett-Packard Development Company, L.P.
       This  is  free software; see the source for copying conditions.  There is NO warranty; not
       even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

NOTE 1

       The /dev/cciss/c*d0 device nodes of the cciss driver do double duty.   They  serve  as  an
       access  point  to  both  the RAID controllers, and to the first logical drive of each RAID
       controller.  Notice that a /dev/cciss/c*d0 node will be present for each  controller  even
       if no logical drives are configured on that controller.  It might be cleaner if the driver
       had a special device node just for the controller, instead of making these device nodes do
       double  duty.   It has been like that since the 2.2 linux kernel timeframe.  At that time,
       device major and minor nodes were statically allocated at compile time, and were in  short
       supply.  Changing this behavior at this point would break lots of userland programs.