Provided by: scalapack-doc_1.5-11_all

NAME

       PDLACONSB - look for two consecutive small subdiagonal elements by seeing the effect of
       starting a double shift QR iteration given by H44, H33, and H43H34, and seeing whether
       this would make a subdiagonal negligible

SYNOPSIS

       SUBROUTINE PDLACONSB( A, DESCA, I, L, M, H44, H33, H43H34, BUF, LWORK )

           INTEGER            I, L, LWORK, M

           DOUBLE PRECISION   H33, H43H34, H44

           INTEGER            DESCA( * )

           DOUBLE PRECISION   A( * ), BUF( * )

PURPOSE

       PDLACONSB looks for two consecutive small subdiagonal elements by
          seeing the effect of starting a double shift QR iteration
          given by H44, H33, and H43H34, and seeing whether this would
          make a subdiagonal negligible.

       Notes
       =====

       Each  global  data  object  is described by an associated description vector.  This vector
       stores the information required to establish the mapping between an object element and its
       corresponding process and memory location.

       Let  A be a generic term for any 2D block cyclically distributed array.  Such a global array
       has an associated description vector DESCA.  In the following comments,  the  character  _
       should be read as "of the global array".

       NOTATION        STORED IN       EXPLANATION
       --------------- --------------- --------------------------------------
       DTYPE_A(global) DESCA( DTYPE_ ) The descriptor type.  In this case,
                                       DTYPE_A = 1.
       CTXT_A (global) DESCA( CTXT_ )  The BLACS context handle, indicating
                                       the BLACS process grid A is distributed
                                       over. The context itself is global,
                                       but the handle (the integer value)
                                       may vary.
       M_A    (global) DESCA( M_ )     The number of rows in the global
                                       array A.
       N_A    (global) DESCA( N_ )     The number of columns in the global
                                       array A.
       MB_A   (global) DESCA( MB_ )    The blocking factor used to distribute
                                       the rows of the array.
       NB_A   (global) DESCA( NB_ )    The blocking factor used to distribute
                                       the columns of the array.
       RSRC_A (global) DESCA( RSRC_ )  The process row over which the first
                                       row of the array A is distributed.
       CSRC_A (global) DESCA( CSRC_ )  The process column over which the
                                       first column of the array A is
                                       distributed.
       LLD_A  (local)  DESCA( LLD_ )   The leading dimension of the local
                                       array.  LLD_A >= MAX(1,LOCr(M_A)).
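
       The descriptor layout and the element-to-process mapping it encodes can be
       sketched as follows.  This is an illustrative Python sketch, not part of
       ScaLAPACK: the helper names make_desc and owner are hypothetical, the slots
       are 0-based here (Fortran uses DESCA(1) through DESCA(9)), and the owner
       computation assumes the standard 2D block cyclic rule.

```python
# Descriptor slots in the order used by ScaLAPACK (0-based in this sketch;
# the Fortran constants are DTYPE_=1, CTXT_=2, ..., LLD_=9, DLEN_=9).
DTYPE_, CTXT_, M_, N_, MB_, NB_, RSRC_, CSRC_, LLD_ = range(9)
DLEN_ = 9


def make_desc(ctxt, m, n, mb, nb, rsrc, csrc, lld):
    """Build a descriptor list for a dense block cyclic matrix (hypothetical helper)."""
    desc = [0] * DLEN_
    desc[DTYPE_] = 1          # dense matrix descriptor type
    desc[CTXT_] = ctxt        # BLACS context handle
    desc[M_], desc[N_] = m, n
    desc[MB_], desc[NB_] = mb, nb
    desc[RSRC_], desc[CSRC_] = rsrc, csrc
    desc[LLD_] = lld
    return desc


def owner(desc, i, j, nprow, npcol):
    """Return (process row, process column) owning global entry (i, j), 1-based."""
    prow = (desc[RSRC_] + (i - 1) // desc[MB_]) % nprow
    pcol = (desc[CSRC_] + (j - 1) // desc[NB_]) % npcol
    return prow, pcol


if __name__ == "__main__":
    desc = make_desc(ctxt=0, m=8, n=8, mb=2, nb=2, rsrc=0, csrc=0, lld=4)
    # H(2,2) and H(3,2) sit on either side of a block border on a 2 x 2 grid.
    print(owner(desc, 2, 2, 2, 2), owner(desc, 3, 2, 2, 2))
```

       With 2 x 2 blocks on a 2 x 2 grid, the diagonal entry (2,2) and the
       subdiagonal entry (3,2) land on different process rows, which is the kind
       of block border the routine has to communicate across.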

       Let  K  be  the  number  of  rows  or columns of a distributed matrix, and assume that its
       process grid has dimension p x q.
       LOCr( K ) denotes the number of elements of K that a  process  would  receive  if  K  were
       distributed over the p processes of its process column.
       Similarly, LOCc( K ) denotes the number of elements of K that a process would receive if K
       were distributed over the q processes of its process row.
       The values of LOCr() and LOCc() may be  determined  via  a  call  to  the  ScaLAPACK  tool
       function, NUMROC:
               LOCr( M ) = NUMROC( M, MB_A, MYROW, RSRC_A, NPROW ),
               LOCc( N ) = NUMROC( N, NB_A, MYCOL, CSRC_A, NPCOL ).
       An upper bound for these quantities may be computed by:
               LOCr( M ) <= ceil( ceil(M/MB_A)/NPROW )*MB_A
               LOCc( N ) <= ceil( ceil(N/NB_A)/NPCOL )*NB_A
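
       For readers who want to check local sizes by hand, the block cyclic count
       that NUMROC performs can be sketched in Python as below.  This is a port
       written for illustration under the usual block cyclic rules; the function
       name numroc and its argument names mirror the Fortran tool function but
       are otherwise this sketch's own.

```python
def numroc(n, nb, iproc, isrcproc, nprocs):
    """Number of rows or columns of an n-long dimension, distributed in blocks
    of nb over nprocs processes, that process iproc receives.  isrcproc is the
    process holding the first block."""
    mydist = (nprocs + iproc - isrcproc) % nprocs   # distance from source process
    nblocks = n // nb                               # number of whole blocks
    count = (nblocks // nprocs) * nb                # whole rounds every process gets
    extrablks = nblocks % nprocs                    # leftover whole blocks
    if mydist < extrablks:
        count += nb                                 # one extra whole block
    elif mydist == extrablks:
        count += n % nb                             # the final partial block
    return count


def upper_bound(n, nb, nprocs):
    """ceil( ceil(n/nb)/nprocs )*nb, using integer ceiling division."""
    blocks = -(-n // nb)
    return -(-blocks // nprocs) * nb


if __name__ == "__main__":
    # 10 rows in blocks of 3 over 2 process rows.
    print([numroc(10, 3, p, 0, 2) for p in range(2)], upper_bound(10, 3, 2))
```

       Summing the result over all processes in a grid dimension recovers the
       global extent, and no single process exceeds the stated upper bound.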

ARGUMENTS

       A       (global input) DOUBLE PRECISION array, dimension (DESCA(LLD_),*)
               On entry, the Hessenberg matrix whose tridiagonal part is being scanned.
               Unchanged on exit.

       DESCA   (global and local input) INTEGER array of dimension DLEN_.
               The array descriptor for the distributed matrix A.

       I       (global input) INTEGER
               The  global  location of the bottom of the unreduced submatrix of A.  Unchanged on
               exit.

       L       (global input) INTEGER
               The global location of the top of the unreduced  submatrix  of  A.   Unchanged  on
               exit.

       M       (global output) INTEGER
               On exit, the starting location of the QR double shift.  This will
               satisfy: L <= M <= I-2.

       H44
       H33
       H43H34  (global input) DOUBLE PRECISION
               These three values are for the double shift QR iteration.

       BUF     (local output) DOUBLE PRECISION array of size LWORK.

       LWORK   (global input) INTEGER
               The size of the work buffer BUF.  This must be at least
               7*Ceil( Ceil( (I-L)/HBL ) / LCM(NPROW,NPCOL) ).  Here LCM is the least
               common multiple, and NPROW x NPCOL is the logical grid size.
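
               The workspace bound above can be evaluated with a few lines of Python.
               This is only a sketch: the helper name min_lwork is hypothetical, HBL
               is assumed to be the blocking factor of A, and the inner division is
               taken as an exact ratio rounded up, which this page does not spell out.

```python
from math import gcd


def ceil_div(a, b):
    """Integer ceiling of a / b for b > 0."""
    return -(-a // b)


def min_lwork(i, l, hbl, nprow, npcol):
    """Evaluate 7*Ceil( Ceil( (I-L)/HBL ) / LCM(NPROW,NPCOL) )."""
    lcm = nprow * npcol // gcd(nprow, npcol)   # least common multiple of the grid sizes
    blocks = ceil_div(i - l, hbl)              # blocks spanned by rows L..I
    return 7 * ceil_div(blocks, lcm)


if __name__ == "__main__":
    # Rows 1..100 in blocks of 8 on a 2 x 3 process grid.
    print(min_lwork(100, 1, 8, 2, 3))
```

               Allocating BUF slightly above this value costs little and avoids
               depending on the rounding assumption made in the sketch.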

               Logic:
               ======

               Two consecutive small subdiagonal elements will stall convergence of a double
               shift if their product is relatively small, even if neither is very small on
               its own.  Thus it is necessary to scan the "tridiagonal portion of the
               matrix."  In the LAPACK algorithm DLAHQR, a loop over M goes from I-2 down
               to L and examines H(m,m), H(m+1,m+1), H(m+1,m), H(m,m+1), H(m-1,m-1),
               H(m,m-1), and H(m+2,m+1).  Since these elements may be on separate
               processors, the first major loop (10) goes over the tridiagonal and has each
               node store whichever of the 7 values it holds that the node owning H(m,m)
               does not.  This occurs only on a block border and can happen in no more than
               3 locations per block, assuming square blocks.  Each node stores these values
               in 5 buffers: a buffer to send diagonally down and right, a buffer to send
               up, a buffer to send left, a buffer to send diagonally up and left, and a
               buffer to send right.  All of these are actually stored in the single buffer
               BUF, where BUF(ISTR1+1) starts the first buffer, BUF(ISTR2+1) starts the
               second, etc.  After the values are stored, any values that a node needs are
               sent and received.  Then the next major loop passes over the data and
               searches for two consecutive small subdiagonals.
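
               The search itself can be pictured with a serial Python sketch modelled on
               the loop in older releases of DLAHQR.  It is not the distributed code of
               this routine: the function name find_double_shift_start is hypothetical,
               the whole matrix is assumed to be available locally as a list of rows, the
               unit roundoff is taken from sys.float_info.epsilon, and I >= L+2 with
               nonzero subdiagonals is assumed.

```python
import sys


def find_double_shift_start(h, l, i, h44, h33, h43h34):
    """Scan m = i-2 down to l (1-based global indices) of the upper Hessenberg
    matrix h and return the row m at which the double shift can start."""
    ulp = sys.float_info.epsilon            # assumed stand-in for the machine precision

    def H(r, c):
        return h[r - 1][c - 1]              # 1-based access into a 0-based list

    for m in range(i - 2, l - 1, -1):
        h11, h22 = H(m, m), H(m + 1, m + 1)
        h21, h12 = H(m + 1, m), H(m, m + 1)
        h44s = h44 - h11
        h33s = h33 - h11
        # First column of the double shift polynomial applied at row m.
        v1 = (h33s * h44s - h43h34) / h21 + h12
        v2 = h22 - h11 - h33s - h44s
        v3 = H(m + 2, m + 1)
        s = abs(v1) + abs(v2) + abs(v3)
        v1, v2, v3 = v1 / s, v2 / s, v3 / s
        if m == l:
            break                           # reached the top of the active block
        h00, h10 = H(m - 1, m - 1), H(m, m - 1)
        tst1 = abs(v1) * (abs(h00) + abs(h11) + abs(h22))
        if abs(h10) * (abs(v2) + abs(v3)) <= ulp * tst1:
            break                           # subdiagonal h10 would become negligible
    return m


if __name__ == "__main__":
    n = 5
    h = [[0.0] * n for _ in range(n)]
    for k in range(n):
        h[k][k] = float(k + 1)
        if k + 1 < n:
            h[k + 1][k] = 1.0               # subdiagonal
            h[k][k + 1] = 1.0               # superdiagonal
    print(find_double_shift_start(h, 1, n, h[4][4], h[3][3], h[4][3] * h[3][4]))
```

               In the distributed routine the same seven entries are gathered through
               the border buffers described above before the test is applied.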

               Notes:

               This routine does a global maximum and must be called by all processes.

               Implemented by:  G. Henry, November 17, 1996