Provided by: scalapack-doc_1.5-10_all bug

NAME

       PDLAHQR  - i an auxiliary routine used to find the Schur decomposition  and or eigenvalues
       of a matrix already in Hessenberg form from  cols ILO to IHI

SYNOPSIS

       SUBROUTINE PDLAHQR( WANTT, WANTZ, N, ILO, IHI, A, DESCA, WR, WI,  ILOZ,  IHIZ,  Z,  DESCZ,
                           WORK, LWORK, IWORK, ILWORK, INFO )

           LOGICAL         WANTT, WANTZ

           INTEGER         IHI, IHIZ, ILO, ILOZ, ILWORK, INFO, LWORK, N, ROTN

           INTEGER         DESCA( * ), DESCZ( * ), IWORK( * )

           DOUBLE          PRECISION A( * ), WI( * ), WORK( * ), WR( * ), Z( * )

PURPOSE

       PDLAHQR is an auxiliary routine used to find the Schur decomposition
         and or eigenvalues of a matrix already in Hessenberg form from
         cols ILO to IHI.

       Notes
       =====

       Each  global  data  object  is described by an associated description vector.  This vector
       stores the information required to establish the mapping between an object element and its
       corresponding process and memory location.

       Let  A be a generic term for any 2D block cyclicly distributed array.  Such a global array
       has an associated description vector DESCA.  In the following comments,  the  character  _
       should be read as "of the global array".

       NOTATION        STORED IN      EXPLANATION
       ---------------   --------------   --------------------------------------  DTYPE_A(global)
       DESCA( DTYPE_ )The descriptor type.  In this case,
                                      DTYPE_A = 1.
       CTXT_A (global) DESCA( CTXT_ ) The BLACS context handle, indicating
                                      the BLACS process grid A is distribu-
                                      ted over. The context itself is glo-
                                      bal, but the handle (the integer
                                      value) may vary.
       M_A    (global) DESCA( M_ )    The number of rows in the global
                                      array A.
       N_A    (global) DESCA( N_ )    The number of columns in the global
                                      array A.
       MB_A   (global) DESCA( MB_ )   The blocking factor used to distribute
                                      the rows of the array.
       NB_A   (global) DESCA( NB_ )   The blocking factor used to distribute
                                      the columns of the array.
       RSRC_A (global) DESCA( RSRC_ ) The process row over which the first
                                      row of the array A is distributed.  CSRC_A (global)  DESCA(
       CSRC_ ) The process column over which the
                                      first column of the array A is
                                      distributed.
       LLD_A  (local)  DESCA( LLD_ )  The leading dimension of the local
                                      array.  LLD_A >= MAX(1,LOCr(M_A)).

       Let  K  be  the  number  of  rows  or columns of a distributed matrix, and assume that its
       process grid has dimension p x q.
       LOCr( K ) denotes the number of elements of K that a  process  would  receive  if  K  were
       distributed over the p processes of its process column.
       Similarly, LOCc( K ) denotes the number of elements of K that a process would receive if K
       were distributed over the q processes of its process row.
       The values of LOCr() and LOCc() may be  determined  via  a  call  to  the  ScaLAPACK  tool
       function, NUMROC:
               LOCr( M ) = NUMROC( M, MB_A, MYROW, RSRC_A, NPROW ),
               LOCc(  N  )  =  NUMROC( N, NB_A, MYCOL, CSRC_A, NPCOL ).  An upper bound for these
       quantities may be computed by:
               LOCr( M ) <= ceil( ceil(M/MB_A)/NPROW )*MB_A
               LOCc( N ) <= ceil( ceil(N/NB_A)/NPCOL )*NB_A

ARGUMENTS

       WANTT   (global input) LOGICAL
               = .TRUE. : the full Schur form T is required;
               = .FALSE.: only eigenvalues are required.

       WANTZ   (global input) LOGICAL
               = .TRUE. : the matrix of Schur vectors Z is required;
               = .FALSE.: Schur vectors are not required.

       N       (global input) INTEGER
               The order of the Hessenberg matrix A (and Z if WANTZ).  N >= 0.

       ILO     (global input) INTEGER
               IHI     (global input) INTEGER It is  assumed  that  A  is  already  upper  quasi-
               triangular  in  rows  and columns IHI+1:N, and that A(ILO,ILO-1) = 0 (unless ILO =
               1). PDLAHQR works primarily with the Hessenberg submatrix in rows and columns  ILO
               to  IHI,  but applies transformations to all of H if WANTT is .TRUE..  1 <= ILO <=
               max(1,IHI); IHI <= N.

       A       (global input/output) DOUBLE PRECISION array, dimension
               (DESCA(LLD_),*) On entry, the upper Hessenberg matrix A.  On  exit,  if  WANTT  is
               .TRUE.,  A  is upper quasi-triangular in rows and columns ILO:IHI, with any 2-by-2
               or larger diagonal blocks not yet in standard  form.  If  WANTT  is  .FALSE.,  the
               contents of A are unspecified on exit.

       DESCA   (global and local input) INTEGER array of dimension DLEN_.
               The array descriptor for the distributed matrix A.

       WR      (global replicated output) DOUBLE PRECISION array,
               dimension (N) WI      (global replicated output) DOUBLE PRECISION array, dimension
               (N) The real and imaginary parts, respectively, of the computed eigenvalues ILO to
               IHI  are stored in the corresponding elements of WR and WI. If two eigenvalues are
               computed as a complex conjugate pair, they are stored in consecutive  elements  of
               WR  and  WI, say the i-th and (i+1)th, with WI(i) > 0 and WI(i+1) < 0. If WANTT is
               .TRUE., the eigenvalues are stored in the same order as on  the  diagonal  of  the
               Schur form returned in A.  A may be returned with larger diagonal blocks until the
               next release.

       ILOZ    (global input) INTEGER
               IHIZ    (global input) INTEGER Specify the rows of Z to which transformations must
               be applied if WANTZ is .TRUE..  1 <= ILOZ <= ILO; IHI <= IHIZ <= N.

       Z       (global input/output) DOUBLE PRECISION array.
               If   WANTZ   is  .TRUE.,  on  entry  Z  must  contain  the  current  matrix  Z  of
               transformations  accumulated  by  PDHSEQR,  and  on  exit  Z  has  been   updated;
               transformations  are applied only to the submatrix Z(ILOZ:IHIZ,ILO:IHI).  If WANTZ
               is .FALSE., Z is not referenced.

       DESCZ   (global and local input) INTEGER array of dimension DLEN_.
               The array descriptor for the distributed matrix Z.

       WORK    (local output) DOUBLE PRECISION array of size LWORK
               (Unless LWORK=-1, in which case WORK must be at least size 1)

       LWORK   (local input) INTEGER
               WORK(LWORK) is a local array and LWORK is assumed big enough so that LWORK >=  3*N
               + MAX( 2*MAX(DESCZ(LLD_),DESCA(LLD_)) + 2*LOCc(N), 7*Ceil(N/HBL)/LCM(NPROW,NPCOL))
               + MAX( 2*N, (8*LCM(NPROW,NPCOL)+2)**2 ) If LWORK=-1, then WORK(1) gets set to  the
               above number and the code returns immediately.

       IWORK   (global and local input) INTEGER array of size ILWORK
               This  will  hold  some of the IBLK integer arrays.  This is held as a place holder
               for a future release.  Currently unreferenced.

       ILWORK  (local input) INTEGER
               This will hold the size of the IWORK array.  This is held as a place holder for  a
               future release.  Currently unreferenced.

       INFO    (global output) INTEGER
               < 0: parameter number -INFO incorrect or inconsistent
               = 0: successful exit
               >  0:  PDLAHQR  failed  to  compute  all  the eigenvalues ILO to IHI in a total of
               30*(IHI-ILO+1) iterations; if INFO = i, elements i+1:ihi  of  WR  and  WI  contain
               those eigenvalues which have been successfully computed.

               Logic:  This  algorithm  is  very  similar  to  _LAHQR.  Unlike _LAHQR, instead of
               sending one double shift through the largest unreduced submatrix,  this  algorithm
               sends  multiple  double  shifts  and  spaces  them  apart  so  that  there  can be
               parallelism across several processor row/columns.  Another critical difference  is
               that  this  algorithm  aggregrates  multiple transforms together in order to apply
               them in a block fashion.

               Important Local Variables: IBLK =  The  maximum  number  of  bulges  that  can  be
               computed.   Currently  fixed.   Future  releases  this won't be fixed.  HBL  = The
               square block size (HBL=DESCA(MB_)=DESCA(NB_)) ROTN = The number of  transforms  to
               block together NBULGE = The number of bulges that will be attempted on the current
               submatrix.  IBULGE = The current number of  bulges  started.   K1(*),K2(*)  =  The
               current bulge loops from K1(*) to K2(*).

               Subroutines:  From  LAPACK,  this  routine  calls: DLAHQR     -> Serial QR used to
               determine  shifts  and  eigenvalues  DLARFG      ->  Determine   the   Householder
               transforms

               This ScaLAPACK, this routine calls: PDLACONSB  -> To determine where to start each
               iteration DLAMSH     -> Sends multiple shifts through a small submatrix to see how
               the  consecutive subdiagonals change (if PDLACONSB indicates we can start a run in
               the middle) PDLAWIL    -> Given the shift, get the  transformation  DLASORTE    ->
               Pair  up  eigenvalues  so  that reals are paired.  PDLACP3    -> Parallel array to
               local replicated array copy & back.  DLAREF     -> Row/column  reflector  applier.
               Core routine here.  PDLASMSUB  -> Finds negligible subdiagonal elements.

               Current  Notes  and/or  Restrictions: 1.) This code requires the distributed block
               size to be square and at least  six  (6);  unlike  simpler  codes  like  LU,  this
               algorithm  is  extremely  sensitive  to block size.  Unwise choices of too small a
               block size can lead to bad performance.  2.) This code requires  A  and  Z  to  be
               distributed  identically and have identical contxts.  A future version may allow Z
               to have a different contxt to 1D row map it to all nodes (so no communication on Z
               is  necessary.)   3.) This release currently does not have a routine for resolving
               the Schur blocks into regular 2x2 form after this code is completed.   Because  of
               this,  a significant performance impact is required while the deflation is done by
               sometimes a single column of processors.  4.) This code does not  currently  block
               the  initial  transforms  so  that  none  of the rows or columns for any bulge are
               completed until all are started.  To offset pipeline start-up  it  is  recommended
               that  at  least  2*LCM(NPROW,NPCOL)  bulges are used (if possible) 5.) The maximum
               number of bulges currently supported is fixed at 32.  In future versions this will
               be limited only by the incoming WORK and IWORK array.  6.) The matrix A must be in
               upper Hessenberg form.   If  elements  below  the  subdiagonal  are  nonzero,  the
               resulting transforms may be nonsimilar.  This is also true with the LAPACK routine
               DLAHQR.  7.) For this release, this code has only been tested  for  RSRC_=CSRC_=0,
               but  it has been written for the general case.  8.) Currently, all the eigenvalues
               are distributed to all the nodes.  Future releases will  probably  distribute  the
               eigenvalues  by  the  column  partitioning.  9.) The internals of this routine are
               subject to change.  10.) To  optimize  this  for  your  architecture,  try  tuning
               DLAREF.   11.)  This  code  has only been tested for WANTZ = .TRUE. and may behave
               unpredictably for WANTZ set to .FALSE.

               Implemented by:  G. Henry, May 1, 1997