bionic (3) pslahqr.3.gz

Provided by: scalapack-doc_1.5-11_all

NAME

       PSLAHQR - is an auxiliary routine used to find the Schur decomposition and/or eigenvalues of a matrix
       already in Hessenberg form from columns ILO to IHI

SYNOPSIS

       SUBROUTINE PSLAHQR( WANTT, WANTZ, N, ILO, IHI, A, DESCA, WR, WI,  ILOZ,  IHIZ,  Z,  DESCZ,  WORK,  LWORK,
                           IWORK, ILWORK, INFO )

           LOGICAL         WANTT, WANTZ

           INTEGER         IHI, IHIZ, ILO, ILOZ, ILWORK, INFO, LWORK, N, ROTN

           INTEGER         DESCA( * ), DESCZ( * ), IWORK( * )

           REAL            A( * ), WI( * ), WORK( * ), WR( * ), Z( * )

PURPOSE

       PSLAHQR is an auxiliary routine used to find the Schur decomposition
         and/or eigenvalues of a matrix already in Hessenberg form from
         columns ILO to IHI.

       Notes
       =====

       Each  global  data  object  is  described  by  an  associated description vector.  This vector stores the
       information required to establish the mapping between an object element and its corresponding process and
       memory location.

       Let A be a generic term for any 2D block cyclically distributed array. Such a global array has an
       associated description vector DESCA. In the following comments, the character _ should be read as
       "of the global array".

       NOTATION         STORED IN        EXPLANATION
       ---------------- ---------------- -------------------------------------
       DTYPE_A (global) DESCA( DTYPE_ )  The descriptor type. In this case,
                                         DTYPE_A = 1.
       CTXT_A  (global) DESCA( CTXT_ )   The BLACS context handle, indicating
                                         the BLACS process grid A is
                                         distributed over. The context itself
                                         is global, but the handle (the
                                         integer value) may vary.
       M_A     (global) DESCA( M_ )      The number of rows in the global
                                         array A.
       N_A     (global) DESCA( N_ )      The number of columns in the global
                                         array A.
       MB_A    (global) DESCA( MB_ )     The blocking factor used to distribute
                                         the rows of the array.
       NB_A    (global) DESCA( NB_ )     The blocking factor used to distribute
                                         the columns of the array.
       RSRC_A  (global) DESCA( RSRC_ )   The process row over which the first
                                         row of the array A is distributed.
       CSRC_A  (global) DESCA( CSRC_ )   The process column over which the
                                         first column of the array A is
                                         distributed.
       LLD_A   (local)  DESCA( LLD_ )    The leading dimension of the local
                                         array. LLD_A >= MAX(1,LOCr(M_A)).
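       To make the element-to-process mapping concrete, here is a minimal Python sketch that ports the
       index arithmetic of the ScaLAPACK tool functions INDXG2P and INDXG2L and uses it to locate a global
       entry A(i,j). It assumes 1-based global indices, the usual 9-entry descriptor order (DTYPE_, CTXT_,
       M_, N_, MB_, NB_, RSRC_, CSRC_, LLD_) and column-major local storage. The helper names g2p, g2l and
       locate are hypothetical and are not part of ScaLAPACK.

```python
# 0-based Python offsets into a 9-entry descriptor (assumed standard layout).
DTYPE_, CTXT_, M_, N_, MB_, NB_, RSRC_, CSRC_, LLD_ = range(9)


def g2p(ig, nb, isrc, nprocs):
    """Process coordinate that owns 1-based global index ig (INDXG2P arithmetic)."""
    return (isrc + (ig - 1) // nb) % nprocs


def g2l(ig, nb, nprocs):
    """1-based local index of global index ig on its owner (INDXG2L arithmetic)."""
    return nb * ((ig - 1) // (nb * nprocs)) + (ig - 1) % nb + 1


def locate(i, j, desc, nprow, npcol):
    """Return (prow, pcol, local_row, local_col, offset) for global entry A(i,j).

    offset is the 1-based position of the entry in the owner's local array A(*),
    assuming column-major storage with leading dimension LLD_.
    """
    prow = g2p(i, desc[MB_], desc[RSRC_], nprow)
    pcol = g2p(j, desc[NB_], desc[CSRC_], npcol)
    li = g2l(i, desc[MB_], nprow)
    lj = g2l(j, desc[NB_], npcol)
    offset = (lj - 1) * desc[LLD_] + li
    return prow, pcol, li, lj, offset
```

       For an 8 x 8 matrix with MB_A = NB_A = 2, RSRC_A = CSRC_A = 0 and LLD_A = 4 on a 2 x 2 grid,
       A(5,6) is owned by process (0,0), where it is local entry (3,4), i.e. element 15 of the local array.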

       Let  K  be  the  number  of rows or columns of a distributed matrix, and assume that its process grid has
       dimension p x q.
       LOCr( K ) denotes the number of elements of K that a process would receive if K were distributed over the
       p processes of its process column.
       Similarly,  LOCc(  K  )  denotes  the  number  of  elements  of  K that a process would receive if K were
       distributed over the q processes of its process row.
       The values of LOCr() and LOCc() may be determined via a call to the ScaLAPACK tool function, NUMROC:
               LOCr( M ) = NUMROC( M, MB_A, MYROW, RSRC_A, NPROW ),
               LOCc( N ) = NUMROC( N, NB_A, MYCOL, CSRC_A, NPCOL ).
       An upper bound for these quantities may be computed by:
               LOCr( M ) <= ceil( ceil(M/MB_A)/NPROW )*MB_A
               LOCc( N ) <= ceil( ceil(N/NB_A)/NPCOL )*NB_A
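       The NUMROC count and the upper bound above can be reproduced in a few lines. The following Python
       sketch mirrors the integer arithmetic of the Fortran tool function (it is an illustration, not a
       ScaLAPACK API); the bound uses integer ceilings to avoid floating-point rounding. The name
       loc_upper_bound is hypothetical.

```python
def numroc(n, nb, iproc, isrcproc, nprocs):
    """Number of rows (or columns) of an n-long dimension owned by process iproc.

    Port of the integer arithmetic in ScaLAPACK's NUMROC; argument names mirror it.
    """
    mydist = (nprocs + iproc - isrcproc) % nprocs  # distance from the source process
    nblocks = n // nb                               # number of complete blocks
    count = (nblocks // nprocs) * nb                # complete blocks every process gets
    extrablks = nblocks % nprocs                    # complete blocks left over
    if mydist < extrablks:
        count += nb                                 # one extra complete block
    elif mydist == extrablks:
        count += n % nb                             # the trailing partial block
    return count


def loc_upper_bound(n, nb, nprocs):
    """ceil( ceil(n/nb)/nprocs )*nb, computed with integer arithmetic."""
    blocks = -(-n // nb)              # ceil(n / nb)
    return -(-blocks // nprocs) * nb  # ceil(blocks / nprocs) * nb
```

       With M = 10, MB_A = 3 and NPROW = 2, the two process rows hold 6 and 4 rows respectively, and the
       upper bound evaluates to 6.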

ARGUMENTS

       WANTT   (global input) LOGICAL
               = .TRUE. : the full Schur form T is required;
               = .FALSE.: only eigenvalues are required.

       WANTZ   (global input) LOGICAL
               = .TRUE. : the matrix of Schur vectors Z is required;
               = .FALSE.: Schur vectors are not required.

       N       (global input) INTEGER
               The order of the Hessenberg matrix A (and Z if WANTZ).  N >= 0.

       ILO     (global input) INTEGER
       IHI     (global input) INTEGER
               It is assumed that A is already upper quasi-triangular in rows and columns IHI+1:N, and that
               A(ILO,ILO-1) = 0 (unless ILO = 1). PSLAHQR works primarily with the Hessenberg submatrix in
               rows and columns ILO to IHI, but applies transformations to all of A if WANTT is .TRUE..
               1 <= ILO <= max(1,IHI); IHI <= N.
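               Because a non-Hessenberg input is not diagnosed (see restriction 6 below), a caller may want
               to check the input assumptions before the call. This is a minimal serial sketch, assuming
               the matrix is available as a dense list of rows; the helper check_hessenberg_input and its
               tol parameter are hypothetical, and the quasi-triangular structure of rows IHI+1:N is not
               checked.

```python
def check_hessenberg_input(a, ilo, ihi, tol=0.0):
    """Return a list of violated input assumptions (empty if none were found).

    a is a dense n-by-n matrix given as a list of rows; ilo and ihi are 1-based.
    """
    n = len(a)
    problems = []
    if not (1 <= ilo <= max(1, ihi)) or ihi > n:
        problems.append("need 1 <= ILO <= max(1,IHI) and IHI <= N")
        return problems
    for i in range(n):
        for j in range(i - 1):  # columns strictly left of the subdiagonal in row i
            if abs(a[i][j]) > tol:
                problems.append(f"A({i + 1},{j + 1}) is below the subdiagonal but nonzero")
    if ilo > 1 and abs(a[ilo - 1][ilo - 2]) > tol:
        problems.append(f"A({ilo},{ilo - 1}) must be zero when ILO > 1")
    return problems
```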

       A       (global input/output) REAL array, dimension (DESCA(LLD_),*)
               On entry, the upper Hessenberg matrix A. On exit, if WANTT is .TRUE., A is upper
               quasi-triangular in rows and columns ILO:IHI, with any 2-by-2 or larger diagonal blocks not
               yet in standard form. If WANTT is .FALSE., the contents of A are unspecified on exit.

       DESCA   (global and local input) INTEGER array of dimension DLEN_.
               The array descriptor for the distributed matrix A.

       WR      (global replicated output) REAL array, dimension (N)
       WI      (global replicated output) REAL array, dimension (N)
               The real and imaginary parts, respectively, of the computed eigenvalues ILO to IHI are
               stored in the corresponding elements of WR and WI. If two eigenvalues are computed as a
               complex conjugate pair, they are stored in consecutive elements of WR and WI, say the i-th
               and (i+1)-th, with WI(i) > 0 and WI(i+1) < 0. If WANTT is .TRUE., the eigenvalues are stored
               in the same order as on the diagonal of the Schur form returned in A. A may be returned with
               larger diagonal blocks until the next release.
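               The pairing convention for WR and WI can be decoded as sketched below, assuming WR and WI
               have been copied into Python lists (1-based positions ILO..IHI map to list indices
               ILO-1..IHI-1). The function name eigenvalues_from_wr_wi is hypothetical; it also rejects
               data that breaks the documented WI(i) > 0, WI(i+1) < 0 rule.

```python
def eigenvalues_from_wr_wi(wr, wi, ilo, ihi):
    """Combine WR/WI entries ILO..IHI (1-based) into a list of Python complex numbers."""
    eigs = []
    i = ilo
    while i <= ihi:
        re, im = wr[i - 1], wi[i - 1]
        if im == 0.0:                      # a real eigenvalue occupies one slot
            eigs.append(complex(re, 0.0))
            i += 1
            continue
        # A complex pair must start with WI(i) > 0 and continue with WI(i+1) < 0.
        if i == ihi or im < 0.0 or wi[i] >= 0.0:
            raise ValueError(f"WI({i}) does not start a conjugate pair")
        eigs.append(complex(re, im))
        eigs.append(complex(wr[i], wi[i]))
        i += 2
    return eigs
```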

       ILOZ    (global input) INTEGER
       IHIZ    (global input) INTEGER
               Specify the rows of Z to which transformations must be applied if WANTZ is .TRUE..
               1 <= ILOZ <= ILO; IHI <= IHIZ <= N.

       Z       (global input/output) REAL array.
               If WANTZ is .TRUE., on entry Z must contain the current matrix Z of  transformations  accumulated
               by  PDHSEQR,  and  on  exit Z has been updated; transformations are applied only to the submatrix
               Z(ILOZ:IHIZ,ILO:IHI).  If WANTZ is .FALSE., Z is not referenced.
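               The index constraints on ILO, IHI, ILOZ and IHIZ, and the block of Z they select, can be
               summarized as follows. This is an illustrative sketch only: z_update_block is a
               hypothetical helper that restates the documented ranges; it does not reproduce whatever
               argument checking PSLAHQR performs internally.

```python
def z_update_block(n, ilo, ihi, iloz, ihiz):
    """Validate the index arguments and describe the block Z(ILOZ:IHIZ, ILO:IHI).

    Returns ((first_row, last_row), (first_col, last_col), entry_count), all 1-based.
    """
    if not (1 <= ilo <= max(1, ihi) and ihi <= n):
        raise ValueError("need 1 <= ILO <= max(1,IHI) and IHI <= N")
    if not (1 <= iloz <= ilo and ihi <= ihiz <= n):
        raise ValueError("need 1 <= ILOZ <= ILO and IHI <= IHIZ <= N")
    rows = (iloz, ihiz)
    cols = (ilo, ihi)
    count = (ihiz - iloz + 1) * (ihi - ilo + 1)  # entries of Z that may be updated
    return rows, cols, count
```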

       DESCZ   (global and local input) INTEGER array of dimension DLEN_.
               The array descriptor for the distributed matrix Z.

       WORK    (local output) REAL array of size LWORK
               (Unless LWORK=-1, in which case WORK must be at least size 1)

       LWORK   (local input) INTEGER
               WORK(LWORK) is a local array and LWORK is assumed big enough that
                   LWORK >= 3*N + MAX( 2*MAX(DESCZ(LLD_),DESCA(LLD_)) + 2*LOCc(N),
                                       7*Ceil(N/HBL)/LCM(NPROW,NPCOL) )
                                + MAX( 2*N, (8*LCM(NPROW,NPCOL)+2)**2 ),
               where HBL is the square block size DESCA(MB_) = DESCA(NB_). If LWORK=-1, then WORK(1) is
               set to the above number and the code returns immediately.
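               The bound above can be evaluated ahead of time as sketched below. The sketch assumes that
               7*Ceil(N/HBL)/LCM(NPROW,NPCOL) is a Fortran integer division and that the caller already
               knows LOCc(N) on this process; the function name pslahqr_lwork is hypothetical. When in
               doubt, the workspace query with LWORK=-1 described above is the authoritative source.

```python
from math import gcd


def pslahqr_lwork(n, lld_a, lld_z, locc_n, hbl, nprow, npcol):
    """Evaluate the documented lower bound on LWORK for one process."""
    lcm = nprow * npcol // gcd(nprow, npcol)   # LCM(NPROW,NPCOL)
    ceil_n_hbl = -(-n // hbl)                  # Ceil(N/HBL)
    term1 = max(2 * max(lld_z, lld_a) + 2 * locc_n, 7 * ceil_n_hbl // lcm)
    term2 = max(2 * n, (8 * lcm + 2) ** 2)
    return 3 * n + term1 + term2
```

               For N = 1000, HBL = 50 on a 2 x 2 grid with LLD_ = LOCc(N) = 500, this gives
               3000 + 2000 + 2000 = 7000.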

       IWORK   (global and local input) INTEGER array of size ILWORK
               This  will  hold  some  of  the IBLK integer arrays.  This is held as a place holder for a future
               release.  Currently unreferenced.

       ILWORK  (local input) INTEGER
               This will hold the size of the IWORK array.  This is held as a place holder for a future release.
               Currently unreferenced.

       INFO    (global output) INTEGER
               < 0: parameter number -INFO incorrect or inconsistent
               = 0: successful exit
               > 0: PSLAHQR failed to compute all the eigenvalues ILO to IHI in a total of 30*(IHI-ILO+1)
               iterations; if INFO = i, elements i+1:IHI of WR and WI contain those eigenvalues which have
               been successfully computed.
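               A caller might translate INFO into a diagnostic along the lines of the sketch below. The
               function describe_info and its message strings are hypothetical; the logic only restates
               the three cases listed above, including the 30*(IHI-ILO+1) iteration budget.

```python
def describe_info(info, ilo, ihi):
    """Turn a PSLAHQR INFO value into a human-readable message."""
    if info < 0:
        return f"argument {-info} was incorrect or inconsistent"
    if info == 0:
        return "success: eigenvalues ILO..IHI computed"
    max_iter = 30 * (ihi - ilo + 1)  # documented iteration budget
    return (f"no convergence within {max_iter} iterations; "
            f"WR/WI({info + 1}:{ihi}) hold the eigenvalues that converged")
```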

               Logic: This algorithm is very similar to _LAHQR. Unlike _LAHQR, instead of sending one double
               shift through the largest unreduced submatrix, this algorithm sends multiple double shifts and
               spaces them apart so that there can be parallelism across several processor rows/columns.
               Another critical difference is that this algorithm aggregates multiple transforms together in
               order to apply them in a block fashion.
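               Each double shift starts a bulge from the first column of (H - mu1*I)(H - mu2*I). The
               sketch below shows the textbook Francis double-shift formula as used in serial _LAHQR-style
               codes; it is not PSLAHQR's code, which starts several such bulges at once. It assumes a
               dense upper Hessenberg H with n >= 3 stored as a list of rows, and that the shifts are
               passed as s = mu1 + mu2 and p = mu1*mu2, which are real even for a complex conjugate pair.
               The function name is hypothetical.

```python
def double_shift_first_column(h, s, p):
    """First column of H*H - s*H + p*I for an upper Hessenberg H (n >= 3).

    Only the first three entries can be nonzero because H is Hessenberg,
    so the bulge is introduced by a 3-by-3 Householder reflector.
    """
    h11, h12 = h[0][0], h[0][1]
    h21, h22 = h[1][0], h[1][1]
    h32 = h[2][1]
    return [
        h11 * h11 + h12 * h21 - s * h11 + p,
        h21 * (h11 + h22 - s),
        h21 * h32,
    ]
```

               For a complex conjugate pair of shifts a +/- bi, one would pass s = 2a and p = a*a + b*b.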

               Important Local Variables:
                   IBLK   = The maximum number of bulges that can be computed. Currently fixed; in future
                            releases it will not be.
                   HBL    = The square block size (HBL=DESCA(MB_)=DESCA(NB_)).
                   ROTN   = The number of transforms to block together.
                   NBULGE = The number of bulges that will be attempted on the current submatrix.
                   IBULGE = The current number of bulges started.
                   K1(*),K2(*) = The current bulge loops from K1(*) to K2(*).

               Subroutines: From LAPACK, this routine calls:
                   SLAHQR    -> Serial QR used to determine shifts and eigenvalues
                   SLARFG    -> Determine the Householder transforms

               From ScaLAPACK, this routine calls:
                   PSLACONSB -> To determine where to start each iteration
                   SLAMSH    -> Sends multiple shifts through a small submatrix to see how the consecutive
                                subdiagonals change (if PSLACONSB indicates we can start a run in the middle)
                   PSLAWIL   -> Given the shift, get the transformation
                   SLASORTE  -> Pair up eigenvalues so that reals are paired.
                   PSLACP3   -> Parallel array to local replicated array copy & back.
                   SLAREF    -> Row/column reflector applier. Core routine here.
                   PSLASMSUB -> Finds negligible subdiagonal elements.
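               The Householder transforms that SLARFG produces (and SLAREF applies) satisfy
               H*(alpha, x) = (beta, 0, ..., 0) with H = I - tau*u*u' and u = (1, v). A minimal Python
               sketch of that construction follows, assuming plain lists; larfg and apply_reflector are
               hypothetical names, and the rescaling LAPACK's SLARFG performs when beta is tiny is omitted.

```python
import math


def larfg(alpha, x):
    """Return (beta, tau, v) so that (I - tau*u*u^T) [alpha, *x] = [beta, 0, ..., 0], u = [1, *v]."""
    xnorm = math.sqrt(sum(xi * xi for xi in x))
    if xnorm == 0.0:
        return alpha, 0.0, list(x)  # tau = 0: H is the identity
    beta = -math.copysign(math.hypot(alpha, xnorm), alpha)
    tau = (beta - alpha) / beta
    v = [xi / (alpha - beta) for xi in x]
    return beta, tau, v


def apply_reflector(tau, v, y):
    """Compute (I - tau*u*u^T) y with u = [1, *v]."""
    u = [1.0] + v
    dot = sum(ui * yi for ui, yi in zip(u, y))
    return [yi - tau * dot * ui for ui, yi in zip(u, y)]
```

               In a double-shift sweep the vector (alpha, x) has three entries, for example the bulge
               column formed from the shifts.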

               Current Notes and/or Restrictions:
               1.) This code requires the distributed block size to be square and at least six (6); unlike
                   simpler codes like LU, this algorithm is extremely sensitive to block size. Choosing too
                   small a block size can lead to poor performance.
               2.) This code requires A and Z to be distributed identically and to have identical contexts.
                   A future version may allow Z to have a different context, so that Z can be mapped 1D by
                   rows onto all nodes (so no communication on Z is necessary).
               3.) This release does not yet have a routine for resolving the Schur blocks into regular 2x2
                   form after this code completes. Because of this, a significant performance penalty is
                   incurred while the deflation is done, sometimes by a single column of processors.
               4.) This code does not currently block the initial transforms, so none of the rows or columns
                   for any bulge are completed until all are started. To offset pipeline start-up, it is
                   recommended that at least 2*LCM(NPROW,NPCOL) bulges be used (if possible).
               5.) The maximum number of bulges currently supported is fixed at 32. In future versions this
                   will be limited only by the incoming WORK and IWORK arrays.
               6.) The matrix A must be in upper Hessenberg form. If elements below the subdiagonal are
                   nonzero, the resulting transforms may be nonsimilar. This is also true of the LAPACK
                   routine SLAHQR.
               7.) For this release, this code has only been tested for RSRC_=CSRC_=0, but it has been
                   written for the general case.
               8.) Currently, all the eigenvalues are distributed to all the nodes. Future releases will
                   probably distribute the eigenvalues by the column partitioning.
               9.) The internals of this routine are subject to change.
               10.) To optimize this for your architecture, try tuning SLAREF.
               11.) This code has only been tested for WANTZ = .TRUE. and may behave unpredictably for
                   WANTZ set to .FALSE.
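               Several of these restrictions can be checked from the descriptors before calling the
               routine. The sketch below covers restrictions 1, 2 and 7 and reports the bulge count
               suggested by restrictions 4 and 5, min(32, 2*LCM(NPROW,NPCOL)). It assumes the standard
               9-entry descriptor layout; check_restrictions is a hypothetical helper, and PSLAHQR itself
               chooses NBULGE internally, so the suggested count is informational only.

```python
from math import gcd

# 0-based offsets into a 9-entry descriptor (DTYPE_, CTXT_, M_, N_, MB_, NB_,
# RSRC_, CSRC_, LLD_); the layout is an assumption of this sketch.
CTXT_, MB_, NB_, RSRC_, CSRC_ = 1, 4, 5, 6, 7


def check_restrictions(desca, descz, nprow, npcol):
    """Return (warnings, suggested_bulges) for restrictions 1, 2, 4, 5 and 7."""
    warnings = []
    if desca[MB_] != desca[NB_]:
        warnings.append("block size must be square (MB_ = NB_)")
    if min(desca[MB_], desca[NB_]) < 6:
        warnings.append("block size must be at least 6")
    # Restriction 2: A and Z distributed identically, with identical contexts.
    for idx, name in ((CTXT_, "CTXT_"), (MB_, "MB_"), (NB_, "NB_"),
                      (RSRC_, "RSRC_"), (CSRC_, "CSRC_")):
        if desca[idx] != descz[idx]:
            warnings.append(f"A and Z differ in {name}")
    if desca[RSRC_] != 0 or desca[CSRC_] != 0:
        warnings.append("only RSRC_ = CSRC_ = 0 has been tested")
    lcm = nprow * npcol // gcd(nprow, npcol)
    suggested_bulges = min(32, 2 * lcm)  # restriction 4, capped by restriction 5
    return warnings, suggested_bulges
```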

               Implemented by:  G. Henry, May 1, 1997