Provided by: scalapack-doc_1.5-10_all bug

NAME

       PCSTEIN  -  compute  the eigenvectors of a symmetric tridiagonal matrix in parallel, using
       inverse iteration

SYNOPSIS

       SUBROUTINE PCSTEIN( N, D, E, M, W, IBLOCK, ISPLIT, ORFAC, Z, IZ, JZ, DESCZ,  WORK,  LWORK,
                           IWORK, LIWORK, IFAIL, ICLUSTR, GAP, INFO )

           INTEGER         INFO, IZ, JZ, LIWORK, LWORK, M, N

           REAL            ORFAC

           INTEGER         DESCZ( * ), IBLOCK( * ), ICLUSTR( * ), IFAIL( * ), ISPLIT( * ), IWORK(
                           * )

           REAL            D( * ), E( * ), GAP( * ), W( * ), WORK( * )

           COMPLEX         Z( * )

PURPOSE

       PCSTEIN computes the eigenvectors of a symmetric tridiagonal  matrix  in  parallel,  using
       inverse  iteration.  The  eigenvectors  found  correspond  to  user specified eigenvalues.
       PCSTEIN does not orthogonalize vectors that are on  different  processes.  The  extent  of
       orthogonalization is controlled by the input parameter LWORK.  Eigenvectors that are to be
       orthogonalized are computed by the same process. PCSTEIN decides on the allocation of work
       among  the  processes  and then calls SSTEIN2 (modified LAPACK routine) on each individual
       process. If insufficient workspace is allocated, the expected orthogonalization may not be
       done.

       Note : If the eigenvectors obtained are not orthogonal, increase
              LWORK and run the code again.

       Notes
       =====

       Each  global  data  object  is described by an associated description vector.  This vector
       stores the information required to establish the mapping between an object element and its
       corresponding process and memory location.

       Let  A be a generic term for any 2D block cyclicly distributed array.  Such a global array
       has an associated description vector DESCA.  In the following comments,  the  character  _
       should be read as "of the global array".

       NOTATION        STORED IN      EXPLANATION
       ---------------   --------------   --------------------------------------  DTYPE_A(global)
       DESCA( DTYPE_ )The descriptor type.  In this case,
                                      DTYPE_A = 1.
       CTXT_A (global) DESCA( CTXT_ ) The BLACS context handle, indicating
                                      the BLACS process grid A is distribu-
                                      ted over. The context itself is glo-
                                      bal, but the handle (the integer
                                      value) may vary.
       M_A    (global) DESCA( M_ )    The number of rows in the global
                                      array A.
       N_A    (global) DESCA( N_ )    The number of columns in the global
                                      array A.
       MB_A   (global) DESCA( MB_ )   The blocking factor used to distribute
                                      the rows of the array.
       NB_A   (global) DESCA( NB_ )   The blocking factor used to distribute
                                      the columns of the array.
       RSRC_A (global) DESCA( RSRC_ ) The process row over which the first
                                      row of the array A is distributed.  CSRC_A (global)  DESCA(
       CSRC_ ) The process column over which the
                                      first column of the array A is
                                      distributed.
       LLD_A  (local)  DESCA( LLD_ )  The leading dimension of the local
                                      array.  LLD_A >= MAX(1,LOCr(M_A)).

       Let  K  be  the  number  of  rows  or columns of a distributed matrix, and assume that its
       process grid has dimension p x q.
       LOCr( K ) denotes the number of elements of K that a  process  would  receive  if  K  were
       distributed over the p processes of its process column.
       Similarly, LOCc( K ) denotes the number of elements of K that a process would receive if K
       were distributed over the q processes of its process row.
       The values of LOCr() and LOCc() may be  determined  via  a  call  to  the  ScaLAPACK  tool
       function, NUMROC:
               LOCr( M ) = NUMROC( M, MB_A, MYROW, RSRC_A, NPROW ),
               LOCc(  N  )  =  NUMROC( N, NB_A, MYCOL, CSRC_A, NPCOL ).  An upper bound for these
       quantities may be computed by:
               LOCr( M ) <= ceil( ceil(M/MB_A)/NPROW )*MB_A
               LOCc( N ) <= ceil( ceil(N/NB_A)/NPCOL )*NB_A

ARGUMENTS

       P = NPROW * NPCOL is the total number of processes

       N       (global input) INTEGER
               The order of the tridiagonal matrix T.  N >= 0.

       D       (global input) REAL array, dimension (N)
               The n diagonal elements of the tridiagonal matrix T.

       E       (global input) REAL array, dimension (N-1)
               The (n-1) off-diagonal elements of the tridiagonal matrix T.

       M       (global input) INTEGER
               The total number of eigenvectors to be found. 0 <= M <= N.

       W       (global input/global output) REAL array, dim (M)
               On input, the first M  elements  of  W  contain  all  the  eigenvalues  for  which
               eigenvectors  are  to  be computed. The eigenvalues should be grouped by split-off
               block and ordered from smallest to largest within the block (The  output  array  W
               from  PSSTEBZ with ORDER='b' is expected here). This array should be replicated on
               all processes.  On output, the first M elements contain the input  eigenvalues  in
               ascending order.

               Note  :  To  obtain  orthogonal vectors, it is best if eigenvalues are computed to
               highest accuracy ( this can be done by setting ABSTOL to the underflow threshold =
               SLAMCH('U') --- ABSTOL is an input parameter to PSSTEBZ )

       IBLOCK  (global input) INTEGER array, dimension (N)
               The  submatrix indices associated with the corresponding eigenvalues in W -- 1 for
               eigenvalues belonging to the first submatrix from the top, 2 for  those  belonging
               to  the  second  submatrix, etc. (The output array IBLOCK from PSSTEBZ is expected
               here).

       ISPLIT  (global input) INTEGER array, dimension (N)
               The splitting points, at which T breaks up into submatrices.  The first  submatrix
               consists  of  rows/columns  1 to ISPLIT(1), the second of rows/columns ISPLIT(1)+1
               through  ISPLIT(2),   etc.,   and   the   NSPLIT-th   consists   of   rows/columns
               ISPLIT(NSPLIT-1)+1  through ISPLIT(NSPLIT)=N (The output array ISPLIT from PSSTEBZ
               is expected here.)

       ORFAC   (global input) REAL
               ORFAC specifies which eigenvectors should be  orthogonalized.   Eigenvectors  that
               correspond  to  eigenvalues  which  are within ORFAC*||T|| of each other are to be
               orthogonalized.  However, if the  workspace  is  insufficient  (see  LWORK),  this
               tolerance  may  be  decreased  until  all eigenvectors to be orthogonalized can be
               stored in one process.  No orthogonalization will be done if ORFAC equals zero.  A
               default value of 10^-3 is used if ORFAC is negative.  ORFAC should be identical on
               all processes.

       Z       (local output) COMPLEX array,
               dimension (DESCZ(DLEN_), N/npcol  +  NB)  Z  contains  the  computed  eigenvectors
               associated  with  the specified eigenvalues. Any vector which fails to converge is
               set to its current iterate after MAXITS iterations ( See SSTEIN2 ).  On output,  Z
               is distributed across the P processes in block cyclic format.

       IZ      (global input) INTEGER
               Z's  global  row index, which points to the beginning of the submatrix which is to
               be operated on.

       JZ      (global input) INTEGER
               Z's global column index, which points to the beginning of the submatrix  which  is
               to be operated on.

       DESCZ   (global and local input) INTEGER array of dimension DLEN_.
               The array descriptor for the distributed matrix Z.

       WORK    (local workspace/global output) REAL array,
               dimension  (  LWORK  )  On  output, WORK(1) gives a lower bound on the workspace (
               LWORK ) that guarantees the user desired orthogonalization (see ORFAC).  Note that
               this may overestimate the minimum workspace needed.

       LWORK   (local input) integer
               LWORK   controls  the extent of orthogonalization which can be done. The number of
               eigenvectors for which storage is allocated on each  process  is  NVEC  =  floor((
               LWORK- max(5*N,NP00*MQ00) )/N).  Eigenvectors corresponding to eigenvalue clusters
               of size NVEC - ceil(M/P) + 1 are guaranteed to be orthogonal (  the  orthogonality
               is similar to that obtained from CSTEIN2).  Note : LWORK  must be no smaller than:
               max(5*N,NP00*MQ00) + ceil(M/P)*N, and should have the  same  input  value  on  all
               processes.   It is the minimum value of LWORK input on different processes that is
               significant.

               If LWORK = -1, then LWORK is global input and a workspace query  is  assumed;  the
               routine  only calculates the minimum and optimal size for all work arrays. Each of
               these values is returned in the first entry of the corresponding work  array,  and
               no error message is issued by PXERBLA.

       IWORK   (local workspace/global output) INTEGER array,
               dimension ( 3*N+P+1 ) On return, IWORK(1) contains the amount of integer workspace
               required.  On return, the IWORK(2) through IWORK(P+2)  indicate  the  eigenvectors
               computed  by  each  process.  Process I computes eigenvectors indexed IWORK(I+2)+1
               thru' IWORK(I+3).

       LIWORK  (local input) INTEGER
               Size of array IWORK.  Must be >= 3*N + P + 1

               If LIWORK = -1, then LIWORK is global input and a workspace query is assumed;  the
               routine  only calculates the minimum and optimal size for all work arrays. Each of
               these values is returned in the first entry of the corresponding work  array,  and
               no error message is issued by PXERBLA.

       IFAIL   (global output) integer array, dimension (M)
               On  normal exit, all elements of IFAIL are zero.  If one or more eigenvectors fail
               to converge after MAXITS iterations (as in CSTEIN), then INFO > 0 is returned.  If
               mod(INFO,M+1)>0,  then  for I=1 to mod(INFO,M+1), the eigenvector corresponding to
               the eigenvalue W(IFAIL(I))  failed  to  converge  (  W  refers  to  the  array  of
               eigenvalues on output ).

               ICLUSTR  (global output) integer array, dimension (2*P) This output array contains
               indices of eigenvectors corresponding to a cluster of eigenvalues that  could  not
               be  orthogonalized  due  to  insufficient  workspace  (see LWORK, ORFAC and INFO).
               Eigenvectors corresponding to clusters of eigenvalues  indexed  ICLUSTR(2*I-1)  to
               ICLUSTR(2*I),  I  =  1  to  INFO/(M+1), could not be orthogonalized due to lack of
               workspace. Hence the eigenvectors corresponding  to  these  clusters  may  not  be
               orthogonal.  ICLUSTR  is  a  zero  terminated  array --- ( ICLUSTR(2*K).NE.0 .AND.
               ICLUSTR(2*K+1).EQ.0 ) if and only if K is the number of clusters.

       GAP     (global output) REAL array, dimension (P)
               This output array contains the gap between eigenvalues  whose  eigenvectors  could
               not  be  orthogonalized.  The INFO/M output values in this array correspond to the
               INFO/(M+1) clusters indicated by the array ICLUSTR. As a result, the  dot  product
               between  eigenvectors  corresponding  to  the  I^th  cluster  may  be as high as (
               O(n)*macheps ) / GAP(I).

       INFO    (global output) INTEGER
               = 0:  successful exit
               < 0:  If the i-th argument is an array and the j-entry had an illegal value,  then
               INFO = -(i*100+j), if the i-th argument is a scalar and had an illegal value, then
               INFO = -i.  < 0 :  if INFO = -I, the I-th argument had an illegal value
               > 0 :  if mod(INFO,M+1) = I, then I eigenvectors  failed  to  converge  in  MAXITS
               iterations.  Their indices are stored in the array IFAIL.  if INFO/(M+1) = I, then
               eigenvectors  corresponding  to  I  clusters   of   eigenvalues   could   not   be
               orthogonalized  due  to  insufficient  workspace.  The indices of the clusters are
               stored in the array ICLUSTR.