Provided by: scalapack-doc_1.5-10_all bug

NAME

       PSGESVD - compute the singular value decomposition (SVD) of an M-by-N matrix A, optionally
       computing the left and/or right singular vectors

SYNOPSIS

       SUBROUTINE PSGESVD( JOBU, JOBVT, M, N, A, IA, JA, DESCA, S, U, IU,  JU,  DESCU,  VT,  IVT,
                           JVT, DESCVT, WORK, LWORK, INFO )

           CHARACTER       JOBU, JOBVT

           INTEGER         IA, INFO, IU, IVT, JA, JU, JVT, LWORK, M, N

           INTEGER         DESCA( * ), DESCU( * ), DESCVT( * )

           REAL            A( * ), S( * ), U( * ), VT( * ), WORK( * )

PURPOSE

       PSGESVD  computes the singular value decomposition (SVD) of an M-by-N matrix A, optionally
       computing the left and/or right singular vectors. The SVD is written as

            A = U * SIGMA * transpose(V)

       where SIGMA is an M-by-N matrix which is zero except for its min(M,N) diagonal elements, U
       is  an  M-by-M  orthogonal  matrix,  and  V  is  an N-by-N orthogonal matrix. The diagonal
       elements of SIGMA are the singular values of A  and  the  columns  of  U  and  V  are  the
       corresponding  right  and  left  singular  vectors,  respectively. The singular values are
       returned in array S in decreasing order and only the first min(M,N) columns of U and  rows
       of VT = V**T are computed.

       Notes
       =====
       Each  global  data  object  is  described by an associated description vector. This vector
       stores the information required to establish the mapping between an object element and its
       corresponding process and memory location.

       Let  A be a generic term for any 2D block cyclicly distributed array.  Such a global array
       has an associated description vector DESCA.  In the following comments,  the  character  _
       should be read as "of the global array".

       NOTATION        STORED IN      EXPLANATION
       ---------------   --------------   --------------------------------------  DTYPE_A(global)
       DESCA( DTYPE_ )The descriptor type.  In this case,
                                      DTYPE_A = 1.
       CTXT_A (global) DESCA( CTXT_ ) The BLACS context handle, indicating
                                      the BLACS process grid A is distribu-
                                      ted over. The context itself is glo-
                                      bal, but the handle (the integer
                                      value) may vary.
       M_A    (global) DESCA( M_ )    The number of rows in the global
                                      array A.
       N_A    (global) DESCA( N_ )    The number of columns in the global
                                      array A.
       MB_A   (global) DESCA( MB_ )   The blocking factor used to distribute
                                      the rows of the array.
       NB_A   (global) DESCA( NB_ )   The blocking factor used to distribute
                                      the columns of the array.
       RSRC_A (global) DESCA( RSRC_ ) The process row over which the first
                                      row of the array A is distributed.  CSRC_A (global)  DESCA(
       CSRC_ ) The process column over which the
                                      first column of the array A is
                                      distributed.
       LLD_A  (local)  DESCA( LLD_ )  The leading dimension of the local
                                      array.  LLD_A >= MAX(1,LOCr(M_A)).

       Let  K  be  the  number  of  rows  or columns of a distributed matrix, and assume that its
       process grid has dimension p x q. LOCr( K ) denotes the number of elements  of  K  that  a
       process  would  receive  if K were distributed over the p processes of its process column.
       Similarly, LOCc( K ) denotes the number of elements of K that a process would receive if K
       were  distributed over the q processes of its process row. The values of LOCr() and LOCc()
       may be determined via a call to the ScaLAPACK tool function, NUMROC:
               LOCr( M ) = NUMROC( M, MB_A, MYROW, RSRC_A, NPROW ),
               LOCc( N ) = NUMROC( N, NB_A, MYCOL, CSRC_A, NPCOL ).  An  upper  bound  for  these
       quantities may be computed by:
               LOCr( M ) <= ceil( ceil(M/MB_A)/NPROW )*MB_A
               LOCc( N ) <= ceil( ceil(N/NB_A)/NPCOL )*NB_A

ARGUMENTS

       MP  = number of local rows in A and U NQ = number of local columns in A and VT SIZE = min(
       M, N ) SIZEQ = number of local columns in U SIZEP = number of local rows in VT

       JOBU    (global input) CHARACTER*1
               Specifies options for computing all or part of the matrix U:
               = 'V':  the first SIZE columns of U (the left singular vectors)  are  returned  in
               the array U; = 'N':  no columns of U (no left singular vectors) are computed.

       JOBVT   (global input) CHARACTER*1
               Specifies options for computing all or part of the matrix V**T:
               =  'V':   the first SIZE rows of V**T (the right singular vectors) are returned in
               the array VT; = 'N':  no rows of V**T (no right singular vectors) are computed.

       M       (global input) INTEGER
               The number of rows of the input matrix A.  M >= 0.

       N       (global input) INTEGER
               The number of columns of the input matrix A.  N >= 0.

       A       (local input/workspace) block cyclic REAL           array,
               global dimension (M, N), local dimension (MP, NQ) On exit, the contents of  A  are
               destroyed.

       IA      (global input) INTEGER
               The row index in the global array A indicating the first row of sub( A ).

       JA      (global input) INTEGER
               The column index in the global array A indicating the first column of sub( A ).

       DESCA   (global input) INTEGER array of dimension DLEN_
               The array descriptor for the distributed matrix A.

       S       (global output) REAL           array, dimension SIZE
               The singular values of A, sorted so that S(i) >= S(i+1).

       U       (local output) REAL           array, local dimension
               (MP,  SIZEQ),  global  dimension  (M,  SIZE)  if  JOBU = 'V', U contains the first
               min(m,n) columns of U if JOBU = 'N', U is not referenced.

       IU      (global input) INTEGER
               The row index in the global array U indicating the first row of sub( U ).

       JU      (global input) INTEGER
               The column index in the global array U indicating the first column of sub( U ).

       DESCU   (global input) INTEGER array of dimension DLEN_
               The array descriptor for the distributed matrix U.

       VT      (local output) REAL           array, local dimension
               (SIZEP, NQ), global dimension (SIZE, N).  If JOBVT = 'V', VT  contains  the  first
               SIZE rows of V**T. If JOBVT = 'N', VT is not referenced.

       IVT     (global input) INTEGER
               The row index in the global array VT indicating the first row of sub( VT ).

       JVT     (global input) INTEGER
               The column index in the global array VT indicating the first column of sub( VT ).

       DESCVT   (global input) INTEGER array of dimension DLEN_
                The array descriptor for the distributed matrix VT.

       WORK    (local workspace/output) REAL           array, dimension
               (LWORK) On exit, if INFO = 0, WORK(1) returns the optimal LWORK;

       LWORK   (local input) INTEGER
               The dimension of the array WORK.

               LWORK > 2 + 6*SIZEB + MAX(WATOBD, WBDTOSVD),

               where  SIZEB  =  MAX(M,N),  and  WATOBD  and  WBDTOSVD refer, respectively, to the
               workspace required to bidiagonalize the matrix A and to  go  from  the  bidiagonal
               matrix to the singular value decomposition U*S*VT.

               For WATOBD, the following holds:

               WATOBD = MAX(MAX(WPSLANGE,WPSGEBRD), MAX(WPSLARED2D,WPSLARED1D)),

               where  WPSLANGE,  WPSLARED1D,  WPSLARED2D,  WPSGEBRD  are  the workspaces required
               respectively for the subprograms PSLANGE, PSLARED1D, PSLARED2D, PSGEBRD. Using the
               standard notation

               MP  =  NUMROC(  M,  MB,  MYROW, DESCA( CTXT_ ), NPROW), NQ = NUMROC( N, NB, MYCOL,
               DESCA( LLD_ ), NPCOL),

               the workspaces required for the above subprograms are

               WPSLANGE = MP, WPSLARED1D = NQ0, WPSLARED2D = MP0, WPSGEBRD = NB*(MP + NQ +  1)  +
               NQ,

               where  NQ0  and  MP0  refer, respectively, to the values obtained at MYCOL = 0 and
               MYROW = 0. In general, the upper limit for the workspace is given by  a  workspace
               required on processor (0,0):

               WATOBD <= NB*(MP0 + NQ0 + 1) + NQ0.

               In  case of a homogeneous process grid this upper limit can be used as an estimate
               of the minimum workspace for every processor.

               For WBDTOSVD, the following holds:

               WBDTOSVD = SIZE*(WANTU*NRU + WANTVT*NCVT)  +  MAX(WDBDSQR,  MAX(WANTU*WPSORMBRQLN,
               WANTVT*WPSORMBRPRT)),

       where

      1, if left(right) singular vectors are wanted WANTU(WANTVT) = 0, otherwise

      and  WDBDSQR,  WPSORMBRQLN and WPSORMBRPRT refer respectively to the workspace required for
      the subprograms DBDSQR, PSORMBR(QLN), and PSORMBR(PRT), where QLN and PRT are the values of
      the  arguments  VECT,  SIDE,  and  TRANS  in the call to PSORMBR. NRU is equal to the local
      number of rows of the matrix  U  when  distributed  1-dimensional  "column"  of  processes.
      Analogously, NCVT is equal to the local number of columns of the matrix VT when distributed
      across 1-dimensional "row" of processes. Calling the LAPACK procedure DBDSQR requires

      WDBDSQR = MAX(1, 2*SIZE + (2*SIZE - 4)*MAX(WANTU, WANTVT))

      on every processor. Finally,

      WPSORMBRQLN = MAX( (NB*(NB-1))/2, (SIZEQ+MP)*NB)+NB*NB, WPSORMBRPRT =  MAX(  (MB*(MB-1))/2,
      (SIZEP+NQ)*MB )+MB*MB,

      If  LIWORK  = -1, then LIWORK is global input and a workspace query is assumed; the routine
      only calculates the minimum size for the work array. The required workspace is returned  as
      the first element of WORK and no error message is issued by PXERBLA.

       INFO    (output) INTEGER
               = 0:  successful exit.
               < 0:  if INFO = -i, the i-th argument had an illegal value.
               >  0:  if SBDSQR did not converge If INFO = MIN(M,N) + 1, then PSSYEV has detected
               heterogeneity by finding that eigenvalues were not identical  across  the  process
               grid.  In this case, the accuracy of the results from PSSYEV cannot be guaranteed.