This chapter gives an overview over the available algorithms in ViennaCL. The focus of ViennaCL is on iterative solvers, for which generic implementations that allows the use of the same code on the CPU (either using Boost.uBLAS, Eigen, MTL4, or ViennaCL types) and on the GPU (using ViennaCL types) are provided.
ViennaCL provides triangular solvers and LU factorization without pivoting for the solution of dense linear systems. The interface is similar to that of Boost.uBLAS.
In ViennaCL there is no pivoting included in the LU factorization process, hence the computation may break down or yield results with poor accuracy. However, for certain classes of matrices (like diagonal dominant matrices) good results can be obtained without pivoting.
It is also possible to solve for multiple right hand sides:
ViennaCL provides different iterative solvers for various classes of matrices:
Unlike direct solvers, the convergence of iterative solvers relies on certain properties of the system matrix. Keep in mind that an iterative solver may fail to converge, especially if the matrix is ill conditioned or a wrong solver is chosen.
examples/tutorials/
folder.Method | Matrix class | ViennaCL |
---|---|---|
Conjugate Gradient (CG) | symmetric positive definite | y = solve(A, x, cg_tag()); |
Stabilized Bi-CG (BiCGStab) | non-symmetric | y = solve(A, x, bicgstab_tag()); |
Generalized Minimum Residual (GMRES) | general | y = solve(A, x, gmres_tag()); |
Linear solver routines in ViennaCL for the computation of in the expression
with given
,
.
Customized error tolerances can be set in the solver tags. The convention is that solver tags take the relative error tolerance as first argument and the maximum number of iteration steps as second argument. Furthermore, after the solver run the number of iterations and the estimated error can be obtained from the solver tags as follows:
The BiCGStab solver tag can be customized in exactly the same way. The GMRES solver tag takes as third argument the dimension of the Krylov space. Thus, a tag for GMRES(30) with tolerance and at most
100
total iterations (hence, up to three restarts) can be set up by
ViennaCL ships with a generic implementation of several preconditioners. The preconditioner setup is expect for simple diagonal preconditioners always carried out on the CPU host due to the need for dynamically allocating memory. Thus, one may not obtain an overall performance benefit if too much time is spent on the preconditioner setup.
An overview of preconditioners available for the various sparse matrix types is as follows:
Matrix Type | ICHOL | (Block-)ILU[0/T] | Jacobi | Row-scaling | AMG | SPAI |
---|---|---|---|---|---|---|
compressed_matrix | yes | yes | yes | yes | yes | yes |
coordinate_matrix | no | no | yes | yes | no | no |
ell_matrix | no | no | no | no | no | no |
hyb_matrix | no | no | no | no | no | no |
Broader support of preconditioners particularly for ell_matrix
and hyb_matrix
is scheduled for future releases. AMG and SPAI preconditioners are described in Additional Algorithms section.
The incomplete LU factorization preconditioner aims at computing sparse matrices lower and upper triangular matrices and
such that the sparse system matrix is approximately given by
. In order to control the sparsity pattern of
and
, a threshold strategy is used (ILUT) [21] . Due to the serial nature of the preconditioner, the setup of ILUT is always computed on the CPU using the respective ViennaCL backend.
The triangular substitutions may be applied in parallel on GPUs by enabling level-scheduling [21] via the member function call use_level_scheduling(true)
in the ilut_config
object.
Three parameters can be passed to the constructor of ilut_tag
: The first specifies the maximum number of entries per row in and
, while the second parameter specifies the drop tolerance. The third parameter is the boolean specifying whether level scheduling should be used.
Similar to ILUT, ILU0 computes an approximate LU factorization with sparse factors L and U. While ILUT determines the location of nonzero entries on the fly, ILU0 uses the sparsity pattern of A for the sparsity pattern of L and U [21] Due to the serial nature of the preconditioner, the setup of ILU0 is computed on the CPU.
The triangular substitutions may be applied in parallel on GPUs by enabling level-scheduling [21] via the member function call use_level_scheduling(true)
in the ilu0_config
object.
One parameter can be passed to the constructor of ilu0_tag
, being the boolean specifying whether level scheduling should be used.
To overcome the serial nature of ILUT and ILU0 applied to the full system matrix, a parallel variant is to apply ILU to diagonal blocks of the system matrix. This is accomplished by the block_ilu
preconditioner, which takes the system matrix type as first template argument and the respective ILU-tag type as second template argument (either ilut_tag
or ilu0_tag
). Support for accelerators using CUDA or OpenCL is provided.
A third argument can be passed to the constructor of block_ilu_precond
: Either the number of blocks to be used (defaults to 8
), or an index vector with fine-grained control over the blocks.
A Jacobi preconditioner is a simple diagonal preconditioner given by the reciprocals of the diagonal entries of the system matrix. Use the preconditioner as follows:
A row scaling preconditioner is a simple diagonal preconditioner given by the reciprocals of the norms of the rows of the system matrix. Use the preconditioner as follows:
The tag viennacl::linalg::row_scaling_tag()
can be supplied with a parameter denoting the norm to be used. A value of 1
specifies the -norm, while a value of
selects the
-norm (default).
Two algorithms for the computations of the eigenvalues of a sparse matrix are implemented in ViennaCL:
The algorithms are called for a matrix object A
by
Depending on the second parameter tag
either of the two methods is called. Both algorithms can be used for either Boost.uBLAS or ViennaCL compressed matrices. In order to get the eigenvalue with the greatest absolut value, the power iteration should be called. The Lanczos algorithm returns a vector of the largest eigenvalues with the same type as the entries of the matrix.
The Power iteration aims at computing the eigenvalues of a matrix by calculating the product of the matrix and a vector for several times, where the resulting vector is used for the next product of the matrix and so on. The computation stops as soon as the norm of the vector converges. The final vector is the eigenvector to the eigenvalue with the greatest absolut value. To call this algorithm, piter_tag
must be used. This tag has only one parameter: terminationfactor
defines the accuracy of the computation, i.e. if the new norm of the eigenvector changes less than this parameter the computation stops and returns the corresponding eigenvalue (default: ) The call of the constructor may look as follows:
examples/tutorial/power-iter.cpp
In order to compute the eigenvalues of a sparse high-dimensional matrix the Lanczos algorithm can be used to find these. This algorithm reformulates the given high-dimensional matrix in a way such that the matrix can be rewritten in a tridiagonal matrix at much lower dimension. The eigenvalues of this tridiagonal matrix are equal to the largest eigenvalues of the original matrix and calculated by using the bisection method [10] . To call this Lanczos algorithm, lanczos_tag
must be used. This tag has several parameters that can be passed to the constructor:
factor
(default: 0.75
)0
uses partial reorthogonalization, 1
full reothogonalization, and 2
does not use reorthogonalization (default: 0
)num_eigenvalues
(default: 10
)krylov_size
(default: 100
). The maximum number of iterations can be equal or less this parameter.The call of the constructor may look like the following:
examples/tutorial/lanczos.cpp
A matrix can be factored into
, where
is an orthogonal matrix and
is upper triangular. This so-called QR-factorization is important for eigenvalue computations as well as for the solution of least-squares problems [10] . ViennaCL provides a generic implementation of the QR-factorization using Householder reflections in file
viennacl/linalg/qr.hpp
. An example application can be found in examples/tutorial/qr.cpp
.
The Householder reflectors defining the Householder reflection
are stored in the columns below the diagonal of the input matrix
[10] . The normalization coefficients
are returned by the worker function
inplace_qr
. The upper triangular matrix is directly written to the upper triangular part of
.
If is a dense matrix from Boost.uBLAS, the calculation is carried out on the CPU using a single thread. If
is a
viennacl::matrix
, a hybrid implementation is used: The panel factorization is carried out using Boost.uBLAS, while expensive BLAS level 3 operations are computed on the OpenCL device using multiple threads.
Typically, the orthogonal matrix is kept in inplicit form because of computational efficiency. However, if
and
have to be computed explicitly, the function
recoverQ
can be used:
Here, A
is the inplace QR-factored matrix, betas
are the coefficients of the Householder reflectors as returned by inplace_qr
, while Q
and R
are the destination matrices. However, the explicit formation of Q
is expensive and is usually avoided. For a number of applications of the QR factorization it is only required to apply Q^T
to a vector b
. This is accomplished by
without setting up Q
(or Q^T
) explicitly.
examples/tutorial/least-squares.cpp
for a least-squares computation using QR factorizations.