The implementations of the algorithms described in this chapter are not yet mature enough to be considered core functionality, and/or are available with the OpenCL backend only.
The following iterative solvers are only available on selected computing backends.
A two-stage mixed-precision CG algorithm is available as follows:
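A minimal sketch, assuming a double-precision viennacl::compressed_matrix A and a viennacl::vector rhs have already been set up, and that mixed_precision_cg_tag resides in viennacl/linalg/mixed_precision_cg.hpp:

```cpp
#include "viennacl/linalg/mixed_precision_cg.hpp"

// tag arguments: relative tolerance, max. iterations, inner (low-precision) tolerance
viennacl::vector<double> result = viennacl::linalg::solve(
    A, rhs, viennacl::linalg::mixed_precision_cg_tag(1e-8, 300, 0.01));
```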
As usual, the first parameter to the constructor of mixed_precision_cg_tag is the relative tolerance for the residual, while the second parameter denotes the maximum number of solver iterations. The third parameter denotes the relative tolerance for the inner low-precision CG iterations and defaults to 0.01.
See examples/benchmarks/solver.cpp for an example. Note that the mixed-precision solver requires the system matrix and all vectors to be provided in double precision.

In addition to the preconditioners available in the ViennaCL core, two more preconditioners are available with the OpenCL backend and are described in the following.
Algebraic multigrid mimics the behavior of geometric multigrid on the algebraic level and is thus suited for black-box purposes, where only the system matrix and the right-hand side vector are available [23]. Many different flavors of the individual multigrid ingredients exist [25], of which the most common ones are implemented in ViennaCL.
The two main ingredients of algebraic multigrid are a coarsening algorithm and an interpolation algorithm. The available coarsening methods are:
| Description | ViennaCL option constant |
|---|---|
| Classical Ruge-Stüben (RS) | VIENNACL_AMG_COARSE_RS |
| One-Pass | VIENNACL_AMG_COARSE_ONEPASS |
| RS0 | VIENNACL_AMG_COARSE_RS0 |
| RS3 | VIENNACL_AMG_COARSE_RS3 |
| Aggregation | VIENNACL_AMG_COARSE_AG |
| Smoothed aggregation | VIENNACL_AMG_COARSE_SA |
AMG coarsening methods available in ViennaCL. By default, classical RS coarsening is used.
The available interpolation methods are:
| Description | ViennaCL option constant |
|---|---|
| Direct | VIENNACL_AMG_INTERPOL_DIRECT |
| Classic | VIENNACL_AMG_INTERPOL_CLASSIC |
| RS0 coarsening | VIENNACL_AMG_INTERPOL_RS0 |
| RS3 coarsening | VIENNACL_AMG_INTERPOL_RS3 |
AMG interpolation methods available in ViennaCL. By default, direct interpolation is used.
In addition, the following parameters can be controlled via the amg_tag and can be passed to its constructor (see the sketch after this list):
- Strength of dependence threshold (default: 0.25)
- Interpolation weight (default: 1)
- Jacobi smoother weight (default: 1)
- Number of pre-smoothing steps (default: 1)
- Number of post-smoothing steps (default: 1)
- Number of coarse levels
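As an illustration, an AMG-preconditioned CG solve might be set up as follows. This is a sketch assuming the amg_precond class from viennacl/linalg/amg.hpp and a constructor argument order of coarsening method, interpolation method, then the parameters listed above:

```cpp
#include "viennacl/linalg/amg.hpp"
#include "viennacl/linalg/cg.hpp"

// tag: coarsening, interpolation, threshold, interpolation weight,
//      Jacobi weight, pre-/post-smoothing steps, coarse levels (0: automatic)
viennacl::linalg::amg_tag amg_tag(VIENNACL_AMG_COARSE_RS, VIENNACL_AMG_INTERPOL_DIRECT,
                                  0.25, 1.0, 1.0, 1, 1, 0);
viennacl::linalg::amg_precond<viennacl::compressed_matrix<double> > amg(A, amg_tag);
amg.setup();  // perform the multigrid setup phase

viennacl::vector<double> result = viennacl::linalg::solve(A, rhs, viennacl::linalg::cg_tag(), amg);
```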
An alternative construction of a preconditioner for a sparse system matrix $A$ is to compute a matrix $M$ with a prescribed sparsity pattern such that

$$\Vert AM - I \Vert_F \rightarrow \min \ ,$$

where $\Vert \cdot \Vert_F$ denotes the Frobenius norm. This is the basic idea of sparse approximate inverse (SPAI) preconditioners. They are increasingly attractive because of their inherent high degree of parallelism, since the minimization problem can be solved independently for each column of $M$. ViennaCL provides two preconditioners of this family: the first is the classical SPAI algorithm as described by Grote and Huckle [11], the second is the factored SPAI (FSPAI) for symmetric matrices as proposed by Huckle [12].
SPAI can be employed for a CPU matrix A
of type MatrixType
as follows:
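A sketch, assuming the spai_precond class from viennacl/linalg/spai.hpp; the tag values mirror the parameter description below:

```cpp
#include "viennacl/linalg/spai.hpp"
#include "viennacl/linalg/bicgstab.hpp"

// tag arguments: residual norm threshold, max. pattern updates, per-column residual threshold
viennacl::linalg::spai_precond<MatrixType> spai_cpu(A, viennacl::linalg::spai_tag(1e-3, 3, 5e-2));

// use the preconditioner with an iterative solver, e.g. BiCGStab
result = viennacl::linalg::solve(A, rhs, viennacl::linalg::bicgstab_tag(), spai_cpu);
```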
The first parameter denotes the residual norm threshold for the full matrix, the second parameter the maximum number of pattern updates, and the third parameter is the threshold for the residual of each minimization problem.
For GPU matrices, only parts of the setup phase are computed on the CPU, since the compute-intensive tasks can be carried out on the GPU:
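A sketch under the same assumptions as above, with vcl_A denoting the GPU-resident system matrix (a hypothetical name):

```cpp
// SPAI preconditioner with GPU-assisted setup
viennacl::linalg::spai_precond<GPUMatrixType> spai_gpu(vcl_A, viennacl::linalg::spai_tag(1e-3, 3, 5e-2));
```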
The GPUMatrixType
is typically a viennacl::compressed_matrix
type.
For symmetric matrices, FSPAI can be used with the conjugate gradient solver:
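A sketch, assuming the fspai_precond class from viennacl/linalg/spai.hpp:

```cpp
#include "viennacl/linalg/spai.hpp"
#include "viennacl/linalg/cg.hpp"

// FSPAI preconditioner for a symmetric matrix A
viennacl::linalg::fspai_precond<MatrixType> fspai(A, viennacl::linalg::fspai_tag());

// use it with the conjugate gradient solver
result = viennacl::linalg::solve(A, rhs, viennacl::linalg::cg_tag(), fspai);
```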
Our experience is that FSPAI is typically more efficient than SPAI when applied to the same matrix, both in terms of computational effort and of convergence acceleration of the iterative solvers.
Note that FSPAI depends on the ordering of the unknowns, thus bandwidth reduction algorithms may be employed first, cf. Bandwidth Reduction.
Since there is no standardized complex type in OpenCL at the time of the release of ViennaCL, vectors need to be set up with real and imaginary parts before computing a fast Fourier transform (FFT). In order to store the complex numbers $z_0$, $z_1$, etc. in a viennacl::vector, say v, the real and imaginary parts are mapped to even and odd entries of v, respectively: v[0] = Real(z_0), v[1] = Imag(z_0), v[2] = Real(z_1), v[3] = Imag(z_1), etc.
The FFT of v can then be computed either by writing to a second vector output or by directly writing the result to v; both variants are sketched below.
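A sketch, assuming the free functions from viennacl/fft.hpp:

```cpp
#include "viennacl/fft.hpp"

viennacl::fft(v, output);   // out-of-place FFT: result written to output
viennacl::inplace_fft(v);   // in-place FFT: result overwrites v
```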
Conversely, the inverse FFT is computed as follows:
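Again a sketch, with viennacl/fft.hpp assumed to be included:

```cpp
viennacl::ifft(v, output);    // out-of-place inverse FFT
viennacl::inplace_ifft(v);    // in-place inverse FFT
```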
The second option for computing the FFT is the Bluestein algorithm. Currently, the implementation supports only input sizes less than $2^{16} = 65536$. The Bluestein algorithm uses at least three times more memory than the other algorithms, but should be fast for any size of data. As with any efficient FFT algorithm, the sequential implementation has a complexity of $\mathcal{O}(n \log n)$. To compute the FFT with the Bluestein algorithm from a complex vector
v
and store the result in a vector output
, one uses the code
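A sketch, assuming the bluestein() routine from viennacl/fft.hpp (the third argument, assumed to be a batch-size parameter, is set to 0):

```cpp
// Bluestein FFT of v; the result is stored in output, v is left unchanged
viennacl::linalg::bluestein(v, output, 0);
```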
Some of the FFT functions are also suitable for matrices, yielding a two-dimensional FFT. The computation of an FFT for an object of type viennacl::matrix, say mat, requires that even column entries hold the real parts and odd column entries the imaginary parts of the complex numbers. In order to store the complex numbers $z_0$, $z_1$, etc. in mat: mat(0,0) = Real(z_0), mat(0,1) = Imag(z_0), mat(0,2) = Real(z_1), mat(0,3) = Imag(z_1), etc.
Then the user can compute the FFT either by writing to a second matrix output or by directly writing the result to mat:
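A sketch, assuming the matrix overloads of fft() and inplace_fft() in viennacl/fft.hpp:

```cpp
viennacl::fft(mat, output);   // 2D FFT of mat, result written to output
viennacl::inplace_fft(mat);   // 2D FFT of mat, result overwrites mat
```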
There are two additional functions for computing the convolution of two vectors. The convolution expresses the amount of overlap of one function, represented by a vector v, as it is shifted over another function, represented by a vector u. Formally, the convolution is defined by the integral

$$(v * u)(t) = \int_{-\infty}^{\infty} v(\tau) \, u(t - \tau) \, d\tau \ .$$

Denoting the Fourier transform operator by $\mathcal{F}$, for the convolution of two infinite sequences v and u there holds

$$\mathcal{F}(v * u) = \mathcal{F}(v) \cdot \mathcal{F}(u) \ ,$$

where the product on the right-hand side is taken entry-wise.
To compute the convolution of two complex vectors v and u, where the result will be stored in output, one can use the first routine sketched below, which does not modify the input. If a modification of the input vectors is acceptable, the second variant can be used, reducing the overall memory requirements.
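A sketch, assuming the convolve() and convolve_i() routines from viennacl/fft.hpp:

```cpp
viennacl::linalg::convolve(v, u, output);    // v and u are left unchanged
viennacl::linalg::convolve_i(v, u, output);  // v and u may be modified, saving temporary memory
```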
Entry-wise multiplication of two complex vectors u and v, where the result is stored in output, is provided by
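A sketch, assuming the multiply_complex() routine from viennacl/fft.hpp:

```cpp
// entry-wise product of the complex vectors u and v
viennacl::linalg::multiply_complex(u, v, output);
```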
For creating a complex vector v
from a real vector u
with suitable sizes (size = v.size() / 2
), use
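A sketch, assuming the real_to_complex() routine from viennacl/fft.hpp, with size denoting the number of complex entries:

```cpp
viennacl::linalg::real_to_complex(u, v, size);  // imaginary parts of v set to zero (assumed)
```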
Conversely, computing a real vector v from a complex vector u with size = u.size() / 2 is achieved through
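A sketch, assuming the complex_to_real() routine from viennacl/fft.hpp:

```cpp
viennacl::linalg::complex_to_real(u, v, size);  // copies the real parts of u into v
```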
To reverse a vector v in place, use
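A sketch, assuming the reverse() routine from viennacl/fft.hpp:

```cpp
viennacl::linalg::reverse(v);  // reverses the order of the entries of v in place
```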
The bandwidth of a sparse matrix is defined as the maximum difference of the indices of nonzero entries in a row, taken over all rows. A low bandwidth may allow for the use of efficient banded matrix solvers instead of iterative solvers. Moreover, better cache utilization as well as lower fill-in in LU-factorization based algorithms can be expected.
For a given sparse matrix with large bandwidth, ViennaCL provides routines for renumbering the unknowns such that the reordered system matrix shows much smaller bandwidth. Typical applications stem from the discretization of partial differential equations by means of the finite element or the finite difference method. The algorithms employed are as follows:
The iterated Cuthill-McKee algorithm applies the classical Cuthill-McKee algorithm to different starting nodes, taking nodes with small, but not necessarily minimal, degree into account as root nodes. While this iterated application is more expensive in terms of execution time, it may lead to better results than the classical Cuthill-McKee algorithm. A parameter $a \in [0, 1]$ controls the number of nodes considered: all nodes with degree $d$ fulfilling

$$d_{\min} \leq d \leq d_{\min} + a (d_{\max} - d_{\min})$$

are considered, where $d_{\min}$ and $d_{\max}$ are the minimum and maximum nodal degrees in the graph. A second parameter gmax specifies the number of additional root nodes considered.
The algorithms are called for a matrix A of a type compatible with std::vector< std::map<int, double> > as follows:
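A sketch, assuming the reorder() interface from viennacl/misc/bandwidth_reduction.hpp; the tag arguments shown are illustrative values:

```cpp
#include "viennacl/misc/bandwidth_reduction.hpp"

std::vector<int> r = viennacl::reorder(A, viennacl::cuthill_mckee_tag());
// iterated variant; arguments: parameter a, number of additional root nodes gmax
std::vector<int> r2 = viennacl::reorder(A, viennacl::advanced_cuthill_mckee_tag(0.25, 5));
```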
The functions return the permutation array. In ViennaCL, the user then needs to manually reorder the sparse matrix based on the permutation array. Example code can be found in examples/tutorial/bandwidth-reduction.cpp.
In various fields such as text mining, a matrix V needs to be factored into factors W and H with the property that all three matrices have no negative elements, such that the function

$$f(W, H) = \Vert V - W H \Vert_F^2$$

is minimized. This can be achieved using the algorithm proposed by Lee and Seung [14] with the following code:
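A sketch, assuming matrices V, W, and H of type viennacl::matrix<float> with conforming dimensions have already been set up (the nmf() routine resides in viennacl/linalg/nmf.hpp):

```cpp
#include "viennacl/linalg/nmf.hpp"

viennacl::linalg::nmf_config conf;     // use default NMF parameters (see the list below)
viennacl::linalg::nmf(V, W, H, conf);  // factors V approximately into W * H
```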
The fourth and last parameter to the function nmf()
is an object of type viennacl::linalg::nmf_config
containing all necessary parameters of the NMF routine:

- eps_: relative tolerance for convergence
- stagnation_eps_: relative tolerance for the stagnation check
- max_iters_: maximum number of iterations for the NMF algorithm
- iters_: the number of iterations of the last NMF run using this configuration object
- print_relative_error_: flag specifying whether the relative error should be printed in each iteration
- check_after_steps_: number of steps after which the convergence of NMF should be checked (again)

Multiple tests can be found in the file viennacl/test/src/nmf.cpp and a tutorial in the file viennacl/examples/tutorial/nmf.cpp.