The new hmatrix version 0.16 includes a general matrix type with an optimized representation of sparse matrices and initial support for sparse linear systems.
As an illustrative example, we want to find $n$ numbers such that the sums of $k$ cyclically consecutive elements are known (deconvolution of a box filter). For example, for $k=3$ we have the following equations:
$$\begin{aligned}
x_1+x_2+x_3 &= b_1\\
x_2+x_3+x_4 &= b_2\\
&\;\,\vdots\\
x_n+x_1+x_2 &= b_n
\end{aligned}$$
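The function box n k used below generates this coefficient matrix. Its definition is not included in this text; a minimal sketch, assuming the AssocMatrix association-list representation of hmatrix 0.16, could be:

import Numeric.LinearAlgebra.HMatrix

-- Hypothetical reconstruction (not the original definition): row i has
-- ones in the k cyclically consecutive columns i, i+1, ..., i+k-1 (mod n).
box :: Int -> Int -> AssocMatrix
box n k = [ ((i, (i+j) `mod` n), 1) | i <- [0..n-1], j <- [0..k-1] ]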
The coefficient matrix is very sparse. For example, for $n=20$ we have:
toDense (box 20 3)
20x20
1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1
1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
This problem is fairly simple, since the inverse of this kind of matrix can be obtained in closed form, but it is useful for checking the correctness of the sparse routines.
3 * inv (toDense (box 10 3))
10x10
 1.0  -2.0   1.0   1.0  -2.0   1.0   1.0  -2.0   1.0   1.0
 1.0   1.0  -2.0   1.0   1.0  -2.0   1.0   1.0  -2.0   1.0
 1.0   1.0   1.0  -2.0   1.0   1.0  -2.0   1.0   1.0  -2.0
-2.0   1.0   1.0   1.0  -2.0   1.0   1.0  -2.0   1.0   1.0
 1.0  -2.0   1.0   1.0   1.0  -2.0   1.0   1.0  -2.0   1.0
 1.0   1.0  -2.0   1.0   1.0   1.0  -2.0   1.0   1.0  -2.0
-2.0   1.0   1.0  -2.0   1.0   1.0   1.0  -2.0   1.0   1.0
 1.0  -2.0   1.0   1.0  -2.0   1.0   1.0   1.0  -2.0   1.0
 1.0   1.0  -2.0   1.0   1.0  -2.0   1.0   1.0   1.0  -2.0
-2.0   1.0   1.0  -2.0   1.0   1.0  -2.0   1.0   1.0   1.0
On the other hand, for certain $n$ and $k$ the system is singular (the matrix is circulant, and it loses rank precisely when $\gcd(n,k)>1$):
rank (toDense (box 20 4))
17
For nonsingular combinations of $n$ and $k$ the system is reasonably well conditioned. The following diagram shows the distribution of singular values for $n=301$ and several small $k$:

[Diagram not reproduced here: singular values of the box system for $n=301$ and several small $k$.]
For testing purposes we choose a periodic shape as the right-hand side:
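The definition of testb does not appear in this text. A sketch consistent with the values printed below, a triangular wave of period 20 rising from 0 to 10 and back, would be:

-- Hypothetical reconstruction of the periodic right-hand side.
testb :: Int -> Vector Double
testb n = fromList $ take n $ cycle ([0..10] ++ [9,8..1])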
First we use the standard dense linear solver, based on LU factorization. We have prepared functions to create the input data, solve the system, and show a few elements of the solution $x$ and of the reconstructed right-hand side $Ax$. We also show the relative errors (in the 2-norm and the infinity norm) of the result.
checkD2 a b = do
    let sol = denseSolve a b
        obt = a #> sol
    print (dim sol)
    disp 6 $ fromColumns [ subVector 0 10 obt
                         , subVector (size obt - 10) 10 obt
                         , subVector 0 10 sol ]
    print $ norm2 (obt - b) / norm2 b
    print $ normInf (obt - b) / normInf b
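The helper denseSolve is not defined in this text either. Since it is described as LU-based, a plausible sketch is:

-- Hypothetical reconstruction: solve the square dense system through
-- the LU factorization provided by hmatrix.
denseSolve :: Matrix Double -> Vector Double -> Vector Double
denseSolve a b = flatten $ luSolve (luPacked a) (asColumn b)

(The general least-squares operator <\> would also work here.)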
We separately evaluate the time required to create the coefficient matrix and to perform a matrix-vector multiplication:
let ad = toDense (box 1000 3); bd = testb 1000
print $ sumElements $ ad #> bd
15000.0
create & mul dense: 16 ms
checkD2 ad bd
1000
10x3
0.000000  10.000000  -0.333333
1.000000   9.000000   0.666667
2.000000   8.000000  -0.333333
3.000000   7.000000   0.666667
4.000000   6.000000   1.666667
5.000000   5.000000   0.666667
6.000000   4.000000   1.666667
7.000000   3.000000   2.666667
8.000000   2.000000   1.666667
9.000000   1.000000   2.666667
3.032898180977698e-18
4.4408920985006264e-17
solve dense: 70 ms
For a bigger size the computation time of the dense solver becomes noticeable:
let ad = toDense (box 5000 3); bd = testb 5000
print $ sumElements $ ad #> bd
75000.0
create & mul dense: 404 ms
checkD2 ad bd
5000
10x3
0.000000  10.000000  -0.333333
1.000000   9.000000   0.666667
2.000000   8.000000  -0.333333
3.000000   7.000000   0.666667
4.000000   6.000000   1.666667
5.000000   5.000000   0.666667
6.000000   4.000000   1.666667
7.000000   3.000000   2.666667
8.000000   2.000000   1.666667
9.000000   1.000000   2.666667
1.3563533003003317e-18
4.4408920985006264e-17
solve dense: 5226 ms
For much larger sizes we must use sparse solvers. The current hmatrix version includes a simple iterative solver based on the conjugate gradient method of Hestenes and Stiefel (1952). It admits both square symmetric and general rectangular systems; in the rectangular case it works on the normal equations $A^TAx=A^Tb$ implicitly, using only matrix-vector products and never forming $A^TA$.
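In its simplest form the solver can be called through a convenience wrapper. Assuming the cgSolve function of this release, whose Bool argument declares whether the matrix is symmetric, a one-shot solve would look like:

-- Hypothetical usage: False selects the general (normal equations) path.
let x = cgSolve False (mkSparse (box 1000 3)) (testb 1000)

The cgSolve' variant used below also exposes the tolerances, the iteration limit, and the list of intermediate states.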
Let's solve a system of 100,000 equations:
let a = mkCSR (box 100000 3); b = testb 100000
print $ sumElements $ fromCSR a !#> b
1500000.0
create: 273 ms
(We have used mkCSR instead of mkSparse for easy comparison of different solvers working from the same, fully evaluated structure.)
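For reference, the sparse-construction functions assumed in the code below have these signatures in hmatrix 0.16:

-- mkCSR    :: AssocMatrix -> CSR       -- fully evaluated CSR structure
-- mkSparse :: AssocMatrix -> GMatrix   -- general sparse matrix
-- fromCSR  :: CSR -> GMatrix           -- wrap an existing CSR structure
-- (!#>)    :: GMatrix -> Vector Double -> Vector Double   -- sparse matrix-vector product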
checkS2 meth mul a b = do
    let sols = meth a b
    print (length sols)
    let sol = last sols
        obt = mul a sol
    print (dim sol)
    disp 6 $ fromColumns [ subVector 0 10 obt
                         , subVector (size obt - 10) 10 obt
                         , subVector 0 10 sol ]
    print $ norm2 (obt - b) / norm2 b
    print $ normInf (obt - b) / normInf b

checkCG ϵr ϵs nMax = checkS2 meth mul
  where
    mul a = (fromCSR a !#>)
    meth a b = map cgx $ cgSolve' False ϵr ϵs nMax (fromCSR a) b 0
checkCG 1E-4 1E-3 10 a b
7
100000
10x3
-0.000000  10.000000  -0.333333
 1.000000   9.000000   0.666667
 2.000000   8.000000  -0.333333
 3.000000   7.000000   0.666667
 4.000000   6.000000   1.666667
 5.000000   5.000000   0.666667
 6.000000   4.000000   1.666667
 7.000000   3.000000   2.666667
 8.000000   2.000000   1.666667
 9.000000   1.000000   2.666667
5.146589043975755e-10
3.282267457827004e-10
solveCG: 46 ms
This problem is very well suited to the conjugate gradient method, which obtains an excellent solution in just 7 iterations, without any kind of preconditioning.
We can compare this solution with the DSS direct solver implemented in Intel's optimized MKL library, available from the experimental package hmatrix-sparse:
checkMKL2 = checkS2 meth mul
  where
    mul a = (fromCSR a !#>)
    meth a = return . dss a
checkMKL2 a b
1
100000
10x3
0.000000  10.000000  -0.333333
1.000000   9.000000   0.666667
2.000000   8.000000  -0.333333
3.000000   7.000000   0.666667
4.000000   6.000000   1.666667
5.000000   5.000000   0.666667
6.000000   4.000000   1.666667
7.000000   3.000000   2.666667
8.000000   2.000000   1.666667
9.000000   1.000000   2.666667
6.849785602162428e-18
8.881784197001253e-17
solveDSS: 1014 ms