Monday, 28 July 2014

Week of 7/21 Update

This week I've been working on making my PolyDet code useful by creating a polynomial ring where all the operations are done modulo another polynomial. I've called this new class GivaroPolyModPoly. It reuses the Givaro Extension class which already does something similar to represent extension field elements modulo an irreducible. All I do is manually set the "irreducible" to some arbitrary polynomial. Now I can supply this ring to SmithFormIliopolus and all the intermediate polynomials will be automagically reduced modulo the polynomial I computed with PolyDet.

I've also been working on implementing parallelism in the Pascal blackbox. Some people on the FFLAS-FFPACK team have written a wrapper for task-based parallelism that we've been thinking of integrating with Linbox. The advantage to task-based parallelism is that it allows for recursive uses of parallelism, whereas things like #omp for do not. In addition the wrapper would allow us to switch to other backends such as Kaapi. I've been looking in to using these wrappers to handle parallel applyLeft() in Pascal.

Monday, 21 July 2014

Week of 7/14 Update

This week I focused on speeding up computations over GF(3). We have a number of matrices over GF(3) for which we want to compute the invariant factors. Fortunately, Bryan Youse has written a very efficient representation of these matrices called Sliced3. It uses 2 bits to represent each matrix element and operates over n/2 elements in parallel when computing a matrix-matrix multiplication where n is the word size. The code has been developed in parallel with the mainline Linbox source so the interface has diverged a bit from the rest of the Linbox matrix classes. Also, in order to speed up the critical applyLeft() method I had to specialize for this representation so that I use iterators directly rather than constructing temporary submatrices.

I further improved what I'm now calling my Pascal blackbox class which represents matrices of the form $A_{i,j}=c_{i+j}{i+j\choose{}j}$ with elements in GF(3). It turns out it's not necessary to be fast when computing ${n\choose{}k}$, instead you can take advantage of the fractal structure of this matrix. A $3^{n}\times{}3^{n}$ matrix $B_{i,j}={i+j\choose{}j}$ consists of 9 $3^{n-1}\times{}3^{n-1}$ submatrices, $B_{i,j}$ where $B_{1,1}=B_{1,2}=B_{1,3}=B_{2,1}=B_{3,1}$, $B_{2,2}=-A_{1,1}$ and $A_{2,3}=A_{3,3}=A_{3,2}=0$. This gives rise to a natural recursive implementation of applyLeft() which proves to be about 10 times fast when used with the Sliced3 package than my old generic sparse matrix implementation. It also only uses $O(n)$ memory instead of $O(n^{\frac{4}{3}})$

Monday, 14 July 2014

Week of 7/7 Update

This week I've been trying to speed up PolyDet, and to get it working with SmithFormIliopolus. Testing has revealed that, while asymptotically fast, the PolyDet code is way slower than any other code in the invariant factor pipeline. The bottleneck is at the evaluation stage, where we take $b^{2}$ polynomials of degree $d$ and evaluate them at about $bn$ points. By contrast the determinant and interpolation steps are relatively cheap. I suspect that the problem is in the GivaroPoly implementation of polynomial arithmetic. GivaroPoly and GivaroExtension extension both represent a polynomial by a std::vector. This means that a polynomial over an extension field is a vector of vectors and I suspect the doubly nested loops involved are killing performance. I'm looking into FLINT which also implements polynomial arithmetic in pure C and may be faster. Some other Linboxers have also been trying out FLINT and it sounds promising. I'm also looking into automatically switching to a Zech-log implementation which the extension field is sufficiently small, but this probably won't work for very large matrices with correspondingly large values of $d$ (roughly proportional to $\frac{n}{b}$).

I've also written a blackbox class which represents a matrix which we've been trying to get the invariant factors of. It has the structure of Pascal's triangle and the $(i,j)$ element is given by $c_{i+j}{i+j\choose{}j}$. It turns out that, with a bit of dynamic programming, ${n\choose{}k}$ can be computed in just 2 field operations using $O(n)$ space for precomputation. I took advantage of this and the regularity of the matrix to come up with a very space efficient representation that should help for the larger cases.

Monday, 7 July 2014

Week of 6/30 Update

This week I finished implementing the PolyDet class which computes the determinant of a matrix over a polynomial ring. The basic steps of the algorithm are:
  • Given: $F=GF(p)$, $A\in{}F^{n*n}[x]$
  • Let $d=\sum_{i=1}^{i=m}\max_{j\in\{1..n\}}\deg(A_{i,j})$
    • (Choose $d$ to be the sum of the maximum row degrees)
  • Let $d\leftarrow{}\min\left\{k|2^{k}>d+1\right\}$
    • (Add 1 to $d$ and round up to the nearest power of 2)
  • Let $EF=GF(p^{e})$ where $p^{e}=\min{\left\{k|\geq{}d\right\}}$
  • Map the elements of $A$ to $EF$ homomorphicaly
  • Choose d distinct points $p_{i}$ in $EF$
  • Compute $\forall{}i\in\left\{1..d\right\}:v_{i}=det(A(p_{i}))$
  • Interpolate the polynomial $P(x)$ of minimal degree that fits the points $v_{i}$
  • Map the coefficients of $P(x)$ back to the ground field and return
This was a real bear to get working primarily because it involved working so much with extension fields. I used the GivaroExtension class for this and it turns out that it's still fairly buggy. A lot of algorithms in Linbox don't work for extension fields because of subtle differences in how the interfaces work. In addition, currently you can't construct an extension of an extension using GivaroExtension, you need a different extension class such as GivaroZpz which has some limitations of its own. I'm hoping to clean up some of these issues in the near future.

Having gotten polynomial determinants working the next step is to adapt the Smith Normal Form code to use this functionality. My plan is to implement a ring class that will automatically perform operations mod the determinant polynomial $P(x)$. The SmithFormIliopolus code was intended to work in a similar manner, albeit over the integers. With luck I'll be able to reuse the field extension code which already does everything mod a polynomial, and just tell it to use an arbitrary polynomial instead.

I'm also looking into using chinese remaindering to compute the invariant factors of integer matrices. With luck this should be as simple as wrapping the existing finite field code with a call to the existing CRADomain code, and performing a quick verification step. It should also provide more opportunities for parallelism since this step is embarrassingly parallel.