Alex Stachnik's LinBox GSoC Blog

Monday, 28 July 2014

Week of 7/21 Update

This week I've been working on making my PolyDet code useful by creating a polynomial ring where all the operations are done modulo another polynomial. I've called this new class GivaroPolyModPoly. It reuses the Givaro Extension class which already does something similar to represent extension field elements modulo an irreducible. All I do is manually set the "irreducible" to some arbitrary polynomial. Now I can supply this ring to SmithFormIliopolus and all the intermediate polynomials will be automagically reduced modulo the polynomial I computed with PolyDet.

I've also been working on implementing parallelism in the Pascal blackbox. Some people on the FFLAS-FFPACK team have written a wrapper for task-based parallelism that we've been thinking of integrating with Linbox. The advantage to task-based parallelism is that it allows for recursive uses of parallelism, whereas things like #omp for do not. In addition the wrapper would allow us to switch to other backends such as Kaapi. I've been looking in to using these wrappers to handle parallel applyLeft() in Pascal.

Monday, 21 July 2014

Week of 7/14 Update

This week I focused on speeding up computations over GF(3). We have a number of matrices over GF(3) for which we want to compute the invariant factors. Fortunately, Bryan Youse has written a very efficient representation of these matrices called Sliced3. It uses 2 bits to represent each matrix element and operates over n/2 elements in parallel when computing a matrix-matrix multiplication where n is the word size. The code has been developed in parallel with the mainline Linbox source so the interface has diverged a bit from the rest of the Linbox matrix classes. Also, in order to speed up the critical applyLeft() method I had to specialize for this representation so that I use iterators directly rather than constructing temporary submatrices.

I further improved what I'm now calling my Pascal blackbox class which represents matrices of the form $A_{i,j}=c_{i+j}{i+j\choose{}j}$ with elements in GF(3). It turns out it's not necessary to be fast when computing ${n\choose{}k}$, instead you can take advantage of the fractal structure of this matrix. A $3^{n}\times{}3^{n}$ matrix $B_{i,j}={i+j\choose{}j}$ consists of 9 $3^{n-1}\times{}3^{n-1}$ submatrices, $B_{i,j}$ where $B_{1,1}=B_{1,2}=B_{1,3}=B_{2,1}=B_{3,1}$, $B_{2,2}=-A_{1,1}$ and $A_{2,3}=A_{3,3}=A_{3,2}=0$. This gives rise to a natural recursive implementation of applyLeft() which proves to be about 10 times fast when used with the Sliced3 package than my old generic sparse matrix implementation. It also only uses $O(n)$ memory instead of $O(n^{\frac{4}{3}})$

Monday, 14 July 2014

Week of 7/7 Update

This week I've been trying to speed up PolyDet, and to get it working with SmithFormIliopolus. Testing has revealed that, while asymptotically fast, the PolyDet code is way slower than any other code in the invariant factor pipeline. The bottleneck is at the evaluation stage, where we take $b^{2}$ polynomials of degree $d$ and evaluate them at about $bn$ points. By contrast the determinant and interpolation steps are relatively cheap. I suspect that the problem is in the GivaroPoly implementation of polynomial arithmetic. GivaroPoly and GivaroExtension extension both represent a polynomial by a std::vector. This means that a polynomial over an extension field is a vector of vectors and I suspect the doubly nested loops involved are killing performance. I'm looking into FLINT which also implements polynomial arithmetic in pure C and may be faster. Some other Linboxers have also been trying out FLINT and it sounds promising. I'm also looking into automatically switching to a Zech-log implementation which the extension field is sufficiently small, but this probably won't work for very large matrices with correspondingly large values of $d$ (roughly proportional to $\frac{n}{b}$).

I've also written a blackbox class which represents a matrix which we've been trying to get the invariant factors of. It has the structure of Pascal's triangle and the $(i,j)$ element is given by $c_{i+j}{i+j\choose{}j}$. It turns out that, with a bit of dynamic programming, ${n\choose{}k}$ can be computed in just 2 field operations using $O(n)$ space for precomputation. I took advantage of this and the regularity of the matrix to come up with a very space efficient representation that should help for the larger cases.

Monday, 7 July 2014

Week of 6/30 Update

This week I finished implementing the PolyDet class which computes the determinant of a matrix over a polynomial ring. The basic steps of the algorithm are:

Given: $F=GF(p)$, $A\in{}F^{n*n}[x]$
Let $d=\sum_{i=1}^{i=m}\max_{j\in\{1..n\}}\deg(A_{i,j})$

(Choose $d$ to be the sum of the maximum row degrees)

Let $d\leftarrow{}\min\left\{k|2^{k}>d+1\right\}$

(Add 1 to $d$ and round up to the nearest power of 2)

Let $EF=GF(p^{e})$ where $p^{e}=\min{\left\{k|\geq{}d\right\}}$
Map the elements of $A$ to $EF$ homomorphicaly
Choose d distinct points $p_{i}$ in $EF$
Compute $\forall{}i\in\left\{1..d\right\}:v_{i}=det(A(p_{i}))$
Interpolate the polynomial $P(x)$ of minimal degree that fits the points $v_{i}$
Map the coefficients of $P(x)$ back to the ground field and return

This was a real bear to get working primarily because it involved working so much with extension fields. I used the GivaroExtension class for this and it turns out that it's still fairly buggy. A lot of algorithms in Linbox don't work for extension fields because of subtle differences in how the interfaces work. In addition, currently you can't construct an extension of an extension using GivaroExtension, you need a different extension class such as GivaroZpz which has some limitations of its own. I'm hoping to clean up some of these issues in the near future.

Having gotten polynomial determinants working the next step is to adapt the Smith Normal Form code to use this functionality. My plan is to implement a ring class that will automatically perform operations mod the determinant polynomial $P(x)$. The SmithFormIliopolus code was intended to work in a similar manner, albeit over the integers. With luck I'll be able to reuse the field extension code which already does everything mod a polynomial, and just tell it to use an arbitrary polynomial instead.

I'm also looking into using chinese remaindering to compute the invariant factors of integer matrices. With luck this should be as simple as wrapping the existing finite field code with a call to the existing CRADomain code, and performing a quick verification step. It should also provide more opportunities for parallelism since this step is embarrassingly parallel.

Monday, 30 June 2014

Week of 6/23 Update

This week I implemented fast polynomial interpolation and evaluation. As I mentioned last week the plan is to compute $det(A)$ for a polynomial matrix $A$ by evaluating $det(A(p))$ at a bunch of points $p$ and interpolating the polynomial that runs through those values. If the largest polynomial in $A$ has degree $d$ then we need $d+1$ points. The naive algorithm for evaluating a polynomial takes $O(d)$ time per point for a total of $O(d^{2})$ field operations. Since the polynomials from block-wiedemann have degree of order $\frac{n}{b}$ (here $n$ is the matrix dimension and very big, $b$ is the blocking factor and fairly small) we definitely needed a faster method. Although we can't do an FFT per se since our fields lack points with the requisite properties, there is a similar algorithm (Modern Computer Algebra, algorithms 10.5 and 10.11) that works for any choice of points. The new PolyInterpolation class can transform a polynomial from the coefficient domain to the point domain and back in $O(d\log^{2}{d})$ time. So far it seems to work pretty well, but it still needs a little polishing up.

I also refactored the code I'd been using to compute invariant factors and wrapped it up in its own class called CoppersmithInvariantFactors. It was getting quite complicated and this should make it significantly easier to use for other applications. I also fixed some old Linbox bugs and cleaned up a matrix representation I'd been using for testing called sparse-map-map-matrix. It's now a fully fledged SparseMatrix type.

The plan now is to automatically extend the base field as needed when doing polynomial interpolation/evaluation so that there are sufficient points to represent the original polynomial. Once I've got that working I'll pull it all together into the determinant class and I'll be able to use that for the SNF computation. My original plan was to write some kind of a wrapper for the polynomial ring GivaroPoly but now I think I might be able to use a different SNF class (SmithFormIliopoulos) that already supports doing computations modulo $det(A)$.

Wednesday, 25 June 2014

Python Implementation of The Block-Wiedemann Algorithm

I've created a repository for my Python implementation of the Block-Wiedemann and scalar Wiedemann algorithms. It's available here. I must warn anyone interested in the code that it was never intended for production use and is both buggy and horrendously slow. Nevertheless, it has the advantage of being much simpler than the Linbox implementation, and may be useful to anyone trying to understand how the algorithm works.

Monday, 23 June 2014

Week of 6/16 Update

This week I worked on improving parallel performance and on computing the Smith Normal Form (SNF) of matrix min-polys. I've finally got the entire pipeline of code up and running. Given a matrix supplied in Matrix Market Format I have a single program that will read it, compute the matrix min-poly in parallel, change the representation of the min-poly to a matrix entries over a polynomial ring, compute the SNF of that matrix and return the main diagonal. This particular program is in the Linbox repository under benchmarks/invariant-factors-benchmark. So far the results look good, with the last entry being the min-poly and the others looking like the correct invariant factors.

While this is a big step forward there's still a lot that needs to be done. In particular the algorithm I'm using for computing the SNF (The Kannan-Bachem algorithm) works great for matrices over the integers, but is horrendously slow for polynomial rings. I'm 90% sure that the problem is that it's generating intermediate values that are polynomials of enormous degree, and spending way too much time manipulating those. Fortunately, there's a way around this blow-up in polynomial degree, and that's to use the fact that all the computations of the SNF of $A$ can be done modulo $det(A)$. Unfortunately Linbox doesn't have a good way of finding the determinant of a polynomial matrix right now so that's my next step. Once I've got $det(A)$, I'll either extend the Kannan-Bachem class or just write a wrapper for the polynomial ring class that does everything modulo a given polynomial.

My basic strategy for computing $det(A)$ is to evaluate $A$ at $n$ points (over a field extension if there aren't enough distinct points in the base field), then compute $n$ determinant of these scalar matrices and finally interpolate the polynomial that goes through those points. This shouldn't be too hard, but the interpolation step might be a bit tricky since the usual way to do this (Inverse FFT) requires a root of unity with an order that's greater than $n$ and also a power of 2. This isn't always easy to find over an arbitrary field so I'm looking into alternative algorithms with less strict requirements for the interpolation points.