- The degree of $\bar{x}$, $N$, needs to be bigger than I originally thought. A safe bound is $N\geq{}2n*deg(A)+deg(B)+1$ although I'm not yet sure whether this bound is sharp.
- The degree of $\vec{b}$, $d_{b}$, can be fairly small, about $2\log_{q}n$. This ensures that the components are probably distinct.
- When computing $\frac{n_{i}}{d}$ such that $\frac{n_{i}}{d}\equiv{}\bar{x_{i}}(mod\;x^{N})$ there's one such for any given choice of $k=deg(d)$. However only one such choice of $k$ will actually solve the system. To figure out which choice is correct I run the extended euclidean algorithm for several rows of $\bar{x}$ simultaneously and mark values of $d$ that occur in more than one row as candidates.
- Given a candidate $d$ the other $n_{i}$'s can be determined quickly by dividing $\frac{\bar{x}_{i}}{d}$
- Any particular $\bar{x}_{i}$ has a chance of producing the wrong $d$ due to cancellation in the Cramer's rule formula $\bar{x}_{i}=\frac{|A_{i}|}{|A|}$. This occurs with probability $\frac{1}{q}$. Checking more $\bar{x}_{i}$'s for candidate $d$'s fixes this. To ensure that we get the correct $d$ as a candidate with probability $p$ we need to check $qbinom(p,n,1-\frac{1}{q})$ where qbinom is the binomial quantile function.
- Sometimes there will be no $\alpha$ such that $|A(-\alpha)|\neq{}0$, even if $A$ is non-singular (over the rational functions). In this case we can still find an answer by finding a random solution to the Block-Toeplitz system: $\begin{pmatrix}A_{d_{b}}&A_{d_{b-1}}&...&A_{0}&0&...\\A_{d_{b+1}}&A_{d_{b}}&...&A_{0}&0&...\\&&\ddots{}&&\\0&...&A_{d_{A}}&A_{d_{A}-1}&...&A_{0}\end{pmatrix}\begin{pmatrix}\bar{x}^{(0)}\\\bar{x}^{(1)}\\\vdots{}\\\bar{x}^{N}\end{pmatrix}=\vec{0}$

# Alex Stachnik's LinBox GSoC Blog

## Monday, 11 August 2014

### Week of 8/4 Update

This week I've been working on on fleshing out the algorithm I described last week for computing the last invariant factor of a polynomial matrix. I've written a high-level implementation in Sage to demonstrate proof of concept. There are a few subtleties I've uncovered:

## Monday, 4 August 2014

### Week of 7/28 Update

This week I've started looking into an alternative way of computing the modulus for the polynomial ring used by SmithFormIliopolus. The problem with PolyDet is that it's painfully slow, mostly due to its heavy use of arithmetic over field extensions. The new approach would avoid the use of extensions and is based on a paper by Dixon (Exact Solutions of Linear Equations Using P-Adic Expansions (1982)). The difference is that here we're working with polynomial, not p-adic representations of integers. The goal is to compute the largest invariant factor of a matrix A. For our purposes, this is actually better than the determinant since it keeps our intermediate polynomials smaller. Here's the algorithm:

- Choose a random polynomial vector $\vec{b}$ over the ground field $\mathbb{F}$
- The degree of $\vec{b}$ (defined as the maximum of the degrees of the elements of $\vec{b}$) should be large enough to sample the range of $A$, but how big exactly remains unclear.
- Choose $\alpha$ to make $A(-\alpha)$ non-singular then let $A\leftarrow{}A(x-\alpha)$
- Solve $A\vec{x}=\vec{b}$ for the first $2n$ terms of a power series solution for $\vec{x}$ for $n$ the sum of the degrees of each row of $A$
- Perform a rational reconstruction to solve $\vec{x}=\frac{\vec{n}}{d}$ for $d$
- Take a few random linear combinations of the elements of $\vec{x}$
- For each linear combination find $f$ and $g$ of minimal degree such that $g\vec{x}\equiv{}f$ (mod $x^{2n}$)
- Take the least common multiple of the various $g$'s
- Return the polynomial $d$

## Monday, 28 July 2014

### Week of 7/21 Update

This week I've been working on making my PolyDet code useful by creating a polynomial ring where all the operations are done modulo another polynomial. I've called this new class GivaroPolyModPoly. It reuses the Givaro Extension class which already does something similar to represent extension field elements modulo an irreducible. All I do is manually set the "irreducible" to some arbitrary polynomial. Now I can supply this ring to SmithFormIliopolus and all the intermediate polynomials will be automagically reduced modulo the polynomial I computed with PolyDet.

I've also been working on implementing parallelism in the Pascal blackbox. Some people on the FFLAS-FFPACK team have written a wrapper for task-based parallelism that we've been thinking of integrating with Linbox. The advantage to task-based parallelism is that it allows for recursive uses of parallelism, whereas things like #omp for do not. In addition the wrapper would allow us to switch to other backends such as Kaapi. I've been looking in to using these wrappers to handle parallel applyLeft() in Pascal.

I've also been working on implementing parallelism in the Pascal blackbox. Some people on the FFLAS-FFPACK team have written a wrapper for task-based parallelism that we've been thinking of integrating with Linbox. The advantage to task-based parallelism is that it allows for recursive uses of parallelism, whereas things like #omp for do not. In addition the wrapper would allow us to switch to other backends such as Kaapi. I've been looking in to using these wrappers to handle parallel applyLeft() in Pascal.

## Monday, 21 July 2014

### Week of 7/14 Update

This week I focused on speeding up computations over GF(3). We have a number of matrices over GF(3) for which we want to compute the invariant factors. Fortunately, Bryan Youse has written a very efficient representation of these matrices called Sliced3. It uses 2 bits to represent each matrix element and operates over n/2 elements in parallel when computing a matrix-matrix multiplication where n is the word size. The code has been developed in parallel with the mainline Linbox source so the interface has diverged a bit from the rest of the Linbox matrix classes. Also, in order to speed up the critical applyLeft() method I had to specialize for this representation so that I use iterators directly rather than constructing temporary submatrices.

I further improved what I'm now calling my Pascal blackbox class which represents matrices of the form $A_{i,j}=c_{i+j}{i+j\choose{}j}$ with elements in GF(3). It turns out it's not necessary to be fast when computing ${n\choose{}k}$, instead you can take advantage of the fractal structure of this matrix. A $3^{n}\times{}3^{n}$ matrix $B_{i,j}={i+j\choose{}j}$ consists of 9 $3^{n-1}\times{}3^{n-1}$ submatrices, $B_{i,j}$ where $B_{1,1}=B_{1,2}=B_{1,3}=B_{2,1}=B_{3,1}$, $B_{2,2}=-A_{1,1}$ and $A_{2,3}=A_{3,3}=A_{3,2}=0$. This gives rise to a natural recursive implementation of applyLeft() which proves to be about 10 times fast when used with the Sliced3 package than my old generic sparse matrix implementation. It also only uses $O(n)$ memory instead of $O(n^{\frac{4}{3}})$

I further improved what I'm now calling my Pascal blackbox class which represents matrices of the form $A_{i,j}=c_{i+j}{i+j\choose{}j}$ with elements in GF(3). It turns out it's not necessary to be fast when computing ${n\choose{}k}$, instead you can take advantage of the fractal structure of this matrix. A $3^{n}\times{}3^{n}$ matrix $B_{i,j}={i+j\choose{}j}$ consists of 9 $3^{n-1}\times{}3^{n-1}$ submatrices, $B_{i,j}$ where $B_{1,1}=B_{1,2}=B_{1,3}=B_{2,1}=B_{3,1}$, $B_{2,2}=-A_{1,1}$ and $A_{2,3}=A_{3,3}=A_{3,2}=0$. This gives rise to a natural recursive implementation of applyLeft() which proves to be about 10 times fast when used with the Sliced3 package than my old generic sparse matrix implementation. It also only uses $O(n)$ memory instead of $O(n^{\frac{4}{3}})$

## Monday, 14 July 2014

### Week of 7/7 Update

This week I've been trying to speed up PolyDet, and to get it working with SmithFormIliopolus. Testing has revealed that, while asymptotically fast, the PolyDet code is way slower than any other code in the invariant factor pipeline. The bottleneck is at the evaluation stage, where we take $b^{2}$ polynomials of degree $d$ and evaluate them at about $bn$ points. By contrast the determinant and interpolation steps are relatively cheap. I suspect that the problem is in the GivaroPoly implementation of polynomial arithmetic. GivaroPoly and GivaroExtension extension both represent a polynomial by a std::vector. This means that a polynomial over an extension field is a vector of vectors and I suspect the doubly nested loops involved are killing performance. I'm looking into FLINT which also implements polynomial arithmetic in pure C and may be faster. Some other Linboxers have also been trying out FLINT and it sounds promising. I'm also looking into automatically switching to a Zech-log implementation which the extension field is sufficiently small, but this probably won't work for very large matrices with correspondingly large values of $d$ (roughly proportional to $\frac{n}{b}$).

I've also written a blackbox class which represents a matrix which we've been trying to get the invariant factors of. It has the structure of Pascal's triangle and the $(i,j)$ element is given by $c_{i+j}{i+j\choose{}j}$. It turns out that, with a bit of dynamic programming, ${n\choose{}k}$ can be computed in just 2 field operations using $O(n)$ space for precomputation. I took advantage of this and the regularity of the matrix to come up with a very space efficient representation that should help for the larger cases.

I've also written a blackbox class which represents a matrix which we've been trying to get the invariant factors of. It has the structure of Pascal's triangle and the $(i,j)$ element is given by $c_{i+j}{i+j\choose{}j}$. It turns out that, with a bit of dynamic programming, ${n\choose{}k}$ can be computed in just 2 field operations using $O(n)$ space for precomputation. I took advantage of this and the regularity of the matrix to come up with a very space efficient representation that should help for the larger cases.

## Monday, 7 July 2014

### Week of 6/30 Update

This week I finished implementing the PolyDet class which computes the determinant of a matrix over a polynomial ring. The basic steps of the algorithm are:

Having gotten polynomial determinants working the next step is to adapt the Smith Normal Form code to use this functionality. My plan is to implement a ring class that will automatically perform operations mod the determinant polynomial $P(x)$. The SmithFormIliopolus code was intended to work in a similar manner, albeit over the integers. With luck I'll be able to reuse the field extension code which already does everything mod a polynomial, and just tell it to use an arbitrary polynomial instead.

I'm also looking into using chinese remaindering to compute the invariant factors of integer matrices. With luck this should be as simple as wrapping the existing finite field code with a call to the existing CRADomain code, and performing a quick verification step. It should also provide more opportunities for parallelism since this step is embarrassingly parallel.

- Given: $F=GF(p)$, $A\in{}F^{n*n}[x]$
- Let $d=\sum_{i=1}^{i=m}\max_{j\in\{1..n\}}\deg(A_{i,j})$
- (Choose $d$ to be the sum of the maximum row degrees)
- Let $d\leftarrow{}\min\left\{k|2^{k}>d+1\right\}$
- (Add 1 to $d$ and round up to the nearest power of 2)
- Let $EF=GF(p^{e})$ where $p^{e}=\min{\left\{k|\geq{}d\right\}}$
- Map the elements of $A$ to $EF$ homomorphicaly
- Choose d distinct points $p_{i}$ in $EF$
- Compute $\forall{}i\in\left\{1..d\right\}:v_{i}=det(A(p_{i}))$
- Interpolate the polynomial $P(x)$ of minimal degree that fits the points $v_{i}$
- Map the coefficients of $P(x)$ back to the ground field and return

Having gotten polynomial determinants working the next step is to adapt the Smith Normal Form code to use this functionality. My plan is to implement a ring class that will automatically perform operations mod the determinant polynomial $P(x)$. The SmithFormIliopolus code was intended to work in a similar manner, albeit over the integers. With luck I'll be able to reuse the field extension code which already does everything mod a polynomial, and just tell it to use an arbitrary polynomial instead.

I'm also looking into using chinese remaindering to compute the invariant factors of integer matrices. With luck this should be as simple as wrapping the existing finite field code with a call to the existing CRADomain code, and performing a quick verification step. It should also provide more opportunities for parallelism since this step is embarrassingly parallel.

## Monday, 30 June 2014

### Week of 6/23 Update

This week I implemented fast polynomial interpolation and evaluation. As I mentioned last week the plan is to compute $det(A)$ for a polynomial matrix $A$ by evaluating $det(A(p))$ at a bunch of points $p$ and interpolating the polynomial that runs through those values. If the largest polynomial in $A$ has degree $d$ then we need $d+1$ points. The naive algorithm for evaluating a polynomial takes $O(d)$ time per point for a total of $O(d^{2})$ field operations. Since the polynomials from block-wiedemann have degree of order $\frac{n}{b}$ (here $n$ is the matrix dimension and very big, $b$ is the blocking factor and fairly small) we definitely needed a faster method. Although we can't do an FFT per se since our fields lack points with the requisite properties, there is a similar algorithm (Modern Computer Algebra, algorithms 10.5 and 10.11) that works for any choice of points. The new PolyInterpolation class can transform a polynomial from the coefficient domain to the point domain and back in $O(d\log^{2}{d})$ time. So far it seems to work pretty well, but it still needs a little polishing up.

I also refactored the code I'd been using to compute invariant factors and wrapped it up in its own class called CoppersmithInvariantFactors. It was getting quite complicated and this should make it significantly easier to use for other applications. I also fixed some old Linbox bugs and cleaned up a matrix representation I'd been using for testing called sparse-map-map-matrix. It's now a fully fledged SparseMatrix type.

The plan now is to automatically extend the base field as needed when doing polynomial interpolation/evaluation so that there are sufficient points to represent the original polynomial. Once I've got that working I'll pull it all together into the determinant class and I'll be able to use that for the SNF computation. My original plan was to write some kind of a wrapper for the polynomial ring GivaroPoly but now I think I might be able to use a different SNF class (SmithFormIliopoulos) that already supports doing computations modulo $det(A)$.

I also refactored the code I'd been using to compute invariant factors and wrapped it up in its own class called CoppersmithInvariantFactors. It was getting quite complicated and this should make it significantly easier to use for other applications. I also fixed some old Linbox bugs and cleaned up a matrix representation I'd been using for testing called sparse-map-map-matrix. It's now a fully fledged SparseMatrix type.

The plan now is to automatically extend the base field as needed when doing polynomial interpolation/evaluation so that there are sufficient points to represent the original polynomial. Once I've got that working I'll pull it all together into the determinant class and I'll be able to use that for the SNF computation. My original plan was to write some kind of a wrapper for the polynomial ring GivaroPoly but now I think I might be able to use a different SNF class (SmithFormIliopoulos) that already supports doing computations modulo $det(A)$.

Subscribe to:
Posts (Atom)