Monday 9 June 2014

Week of 6/2 Update

This week I sorted out most of the issues I'd been having when using block-coppersmith-domain (Yuhasz's Block-Wiedemann implementation). The biggest problem was that delta, a parameter which essentially determines the maximum possible degree of the candidate min-poly, was being set in a kludgey way. I committed a patch which chooses delta large enough that the min-poly should always be found, and which implements early termination. The idea behind early termination is that if the discrepancy of the candidate min-poly has been zero for a small constant number of iterations of the algorithm, then there's a good chance that it's the true minimal polynomial. For all but the most pathological cases we can persevere for 20 or so iterations and be essentially certain that the min-poly has been found. Preliminary testing suggests that this technique works extremely well and finds the min-poly for large matrices quite quickly.
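The early-termination heuristic above can be sketched in a few lines of Python. This is just an illustration of the idea, not the LinBox code: the function and parameter names are made up, and `patience` stands in for the "20 or so iterations" threshold.

```python
def early_termination_index(discrepancies, patience=20):
    """Return the iteration index at which we declare the candidate
    min-poly converged (the discrepancy has been zero for `patience`
    consecutive iterations), or None if that never happens.

    `discrepancies` is the stream of discrepancy values produced by the
    iteration; in the real algorithm these are matrices tested for zero,
    here they are plain numbers for illustration."""
    zero_run = 0
    for i, d in enumerate(discrepancies):
        if d == 0:
            zero_run += 1
            if zero_run >= patience:
                return i  # candidate unchanged for `patience` steps: stop early
        else:
            zero_run = 0  # any nonzero discrepancy resets the counter
    return None
```

The point of the patch is that instead of always running out to the worst-case bound implied by delta, the loop can stop as soon as the candidate has stabilized.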

I finished my Python implementation of Block-Wiedemann and confirmed that it gives the same results as the LinBox version. It was educational to write and may be of use later for testing the parallel code. I'll post the code in a later update in case someone finds it useful. (Update: The code is available here)
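To give a flavor of what such a Python implementation computes, here is a minimal sketch of the first step of (scalar) Wiedemann: generating the projected Krylov sequence whose minimal polynomial is then recovered with a Berlekamp-Massey-style iteration. This is my own illustrative code, not an excerpt from the implementation mentioned above.

```python
def krylov_sequence(A, u, v, p, length):
    """Projected Krylov sequence s_i = u^T A^i v (mod p).

    A is a dense n x n matrix as a list of lists, u and v are length-n
    projection vectors. In Block-Wiedemann, u and v become blocks of
    vectors and each s_i becomes a small matrix, but the structure of
    the computation is the same."""
    n = len(A)
    seq = []
    w = v[:]  # w holds A^i v
    for _ in range(length):
        seq.append(sum(ui * wi for ui, wi in zip(u, w)) % p)
        # advance the Krylov iterate: w <- A w (mod p)
        w = [sum(A[r][c] * w[c] for c in range(n)) % p for r in range(n)]
    return seq
```

A minimal polynomial of this sequence is (with high probability, for random projections) the minimal polynomial of A, which is what block-coppersmith-domain ultimately produces.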

I've started parallelizing the Block-Wiedemann code and have some results on which steps are the most time-consuming. Computing the discrepancy, generating blocks, and updating the min-poly (by matrix-matrix multiplication) all take significant time; the choice of block size b and number of non-zeroes nnz determines which is the most expensive. Fortunately these steps all parallelize easily and I've already gotten great results. I'm hesitant to declare victory, though, since something fishy is going on. After checking all the obvious mistakes (faulty timers, unrealistically slow sequential code, etc.) I'm still seeing consistent, super-linear speed-up in one particular part of the parallel code: when updating the min-poly I get a speed-up of more than 2 going from one processor to two. It might just be a cache effect, but I'll investigate further to make sure there isn't something wrong with the sequential code.
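The min-poly update step parallelizes by splitting the matrix-matrix product across workers by rows, roughly as sketched below. This is a toy Python version just to show the decomposition (threads here, a made-up `parallel_matmul` helper); the actual parallel code, and the speed-up numbers above, are from the C++ side.

```python
from concurrent.futures import ThreadPoolExecutor

def matmul_rows(A, B, p, rows):
    """Compute the given rows of the product A*B (mod p)."""
    n, m = len(B), len(B[0])
    return [[sum(A[r][k] * B[k][c] for k in range(n)) % p
             for c in range(m)] for r in rows]

def parallel_matmul(A, B, p, workers=2):
    """Split A*B (mod p) into row blocks, one per worker.

    Each worker's rows are independent, which is why this step (and the
    discrepancy computation, which has the same shape) parallelizes so
    cleanly."""
    n = len(A)
    chunks = [range(i, n, workers) for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as ex:
        blocks = list(ex.map(lambda rows: matmul_rows(A, B, p, rows), chunks))
    C = [None] * n
    for rows, block in zip(chunks, blocks):
        for r, row in zip(rows, block):
            C[r] = row
    return C
```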
