## Monday, 16 June 2014

### Week of 6/9 Update

This week I worked on generating benchmarking results. The super-linear speed-up I observed went away when I ran tests on other multicore machines so I'm going to chalk up the strange results to some quirk of cache behavior. I'm still seeing very good performance from the parallel code with around 50-70% efficiency for most test cases when using 8 cores.

I've done some tests on the effects of block size b on the performance of Block-Wiedemann. As expected, the degree of the matrix min-poly of a sparse, random, $n\times{}n$ matrix $A$ is about $\frac{n}{b}$. Since the degree is roughly proportional to the number of iterations of the algorithms, this means that as $b$ increases we need to do fewer loops to converge. However the amount of work done per loop increases with $b$. Benchmarking shows that the optimal choice of $b$ to compute the min-poly is small, around 5 for $n=5000$. The opportunity for parallelism increases with $b$. With enough cores the increased parallelism is just about enough to cancel out the additional work introduced by choosing a large value for $b$. This is important because when computing invariant factors it's necessary to take a much larger $b$. This will likely also be the case for computing the min-poly of certain pathological matrices. All these results look quite promising so far as they suggest parallel programming is the way to go for finding invariant factors.