This addresses #1123.
This basically involves finding the second smallest eigenvalue and its eigenvector of the Laplacian matrix of a graph. Supposedly, this should be handled by SciPy, but I included an one extra algorithm so that for each of my eight test problems, at least one algorithm solves it within 10 seconds.
Algorithms
To be clear up front, NumPy alone or anything based on dense matrices will not do the job as it will become a memory bomb when used on large graphs. SciPy provides two sparse solvers eigsh
and lobpcg
. eigsh
is the implicitly restarted Lanczos iteration from ARPACK. lobpcg
stands for "locally optimal block preconditioned conjugate gradient". I know about the "PCG" part but not the "LOB" part.
The algorithm not from SciPy is TraceMin. I am using a version specifically designed for finding Fiedler vectors. TraceMin relies on a linear system solver. PCG is an obvious options. But personally I prefer sparse direct solvers as they are robust, and their comfort zone well covers the graph sizes within NetworkX's reach. SciPy bundles SuperLU as splu
. There are better options, but they are shut out by licensing issues—UMFPACK, CHOLMOD and PARDISO all fall into this category.
Benchmarking
Long-n (n = 32768, m = 63472, weighted)
----------------------------------------------------------------------
eigsh (did not converge) 2191.91
lobpcg 0.0302404279791517 38.62
TraceMin_pcg 0.0302404277402188 49.09
TraceMin_chol 0.0302404277401874 0.95 (Cholesky 0.058)
TraceMin_lu 0.0302404277402692 1.11 (LU 0.145)
Random4-n (n = 32768, m = 131056, weighted)
----------------------------------------------------------------------
eigsh 3900.84878990967 29.43
lobpcg 3900.84878990965 4.03
TraceMin_pcg 3900.84878997654 26.30
TraceMin_chol 3900.84878997652 76.62 (Cholesky 21.657)
TraceMin_lu 3900.84878997653 819.56 (LU 615.625)
Square-n (n = 32761, m = 65160, weighted)
----------------------------------------------------------------------
eigsh 3.88833036807499 680.97
lobpcg 3.88833036795032 253.81
TraceMin_pcg 3.88833036815728 49.98
TraceMin_chol 3.88833036815718 1.62 (Cholesky 0.097)
TraceMin_lu 3.88833036815734 1.92 (LU 0.249)
ak6 (n = 32774, m = 49159)
----------------------------------------------------------------------
eigsh (did not converge) 1826.40
lobpcg 5.70035769998342e-08 102.51
TraceMin_pcg 5.72572519105717e-08 80.71
TraceMin_chol 5.72572519439957e-08 0.62 (Cholesky 0.014)
TraceMin_lu 5.72572486979079e-08 0.70 (LU 0.095)
gl6 (n = 32786, m = 93145)
----------------------------------------------------------------------
eigsh 0.00026223266873431 94.72
lobpcg 0.000262232668724498 4.54
TraceMin_pcg 0.000262232668735754 5.78
TraceMin_chol 0.000262232668735774 1.55 (Cholesky 0.283)
TraceMin_lu 0.000262232668735766 3.06 (LU 1.637)
gw6 (n = 32768, m = 93184)
----------------------------------------------------------------------
eigsh 0.152240934977433 1.16
lobpcg 0.152240934977427 2.60
TraceMin_pcg 0.152240934977428 2.56
TraceMin_chol 0.152240934977426 11.75 (Cholesky 7.699)
TraceMin_lu 0.152240934977426 345.57 (LU 331.141)
netgen-6 (n = 8000, m = 14999, weighted)
----------------------------------------------------------------------
eigsh 74.0188255406067 4.29
lobpcg 74.0188255406006 0.73
TraceMin_pcg 74.0188255457315 1.28
TraceMin_chol 74.0188255457307 0.55 (Cholesky 0.126)
TraceMin_lu 74.0188255457307 1.93 (LU 1.299)
wlm6 (n = 8194, m = 179238)
----------------------------------------------------------------------
eigsh 0.00829854600893233 13.50
lobpcg 0.00829854600893219 2.75
TraceMin_pcg 0.00829854600893260 2.92
TraceMin_chol 0.00829854600893253 1.41 (Cholesky 0.089)
TraceMin_lu 0.00829854600893275 1.73 (LU 0.344)
The second column in the computed eigenvalue, and the third column is the time in seconds. The numbers in brackets are the Cholesky/LU factorization times. Cholesky factorization relies on the GPL scikits.sparse
wrapper for CHOLMOD. For this reason, the related code is not in this PR. I suppose that scikits.sparse
can be stripped down to use only the LGPL portion of CHOLMOD and become LGPL itself, but I am not familiar with Cython to do the work.
From the data, it is evident that eigsh
is not nearly as robust as the rest, although it can be fast at times. lobpcg
and TraceMin_chol/lu
both have good and bad cases, while TraceMin_pcg
sits somewhere between the two. It should be noted that for lobpcg
and TraceMin_pcg
, I am using only Jacobi preconditioning. (In separate tests, passing the Cholesky factorization as the preconditioner to lobpcg
improves its bad cases by at least an order magnitude but produces incorrect results in one case.) lobpcg
is very sensitive to the initial guess and can fail to converge if it is not good enough. I produce an estimate of the Fiedler vector using the reverse Cuthill–McKee ordering. The same estimate does not seem to help TraceMin.
The effectiveness of Cholesky/LU factorization highly depends on the structure of the matrix and the matrix reordering algorithm. For example, both gl6
and gw6
are three-dimensional grid graphs. gw6
has a larger bisection width and thus requires long time to factorize. The METIS library used by CHOLMOD is much more effective in discovering small bisections than the multiple minimum degree algorithm used by SuperLU. As a result, CHOLMOD is more than 40 times faster in factorizing gw6
. (As a separate issue, it may be of interest to some users if NetworkX includes graph ordering and partitioning functionalities.)