Shibusawa-san sent me an update for the OpenBLAS port, so I committed it and then measured CPU performance with OpenBLAS.
=== CPU: Intel(R) Core(TM)2 Quad CPU Q9650 @ 3.00GHz ===
Peak 48GFlops
$ ./dlinpack.goto 4000 4000 1
From : 4000 To : 4000 Step = 1
SIZE Residual Decompose Solve Total
4000 : 1.253020e-11 36254.64 MFlops 1314.76 MFlops 35546.68 MFlops
approx. 74% of peak performance
$ ./dcholesky.goto 4000 4000 1
From : 4000 To : 4000 Step = 1
M = 4000 : 5.617729e-14 69457.827 MFlops 5.691975e-14 68052.289 MFlops
=== CPU: Intel(R) Core(TM) i5-2310 CPU @ 2.90GHz ===
peak performance (GHz * cores * flops/cycle):
2.9 * 4 * 8 = 92.8 GFlops (Turbo Boost off)
3.2 * 4 * 8 = 102.4 GFlops (Turbo Boost on; not likely to occur)
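The peak figures follow from clock (GHz) x cores x double-precision flops per cycle per core. Below is a minimal sketch of that arithmetic; the function name `peak_gflops` is hypothetical, and the 3.2 GHz Turbo Boost clock is an assumption inferred from 102.4 / (4 * 8).

```python
def peak_gflops(clock_ghz: float, cores: int, flops_per_cycle: int) -> float:
    """Theoretical peak in GFlops = clock (GHz) * cores * flops per cycle per core."""
    return clock_ghz * cores * flops_per_cycle


# i5-2310: 4 cores, 8 double-precision flops per cycle per core
# (factors taken from the peak calculation in the text).
i5_base = peak_gflops(2.9, 4, 8)
# Assumption: Turbo Boost clock of 3.2 GHz, implied by 102.4 / (4 * 8).
i5_turbo = peak_gflops(3.2, 4, 8)

print(f"{i5_base:.1f} GFlops (Turbo Boost off)")
print(f"{i5_turbo:.1f} GFlops (Turbo Boost on)")
```

The same function reproduces the other peaks quoted in this post when given their factors, for example `peak_gflops(2.6, 4, 2)` for the Opteron 285 x2 figure of about 21 GFlops.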
$ ./dlinpack.goto 4000 4000 1
From : 4000 To : 4000 Step = 1
SIZE Residual Decompose Solve Total
4000 : 1.189238e-11 71590.41 MFlops 3685.36 MFlops 70615.29 MFlops
peak performance ratio: 76%
$ ./dcholesky.goto 4000 4000 1
From : 4000 To : 4000 Step = 1
M = 4000 : 5.617729e-14 69457.827 MFlops 5.691975e-14 68052.289 MFlops
peak performance ratio: 73%
Opteron 285 (2.6GHz) x2: peak is about 21 GFlops (= 2.6 * 4 * 2).
$ ./dlinpack.goto 4000 4000 1
From : 4000 To : 4000 Step = 1
SIZE Residual Decompose Solve Total
4000 : 7.699408e-11 17418.13 MFlops 810.45 MFlops 17154.68 MFlops
peak performance ratio: 83%
$ ./dcholesky.goto 4000 4000 1
From : 4000 To : 4000 Step = 1
M = 4000 : 5.598300e-14 16881.149 MFlops 5.173639e-14 15945.122 MFlops
peak performance ratio: 76%
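The peak performance ratios quoted in this post are the measured MFlops divided by the theoretical peak. The sketch below shows one way to compute them from a result row. The helper names `last_mflops` and `peak_ratio` are hypothetical, and taking the last MFlops figure on each row is an assumption that matches the ratios quoted for the i5-2310.

```python
def last_mflops(row: str) -> float:
    """Return the last number that is immediately followed by 'MFlops' in a result row."""
    tokens = row.split()
    values = [float(tokens[i - 1]) for i, tok in enumerate(tokens) if tok == "MFlops" and i > 0]
    return values[-1]


def peak_ratio(measured_mflops: float, peak: float) -> float:
    """Measured throughput as a percentage of the theoretical peak (peak given in GFlops)."""
    return measured_mflops / (peak * 1000.0) * 100.0


# dlinpack result row for the i5-2310, copied from the output in this post.
row = "4000 : 1.189238e-11 71590.41 MFlops 3685.36 MFlops 70615.29 MFlops"
print(f"{peak_ratio(last_mflops(row), 92.8):.1f}%")  # prints 76.1%
```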