Ultra-Fast & Stable Computation for Optimization Problems

Mainly research topics on large-scale optimization problems, graph search, machine learning, and digital twins

QAPLIB tai40b, Part 3

October 10, 2014, 21:30:48 | Weblog
The Cholesky factorization performance on the Kyushu University CX400 with 128 GPUs (NVIDIA K20m) is shown below: slightly over 100 TFlops.

QNN relaxation problem for tai40b:
mDIM = 1218400
nBLOCK = 2
bLOCKsTRUCT = -1463440 1522
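The GFlops figures in the log below follow directly from the standard flop count of a Cholesky factorization, roughly n³/3 for an n×n matrix, where n here equals mDIM (the order of the Schur complement matrix). A quick sanity check, using the time from one of the log lines:

```python
# Sanity check: Cholesky factorization of an n x n matrix costs ~ n**3 / 3 flops,
# so GFlops = (n**3 / 3) / time / 1e9.
n = 1218400      # mDIM = order of the Schur complement matrix
t = 6035.300     # wall-clock seconds taken from one of the log lines below
gflops = (n ** 3 / 3) / t / 1e9
print(f"{gflops:.3f} GFlops")  # ~99896, matching the corresponding log line
```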

gpu.tai40b_ZKRW_R3_e0:[gpdpotrf] ### END n=1218400, nb=1024, 8x16 procs, ver 50: 6108.246sec --> 98703.346GFlops ###
gpu.tai40b_ZKRW_R3_e0:[gpdpotrf] ### END n=1218400, nb=1024, 8x16 procs, ver 50: 6035.300sec --> 99896.336GFlops ###
gpu.tai40b_ZKRW_R3_e0:[gpdpotrf] ### END n=1218400, nb=1024, 8x16 procs, ver 50: 6034.922sec --> 99902.586GFlops ###
gpu.tai40b_ZKRW_R3_e0:[gpdpotrf] ### END n=1218400, nb=1024, 8x16 procs, ver 50: 6035.628sec --> 99890.905GFlops ###
gpu.tai40b_ZKRW_R3_e0:[gpdpotrf] ### END n=1218400, nb=1024, 8x16 procs, ver 50: 6016.467sec --> 100209.032GFlops ###
gpu.tai40b_ZKRW_R3_e0:[gpdpotrf] ### END n=1218400, nb=1024, 8x16 procs, ver 50: 6014.609sec --> 100239.991GFlops ###
gpu.tai40b_ZKRW_R3_e0.1:[gpdpotrf] ### END n=1218400, nb=1024, 8x16 procs, ver 50: 6067.971sec --> 99358.476GFlops ###
gpu.tai40b_ZKRW_R3_e0.1:[gpdpotrf] ### END n=1218400, nb=1024, 8x16 procs, ver 50: 6007.004sec --> 100366.892GFlops ###
gpu.tai40b_ZKRW_R3_e0.1:[gpdpotrf] ### END n=1218400, nb=1024, 8x16 procs, ver 50: 6002.535sec --> 100441.628GFlops ###
gpu.tai40b_ZKRW_R3_e0.1:[gpdpotrf] ### END n=1218400, nb=1024, 8x16 procs, ver 50: 6004.056sec --> 100416.174GFlops ###
gpu.tai40b_ZKRW_R3_e0.1:[gpdpotrf] ### END n=1218400, nb=1024, 8x16 procs, ver 50: 6004.984sec --> 100400.654GFlops ###
gpu.tai40b_ZKRW_R3_e0.1:[gpdpotrf] ### END n=1218400, nb=1024, 8x16 procs, ver 50: 6001.672sec --> 100456.069GFlops ###
gpu.tai40b_ZKRW_R3_e0.10:[gpdpotrf] ### END n=1218400, nb=1024, 8x16 procs, ver 50: 6084.231sec --> 99092.939GFlops ###
gpu.tai40b_ZKRW_R3_e0.10:[gpdpotrf] ### END n=1218400, nb=1024, 8x16 procs, ver 50: 6000.982sec --> 100467.622GFlops ###
gpu.tai40b_ZKRW_R3_e0.10:[gpdpotrf] ### END n=1218400, nb=1024, 8x16 procs, ver 50: 6001.369sec --> 100461.132GFlops ###
gpu.tai40b_ZKRW_R3_e0.10:[gpdpotrf] ### END n=1218400, nb=1024, 8x16 procs, ver 50: 5998.202sec --> 100514.176GFlops ###
gpu.tai40b_ZKRW_R3_e0.2:[gpdpotrf] ### END n=1218400, nb=1024, 8x16 procs, ver 50: 6073.922sec --> 99261.130GFlops ###
gpu.tai40b_ZKRW_R3_e0.2:[gpdpotrf] ### END n=1218400, nb=1024, 8x16 procs, ver 50: 5993.609sec --> 100591.208GFlops ###
gpu.tai40b_ZKRW_R3_e0.2:[gpdpotrf] ### END n=1218400, nb=1024, 8x16 procs, ver 50: 5992.482sec --> 100610.116GFlops ###
gpu.tai40b_ZKRW_R3_e0.2:[gpdpotrf] ### END n=1218400, nb=1024, 8x16 procs, ver 50: 5993.416sec --> 100594.440GFlops ###
gpu.tai40b_ZKRW_R3_e0.2:[gpdpotrf] ### END n=1218400, nb=1024, 8x16 procs, ver 50: 5993.983sec --> 100584.928GFlops ###
gpu.tai40b_ZKRW_R3_e0.2:[gpdpotrf] ### END n=1218400, nb=1024, 8x16 procs, ver 50: 5995.609sec --> 100557.656GFlops ###
gpu.tai40b_ZKRW_R3_e0.3:[gpdpotrf] ### END n=1218400, nb=1024, 8x16 procs, ver 50: 6088.760sec --> 99019.240GFlops ###
gpu.tai40b_ZKRW_R3_e0.3:[gpdpotrf] ### END n=1218400, nb=1024, 8x16 procs, ver 50: 5993.546sec --> 100592.266GFlops ###
gpu.tai40b_ZKRW_R3_e0.3:[gpdpotrf] ### END n=1218400, nb=1024, 8x16 procs, ver 50: 5992.918sec --> 100602.810GFlops ###
gpu.tai40b_ZKRW_R3_e0.3:[gpdpotrf] ### END n=1218400, nb=1024, 8x16 procs, ver 50: 5997.957sec --> 100518.291GFlops ###
gpu.tai40b_ZKRW_R3_e0.3:[gpdpotrf] ### END n=1218400, nb=1024, 8x16 procs, ver 50: 5989.645sec --> 100657.781GFlops ###
gpu.tai40b_ZKRW_R3_e0.3:[gpdpotrf] ### END n=1218400, nb=1024, 8x16 procs, ver 50: 5993.478sec --> 100593.405GFlops ###
gpu.tai40b_ZKRW_R3_e0.4:[gpdpotrf] ### END n=1218400, nb=1024, 8x16 procs, ver 50: 6089.100sec --> 99013.698GFlops ###
gpu.tai40b_ZKRW_R3_e0.4:[gpdpotrf] ### END n=1218400, nb=1024, 8x16 procs, ver 50: 5991.683sec --> 100623.538GFlops ###
gpu.tai40b_ZKRW_R3_e0.4:[gpdpotrf] ### END n=1218400, nb=1024, 8x16 procs, ver 50: 5991.717sec --> 100622.962GFlops ###
gpu.tai40b_ZKRW_R3_e0.4:[gpdpotrf] ### END n=1218400, nb=1024, 8x16 procs, ver 50: 5985.861sec --> 100721.400GFlops ###
gpu.tai40b_ZKRW_R3_e0.4:[gpdpotrf] ### END n=1218400, nb=1024, 8x16 procs, ver 50: 5982.920sec --> 100770.915GFlops ###
gpu.tai40b_ZKRW_R3_e0.4:[gpdpotrf] ### END n=1218400, nb=1024, 8x16 procs, ver 50: 5984.565sec --> 100743.222GFlops ###
gpu.tai40b_ZKRW_R3_e0.5:[gpdpotrf] ### END n=1218400, nb=1024, 8x16 procs, ver 50: 6081.938sec --> 99130.294GFlops ###
gpu.tai40b_ZKRW_R3_e0.5:[gpdpotrf] ### END n=1218400, nb=1024, 8x16 procs, ver 50: 6004.970sec --> 100400.885GFlops ###
gpu.tai40b_ZKRW_R3_e0.5:[gpdpotrf] ### END n=1218400, nb=1024, 8x16 procs, ver 50: 6006.461sec --> 100375.967GFlops ###
gpu.tai40b_ZKRW_R3_e0.5:[gpdpotrf] ### END n=1218400, nb=1024, 8x16 procs, ver 50: 6005.662sec --> 100389.319GFlops ###
gpu.tai40b_ZKRW_R3_e0.5:[gpdpotrf] ### END n=1218400, nb=1024, 8x16 procs, ver 50: 6011.675sec --> 100288.913GFlops ###
gpu.tai40b_ZKRW_R3_e0.5:[gpdpotrf] ### END n=1218400, nb=1024, 8x16 procs, ver 50: 6006.091sec --> 100382.152GFlops ###
gpu.tai40b_ZKRW_R3_e0.6:[gpdpotrf] ### END n=1218400, nb=1024, 8x16 procs, ver 50: 6093.153sec --> 98947.846GFlops ###
gpu.tai40b_ZKRW_R3_e0.6:[gpdpotrf] ### END n=1218400, nb=1024, 8x16 procs, ver 50: 6003.723sec --> 100421.739GFlops ###
gpu.tai40b_ZKRW_R3_e0.6:[gpdpotrf] ### END n=1218400, nb=1024, 8x16 procs, ver 50: 5999.893sec --> 100485.850GFlops ###
gpu.tai40b_ZKRW_R3_e0.6:[gpdpotrf] ### END n=1218400, nb=1024, 8x16 procs, ver 50: 5998.815sec --> 100503.906GFlops ###
gpu.tai40b_ZKRW_R3_e0.6:[gpdpotrf] ### END n=1218400, nb=1024, 8x16 procs, ver 50: 5983.322sec --> 100764.155GFlops ###
gpu.tai40b_ZKRW_R3_e0.6:[gpdpotrf] ### END n=1218400, nb=1024, 8x16 procs, ver 50: 5991.518sec --> 100626.312GFlops ###
gpu.tai40b_ZKRW_R3_e0.7:[gpdpotrf] ### END n=1218400, nb=1024, 8x16 procs, ver 50: 6075.362sec --> 99237.608GFlops ###
gpu.tai40b_ZKRW_R3_e0.7:[gpdpotrf] ### END n=1218400, nb=1024, 8x16 procs, ver 50: 5998.667sec --> 100506.386GFlops ###
gpu.tai40b_ZKRW_R3_e0.7:[gpdpotrf] ### END n=1218400, nb=1024, 8x16 procs, ver 50: 5993.682sec --> 100589.982GFlops ###
gpu.tai40b_ZKRW_R3_e0.7:[gpdpotrf] ### END n=1218400, nb=1024, 8x16 procs, ver 50: 5993.228sec --> 100597.604GFlops ###
gpu.tai40b_ZKRW_R3_e0.7:[gpdpotrf] ### END n=1218400, nb=1024, 8x16 procs, ver 50: 5994.083sec --> 100583.250GFlops ###
gpu.tai40b_ZKRW_R3_e0.7:[gpdpotrf] ### END n=1218400, nb=1024, 8x16 procs, ver 50: 5993.857sec --> 100587.047GFlops ###
gpu.tai40b_ZKRW_R3_e0.8:[gpdpotrf] ### END n=1218400, nb=1024, 8x16 procs, ver 50: 6096.865sec --> 98887.596GFlops ###
gpu.tai40b_ZKRW_R3_e0.8:[gpdpotrf] ### END n=1218400, nb=1024, 8x16 procs, ver 50: 6017.823sec --> 100186.454GFlops ###
gpu.tai40b_ZKRW_R3_e0.8:[gpdpotrf] ### END n=1218400, nb=1024, 8x16 procs, ver 50: 6013.860sec --> 100252.472GFlops ###
gpu.tai40b_ZKRW_R3_e0.8:[gpdpotrf] ### END n=1218400, nb=1024, 8x16 procs, ver 50: 6008.871sec --> 100335.704GFlops ###
gpu.tai40b_ZKRW_R3_e0.8:[gpdpotrf] ### END n=1218400, nb=1024, 8x16 procs, ver 50: 6007.036sec --> 100366.367GFlops ###
gpu.tai40b_ZKRW_R3_e0.8:[gpdpotrf] ### END n=1218400, nb=1024, 8x16 procs, ver 50: 6007.593sec --> 100357.061GFlops ###
gpu.tai40b_ZKRW_R3_e0.9:[gpdpotrf] ### END n=1218400, nb=1024, 8x16 procs, ver 50: 6095.993sec --> 98901.745GFlops ###
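Runs like these are easier to compare after parsing the log lines mechanically. A minimal sketch (the regex is my own, assumed from the line format above; two sample lines are inlined to keep it self-contained):

```python
import re

# Parse "[gpdpotrf] ### END ...: <sec>sec --> <gflops>GFlops ###" lines
# and summarize the measured performance.
LINE = re.compile(r":\s*([\d.]+)sec\s*-->\s*([\d.]+)GFlops")

sample = """\
gpu.tai40b_ZKRW_R3_e0:[gpdpotrf] ### END n=1218400, nb=1024, 8x16 procs, ver 50: 6108.246sec --> 98703.346GFlops ###
gpu.tai40b_ZKRW_R3_e0.10:[gpdpotrf] ### END n=1218400, nb=1024, 8x16 procs, ver 50: 5998.202sec --> 100514.176GFlops ###
"""

runs = [(float(m.group(1)), float(m.group(2)))
        for m in (LINE.search(line) for line in sample.splitlines()) if m]
fastest = min(runs)  # (seconds, GFlops) of the quickest sampled run
print(f"{len(runs)} runs, best {fastest[1]:.3f} GFlops in {fastest[0]:.3f}s")
```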

High-performance computing server system (Fujitsu PRIMERGY CX400)
Compute node: theoretical peak performance 345.6 GFLOPS
Main memory: 128 GB
Memory bandwidth: 102.4 GB/s
Total number of nodes: 1,476
Total processors (cores): 2,952 processors (23,616 cores)
Aggregate theoretical peak performance (double precision): 966.2 TFLOPS
(CPU: 510.1 TF, GPGPU [K20m & K20Xm]: 456.1 TF)
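A back-of-the-envelope efficiency estimate (my own assumptions, not from the post): taking the listed per-node CPU peak of 345.6 GFLOPS plus one NVIDIA K20m per node at roughly 1.17 TFLOPS double precision, 128 hybrid nodes give about 194 TFLOPS of peak, so the ~100 TFlops achieved would correspond to roughly half of peak.

```python
# Rough efficiency estimate. Assumptions: one K20m per node at ~1.17 TFLOPS
# double-precision peak, and the run uses both the CPUs and the GPUs.
nodes = 128
peak_per_node = 345.6e9 + 1.17e12   # CPU peak (from specs) + one K20m (assumed)
peak_total = nodes * peak_per_node
achieved = 100.0e12                  # ~100 TFlops from the logs
print(f"peak ~{peak_total / 1e12:.0f} TFLOPS, "
      f"efficiency ~{achieved / peak_total:.0%}")
```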