Ultra-Fast & Stable Computation for Optimization Problems

Mainly about research on large-scale optimization problems, graph search, machine learning, digital twins, and related topics

New GPU Cluster and SDPARA, Part 6

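2016-05-07 00:54:09 | Weblog
I ran the same problem as yesterday on the Huawei GPU server, and the run produced a flood of numerical errors, as shown below. The Cholesky factorization repeatedly reset negative diagonal entries to 1e+100, and the solver stopped at iteration 13 with "Step length is too small", still in the pdFEAS phase. The GPU cluster is stable, so I can simply solve the problem there, but the cause of the instability on this server is unknown.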

◯SDPARA 7.6.0-G

◯SDP relaxation problem for DSJC1000.5.col (Lovász number)
DSJC1000.5.col.dat-s
249675 = mDIM
1 = nBLOCK
1000 = bLOCKsTRUCT


249668-th diag is adjusted from -177952.517937 to 1e+100
249669-th diag is adjusted from -177079.166513 to 1e+100
249670-th diag is adjusted from -761.748974 to 1e+100
249671-th diag is adjusted from -289539.957242 to 1e+100
249672-th diag is adjusted from -25716.478858 to 1e+100
249673-th diag is adjusted from -299.843801 to 1e+100
249674-th diag is adjusted from -108761.144344 to 1e+100
[pdpotf2(1,1)] [time 67531.79] m=1867 took 73.3ms (29.600GFlops)
[gpdpotrf] end at 2016年 5月 6日 金曜日 01:58:54 JST
[gpdpotrf] ### END n=249675, nb=2048, 2x2 procs, ver 50: 1442.685sec --> 3596.106GFlops ###
sdpa_newton.cpp:3332
sdpa_solve.cpp:165
sdpa_newton.cpp:3328
sdpa_newton.cpp:3332
sdpa_solve.cpp:170
sdpa_solve.cpp:188
13 2.9e-05 3.1e-17 2.7e-20 +3.19e+01 +3.19e+01 1.8e-33 1.8e-33 1.00e+00
Step length is too small. :: line 165 in sdpa_dataset.cpp :: iam 0
cannot move: step length is too short :: line 196 in sdpa_solve.cpp :: iam 0
13 2.9e-05 3.1e-17 2.7e-20 +3.19e+01 +3.19e+01 1.8e-33 1.8e-33 1.00e+00

phase.value = pdFEAS
Iteration = 13
mu = +2.9084063716336001e-05
relative gap = +9.1208013923436385e-04
gap = +2.9084063716283026e-02
digits = +3.0399670010375832e+00
objValPrimal = +3.1902160745601147e+01
objValDual = +3.1873076681884864e+01
p.feas.error = +3.1086244689504383e-15
d.feas.error = +6.1045374693748761e-05
total time = 30395.250248
main loop time = 30393.608871
total time = 30395.250248
file check time = 0.000000
file change time = 0.046797
file read time = 1.594580
SDPA end at [Fri May 6 02:00:21 2016]
ALL TIME = 30673.208034
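
As a quick sanity check, the summary values in the log above can be recomputed from the reported objective values and timings. The sketch below assumes the usual SDPA-style definitions (gap = |objValPrimal - objValDual|, relative gap = gap divided by the larger of the mean absolute objective value and 1, digits = -log10 of gap over the mean absolute objective value) and the standard n^3/3 flop count for a dense Cholesky factorization. The function names are hypothetical and are not part of SDPARA.

```python
import math

# Values copied from the log above.
OBJ_PRIMAL = 3.1902160745601147e+01
OBJ_DUAL = 3.1873076681884864e+01
SCHUR_ORDER = 249675        # n in the "[gpdpotrf] ### END" line (equals mDIM)
CHOLESKY_SECONDS = 1442.685  # factorization time from the same line


def gap_metrics(obj_primal, obj_dual):
    """Return (gap, relative gap, digits) under the assumed definitions."""
    gap = abs(obj_primal - obj_dual)
    mean_abs = (abs(obj_primal) + abs(obj_dual)) / 2.0
    relative_gap = gap / max(mean_abs, 1.0)
    digits = -math.log10(gap / mean_abs)
    return gap, relative_gap, digits


def cholesky_gflops(order, seconds):
    """Flop rate of a dense Cholesky factorization, assuming order**3 / 3 flops."""
    return order ** 3 / 3.0 / seconds / 1e9


if __name__ == "__main__":
    gap, relative_gap, digits = gap_metrics(OBJ_PRIMAL, OBJ_DUAL)
    print(f"gap          = {gap:.16e}")
    print(f"relative gap = {relative_gap:.16e}")
    print(f"digits       = {digits:.16e}")
    print(f"Cholesky     = {cholesky_gflops(SCHUR_ORDER, CHOLESKY_SECONDS):.3f} GFlops")
```

Under these assumptions the recomputed values agree with the logged gap, relative gap, digits, and GFlops figures. About three digits of agreement between the primal and dual objective values means the run stopped far from a converged solution.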


◯Compute server
Huawei RH5885H V3
CPU :Intel Xeon E7-4890 v2 @ 2.80GHz x 4 socket
Memory :2.0TB (32GB LRDIMM x 64 DIMMs)
GPU : NVIDIA Tesla K40m x 4
HDD :2.5-inch 300GB SAS 15000rpm HDD x 2
SSD : ES3000 2.4TB x 2 + 2.5-inch 800GB SSD (Intel DC S3500) x 8
RAID :RAID-0/1/10/5/50/6/60 1GB Cache with Power Protection
NIC :On Board 1GE x 4 port interface card
I/O Box :6 Slot Riser Card x 2, Hot-Plugged Riser Card x 1
PSU :2000W Platinum AC Power Supply Unit x 2
Rail :4U Slide Rail with Cable Management Arm
CUDA : 7.5
OS : CentOS 7.2