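Graph500 run on the same 32 nodes and 64 GPUs as last time. With 64 GPUs the load on host memory was reduced, so the benchmark also ran at Scale 29. The TEPS value went up as well, but Scale 30 could not be run because of insufficient memory.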
◯ Graph500 & Scale 28
median_TEPS: 1.15001e+10
◯ Graph500 & Scale 29
median_TEPS: 1.2173e+10
============= Result ==============
SCALE: 29
edgefactor: 16
NBFS: 64
graph_generation: 33.4577598572
num_mpi_processes: 64
construction_time: 52.3046398163
redistribution_time: 5.70524597168
min_time: 0.661997
firstquartile_time: 0.68565
median_time: 0.705649
thirdquartile_time: 0.729336
max_time: 0.834971
mean_time: 0.711857
stddev_time: 0.0367017
min_nedge: 8589858508
firstquartile_nedge: 8589858508
median_nedge: 8589858508
thirdquartile_nedge: 8589858508
max_nedge: 8589858508
mean_nedge: 8589858508
stddev_nedge: 0
min_TEPS: 1.02876e+10
firstquartile_TEPS: 1.17776e+10
median_TEPS: 1.2173e+10
thirdquartile_TEPS: 1.25281e+10
max_TEPS: 1.29757e+10
harmonic_mean_TEPS: 1.20668e+10
harmonic_stddev_TEPS: 7.83821e+07
min_validate: 8.90402
firstquartile_validate: 9.08575
median_validate: 9.19315
thirdquartile_validate: 9.3226
max_validate: 10.2834
mean_validate: 9.22924
stddev_validate: 0.240041
TSUBAME-KFC - LX 1U-4GPU/104Re-1G Cluster, Intel Xeon E5-2620v2 6C 2.100GHz, Infiniband FDR, NVIDIA K20x
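
As a sanity check on the listing above, the TEPS figures can be recomputed from the edge counts and BFS times. The sketch below is only an illustration: the helper names `parse_result` and `teps` are hypothetical and not part of the Graph500 code, and it assumes each TEPS value is the traversed edge count divided by the BFS time of that run. Because `stddev_nedge` is 0, the fastest run should correspond to `max_TEPS` and the slowest to `min_TEPS`; the median is expected to agree only approximately, since the median of a ratio need not equal the ratio of the medians.

```python
# Values copied from the Scale 29 result listing above.
RESULT_TEXT = """\
SCALE: 29
edgefactor: 16
min_time: 0.661997
median_time: 0.705649
max_time: 0.834971
median_nedge: 8589858508
min_TEPS: 1.02876e+10
median_TEPS: 1.2173e+10
max_TEPS: 1.29757e+10
"""


def parse_result(text):
    """Parse 'key: value' lines into a dict of floats (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = float(value)
    return fields


def teps(nedge, seconds):
    """Traversed edges per second for one BFS run (assumed definition)."""
    return nedge / seconds


r = parse_result(RESULT_TEXT)
nedge = r["median_nedge"]  # identical for every run here (stddev_nedge is 0)

# The fastest run gives the highest TEPS, so min_time pairs with max_TEPS.
derived = {
    "min_TEPS": teps(nedge, r["max_time"]),
    "median_TEPS": teps(nedge, r["median_time"]),
    "max_TEPS": teps(nedge, r["min_time"]),
}

for name, value in derived.items():
    print(f"{name}: derived {value:.5e}, reported {r[name]:.5e}")

# Edges generated for the graph: 2**SCALE vertices times edgefactor.
generated_edges = 2 ** int(r["SCALE"]) * int(r["edgefactor"])
print("generated edges:", generated_edges)

# Relative gain of the Scale 29 median over the Scale 28 median (1.15001e+10).
gain = r["median_TEPS"] / 1.15001e10 - 1.0
print(f"Scale 28 -> 29 median_TEPS gain: {gain:.1%}")
```

The generated edge count (2^29 vertices times an edgefactor of 16) comes out slightly higher than the reported `nedge`, which is the number of edges actually traversed by the BFS runs.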