Benchmark data for a self-built PC
The machine's specifications are as follows.
- CPU: Core i7-11700K (11th generation)
- Motherboard: Z590 chipset (ASUS TUF GAMING Z590-PLUS)
- Memory: DDR4-3200 (14-18-18-38), 32 GB x2
- Graphics: CPU integrated GPU
- GPGPU: Tesla K80 (installed in the CPU-attached PCIe 4.0 x16 slot)
- OS: Windows 10 Pro (build 19043)
Cinebench R23
Multi Core 14791 pts
Single Core 1592 pts
MP Ratio 9.29 x
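The MP Ratio is just the multi-core score divided by the single-core score. A quick check in Python, using only the two scores reported above (the variable names are mine):

```python
# Scores copied from the Cinebench R23 result above.
multi_core_pts = 14791
single_core_pts = 1592

# MP Ratio = multi-core score / single-core score.
mp_ratio = multi_core_pts / single_core_pts
print(f"MP Ratio: {mp_ratio:.2f} x")
```

On this 8-core, 16-thread CPU the ratio comes out a little above 9.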
LINPACK (Intel version)
Ran linpack_xeon64.exe, included in Intel oneAPI, via runme_xeon64.bat. The run took about 11 minutes. The performance section of the output (win_xeon64.txt) is reproduced below.
CPU frequency: 4.886 GHz
Number of CPUs: 1
Number of cores: 8
Number of threads: 8
... (omitted) ...
Performance Summary (GFlops)

Size   LDA    Align.  Average   Maximal
1000   1000   4       244.8787  264.0864
2000   2000   4       192.6778  197.2063
5000   5008   4       297.0413  298.1333
10000  10000  4       345.4598  345.8902
15000  15000  4       366.3256  369.1550
18000  18008  4       384.3963  384.7625
20000  20016  4       386.3005  388.0600
22000  22008  4       386.3178  387.1281
25000  25000  4       392.2190  393.8578
26000  26000  4       391.3257  393.2621
27000  27000  4       390.7530  390.7530
30000  30000  1       397.3227  397.3227
35000  35000  1       401.5751  401.5751
40000  40000  1       407.3746  407.3746
45000  45000  1       411.4282  411.4282
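To get a feel for what these GFlops figures mean in wall-clock time and memory, here is a rough sketch. It assumes the standard LINPACK operation count of 2/3 n^3 + 2 n^2 for an n x n system and 8-byte double-precision elements stored in an LDA x n array. The function names are my own and are not part of the Intel tool.

```python
def linpack_flops(n: int) -> float:
    """Standard LINPACK operation count for an n x n dense solve (assumed)."""
    return (2.0 / 3.0) * n**3 + 2.0 * n**2


def solve_time_seconds(n: int, gflops: float) -> float:
    """Time for one solve at the given sustained GFlops rate."""
    return linpack_flops(n) / (gflops * 1e9)


def matrix_gib(n: int, lda: int) -> float:
    """Storage for the coefficient matrix: lda x n doubles, in GiB."""
    return lda * n * 8 / 2**30


# (size, LDA, average GFlops) rows copied from the table above.
rows = [
    (10000, 10000, 345.4598),
    (25000, 25000, 392.2190),
    (45000, 45000, 411.4282),
]

for n, lda, gflops in rows:
    print(f"n={n}: about {solve_time_seconds(n, gflops):.1f} s per solve, "
          f"matrix about {matrix_gib(n, lda):.2f} GiB")
```

Under these assumptions the largest case (n=45000) takes roughly two and a half minutes per solve and needs about 15 GiB for the matrix, which fits comfortably in the 64 GB installed here.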
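Himeno Benchmark
https://i.riken.jp/supercom/documents/himenobmt/
Results with the precompiled Windows binaries: 6145.943 MFLOPS for the L size and 6234.035 MFLOPS for the M size. For the S size the run was too short for MFLOPS to be calculated. The binaries were built in January 2002, so they presumably use only a single core and none of the recent SIMD instructions, and should therefore be slow, yet they still reach about 6 GFLOPS. Output of the L-size run: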
I:\姫野ベンチ>.\himenoBMTxp_l.exe
mimax= 513 mjmax= 257 mkmax= 257
imax= 512 jmax= 256 kmax= 256
Start rehearsal measurement process.
Measure the performance in 3 times.
MFLOPS: 5805.185 time(s): 0.5781250 8.3494873E-04
Now, start the actual measurement process.
The loop will be excuted in 311 times.
This will take about one minute.
Wait for a while.
Loop executed for 311 times
Gosa : 7.0602610E-04
MFLOPS: 6145.943 time(s): 56.60938
Score based on Pentium III 600MHz : 74.19053
Fortran Pause - Enter command or to continue.
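The MFLOPS and score lines in the log can be reproduced from the grid size, loop count and elapsed time. The sketch below assumes 34 floating-point operations per interior grid point and a Pentium III 600MHz reference of 82.84 MFLOPS. Both constants are my recollection of the benchmark source rather than something stated in the log, although they do reproduce the logged figures.

```python
FLOPS_PER_POINT = 34          # assumed flop count per interior grid point
PENTIUM3_600_MFLOPS = 82.84   # assumed reference value used for the score


def himeno_flops_per_iteration(imax: int, jmax: int, kmax: int) -> int:
    """Flops for one Jacobi sweep over the interior points of the grid."""
    return (imax - 2) * (jmax - 2) * (kmax - 2) * FLOPS_PER_POINT


# Values copied from the L-size log above.
imax, jmax, kmax = 512, 256, 256
loops = 311
elapsed_s = 56.60938

mflops = himeno_flops_per_iteration(imax, jmax, kmax) * loops / elapsed_s / 1e6
score = mflops / PENTIUM3_600_MFLOPS

print(f"MFLOPS: {mflops:.3f}")
print(f"Score based on Pentium III 600MHz: {score:.5f}")
```

The same formula applied to the rehearsal (3 loops in 0.578125 s) gives about 5805 MFLOPS, in line with the first MFLOPS line of the log.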
nbody
Benchmarking the Tesla K80, using the nbody program built from the CUDA samples.
To match the conditions on the site below, numbodies=204800 was used (20480 for the CPU runs). https://www.hpc-technologies.co.jp/gpu-nbody-benchmark
Double precision
2 device 1396.210 double-precision GFLOP/s (using the whole of one K80)
1 device 774.942 double-precision GFLOP/s (using half of the K80)
CPU 11.173 double-precision GFLOP/s
Single precision
2 device 2895.046 single-precision GFLOP/s
1 device 1629.395 single-precision GFLOP/s
CPU 6.395 single-precision GFLOP/s
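A small sketch of how well the second GPU on the K80 scales, computed only from the figures above (the helper name is hypothetical). The GPU-versus-CPU ratio is indicative at best, since the CPU runs used 20480 bodies rather than 204800.

```python
def scaling(two_dev: float, one_dev: float, cpu: float):
    """Return (2-device speedup, parallel efficiency, 2-device vs CPU ratio)."""
    speedup = two_dev / one_dev
    efficiency = speedup / 2      # ideal scaling on two devices would give 1.0
    gpu_vs_cpu = two_dev / cpu    # rough only: the body counts differ
    return speedup, efficiency, gpu_vs_cpu


# GFLOP/s values copied from the results above.
results = {
    "double": (1396.210, 774.942, 11.173),
    "single": (2895.046, 1629.395, 6.395),
}

for precision, (two_dev, one_dev, cpu) in results.items():
    speedup, efficiency, gpu_vs_cpu = scaling(two_dev, one_dev, cpu)
    print(f"{precision}: speedup {speedup:.2f}x, "
          f"efficiency {efficiency:.0%}, GPU/CPU {gpu_vs_cpu:.0f}x")
```

In both precisions, going from half of the K80 to the whole card gives a speedup of about 1.8x rather than the ideal 2x.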
---------- Example run --------------------------------
C:\ProgramData\NVIDIA Corporation\CUDA Samples\v11.3\bin\win64\Release>nbody.exe -benchmark -numbodies=204800 -numdevices=2 -fp64
Run "nbody -benchmark [-numbodies=<numBodies>]" to measure performance.
-fullscreen (run n-body simulation in fullscreen mode)
-fp64 (use double precision floating point values for simulation)
-hostmem (stores simulation data in host memory)
-benchmark (run benchmark to measure performance)
-numbodies=<N> (number of bodies (>= 1) to run in simulation)
-device=<d> (where d=0,1,2.... for the CUDA device to use)
-numdevices=<i> (where i=(number of CUDA devices > 0) to use for simulation)
-compare (compares simulation results running once on the default GPU and once on the CPU)
-cpu (run n-body simulation on the CPU)
-tipsy=<file.bin> (load a tipsy model file for simulation)
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
number of CUDA devices = 2
> Windowed mode
> Simulation data stored in video memory
> Double precision floating point simulation
> 2 Devices used for simulation
GPU Device 0: "Kepler" with compute capability 3.7
> Compute 3.7 CUDA device: [Tesla K80]
> Compute 3.7 CUDA device: [Tesla K80]
number of bodies = 204800
204800 bodies, total time for 10 iterations: 9012.189 ms
= 46.540 billion interactions per second
= 1396.210 double-precision GFLOP/s at 30 flops per interaction
----------------------------------------------------------------------------
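The last three lines of that output can be re-derived from the body count, iteration count and total time. This sketch assumes that every iteration evaluates all N x N pairwise interactions, which is consistent with the logged numbers. The 30 flops per interaction figure is taken from the output itself.

```python
# Values copied from the example run above.
num_bodies = 204800
iterations = 10
total_time_ms = 9012.189
flops_per_interaction = 30   # as printed by nbody for this -fp64 run

# Assumption: each iteration computes all N * N pairwise interactions.
interactions_per_second = num_bodies**2 * iterations / (total_time_ms / 1000.0)
gflops = interactions_per_second * flops_per_interaction / 1e9

print(f"{interactions_per_second / 1e9:.3f} billion interactions per second")
print(f"{gflops:.3f} double-precision GFLOP/s")
```

Note that the GFLOP/s value depends directly on the per-interaction flop count the sample assumes, so it is best compared only against other nbody results.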