Kernel performance
nVidia C1060, cuBLAS 3.1
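For square matrices whose dimension is a multiple of 64, I got roughly 75 GFlops (73.7 to 75.2 GFlops for N = 1088 and up; N = 512 gave 58.9).
For square matrices whose dimension is not a multiple of 64, I got about 53 GFlops (52.8 at N = 2049, 53.3 at N = 4097).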
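Put in wall-clock terms, the gap between the two cases looks like this. A minimal sketch in Python, assuming the usual 2N^3 flop count for an N x N DGEMM (the common convention behind GFlop/s figures like these, though I haven't checked testing_dgemm's source). `round_up_to_64` and the padding scenario are my own hypothetical illustration, not something I measured: zero-padding A and B to the next multiple of 64 leaves the top-left N x N block of the product unchanged, and the sketch simply assumes the padded size would also reach about 75 GFlops.

```python
# Rough DGEMM arithmetic for square N x N matrices.
# Assumption: the usual 2*N^3 flop count for C = alpha*A*B + beta*C
# (one multiply and one add per inner-product term); lower-order terms ignored.

def dgemm_flops(n: int) -> int:
    """Approximate floating-point operations for an N x N x N DGEMM."""
    return 2 * n ** 3

def seconds_from_gflops(n: int, gflops: float) -> float:
    """Runtime implied by a reported GFlop/s figure."""
    return dgemm_flops(n) / (gflops * 1e9)

def round_up_to_64(n: int) -> int:
    """Hypothetical helper: smallest multiple of 64 that is >= n."""
    return (n + 63) // 64 * 64

if __name__ == "__main__":
    # Measured on the C1060 (from the raw data): N=2049 ran at ~52.8 GFlop/s.
    n = 2049
    t_measured = seconds_from_gflops(n, 52.81397)
    # Hypothetical: pad to the next multiple of 64 and ASSUME the padded
    # problem reaches ~75 GFlop/s like the multiple-of-64 sizes did.
    # This padded run was NOT measured.
    n_pad = round_up_to_64(n)
    t_padded = seconds_from_gflops(n_pad, 75.0)
    print(f"N={n}: {t_measured:.3f} s (from measured GFlop/s)")
    print(f"N={n_pad} (padded): {t_padded:.3f} s (assumed 75 GFlop/s)")
```

With these inputs the padded estimate comes out faster despite roughly 9.5% more flops, but that hinges entirely on the assumed 75 GFlops.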
Note that these figures are kernel-only performance; the time to transfer the matrices to the GPU is not taken into account.
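Here's a rough sketch of how much the transfers could drag the number down. It assumes A and B are uploaded and C is downloaded (beta = 0, so C is not uploaded), and uses a hypothetical sustained PCIe bandwidth `PCIE_GBPS` that I did not measure. Since the data moved grows as N^2 while the work grows as N^3, the penalty should shrink as N grows.

```python
# Sketch: how host<->device transfers would dilute the kernel-only figure.
# Assumptions (NOT measured in this post):
#   - A and B are copied to the GPU and C is copied back (C is not uploaded,
#     i.e. beta = 0); 8 bytes per double.
#   - PCIE_GBPS is a hypothetical sustained host<->device bandwidth.

PCIE_GBPS = 5.0  # hypothetical, GB/s; substitute a measured value

def transfer_seconds(n: int, gbps: float = PCIE_GBPS) -> float:
    """Time to move A and B in and C out, for N x N doubles."""
    bytes_moved = 3 * n * n * 8
    return bytes_moved / (gbps * 1e9)

def effective_gflops(n: int, kernel_gflops: float, gbps: float = PCIE_GBPS) -> float:
    """GFlop/s when transfer time is added to the kernel time."""
    flops = 2 * n ** 3
    kernel_s = flops / (kernel_gflops * 1e9)
    return flops / ((kernel_s + transfer_seconds(n, gbps)) * 1e9)

if __name__ == "__main__":
    # Kernel-only cudablas-3.1 figures taken from the raw data.
    for n, g in [(512, 58.86742), (5120, 75.22300)]:
        print(f"N={n}: kernel {g:.1f} GFlop/s -> with transfers "
              f"{effective_gflops(n, g):.1f} GFlop/s")
```

With the 5 GB/s placeholder, N = 512 drops to roughly 46 GFlop/s while N = 5120 stays around 73; plug in a measured bandwidth before reading anything into these.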
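[I don't really get why, but magmaBLAS doesn't run properly on the C1060: the magmablas0.3 column in the data below shows impossible GFlop/s values and a large error. The test program does describe itself as a DGEMM routine for Fermi GPUs, and the C1060 is a pre-Fermi card.]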
Raw data below.
$ ./testing_dgemm
This is a MAGMA 0.3 DGEMM Routine for Fermi GPUs.
In this version matrix sizes have to be divisible by 64
Usage:
./testing_dgemm N
N magmablas0.3 GFLops/s cudablas-3.1 GFlops/s error
==========================================================================
512 2684.35456 58.86742 3.444238e+01
1088 64395.67360 73.70245 5.198730e+01
1664 230372.14720 74.34717 6.925781e+01
2240 561971.20000 74.88007 8.645703e+01
2816 1116523.72480 74.96366 1.005762e+02
3392 1903766.45307 75.14409 1.018477e+02
3968 3047622.20644 75.14200 1.124590e+02
4544 4691211.05920 75.14489 1.160225e+02
5120 6547206.24390 75.22300 1.289854e+02
$ ./testing_dgemm 2049
This is a MAGMA 0.3 DGEMM Routine for Fermi GPUs.
In this version matrix sizes have to be divisible by 64
Usage:
./testing_dgemm N
N magmablas0.3 GFLops/s cudablas-3.1 GFlops/s error
==========================================================================
Dimension Should Be multiple of 64
Calling cublasDgemm
2049 52.81656 52.81397 0.000000e+00
$ ./testing_dgemm 4097
This is a MAGMA 0.3 DGEMM Routine for Fermi GPUs.
In this version matrix sizes have to be divisible by 64
Usage:
./testing_dgemm N
N magmablas0.3 GFLops/s cudablas-3.1 GFlops/s error
==========================================================================
Dimension Should Be multiple of 64
Calling cublasDgemm
4097 53.31029 53.31349 0.000000e+00
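A quick plausibility check on the magmablas0.3 column makes it obvious something is off. A minimal sketch, assuming the nominal C1060 double-precision peak of about 78 GFlop/s (30 SMs, one DP unit each, 2 flops per FMA, 1.296 GHz); any figure above that can't be real. The rows are a few lines copied from the first run above.

```python
# Sketch: flag benchmark figures that exceed the card's theoretical peak.
# Assumption: nominal C1060 double-precision peak from published specs,
# 30 SMs x 1 DP FMA unit x 2 flops/FMA x 1.296 GHz = 77.76 GFlop/s.

C1060_DP_PEAK_GFLOPS = 30 * 1 * 2 * 1.296

def check(gflops: float, peak: float = C1060_DP_PEAK_GFLOPS):
    """Return (percent of peak, plausible?) for one reported figure."""
    pct = 100.0 * gflops / peak
    return pct, gflops <= peak

# (N, magmablas0.3, cudablas-3.1) copied from the raw data above.
rows = [
    (512, 2684.35456, 58.86742),
    (2240, 561971.20000, 74.88007),
    (5120, 6547206.24390, 75.22300),
]

if __name__ == "__main__":
    for n, magma, cublas in rows:
        for name, g in (("magmablas0.3", magma), ("cudablas-3.1", cublas)):
            pct, ok = check(g)
            status = "ok" if ok else "IMPOSSIBLE (> peak)"
            print(f"N={n:5d} {name:13s} {g:14.2f} GFlop/s "
                  f"{pct:9.1f}% of peak  {status}")
```

cuBLAS lands at roughly 96 to 97% of that peak for the larger sizes, while every magmablas0.3 figure is far above it, consistent with the Fermi-targeted kernel not actually doing the work on this card.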