I tried running train_mnist.py with the combination CUDA 9.1 + cuDNN 7.0.5 + Chainer 3.3.0 + cupy 2.3.0 + Python 3.6.4.
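Even with an NVIDIA Quadro K620, the run is considerably faster than on 24 CPU cores (Xeon(R) CPU E5-2687W v4 @ 3.00GHz): about 1m05s of wall-clock time versus 5m22s, roughly a 5x difference.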
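Since the result depends on the exact version combination, it can help to record what is actually installed before benchmarking. The following is only a minimal sketch: the helper name `installed_versions` is hypothetical, and it relies on nothing more than each package exposing a `__version__` attribute.

```python
import importlib
import platform


def installed_versions(packages=("chainer", "cupy")):
    """Map each package name to its version string, or None if it cannot be imported."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            module = importlib.import_module(name)
        except Exception:
            # Any import failure (missing package, broken CUDA setup) is reported as None.
            versions[name] = None
        else:
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(name, version)
```

This does not report the CUDA or cuDNN versions; those would have to be checked separately.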
○ CPU-only run
$ time python ./train_mnist.py
GPU: -1
# unit: 1000
# Minibatch-size: 100
# epoch: 20
epoch main/loss validation/main/loss main/accuracy validation/main/accuracy elapsed_time
1 0.186352 0.0989221 0.943083 0.9688 11.391
2 0.0723007 0.095402 0.977817 0.9677 24.3193
3 0.0470672 0.0662951 0.985 0.9793 37.6937
4 0.0351649 0.0791272 0.9888 0.9769 51.5521
5 0.0276496 0.0922011 0.99105 0.9757 65.5114
6 0.0252488 0.0776275 0.992167 0.9777 79.9253
7 0.0230062 0.0786154 0.992467 0.9784 94.551
8 0.0176244 0.0796685 0.994283 0.9803 109.435
9 0.0158112 0.0906824 0.99505 0.9787 124.785
10 0.0149168 0.0736694 0.9951 0.9815 140.244
11 0.012578 0.0810864 0.996033 0.983 156.146
12 0.0159714 0.0898045 0.995133 0.9796 172.44
13 0.012258 0.106515 0.996233 0.979 189.184
14 0.00974748 0.0941018 0.99705 0.9808 206.713
15 0.0124353 0.100479 0.99645 0.9801 224.486
16 0.0096153 0.101046 0.997033 0.9818 242.873
17 0.00955086 0.115592 0.99685 0.9811 261.603
18 0.0126071 0.0961814 0.996433 0.9828 280.604
19 0.00992429 0.121901 0.997067 0.9813 299.934
20 0.0101419 0.112412 0.997083 0.9815 320.127
real 5m22.323s
user 46m27.591s
sys 80m18.202s
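The elapsed_time column in the CPU-only log is cumulative, so the time spent in each epoch has to be derived by differencing. A small sketch of that arithmetic follows; the function name `epoch_durations` is made up for illustration, and the values are copied from the CPU-only log above.

```python
def epoch_durations(elapsed):
    """Convert cumulative elapsed_time values into per-epoch durations."""
    durations = []
    previous = 0.0
    for t in elapsed:
        durations.append(t - previous)
        previous = t
    return durations


# elapsed_time column of the CPU-only run (epochs 1 to 20)
cpu_elapsed = [
    11.391, 24.3193, 37.6937, 51.5521, 65.5114,
    79.9253, 94.551, 109.435, 124.785, 140.244,
    156.146, 172.44, 189.184, 206.713, 224.486,
    242.873, 261.603, 280.604, 299.934, 320.127,
]

cpu_per_epoch = epoch_durations(cpu_elapsed)
for epoch, seconds in enumerate(cpu_per_epoch, start=1):
    print(f"epoch {epoch:2d}: {seconds:6.2f} s")
```

For this log the per-epoch time is not constant: it rises from about 11 s in the first epoch to about 20 s in the last one. The log alone does not show why.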
○ CPU + GPU run
$ time python ./train_mnist.py -g 0
GPU: 0
# unit: 1000
# Minibatch-size: 100
# epoch: 20
epoch main/loss validation/main/loss main/accuracy validation/main/accuracy elapsed_time
1 0.193693 0.109649 0.941067 0.9655 3.58263
2 0.0716577 0.0860594 0.977683 0.9742 6.71554
3 0.0482691 0.0765099 0.984998 0.9774 9.71762
4 0.0343706 0.0782541 0.988765 0.9794 12.7348
5 0.0281391 0.0784323 0.990998 0.9796 15.7445
6 0.0226691 0.0779455 0.992148 0.9785 18.9664
7 0.0197018 0.0815052 0.993465 0.9806 22.1346
8 0.0199337 0.0781468 0.993482 0.9805 25.2467
9 0.0146195 0.0875199 0.995649 0.9797 28.2768
10 0.0150823 0.0881077 0.995282 0.982 31.2886
11 0.0145551 0.118743 0.995316 0.9773 34.2844
12 0.0131148 0.0961625 0.995948 0.981 37.3095
13 0.0128889 0.0710114 0.995932 0.9846 40.3107
14 0.00959946 0.0782425 0.997299 0.9842 43.322
15 0.0119166 0.11542 0.996298 0.9785 46.3243
16 0.0123473 0.113495 0.996298 0.9789 49.3341
17 0.0118735 0.0879922 0.996499 0.9832 52.3695
18 0.00768105 0.120469 0.997966 0.9791 55.3768
19 0.0126242 0.0907392 0.996382 0.9834 58.4051
20 0.00838919 0.115338 0.997599 0.9807 61.4162
real 1m4.962s
user 1m6.165s
sys 0m13.768s
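To put a number on the difference between the two runs, the `time` output can be converted to seconds and compared. This is just a sketch of the arithmetic; `parse_time` is a hypothetical helper that assumes the `NmS.SSSs` format printed above, and the values are copied from the two logs.

```python
import re


def parse_time(value):
    """Parse a shell `time` value such as '5m22.323s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value)
    if match is None:
        raise ValueError(f"unexpected time format: {value!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


cpu = {"real": "5m22.323s", "user": "46m27.591s", "sys": "80m18.202s"}
gpu = {"real": "1m4.962s", "user": "1m6.165s", "sys": "0m13.768s"}

cpu_s = {key: parse_time(value) for key, value in cpu.items()}
gpu_s = {key: parse_time(value) for key, value in gpu.items()}

# Wall-clock speedup of the GPU run over the CPU-only run.
speedup = cpu_s["real"] / gpu_s["real"]

# (user + sys) / real approximates how many logical CPUs were busy on average.
cpu_busy_cores = (cpu_s["user"] + cpu_s["sys"]) / cpu_s["real"]

print(f"wall-clock speedup: {speedup:.2f}x")
print(f"average busy CPUs in the CPU-only run: {cpu_busy_cores:.1f}")
```

With these numbers the GPU run comes out a little under 5x faster in wall-clock time, and the CPU-only run kept roughly 23 to 24 logical CPUs busy on average.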
○ Compute server
CPU : Intel(R) Xeon(R) CPU E5-2687W v4 @ 3.00GHz x 2
Memory : 512GB
GPU : NVIDIA Quadro K620 x 1
OS : CentOS 7.4