One of the GeForce RTX 2080 Ti cards failed, so I tried an irregular configuration: Tesla V100 PCIe 16GB x 1 + GeForce RTX 2080 Ti x 3.
SDPARA 7.6.1
This mixed configuration is slower than a single Tesla V100...
[gpdpotrf] ### END n=152928, nb=2048, 2x2 procs, ver 50: 692.637sec --> 1721.210GFlops ###
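As a sanity check on the log line above, the GFlops figure can be reproduced from n and the elapsed time. This is only a sketch: it assumes the usual n^3/3 flop count for a Cholesky factorization (the name gpdpotrf suggests a potrf-style routine), which the log itself does not state. The variable names are illustrative.

```python
# Assumption: the reported GFlops uses the standard n^3/3 flop count
# for a Cholesky factorization; the log does not state the formula.
n = 152928             # matrix dimension taken from the log line
elapsed_sec = 692.637  # elapsed time taken from the log line

flops = n ** 3 / 3                  # approximate floating-point operations
gflops = flops / elapsed_sec / 1e9  # convert to GFlops

print(f"{gflops:.3f} GFlops")
```

Under that assumption, the result agrees with the logged 1721.210 GFlops to within rounding.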
○ Reference: chainer 7.2.0
1: Tesla V100
$ time python ../imagenet/train_imagenet.py -a alex -g 0 -E 50 train.txt test.txt
epoch  iteration  main/loss  validation/main/loss  main/accuracy  validation/main/accuracy  lr
4      1000       3.3871                           0.279                                    0.01
9      2000       1.98907                          0.515312                                 0.01
13     3000       1.29902                          0.655187                                 0.01
18     4000       0.841066                         0.769531                                 0.01
23     5000       0.582484                         0.830938                                 0.01
27     6000       0.430382                         0.876969                                 0.01
32     7000       0.307563                         0.910406                                 0.01
37     8000       0.261246                         0.925313                                 0.01
41     9000       0.213145                         0.940063                                 0.01
46     10000      0.177657                         0.950125                                 0.01
real 5m32.839s
user 30m21.093s
sys 1m57.581s
2: GeForce RTX 2080 Ti
$ time python ../imagenet/train_imagenet.py -a alex -g 3 -E 50 train.txt test.txt
epoch  iteration  main/loss  validation/main/loss  main/accuracy  validation/main/accuracy  lr
4      1000       3.41491                          0.274812                                 0.01
9      2000       1.99282                          0.511031                                 0.01
13     3000       1.32556                          0.649688                                 0.01
18     4000       0.86025                          0.762531                                 0.01
23     5000       0.61107                          0.825969                                 0.01
27     6000       0.425172                         0.875125                                 0.01
32     7000       0.318805                         0.907125                                 0.01
37     8000       0.256343                         0.926594                                 0.01
41     9000       0.213341                         0.939594                                 0.01
46     10000      0.17746                          0.949625                                 0.01
real 5m45.645s
user 30m54.924s
sys 1m57.362s
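To compare the two runs at a glance, the `real` times above can be converted to seconds and divided. The following is a minimal sketch; the helper name `parse_real` is made up for illustration, and it only handles the `XmY.YYYs` form shown in these logs.

```python
import re

def parse_real(text):
    """Convert a `time` duration such as '5m32.839s' to seconds."""
    m = re.fullmatch(r"(\d+)m([\d.]+)s", text)
    if m is None:
        raise ValueError(f"unexpected duration format: {text!r}")
    return int(m.group(1)) * 60 + float(m.group(2))

v100 = parse_real("5m32.839s")  # run 1: Tesla V100 (-g 0)
rtx = parse_real("5m45.645s")   # run 2: GeForce RTX 2080 Ti (-g 3)

ratio = rtx / v100
print(f"V100: {v100:.3f}s, RTX 2080 Ti: {rtx:.3f}s, ratio: {ratio:.3f}")
```

On these numbers the GeForce RTX 2080 Ti run takes roughly 4% longer in wall-clock time than the Tesla V100 run.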