This time I looked at how much performance changes when using FP16. I may be doing something wrong, but there was not much of a difference...
◯ Tesla V100 : using FP32
$ time python ../imagenet/train_imagenet.py -a alex -g 0 -E 50 train.txt test.txt
epoch       iteration   main/loss   validation/main/loss  main/accuracy  validation/main/accuracy  lr
4           1000        3.35799                           0.286312                                 0.01
9           2000        2.00128                           0.511656                                 0.01
13          3000        1.3103                            0.657094                                 0.01
18          4000        0.874036                          0.759281                                 0.01
23          5000        0.598393                          0.829406                                 0.01
27          6000        0.412055                          0.881                                    0.01
32          7000        0.320549                          0.909031                                 0.01
37          8000        0.26741                           0.924375                                 0.01
41          9000        0.204615                          0.940969                                 0.01
46          10000       0.179809                          0.949094                                 0.01
real 7m31.958s
user 43m40.265s
sys 2m8.365s
◯ Tesla V100 : using FP16
$ time python ../imagenet/train_imagenet.py -a alex_fp16 -g 0 -E 50 train.txt test.txt
epoch       iteration   main/loss   validation/main/loss  main/accuracy  validation/main/accuracy  lr
4           1000        3.382                             0.26975                                  0.01
9           2000        2.07                              0.4945                                   0.01
13          3000        1.336                             0.644                                    0.01
18          4000        0.8875                            0.771                                    0.01
23          5000        0.621                             0.856                                    0.01
27          6000        0.4445                            0.918                                    0.01
32          7000        0.33525                           0.953                                    0.01
37          8000        0.255875                          0.97                                     0.01
41          9000        0.23675                           0.974                                    0.01
46          10000       0.183125                          0.9845                                   0.01
real 7m20.667s
user 45m0.828s
sys 2m15.185s
◯ GeForce GTX 1080 Ti : using FP32
$ time python ../imagenet/train_imagenet.py -a alex -g 1 -E 50 train.txt test.txt
epoch       iteration   main/loss   validation/main/loss  main/accuracy  validation/main/accuracy  lr
4           1000        3.46576                           0.267594                                 0.01
9           2000        2.02987                           0.505938                                 0.01
13          3000        1.33035                           0.651531                                 0.01
18          4000        0.875972                          0.757469                                 0.01
23          5000        0.593854                          0.831187                                 0.01
27          6000        0.429627                          0.875313                                 0.01
32          7000        0.304163                          0.911156                                 0.01
37          8000        0.26019                           0.925656                                 0.01
41          9000        0.212174                          0.93975                                  0.01
46          10000       0.17245                           0.949687                                 0.01
real 7m38.391s
user 45m40.814s
sys 2m2.824s
◯ GeForce GTX 1080 Ti : using FP16
$ time python ../imagenet/train_imagenet.py -a alex_fp16 -g 1 -E 50 train.txt test.txt
epoch       iteration   main/loss   validation/main/loss  main/accuracy  validation/main/accuracy  lr
4           1000        3.426                             0.26625                                  0.01
9           2000        2.021                             0.50325                                  0.01
13          3000        1.314                             0.6385                                   0.01
18          4000        0.879                             0.781                                    0.01
23          5000        0.603                             0.8665                                   0.01
27          6000        0.41625                           0.93                                     0.01
32          7000        0.321                             0.95                                     0.01
37          8000        0.254                             0.9695                                   0.01
41          9000        0.223                             0.978                                    0.01
46          10000       0.19                              0.983                                    0.01
real 7m0.676s
user 37m13.339s
sys 1m48.444s
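To put a number on "not much of a difference", the wall-clock (`real`) times above can be converted to seconds and compared. This is only a rough sketch: `to_seconds` and `speedup` are helper names made up for this illustration, and the `real` time covers the whole run (data loading and preprocessing included), so it may understate any difference in pure GPU compute.

```python
import re


def to_seconds(t: str) -> float:
    """Convert a `time`-style duration such as '7m31.958s' to seconds."""
    m = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", t)
    if m is None:
        raise ValueError(f"unrecognized duration: {t!r}")
    return int(m.group(1)) * 60 + float(m.group(2))


def speedup(fp32: str, fp16: str) -> float:
    """Ratio of FP32 wall-clock time to FP16 wall-clock time (>1 means FP16 was faster)."""
    return to_seconds(fp32) / to_seconds(fp16)


# "real" times copied from the four runs above.
real_times = {
    "Tesla V100": {"fp32": "7m31.958s", "fp16": "7m20.667s"},
    "GeForce GTX 1080 Ti": {"fp32": "7m38.391s", "fp16": "7m0.676s"},
}

for gpu, t in real_times.items():
    a, b = to_seconds(t["fp32"]), to_seconds(t["fp16"])
    print(f"{gpu}: FP32 {a:.1f}s, FP16 {b:.1f}s, ratio {speedup(t['fp32'], t['fp16']):.3f}")
```

With these timings the FP32/FP16 ratio comes out to about 1.03 on the Tesla V100 and about 1.09 on the GeForce GTX 1080 Ti, from a single run each.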