サーバの中を開けて確認したところ、1つの GPU の電源コネクタが外れていまして(何故?)、繋ぎ直したら元に戻りました。。。
$ nvidia-smi
Thu Sep 15 00:01:23 2016
+------------------------------------------------------+
| NVIDIA-SMI 352.39 Driver Version: 352.39 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K40m Off | 0000:42:00.0 Off | 0 |
| N/A 33C P0 62W / 235W | 22MiB / 11519MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla K40m Off | 0000:43:00.0 Off | 0 |
| N/A 31C P0 62W / 235W | 22MiB / 11519MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla K40m Off | 0000:81:00.0 Off | 0 |
| N/A 32C P0 61W / 235W | 22MiB / 11519MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla K40m Off | 0000:82:00.0 Off | 0 |
| N/A 33C P0 61W / 235W | 22MiB / 11519MiB | 43% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
$ nvidia-smi
Thu Sep 15 00:01:23 2016
+------------------------------------------------------+
| NVIDIA-SMI 352.39 Driver Version: 352.39 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K40m Off | 0000:42:00.0 Off | 0 |
| N/A 33C P0 62W / 235W | 22MiB / 11519MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla K40m Off | 0000:43:00.0 Off | 0 |
| N/A 31C P0 62W / 235W | 22MiB / 11519MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla K40m Off | 0000:81:00.0 Off | 0 |
| N/A 32C P0 61W / 235W | 22MiB / 11519MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla K40m Off | 0000:82:00.0 Off | 0 |
| N/A 33C P0 61W / 235W | 22MiB / 11519MiB | 43% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+