Ultra-Fast & Stable Computation for Optimization Problems

Mostly about research on large-scale optimization problems, graph search, machine learning, digital twins, and related topics

InfiniBand error

January 29, 2019, 02:05:10 | Weblog
The following InfiniBand error is occurring, and it is not proving easy to fix. Computations themselves still run.

ibv_exp_query_device: invalid comp_mask !!! (comp_mask = 0x40 valid_mask = 0x3)
[cal46][[17705,1],0][../../../../../opal/mca/btl/openib/btl_openib_component.c:1670:init_one_device] error obtaining device attributes for mlx4_1 errno says Invalid argument
--------------------------------------------------------------------------
WARNING: There was an error initializing an OpenFabrics device.

Local host: cal46
Local device: mlx4_1
--------------------------------------------------------------------------
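Read literally, the first line of the error says that the requested comp_mask (0x40) has a bit set that valid_mask (0x3) does not cover. The snippet below is only a minimal sketch of that bit arithmetic. It assumes the library rejects any requested bit outside valid_mask, which the log suggests but does not confirm, and the helper name `set_bits` is made up for illustration.

```python
# Values copied from the error line above.
comp_mask = 0x40   # bits requested by the caller
valid_mask = 0x3   # bits the library reports as valid


def set_bits(mask):
    """Return the positions of the bits that are set in mask (hypothetical helper)."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


# Assumption: a request is rejected if it has any bit outside valid_mask.
unsupported = comp_mask & ~valid_mask

if __name__ == "__main__":
    print("valid bits      :", set_bits(valid_mask))
    print("requested bits  :", set_bits(comp_mask))
    print("unsupported bits:", set_bits(unsupported), hex(unsupported))
```

Under that assumption the only offending bit is bit 6, which may point to a mismatch between the verbs library Open MPI was built against and the one installed on the node, though this post does not establish that.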


# ibnodes
Ca : 0x7cfe900300a293c0 ports 1 "opt-ds mlx4_0"
Ca : 0x70106fffffa62cb0 ports 2 "cal53 HCA-1"
Ca : 0x70106fffffa6ac40 ports 2 "cal53 HCA-2"
Ca : 0x70106fffffa63c30 ports 2 "cal52 HCA-2"
Ca : 0x70106fffffa67c50 ports 2 "cal52 HCA-1"
Ca : 0x70106fffffa63c10 ports 2 "cal51 HCA-1"
Ca : 0x70106fffffa67cc0 ports 2 "cal51 HCA-2"
Ca : 0x70106fffffa69ca0 ports 2 "cal50 HCA-1"
Ca : 0x70106fffffa64ca0 ports 2 "cal50 HCA-2"
Ca : 0x480fcffffff4f250 ports 2 "cal49 HCA-1"
Ca : 0x480fcffffff4c250 ports 2 "cal49 HCA-2"
Ca : 0x480fcffffff423e0 ports 2 "cal48 HCA-1"
Ca : 0xc4346bffffdc4540 ports 2 "cal48 HCA-2"
Ca : 0x480fcffffff41320 ports 2 "cal47 HCA-1"
Ca : 0x480fcffffff4c3a0 ports 2 "cal47 HCA-2"
Ca : 0xc4346bffffdc3520 ports 2 "cal46 HCA-2"
Ca : 0xc4346bffffdc3510 ports 2 "cal46 HCA-1"
Switch : 0xe41d2d0300874780 ports 36 "MF0;ibsw01:SX6036/U1" enhanced port 0 lid 1 lmc 0

# ibv_devinfo
hca_id: mlx4_1
transport: InfiniBand (0)
fw_ver: 2.35.5100
node_guid: c434:6bff:ffdc:3520
sys_image_guid: c434:6bff:ffdc:3523
vendor_id: 0x02c9
vendor_part_id: 4103
hw_ver: 0x0
board_id: HP_1370110017
phys_port_cnt: 2
Device ports:
port: 1
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 4096 (5)
sm_lid: 1
port_lid: 6
port_lmc: 0x00
link_layer: InfiniBand

port: 2
state: PORT_DOWN (1)
max_mtu: 4096 (5)
active_mtu: 4096 (5)
sm_lid: 0
port_lid: 0
port_lmc: 0x00
link_layer: InfiniBand

hca_id: mlx4_0
transport: InfiniBand (0)
fw_ver: 2.35.5100
node_guid: c434:6bff:ffdc:3510
sys_image_guid: c434:6bff:ffdc:3513
vendor_id: 0x02c9
vendor_part_id: 4103
hw_ver: 0x0
board_id: HP_1370110017
phys_port_cnt: 2
Device ports:
port: 1
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 4096 (5)
sm_lid: 1
port_lid: 2
port_lmc: 0x00
link_layer: InfiniBand

port: 2
state: PORT_DOWN (1)
max_mtu: 4096 (5)
active_mtu: 4096 (5)
sm_lid: 0
port_lid: 0
port_lmc: 0x00
link_layer: InfiniBand
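When comparing many nodes, it can help to reduce this output to one line per port. The following is a rough sketch, not an official tool: it assumes the text layout shown above (`hca_id:`, `port:`, `state:` lines), and the function name `parse_ibv_devinfo` and the shortened `SAMPLE` text are made up for illustration.

```python
def parse_ibv_devinfo(text):
    """Return {hca_id: {port_number: state}} from ibv_devinfo-style text."""
    devices = {}
    hca = None
    port = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("hca_id:"):
            # A new device section starts here.
            hca = line.split(":", 1)[1].strip()
            devices[hca] = {}
            port = None
        elif line.startswith("port:") and hca is not None:
            # "port: 1" opens a per-port block; "port_lid:" etc. do not match.
            port = int(line.split(":", 1)[1].strip())
        elif line.startswith("state:") and hca is not None and port is not None:
            # "state: PORT_ACTIVE (4)" -> "PORT_ACTIVE"
            devices[hca][port] = line.split(":", 1)[1].split()[0]
    return devices


# Shortened copy of the output above, kept only for the demonstration.
SAMPLE = """\
hca_id: mlx4_1
phys_port_cnt: 2
port: 1
state: PORT_ACTIVE (4)
port_lid: 6
port: 2
state: PORT_DOWN (1)
hca_id: mlx4_0
phys_port_cnt: 2
port: 1
state: PORT_ACTIVE (4)
port_lid: 2
port: 2
state: PORT_DOWN (1)
"""

if __name__ == "__main__":
    for hca, ports in parse_ibv_devinfo(SAMPLE).items():
        for port, state in ports.items():
            print(f"{hca} port {port}: {state}")
```

On a real node the same function could be fed the captured output of `ibv_devinfo`. In the listing above, both mlx4_0 and mlx4_1 have port 1 active and port 2 down, so the two HCAs look identical at this level.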
