
Installing the InfiniBand Driver on RHEL 5.8

Notes on installing the InfiniBand driver on RHEL 5.8.

1  Download the driver

Address: http://www.mellanox.com/page/products_dyn?product_family=26&mtag=linux_sw_drivers

Choose the driver that matches your operating system version. The ISO package is recommended.

Note: for RHEL 5 and earlier, choose the 1.5.3 driver series; for RHEL 6 and later, choose the 2.0 series or above.

2  Install the driver

2.1  Upload the downloaded driver to the server and mount it on /public/ofed.

    [root@node33 sourcecode]# mount -o loop MLNX_OFED_LINUX-1.5.3-4.0.42-rhel5.8-x86_64.iso /public/ofed/
    [root@node33 sourcecode]# cd
    [root@node33 ~]# df -h
    Filesystem            Size  Used Avail Use% Mounted on
    /dev/sda3             117G  9.8G  101G   9% /
    /dev/sda1             494M   17M  452M   4% /boot
    tmpfs                 5.9G     0  5.9G   0% /dev/shm
    /tftpboot/rhel.iso    3.9G  3.9G     0 100% /tftpboot/iso
    /public/sourcecode/MLNX_OFED_LINUX-1.5.3-4.0.42-rhel5.8-x86_64.iso
                          267M  267M     0 100% /public/ofed
    [root@node33 ~]#

2.2  Run the installer to begin package installation. In the session below, the first attempt (with -y) only printed the usage summary; the actual installation was done by piping "echo y" into the installer.

    [root@node33 ~]# /public/ofed/mlnxofedinstall -y
    Usage: /public/ofed/mlnxofedinstall [OPTIONS]
    Options
    -c|--config <packages config_file>    Example of the configuration file can be found under docs
    -n|--net <network config_file>        Example of the network configuration file can be found under docs
    -k|--kernel-version <kernel version>  Use provided kernel version instead of "uname -r"
    -p|--print-available                  Print available packages for current platform
                                          And create corresponding ofed.conf file
    --without-32bit                       Skip 32-bit libraries installation
    --without-depcheck                    Skip Distro's libraries check
    --without-fw-update                   Skip firmware update
    --fw-update-only                      Update firmware.
                                          Skip driver installation
    --force-fw-update                     Force firmware update
    --force                               Force installation
    --all|--hpc|--basic|--msm             Install all, hpc, basic or Mellanox Subnet manager packages correspondingly
    --vma|--vma-vpi                       Install packages required by VMA to support VPI
    --vma-eth                             Install packages required by VMA to work over Ethernet
    -v|-vv|-vvv                           Set verbosity level
    --umad-dev-rw                         Grant non root users read/write permission for umad devices instead of default
    --hugepages-overcommit                Setting 80% of MAX_MEMORY as overcommit for huge page allocation
    --pfc <0|bitmask>                     Priority based Flow Control policy on TX and RX [7:0]. Per priority bit mask (uint). Default 0.
    -q                                    Set quiet - no messages will be printed
    [root@node33 ~]# echo y | /public/ofed/mlnxofedinstall --basic --msm --umad-dev-rw --hugepages-overcommit
    This program will install the MLNX_OFED_LINUX package on your machine.
    Note that all other Mellanox, OEM, OFED, or Distribution IB packages will be removed.
    Do you want to continue?[y/N]:
    Starting MLNX_OFED_LINUX-1.5.3-4.0.42 installation...
    Installing mlnx-ofa_kernel RPM
    Preparing...                ##################################################
    mlnx-ofa_kernel             ##################################################
    Installing kmod-mlnx-ofa_kernel RPM
    Preparing...                ##################################################
    kmod-mlnx-ofa_kernel        ##################################################
    Installing kmod-mlnx-ofa_kernel-xen RPM
    Preparing...                ##################################################
    kmod-mlnx-ofa_kernel-xen    ##################################################
    Installing kernel-mft RPM
    Preparing...                ##################################################
    kernel-mft                  ##################################################
    Installing user level RPMs:
    Preparing...                ##################################################
    mlnxofed-docs               ##################################################
    Preparing...                ##################################################
    ofed-scripts                ##################################################
    Preparing...                ##################################################
    libibverbs                  ##################################################
    Preparing...                ##################################################
    libibverbs                  ##################################################
    Preparing...                ##################################################
    libibverbs-utils            ##################################################
    Preparing...                ##################################################
    libmthca                    ##################################################
    Preparing...                ##################################################
    libmthca                    ##################################################
    Preparing...                ##################################################
    libmverbs                   ##################################################
    Preparing...                ##################################################
    libmverbs                   ##################################################
    Preparing...                ##################################################
    libmlx4                     ##################################################
    Preparing...                ##################################################
    libmlx4                     ##################################################
    Preparing...                ##################################################
    libcxgb3                    ##################################################
    Preparing...                ##################################################
    libcxgb3                    ##################################################
    Preparing...                ##################################################
    libnes                      ##################################################
    Preparing...                ##################################################
    libnes                      ##################################################
    Preparing...                ##################################################
    libipathverbs               ##################################################
    Preparing...                ##################################################
    libipathverbs               ##################################################
    Preparing...                ##################################################
    librdmacm                   ##################################################
    Preparing...                ##################################################
    librdmacm                   ##################################################
    Preparing...                ##################################################
    librdmacm-utils             ##################################################
    Preparing...                ##################################################
    mstflint                    ##################################################
    Preparing...                ##################################################
    libibumad                   ##################################################
    Preparing...                ##################################################
    libibumad                   ##################################################
    Preparing...                ##################################################
    libibmad                    ##################################################
    Preparing...                ##################################################
    libibmad                    ##################################################
    Preparing...                ##################################################
    mft                         ##################################################
    Preparing...                ##################################################
    opensm-libs                 ##################################################
    Preparing...                ##################################################
    opensm-libs                 ##################################################
    Preparing...                ##################################################
    infiniband-diags            ##################################################
    Preparing...                ##################################################
    opensm                      ##################################################
    Preparing...                ##################################################
    ibutils                     ##################################################
    Device (06:00.0):
        06:00.0 InfiniBand: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE] (rev b0)
        Link Width: 8x
        PCI Link Speed: 2.5Gb/s
    Installation finished successfully.
    Programming HCA firmware for /dev/mst/mt26428_pci_cr0 device
    Running: mlxburn -d /dev/mst/mt26428_pci_cr0 -fw /public/ofed/firmware/fw-25408/2_9_1000/fw-ConnectX2-rel.mlx -dev_type 25408  -no
    -I- Querying device ...
    -I- Using auto detected configuration file: /public/ofed/firmware/fw-25408/2_9_1000/MHQH19B-XTR_A1-A3.ini (PSID = MT_0D90110009)
    -I- Generating image ...
    Current FW version on flash: 2.7.626
    New FW version:              2.9.1000
    Burning FW image without signatures  - OK
    Restoring signature                  - OK
    -I- Image burn completed successfully.
    Configuring /etc/security/limits.conf.
    Please reboot your system for the changes to take effect.
    [root@node33 ~]#

Notes: The installer offers four package sets: all, hpc, basic and msm. The basic set is the recommended standard choice. The management node needs both msm and basic. By default the installation also reflashes the HCA firmware (the usage text lists --without-fw-update to skip this), so for HCAs that are not standalone cards, check the firmware version carefully before installing.

2.3  Configure the IP address of the IB interface

    [root@node33 ~]# cat <<EOF >> /etc/sysconfig/network-scripts/ifcfg-ib0
    > DEVICE=ib0
    > BOOTPROTO=none
    > ONBOOT=yes
    > NETMASK=255.255.255.0
    > IPADDR=12.12.12.33
    > EOF
    [root@node33 ~]#
    [root@node33 ~]# cat /etc/sysconfig/network-scripts/ifcfg-ib0
    DEVICE=ib0
    BOOTPROTO=none
    ONBOOT=yes
    NETMASK=255.255.255.0
    IPADDR=12.12.12.33
    [root@node33 ~]#

2.4  Start the IB services

    [root@node33 ~]# chkconfig --list | grep open
    openibd         0:off 1:off 2:on  3:on 4:on 5:on 6:off
    opensmd         0:off 1:off 2:off 3:on 4:on 5:on 6:off
    [root@node33 ~]# /etc/init.d/openibd restart
    Unloading HCA driver:                                      [  OK  ]
    Loading HCA driver and Access Layer:                       [  OK  ]
    Setting up InfiniBand network interfaces:
    Bringing up interface ib0:                                 [  OK  ]
    Setting up service network . . .                           [  done  ]
    [root@node33 ~]# /etc/init.d/opensmd restart
    Stopping IB Subnet Manager.                                [FAILED]
    Starting IB Subnet Manager.                                [  OK  ]
    [root@node33 ~]# ibstat
    CA 'mlx4_0'
        CA type: MT26428
        Number of ports: 1
        Firmware version: 2.9.1000
        Hardware version: b0
        Node GUID: 0x0002c903000cc00e
        System image GUID: 0x0002c903000cc011
        Port 1:
            State: Active
            Physical state: LinkUp
            Rate: 40
            Base lid: 1
            LMC: 0
            SM lid: 1
            Capability mask: 0x0251086a
            Port GUID: 0x0002c903000cc00f
            Link layer: InfiniBand
    [root@node33 ~]#

Notes: On the management node, start openibd first and then opensmd. Compute nodes only need openibd. The [FAILED] on the stop step above most likely just means opensmd was not running yet. Once configuration is complete, use ibstat to check the rate and the link state.

3  Uninstall the IB driver

    [root@node33 ~]# echo y | /public/ofed/uninstall.sh
    This program will uninstall all MLNX_OFED_LINUX-1.5.3-4.0.42 packages on your machine.
    Do you want to continue?[y/N]:y
    rpm -e --allmatches --nodeps
        kmod-mlnx-ofa_kernel-xen-1.5.3-OFED.1.5.3.4.0.42.g3cb72fe.rhel5u8
        libnes-1.1.1mlnx1-1 libcxgb3-1.3.1-1 libmverbs-0.1.0-3.15.gd28970e
        libibmad-1.3.8.MLNX_20120424-0.1 libmthca-1.0.6mlnx1-0.1.gbe5eef3
        libibumad-1.3.7.MLNX_20130110_ff06102-0.1 libibverbs-1.1.5mlnx2-1
        libmlx4-1.0.2mlnx6-1 librdmacm-1.0.15-1 kernel-mft-2.7.1-2.6.18_308.el5
        libmverbs-0.1.0-3.15.gd28970e libipathverbs-1.2mlnx1-1
        libibmad-1.3.8.MLNX_20120424-0.1
        mlnx-ofa_kernel-1.5.3-OFED.1.5.3.4.0.42.g3cb72fe.rhel5u8
        libibverbs-utils-1.1.5mlnx2-1 libcxgb3-1.3.1-1
        mstflint-1.4mlnx4-1.21.gd948ddd libmlx4-1.0.2mlnx6-1 librdmacm-1.0.15-1
        libmthca-1.0.6mlnx1-0.1.gbe5eef3 libibumad-1.3.7.MLNX_20130110_ff06102-0.1
        libibverbs-1.1.5mlnx2-1 librdmacm-utils-1.0.15-1 mlnxofed-docs-1.5.3-4.0.42
        libipathverbs-1.2mlnx1-1
        kmod-mlnx-ofa_kernel-1.5.3-OFED.1.5.3.4.0.42.g3cb72fe.rhel5u8
        libnes-1.1.1mlnx1-1 kernel-mft-2.7.1-2.6.18_308.el5
        ofed-scripts-1.5.3-OFED.1.5.3.4.0.42 mft-2.7.1a-1
    Uninstall finished successfully
    [root@node33 ~]# rm -rf /etc/infiniband
    [root@node33 ~]#

4  Troubleshooting

4.1  Check the IB status

    [root@node33 ~]# ibstat
    CA 'mlx4_0'
        CA type: MT26428
        Number of ports: 1
        Firmware version: 2.9.1000
        Hardware version: b0
        Node GUID: 0x0002c903000cc00e
        System image GUID: 0x0002c903000cc011
        Port 1:
            State: Active
            Physical state: LinkUp
            Rate: 40
            Base lid: 1
            LMC: 0
            SM lid: 1
            Capability mask: 0x0251086a
            Port GUID: 0x0002c903000cc00f
            Link layer: InfiniBand
    [root@node33 ~]#

4.2  List the hosts

    [root@node33 ~]# ibhosts
    Ca      : 0x0002c903000cc00a ports 1 "node34 HCA-1"
    Ca      : 0x0002c903000cc00e ports 1 "node33 HCA-1"
    [root@node33 ~]#

4.3  List the switches

    [root@node33 ~]# ibswitches
    Switch  : 0x0002c9020042bcc0 ports 36 "MF0;switch-1140a2:IS5030/U1" enhanced port 0 lid 4 lmc 0
    [root@node33 ~]#

4.4  View the topology

    [root@node33 ~]# ibnetdiscover
    #
    # Topology file: generated on Sun Mar  8 19:53:35 2015
    #
    # Initiated from node 0002c903000cc00e port 0002c903000cc00f

    vendid=0x2c9
    devid=0xbd36
    sysimgguid=0x2c9020042bcc3
    switchguid=0x2c9020042bcc0(2c9020042bcc0)
    Switch  36 "S-0002c9020042bcc0"         # "MF0;switch-1140a2:IS5030/U1" enhanced port 0 lid 4 lmc 0
    [30]    "H-0002c903000cc00e"[1](2c903000cc00f)          # "node33 HCA-1" lid 1 4xQDR
    [31]    "H-0002c903000cc00a"[1](2c903000cc00b)          # "node34 HCA-1" lid 7 4xQDR

    vendid=0x2c9
    devid=0x673c
    sysimgguid=0x2c903000cc00d
    caguid=0x2c903000cc00a
    Ca      1 "H-0002c903000cc00a"          # "node34 HCA-1"
    [1](2c903000cc00b)      "S-0002c9020042bcc0"[31]        # lid 7 lmc 0 "MF0;switch-1140a2:IS5030/U1" lid 4 4xQDR

    vendid=0x2c9
    devid=0x673c
    sysimgguid=0x2c903000cc011
    caguid=0x2c903000cc00e
    Ca      1 "H-0002c903000cc00e"          # "node33 HCA-1"
    [1](2c903000cc00f)      "S-0002c9020042bcc0"[30]        # lid 1 lmc 0 "MF0;switch-1140a2:IS5030/U1" lid 4 4xQDR
    [root@node33 ~]#

4.5  View error statistics

    [root@node33 ~]# ibdiagnet -P all=1
    Loading IBDIAGNET from: /opt/ibutils/lib64/ibdiagnet1.5.7
    -W- Topology file is not specified.
        Reports regarding cluster links will use direct routes.
    Loading IBDM from: /opt/ibutils/lib64/ibdm1.5.7
    -I- Using port 1 as the local port.
    -I- Discovering ... 3 nodes (1 Switches & 2 CA-s) discovered.
    -I---------------------------------------------------
    -I- Bad Guids/LIDs Info
    -I---------------------------------------------------
    -I- No bad Guids were found
    -I---------------------------------------------------
    -I- Links With Logical State = INIT
    -I---------------------------------------------------
    -I- No bad Links (with logical state = INIT) were found
    -I---------------------------------------------------
    -I- General Device Info
    -I---------------------------------------------------
    -I---------------------------------------------------
    -I- PM Counters Info
    -I---------------------------------------------------
    -I- No illegal PM counters values were found
    -I---------------------------------------------------
    -I- Fabric Partitions Report (see ibdiagnet.pkey for a full hosts list)
    -I---------------------------------------------------
    -I- PKey:0x7fff Hosts:2 full:2 limited:0
    -I---------------------------------------------------
    -I- IPoIB Subnets Check
    -I---------------------------------------------------
    -I- Subnet: IPv4 PKey:0x7fff QKey:0x00000b1b MTU:2048Byte rate:10Gbps SL:0x00
    -W- Suboptimal rate for group. Lowest member rate:40Gbps > group-rate:10Gbps
    -I---------------------------------------------------
    -I- Bad Links Info
    -I- No bad link were found
    -I---------------------------------------------------
    ----------------------------------------------------------------
    -I- Stages Status Report:
        STAGE                                    Errors Warnings
        Bad GUIDs/LIDs Check                     0      0
        Link State Active Check                  0      0
        General Devices Info Report              0      0
        Performance Counters Report              0      0
        Partitions Check                         0      0
        IPoIB Subnets Check                      0      1
    Please see /tmp/ibdiagnet.log for complete log
    ----------------------------------------------------------------
    -I- Done. Run time was 1 seconds.
    [root@node33 ~]#

4.6  View detailed fabric-wide error information

    [root@node33 ~]# ibqueryerrors
    Errors for 0x2c9020042bcc0 "MF0;switch-1140a2:IS5030/U1"
       GUID 0x2c9020042bcc0 port ALL: [PortRcvSwitchRelayErrors == 64] [PortXmitDiscards == 29] [PortXmitWait == 240663]
       GUID 0x2c9020042bcc0 port 0: [PortXmitWait == 1232]
       GUID 0x2c9020042bcc0 port 1: [PortRcvSwitchRelayErrors == 2] [PortXmitDiscards == 3]
       GUID 0x2c9020042bcc0 port 2: [PortRcvSwitchRelayErrors == 3] [PortXmitDiscards == 3]
       GUID 0x2c9020042bcc0 port 3: [PortRcvSwitchRelayErrors == 1] [PortXmitDiscards == 3]
       GUID 0x2c9020042bcc0 port 4: [PortRcvSwitchRelayErrors == 1] [PortXmitDiscards == 1]
       GUID 0x2c9020042bcc0 port 5: [PortRcvSwitchRelayErrors == 1] [PortXmitDiscards == 2]
       GUID 0x2c9020042bcc0 port 6: [PortRcvSwitchRelayErrors == 2] [PortXmitDiscards == 3]
       GUID 0x2c9020042bcc0 port 7: [PortRcvSwitchRelayErrors == 1] [PortXmitDiscards == 2]
       GUID 0x2c9020042bcc0 port 8: [PortRcvSwitchRelayErrors == 1] [PortXmitDiscards == 2]
       GUID 0x2c9020042bcc0 port 9: [PortRcvSwitchRelayErrors == 1] [PortXmitDiscards == 2]
       GUID 0x2c9020042bcc0 port 10: [PortRcvSwitchRelayErrors == 1] [PortXmitDiscards == 2]
       GUID 0x2c9020042bcc0 port 11: [PortRcvSwitchRelayErrors == 1] [PortXmitDiscards == 2]
       GUID 0x2c9020042bcc0 port 12: [PortRcvSwitchRelayErrors == 1] [PortXmitDiscards == 2]
       GUID 0x2c9020042bcc0 port 13: [PortRcvSwitchRelayErrors == 1] [PortXmitDiscards == 1]
       GUID 0x2c9020042bcc0 port 14: [PortRcvSwitchRelayErrors == 1] [PortXmitDiscards == 1]
       GUID 0x2c9020042bcc0 port 30: [PortXmitWait == 4294967295]
       GUID 0x2c9020042bcc0 port 31: [PortRcvSwitchRelayErrors == 46] [PortXmitWait == 295]
       GUID 0x2c9020042bcc0 port 34: [PortXmitWait == 892]
       GUID 0x2c9020042bcc0 port 36: [PortXmitWait == 238245]

    ## Summary: 17 nodes checked, 1 bad nodes found
    ##          53 ports checked, 19 ports have errors beyond threshold
    ## Thresholds:
    ## Suppressed:
    [root@node33 ~]#
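
The ibstat check recommended in the notes to section 2.4 (rate and link state) can be scripted. What follows is a minimal sketch, not part of MLNX_OFED: check_ib_port is a hypothetical helper name, and it assumes the State, Physical state and Rate labels appear as in the ibstat output shown in this article. It reads ibstat output on standard input and succeeds only when the port is Active, LinkUp and running at the expected rate.

```shell
#!/bin/sh
# check_ib_port EXPECTED_RATE
# Hypothetical helper (an illustration, not shipped with MLNX_OFED).
# Reads `ibstat` output on stdin, prints a one-line summary, and returns 0
# only if the port is Active, LinkUp and at EXPECTED_RATE.
# Assumption: the field labels match the ibstat output shown in this article.
# If the HCA has several ports, only the last port seen is evaluated.
check_ib_port() {
    expected_rate="$1"
    awk -v want="$expected_rate" '
        # "State: Active" -> second field is the logical state
        /^[ \t]*State:/          { state = $2 }
        # "Physical state: LinkUp" -> third field is the physical state
        /^[ \t]*Physical state:/ { phys = $3 }
        # "Rate: 40" -> second field is the link rate
        /^[ \t]*Rate:/           { rate = $2 }
        END {
            printf "state=%s physical=%s rate=%s\n", state, phys, rate
            if (state == "Active" && phys == "LinkUp" && rate == want)
                exit 0
            exit 1
        }'
}

# Example use on a node with the driver loaded (assumes ibstat is in PATH):
#   ibstat | check_ib_port 40 || echo "ib0 link needs attention"
```

Because the function only parses text, it can be tried against saved ibstat output before it is run on a live node. The expected rate is passed as an argument since the right value depends on the fabric (40 for the QDR link in this article).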