Description of problem:
Not sure what may lead to this. It may be an RDMA network configuration issue. I just guessed a component; please correct it if it's not proper.

Version-Release number of selected component (if applicable):
RHEL-6.8-20160125.0

How reproducible:
Always

Steps to Reproduce:
1. Submit beaker jobs to rdma-qe-xx machines; all aborted. The testing jobs I submitted:
   https://beaker.engineering.redhat.com/jobs/1208450
   https://beaker.engineering.redhat.com/jobs/1208449
   https://beaker.engineering.redhat.com/jobs/1208448
   https://beaker.engineering.redhat.com/jobs/1208447
2. Submit beaker jobs to rdma-dev-xx machines. Some jobs passed, and some aborted.
   Passed:
   https://beaker.engineering.redhat.com/jobs/1210043
   https://beaker.engineering.redhat.com/jobs/1210042
   Failed:
   https://beaker.engineering.redhat.com/jobs/1210044
   https://beaker.engineering.redhat.com/jobs/1210048

Actual results:
Jobs aborted.

Expected results:
rdma-qe-xx machines can be provisioned to RHEL-6.8-20160125.0 successfully

Additional info:
1. rdma-qe-xx machines can be provisioned to RHEL-6.7 successfully
Hi Honggang or other developers who are available,

Could you please take a look at this issue, which is blocking our testing?

Thanks
Zhaojuan
I will check this issue this afternoon.
This should be a tg3 Ethernet device driver issue. All rdma-qe-xx machines are connected to the beaker network via a tg3 NIC. The tg3 device never comes up after the installation reboot. As a result, beaker jobs timed out because the rdma-qe-xx machines can't detect the beaker server's heartbeat.

[file.bos.redhat.com] [10:43:18 PM]
[honli@file machines]$ grep tg3 rdma*
rdma-dev-10:Create_Interface tg3_1 Ethernet yes hwaddr 2c:59:e5:9a:2a:20 dhcp defroute
rdma-dev-10:Create_Interface tg3_2 Ethernet no hwaddr 2c:59:e5:9a:2a:21
rdma-dev-11:Create_Interface tg3_1 Ethernet yes hwaddr 2c:59:e5:9a:2a:84 dhcp defroute
rdma-dev-11:Create_Interface tg3_2 Ethernet no hwaddr 2c:59:e5:9a:2a:85
rdma-dev-12:Create_Interface tg3_1 Ethernet yes hwaddr 2c:59:e5:9a:3d:a4 dhcp defroute
rdma-dev-12:Create_Interface tg3_2 Ethernet no hwaddr 2c:59:e5:9a:3d:a5
rdma-dev-13:Create_Interface tg3_1 Ethernet yes hwaddr 2c:59:e5:9a:23:3c dhcp defroute
rdma-dev-13:Create_Interface tg3_2 Ethernet no hwaddr 2c:59:e5:9a:23:3d
rdma-dev-14:Create_Interface tg3_1 Ethernet yes hwaddr 40:f2:e9:5c:51:1c bridge lab-bridge
rdma-dev-14:Create_Interface tg3_2 Ethernet no hwaddr 40:f2:e9:5c:51:1d
rdma-dev-14:Create_Interface tg3_3 Ethernet no hwaddr 40:f2:e9:5c:51:1e
rdma-dev-14:Create_Interface tg3_4 Ethernet no hwaddr 40:f2:e9:5c:51:1f
rdma-dev-15:Create_Interface tg3_1 Ethernet yes hwaddr 44:a8:42:2b:ab:4f dhcp defroute
rdma-dev-15:Create_Interface tg3_2 Ethernet no hwaddr 44:a8:42:2b:ab:50
rdma-dev-15:Create_Interface tg3_3 Ethernet no hwaddr 44:a8:42:2b:ab:51
rdma-dev-15:Create_Interface tg3_4 Ethernet no hwaddr 44:a8:42:2b:ab:52
rdma-dev-16:Create_Interface tg3_1 Ethernet yes hwaddr 44:a8:42:2b:b2:9d dhcp defroute
rdma-dev-16:Create_Interface tg3_2 Ethernet no hwaddr 44:a8:42:2b:b2:9e
rdma-dev-16:Create_Interface tg3_3 Ethernet no hwaddr 44:a8:42:2b:b2:9f
rdma-dev-16:Create_Interface tg3_4 Ethernet no hwaddr 44:a8:42:2b:b2:a0
rdma-master:Create_Interface tg3_1 Ethernet yes hwaddr e0:db:55:0b:b4:a8 bridge lab-bridge
rdma-master:Create_Interface tg3_2 Ethernet no hwaddr e0:db:55:0b:b4:a9
rdma-master:Create_Interface tg3_3 Ethernet no hwaddr e0:db:55:0b:b4:aa
rdma-master:Create_Interface tg3_4 Ethernet no hwaddr e0:db:55:0b:b4:ab
rdma-perf-00:Create_Interface tg3_1 Ethernet yes hwaddr d8:9d:67:14:1e:f8 bridge lab-bridge
rdma-perf-00:Create_Interface tg3_2 Ethernet no hwaddr d8:9d:67:14:1e:f9
rdma-perf-00:Create_Interface tg3_3 Ethernet no hwaddr d8:9d:67:14:1e:fa
rdma-perf-00:Create_Interface tg3_4 Ethernet no hwaddr d8:9d:67:14:1e:fb
rdma-perf-01:Create_Interface tg3_1 Ethernet yes hwaddr d8:9d:67:14:6c:6c bridge lab-bridge
rdma-perf-01:Create_Interface tg3_2 Ethernet no hwaddr d8:9d:67:14:6c:6d
rdma-perf-01:Create_Interface tg3_3 Ethernet no hwaddr d8:9d:67:14:6c:6e
rdma-perf-01:Create_Interface tg3_4 Ethernet no hwaddr d8:9d:67:14:6c:6f
rdma-perf-02:Create_Interface tg3_1 Ethernet yes hwaddr d8:9d:67:13:c8:80 bridge lab-bridge
rdma-perf-02:Create_Interface tg3_2 Ethernet no hwaddr d8:9d:67:13:c8:81
rdma-perf-02:Create_Interface tg3_3 Ethernet no hwaddr d8:9d:67:13:c8:82
rdma-perf-02:Create_Interface tg3_4 Ethernet no hwaddr d8:9d:67:13:c8:83
rdma-perf-03:Create_Interface tg3_1 Ethernet yes hwaddr d8:9d:67:14:87:8c bridge lab-bridge
rdma-perf-03:Create_Interface tg3_2 Ethernet no hwaddr d8:9d:67:14:87:8d
rdma-perf-03:Create_Interface tg3_3 Ethernet no hwaddr d8:9d:67:14:87:8e
rdma-perf-03:Create_Interface tg3_4 Ethernet no hwaddr d8:9d:67:14:87:8f
rdma-qe-02:Create_Interface tg3_1 Ethernet yes hwaddr 40:a8:f0:75:ff:68 dhcp defroute
rdma-qe-02:Create_Interface tg3_2 Ethernet no hwaddr 40:a8:f0:75:ff:69
rdma-qe-03:Create_Interface tg3_1 Ethernet yes hwaddr 9c:b6:54:bb:4a:90 dhcp defroute
rdma-qe-03:Create_Interface tg3_2 Ethernet no hwaddr 9c:b6:54:bb:4a:91
rdma-qe-04:Create_Interface tg3_1 Ethernet yes hwaddr 9c:b6:54:bb:48:84 dhcp defroute
rdma-qe-04:Create_Interface tg3_2 Ethernet no hwaddr 9c:b6:54:bb:48:85
rdma-qe-05:Create_Interface tg3_1 Ethernet yes hwaddr 9c:b6:54:bb:79:6c dhcp defroute
rdma-qe-05:Create_Interface tg3_2 Ethernet no hwaddr 9c:b6:54:bb:79:6d
rdma-qe-06:Create_Interface tg3_1 Ethernet yes hwaddr 2c:59:e5:9a:21:24 dhcp defroute
rdma-qe-06:Create_Interface tg3_2 Ethernet no hwaddr 2c:59:e5:9a:21:25
rdma-qe-07:Create_Interface tg3_1 Ethernet yes hwaddr 2c:59:e5:9a:27:0c dhcp defroute
rdma-qe-07:Create_Interface tg3_2 Ethernet no hwaddr 2c:59:e5:9a:27:0d
rdma-qe-08:Create_Interface tg3_1 Ethernet yes hwaddr 2c:59:e5:9a:3d:e0 dhcp defroute
rdma-qe-08:Create_Interface tg3_2 Ethernet no hwaddr 2c:59:e5:9a:3d:e1
rdma-qe-09:Create_Interface tg3_1 Ethernet yes hwaddr 2c:59:e5:9a:3b:ec dhcp defroute
rdma-qe-09:Create_Interface tg3_2 Ethernet no hwaddr 2c:59:e5:9a:3b:ed
rdma-qe-10:Create_Interface tg3_1 Ethernet yes hwaddr 2c:59:e5:9a:21:18 dhcp defroute
rdma-qe-10:Create_Interface tg3_2 Ethernet no hwaddr 2c:59:e5:9a:21:19
rdma-qe-11:Create_Interface tg3_1 Ethernet yes hwaddr 2c:59:e5:9a:3b:d4 dhcp defroute
rdma-qe-11:Create_Interface tg3_2 Ethernet no hwaddr 2c:59:e5:9a:3b:d5
rdma-qe-12:Create_Interface tg3_1 Ethernet yes hwaddr 34:64:a9:95:c9:3c dhcp defroute
rdma-qe-12:Create_Interface tg3_2 Ethernet no hwaddr 34:64:a9:95:c9:3d
rdma-qe-13:Create_Interface tg3_1 Ethernet yes hwaddr 40:a8:f0:75:fc:18 dhcp defroute
rdma-qe-13:Create_Interface tg3_2 Ethernet no hwaddr 40:a8:f0:75:fc:19
rdma-qe-14:Create_Interface tg3_1 Ethernet yes hwaddr 44:a8:42:2b:af:30 dhcp defroute
rdma-qe-14:Create_Interface tg3_2 Ethernet no hwaddr 44:a8:42:2b:af:31
rdma-qe-14:Create_Interface tg3_3 Ethernet no hwaddr 44:a8:42:2b:af:32
rdma-qe-14:Create_Interface tg3_4 Ethernet no hwaddr 44:a8:42:2b:af:33
rdma-qe-15:Create_Interface tg3_1 Ethernet yes hwaddr 44:a8:42:2b:b0:34 dhcp defroute
rdma-qe-15:Create_Interface tg3_2 Ethernet no hwaddr 44:a8:42:2b:b0:35
rdma-qe-15:Create_Interface tg3_3 Ethernet no hwaddr 44:a8:42:2b:b0:36
rdma-qe-15:Create_Interface tg3_4 Ethernet no hwaddr 44:a8:42:2b:b0:37
rdma-storage-02:Create_Interface tg3_1 Ethernet yes hwaddr 54:9f:35:0c:24:70 dhcp defroute
rdma-storage-02:Create_Interface tg3_2 Ethernet no hwaddr 54:9f:35:0c:24:71
rdma-storage-02:Create_Interface tg3_3 Ethernet no hwaddr 54:9f:35:0c:24:72
rdma-storage-02:Create_Interface tg3_4 Ethernet no hwaddr 54:9f:35:0c:24:73
rdma-storage-03:Create_Interface tg3_1 Ethernet yes hwaddr 54:9f:35:0c:1b:74 dhcp defroute
rdma-storage-03:Create_Interface tg3_2 Ethernet no hwaddr 54:9f:35:0c:1b:75
rdma-storage-03:Create_Interface tg3_3 Ethernet no hwaddr 54:9f:35:0c:1b:76
rdma-storage-03:Create_Interface tg3_4 Ethernet no hwaddr 54:9f:35:0c:1b:77
rdma-storage-04:Create_Interface tg3_1 Ethernet yes hwaddr 54:9f:35:0c:2a:94 dhcp defroute
rdma-storage-04:Create_Interface tg3_2 Ethernet no hwaddr 54:9f:35:0c:2a:95
rdma-storage-04:Create_Interface tg3_3 Ethernet no hwaddr 54:9f:35:0c:2a:96
rdma-storage-04:Create_Interface tg3_4 Ethernet no hwaddr 54:9f:35:0c:2a:97
rdma-virt-00:Create_Interface tg3_1 Ethernet yes hwaddr 44:a8:42:01:24:04 bridge lab-bridge
rdma-virt-00:Create_Interface tg3_2 Ethernet no hwaddr 44:a8:42:01:24:05
rdma-virt-00:Create_Interface tg3_3 Ethernet no hwaddr 44:a8:42:01:24:06
rdma-virt-00:Create_Interface tg3_4 Ethernet no hwaddr 44:a8:42:01:24:07
rdma-virt-01:Create_Interface tg3_1 Ethernet yes hwaddr 44:a8:42:01:31:d9 bridge lab-bridge
rdma-virt-01:Create_Interface tg3_2 Ethernet no hwaddr 44:a8:42:01:31:da
rdma-virt-01:Create_Interface tg3_3 Ethernet no hwaddr 44:a8:42:01:31:db
rdma-virt-01:Create_Interface tg3_4 Ethernet no hwaddr 44:a8:42:01:31:dc
rdma-virt-02:Create_Interface tg3_1 Ethernet yes hwaddr 44:a8:42:01:1b:a2 bridge lab-bridge
rdma-virt-02:Create_Interface tg3_2 Ethernet no hwaddr 44:a8:42:01:1b:a3
rdma-virt-02:Create_Interface tg3_3 Ethernet no hwaddr 44:a8:42:01:1b:a4
rdma-virt-02:Create_Interface tg3_4 Ethernet no hwaddr 44:a8:42:01:1b:a5
rdma-virt-03:Create_Interface tg3_1 Ethernet yes hwaddr 44:a8:42:01:22:29 bridge lab-bridge
rdma-virt-03:Create_Interface tg3_2 Ethernet no hwaddr 44:a8:42:01:22:2a
rdma-virt-03:Create_Interface tg3_3 Ethernet no hwaddr 44:a8:42:01:22:2b
rdma-virt-03:Create_Interface tg3_4 Ethernet no hwaddr 44:a8:42:01:22:2c
[file.bos.redhat.com] [10:43:26 PM]
[honli@file machines]$
The buggy Create_Interface function in "rdma-function.sh" only updates udev rules for InfiniBand/IPoIB interfaces. It ignores Ethernet interfaces. The default /etc/udev/rules.d/70-persistent-net.rules file names the tg3 interfaces ethX. So the network service failed to bring up the tg3_X interfaces, because udev had not renamed them. I will fix the issue ASAP.
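For illustration only, a persistent-net rule that renames an Ethernet device by MAC address could be generated along these lines. This is a minimal sketch, not the actual Create_Interface code: the helper name make_eth_udev_rule is hypothetical, and the rule layout is assumed to follow the usual RHEL 6 70-persistent-net.rules entries.

```shell
#!/bin/sh
# Hypothetical helper (assumption: NOT the real rdma-function.sh code).
# Prints a udev rule that renames the Ethernet device with MAC $1 to name $2.
make_eth_udev_rule() {
    # sysfs reports MAC addresses in lowercase, so normalize the input.
    mac=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    name=$2
    # ATTR{type}=="1" selects Ethernet devices; KERNEL=="eth*" matches
    # the kernel's default ethX name before the rename.
    printf 'SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="%s", ATTR{type}=="1", KERNEL=="eth*", NAME="%s"\n' "$mac" "$name"
}

# Example with the rdma-dev-10 tg3_1 entry from the grep output.
make_eth_udev_rule 2c:59:e5:9a:2a:20 tg3_1
```

Appending such a line to the rules file (after removing any older rule for the same MAC) would be one way to make udev hand the tg3_X name to the network service.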
https://beaker.engineering.redhat.com/jobs/1211204
https://beaker.engineering.redhat.com/jobs/1211203
https://beaker.engineering.redhat.com/jobs/1211202
https://beaker.engineering.redhat.com/jobs/1211200
https://beaker.engineering.redhat.com/jobs/1211199
https://beaker.engineering.redhat.com/jobs/1211197

The issue has been fixed.
(In reply to Honggang LI from comment #7)
> https://beaker.engineering.redhat.com/jobs/1211204
> https://beaker.engineering.redhat.com/jobs/1211203
> https://beaker.engineering.redhat.com/jobs/1211202
> https://beaker.engineering.redhat.com/jobs/1211200
> https://beaker.engineering.redhat.com/jobs/1211199
> https://beaker.engineering.redhat.com/jobs/1211197
>
> Issue had been fixed.

Thank you very much for your quick response, Honggang!
(In reply to Honggang LI from comment #6)
> Buggy Create_Interface function in "rdma-function.sh" only updates udev
> rules for InfiniBand/IPoIB interfaces. It ignores Enthernet interfaces. The
> default /etc/udev/rules.d/70-persistent-net.rules file names the tg3
> interfaces as ethX. So, network service failed to up the tg3_X interfaces as
> they did not rename by udev. I will fix the issue ASAP.

This didn't used to be necessary. Just deleting the original device config files and creating new ones would override the previous device names stored in the udev rules file. Has this changed with the latest rhel6 then?
(In reply to Doug Ledford from comment #9)
> This didn't used to be necessary. Just deleting the original device config

The old script without my update works for RHEL-6.7 (and the older 6.x distros). I suspect my F23 provisioning jobs may have timed out because of this too. I have not run Fedora on the rdma cluster, so I'm not sure.

> files and creating new ones would override the previous device names stored
> in the udev rules file. Has this changed with the latest rhel6 then?

Not sure. And the weird thing is that machines with bnx Ethernet NICs work with RHEL-6.8-20160125.0; only machines with tg3 NICs failed.
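To confirm which driver backs each network interface on a given machine (for example, to check the tg3 versus bnx split), the sysfs driver links can be read. A minimal sketch follows; the function name list_net_drivers and its optional root argument are made up for illustration, and it assumes the usual /sys/class/net/<dev>/device/driver symlink layout.

```shell
#!/bin/sh
# Hypothetical helper: print "<interface> <driver>" for every network
# device that has a driver symlink under the given sysfs root.
list_net_drivers() {
    root=${1:-/sys/class/net}
    for dev in "$root"/*; do
        # Virtual devices such as lo have no device/driver link; skip them.
        [ -L "$dev/device/driver" ] || continue
        driver=$(basename "$(readlink "$dev/device/driver")")
        printf '%s %s\n' "$(basename "$dev")" "$driver"
    done
}

# Usage on a live system:
list_net_drivers
```

Comparing this output between a RHEL-6.7 and a RHEL-6.8 install of the same machine would show whether the tg3 ports are still named ethX after the reboot.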
(In reply to Honggang LI from comment #10)
> (In reply to Doug Ledford from comment #9)
> > This didn't used to be necessary. Just deleting the original device config
>
> The old script without my update works for RHEL-6.7 (and the older 6.x
> distros). I suspect my F23 provision jobs timed out may because it too. I
> did not play Fedora over the rdma cluster, so I'm not sure.
>
> > files and creating new ones would override the previous device names stored
> > in the udev rules file. Has this changed with the latest rhel6 then?
>
> Not sure. And the weird thing is that machines with bnx Ethernet NICs work
> with RHEL-6.8-20160125.0, only machines with tg3 NIC failed.

Is this a unique script to our test machines?
a pkg change?
(In reply to Don Dutile from comment #11)
> (In reply to Honggang LI from comment #10)
> > (In reply to Doug Ledford from comment #9)
> > > This didn't used to be necessary. Just deleting the original device config
> >
> > The old script without my update works for RHEL-6.7 (and the older 6.x
> > distros). I suspect my F23 provision jobs timed out may because it too. I
> > did not play Fedora over the rdma cluster, so I'm not sure.
> >
> > > files and creating new ones would override the previous device names stored
> > > in the udev rules file. Has this changed with the latest rhel6 then?
> >
> > Not sure. And the weird thing is that machines with bnx Ethernet NICs work
> > with RHEL-6.8-20160125.0, only machines with tg3 NIC failed.
>
> Is this a unique script to our test machines?

It's one of the primary function routines in rdma-functions.sh that gets installed on all the rdma-* machines.

> a pkg change?

Maybe. We recently changed all rhel6 installs from using NetworkManager (which would read the device name from the ifcfg-* file and set the device name appropriately) back to using the old SysV init network package. This was due to the rhel6 NetworkManager not supporting vlans or maybe pkeys, can't remember off the top of my head, but one of the recent updates we made to the default network interface setup was not supported with the rhel6 NetworkManager but was by the network service, and so we switched back.

With that comes a concurrent change in how the device naming is done. The SysV init network scripts will attempt to change a device name, but I think they defer to the udev persistent device rules. The SysV init network scripts for InfiniBand devices are actually part of the rhel6 rdma package, and they will change the IB device name from rhel6.0 through about 6.5, I think; then they switched to using a udev rule. So the matrix of when we used udev rules and when we relied on the network service to rename the device is complex in the rhel6 lifetime.
I don't have an explanation for why it would have changed recently, nor why it would have only affected the tg3 devices and not the bnx devices. That makes no sense and makes me think that Honggang's change might have papered over the issue, but we don't really know what the true root cause of the issue was.
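One way to narrow down the root cause would be to compare the name each ifcfg-* file asks for against the name the udev rules file assigns to the same MAC address. The sketch below is only an assumption about how such a check could look: check_ifcfg_name is a hypothetical helper, and it relies on the ifcfg file carrying plain DEVICE= and HWADDR= lines.

```shell
#!/bin/sh
# Hypothetical diagnostic: does the udev rules file give the MAC in an
# ifcfg file the same name that the ifcfg file's DEVICE= line expects?
# Usage: check_ifcfg_name <ifcfg-file> <udev-rules-file>
check_ifcfg_name() {
    ifcfg=$1
    rules=$2
    device=$(sed -n 's/^DEVICE=//p' "$ifcfg" | tr -d '"')
    hwaddr=$(sed -n 's/^HWADDR=//p' "$ifcfg" | tr -d '"' | tr 'A-F' 'a-f')
    if [ -z "$device" ] || [ -z "$hwaddr" ]; then
        echo "skip: $ifcfg lacks DEVICE or HWADDR"
        return 0
    fi
    # Take the NAME="..." value from the first rule matching this MAC.
    udev_name=$(grep -iF "ATTR{address}==\"$hwaddr\"" "$rules" |
        sed -n 's/.*NAME="\([^"]*\)".*/\1/p' | head -n 1)
    if [ -z "$udev_name" ]; then
        echo "no rule: nothing in $rules matches $hwaddr"
    elif [ "$udev_name" = "$device" ]; then
        echo "ok: $device"
    else
        echo "mismatch: ifcfg wants $device, udev assigns $udev_name"
    fi
}
```

Running it as check_ifcfg_name /etc/sysconfig/network-scripts/ifcfg-tg3_1 /etc/udev/rules.d/70-persistent-net.rules on a failing machine and on a working bnx machine could show whether the two cases really differ in their udev rules.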
(In reply to Doug Ledford from comment #12)
> Maybe. We recently changed all rhel6 installs from using NetworkManager
> (which would read the device name from the ifcfg-* file and set the device
> name appropriately) back to using the old SysV init network package. This
> was due to the rhel6 NetworkManager not supporting vlans or maybe pkeys,
> can't remember off the top of my head, but one of the recent updates we made

https://bugzilla.redhat.com/show_bug.cgi?id=1276030
https://bugzilla.redhat.com/show_bug.cgi?id=1284115
(In reply to Honggang LI from comment #13)
> (In reply to Doug Ledford from comment #12)
>
> > Maybe. We recently changed all rhel6 installs from using NetworkManager
> > (which would read the device name from the ifcfg-* file and set the device
> > name appropriately) back to using the old SysV init network package. This
> > was due to the rhel6 NetworkManager not supporting vlans or maybe pkeys,
> > can't remember off the top of my head, but one of the recent updates we made
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1276030
> https://bugzilla.redhat.com/show_bug.cgi?id=1284115

There are two patches for 1284115.
Have they been tried? Should we get NM patches for 6.8 & an additional doc update for no-NM support for ib-vlans?
(In reply to Don Dutile from comment #14)
> There are two patches for 1284115.
> Have they been tried ? Should we get NM patchesd for 6.8 & additional doc
> update for no-NM support for ib-vlan's ?

Zhaojuan,

Please test the patches for bz1284115.