Description of problem:
The network doesn't work on RHEVH after upgrading to "RHEV Hypervisor - 6.5 - 20131115.0.3.2.el6_5". The host runs on a Dell C6220.

Version-Release number of selected component (if applicable):

How reproducible:
Install RHEV Hypervisor - 6.5 - 20131115.0.3.2.el6_5 on a Dell C6220.

Actual results:
The host is unreachable via the network.

Additional info:
The root cause seems to be the nic names, which switched from previously p2p1 and p2p2 to em3 and em4.
A short-term workaround is to replace the old NIC names in the existing configuration files with the new NIC names. This looks a bit like a biosdevname issue.
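The workaround above can be sketched roughly as follows. This is only a minimal sketch, assuming the old configuration files carry a DEVICE= line with the old name; the helper name `rename_ifcfg_device` and the sample file content are hypothetical, and how RHEV-H persists these files is not covered here.

```python
def rename_ifcfg_device(text: str, old: str, new: str) -> str:
    """Rewrite DEVICE=<old> to DEVICE="<new>" and leave every other line as-is."""
    lines = []
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        # Compare the value without surrounding quotes or whitespace.
        if sep and key.strip() == "DEVICE" and value.strip().strip("\"'") == old:
            line = f'DEVICE="{new}"'
        lines.append(line)
    return "\n".join(lines) + "\n"


# Hypothetical content of the old ifcfg-p2p1 file.
old_config = 'DEVICE="p2p1"\nONBOOT="yes"\n'
new_config = rename_ifcfg_device(old_config, "p2p1", "em3")
print(new_config, end="")
```

The rewritten text would then be saved under the matching new file name (ifcfg-em3 in this example).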
Alexander - I'm trying to find any indicators of something which would have changed device naming, since it already appears to have biosdevnames, but I'm uncertain whether I'll locate it. Do the MAC addresses change? I'm not able to reproduce this on an R610 or an R300, so it may be something to do with the odd arrangement of the C6220. That said, if the MAC addresses are the same, we should be able to work around this.
Václav, could this be a regression of the biosdevname upgrade from 0.4.1-3 in 6.4 to 0.5.0-1 in 6.5? And do you need information about the underlying hardware or PCI layout?
Hi, please provide output of "biosdevname -d", thanks
Here you go:

# biosdevname -d
BIOS device: em1
Kernel name: em1
Permanent MAC: 84:8F:69:FE:26:36
Assigned MAC : 84:8F:69:FE:26:36
ifIndex: 2
Driver: igb
Driver version: 5.0.5-k
Firmware version: 1.57.0
Bus Info: 0000:02:00.0
PCI name : 0000:02:00.0
PCI Slot : embedded
SMBIOS Device Type: Ethernet
SMBIOS Instance: 1
SMBIOS Label: LOM 1G I350-BT2 (LAN1)
sysfs Index: 1
sysfs Label: LOM 1G I350-BT2 (LAN1)
Embedded Index: 1

BIOS device: em2
Kernel name: em2
Permanent MAC: 84:8F:69:FE:26:37
Assigned MAC : 84:8F:69:FE:26:37
ifIndex: 3
Driver: igb
Driver version: 5.0.5-k
Firmware version: 1.57.0
Bus Info: 0000:02:00.3
PCI name : 0000:02:00.3
PCI Slot : embedded
SMBIOS Device Type: Ethernet
SMBIOS Instance: 2
SMBIOS Label: LOM 1G I350-BT2 (LAN2)
sysfs Index: 2
sysfs Label: LOM 1G I350-BT2 (LAN2)
Embedded Index: 2

BIOS device: em3
Kernel name: em3
Permanent MAC: A0:36:9F:0A:06:F0
Assigned MAC : A0:36:9F:0A:06:F0
ifIndex: 4
Driver: ixgbe
Driver version: 3.15.1-k
Firmware version: 0x800002fc
Bus Info: 0000:03:00.0
PCI name : 0000:03:00.0
PCI Slot : embedded
SMBIOS Label: SLOT1_PCIE_G3_X16(CPU1)
Embedded Index: 3

BIOS device: em4
Kernel name: em4
Permanent MAC: A0:36:9F:0A:06:F2
Assigned MAC : A0:36:9F:0A:06:F2
ifIndex: 5
Driver: ixgbe
Driver version: 3.15.1-k
Firmware version: 0x800002fc
Bus Info: 0000:03:00.1
PCI name : 0000:03:00.1
PCI Slot : embedded
SMBIOS Label: SLOT1_PCIE_G3_X16(CPU1)
Embedded Index: 4
Václav, does comment 5 help?
Václav, any update? We need your input.
Hi Chuzhoy, I can't reproduce this issue on our Dell machine, which is not a Dell C6220. I would have to wait to request this machine from Beaker. Would it be OK for you to lend me your Dell C6220 so that I can reproduce this issue? Thanks, huiwang
Narendra, Jordan, could you take a look at this? Biosdevname was rebased in 6.5 to version 0.5.0 - could there be any changes that might be the source of the trouble?
Hi Wanghui, Unfortunately I can't let you use the machines that are part of the working setup. Having said that, there's a machine I can let you use. Could you please contact me on IRC? Thanks.
Hmm, which NICs did you use? We have a Dell C6220 with 10G cards in it, and we use the 10G NICs. I'm on #tlv (look for sasha). Thanks.
(In reply to Alexander Chuzhoy from comment #0)
> Additional info:
> The root cause seems to be the nic names, which switched from previously
> p2p1 and p2p2 to em3 and em4.

Vaclav, can this name change be related to biosdevname?
Hi, could you please attach the output from the following -
1. dmidecode
2. lspci -tv
3. lspci -vvvxxxx
4. biosdecode

It seems like the SMBIOS type 9 records for the add-in NICs are incorrect. Just to confirm, the system has two onboard 'igb' NICs and a single dual port ixgbe NIC on PCIE slot 1. Is this understanding correct?
Created attachment 859544 [details] lspci -vvvxxxx output
Created attachment 859545 [details] lspci output
Created attachment 859546 [details] dmidecode output
Created attachment 859547 [details] biosdecode output
Hi Fabian and Narendra, I've attached the output as requested. I'm not really familiar with that bug, but I'll try to help as much as I can. Please let me know if you need anything else. Thanks, Dotan
Dotan, thanks for those files. Could you also please attach the following two files: /etc/udev/rules.d/70-persistent-net.rules /etc/udev/rules.d/71-persistent-node-net.rules
Hi Fabian, they don't exist.
Dotan, Any output for "dmesg | grep rename" ?
dmesg | grep rename:
udev: renamed network interface eth1 to em2
udev: renamed network interface eth3 to em4
udev: renamed network interface eth2 to em3
udev: renamed network interface eth0 to em1
It seems odd that udev would be naming without rules. find / -name "*net.rules" ?
All I managed to find is: /lib/udev/rules.d/60-net.rules
/lib/udev/rules.d/60-net.rules will rename interfaces when the ifcfg-eth* or ifcfg-em* files have DEVICE= and HWADDR= set. Could you please attach the /etc/sysconfig/network-scripts/ifcfg-* files?

Looking at the output from dmidecode from comment #26, it seems like the SMBIOS type 9 records are incorrect.

Handle 0x000C, DMI type 9, 17 bytes
System Slot Information
	Designation: SLOT1_PCIE_G3_X16(CPU1)
	Type: x16 PCI Express x16
	Current Usage: In Use
	Length: Long
	ID: 0
	Characteristics:
		3.3 V is provided
		Opening is shared
		PME signal is supported
		Hot-plug devices are supported
	Bus Address: 0000:00:02.0 <--------Observe this.

The Bus Address is the address of the bridge above the add-in NIC. The bus address needs to be 03:00.0. The issue is planned to be fixed in a future BIOS update. As the type 9 record is incorrect, biosdevname thinks that the NIC is an embedded NIC and names its ports em3 and em4 instead of pXpY.
Also,
1. Could you please provide the output from 'dmidecode -u'?
2. It seems like the add-in device is in PCIE slot 1 and not 2. Could you please verify/confirm which slot the add-in NIC is plugged into? (Asking because the description mentions p2p1 and p2p2, but they are being named em3 and em4, which seems to indicate that they are being treated as embedded devices and that the slot number is 0.)
Created attachment 860089 [details] dmidecode -u
Attached as requested. Unfortunately I cannot physically check the PCI slot since I can't reboot this server; it's a critical server. But IMHO it's PCIE slot 1.
Hi, thanks for attaching the output. From the 'dmidecode -u' output from comment#37, the slot number (offset 9, marked with ^^) is zero. As the slot number is zero, biosdevname thinks that the device is embedded. A fix is planned in a future BIOS update.

Handle 0x000C, DMI type 9, 17 bytes
Header and Data:
09 11 0C 00 01 AA 0D 04 04 00 00 0C 03 00 00 00 10
                           ^^
Strings:
53 4C 4F 54 31 5F 50 43 49 45 5F 47 33 5F 58 31 36 28 43 50 55 31 29 00
"SLOT1_PCIE_G3_X16(CPU1)"
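For readers who want to check the raw bytes themselves, here is a minimal sketch that decodes the header shown above. It assumes the 17-byte type 9 (System Slots) layout as I read it from the SMBIOS specification (a 16-bit Slot ID at offset 9, then segment, bus, and device/function at offsets 0x0D to 0x10); the function and field names are my own, not taken from dmidecode or biosdevname.

```python
import struct

# Header bytes of the type 9 record, copied from the 'dmidecode -u' output.
RAW = bytes.fromhex("09 11 0C 00 01 AA 0D 04 04 00 00 0C 03 00 00 00 10")


def decode_type9(raw: bytes) -> dict:
    """Decode the fixed 17-byte part of an SMBIOS type 9 record (assumed layout)."""
    (rec_type, length, handle, _designation, _slot_type, _bus_width,
     _usage, _slot_length, slot_id, _char1, _char2,
     segment, bus, devfn) = struct.unpack("<BBHBBBBBHBBHBB", raw[:17])
    # devfn packs the device number (upper 5 bits) and the function (lower 3 bits).
    device = devfn >> 3
    function = devfn & 0x7
    return {
        "type": rec_type,
        "length": length,
        "handle": handle,
        "slot_id": slot_id,  # 16-bit little-endian word at offset 9
        "bus_address": f"{segment:04x}:{bus:02x}:{device:02x}.{function}",
    }


record = decode_type9(RAW)
print(record["slot_id"], record["bus_address"])
```

Under these assumptions the decoded slot ID is 0 and the bus address is 0000:00:02.0, which agrees with the "ID: 0" and "Bus Address: 0000:00:02.0" fields in the dmidecode output quoted earlier.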
Narendra, thanks for this information. But did this behavior change between 0.4.1 and 0.5?
(In reply to Fabian Deutsch from comment #40)
> Narendra,
>
> thanks for this informations.
>
> But did this behavior change between 0.4.1 an 0.5?

Hi, sorry for the delay. Yes, it changed from 0.4.1 to 0.5. The change is that 0.5.0 introduced a new function 'smbios_setslot' in file 'src/dmidecode/dmidecode.c'. The system has an SMBIOS type 9 record for the bridge above the NIC device. In 0.5.0, biosdevname scans the secondary bus of this bridge device and sets the slot number, which in this scenario is zero. A slot number of zero is later interpreted by biosdevname as an embedded device (em3 and em4). In 0.4.1, if biosdevname does not find the slot number from SMBIOS type 9, it falls back on the slot capability. From the lspci -vvv output it seems to be set to 2, and hence the name becomes p2p1.

As mentioned, a fix is planned in a future BIOS release to correctly set the type 9 record for the add-in NIC.
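The difference described above can be modelled in a few lines. This is a deliberately simplified sketch of the behavior as explained in this comment, not the actual biosdevname code; all function and parameter names are hypothetical, and the input values (no type 9 match for the NIC itself, slot 0 on the bridge record, slot capability 2) are taken from this thread.

```python
from typing import Optional


def slot_041(type9_slot_for_device: Optional[int], slot_capability: int) -> int:
    """0.4.1 as described: use a type 9 slot for the device, else the PCIe slot capability."""
    if type9_slot_for_device is not None:
        return type9_slot_for_device
    return slot_capability


def slot_050(type9_slot_for_device: Optional[int],
             type9_slot_for_bridge: Optional[int],
             slot_capability: int) -> int:
    """0.5.0 as described: a bridge's type 9 slot also applies to its secondary bus."""
    for candidate in (type9_slot_for_device, type9_slot_for_bridge):
        if candidate is not None:
            return candidate
    return slot_capability


def interface_name(slot: int, port: int, embedded_index: int) -> str:
    """Slot 0 is treated as embedded (emN); anything else becomes pXpY."""
    return f"em{embedded_index}" if slot == 0 else f"p{slot}p{port}"


# First port of the add-in NIC on the C6220, using the values from this thread.
old_name = interface_name(slot_041(None, 2), port=1, embedded_index=3)
new_name = interface_name(slot_050(None, 0, 2), port=1, embedded_index=3)
print(old_name, new_name)
```

With these inputs the model yields p2p1 for the 0.4.1 path and em3 for the 0.5.0 path, which is the rename reported in the description.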
(In reply to Narendra K from comment #42)
> (In reply to Fabian Deutsch from comment #40)
> > Narendra,
> >
> > thanks for this informations.
> >
> > But did this behavior change between 0.4.1 an 0.5?
>
> Hi, sorry for the delay. Yes, it changed from 0.4.1 to 0.5. The change is
> that 0.5.0 introduced a new function 'smbios_setslot' in file
> 'src/dmidecode/dmidecode.c'. The system has a SMBIOS type 9 record for the
> bridge above the NIC device. In 0.5.0 biosdevname scans the secondary bus of
> this bridge device and sets the slot number, in this scenario, it is zero.
> The slot number set to zero is later interpreted by biosdevname as embedded
> device(em3 and em4). In 0.4.1, if biosdevname does not find slot number from
> SMBIOS type 9 and falls back on slot capability. From lspci -vvv output it
> seems to be set to 2 and hence it becomes p2p1.
>
> As mentioned a fix is planned in a future BIOS release to correctly set the
> type 9 record for the add-in NIC.

Hi Narendra,

Thanks for this explanation.
Do you know when a future release can be expected?
Or is there a workaround / hotfix, to bring back the old 0.4.1 behavior?
(In reply to Dotan Paz from comment #38)
> Attached it as requested.
> Unfortunately I cannot physically check the PCI slot since I can't reboot
> this server , it's a critical server , but IMHO it's PCIE slot 1

Hi, I think it would be very helpful if this could be confirmed. (The issue description says the names changed from p2p1 and p2p2 to em3 and em4. If the NIC were in slot 1, I would expect the names to be p1p1 and p1p2. To understand this, it would help to confirm whether the NIC is in slot 1 or slot 2.)
Dotan, can you try to help Narendra with the question in comment 46?
(In reply to Fabian Deutsch from comment #43)
> (In reply to Narendra K from comment #42)
> > (In reply to Fabian Deutsch from comment #40)
[...]
> Hi Narendra,
>
> Thanks for this explanation.
> Do you know when a future release can be expected?
> Or is there a workaround / hotfix, to bring back the old 0.4.1 behavior?

Hi, I missed checking one more detail earlier. Comment #34 mentioned that the system has a '/lib/udev/rules.d/60-net.rules' file. As I understand it, this rules file gets to run on the network interfaces before '71-biosdevname.rules' runs. On an R610 with RHEL 6.5, '/lib/udev/rules.d/60-net.rules' looks like -

# cat 60-net.rules
ACTION=="add", SUBSYSTEM=="net", DEVPATH=="/devices/virtual/net/lo", RUN+="/sbin/ifup $env{INTERFACE}"
ACTION=="add", SUBSYSTEM=="net", PROGRAM="/lib/udev/rename_device", RESULT=="?*", ENV{INTERFACE_NAME}="$result" <----- observe this.
SUBSYSTEM=="net", RUN+="/etc/sysconfig/network-scripts/net.hotplug"

'/lib/udev/rename_device' gets a chance to rename a network interface by looking at DEVICE= and HWADDR= in the /etc/sysconfig/network-scripts/ifcfg-* files. Before upgrading from RHEL 6.4 to RHEL 6.5, the ifcfg-p2p1 and ifcfg-p2p2 files would have something like

DEVICE="p2p1"
HWADDR="mac id of p2p1"

This should have ensured that even across an upgrade, the network interface whose MAC address matches HWADDR in ifcfg-p2p1 is renamed to p2p1. If this happens, biosdevname will not run because it sees NAME= already set.

Could you please check the contents of the ifcfg-* files before and after the upgrade? It seems like this is not working and needs to be addressed if there is an issue. (Please let me know if I am missing something.)
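The DEVICE=/HWADDR= lookup described above can be illustrated with a small model. This is only a sketch of the matching idea, not the real '/lib/udev/rename_device' helper; the function names are hypothetical, and the pairing of p2p1/p2p2 with the two ixgbe MAC addresses from the 'biosdevname -d' output is an assumption made for the example.

```python
from typing import Dict, Optional


def parse_ifcfg(text: str) -> Dict[str, str]:
    """Parse KEY=value lines, dropping surrounding quotes and skipping comments."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        config[key.strip()] = value.strip().strip("\"'")
    return config


def name_for_mac(ifcfg_files: Dict[str, str], mac: str) -> Optional[str]:
    """Return DEVICE from the first ifcfg file whose HWADDR matches mac (case-insensitive)."""
    for text in ifcfg_files.values():
        config = parse_ifcfg(text)
        if "DEVICE" in config and config.get("HWADDR", "").lower() == mac.lower():
            return config["DEVICE"]
    return None


# Assumed pre-upgrade files; the MAC-to-name pairing is illustrative only.
files = {
    "ifcfg-p2p1": 'DEVICE="p2p1"\nHWADDR="A0:36:9F:0A:06:F0"\nONBOOT="yes"\n',
    "ifcfg-p2p2": 'DEVICE="p2p2"\nHWADDR="A0:36:9F:0A:06:F2"\nONBOOT="yes"\n',
}
print(name_for_mac(files, "a0:36:9f:0a:06:f0"))
print(name_for_mac(files, "84:8f:69:fe:26:36"))
```

In this model an interface with a matching HWADDR keeps its configured name, and an interface without a matching file gets no name from this step, so it would be left for later rules to name.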
One possible reason why '/lib/udev/rules.d/60-net.rules' is not actually renaming the interface could be that it sets the new name as ENV{INTERFACE_NAME}="$result" instead of NAME="$result". As rules that run after this do not see NAME= set, biosdevname might actually be renaming it.

ACTION=="add", SUBSYSTEM=="net", PROGRAM="/lib/udev/rename_device", RESULT=="?*", ENV{INTERFACE_NAME}="$result"

Could you please modify the above line in '/lib/udev/rules.d/60-net.rules' to

ACTION=="add", SUBSYSTEM=="net", PROGRAM="/lib/udev/rename_device", RESULT=="?*", NAME="$result"

and check if it helps to work around this issue (names the same as in RHEL 6.4)?
Exploring this further: on an R610 with RHEL 6.5, the system had biosdevname enabled. The interfaces were em1, em2, em3, em4, p2p1 and p2p2. I passed 'biosdevname=0', rebooted the system and observed the following -

1. '/lib/udev/rules.d/60-net.rules' sets ENV{INTERFACE_NAME} to "$result"

2. '/lib/udev/rules.d/71-biosdevname.rules' does not run as 'biosdevname=0' is passed

3. '/lib/udev/rules.d/75-persistent-net-generator.rules' generates '/etc/udev/rules.d/70-persistent-net.rules'. This ensures that names remain the same across future reboots.

# PCI device 0x14e4:0x1639 (bnx2) (custom name provided by external tool)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:22:19:57:5a:0d", ATTR{type}=="1", KERNEL=="eth*", NAME="em1"

# PCI device 0x8086:0x10fb (ixgbe) (custom name provided by external tool)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:1b:21:8a:15:c5", ATTR{type}=="1", KERNEL=="eth*", NAME="p2p2"

4. '/lib/udev/rules.d/75-persistent-net-generator.rules' also renames the network interface to whatever INTERFACE_NAME is set to

# rename interface if needed
ENV{INTERFACE_NEW}=="?*", NAME="$env{INTERFACE_NEW}"

It seems like the same behavior needs to exist in 'RHEV Hypervisor - 6.5 - 20131115.0.3.2.el6_5'. Could you please check if this is working in 'RHEV Hypervisor - 6.5 - 20131115.0.3.2.el6_5'?
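To make the shape of the generated rules concrete, the sketch below builds one rule line in the same format as the entries shown above. The helper name `persistent_net_rule` is hypothetical, and whether RHEV-H keeps a hand-written '/etc/udev/rules.d/70-persistent-net.rules' across upgrades is not established in this thread, so treat it as an illustration of the format only.

```python
def persistent_net_rule(mac: str, name: str) -> str:
    """Build a udev rule that assigns `name` to the interface with MAC `mac`."""
    return (
        'SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", '
        f'ATTR{{address}}=="{mac.lower()}", ATTR{{type}}=="1", '
        f'KERNEL=="eth*", NAME="{name}"'
    )


# Reproduces the em1 entry from the R610 example.
rule = persistent_net_rule("00:22:19:57:5a:0d", "em1")
print(rule)
```

The MAC address is lower-cased because the generated entries above use lower-case hex in ATTR{address}.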
Ouyang, could you please provide the following:

(In reply to Narendra K from comment #51)
> Could you please check the contents of ifcfg-* files before and after the
> upgrade ? It seems like this is not working and needs to be addressed if
> there is an issue. (Please let me know if i am missing something.)
Created attachment 868333 [details]
different mac for eth0

(In reply to Fabian Deutsch from comment #54)
> Ouyang,
>
> could you please provide the following:
>
> (In reply to Narendra K from comment #51)
> > Could you please check the contents of ifcfg-* files before and after the
> > upgrade ? It seems like this is not working and needs to be addressed if
> > there is an issue. (Please let me know if i am missing something.)

1. Before the upgrade, eth0 is set up and its configuration file is:
# cat ifcfg-eth0
DEVICE="eth0"
HWADDR="00:10:18:81:a4:a0"
ONBOOT="yes"

2. After upgrading with BOOTIF=eth0, it is:
# cat ifcfg-eth0
DEVICE="eth0"
HWADDR="00:10:18:81:a4:a0"
ONBOOT="yes"

But in the TUI, eth0's MAC is "00:23:7d:53:ab:75", which is different from the one in ifcfg-eth0. I have attached a screenshot showing the different MAC address.
(In reply to Ouyang guohua from comment #55)
> Created attachment 868333 [details]
> different mac for eth0
[...]
>
> 1. before upgrade, eth0 is set up and it's configure file is
> # cat ifcfg-eth0
> DEVICE="eth0"
> HWADDR="00:10:18:81:a4:a0"
> ONBOOT="yes"
>
> 2. upgrade with BOOTIF=eth0, it's
> # cat ifcfg-eth0
> DEVICE="eth0"
> HWADDR="00:10:18:81:a4:a0"
> ONBOOT="yes"
>

These files do not suggest that biosdevname names are present on the system. I expected the config files to be ifcfg-em1, ifcfg-em2, ifcfg-p2p1 and ifcfg-p2p2 (based on the details provided in the issue description). Is the system a C6220?
(In reply to Narendra K from comment #56)
> (In reply to Ouyang guohua from comment #55)
> > Created attachment 868333 [details]
> > different mac for eth0
> [...]
> >
> > 1. before upgrade, eth0 is set up and it's configure file is
> > # cat ifcfg-eth0
> > DEVICE="eth0"
> > HWADDR="00:10:18:81:a4:a0"
> > ONBOOT="yes"
> >
> > 2. upgrade with BOOTIF=eth0, it's
> > # cat ifcfg-eth0
> > DEVICE="eth0"
> > HWADDR="00:10:18:81:a4:a0"
> > ONBOOT="yes"
> >
>
> These files do not seem to have biosdevname names are present in the system.
> I expected config files to be ifcfg-em1, ifcfg-em2, ifcfg-p2p1 and
> ifcfg-p2p2 (based on the details provided in issue description). Is the
> system C6220 ?

No, my system is not a C6220.

ederevea, could you please provide the requested information?
Hi, unfortunately the exact host is not available for tests. I can try to find another one. Apart from the model and NICs, does it have to have the same CPU/RAM/anything else?
(In reply to Dotan Paz from comment #62)
> Hi ,
> Unfortunately the exact host is not available for tests .
> I can try to find another one , except for model and nics , does it have to
> be same cpu/ram/anything else ?

I think the CPU/RAM is not important; it does not have to be the same.
(In reply to Dotan Paz from comment #62)
> Hi ,
> Unfortunately the exact host is not available for tests .
> I can try to find another one , except for model and nics , does it have to
> be same cpu/ram/anything else ?

Dotan, the most important part here is the NIC and the position of the NIC on the PCI bus, IIUIC.
OK, I'll talk to some dep managers and see if they can spare one. How long do you need it for?
Until we resolve the bug. A week or two ...
This was probably fixed as a part of bug 1065256. Please test and re-open if needed.
(In reply to Fabian Deutsch from comment #43)
> (In reply to Narendra K from comment #42)
> > (In reply to Fabian Deutsch from comment #40)
> > > Narendra,
> > >
> > > thanks for this informations.
> > >
> > > But did this behavior change between 0.4.1 an 0.5?
> >
> > Hi, sorry for the delay. Yes, it changed from 0.4.1 to 0.5. The change is
> > that 0.5.0 introduced a new function 'smbios_setslot' in file
> > 'src/dmidecode/dmidecode.c'. The system has a SMBIOS type 9 record for the
> > bridge above the NIC device. In 0.5.0 biosdevname scans the secondary bus of
> > this bridge device and sets the slot number, in this scenario, it is zero.
> > The slot number set to zero is later interpreted by biosdevname as embedded
> > device(em3 and em4). In 0.4.1, if biosdevname does not find slot number from
> > SMBIOS type 9 and falls back on slot capability. From lspci -vvv output it
> > seems to be set to 2 and hence it becomes p2p1.
> >
> > As mentioned a fix is planned in a future BIOS release to correctly set the
> > type 9 record for the add-in NIC.
>
> Hi Narendra,
>
> Thanks for this explanation.
> Do you know when a future release can be expected?

Hi, please find the BIOS update here (please refer to the BIOS section - it has version 2.2.3).

http://www.dell.com/support/drivers/us/en/19/Product/poweredge-c6220

> Or is there a workaround / hotfix, to bring back the old 0.4.1 behavior?
(In reply to Dotan Paz from comment #65)
> OK ,I'll talk to some dep managers , see if they can spare one .

Hi Dotan, is this test hardware available now? If so, can you reproduce this bug and then help us verify it? Waiting for your input. Thanks, Ying
Narendra, thanks for the update!
Hi Dotan, Any response for comment 70? Thanks Ying
Hi, sorry for not replying. Unfortunately we don't have spare HW atm. Could you obtain one from another place?
There seems to be an update on bug 1058170 comment 5 - Do we have a customer who could test this fix with a draft RHEV-H build we would provide, or maybe Narendra?
Hi Fabian, does upgrading to the BIOS version mentioned in comment #69 not fix the issue?
biosdevname-0.5.1-1.el6.x86_64 (mentioned in comment 42 and comment 78, with the 6.6 tag) is included in rhev-hypervisor6-6.5-20140527.biosdevname.0.iso. However, QE tried many times and still cannot reliably reproduce this bug, so QE can only do sanity testing to verify it. Dotan also cannot test this bug on their hardware (see comment 75). I therefore suggest letting the customer test this bug, while QE does sanity testing only. Thanks, Ying
According to comment 80, we need to determine which team can help verify this bug. To David: as customer case 01053347 is attached to this bug, when the RHEV-H 7.0 build for RHEV 3.5 is ready, could you help test whether the customer issue is fixed? Thanks, Ying
I've asked the customer if they'd be happy to test this, but I'm not sure they'll be able to reproduce it now. I'm waiting to hear back from them.
(In reply to David Swegen from comment #83)
> I've asked the customer if they'd be happy to test this, but I'm not sure
> they'll be able to reproduce it now. I'm waiting to hear back from them.

Hey David, any update here? Thanks
Since this is a Test Only bug, only a sanity test was done.

Test steps:
1) Installed fresh RHEV-H RHEVH-6.5-1017.0.el6ev from PXE
2) Added it to RHEV-M 3.5.0-0.25.el6ev
3) Installed RHEVH 6.6-20141212.0.el6ev on RHEV-M
4) Upgraded RHEV-H via RHEV-M

Actual results:
Upgrade is successful
Network works fine
NIC name is consistent
*** Bug 1058170 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHEA-2015-0160.html