Bug 487718

Summary: oVirt node reassigns network devices despite persistence
Product: Red Hat Enterprise Linux 5 Reporter: Bill Nottingham <notting>
Component: initscripts Assignee: initscripts Maintenance Team <initscripts-maint-list>
Status: CLOSED ERRATA QA Contact: BaseOS QE <qe-baseos-auto>
Severity: high Docs Contact:
Priority: high    
Version: 5.3 CC: harald, jonathan.beckman, notting, ovirt-maint, rvokal, yeylon
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 487090 Environment:
Last Closed: 2009-09-02 11:12:47 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 487090    

Description Bill Nottingham 2009-02-27 16:28:23 UTC
+++ This bug was initially created as a clone of Bug #487090 +++

Description of problem:
I have an oVirt node install (hard drive) with three network interfaces: eth0, eth1, and eth2. These are defined in the /config tree, and the ifcfg-eth files contain the MAC address of each NIC. The bridging is also based on the NIC definitions.

If I remove eth0 from the system, replace it with a card from another vendor, and then boot the node, the order of the NICs is incorrect: eth1 is now eth0. This completely breaks the bridging that was previously defined and persisted.

Version-Release number of selected component (if applicable):


How reproducible:

Haven't tried to reproduce this one.

Steps to Reproduce:
1. Have a system with multiple NICs, one of them being the Intel 10G card.
2. Define fixed IPs and bridging in the initial installation.
3. Shut down and remove eth0; put a Chelsio card in the PCIe slot.
4. Reboot the system and examine the network interfaces.
  
Actual results:
The onboard NIC, which was installed as eth1, is now eth0. It is bridged into existing guests and effectively puts them on the wrong network.

Expected results:
The discovery on boot should follow the RHEL behavior and not change the NIC designation if the MAC address is present.

Additional info:

--- Additional comment from apevec on 2009-02-24 06:12:52 EDT ---

Please attach ifcfg-*
Could we get console access to the machine?
In my testing, I tried swapping NICs and they got renamed on boot.

Did you try exactly the same use case on RHEL?
It's the same initscripts; the only difference could be that the Node doesn't have kudzu.

--- Additional comment from mwagner on 2009-02-24 07:02:33 EDT ---

[root@perf23 ~]# cd /config/etc/sysconfig/network-scripts/

[root@perf23 network-scripts]# cat ifcfg-eth0
DEVICE=eth0
HWADDR=00:1b:21:10:04:bf
BRIDGE=breth0
ONBOOT=yes

[root@perf23 network-scripts]# cat ifcfg-breth0
DEVICE=breth0
TYPE=Bridge
PEERNTP=yes
DELAY=0
BOOTPROTO=none
IPADDR=172.17.10.23
NETMASK=255.255.255.0
ONBOOT=yes

[root@perf23 network-scripts]# cat ifcfg-eth1
DEVICE=eth1
HWADDR=00:30:48:5f:65:4a
BRIDGE=breth1
ONBOOT=yes

[root@perf23 network-scripts]# cat ifcfg-breth1
DEVICE=breth1
TYPE=Bridge
PEERNTP=yes
DELAY=0
BOOTPROTO=dhcp
#IPADDR=192.168.1.23
#NETMASK=255.255.255.0
ONBOOT=yes

[root@perf23 network-scripts]# cat ifcfg-eth2
DEVICE=eth2
HWADDR=00:30:48:5f:65:4b
BRIDGE=breth2
ONBOOT=yes

[root@perf23 network-scripts]# cat ifcfg-breth2
DEVICE=breth2
TYPE=Bridge
PEERNTP=yes
DELAY=0
BOOTPROTO=none
IPADDR=192.168.1.23
NETMASK=255.255.255.0
ONBOOT=yes

--- Additional comment from mwagner on 2009-02-24 07:04:54 EDT ---

I did not try the exact same config on RHEL in this specific instance. However, I have done this multiple times on RHEL and the renaming of existing NICs does not happen. I also talked with the network driver maintainer when I noticed this, and he said an interface should not get renamed if it has the HWADDR specified in its ifcfg-eth file.

--- Additional comment from mwagner on 2009-02-24 07:08:51 EDT ---

The other interesting aspect of this is that the ifcfg-eth and ifcfg-breth files do not get modified on reboot.  

I was able to work around this entire issue by going into the ifcfg-eth0 file and giving it the HWADDR of the new card. Things now work as expected. 

Perhaps the above fix would indicate that the DEVICE line in the files is not being honored if the HWADDR is not found.
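
One way to spot this situation is to compare the HWADDR values in the ifcfg files against the MAC addresses the kernel currently reports. The following is a minimal sketch, not part of initscripts: the function name `find_stale_hwaddr` and its two directory arguments are hypothetical, and it assumes each interface exposes its MAC in an `address` file, as under /sys/class/net.

```shell
#!/bin/bash
# Hypothetical helper (not part of initscripts): list ifcfg files whose
# HWADDR does not match any MAC address currently present in the system.
#   $1 - directory holding ifcfg-* files
#   $2 - directory laid out like /sys/class/net (one subdirectory per
#        interface, each containing an "address" file)
find_stale_hwaddr() {
    local cfgdir=$1 sysdir=$2 cfg hwaddr addrfile found
    for cfg in "$cfgdir"/ifcfg-*; do
        [ -f "$cfg" ] || continue
        # Take the last HWADDR= line, strip quotes, fold to lower case.
        hwaddr=$(sed -n 's/^HWADDR=//p' "$cfg" | tail -n 1 | tr -d "\"'" | tr 'A-F' 'a-f')
        # Bridge configs such as ifcfg-breth0 carry no HWADDR; skip them.
        [ -n "$hwaddr" ] || continue
        found=no
        for addrfile in "$sysdir"/*/address; do
            [ -f "$addrfile" ] || continue
            if [ "$(tr 'A-F' 'a-f' < "$addrfile")" = "$hwaddr" ]; then
                found=yes
                break
            fi
        done
        [ "$found" = yes ] || echo "${cfg##*/}: HWADDR $hwaddr not present"
    done
}
```

On a node this might be invoked as `find_stale_hwaddr /etc/sysconfig/network-scripts /sys/class/net`; any file it reports is a candidate for the manual HWADDR update described above.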

--- Additional comment from apevec on 2009-02-24 07:44:30 EDT ---

It's initscripts that tries to rename the interface to match ifcfg-*,
so the same issue must be present in RHEL. We'll investigate.

--- Additional comment from apevec on 2009-02-24 10:04:51 EDT ---

RHEL ifup-eth does this:

# remap, if the device is bound with a MAC address and not the right device num
# bail out, if the MAC does not fit
if [ -n "${HWADDR}" ]; then
    FOUNDMACADDR=`get_hwaddr ${REALDEVICE}`
    if [ "${FOUNDMACADDR}" != "${HWADDR}" ]; then
        curdev=`get_device_by_hwaddr ${HWADDR}`
        if [ -n "$curdev" ]; then
          rename_device "${REALDEVICE}" "${HWADDR}" "${curdev}" || {

It will only rename the eth device to match ifcfg if the specified HWADDR exists in the system.
If a NIC is removed and another one gets enumerated as the old eth0, it will happily continue, leaving it as eth0 and failing later when processing ifcfg-eth2, where it was previously enumerated:

Bringing up interface eth0:  [  OK  ]
Bringing up interface eth2:  Device eth2 does not seem to be present, delaying initialization.
[FAILED]

It should instead fail on ifup eth0 and succeed on ifup eth2!
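
The control flow described above can be modelled without touching real interfaces. This is a simplified sketch, not the actual ifup-eth code: `ifup_decision` and its arguments are hypothetical stand-ins for the results of `get_hwaddr` and `get_device_by_hwaddr`, and the `strict` switch illustrates the behaviour argued for here (fail when the configured HWADDR is absent from the system).

```shell
#!/bin/bash
# Simplified model of the HWADDR check quoted above. It only prints the
# decision and never touches real interfaces.
#   $1 - HWADDR from the ifcfg file (may be empty)
#   $2 - MAC currently found on the device with the configured name
#   $3 - name of the device that currently owns HWADDR, or empty if no
#        device in the system has that MAC
#   $4 - "strict" to fail when HWADDR is absent (hypothetical switch)
ifup_decision() {
    local hwaddr=$1 foundmac=$2 curdev=$3 mode=$4
    if [ -z "$hwaddr" ] || [ "$foundmac" = "$hwaddr" ]; then
        echo "proceed"            # no binding, or the MAC already matches
    elif [ -n "$curdev" ]; then
        echo "rename $curdev"     # HWADDR lives on another device name
    elif [ "$mode" = "strict" ]; then
        echo "fail"               # HWADDR is gone: refuse to bring up
    else
        echo "proceed"            # behaviour described above: carry on
    fi
}

# Configured NIC removed, a different NIC now enumerated under its name:
ifup_decision 00:1b:21:10:04:bf 00:30:48:5f:65:4a ""
ifup_decision 00:1b:21:10:04:bf 00:30:48:5f:65:4a "" strict
```

The first call models the behaviour described in this comment and the second the stricter alternative; the MAC addresses are taken from the ifcfg files attached earlier purely for illustration.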

Changing component to initscripts

--- Additional comment from notting on 2009-02-25 18:44:49 EDT ---

You did a hardware reconfiguration on the node, yet have removed the piece of software from the node that is used to account for this. I'm not seeing where this is a bug.

--- Additional comment from apevec on 2009-02-25 19:23:11 EDT ---

(In reply to comment #7)
> removed the piece of software from the node that is used to account for this

You mean kudzu? What will kudzu do in this case? Would it automatically adjust ifcfg-* ?

--- Additional comment from notting on 2009-02-25 19:28:09 EDT ---

It will (in RHEL 5) remove the config for the old device, and write a basic one for the new device.

--- Additional comment from mwagner on 2009-02-25 22:20:14 EDT ---

re Comment #7

Bill, while it's a matter of semantics, from the end user's point of view this is a bug.
I appreciate that you were able to determine that the initscripts are functioning correctly given the components installed. It would appear that the bug exists because kudzu is not installed.

Moving this to ovirt-node-image as it looks like kudzu needs to be there.

--- Additional comment from apevec on 2009-02-26 05:08:08 EDT ---

Don't forget that node-image is stateless. I don't think kudzu was designed for that; e.g. removing bind-mounted ifcfg-* files would fail.

What about this patch to initscripts to ensure correctness even without relying on kudzu? With it, ifup fails if HWADDR is specified and not present in the system:

--- ifup-eth.orig	2009-02-26 04:37:17.000000000 -0500
+++ ifup-eth	2009-02-26 04:44:28.000000000 -0500
@@ -44,10 +44,13 @@
         if [ -n "$curdev" ]; then
 	  rename_device "${REALDEVICE}" "${HWADDR}" "${curdev}" || {
 	    echo $"Device ${DEVICE} has different MAC address than expected, ignoring."
 	    exit 1
 	  }
+	else
+	    echo $"Device ${DEVICE} has different MAC address than expected, ignoring."
+	    exit 1
 	fi
     fi
 fi
 
 if [ "${TYPE}" = "Bridge" ]; then



In any case, we'll document that after a hardware change, local configuration should be redone manually.

--- Additional comment from apevec on 2009-02-26 18:10:35 EDT ---

Tested the above patch; it seems to be doing the right thing IMO.
Pushing to RHEL-5-OVIRT in initscripts-8.45.25-7.

--- Additional comment from notting on 2009-02-27 11:21:22 EDT ---

Given that the result of the patch is (essentially) just changing the failure mode in the face of invalid configuration, I'm not in a huge hurry to push it into base RHEL.

Comment 2 Harald Hoyer 2009-05-05 12:52:41 UTC
Please test the erratum candidate:
http://people.redhat.com/harald/downloads/initscripts/initscripts-8.45.26.1.el5/

Comment 6 Bill Nottingham 2009-08-21 14:45:33 UTC
*** Bug 518600 has been marked as a duplicate of this bug. ***

Comment 7 errata-xmlrpc 2009-09-02 11:12:47 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-1344.html