Bug 749479

Summary: adding bond to management network may result in ip address change and disconnection
Product: Red Hat Enterprise Linux 6
Reporter: Guohua Ouyang <gouyang>
Component: vdsm
Assignee: Dan Kenigsberg <danken>
Status: CLOSED NOTABUG
QA Contact: yeylon <yeylon>
Severity: medium
Docs Contact:
Priority: medium
Version: 6.2
CC: abaron, bazulay, cshao, gouyang, iheim, leiwang, mburns, moli, srevivo, ycui, ykaul
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: network
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
When adding a bond to an existing network, its world-visible MAC address may change. If the DHCP server is not aware that the new MAC address belongs to the same host as the old one, it may assign the host a different IP address that is unknown to both the DNS server and Red Hat Enterprise Virtualization Manager. As a result, connectivity between Red Hat Enterprise Virtualization Manager and VDSM is broken. To work around this issue, configure your DHCP server to assign the same IP address to all the MAC addresses of the slave NICs. Alternatively, when editing a management network, do not check connectivity, and make sure that Red Hat Enterprise Virtualization Manager and DNS use the newly assigned IP address for the node.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-05-30 14:12:54 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 812149    
Bug Blocks:    
Attachments:
Description                          Flags
failed to make bond network          none
succeed to make bond network         none
vdsm-4.9.6-7.log                     none
successful with rhevm(eth0) + eth1   none
failed with rhevm(eth1) + eth0       none
vdsm.log                             none
message vdsm.log                     none
vdsm.log                             none

Description Guohua Ouyang 2011-10-27 08:00:39 UTC
Created attachment 530439 [details]
failed to make bond network

Description of problem:
The machine has two NICs. After RHEV-H is installed, configure eth1, register to RHEV-M, and approve the host. Once it shows up, put it into maintenance and create a bond from eth0 and eth1. The bond creation fails; I tried 3-4 times and every attempt failed. (Bond mode is 4.)
The strange thing is that if eth0 is configured after RHEV-H is installed and the same steps are repeated, creating the mode 4 bond succeeds.
(Logs from both the failed and the successful attempts are attached.)

Version-Release number of selected component (if applicable):
6.2-20111019.2
vdsm-4.9-108.el6

How reproducible:
100%.

Steps to Reproduce:
1. Install RHEV-H on a machine that has at least two NICs.
2. Configure eth1 and register to RHEV-M.
3. Approve the host in RHEV-M.
4. After it shows up, put it into maintenance.
5. Select the two NICs to create a bond network.


Actual results:
Failed to make the bond network.

Expected results:
Make the bond network successfully.

Comment 1 Guohua Ouyang 2011-10-27 08:02:33 UTC
Created attachment 530440 [details]
succeed to make bond network

This time I configured eth0 and registered to rhevm. It actually failed the first time, but succeeded the second time.

Comment 4 Dan Kenigsberg 2012-04-10 12:45:14 UTC
gouyang, a long time has passed, but now I do not see a failure in vdsm.log. Would you agree to reproduce the issue once you have a RHEV-6.3 with an operational vdsm-4.9.6?

Comment 5 Guohua Ouyang 2012-04-10 14:21:11 UTC
(In reply to comment #4)
> gouyang, long time has passed, but now I do not see a failure in vdsm.log.
> Would you agree to reproduce the issue once you have a RHEV-6.3 with an
> operational vdsm-4.9.6?

OK, I will try to reproduce it on RHEL 6.3 next week, and try RHEV-H 6.3 once bz808626 is fixed.

Comment 6 Guohua Ouyang 2012-04-18 07:35:37 UTC
Created attachment 578262 [details]
vdsm-4.9.6-7.log

Tested this on RHEL 6.3 with vdsm-4.9.6-7; it also failed to create the bonding network, with error 503.

Comment 7 Guohua Ouyang 2012-04-18 10:00:34 UTC
(In reply to comment #6)
> Created attachment 578262 [details]
> vdsm-4.9.6-7.log
> 
> Tested this on RHEL63 with vdsm-4.9.6-7, it also failed to create the bonding
> network with an error 503.

The environment is RHEL 6.3 + RHEVM SI2.1 + vdsm-4.9.6-7. The error shown in the RHEVM UI is "Error: A Request to the Server failed with the following Status Code: 503".

Comment 8 Dan Kenigsberg 2012-04-18 18:26:30 UTC
I believe that the log of attachment 578262 [details] is not relevant to this bug, but to another one: bug 812149. Please try to reproduce again once we have basic rhev-h functionality fixed.

Comment 9 RHEL Program Management 2012-05-04 04:04:58 UTC
Since RHEL 6.3 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.

Comment 10 Guohua Ouyang 2012-05-15 10:08:28 UTC
Created attachment 584628 [details]
successful with rhevm(eth0) + eth1

Re-tested on 6.3-20120509.1, whose vdsm is 4.9.6-11. If the rhevm network is bridged over eth0, creating a bond from the rhevm network and eth1 succeeds, same as before.

Comment 11 Guohua Ouyang 2012-05-15 10:10:56 UTC
Created attachment 584629 [details]
failed with rhevm(eth1) + eth0

If the rhevm network is bridged over eth1, creating a bond from the rhevm network and eth0 fails. Please refer to vdsm.log for details.

Comment 12 Dan Kenigsberg 2012-05-24 08:13:35 UTC
the error

ifup-waiting-on-dhcp::WARNING::2012-05-15 09:56:22,663::configNetwork::83::root::(ifup) dhclient(14586) is already running - exiting. 

leads me to think that what's left of this bug is a dup of bug 755937. Don't you think? Please verify with http://gerrit.usersys.redhat.com/1289 applied.

Comment 13 Guohua Ouyang 2012-05-24 10:01:37 UTC
Created attachment 586598 [details]
vdsm.log

(In reply to comment #12)
> the error
> 
> ifup-waiting-on-dhcp::WARNING::2012-05-15
> 09:56:22,663::configNetwork::83::root::(ifup) dhclient(14586) is already
> running - exiting. 
> 
> leads me to think that what's left of this bug is a dup of bug 755937. Don't
> you think? Please verify with http://gerrit.usersys.redhat.com/1289 applied.

After applying this patch, the above error no longer appears, but creating the rhevm network over the bond still fails. The network seems to be deleted after the device is created:

MainProcess|Thread-27::DEBUG::2012-05-24 09:44:39,528::configNetwork::219::root::(_persistentBackup) Persistently backed up /etc/sysconfig/network-scripts/ifcfg-eth0 (until next 'set safe config')
ifup-waiting-on-dhcp::INFO::2012-05-24 09:44:47,111::configNetwork::81::root::(ifup) 
Determining IP information for rhevm... done.

MainProcess|Thread-27::INFO::2012-05-24 09:46:41,203::configNetwork::726::root::(delNetwork) Removing network rhevm with vlan=None, bonding=bond4, nics=['eth0', 'eth1']. options={}

Comment 14 Dan Kenigsberg 2012-05-24 11:13:35 UTC
Thanks for researching into this.

The 120-second delay before delNetwork suggests that connectivity to the node was lost, so vdsm decided to revert the changes.

Could you tell (/var/log/messages should show this) whether the IP address changed at 09:44:47? Does the issue reproduce if you configure a static IP for the rhevm network?
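
The delay can be measured directly from the log timestamps. A minimal sketch, assuming the timestamp layout seen in the vdsm.log excerpts quoted in this bug; the helper names `log_time` and `seconds_between` are made up for illustration, and the two sample lines are abbreviated copies of the lines from comment 13:

```python
import re
from datetime import datetime

# Timestamp layout as it appears in the vdsm.log excerpts in this bug,
# e.g. "2012-05-24 09:44:39,528".
_TS = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})")


def log_time(line):
    """Return the first vdsm-style timestamp found in a log line, or None."""
    match = _TS.search(line)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")


def seconds_between(first_line, second_line):
    """Elapsed seconds between the timestamps of two log lines."""
    return (log_time(second_line) - log_time(first_line)).total_seconds()


# Abbreviated copies of two lines from comment 13 (message text shortened).
backup = ("MainProcess|Thread-27::DEBUG::2012-05-24 09:44:39,528::"
          "configNetwork::219::root::(_persistentBackup)")
removal = ("MainProcess|Thread-27::INFO::2012-05-24 09:46:41,203::"
           "configNetwork::726::root::(delNetwork)")

print(seconds_between(backup, removal))
```

For these two lines the gap is roughly two minutes, which is consistent with a timeout-driven revert rather than an immediate configuration failure.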

Comment 15 Guohua Ouyang 2012-05-29 08:04:58 UTC
Created attachment 587337 [details]
message vdsm.log

(In reply to comment #14)
> Thanks for researching into this.
> 
> The 120 seconds delay before delNetwork suggest that connectivity to the
> node was lost, so it decided to revert the changes.
> 
> Could you tell (/var/log/messages should show this) if the IP address has
> changed at 09:44:47 ? Does the issue reproduce if you configure static IP
> for rhevm network?

Re-tested this. /var/log/messages shows that the IP address changed at the time rhevm was brought up with ifup.

/var/log/messages:
May 29 06:39:37 localhost ntpd[9782]: Listening on interface #5 rhevm, 10.66.8.209#123 Enabled

vdsm.log:
MainProcess|Thread-19::WARNING::2012-05-29 06:39:27,923::configNetwork::72::root::(ifup) /etc/sysconfig/network-scripts/ifup-eth: line 136: echo: write error: Operation not permitted

ifup-waiting-on-dhcp::INFO::2012-05-29 06:39:37,919::configNetwork::70::root::(ifup)
Determining IP information for rhevm... done.

MainProcess|Thread-19::INFO::2012-05-29 06:41:29,067::configNetwork::551::root::(delNetwork) Removing bridge rhevm with vlan=None, bonding=bond4, nics=['eth0', 'eth1']. options={}
MainProcess|Thread-19::WARNING::2012-05-29 06:41:30,188::configNetwork::61::root::(ifdown)
MainProcess|Thread-19::DEBUG::2012-05-29 06:41:30,412::configNetwork::162::root::(_atomicBackup) Backed up /etc/sysconfig/network-scripts/ifcfg-eth0

Configuring a static IP ran into a network error.

Comment 16 Dan Kenigsberg 2012-05-29 09:32:01 UTC
I'm not sure I follow. If dhcp gives the host a new IP address (different from the one it used before), then 'connectivityCheck' is very likely to fail. That's not surprising. Is this the case?

Which "network error issue" did you meet when you set a static IP? Could you state all the addresses that your host has?
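
To illustrate why a changed IP address makes the check fail, here is a rough sketch of the general "apply, wait for confirmation, otherwise roll back" pattern. It is not vdsm's actual code: `apply_with_rollback`, its parameters, and the callbacks are hypothetical names chosen for this example.

```python
import threading


def apply_with_rollback(apply, revert, confirmed, timeout):
    """Apply a change, then revert it unless `confirmed` is set in time.

    apply, revert: hypothetical callables that change / restore the config.
    confirmed: threading.Event set by whoever proves the host is reachable.
    timeout: seconds to wait for that confirmation.
    Returns True if the change was kept, False if it was rolled back.
    """
    apply()
    if confirmed.wait(timeout):
        return True
    revert()
    return False


# Nobody ever confirms reachability here, so the change is rolled back.
never_confirmed = threading.Event()
kept = apply_with_rollback(
    lambda: print("bond created"),
    lambda: print("bond removed"),
    never_confirmed,
    timeout=0.1,
)
print(kept)
```

If the manager keeps contacting the old IP address while the host now answers on a new one, the confirmation never arrives and the rollback branch runs, which matches the delNetwork call seen in the logs.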

Comment 17 Guohua Ouyang 2012-05-29 10:03:15 UTC
Created attachment 587375 [details]
vdsm.log

(In reply to comment #16)
> I'm sure I follow. If dhcp gives the host a new IP address (different than
> the one it used before), then 'connectivityCheck' has very high probability
> to fail. That's not surprising. Is this the fact?

Yes, the new IP address is different from the one it used before.

> 
> Which "network error issue" did you meet when you have set a static IP?
> Could you state all the addresses that you host has?

I did not hit the error this time. Configuring a static IP with the previously used address, 10.66.9.50, succeeded, but configuring a static IP with another address, 10.66.8.209, failed. Please check the log.

My host can obtain the IP addresses 10.66.9.50 and 10.66.8.209.

Comment 18 Dan Kenigsberg 2012-05-30 08:27:09 UTC
(In reply to comment #17)
> 
> Yes, the new IP address is different from the one it used before.

That may happen because the mac address of the bond is different from the mac address of the nic that used to hold the former IP address. Is this the case?

Comment 19 Guohua Ouyang 2012-05-30 09:05:49 UTC
(In reply to comment #18)
> (In reply to comment #17)
> > 
> > Yes, the new IP address is different from the one it used before.
> 
> That may happen because the mac address of the bond is different from the
> mac address of the nic that used to hold the former IP address. Is this the
> case?

Yes. The rhevm network is bridged over eth1; when a bond is created from eth0 and eth1, the bond uses eth0's MAC address.
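
A quick way to confirm this on a host is to compare the MAC address recorded before the change with the one the bond reports afterwards. A minimal sketch, assuming the standard Linux sysfs file /sys/class/net/<iface>/address; `read_mac` and `mac_differs` are illustrative helper names. Enslaved NICs typically report the bond's MAC while they are part of the bond, so the old MAC has to be recorded before the bond is created.

```python
import os


def read_mac(iface, sysfs_root="/sys/class/net"):
    """Return the MAC address the kernel currently reports for an interface."""
    with open(os.path.join(sysfs_root, iface, "address")) as f:
        return f.read().strip().lower()


def mac_differs(recorded_mac, iface, sysfs_root="/sys/class/net"):
    """True if `iface` now reports a MAC other than the one recorded earlier."""
    return recorded_mac.strip().lower() != read_mac(iface, sysfs_root)
```

For example, save `read_mac("eth1")` before creating the bond, then call `mac_differs(saved_mac, "bond4")` afterwards. A True result means the DHCP server sees a different client than before.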

Comment 20 Dan Kenigsberg 2012-05-30 14:12:54 UTC
We cannot guarantee that the world-visible MAC address of the management network never changes (consider replacing eth0 with eth1). The DHCP server should assign the same IP address to both MAC addresses.

Comment 21 Dan Kenigsberg 2012-05-30 14:12:54 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
When adding a bond to an existing network, its world-visible mac address may change. If dhcp server is not aware that the new mac address belong to the same host as the old one, it may assign the host a different ip address, that is unknown to the dns nor to rhevm. Hence rhevm-vdsm connectivity is broken.

To mitigate this, configure your dhcp server to assign the same ip for all the macs of slave nics. Alternatively, when editting management network, do not check connectivity, and conivnce rhevm/dns to use the newly-assigned ip address for the node.
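
As an illustration of the first mitigation, the sketch below renders one host declaration per slave NIC, all pointing at the same address. It assumes an ISC dhcpd server (this bug does not say which DHCP server is in use); the function name `dhcpd_host_entries`, the host name, and the MAC addresses are placeholders, not values from this bug.

```python
def dhcpd_host_entries(hostname, macs, ip):
    """Render one ISC dhcpd `host` declaration per MAC, all with the same IP."""
    entries = []
    for index, mac in enumerate(macs):
        entries.append(
            "host %s-nic%d {\n"
            "    hardware ethernet %s;\n"
            "    fixed-address %s;\n"
            "}" % (hostname, index, mac.lower(), ip)
        )
    return "\n\n".join(entries)


# Placeholder host name and MAC addresses; only the IP appears in this bug.
print(dhcpd_host_entries(
    "rhevh-node",
    ["00:11:22:33:44:55", "00:11:22:33:44:66"],
    "10.66.9.50",
))
```

With entries like these in dhcpd.conf, the host should receive the same address whichever slave's MAC the bond ends up presenting.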

Comment 22 Martin Prpič 2012-05-31 08:47:02 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1,3 +1,2 @@
-When adding a bond to an existing network, its world-visible mac address may change. If dhcp server is not aware that the new mac address belong to the same host as the old one, it may assign the host a different ip address, that is unknown to the dns nor to rhevm. Hence rhevm-vdsm connectivity is broken.
+When adding a bond to an existing network, its world-visible MAC address may change. If the DHCP server is not aware that the new MAC address belongs to the same host as the old one, it may assign the host a different IP address, that is unknown to the DHS server nor to Red Hat Enterprise Virtualization Manager. As a result, Red Hat Enterprise Virtualization Manager VDSM connectivity is broken.
-
+    To work around this issue, configure your DHCP server to assign the same IP for all the MAC addresses of slave NICs. Alternatively, when editing a management network, do not check connectivity, and make sure that Red Hat Enterprise Virtualization Manager / DNS use the newly-assigned IP address for the node.
-To mitigate this, configure your dhcp server to assign the same ip for all the macs of slave nics. Alternatively, when editting management network, do not check connectivity, and conivnce rhevm/dns to use the newly-assigned ip address for the node.

Comment 23 Martin Prpič 2012-05-31 09:08:25 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1,2 +1,2 @@
-When adding a bond to an existing network, its world-visible MAC address may change. If the DHCP server is not aware that the new MAC address belongs to the same host as the old one, it may assign the host a different IP address, that is unknown to the DHS server nor to Red Hat Enterprise Virtualization Manager. As a result, Red Hat Enterprise Virtualization Manager VDSM connectivity is broken.
+When adding a bond to an existing network, its world-visible MAC address may change. If the DHCP server is not aware that the new MAC address belongs to the same host as the old one, it may assign the host a different IP address, that is unknown to the DNS server nor to Red Hat Enterprise Virtualization Manager. As a result, Red Hat Enterprise Virtualization Manager VDSM connectivity is broken.
     To work around this issue, configure your DHCP server to assign the same IP for all the MAC addresses of slave NICs. Alternatively, when editing a management network, do not check connectivity, and make sure that Red Hat Enterprise Virtualization Manager / DNS use the newly-assigned IP address for the node.