Bug 1209401
Summary: | [RHEV-H] vdsm with predefined bonds is down after upgrade from 3.5.0 to 3.5.1 | |
---|---|---|---|
Product: | Red Hat Enterprise Virtualization Manager | Reporter: | cshao <cshao> |
Component: | vdsm | Assignee: | Fabian Deutsch <fdeutsch> |
Status: | CLOSED DUPLICATE | QA Contact: | Aharon Canan <acanan> |
Severity: | urgent | Docs Contact: | |
Priority: | urgent | |
Version: | 3.5.1 | CC: | bazulay, danken, dougsland, ecohen, fdeutsch, gklein, hadong, huiwa, ibarkan, leiwang, lpeer, lsurette, lvernia, mburman, mkalinin, rbarry, sbonazzo, yaniwang, ycui, yeylon, ylavi |
Target Milestone: | --- | Keywords: | Reopened |
Target Release: | 3.5.1 | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | network | |
Fixed In Version: | | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2015-04-21 13:11:41 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | Network | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 1193058 | |
Attachments: |
|
Created attachment 1011677 [details]
7.0 upgrade to 7.1 failed
Updating the vdsm and RHEV-M version info here:
vdsm-4.16.13-1.el7ev.x86_64
RHEVM vt14.2 (3.5.1-0.3.el6ev)

Hello shaochen,

(In reply to shaochen from comment #0)
> Created attachment 1011676 [details]
> nic-info
>
> Description of problem:
> Host gone to non-operational state after upgrade from 7.0 to 7.1, and RHEV-M
> bridge disappeared.
>
> Version-Release number of selected component (if applicable):
> rhev-hypervisor7-7.0-20150127
> rhev-hypervisor7-7.1-20150402.0.el7ev
> ovirt-node-3.2.2-3.el7.noarch
>
> How reproducible:
> 100%
>
> Steps to Reproduce:
> 1. Install RHEV-H GA build(rhev-hypervisor7-7.0-20150127) with [bond+ vlan]
> network,

Could you please describe, step by step, how you configured your bond+vlan settings so I can replicate it locally? For example: was it via the TUI?

Also, I would appreciate it if you could provide the data below:

Before the upgrade (after installation/registration/approval of rhev-hypervisor7-7.0-20150127 in RHEV-M and host is UP)
===============================================================================
* The output of ls -la /config/etc/sysconfig/network-scripts/*
* The output of ls -la /etc/sysconfig/network-scripts/*
* Is ifcfg-rhevm persisted? Does the host keep ifcfg-rhevm across reboots?
* The output of ifconfig -a

Thanks!

I hit the same issue when upgrading from 6.6 for 3.5.0 (rhev-hypervisor6-6.6-20150128.0) to the latest 6.6 for 3.5.1 (rhev-hypervisor6-6.6-20150402.0).

Test steps:
1. Install 6.6 for 3.5.0 (rhev-hypervisor6-6.6-20150128.0).
2. Configure [bond+ vlan] network via TUI.
3. Add rhevh to RHEV-M via RHEV-M UI.
4. Put the host into maintenance.
5. Upgrade to the latest 6.6 for 3.5.1 (rhev-hypervisor6-6.6-20150402.0).

Test result:
1. RHEV-M side: Host went to non-operational state, ifcfg-rhevm disappeared.
2. RHEV-H side:
   1) Networking shows as "Unknown"
   2) Bond status shows as "Unconfigured"
   3) RHEV-M bridge disappeared.

I will reply to #c4 ASAP.

Thanks!

Thanks shaochen, that helps.
I could reproduce the original report using a different scenario as well:

- Install RHEV-H 20150127.0

# cat /etc/redhat-release
Red Hat Enterprise Virtualization Hypervisor 7.0 (20150127.0.el7ev)

# rpm -qa | grep -i vdsm
vdsm-xmlrpc-4.16.8.1-6.el7ev.noarch
vdsm-python-4.16.8.1-6.el7ev.noarch
vdsm-cli-4.16.8.1-6.el7ev.noarch
vdsm-yajsonrpc-4.16.8.1-6.el7ev.noarch
vdsm-4.16.8.1-6.el7ev.x86_64
vdsm-hook-ethtool-options-4.16.8.1-6.el7ev.noarch
ovirt-node-plugin-vdsm-0.2.0-18.el7ev.noarch
vdsm-python-zombiereaper-4.16.8.1-6.el7ev.noarch
vdsm-jsonrpc-4.16.8.1-6.el7ev.noarch
vdsm-reg-4.16.8.1-6.el7ev.noarch
vdsm-hook-vhostmd-4.16.8.1-6.el7ev.noarch

- Set up the NIC for DHCP via the network TUI
- Register to RHEV-M via the TUI (or from a shell by calling /usr/share/vdsm-reg/vdsm-reg-setup)
- Approve in RHEV-M Web admin

* The host will be up; the network settings will be available in /etc/sysconfig/network-scripts/ but not persisted.

Data from tests
=======================

#1 Settings when configuring network via ovirt-node TUI

# ls -la /etc/sysconfig/network-scripts/ifcfg-*
-rw-r--r--. 1 root root 137 Apr 8 17:06 /etc/sysconfig/network-scripts/ifcfg-ens3
-rw-r--r--. 1 root root 64 Apr 8 17:06 /etc/sysconfig/network-scripts/ifcfg-lo

# cat /config/etc/sysconfig/network-scripts/ifcfg-ens3
BOOTPROTO="dhcp"
DEVICE="ens3"
HWADDR="52:54:00:da:98:4e"
IPV6INIT="no"
IPV6_AUTOCONF="no"
NM_CONTROLLED="no"
ONBOOT="yes"
PEERNTP="yes"

# cat /config/etc/sysconfig/network-scripts/ifcfg-lo
DEVICE="lo"
IPADDR="127.0.0.1"
NETMASK="255.0.0.0"
ONBOOT="yes"

Both files are persisted correctly:
****************************************************
# ls /config/etc/sysconfig/network-scripts/
ifcfg-ens3 ifcfg-lo

Registered the node into rhevm
=====================================
The host will be pending approval in RHEV-M web admin; however, vdsm now owns the network settings on the node:

# cat /etc/sysconfig/network-scripts/ifcfg-ens3
# Generated by VDSM version 4.16.8.1-6.el7ev
DEVICE=ens3
HWADDR=52:54:00:da:98:4e
BRIDGE=rhevm
ONBOOT=yes
NM_CONTROLLED=no
IPV6_AUTOCONF=no
PEERNTP=yes
IPV6INIT=no

# cat /etc/sysconfig/network-scripts/ifcfg-rhevm
# Generated by VDSM version 4.16.8.1-6.el7ev
DEVICE=rhevm
TYPE=Bridge
DELAY=0
STP=off
ONBOOT=yes
BOOTPROTO=dhcp
DEFROUTE=yes
NM_CONTROLLED=no
IPV6_AUTOCONF=no
PEERNTP=yes
IPV6INIT=no
HOTPLUG=no

However, neither ifcfg-ens3 nor ifcfg-rhevm is persisted any more:

# ls /config/etc/sysconfig/network-scripts/
ifcfg-lo

The persist command in ovirt-node is also working. For example, in the same scenario I executed persist on ifcfg-rhevm:

# ls /config/etc/sysconfig/network-scripts/
ifcfg-lo
# persist /etc/sysconfig/network-scripts/ifcfg-rhevm
# ls /config/etc/sysconfig/network-scripts/ifcfg-rhevm

Additional data
=================
vdsm-reg.log

MainThread::DEBUG::2015-04-08 17:27:37,754::deployUtil::487::root::Bridge rhevm not found, need to create it.
MainThread::DEBUG::2015-04-08 17:27:37,754::vdsm-reg-setup::94::root::renameBridge begin.
MainThread::DEBUG::2015-04-08 17:27:37,754::deployUtil::1015::root::makeBridge begin.
MainThread::DEBUG::2015-04-08 17:27:37,755::deployUtil::438::root::_getMGTIface: read host name: 192.168.122.70
MainThread::DEBUG::2015-04-08 17:27:37,755::deployUtil::446::root::_getMGTIface: using host name 192.168.122.70 strIP= 192.168.122.70
MainThread::DEBUG::2015-04-08 17:27:37,755::deployUtil::453::root::_getMGTIface IP=192.168.122.70 strIface=ens3
MainThread::DEBUG::2015-04-08 17:27:37,756::deployUtil::1059::root::makeBridge found the following bridge paramaters: ['BOOTPROTO=dhcp', 'IPV6INIT=no', 'IPV6_AUTOCONF=no', 'ONBOOT=yes', 'PEERNTP=yes']
MainThread::DEBUG::2015-04-08 17:27:37,760::deployUtil::140::root::['/usr/share/vdsm/addNetwork', 'rhevm', '', '', 'ens3', 'BOOTPROTO=dhcp', 'IPV6INIT=no', 'IPV6_AUTOCONF=no', 'ONBOOT=yes', 'PEERNTP=yes', 'blockingdhcp=true']
MainThread::DEBUG::2015-04-08 17:27:42,537::deployUtil::149::root::
MainThread::DEBUG::2015-04-08 17:27:42,538::deployUtil::150::root::libvirt: Network Driver error : Network not found: no network with matching name 'vdsm-rhevm'
MainThread::DEBUG::2015-04-08 17:27:42,538::deployUtil::140::root::['/usr/share/vdsm/get-conf-item', '/etc/vdsm/vdsm.conf', 'vars', 'net_persistence', 'ifcfg']
MainThread::DEBUG::2015-04-08 17:27:42,555::deployUtil::149::root::unified
MainThread::DEBUG::2015-04-08 17:27:42,555::deployUtil::150::root::
MainThread::DEBUG::2015-04-08 17:27:42,556::deployUtil::140::root::['/usr/share/vdsm/vdsm-store-net-config', 'unified']
MainThread::DEBUG::2015-04-08 17:27:43,869::deployUtil::149::root::
MainThread::DEBUG::2015-04-08 17:27:43,869::deployUtil::150::root::
MainThread::DEBUG::2015-04-08 17:27:43,869::deployUtil::1144::root::makeBridge return.
MainThread::DEBUG::2015-04-08 17:27:43,870::deployUtil::140::root::['/usr/share/vdsm/vdsm-store-net-config']
MainThread::DEBUG::2015-04-08 17:27:43,883::deployUtil::149::root::
MainThread::DEBUG::2015-04-08 17:27:43,883::deployUtil::150::root::
MainThread::ERROR::2015-04-08 17:27:43,883::vdsm-reg-setup::124::root::renameBridge: failed to chmod bridge file
MainThread::DEBUG::2015-04-08 17:27:43,883::vdsm-reg-setup::126::root::renameBridge return.
MainThread::DEBUG::2015-04-08 17:27:43,884::vdsm-reg-setup::238::root::execute: after renameBridge: False
MainThread::DEBUG::2015-04-08 17:27:43,884::vdsm-reg-setup::316::root::Registration status: False

Please note that the first registration failed after /usr/share/vdsm/vdsm-store-net-config ran, and /var/lib/vdsm/netconfback is empty, so in the end the chmod operation could not be executed and the file was not even persisted.

More details from vdsm-reg-setup.in
==========================================
SCRIPT_NAME_SAVE = "vdsm-store-net-config"

    # Rename existing bridge
    fReturn = deployUtil.makeBridge(self.vdcName, self.vdsmDir)
    if not fReturn:
        logging.error("renameBridge Failed to rename existing bridge!")

    # Persist changes
    if fReturn:
        try:
            out, err, ret = deployUtil._logExec(
                [os.path.join(self.vdsmDir, SCRIPT_NAME_SAVE)])
            <snip>
            os.chmod(
                (
                    "/config/etc/sysconfig/network-scripts/ifcfg-" +
                    MGT_BRIDGE_NAME
                ),
                0o644
            )
        except:
            fReturn = False
            logging.error("renameBridge: failed to chmod bridge file")

From deployUtil.py
=======================
    # Add bridge
    if fReturn:
        try:
            lstBridgeOptions.append('blockingdhcp=true')
            out, err, ret = _logExec([os.path.join(vdsmDir, SCRIPT_NAME_ADD),
                                      bridgeName, vlan, bonding, nic] +
                                     lstBridgeOptions)
            if ret:
                raise Exception('Failed to add bridge')

    # Save current config by removing the undo files:
    try:
        if fReturn:
            if fIsOvirt:
                out, err, ret = _logExec(
                    [os.path.join(vdsmDir, SCRIPT_NAME_GET_CONFIG),
                     P_VDSM_CONF, 'vars', 'net_persistence', 'ifcfg'])
                if ret:
                    raise Exception('Failed to retrieve vdsm persistence '
                                    'mode. Stderr: %s' % err)
                net_persistence = out.strip()
                out, err, ret = _logExec(
                    [os.path.join(vdsmDir, SCRIPT_NAME_STORE_NET_CONFIG),
                     net_persistence])
                if ret:
                    raise Exception('Failed to persist vdsm networking '
                                    'configuration. Stderr: %s' % err)
            else:
                setSafeVdsmNetworkConfig()

From this perspective ovirt-node is working as expected. I am moving this bug to the vdsm component so that the vdsm network team can review it. Please let me know if you have any patch or test so I can quickly help.

Created attachment 1012344 [details]
logs apr8
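
For anyone going through the attached logs, the vdsm-reg.log excerpt above can be filtered mechanically. This is only a rough sketch: it assumes every record follows the `thread::level::timestamp::module::lineno::logger::message` layout seen in the excerpt, and `LogRecord`, `parse_line` and `errors` are names made up for this illustration, not part of vdsm or vdsm-reg.

```python
from collections import namedtuple

# Hypothetical record type; field names are inferred from the log excerpt above.
LogRecord = namedtuple(
    "LogRecord", "thread level timestamp module lineno logger message")


def parse_line(line):
    """Split one vdsm-reg.log line into its '::'-separated fields.

    Returns None when the line does not match the assumed layout.
    """
    parts = line.rstrip("\n").split("::", 6)
    if len(parts) != 7 or not parts[4].isdigit():
        return None
    thread, level, timestamp, module, lineno, logger, message = parts
    return LogRecord(thread, level, timestamp, module, int(lineno),
                     logger, message)


def errors(lines):
    """Return only the records logged at ERROR or CRITICAL level."""
    records = (parse_line(line) for line in lines)
    return [r for r in records
            if r is not None and r.level in ("ERROR", "CRITICAL")]


# Three lines copied from the excerpt in this bug.
SAMPLE = [
    "MainThread::DEBUG::2015-04-08 17:27:43,869::deployUtil::1144::root::"
    "makeBridge return.",
    "MainThread::ERROR::2015-04-08 17:27:43,883::vdsm-reg-setup::124::root::"
    "renameBridge: failed to chmod bridge file",
    "MainThread::DEBUG::2015-04-08 17:27:43,884::vdsm-reg-setup::316::root::"
    "Registration status: False",
]

if __name__ == "__main__":
    for rec in errors(SAMPLE):
        print("%s %s:%d %s" % (rec.timestamp, rec.module, rec.lineno,
                               rec.message))
```

Run against the full log, this should point straight at the renameBridge failure without reading through the DEBUG records around it.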
*** This bug has been marked as a duplicate of bug 1209486 ***

Re-opening this because we are not sure whether ifcfg-rhevm must be persisted, so we cannot make it a dupe of bug 1209486.

In RHEV 3.5.0, network configuration files that were created by parties other than vdsm were unpersisted once the host got registered to Engine. This led to the problem that devices like bonds and bridges did not come up, because the configuration either for the device itself or for a device it requires (i.e. a slave) got unpersisted and was no longer available.

This had several effects:
- Devices appeared as unconfigured in the TUI
- vdsmd did not come up because libvirtd was not coming up, because it found no device to bind to
- Because vdsmd did not come up, no other devices were configured

Devices which were created in Engine (and thus owned by vdsm) were not affected.

This bug cannot be fixed, because we cannot bring back the unpersisted configuration files. RHEV 3.5.1 has the logic to also persist config files which were not created by vdsm.

*** This bug has been marked as a duplicate of bug 1205711 ***
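
To check whether a host is in the state described in this bug before upgrading, the two directory listings requested earlier (/etc/sysconfig/network-scripts and /config/etc/sysconfig/network-scripts) can be compared by file name. The following is a minimal sketch, not a supported tool: it assumes that a persisted file shows up under /config with the same name, and that vdsm-owned files start with the "# Generated by VDSM" header seen in the ifcfg output above. The function names are invented for this example.

```python
import os

# Paths taken from the ls commands in this bug.
LIVE_DIR = "/etc/sysconfig/network-scripts"
PERSISTED_DIR = "/config/etc/sysconfig/network-scripts"
# Header line seen in the vdsm-generated ifcfg files quoted in this bug.
VDSM_HEADER = "# Generated by VDSM"


def ifcfg_names(directory):
    """Return the set of ifcfg-* file names in a directory (empty if missing)."""
    try:
        entries = os.listdir(directory)
    except OSError:
        return set()
    return set(e for e in entries if e.startswith("ifcfg-"))


def is_vdsm_owned(path):
    """Assumption: a file is vdsm-owned if its first line is the VDSM header."""
    with open(path) as f:
        return f.readline().startswith(VDSM_HEADER)


def unpersisted_ifcfg(live_dir=LIVE_DIR, persisted_dir=PERSISTED_DIR):
    """List (name, vdsm_owned) for ifcfg files that exist but are not persisted."""
    missing = ifcfg_names(live_dir) - ifcfg_names(persisted_dir)
    return [(name, is_vdsm_owned(os.path.join(live_dir, name)))
            for name in sorted(missing)]


if __name__ == "__main__":
    for name, owned in unpersisted_ifcfg():
        owner = "vdsm-owned" if owned else "not vdsm-owned"
        print("%s is not persisted (%s)" % (name, owner))
```

Any bond, slave or bridge file reported here is a candidate for the failure above. The script only reports; whether persisting those files by hand is the right remedy depends on the RHEV version, as discussed in the comments.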
Created attachment 1011676 [details]
nic-info

Description of problem:
Host went to non-operational state after upgrade from 7.0 to 7.1, and the RHEV-M bridge disappeared.

Version-Release number of selected component (if applicable):
rhev-hypervisor7-7.0-20150127
rhev-hypervisor7-7.1-20150402.0.el7ev
ovirt-node-3.2.2-3.el7.noarch

How reproducible:
100%

Steps to Reproduce:
1. Install RHEV-H GA build (rhev-hypervisor7-7.0-20150127) with [bond+ vlan] network.
2. Add rhevh to RHEV-M via RHEV-M UI.
3. Put the host into maintenance.
4. Upgrade to rhev-hypervisor7-7.1-20150402.0.el7ev.

Actual results:
1. RHEV-M side: Host went to non-operational state, ifcfg-rhevm disappeared.
2. RHEV-H side:
   1) Networking shows as "Unknown"
   2) Bond status shows as "Unconfigured"
   3) RHEV-M bridge disappeared.

Expected results:
The rhevh 7.1 host is UP after upgrade from rhevh 7.0 GA.

Additional info:
2015-04-07 04:25:54,631 INFO Effective changes {'nics': 'bond1'}
2015-04-07 04:25:55,718 ERROR An error appeared in the UI: UnknownNicError("Unknown network interface: 'bond1'",)
2015-04-07 04:25:55,718 INFO Exception: Traceback (most recent call last):