Bug 1185032

Summary: VDSM Service is not starting RHEV-M 3.5 Beta
Product: Red Hat Enterprise Virtualization Manager
Reporter: Robert McSwain <rmcswain>
Component: ovirt-hosted-engine-ha
Assignee: Petr Horáček <phoracek>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Meni Yakove <myakove>
Severity: unspecified
Priority: unspecified
Version: 3.5.0
CC: asegurap, bcarlson, danken, ecohen, gklein, iheim, istein, lsurette, mburman, nicolas, rmcswain, sbonazzo, stirabos
Target Release: 3.5.2
Hardware: Unspecified
OS: Unspecified
Whiteboard: network
Doc Type: Bug Fix
Last Closed: 2015-03-18 12:44:32 UTC
Type: Bug
oVirt Team: Network

Description Robert McSwain 2015-01-22 18:32:22 UTC
Description of problem:
vdsm-restore-net-config is throwing a backtrace and the vdsm service does not start after setting up Hosted Engine. This happens after a reboot, on a system that is ultimately meant to use bonded interfaces carrying tagged VLAN interfaces.

Provisioned with 2 interfaces bonded in mode 4 and four VLANs: three tagged and one native (used for the initial kickstart and SSH access). The intended VLAN/subnet for RHEVM is supposed to be NAT-access only via the host system; iptables is provisioned and working on a non-bonded installation.

Note: Installing on a single (non-bonded) interface does not present this issue.

The bonded install was attempted in two different ways, both with the same result (vdsm-restore-net-config fails on reboot):
- installing with the bonded interface and its tagged VLANs already in place;
- installing on a single interface, then converting to a bond once that setup was working.

backtrace:
:vdsm-restore-net-config:91:_filter_nets_bonds:KeyError: u'bond0'
:
:Traceback (most recent call last):
:  File "/usr/share/vdsm/vdsm-restore-net-config", line 137, in <module>
:    restore()
:  File "/usr/share/vdsm/vdsm-restore-net-config", line 123, in restore
:    unified_restoration()
:  File "/usr/share/vdsm/vdsm-restore-net-config", line 66, in unified_restoration
:    persistentConfig.bonds)
:  File "/usr/share/vdsm/vdsm-restore-net-config", line 91, in _filter_nets_bonds
:    bonds[bond]['nics'], net)
:KeyError: u'bond0'
:
:Local variables in innermost frame:
:bonds: {}
:available_bonds: {}
:available_nets: {}
:attrs: {u'bondingOptions': u'mode=4 lacp_rate=1 miimon=200', u'vlan': 3009, u'ipaddr': u'c.c.113.211', u'netmask': u'255.255.255.0', u'bonding': u'bond0', u'bootproto': u'static'}
:available_nics: ['eth0', 'eth1', 'eth2', 'eth3']
:net: 'rhevm'
:nets: {'rhevm': {u'bondingOptions': u'mode=4 lacp_rate=1 miimon=200', u'vlan': 3009, u'ipaddr': u'c.c.113.211', u'netmask': u'255.255.255.0', u'bonding': u'bond0', u'bootproto': u'static'}}
:bond: u'bond0'
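The local variables explain the failure: the network `rhevm` passed in names `bond0` in its `bonding` attribute, but the bonds dictionary (apparently taken from the persistent configuration, judging by the `persistentConfig.bonds` argument) is empty, so indexing `bonds[bond]['nics']` raises `KeyError`. The sketch below is a hypothetical, simplified model of that lookup, not the actual vdsm source; the names `naive_filter` and `missing_bonds` are illustrative. It reproduces the error with the data from the traceback and shows a check that lists networks referring to a bond with no persisted entry.

```python
# Hypothetical, simplified model of the lookup that fails in
# vdsm-restore-net-config's _filter_nets_bonds; NOT the actual vdsm source.
# The data is copied from the "Local variables in innermost frame" above.

persisted_nets = {
    'rhevm': {
        'bondingOptions': 'mode=4 lacp_rate=1 miimon=200',
        'vlan': 3009,
        'ipaddr': 'c.c.113.211',
        'netmask': '255.255.255.0',
        'bonding': 'bond0',
        'bootproto': 'static',
    },
}
# Empty, matching "bonds: {}" in the traceback: bond0 has no persisted entry.
persisted_bonds = {}


def naive_filter(nets, bonds):
    """Index bonds[bond] directly, as the failing frame does."""
    for net, attrs in nets.items():
        bond = attrs.get('bonding')
        if bond is not None:
            nics = bonds[bond]['nics']  # KeyError if the bond is not persisted
            print(net, 'uses', bond, 'on', nics)


def missing_bonds(nets, bonds):
    """Return {net: bond} for each network whose bond has no persisted entry."""
    return {
        net: attrs['bonding']
        for net, attrs in nets.items()
        if attrs.get('bonding') is not None and attrs['bonding'] not in bonds
    }


if __name__ == '__main__':
    try:
        naive_filter(persisted_nets, persisted_bonds)
    except KeyError as exc:
        print('KeyError:', exc)  # same failure mode as the traceback
    print(missing_bonds(persisted_nets, persisted_bonds))
```

With the traceback's data, `missing_bonds` reports `rhevm` as depending on the unpersisted `bond0`, which matches the manually created bond discussed in the comments below.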

Version-Release number of selected component (if applicable):
Single node, with a manual workaround applied on boot:
rhevm-3.5.0-0.20.el6ev.noarch
vdsm-4.16.7.4-1.el6ev.x86_64

Hypervisors 6.5 and 6.6 both exhibit the same behavior

How reproducible:
Unknown

Steps to Reproduce:
1. Install RHEL 6.6 and update it.
2. Provision an interface with a native (untagged) VLAN and one tagged VLAN (a non-routed VLAN for RHEVM).
3. Install and set up RHEVM.
4. Install and set up the self-hosted engine.
5. Stop the self-hosted engine, ovirt-ha-agent, and sanlock (kill -9 required), then reboot.
6. On reboot, vdsm-restore-net-config fails with the ABRT backtrace shown above.

Actual results:
vdsm-restore-net-config fails with the ABRT backtrace above.

Expected results:
vdsm-restore-net-config completes and the vdsm service starts normally after reboot.

Additional info:
Worked around the vdsm network problem by installing on a single interface instead of a bond. However, this also failed: the console address that RHEVM provisioned for a new VM was wrong; it presented the gateway address c.c.113.1 instead of the host system's address.

Bug 1154399 has the same traceback and is one of the few instances of it on the internet: https://bugzilla.redhat.com/show_bug.cgi?format=multiple&id=1154399

Comment 1 Simone Tiraboschi 2015-01-23 10:16:45 UTC
ovirt-hosted-engine-setup currently doesn't support a VLAN on a bonded interface; it's an RFE for 3.6, please see https://bugzilla.redhat.com/1134346

Deploying with the plain interface and then moving to the bonded configuration could be a solution, but it requires some manual action as described here: https://bugzilla.redhat.com/show_bug.cgi?id=1154399#c3

Comment 2 Simone Tiraboschi 2015-01-23 10:46:49 UTC
Antoni, could you please also take a look at why VDSM doesn't restart correctly on reboot?

Comment 3 Bill Carlson 2015-01-23 20:04:48 UTC
Re: https://bugzilla.redhat.com/show_bug.cgi?id=1134346

Minor point: hosted-engine --deploy DID accept a VLAN interface at configuration time, and it worked fine until the host was rebooted. FYI.

Comment 4 Sandro Bonazzola 2015-01-26 08:11:40 UTC
Has the bond0 interface been created manually?
Does /etc/sysconfig/network-scripts/ifcfg-bond0 have either of these headers:

- '# Generated by VDSM version'
- '# automatically generated by vdsm'

If the bond0 interface has been created manually, run the command:

persist /etc/sysconfig/network-scripts/ifcfg-bond0

It worked until reboot because vdsm removes it at reboot if the configuration is not persisted.
hosted-engine --deploy takes care of persisting only the bridge, which the tool itself creates automatically.
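The header check asked about above can be scripted. This is a minimal sketch, not a vdsm tool: the helper names are hypothetical, and it assumes that finding either quoted header string at the start of any line is a sufficient signal that VDSM wrote the file.

```python
from pathlib import Path

# The two header strings quoted in the comment above.
VDSM_HEADERS = ('# Generated by VDSM version', '# automatically generated by vdsm')


def is_vdsm_generated(text):
    """Return True if any line starts with one of the VDSM header strings."""
    return any(line.strip().startswith(VDSM_HEADERS) for line in text.splitlines())


def check_ifcfg(path='/etc/sysconfig/network-scripts/ifcfg-bond0'):
    """Return True/False for an existing file, or None if it does not exist."""
    p = Path(path)
    if not p.is_file():
        return None
    return is_vdsm_generated(p.read_text())


if __name__ == '__main__':
    result = check_ifcfg()
    if result is None:
        print('ifcfg-bond0 not found')
    elif result:
        print('ifcfg-bond0 carries a VDSM header')
    else:
        print('ifcfg-bond0 has no VDSM header (likely created manually)')
```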

Bill, let's discuss VLAN over bond in bug #1134346; here, it seems that the issue is just missing persistence of the bond0 configuration.

Comment 5 Bill Carlson 2015-01-26 13:15:04 UTC
The bonded interface was configured manually, before RHEVM was installed on the host.

What package supplies persist?

Comment 6 Robert McSwain 2015-01-26 19:34:22 UTC
Sandro,

The hosts in this instance are all RHEL 6.5/6.6, so no persisting needs to be done for this customer.

Regards,
Robert McSwain

Comment 7 Dan Kenigsberg 2015-02-02 16:22:41 UTC
Could you attach supervdsm.log of the attempted startup?

This seems like a dup of bug 1154399, and https://bugzilla.redhat.com/show_bug.cgi?id=1154399#c3 can serve as a workaround. Does it help?

However, the removal of pre-VDSM ifcfg files seems to be more harmful than first thought; we're tracking that in bug 1188251.

Comment 8 Eyal Edri 2015-02-25 08:42:11 UTC
3.5.1 is already full of bugs (over 80), and since none of these bugs were marked urgent for the 3.5.1 release in the tracker bug, moving to 3.5.2.

Comment 9 Dan Kenigsberg 2015-03-18 12:44:32 UTC
Please re-open when requested information is available.

Comment 10 Red Hat Bugzilla 2023-09-14 02:53:48 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days