Bug 1128140 - vdsm changes ONBOOT=yes -> ONBOOT=no on management interface and underlying device
Summary: vdsm changes ONBOOT=yes -> ONBOOT=no on management interface and underlying d...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: oVirt
Classification: Retired
Component: vdsm
Version: 3.5
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.5.0
Assignee: Antoni Segura Puimedon
QA Contact: Martin Pavlik
URL:
Whiteboard: network
Duplicates: 1129385
Depends On:
Blocks: 1076944
 
Reported: 2014-08-08 11:52 UTC by Martin Pavlik
Modified: 2016-02-10 19:36 UTC (History)

Fixed In Version: 4.16.4
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-10-17 12:20:49 UTC
oVirt Team: Network
Embargoed:


Attachments
logs (14.21 MB, application/x-xz)
2014-08-08 11:52 UTC, Martin Pavlik
logs_self_hosted_engine (168.13 KB, application/x-compressed-tar)
2014-08-26 07:27 UTC, Martin Pavlik


Links
System        ID     Private  Priority   Status  Summary                                                              Last Updated
oVirt gerrit  29312  0        ovirt-3.5  MERGED  ifcfg: make default route network be started by sysV                 Never
oVirt gerrit  32703  0        master     MERGED  unified_persistence: only consider bonds created/touched by vdsm     Never
oVirt gerrit  32769  0        ovirt-3.5  MERGED  unified_persistence: only consider bonds created/touched by vdsm     Never

Description Martin Pavlik 2014-08-08 11:52:05 UTC
Created attachment 925160: logs

Description of problem:

Adding a host to the engine changes the host's ifcfg files from ONBOOT="yes" to ONBOOT="no", which results in loss of connectivity after a host reboot or service network restart.

before

[root@localhost ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth0 
DEVICE="eth0"
BOOTPROTO="dhcp"
HWADDR="00:1A:4A:51:B2:06"
IPV6INIT="no"
MTU="1500"
NM_CONTROLLED="yes"
ONBOOT="yes"
TYPE="Ethernet"
UUID="c3453e86-3362-403f-be72-90bb3e4462c5"


after

[root@localhost ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth0 
# Generated by VDSM version 4.16.1-6.gita4a4614.el6
DEVICE=eth0
ONBOOT=no
HWADDR=00:1a:4a:51:b2:06
BRIDGE=ovirtmgmt
MTU=1500
NM_CONTROLLED=no

[root@localhost ~]# cat /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt 
# Generated by VDSM version 4.16.1-6.gita4a4614.el6
DEVICE=ovirtmgmt
ONBOOT=no
TYPE=Bridge
DELAY=0
STP=off
BOOTPROTO=dhcp
MTU=1500
DEFROUTE=yes
NM_CONTROLLED=no
HOTPLUG=no




Version-Release number of selected component (if applicable):
vdsm-4.16.1-6.gita4a4614.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. install new host
2. add host to engine via engine GUI


Actual results:
ONBOOT=no

Expected results:
It should always be ONBOOT=yes for the mgmt bridge (if used) and its underlying device(s).

I can hardly imagine a situation where a user would want the mgmt interface to stay down.
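
A minimal sketch for checking this condition on a host, assuming ifcfg files are plain KEY=value lines with optional shell quoting. The helper names parse_ifcfg and onboot_disabled are made up for illustration and are not part of vdsm.

```python
import shlex


def parse_ifcfg(text):
    """Parse KEY=value lines of an ifcfg file into a dict.

    Blank lines and '#' comment lines are skipped.
    """
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        # shlex.split removes surrounding quotes, so ONBOOT="yes"
        # and ONBOOT=yes yield the same value.
        parts = shlex.split(raw)
        config[key.strip()] = parts[0] if parts else ""
    return config


def onboot_disabled(text):
    """Return True if the file explicitly sets ONBOOT to 'no'."""
    return parse_ifcfg(text).get("ONBOOT", "").lower() == "no"
```

It could be run against the contents of /etc/sysconfig/network-scripts/ifcfg-eth0 and ifcfg-ovirtmgmt before and after adding the host.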

Additional info:

MainProcess|Thread-13::INFO::2014-08-08 12:47:37,470::api::299::root::(addNetwork) Adding network ovirtmgmt with vlan=None, bonding=None, nics=['eth0'], bondingOptions=None, mtu=1500, bridged=True, defaultRoute=True,options={'bootproto': 'dhcp', 'STP': 'no', 'implicitBonding': True}
MainProcess|Thread-13::DEBUG::2014-08-08 12:47:37,471::ifcfg::541::root::(writeConfFile) Writing to file /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt configuration:
# Generated by VDSM version 4.16.1-6.gita4a4614.el6
DEVICE=ovirtmgmt
ONBOOT=no
TYPE=Bridge
DELAY=0
STP=off
BOOTPROTO=dhcp
MTU=1500
DEFROUTE=yes
NM_CONTROLLED=no
HOTPLUG=no

MainProcess|Thread-13::DEBUG::2014-08-08 12:47:37,575::utils::738::root::(execCmd) /sbin/ifdown ovirtmgmt (cwd None)
MainProcess|Thread-13::DEBUG::2014-08-08 12:47:37,837::utils::758::root::(execCmd) SUCCESS: <err> = ''; <rc> = 0
MainProcess|Thread-13::DEBUG::2014-08-08 12:47:37,838::utils::738::root::(execCmd) /sbin/ip route show to 0.0.0.0/0 table all (cwd None)
MainProcess|Thread-13::DEBUG::2014-08-08 12:47:37,841::utils::758::root::(execCmd) SUCCESS: <err> = ''; <rc> = 0
MainProcess|Thread-13::DEBUG::2014-08-08 12:47:37,852::ifcfg::374::root::(_atomicBackup) Backed up /etc/sysconfig/network-scripts/ifcfg-eth0
MainProcess|Thread-13::DEBUG::2014-08-08 12:47:37,852::ifcfg::541::root::(writeConfFile) Writing to file /etc/sysconfig/network-scripts/ifcfg-eth0 configuration:
# Generated by VDSM version 4.16.1-6.gita4a4614.el6
DEVICE=eth0
ONBOOT=no
HWADDR=00:1a:4a:51:b2:06
BRIDGE=ovirtmgmt
MTU=1500
NM_CONTROLLED=no

Comment 1 Dan Kenigsberg 2014-08-12 16:25:49 UTC
*** Bug 1129385 has been marked as a duplicate of this bug. ***

Comment 2 Martin Pavlik 2014-08-25 07:56:36 UTC
verified

after fresh host was added to engine

[root@localhost ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth0
# Generated by VDSM version 4.16.2-1.gite8cba75.el6
DEVICE=eth0
HWADDR=00:1a:4a:51:b2:09
BRIDGE=ovirtmgmt
ONBOOT=yes
MTU=1500
NM_CONTROLLED=no

Comment 3 Martin Pavlik 2014-08-25 07:57:03 UTC
[root@localhost ~]# cat /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt 
# Generated by VDSM version 4.16.2-1.gite8cba75.el6
DEVICE=ovirtmgmt
TYPE=Bridge
DELAY=0
STP=off
ONBOOT=yes
BOOTPROTO=dhcp
MTU=1500
DEFROUTE=yes
NM_CONTROLLED=no
HOTPLUG=no

Comment 4 Martin Pavlik 2014-08-26 07:26:19 UTC
it seems there is a problem if combined with self hosted engine and bond

host freshly installed over em1 with following ifcfg

[root@dell-r210ii-06 ~]# cat /etc/sysconfig/network-scripts/ifcfg-em1
DEVICE="em1"
BOOTPROTO="dhcp"
IPV6INIT="no"
ONBOOT="yes"

# ifcfg manually edited as follows

[root@dell-r210ii-06 ~]# cat /etc/sysconfig/network-scripts/ifcfg-em1
DEVICE="em1"
BOOTPROTO="none"
IPV6INIT="no"
ONBOOT="yes"


cat > /etc/sysconfig/network-scripts/ifcfg-bond0 << EOF
DEVICE=bond0
ONBOOT=yes
BONDING_OPTS='mode=active-backup miimon=150'
NM_CONTROLLED=no
IPADDR=10.34.67.40
NETMASK=255.255.255.224
GATEWAY=10.34.67.62
EOF


cat > /etc/sysconfig/network-scripts/ifcfg-p1p1<< EOF
DEVICE=p1p1
ONBOOT=yes
MASTER=bond0
SLAVE=yes
NM_CONTROLLED=no
EOF


cat > /etc/sysconfig/network-scripts/ifcfg-p1p2<< EOF
DEVICE=p1p2
ONBOOT=yes
MASTER=bond0
SLAVE=yes
NM_CONTROLLED=no
EOF


[root@dell-r210ii-06 ~]# service network restart

[root@dell-r210ii-06 ~]# ip r
10.34.67.32/27 dev bond0  proto kernel  scope link  src 10.34.67.40 
169.254.0.0/16 dev em1  scope link  metric 1004 
169.254.0.0/16 dev bond0  scope link  metric 1006 
default via 10.34.67.62 dev bond0 

[root@dell-r210ii-06 ~]# yum install ovirt-hosted-engine-setup -y

[root@dell-r210ii-06 ~]# hosted-engine --deploy

#somehow it ends up with onboot = no

[root@dell-r210ii-06 ~]# cat /etc/sysconfig/network-scripts/ifcfg-p1p1
# Generated by VDSM version 4.16.2-1.gite8cba75.el6
DEVICE=p1p1
HWADDR=90:e2:ba:04:28:c0
MASTER=bond0
SLAVE=yes
ONBOOT=no
NM_CONTROLLED=no

[root@dell-r210ii-06 ~]# cat /etc/sysconfig/network-scripts/ifcfg-p1p2
# Generated by VDSM version 4.16.2-1.gite8cba75.el6
DEVICE=p1p2
HWADDR=90:e2:ba:04:28:c1
MASTER=bond0
SLAVE=yes
ONBOOT=no
NM_CONTROLLED=no

[root@dell-r210ii-06 ~]# cat /etc/sysconfig/network-scripts/ifcfg-bond0
# Generated by VDSM version 4.16.2-1.gite8cba75.el6
DEVICE=bond0
BONDING_OPTS='mode=active-backup miimon=150'
ONBOOT=no
NM_CONTROLLED=no
HOTPLUG=no
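
The state above (bond and both slaves ending up with ONBOOT=no) can be summarized with a small script. This is only a sketch: it assumes slaves are identified by MASTER=<bond> in their ifcfg file, and bond_onboot_report is a hypothetical helper name, not a vdsm function.

```python
import shlex


def parse_ifcfg(text):
    """Parse KEY=value lines of an ifcfg file into a dict."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        parts = shlex.split(raw)  # drops surrounding quotes
        config[key.strip()] = parts[0] if parts else ""
    return config


def bond_onboot_report(configs, bond):
    """Map the bond and each of its slaves to their ONBOOT value.

    configs: dict of device name -> ifcfg file contents.
    A device counts as a slave if its file has MASTER=<bond>.
    Devices without an ONBOOT line map to None.
    """
    parsed = {name: parse_ifcfg(text) for name, text in configs.items()}
    slaves = sorted(n for n, c in parsed.items() if c.get("MASTER") == bond)
    members = [bond] + slaves
    return {n: parsed[n].get("ONBOOT") for n in members if n in parsed}
```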

Comment 5 Martin Pavlik 2014-08-26 07:27:03 UTC
Created attachment 930759: logs_self_hosted_engine

Comment 6 Sandro Bonazzola 2014-08-26 07:36:27 UTC
Antoni, can you check comment #4 ?

Comment 7 Marian Krcmarik 2014-09-05 14:12:57 UTC
I can reproduce the problem when installing a new RHEL6 host:

preinstallation status:
- Two ethernet devices (em1 and em2) bonded in LACP mode (bond0), statically configured in network scripts (Network manager disabled), bond configuration:
DEVICE=bond0
BONDING_OPTS='mode=4 miimon=100'
ONBOOT=yes
IPADDR=10.34.73.80
NETMASK=255.255.255.192
GATEWAY=10.34.73.126
BOOTPROTO=none
MTU=1500
DEFROUTE=yes
NM_CONTROLLED=no

The connection to the host is lost at the point where the vdsm daemon starts. The status after the failed installation is as follows:
- The network scripts stayed the same, except that ONBOOT=yes changed to ONBOOT=no on the ethernet devices (em1 and em2) and on the bond0 device. Moreover, the static IP configuration (IPADDR, NETMASK, GATEWAY) was deleted from the bond0 network script.

Then I manually edited the network scripts back to ONBOOT=yes, stopped vdsm and reinstalled the host, which was successful.

I am attaching the whole vdsm and supervdsm logs from the host. Be aware that I created more logical networks (vm_dynamic, display and display_wanem) after the successful host installation.
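
The kind of change described in this comment (a flipped ONBOOT plus dropped static IP keys) can be detected by diffing a saved copy of an ifcfg file against the rewritten one. A rough sketch follows; diff_ifcfg is a hypothetical helper, and the parser assumes plain KEY=value lines.

```python
import shlex


def parse_ifcfg(text):
    """Parse KEY=value lines of an ifcfg file into a dict."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        parts = shlex.split(raw)  # drops surrounding quotes
        config[key.strip()] = parts[0] if parts else ""
    return config


def diff_ifcfg(before, after):
    """Compare two ifcfg file contents.

    Returns (removed_keys, added_keys, changed), where changed maps
    a key to its (old, new) values.
    """
    old, new = parse_ifcfg(before), parse_ifcfg(after)
    removed = sorted(set(old) - set(new))
    added = sorted(set(new) - set(old))
    changed = {
        key: (old[key], new[key])
        for key in sorted(set(old) & set(new))
        if old[key] != new[key]
    }
    return removed, added, changed
```

Keeping a copy of the original network scripts before host installation makes this comparison possible afterwards.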

Comment 10 Martin Pavlik 2014-09-05 14:37:10 UTC
@Toni

The problematic part seems to be here. Maybe I overlooked something, but from a quick look it does not seem to generate ifcfg-rhevm.

MainThread::DEBUG::2014-09-05 14:13:22,875::api::625::root::(setupNetworks) Validating configuration
MainThread::DEBUG::2014-09-05 14:13:22,877::api::637::setupNetworks::(setupNetworks) Applying...
MainThread::DEBUG::2014-09-05 14:13:22,877::netconfpersistence::134::root::(_getConfigs) Non-existing config set.
MainThread::DEBUG::2014-09-05 14:13:22,877::utils::738::root::(execCmd) /sbin/ip route show to 0.0.0.0/0 table all (cwd None)
MainThread::DEBUG::2014-09-05 14:13:22,879::utils::758::root::(execCmd) SUCCESS: <err> = ''; <rc> = 0
MainThread::DEBUG::2014-09-05 14:13:22,883::api::531::_handleBondings::(_handleBondings) Editing bond Bond(bond0: [Nic(em1), Nic(em2)]) with options mode=4 miimon=100
MainThread::DEBUG::2014-09-05 14:13:22,884::ifcfg::541::root::(writeConfFile) Writing to file /etc/sysconfig/network-scripts/ifcfg-bond0 configuration:
# Generated by VDSM version 4.16.3-2.el6
DEVICE=bond0
BONDING_OPTS='mode=4 miimon=100'
ONBOOT=no
NM_CONTROLLED=no
HOTPLUG=no

Comment 11 Antoni Segura Puimedon 2014-09-05 17:39:21 UTC
What is happening is that when vdsm first starts, the init common script runs restore nets and, for some reason, the system in c#8 and c#9 shows that it has a definition for bond0 in /var/run/vdsm/netconf/bonds/bond0.

Now, the only thing we know of that writes to it is the setupNetworks flow. So we have to find out what ran setupNetworks for the bond before vdsm ever ran (since this setupNetworks call appears neither in the attached supervdsm.log nor in vdsm.log).

Comment 12 Antoni Segura Puimedon 2014-09-05 17:41:20 UTC
@mkrcmari: Do you think you could try to reproduce again and just before installing vdsm and before starting it for the first time, check the contents of /var/run/vdsm/netconf/{nets,bonds} ?
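
A small sketch for capturing that state in one go. It only lists and reads whatever files exist under the nets and bonds subdirectories and makes no assumption about their format; dump_netconf is a made-up helper name, not something shipped with vdsm.

```python
import os


def dump_netconf(base="/var/run/vdsm/netconf"):
    """Return {relative path: contents} for files under nets/ and bonds/.

    Missing directories are skipped, so an empty dict means no
    persisted definitions were found under 'base'.
    """
    found = {}
    for sub in ("nets", "bonds"):
        directory = os.path.join(base, sub)
        if not os.path.isdir(directory):
            continue
        for name in sorted(os.listdir(directory)):
            path = os.path.join(directory, name)
            if os.path.isfile(path):
                with open(path) as handle:
                    found[os.path.join(sub, name)] = handle.read()
    return found
```

Calling dump_netconf() with no arguments right before the first vdsm start and printing the result would show whether a bond0 definition is already present.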

Comment 15 Martin Pavlik 2014-10-07 14:44:40 UTC
works in vt5

before
[root@dell-r210ii-05 ~]# cat /etc/sysconfig/network-scripts/ifcfg-em1
DEVICE="em1"
BOOTPROTO="dhcp"
HWADDR="D0:67:E5:F0:7F:02"
IPV6INIT="no"
MTU="1500"
NM_CONTROLLED="yes"
ONBOOT="yes"
TYPE="Ethernet"
UUID="6ffbe414-3922-4482-b793-3766058dcedb"


after
[root@dell-r210ii-05 ~]# cat /etc/sysconfig/network-scripts/ifcfg-em1
# Generated by VDSM version 4.16.6-1.el6ev
DEVICE=em1
HWADDR=d0:67:e5:f0:7f:02
BRIDGE=rhevm
ONBOOT=yes
MTU=1500
NM_CONTROLLED=no

Comment 16 Sandro Bonazzola 2014-10-17 12:20:49 UTC
oVirt 3.5 has been released and should include the fix for this issue.

Comment 17 Nicolas Ecarnot 2015-09-16 13:03:27 UTC
Still witnessed on 3.5.1.1-1.el6 and CentOS 6.6 hosts and vdsm-4.16.10-8.

May someone tell us at which precise version it is fixed?

Comment 18 Sandro Bonazzola 2015-09-18 06:57:04 UTC
(In reply to Nicolas Ecarnot from comment #17)
> Still witnessed on 3.5.1.1-1.el6 and CentOS 6.6 hosts and vdsm-4.16.10-8.
> 
> May someone tell us at which precise version it is fixed?

I see Fixed In Version: 4.16.4.

Can you please check using engine 3.5.4.2 / vdsm 4.16.26 and open a new bug against version 3.5.4 if there?

Comment 19 Nicolas Ecarnot 2015-09-20 19:01:20 UTC
(In reply to Sandro Bonazzola from comment #18)
> (In reply to Nicolas Ecarnot from comment #17)
> > Still witnessed on 3.5.1.1-1.el6 and CentOS 6.6 hosts and vdsm-4.16.10-8.
> > 
> > May someone tell us at which precise version it is fixed?
> 
> I see Fixed In Version: 4.16.4.
> 
> Can you please check using engine 3.5.4.2 / vdsm 4.16.26 and open a new bug
> against version 3.5.4 if there?

Sandro,

We do plan to upgrade to 3.5.something in the near future, but these are sensitive production systems, so I can't promise this will be done in the next week.

I'll send an answer there when done.

Thank you.

