Bug 1184497

Summary: SetupNetworks> Can't change network state from dhcp to static ip
Product: [oVirt] vdsm
Reporter: Michael Burman <mburman>
Component: General
Assignee: Ondřej Svoboda <osvoboda>
Status: CLOSED CURRENTRELEASE
QA Contact: Michael Burman <mburman>
Severity: high
Docs Contact:
Priority: high
Version: ---
CC: bazulay, bugs, danken, ecohen, eedri, elevi, gklein, lsurette, mburman, mgoldboi, ogofen, osvoboda, rbalakri, yeylon, ylavi
Target Milestone: ovirt-3.6.0-rc3
Keywords: Regression
Target Release: ---
Flags: rule-engine: ovirt-3.6.0+, rule-engine: blocker+, ylavi: Triaged+, ylavi: planning_ack+, rule-engine: devel_ack+, rule-engine: testing_ack+
Hardware: x86_64
OS: Linux
Whiteboard: network
Fixed In Version: vdsm-4.17.0-632.git19a83a2
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1279824
Environment:
Last Closed: 2015-11-04 12:58:59 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Network
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1112861, 1242235, 1279824
Attachments (description, flags):
logs, none
screenshot, none
vdsCaps_bootproto_update_failed, none

Description Michael Burman 2015-01-21 14:45:14 UTC
Created attachment 982352 [details]
logs

Description of problem:
SetupNetworks> Can't change 'ovirtmgmt' from dhcp to static ip.
When trying to change 'ovirtmgmt' from dhcp to static ip via SN, I get this in the event log:
Network changes were saved on host orchid-vds1.qa.lab.tlv.redhat.com

But, when checking -
cat /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt
BOOTPROTO=dhcp

And in SN ovirtmgmt stays with dhcp, even if the event log shows that network changes were saved on host.
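The check above can also be scripted. The following is a minimal sketch, not vdsm code: the helper names parse_ifcfg and bootproto_of are hypothetical, and it assumes the file contains plain KEY=VALUE lines as in the output shown.

```python
def parse_ifcfg(text):
    """Parse KEY=VALUE lines of an ifcfg file, skipping comments and blanks."""
    cfg = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#') or '=' not in line:
            continue
        key, value = line.split('=', 1)
        # Drop optional surrounding quotes around the value.
        cfg[key.strip()] = value.strip().strip('"\'')
    return cfg


def bootproto_of(text):
    """Return the BOOTPROTO value, or None if the key is absent."""
    return parse_ifcfg(text).get('BOOTPROTO')


# Sample content modelled on the file quoted in this report.
sample = """# Generated by VDSM
DEVICE=ovirtmgmt
BOOTPROTO=dhcp
"""
print(bootproto_of(sample))
```

On a host, the text would come from reading /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt; after a successful change to static ip the expected value is no longer dhcp.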

Version-Release number of selected component (if applicable):
3.6.0-0.0.master.20150105124153.git669ddc1.el6
vdsm-4.17.0-304.gitf191666.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Hosts>SN> edit 'ovirtmgmt'(with pencil)
2. Change the Boot Protocol from dhcp to static ip
3. Approve operation

Actual results:
'ovirtmgmt' stays with dhcp BootProto, in SN and in ifcfg-ovirtmgmt

Expected results:
Changing 'ovirtmgmt' from dhcp to static ip should succeed

Comment 1 Michael Burman 2015-04-14 15:07:48 UTC
Verified on - 3.6.0-0.0.master.20150412172306.git55ba764.el6.noarch
and vdsm-4.17.0-633.git7ad88bc.el7.x86_64

Comment 2 Michael Burman 2015-07-08 14:20:52 UTC
I see this behavior again on 3.6.0-0.0.master.20150627185750.git6f063c1.el6 and
vdsm-4.17.0-1054.git562e711.el7.noarch

When changing a network's bootproto from DHCP to static ip via Setup Networks (not only for the ovirtmgmt network), the network configuration appears to be saved successfully, and the ifcfg file shows that bootproto changed to none with a static ip, netmask and GW:

cat /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt 
# Generated by VDSM version 4.17.0-1054.git562e711.el7
DEVICE=ovirtmgmt
TYPE=Bridge
DELAY=0
STP=off
ONBOOT=yes
IPADDR=10.35.128.8
NETMASK=255.255.255.0
GATEWAY=10.35.128.254
BOOTPROTO=none
MTU=1500
DEFROUTE=yes
NM_CONTROLLED=no
IPV6INIT=no
HOTPLUG=no

But Setup Networks reports that the boot protocol for this network is still DHCP.

vdsCaps reports:
'ovirtmgmt': {'addr': '10.35.128.8',
                                  'bridged': True,
                                  'cfg': {'BOOTPROTO': 'none',
                                          'DEFROUTE': 'yes',
                                          'DELAY': '0',
                                          'DEVICE': 'ovirtmgmt',
                                          'GATEWAY': '10.35.128.254',
                                          'HOTPLUG': 'no',
                                          'IPADDR': '10.35.128.8',
                                          'IPV6INIT': 'no',
                                          'MTU': '1500',
                                          'NETMASK': '255.255.255.0',
                                          'NM_CONTROLLED': 'no',
                                          'ONBOOT': 'yes',
                                          'STP': 'off',
                                          'TYPE': 'Bridge'},
                                  'dhcpv4': False,
                                  'dhcpv6': False,
                                  'gateway': '10.35.128.254',
                                  'iface': 'ovirtmgmt',
                                  'ipv4addrs': ['10.35.128.8/24'],
                                  'ipv6addrs': ['fe80::21d:9ff:fe68:71c1/64'],
                                  'ipv6gateway': '::',
                                  'mtu': '1500',
                                  'netmask': '255.255.255.0',
                                  'ports': ['eno1'],
                                  'stp': 'off'},

This is a bit different from the original report, because now the ifcfg-'network' file has been updated to static, but SN still reports dhcp. Only manual configuration will bring ifcfg-'network' back to dhcp; the Setup Networks command has no effect here.

So I am reopening this as a regression. Thanks.

Comment 3 Michael Burman 2015-07-08 14:22:10 UTC
Created attachment 1049888 [details]
screenshot

Comment 4 Ido Barkan 2015-07-12 07:06:06 UTC
Can you attach the ovirtmgmt ifcfg file? I want to find out if this is a race.

Comment 5 Ido Barkan 2015-07-12 07:32:38 UTC
Never mind, I found it in the logs:

MainProcess|Thread-20::DEBUG::2015-01-19 15:05:35,251::ifcfg::538::root::(writeConfFile) Writing to file /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt configuration:
# Generated by VDSM version 4.16.8.1-5.el7ev
DEVICE=ovirtmgmt
TYPE=Bridge
DELAY=0
STP=off
ONBOOT=yes
BOOTPROTO=dhcp
MTU=1500
DEFROUTE=yes
NM_CONTROLLED=no
HOTPLUG=no

So this is indeed a race that might be resolved either by having the engine pass blockingdhcp=True or by making that the default behaviour of vdsm.

Dan, do you see any reason not to make it the default behaviour in vdsm? If you do, this might be supplied by the engine when it wants.

Comment 6 Ido Barkan 2015-07-12 07:54:04 UTC
This was my mistake. vdsm is fine here, as it reports the proper caps. This is probably an engine bug.

Comment 7 Dan Kenigsberg 2015-07-13 12:36:49 UTC
(In reply to Ido Barkan from comment #5)
> Dan, do you see any reason not to make it the default behaviour in vdsm?
> If you do, this might be supplied by the engine when it wants.

The historic reason is that a DHCP server can stall for several minutes, and Engine needs a quicker response (otherwise it would hit a timeout and attempt a revert).

Comment 8 Ori Gofen 2015-07-30 14:45:15 UTC
This bug reproduces on every network configured on the host, which rules out creating iSCSI multipathing bonds. Furthermore, if the creation of such a bond is attempted and the network reverts to dhcp, storage becomes inaccessible (because of wrong masking).

Comment 9 Eliraz Levi 2015-08-16 12:47:19 UTC
I believe it is a vdsm bug.

The engine sends the host setup networks VDS command with the following arguments (taken from the engine log):
2015-08-16 14:59:19,559 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HostSetupNetworksVDSCommand] (default task-30) [799333a2] START, HostSetupNetworksVDSCommand(HostName = centos7-1, HostSetupNetworksVdsCommandParameters:{runAsync='true', hostId='d836d879-d9e9-4073-aa7b-b74804f0e133', vds='Host[centos7-1,d836d879-d9e9-4073-aa7b-b74804f0e133]', rollbackOnFailure='true', conectivityTimeout='120', hostNetworkQosSupported='true', networks='[HostNetwork:{defaultRoute='false', bonding='false', networkName='NET0', nicName='eth1', vlan='null', mtu='0', vmNetwork='true', stp='false', properties='[]', bootProtocol='STATIC_IP', address='192.168.0.1', netmask='255.255.0.0', gateway=''}]', removedNetworks='[]', bonds='[]', removedBonds='[]'}), log id: 6d595db7

When it finishes, the engine calls getCaps.
The output of vdsGetCaps is attached.
vdsm reports 'dhcpv4': True, which leads the engine to think that the bootproto is DHCP.
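For comparison, here is a rough sketch of the ifcfg keys one would expect from the parameters in the log line above. The mapping is inferred only from the ifcfg files quoted in this report (static ip appears as BOOTPROTO=none with IPADDR, NETMASK and GATEWAY); the function name expected_ifcfg_ip_keys is hypothetical and this is not vdsm or engine code.

```python
def expected_ifcfg_ip_keys(boot_protocol, address='', netmask='', gateway=''):
    """Map engine-style boot protocol parameters to expected ifcfg keys.

    Assumption: boot_protocol is 'DHCP' or 'STATIC_IP' as in the engine log;
    any other value is treated as "no IP configuration".
    """
    if boot_protocol == 'DHCP':
        return {'BOOTPROTO': 'dhcp'}
    keys = {'BOOTPROTO': 'none'}
    if boot_protocol == 'STATIC_IP':
        pairs = (('IPADDR', address), ('NETMASK', netmask), ('GATEWAY', gateway))
        for key, value in pairs:
            # Empty values (such as gateway='' in the log) are left out.
            if value:
                keys[key] = value
    return keys


# Parameters taken from the HostSetupNetworksVDSCommand log line for NET0.
print(expected_ifcfg_ip_keys('STATIC_IP', '192.168.0.1', '255.255.0.0', ''))
```

With such a configuration in place, a caps report of 'dhcpv4': True for the same device contradicts what was requested.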

Comment 10 Eliraz Levi 2015-08-16 12:47:48 UTC
Created attachment 1063519 [details]
vdsCaps_bootproto_update_failed

Comment 11 Ondřej Svoboda 2015-08-18 09:19:15 UTC
I have to clarify here a bit.

While dhclient probably left behind an active DHCP lease (in /var/lib/dhclient/) that VDSM uses to decide the 'dhcpv4' property for devices (here it reports True for the _bridge_ NET0), there is special handling (workaround) for networks:

"If DHCP is not configured now, do not report it even though you found an active lease".

You can see that for the _network_ NET0 'dhcpv4' is False because of this. This behaviour was adopted not to confuse the engine (and the user).

We assumed that the engine used information from the network (and not the bridge) for display in "Edit Management Network". Is this assumption wrong?
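The difference between the two reports can be sketched as follows. This is an illustration of the rule quoted above, not the actual VDSM implementation; both function names and their parameters are hypothetical.

```python
def device_dhcpv4(has_active_lease):
    """Device-level report (e.g. the bridge): based on the lease alone."""
    return has_active_lease


def network_dhcpv4(configured_bootproto, has_active_lease):
    """Network-level report with the workaround applied.

    If DHCP is not configured now, do not report it even though an
    active lease was found.
    """
    if configured_bootproto != 'dhcp':
        return False
    return has_active_lease


# NET0 after switching to static ip while an old lease is still active:
print(device_dhcpv4(True))           # what the bridge NET0 reports
print(network_dhcpv4('none', True))  # what the network NET0 reports
```

With a leftover lease and a static configuration, the bridge is reported as True and the network as False, which matches the attached caps.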

Comment 12 Dan Kenigsberg 2015-08-25 13:42:42 UTC
At the moment, Vdsm reports the new static IP address for the bridge, but claims that the bridge has an active dhcpv4 lease. This is misleading, as the reported IP address was not provided by a DHCP server.

I'm afraid that Vdsm should report dhcpv4==True only if there is a running dhclient process bound to the interface (AND has a valid lease).

Can you tweak the vdsm-side logic?
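One way to picture the condition above is the sketch below. It is not VDSM code: the function names are hypothetical, and it assumes that dhclient is started with the interface name as its last command-line argument, which depends on how the process was launched.

```python
import os


def read_cmdlines(proc='/proc'):
    """Yield the argv list of every process visible under /proc (Linux)."""
    for pid in os.listdir(proc):
        if not pid.isdigit():
            continue
        try:
            with open(os.path.join(proc, pid, 'cmdline'), 'rb') as f:
                raw = f.read()
        except (IOError, OSError):
            continue  # the process exited or is not readable
        yield [arg.decode('utf-8', 'replace') for arg in raw.split(b'\0') if arg]


def dhclient_ifaces(cmdlines):
    """Return interfaces that running dhclient processes are bound to.

    Assumption: the interface is the last argument and is not an option.
    """
    ifaces = set()
    for argv in cmdlines:
        if len(argv) < 2 or os.path.basename(argv[0]) != 'dhclient':
            continue
        if not argv[-1].startswith('-'):
            ifaces.add(argv[-1])
    return ifaces


def dhcpv4(iface, cmdlines, has_valid_lease):
    """True only with a running dhclient on iface AND a valid lease."""
    return iface in dhclient_ifaces(cmdlines) and has_valid_lease
```

On a Linux host this could be called as dhcpv4('ovirtmgmt', read_cmdlines(), has_valid_lease), where has_valid_lease would come from whatever lease check is already in place.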

Comment 13 Dan Kenigsberg 2015-09-16 11:45:16 UTC
*** Bug 1192778 has been marked as a duplicate of this bug. ***

Comment 14 Eyal Edri 2015-10-10 14:26:13 UTC
This fix is already in 3.6.0-15, moving to ON_QA.

Comment 15 Michael Burman 2015-10-11 07:54:33 UTC
This bug still exists:

'ovirtmgmt': {'addr': '10.35.128.8',
                                  'bridged': True,
                                  'cfg': {'BOOTPROTO': 'none',
                                          'DEFROUTE': 'yes',
                                          'DELAY': '0',
                                          'DEVICE': 'ovirtmgmt',
                                          'GATEWAY': '10.35.128.254',
                                          'HOTPLUG': 'no',
                                          'IPADDR': '10.35.128.8',
                                          'IPV6INIT': 'no',
                                          'MTU': '1500',
                                          'NETMASK': '255.255.255.0',
                                          'NM_CONTROLLED': 'no',
                                          'ONBOOT': 'yes',
                                          'STP': 'off',
                                          'TYPE': 'Bridge'},
                                  'dhcpv4': True,

The network is configured with a static ip, but dhcpv4 is True and the engine still reports the bootproto as dhcp.
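A small sketch of how this kind of inconsistency could be spotted in a caps dump. The function name find_bootproto_mismatches is hypothetical, and the input is assumed to be a dictionary shaped like the excerpts pasted in this report (network name mapped to attributes with 'cfg' and 'dhcpv4').

```python
def find_bootproto_mismatches(networks):
    """Return names of networks whose cfg BOOTPROTO disagrees with dhcpv4."""
    mismatches = []
    for name, attrs in sorted(networks.items()):
        configured_dhcp = attrs.get('cfg', {}).get('BOOTPROTO') == 'dhcp'
        reported_dhcp = bool(attrs.get('dhcpv4'))
        if configured_dhcp != reported_dhcp:
            mismatches.append(name)
    return mismatches


# Values reduced from the caps excerpt in this comment.
caps = {'ovirtmgmt': {'addr': '10.35.128.8',
                      'cfg': {'BOOTPROTO': 'none', 'IPADDR': '10.35.128.8'},
                      'dhcpv4': True}}
print(find_bootproto_mismatches(caps))
```

For the excerpt above this flags ovirtmgmt, since the configuration says none while dhcpv4 is True.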

Comment 16 Michael Burman 2015-10-11 14:08:04 UTC
On 3.6, if the bootproto is changed manually, vdsm will not recognize the change (it does not read from cfg).
This means the bug can be verified.

Changing the network state from dhcp to static ip and vice versa via Setup Networks now works as expected. vdsCaps and the ifcfg-* files report the expected values.

Tested and verified on - 3.6.0-0.18.el6 with vdsm-4.17.8-1.el7ev.noarch

Comment 17 Ondřej Svoboda 2015-10-11 14:18:59 UTC
Thank you, Michael!

The logic we adopted in the end doesn't watch dhclient instances, as that would probably be too intrusive for a backport. Instead, the currently active ("running") network configuration is queried and reported also for network devices that represent a given network (most often, bridges).

There is some refactoring work to be merged later, but the behaviour will stay the same.

Inspecting the cmdlines of running dhclient instances was actually the first idea I had, more than a year and a half ago. It will be reconsidered for 4.0.

Comment 18 Sandro Bonazzola 2015-11-04 12:58:59 UTC
oVirt 3.6.0 was released on November 4th, 2015 and should fix this issue.
If problems still persist, please open a new BZ and reference this one.