Bug 1350763

Summary: Add host failed - failed to configure ovirtmgmt network on host since vdsm is still on recovery
Product: Red Hat Enterprise Virtualization Manager Reporter: Michael Burman <mburman>
Component: vdsm Assignee: Dan Kenigsberg <danken>
Status: CLOSED ERRATA QA Contact: Michael Burman <mburman>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 3.6.7 CC: bazulay, bugs, byarlaga, cshao, danken, edwardh, gklein, huzhao, leiwang, lsurette, mgoldboi, michal.skrivanek, mkalinin, movciari, mperina, nsednev, oourfali, pbrilla, pkliczew, srevivo, weiwang, yaniwang, ycui, ykaul
Target Milestone: ovirt-3.6.8 Keywords: AutomationBlocker, Regression, ZStream
Target Release: --- Flags: gklein: needinfo?
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-07-27 14:18:35 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Network RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On:    
Bug Blocks: 1277939, 1348103, 1352452, 1354596    
Attachments:
Description Flags
Logs none

Description Michael Burman 2016-06-28 10:20:05 UTC
Created attachment 1173362 [details]
Logs

Description of problem:
Add host failed - failed to configure ovirtmgmt network on host.

It seems to be caused by the fix for the split-brain bug.

Version-Release number of selected component (if applicable):
3.6.7.5-0.1.el6.noarch
vdsm-4.17.31-0.el7ev.noarch

Steps to Reproduce:
1. Add host to rhev-m 3.6.7

Actual results:
Add host failed

Additional info:
Seems to be related to the split-brain bug fix.

Comment 1 Piotr Kliczewski 2016-06-28 10:32:02 UTC
This is a duplicate of BZ #1348103

Comment 2 Michal Skrivanek 2016-06-28 16:21:14 UTC
Yes, though it's new in 3.6.7 with JSON-RPC

Comment 3 Pavol Brilla 2016-06-29 13:28:33 UTC
I was able to reproduce the issue on RHEV-H and RHEL7 hosts as follows:

1. Both hosts added to engine - no problem
2. Put hosts into maintenance
3. Re-provision hosts' OS from the same source (PXE)
4. Edit hosts to re-fetch the new fingerprint of the server
5. Reinstall hosts - button in engine
6. Hosts failed to reinstall

Packages & versions:
rhevm-3.6.7-6 
vdsm-jsonrpc-java-1.1.12-1.el6ev.noarch

Hosts:
RHEV-H: 7.2-20160627.3.el7ev
RHEL7: vdsm 4.17.31-0.el7ev

Comment 5 Douglas Schilling Landgraf 2016-06-29 16:55:13 UTC
*** Bug 1350718 has been marked as a duplicate of this bug. ***

Comment 8 Oved Ourfali 2016-06-30 11:41:03 UTC
*** Bug 1351226 has been marked as a duplicate of this bug. ***

Comment 10 Simone Tiraboschi 2016-07-04 08:37:27 UTC
*** Bug 1329166 has been marked as a duplicate of this bug. ***

Comment 11 Simone Tiraboschi 2016-07-05 15:16:47 UTC
*** Bug 1352859 has been marked as a duplicate of this bug. ***

Comment 15 movciari 2016-07-14 10:54:53 UTC
failed with:
rhevm-3.6.8-0.1.el6.noarch
vdsm-4.17.33-1.el7ev.noarch

This time it failed when adding a new host.

Comment 19 Oved Ourfali 2016-07-14 11:58:55 UTC
In this case I see in the vdsm.log:
jsonrpc.Executor/4::ERROR::2016-07-14 13:17:37,189::API::1652::vds::(_rollback) connectivity check failed
Traceback (most recent call last):
   File "/usr/share/vdsm/API.py", line 1650, in _rollback
     yield rollbackCtx
   File "/usr/share/vdsm/API.py", line 1502, in setupNetworks
     supervdsm.getProxy().setupNetworks(networks, bondings, options)
   File "/usr/share/vdsm/supervdsm.py", line 50, in __call__
     return callMethod()
   File "/usr/share/vdsm/supervdsm.py", line 48, in <lambda>
     **kwargs)
   File "<string>", line 2, in setupNetworks
   File "/usr/lib64/python2.7/multiprocessing/managers.py", line 773, in _callmethod
     raise convert_to_error(kind, result)
ConfigNetworkError: (10, 'connectivity check failed')

Edward - can you take a look?

Comment 21 Gil Klein 2016-07-14 12:25:46 UTC
Moving to assigned based on comment #19

Comment 22 Edward Haas 2016-07-14 12:57:54 UTC
In general, the error 'connectivity check failed' means that VDSM successfully applied the last setup request but can no longer 'hear' the Engine, so it fails and reverts to the last known configuration.
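
The apply, check, and revert behaviour described here can be sketched as follows. This is only an illustration of the pattern, not VDSM's actual code: the function name `apply_with_rollback`, the `can_reach_engine` callable, and the use of a plain dict for the host configuration are all assumptions made for the example.

```python
import copy


class ConnectivityCheckError(Exception):
    """Raised when the (hypothetical) peer cannot be reached after a change."""


def apply_with_rollback(config, changes, can_reach_engine):
    """Apply `changes` to `config`; restore the previous state if the check fails.

    `config` is a plain dict standing in for host network state and
    `can_reach_engine` is a caller-supplied callable. Both are illustrative.
    """
    snapshot = copy.deepcopy(config)   # remember the last known configuration
    config.update(changes)             # the setup request itself succeeds
    if not can_reach_engine(config):
        config.clear()
        config.update(snapshot)        # revert to the last known configuration
        raise ConnectivityCheckError("connectivity check failed")
    return config
```

With a check that always fails, the call raises and `config` is left exactly as it was before the request, which matches the rollback seen in the traceback.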

From the vdsm.log:
'nics': {'eth0': {'addr': '10.34.60.4', 'ipv6gateway': 'fe80:52:0:223c::3fe', 'ipv6addrs': ['2620:52:0:223c:21a:4aff:fed0:3009/64', 'fe80::21a:4aff:fed0:3009/64'], 'mtu': '1500', 'dhcpv4': True, 'netmask': '255.255.252.0', 'dhcpv6': False, 'ipv4addrs': ['10.34.60.4/22'], 'cfg': {'PEERROUTES': 'yes', 'IPV6INIT': 'yes', 'NAME': 'eth0', 'IPADDR': '10.34.60.4', 'NETBOOT': 'yes', 'IPV6_PEERDNS': 'yes', 'DEFROUTE': 'yes', 'PEERDNS': 'yes', 'IPV4_FAILURE_FATAL': 'no', 'IPV6_AUTOCONF': 'yes', 'PREFIX': '22', 'BOOTPROTO': 'static', 'IPV6_DEFROUTE': 'yes', 'GATEWAY': '10.34.63.254', 'HWADDR': '00:1A:4A:D0:30:09', 'IPV6_FAILURE_FATAL': 'no', 'DNS1': '10.34.63.229', 'IPV6_PEERROUTES': 'yes', 'TYPE': 'Ethernet', 'ONBOOT': 'yes', 'UUID': 'a11ac764-5abb-4a6d-9892-5ce82b83e12e'}

For some reason BOOTPROTO is set to 'static' instead of 'none'.
And dhcpv4 is reported as True.
So we have a collision here.
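
A quick way to spot this kind of collision in a caps-style report might look like the sketch below. The helper `find_bootproto_collisions` is hypothetical, and the sample dict is abridged from the log excerpt above; it assumes that only BOOTPROTO 'dhcp' should go together with dhcpv4 being True.

```python
def find_bootproto_collisions(nics):
    """Return names of NICs whose ifcfg BOOTPROTO disagrees with the dhcpv4 flag."""
    collisions = []
    for name, report in nics.items():
        bootproto = report.get("cfg", {}).get("BOOTPROTO", "").lower()
        configured_dhcp = bootproto == "dhcp"       # what the ifcfg file asks for
        reported_dhcp = bool(report.get("dhcpv4"))  # what the report claims
        if configured_dhcp != reported_dhcp:
            collisions.append(name)
    return sorted(collisions)


# Abridged from the vdsm.log excerpt; only the relevant keys are kept.
nics = {
    "eth0": {
        "addr": "10.34.60.4",
        "dhcpv4": True,
        "cfg": {"BOOTPROTO": "static", "IPADDR": "10.34.60.4"},
    }
}
print(find_bootproto_collisions(nics))  # prints ['eth0']
```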

How was the host's eth0 NIC configured before the host was added? Was it DHCP or static?
If it was originally DHCP, we need to understand whether the dhclient request was answered and whether the correct address was re-assigned to it.
Please provide supervdsm.log so we can look into it further.

Comment 23 Oved Ourfali 2016-07-14 13:02:01 UTC
Regardless, I've verified it on Nelly's env with vdsm-jsonrpc-java 1.1.12 installed. So I'm moving this back to ON_QA; if needed, open another bug on Network for the specific issue.

Comment 24 movciari 2016-07-14 14:05:52 UTC
NIC eth0 was manually configured as static in ifcfg-eth0 before installing vdsm.

'static' is a completely valid BOOTPROTO value and it can be used for readability. In fact, you either put 'dhcp' in ifcfg for DHCP, or anything else for a static IP.
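
The rule stated in this comment can be sketched as a small ifcfg reader. This is an illustration only: `parse_ifcfg` and `is_dhcp` are hypothetical helpers, they encode the rule exactly as the comment words it, and they are not how the initscripts or VDSM actually parse these files.

```python
def parse_ifcfg(text):
    """Parse KEY=value lines from ifcfg-style text into a dict."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip("'\"")
    return settings


def is_dhcp(settings):
    """Per the comment: 'dhcp' means DHCP, anything else means a static IP."""
    return settings.get("BOOTPROTO", "").lower() == "dhcp"


sample = """\
# abridged example in the style of ifcfg-eth0
NAME=eth0
BOOTPROTO=static
IPADDR=10.34.60.4
PREFIX=22
"""
print(is_dhcp(parse_ifcfg(sample)))  # prints False
```

Under this rule 'static' and 'none' are treated the same way, which is the point the comment makes.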

Anyway, I tried it with BOOTPROTO 'none' and I'm still getting the same error (with vdsm-jsonrpc-java 1.1.12).

I'm not saying this is not an environment issue, but I can't verify this currently. I don't think creating a new bug with the same title for the same version, with the same reproduction steps and the same error message, is a good idea.

Comment 26 Oved Ourfali 2016-07-14 14:10:00 UTC
So I'm moving that to Network.
This is different from the original errors in the log, although the title is identical.

Comment 27 Edward Haas 2016-07-14 15:37:46 UTC
'static' may produce a valid end result with the ifcfg scripts, but that is not necessarily how VDSM interprets it. The documentation states which values it should get.
But I am not sure if this is the problem.

The problem is that a static IP was set, but VDSM detects it as dynamic (dhcp).

Please collect this info before adding the host:
- 'ip addr'
- A caps report (vdsClient 0 -s getVdsCaps)

Comment 30 Dan Kenigsberg 2016-07-17 06:02:38 UTC
(In reply to movciari from comment #24)
> I don't think creating a new bug with the same title for the
> same version, with the same reproduction steps and the same error message is
> a good idea.

Michal, if the underlying problem is different, and the resolving team is different, it is better to modify the existing summary line to be more specific, and open a fresh bug.

Comment 31 Michael Burman 2016-07-18 10:39:04 UTC
Verified on 3.6.8-0.1.el6 with vdsm-4.17.33-1.el7ev.noarch and
vdsm-jsonrpc-java-1.1.12-1.el6ev.noarch.

The verification was done only for the original report.

Comment 32 Simone Tiraboschi 2016-07-18 16:23:24 UTC
*** Bug 1357615 has been marked as a duplicate of this bug. ***

Comment 35 errata-xmlrpc 2016-07-27 14:18:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-1509.html