Bug 1350763 - Add host failed - failed to configure ovirtmgmt network on host since vdsm is still on recovery [NEEDINFO]
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.6.7
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ovirt-3.6.8
Target Release: ---
Assignee: Dan Kenigsberg
QA Contact: Michael Burman
URL:
Whiteboard:
Duplicates: 1329166 1350718 1351226 1352859
Depends On:
Blocks: Gluster-HC-2 1348103 1352452 1354596
Reported: 2016-06-28 10:20 UTC by Michael Burman
Modified: 2016-08-12 07:59 UTC (History)
CC List: 24 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-07-27 14:18:35 UTC
oVirt Team: Network
Target Upstream Version:
Embargoed:
gklein: needinfo?


Attachments
Logs (497.86 KB, application/x-gzip)
2016-06-28 10:20 UTC, Michael Burman
no flags


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1352907 0 medium CLOSED Adding additional host to hosted engine rhevm which is installed with rhevm-appliance fails 2021-02-22 00:41:40 UTC
Red Hat Product Errata RHBA-2016:1509 0 normal SHIPPED_LIVE vdsm 3.6.8 bug fix and enhancement update 2016-07-27 18:18:12 UTC
oVirt gerrit 59948 0 None MERGED jsonrpc: recovery error passed as response 2020-07-14 01:14:53 UTC
oVirt gerrit 60006 0 None MERGED jsonrpc: recovery error passed as response 2020-07-14 01:14:53 UTC
oVirt gerrit 60007 0 None MERGED deploy: Update number of detection retries 2020-07-14 01:14:53 UTC
oVirt gerrit 60431 0 ovirt-engine-3.6 MERGED deploy: Update number of detection retries 2020-07-14 01:14:53 UTC

Internal Links: 1352907

Description Michael Burman 2016-06-28 10:20:05 UTC
Created attachment 1173362
Logs

Description of problem:
Add host failed - failed to configure ovirtmgmt network on host.

It seems to be caused by the fix for the split-brain bug.

Version-Release number of selected component (if applicable):
3.6.7.5-0.1.el6.noarch
vdsm-4.17.31-0.el7ev.noarch

Steps to Reproduce:
1. Add host to rhev-m 3.6.7

Actual results:
Add host failed

Additional info:
Seems to be related to the split-brain bug fix.

Comment 1 Piotr Kliczewski 2016-06-28 10:32:02 UTC
This is a duplicate of BZ #1348103.

Comment 2 Michal Skrivanek 2016-06-28 16:21:14 UTC
Yes, though it's new in 3.6.7 with JSON-RPC.

Comment 3 Pavol Brilla 2016-06-29 13:28:33 UTC
I was able to reproduce the issue on RHEV-H and RHEL7 hosts as follows:

1. Both hosts added to engine - no problem
2. Put hosts into maintenance
3. Re-provision the hosts' OS from the same source (PXE)
4. Edit hosts to re-fetch the new fingerprint of the server
5. Reinstall hosts - button in engine
6. Hosts failed to reinstall

Packages & versions:
rhevm-3.6.7-6 
vdsm-jsonrpc-java-1.1.12-1.el6ev.noarch

Hosts:
RHEV-H: 7.2-20160627.3.el7ev
RHEL7: vdsm 4.17.31-0.el7ev

Comment 5 Douglas Schilling Landgraf 2016-06-29 16:55:13 UTC
*** Bug 1350718 has been marked as a duplicate of this bug. ***

Comment 8 Oved Ourfali 2016-06-30 11:41:03 UTC
*** Bug 1351226 has been marked as a duplicate of this bug. ***

Comment 10 Simone Tiraboschi 2016-07-04 08:37:27 UTC
*** Bug 1329166 has been marked as a duplicate of this bug. ***

Comment 11 Simone Tiraboschi 2016-07-05 15:16:47 UTC
*** Bug 1352859 has been marked as a duplicate of this bug. ***

Comment 15 movciari 2016-07-14 10:54:53 UTC
Failed with:
rhevm-3.6.8-0.1.el6.noarch
vdsm-4.17.33-1.el7ev.noarch

This time it failed when adding a new host.

Comment 19 Oved Ourfali 2016-07-14 11:58:55 UTC
In this case I see in the vdsm.log:
jsonrpc.Executor/4::ERROR::2016-07-14 13:17:37,189::API::1652::vds::(_rollback) connectivity check failed
Traceback (most recent call last):
   File "/usr/share/vdsm/API.py", line 1650, in _rollback
     yield rollbackCtx
   File "/usr/share/vdsm/API.py", line 1502, in setupNetworks
     supervdsm.getProxy().setupNetworks(networks, bondings, options)
   File "/usr/share/vdsm/supervdsm.py", line 50, in __call__
     return callMethod()
   File "/usr/share/vdsm/supervdsm.py", line 48, in <lambda>
     **kwargs)
   File "<string>", line 2, in setupNetworks
   File "/usr/lib64/python2.7/multiprocessing/managers.py", line 773, in _callmethod
     raise convert_to_error(kind, result)
ConfigNetworkError: (10, 'connectivity check failed')

Edward - can you take a look?

Comment 21 Gil Klein 2016-07-14 12:25:46 UTC
Moving to assigned based on comment #19

Comment 22 Edward Haas 2016-07-14 12:57:54 UTC
In general, the error 'connectivity check failed' means that VDSM successfully applied the last setup request but can no longer 'hear' the Engine afterwards, so it fails the request and reverts to the last known configuration.
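
The apply-then-verify-then-revert behaviour described above can be sketched roughly as follows. This is only an illustration of the pattern, not VDSM's actual code: the names `rollback_on_failure`, `setup_with_connectivity_check`, `ConnectivityCheckFailed` and the `can_hear_engine` callback are all hypothetical, and the configuration is modelled as a flat dict.

```python
from contextlib import contextmanager


class ConnectivityCheckFailed(Exception):
    """Hypothetical error: the manager cannot be heard after a change."""


@contextmanager
def rollback_on_failure(config, snapshot):
    """Restore `config` from `snapshot` if the guarded block raises."""
    try:
        yield
    except Exception:
        config.clear()
        config.update(snapshot)
        raise


def setup_with_connectivity_check(config, requested, can_hear_engine):
    """Apply `requested` to `config` and keep it only if the check passes."""
    snapshot = dict(config)  # last known configuration (flat dict assumed)
    with rollback_on_failure(config, snapshot):
        config.update(requested)         # the setup request itself succeeds
        if not can_hear_engine(config):  # but the manager is no longer heard
            raise ConnectivityCheckFailed("connectivity check failed")
    return config
```

With a `can_hear_engine` callback that returns False, the call raises and `config` ends up exactly as it was before the request, which matches the symptom in the traceback: the setup is applied, the check fails, and the old configuration is restored.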

From the vdsm.log:
'nics': {'eth0': {'addr': '10.34.60.4', 'ipv6gateway': 'fe80:52:0:223c::3fe', 'ipv6addrs': ['2620:52:0:223c:21a:4aff:fed0:3009/64', 'fe80::21a:4aff:fed0:3009/64'], 'mtu': '1500', 'dhcpv4': True, 'netmask': '255.255.252.0', 'dhcpv6': False, 'ipv4addrs': ['10.34.60.4/22'], 'cfg': {'PEERROUTES': 'yes', 'IPV6INIT': 'yes', 'NAME': 'eth0', 'IPADDR': '10.34.60.4', 'NETBOOT': 'yes', 'IPV6_PEERDNS': 'yes', 'DEFROUTE': 'yes', 'PEERDNS': 'yes', 'IPV4_FAILURE_FATAL': 'no', 'IPV6_AUTOCONF': 'yes', 'PREFIX': '22', 'BOOTPROTO': 'static', 'IPV6_DEFROUTE': 'yes', 'GATEWAY': '10.34.63.254', 'HWADDR': '00:1A:4A:D0:30:09', 'IPV6_FAILURE_FATAL': 'no', 'DNS1': '10.34.63.229', 'IPV6_PEERROUTES': 'yes', 'TYPE': 'Ethernet', 'ONBOOT': 'yes', 'UUID': 'a11ac764-5abb-4a6d-9892-5ce82b83e12e'}

For some reason BOOTPROTO is set to 'static' instead of 'none'.
And dhcpv4 is reported as True.
So we have a collision here.
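
A rough way to spot this kind of collision in a caps-style report is sketched below. It assumes the `nics` structure looks like the log excerpt above (a `dhcpv4` flag next to a `cfg` dict holding the ifcfg keys) and it treats only `BOOTPROTO=dhcp` as dynamic, which is a simplifying assumption. The function name `find_bootproto_collisions` is hypothetical.

```python
def find_bootproto_collisions(nics):
    """Return names of NICs whose ifcfg BOOTPROTO disagrees with dhcpv4."""
    collisions = []
    for name, report in sorted(nics.items()):
        bootproto = report.get("cfg", {}).get("BOOTPROTO", "").lower()
        configured_dhcp = bootproto == "dhcp"   # assumption: only 'dhcp' is dynamic
        reported_dhcp = bool(report.get("dhcpv4"))
        if configured_dhcp != reported_dhcp:
            collisions.append(name)
    return collisions


# Reduced version of the eth0 entry quoted from vdsm.log above.
nics = {"eth0": {"dhcpv4": True, "cfg": {"BOOTPROTO": "static"}}}
print(find_bootproto_collisions(nics))  # prints ['eth0']
```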

How was the host's eth0 NIC configured before it was added? Was it DHCP or static?
If it was DHCP originally, we need to understand whether the dhclient request was answered and whether the correct address was re-assigned to it.
Please provide supervdsm.log to look into it further.

Comment 23 Oved Ourfali 2016-07-14 13:02:01 UTC
Regardless, I've verified it on Nelly's env with vdsm-jsonrpc-java 1.1.12 installed. So moving back to ON_QA; if needed, open another bug for Network on the specific issue.

Comment 24 movciari 2016-07-14 14:05:52 UTC
NIC eth0 was manually configured as static in ifcfg-eth0 before installing vdsm.

'static' is a completely valid BOOTPROTO value and it can be used for readability... in fact, you either put 'dhcp' in ifcfg for DHCP, or anything else for a static IP.

Anyway, I tried it with BOOTPROTO 'none' and I'm still getting the same error (with vdsm-jsonrpc-java 1.1.12).

I'm not saying this is not an environment issue, but I can't verify this currently. I don't think creating a new bug with the same title for the same version, with the same reproduction steps and the same error message is a good idea.

Comment 26 Oved Ourfali 2016-07-14 14:10:00 UTC
So I'm moving that to Network.
This is different from the original errors in the log, although the title is identical.

Comment 27 Edward Haas 2016-07-14 15:37:46 UTC
'static' may be valid as far as the end result of the ifcfg scripts goes, but that is not necessarily how VDSM interprets it. The documentation states which values it should get.
But I am not sure if this is the problem.

The problem is that a static IP was set, but VDSM detects it as dynamic (dhcp).

Please collect this info before adding the host:
- 'ip addr'
- A caps report (vdsClient 0 -s getVdsCaps)
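
The two items requested above could be gathered in one go with something like the following sketch. The command lines are copied as written in this comment; the output file names, the `collect` helper and its `run` parameter are hypothetical and exist only for illustration.

```python
import subprocess

# Commands as quoted in the comment above; file names are made up.
COMMANDS = {
    "ip_addr.txt": ["ip", "addr"],
    "vds_caps.txt": ["vdsClient", "0", "-s", "getVdsCaps"],
}


def collect(commands=COMMANDS, run=subprocess.check_output):
    """Run each command and return {file name: raw output bytes}."""
    results = {}
    for filename, argv in commands.items():
        try:
            results[filename] = run(argv)
        except (OSError, subprocess.CalledProcessError) as exc:
            # Keep going so one missing tool does not hide the other report.
            results[filename] = ("failed: %s" % exc).encode()
    return results


if __name__ == "__main__":
    for filename, output in collect().items():
        with open(filename, "wb") as handle:
            handle.write(output)
```

Running it on the host before adding it to the Engine would leave two files that can be attached to the bug.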

Comment 30 Dan Kenigsberg 2016-07-17 06:02:38 UTC
(In reply to movciari from comment #24)
> I don't think creating a new bug with the same title for the
> same version, with the same reproduction steps and the same error message is
> a good idea.

Michal, if the underlying problem is different and the resolving team is different, it is better to modify the existing summary line to be more specific and open a fresh bug.

Comment 31 Michael Burman 2016-07-18 10:39:04 UTC
Verified on 3.6.8-0.1.el6 with vdsm-4.17.33-1.el7ev.noarch and
vdsm-jsonrpc-java-1.1.12-1.el6ev.noarch.

The verification was done only for the original report.

Comment 32 Simone Tiraboschi 2016-07-18 16:23:24 UTC
*** Bug 1357615 has been marked as a duplicate of this bug. ***

Comment 35 errata-xmlrpc 2016-07-27 14:18:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-1509.html

