Bug 1483935

Summary: [RHSC] Cluster import fails - Unable to create ovirt-mgmt interface on rhgs nodes, .bak file exists!
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Sweta Anandpara <sanandpa>
Component: vdsm
Assignee: Sahina Bose <sabose>
Status: CLOSED WONTFIX
QA Contact: Sweta Anandpara <sanandpa>
Severity: medium
Docs Contact:
Priority: unspecified
Version: rhgs-3.3
CC: rhs-bugs, rhsc-qe-bugs, storage-qa-internal
Target Milestone: ---
Keywords: ZStream
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-10-24 06:21:43 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Sweta Anandpara 2017-08-22 10:12:28 UTC
Description of problem:
========================
A freshly installed 4-node RHGS cluster (layered install on RHEL 7.4) was imported into the Console management node.
Host installation failed during network setup, and the hosts did not come up. Tracebacks were seen in the vdsm and supervdsm logs.

A .bak file was present in /etc/sysconfig/network-scripts/. Removing that file and reinstalling the nodes worked: the nodes became operational. However, the tracebacks made no mention of the .bak file.

It is not yet known why the .bak file was there; I would not expect one on a freshly installed setup. Knowledge of bug https://bugzilla.redhat.com/show_bug.cgi?id=1441530 prompted us to remove the .bak file and retry, which worked.
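
Since the tracebacks give no hint about the leftover file, a quick pre-import check on each node may help. The following is a minimal sketch, not part of vdsm or the Console: the helper name find_bak_files is hypothetical, and matching any file name that ends in ".bak" is an assumption, because the exact file name was not recorded in this report.

```python
import os

# Directory named in this report; the ".bak" suffix match is an assumption.
NETWORK_SCRIPTS_DIR = "/etc/sysconfig/network-scripts"


def find_bak_files(directory=NETWORK_SCRIPTS_DIR):
    """Return sorted paths of regular files in `directory` ending in '.bak'."""
    if not os.path.isdir(directory):
        # Directory is missing (e.g. not a RHEL-style host): nothing to report.
        return []
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.endswith(".bak")
        and os.path.isfile(os.path.join(directory, name))
    )


if __name__ == "__main__":
    # Only lists candidates; removal is left to the operator.
    for path in find_bak_files():
        print(path)
```

Running it on a node before import prints any candidate files. It deliberately does not delete anything, so the file can be inspected first.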


Version-Release number of selected component (if applicable):
=========================================================
glusterfs-3.8.4-41 and vdsm-4.17.33-1.2.el7rhgs.noarch


How reproducible:
===============
Hit it once


Steps to Reproduce:
===================
1. Have a .bak file present in /etc/sysconfig/network-scripts/ on an RHGS node, and import that node into the Console.



Additional info:
================

Traceback in vdsm.log:

Traceback (most recent call last):
  File "/usr/share/vdsm/API.py", line 1650, in _rollback
    yield rollbackCtx
  File "/usr/share/vdsm/API.py", line 1502, in setupNetworks
    supervdsm.getProxy().setupNetworks(networks, bondings, options)
  File "/usr/share/vdsm/supervdsm.py", line 50, in __call__
    return callMethod()
  File "/usr/share/vdsm/supervdsm.py", line 48, in <lambda>
    **kwargs)
  File "<string>", line 2, in setupNetworks
  File "/usr/lib64/python2.7/multiprocessing/managers.py", line 773, in _callmethod
    raise convert_to_error(kind, result)
ConfigNetworkError: (10, 'connectivity check failed')


Traceback in supervdsm.log:

Traceback (most recent call last):
  File "/usr/share/vdsm/supervdsmServer", line 118, in wrapper
    res = func(*args, **kwargs)
  File "/usr/share/vdsm/supervdsmServer", line 243, in setupNetworks
    return setupNetworks(networks, bondings, **options)
  File "/usr/share/vdsm/network/api.py", line 943, in setupNetworks
    options, logger)
  File "/usr/share/vdsm/network/api.py", line 800, in _check_connectivity
    'connectivity check failed')
ConfigNetworkError: (10, 'connectivity check failed')

oVirt engine logs:

2017-08-22 15:35:17,790 ERROR [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand] (DefaultQuartzScheduler_Worker-84) [29d34180] Host dhcp37-94.lab.eng.blr.redhat.com is set to Non-Operational, it is missing the following networks: ovirtmgmt
2017-08-22 15:35:17,865 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-84) [29d34180] Correlation ID: 29d34180, Job ID: 75863589-bc76-413e-8957-88257e8e718e, Call Stack: null, Custom Event ID: -1, Message: Host dhcp37-94.lab.eng.blr.redhat.com does not comply with the cluster rhgs33_rh7_4nodeNew networks, the following networks are missing on host: 'ovirtmgmt'
2017-08-22 15:35:17,984 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] (DefaultQuartzScheduler_Worker-84) [29d34180] START, GlusterServersListVDSCommand(HostName = dhcp37-94.lab.eng.blr.redhat.com, HostId = 3407832b-5093-4227-83e3-9726f7b4ed31), log id: 651da351
2017-08-22 15:35:18,307 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] (DefaultQuartzScheduler_Worker-84) [29d34180] FINISH, GlusterServersListVDSCommand, return: [10.70.37.94/23:CONNECTED, dhcp37-78.lab.eng.blr.redhat.com:CONNECTED, dhcp37-86.lab.eng.blr.redhat.com:CONNECTED, dhcp37-98.lab.eng.blr.redhat.com:CONNECTED], log id: 651da351
2017-08-22 15:35:18,328 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-84) [29d34180] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Status of host dhcp37-94.lab.eng.blr.redhat.com was set to NonOperational.
2017-08-22 15:35:19,467 INFO  [org.ovirt.engine.core.bll.HandleVdsVersionCommand] (DefaultQuartzScheduler_Worker-84) [4c65801b] Running command: HandleVdsVersionCommand internal: true. Entities affected :  ID: 3407832b-5093-4227-83e3-9726f7b4ed31 Type: VDS
2017-08-22 15:35:19,472 INFO  [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-84) [4c65801b] Host 3407832b-5093-4227-83e3-9726f7b4ed31 : dhcp37-94.lab.eng.blr.redhat.com is already in NonOperational status for reason NETWORK_UNREACHABLE. SetNonOperationalVds command is skipped.
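
With four hosts in the cluster, it can be tedious to spot which ones the engine flagged. The following is a rough sketch that scans engine log lines for the "is set to Non-Operational, it is missing the following networks" message shown above. The function name missing_networks is hypothetical, and the regular expression is derived only from the single ERROR line in this report, so other engine versions may word the message differently.

```python
import re

# Pattern taken from the SetNonOperationalVdsCommand ERROR line in this report.
MISSING_NETWORKS = re.compile(
    r"Host (?P<host>\S+) is set to Non-Operational, "
    r"it is missing the following networks: (?P<networks>.+)$"
)


def missing_networks(lines):
    """Map each flagged host name to the set of networks reported missing."""
    found = {}
    for line in lines:
        match = MISSING_NETWORKS.search(line.rstrip())
        if match:
            # Assumes multiple networks, if any, are comma-separated.
            nets = {n.strip() for n in match.group("networks").split(",")}
            found.setdefault(match.group("host"), set()).update(nets)
    return found
```

Passing the lines of an engine log (for example, an open file object) returns a dictionary keyed by host name.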

Comment 2 Sweta Anandpara 2017-08-22 10:42:49 UTC
Sosreports, vdsm logs, and ovirt-engine logs are copied to http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/<bugnumber>/


[qe@rhsqe-repo 1483935]$ pwd
/home/repo/sosreports/1483935
[qe@rhsqe-repo 1483935]$ hostname
rhsqe-repo.lab.eng.blr.redhat.com
[qe@rhsqe-repo 1483935]$ 
[qe@rhsqe-repo 1483935]$ ll
total 57476
drwxr-xr-x. 7 qe qe     4096 Aug 22 16:08 ovirt-engine
-rwxr-xr-x. 1 qe qe 14664612 Aug 22 16:07 sosreport-dhcp37-78.lab.eng.blr.redhat.com-20170822154252.tar.xz
-rwxr-xr-x. 1 qe qe 14843032 Aug 22 16:07 sosreport-dhcp37-86.lab.eng.blr.redhat.com-20170822154257.tar.xz
-rwxr-xr-x. 1 qe qe 14641280 Aug 22 16:07 sosreport-dhcp37-94.lab.eng.blr.redhat.com-20170822154336.tar.xz
-rwxr-xr-x. 1 qe qe 14677568 Aug 22 16:07 sosreport-dhcp37-98.lab.eng.blr.redhat.com-20170822154341.tar.xz
drwxr-xr-x. 3 qe qe     4096 Aug 22 16:09 vdsm_dhcp37_78
drwxr-xr-x. 3 qe qe     4096 Aug 22 16:09 vdsm_dhcp37_86
drwxr-xr-x. 3 qe qe     4096 Aug 22 16:09 vdsm_dhcp37_94
drwxr-xr-x. 3 qe qe     4096 Aug 22 16:09 vdsm_dhcp37_98
[qe@rhsqe-repo 1483935]$

Comment 6 Sahina Bose 2018-10-24 06:21:43 UTC
Closing, as there are no further enhancements planned on RHGS-C.