Description of problem:
========================
Had a freshly installed 4-node RHGS cluster (layered install on RHEL 7.4), which was imported into the Console management node. The installation failed at network setup, and the hosts failed to come up. A traceback was seen in the vdsm and supervdsm logs.

A .bak file existed in /etc/sysconfig/network-scripts/. Removing that file and reinstalling the nodes worked: the nodes became operational. However, the traceback did not mention the .bak file anywhere.

The reason the .bak file exists is unknown as of now. I would not expect it to be there, as this was a freshly installed setup. Knowledge of bug https://bugzilla.redhat.com/show_bug.cgi?id=1441530 prompted us to remove the .bak file and try again, which worked.

Version-Release number of selected component (if applicable):
=========================================================
glusterfs-3.8.4-41 and vdsm-4.17.33-1.2.el7rhgs.noarch

How reproducible:
===============
Hit it once

Steps to Reproduce:
===================
1. Have a .bak file present on an RHGS node, and import that node into Console.
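
The precondition in step 1 can be checked on a node before importing it. Below is a minimal sketch in Python, assuming the stray file matches the glob `*.bak` (the report does not give its exact name). `find_backup_files` is a hypothetical helper written for illustration and is not part of vdsm.

```python
import glob
import os


def find_backup_files(directory="/etc/sysconfig/network-scripts", pattern="*.bak"):
    """Return a sorted list of paths in `directory` that match `pattern`.

    The default pattern "*.bak" is an assumption; adjust it if the
    leftover files on the node are named differently.
    """
    return sorted(glob.glob(os.path.join(directory, pattern)))


if __name__ == "__main__":
    leftovers = find_backup_files()
    if leftovers:
        print("Leftover backup files found:")
        for path in leftovers:
            print("  " + path)
    else:
        print("No .bak files found.")
```

Running this on each node before the import lists any leftover files so they can be inspected and removed by hand. The sketch only reports; it does not delete anything.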

Additional info:
================
Traceback in vdsm.log:

Traceback (most recent call last):
  File "/usr/share/vdsm/API.py", line 1650, in _rollback
    yield rollbackCtx
  File "/usr/share/vdsm/API.py", line 1502, in setupNetworks
    supervdsm.getProxy().setupNetworks(networks, bondings, options)
  File "/usr/share/vdsm/supervdsm.py", line 50, in __call__
    return callMethod()
  File "/usr/share/vdsm/supervdsm.py", line 48, in <lambda>
    **kwargs)
  File "<string>", line 2, in setupNetworks
  File "/usr/lib64/python2.7/multiprocessing/managers.py", line 773, in _callmethod
    raise convert_to_error(kind, result)
ConfigNetworkError: (10, 'connectivity check failed')

Traceback in supervdsm.log:

Traceback (most recent call last):
  File "/usr/share/vdsm/supervdsmServer", line 118, in wrapper
    res = func(*args, **kwargs)
  File "/usr/share/vdsm/supervdsmServer", line 243, in setupNetworks
    return setupNetworks(networks, bondings, **options)
  File "/usr/share/vdsm/network/api.py", line 943, in setupNetworks
    options, logger)
  File "/usr/share/vdsm/network/api.py", line 800, in _check_connectivity
    'connectivity check failed')
ConfigNetworkError: (10, 'connectivity check failed')

Ovirt engine logs:

2017-08-22 15:35:17,790 ERROR [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand] (DefaultQuartzScheduler_Worker-84) [29d34180] Host dhcp37-94.lab.eng.blr.redhat.com is set to Non-Operational, it is missing the following networks: ovirtmgmt
2017-08-22 15:35:17,865 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-84) [29d34180] Correlation ID: 29d34180, Job ID: 75863589-bc76-413e-8957-88257e8e718e, Call Stack: null, Custom Event ID: -1, Message: Host dhcp37-94.lab.eng.blr.redhat.com does not comply with the cluster rhgs33_rh7_4nodeNew networks, the following networks are missing on host: 'ovirtmgmt'
2017-08-22 15:35:17,984 INFO [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] (DefaultQuartzScheduler_Worker-84) [29d34180] START, GlusterServersListVDSCommand(HostName = dhcp37-94.lab.eng.blr.redhat.com, HostId = 3407832b-5093-4227-83e3-9726f7b4ed31), log id: 651da351
2017-08-22 15:35:18,307 INFO [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] (DefaultQuartzScheduler_Worker-84) [29d34180] FINISH, GlusterServersListVDSCommand, return: [10.70.37.94/23:CONNECTED, dhcp37-78.lab.eng.blr.redhat.com:CONNECTED, dhcp37-86.lab.eng.blr.redhat.com:CONNECTED, dhcp37-98.lab.eng.blr.redhat.com:CONNECTED], log id: 651da351
2017-08-22 15:35:18,328 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-84) [29d34180] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Status of host dhcp37-94.lab.eng.blr.redhat.com was set to NonOperational.
2017-08-22 15:35:19,467 INFO [org.ovirt.engine.core.bll.HandleVdsVersionCommand] (DefaultQuartzScheduler_Worker-84) [4c65801b] Running command: HandleVdsVersionCommand internal: true. Entities affected : ID: 3407832b-5093-4227-83e3-9726f7b4ed31 Type: VDS
2017-08-22 15:35:19,472 INFO [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-84) [4c65801b] Host 3407832b-5093-4227-83e3-9726f7b4ed31 : dhcp37-94.lab.eng.blr.redhat.com is already in NonOperational status for reason NETWORK_UNREACHABLE. SetNonOperationalVds command is skipped.

Sosreports and vdsm and ovirtengine logs are copied @ http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/<bugnumber>/

[qe@rhsqe-repo 1483935]$ pwd
/home/repo/sosreports/1483935
[qe@rhsqe-repo 1483935]$ hostname
rhsqe-repo.lab.eng.blr.redhat.com
[qe@rhsqe-repo 1483935]$
[qe@rhsqe-repo 1483935]$ ll
total 57476
drwxr-xr-x. 7 qe qe     4096 Aug 22 16:08 ovirt-engine
-rwxr-xr-x. 1 qe qe 14664612 Aug 22 16:07 sosreport-dhcp37-78.lab.eng.blr.redhat.com-20170822154252.tar.xz
-rwxr-xr-x. 1 qe qe 14843032 Aug 22 16:07 sosreport-dhcp37-86.lab.eng.blr.redhat.com-20170822154257.tar.xz
-rwxr-xr-x. 1 qe qe 14641280 Aug 22 16:07 sosreport-dhcp37-94.lab.eng.blr.redhat.com-20170822154336.tar.xz
-rwxr-xr-x. 1 qe qe 14677568 Aug 22 16:07 sosreport-dhcp37-98.lab.eng.blr.redhat.com-20170822154341.tar.xz
drwxr-xr-x. 3 qe qe     4096 Aug 22 16:09 vdsm_dhcp37_78
drwxr-xr-x. 3 qe qe     4096 Aug 22 16:09 vdsm_dhcp37_86
drwxr-xr-x. 3 qe qe     4096 Aug 22 16:09 vdsm_dhcp37_94
drwxr-xr-x. 3 qe qe     4096 Aug 22 16:09 vdsm_dhcp37_98
[qe@rhsqe-repo 1483935]$

Closing as there are no further enhancements planned on RHGS-C.