Created attachment 1409540
sosreport from alma03

Description of problem:
ovirt-hosted-engine-cleanup takes too much time, over 20 minutes and prints errors during execution.

alma03 ~]# ovirt-hosted-engine-cleanup
This will de-configure the host to run ovirt-hosted-engine-setup from scratch.
Caution, this operation should be used with care.

Are you sure you want to proceed? [y/n]
y
 -=== Destroy hosted-engine VM ===-
 -=== Stop HA services ===-
 -=== Shutdown sanlock ===-
shutdown force 1 wait 0
shutdown done 0
 -=== Disconnecting the hosted-engine storage domain ===-
********************************************************************************
Stuck in this way for more than 20 minutes...
MainThread::INFO::2018-03-18 18:27:43,227::states::413::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Engine vm was unexpectedly shut down
MainThread::INFO::2018-03-18 18:27:45,336::hosted_engine::614::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_domain_monitor) Stopped VDSM domain monitor
MainThread::INFO::2018-03-18 18:27:45,336::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
alma03 ~]# date
Sun Mar 18 18:43:03 IST 2018
********************************************************************************
Traceback (most recent call last):
  File "/usr/lib64/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/disconnect_storage_server.py", line 27, in <module>
    ha_cli.disconnect_storage_server()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 294, in disconnect_storage_server
    sserver.disconnect_storage_server()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_server.py", line 325, in disconnect_storage_server
    connectionParams=conList,
  File "/usr/lib/python2.7/site-packages/vdsm/client.py", line 278, in _call
    raise TimeoutError(method, kwargs, timeout)
vdsm.client.TimeoutError: Request StoragePool.disconnectStorageServer with args {'connectionParams': [{'port': '3260', 'connection': '10.35.146.129', 'iqn': 'iqn.2008-05.com.xtremio:xio00153500071-514f0c50023f6c00', 'user': '', 'tpgt': '1', 'password': '', 'id': '9e177df8-91db-4b8b-81af-28d56d856dba'}], 'storagepoolID': '00000000-0000-0000-0000-000000000000', 'domainType': 3} timed out after 900 seconds
 -=== De-configure VDSM networks ===-
 -=== Stop other services ===-
 -=== De-configure external daemons ===-
 -=== Removing configuration files ===-
? /etc/init/libvirtd.conf already missing
- removing /etc/libvirt/nwfilter/vdsm-no-mac-spoofing.xml
- removing /etc/ovirt-hosted-engine/answers.conf
- removing /etc/ovirt-hosted-engine/hosted-engine.conf
- removing /etc/vdsm/vdsm.conf
- removing /etc/pki/vdsm/certs/cacert.pem
- removing /etc/pki/vdsm/certs/vdsmcert.pem
- removing /etc/pki/vdsm/keys/vdsmkey.pem
- removing /etc/pki/vdsm/libvirt-spice/ca-cert.pem
- removing /etc/pki/vdsm/libvirt-spice/server-cert.pem
- removing /etc/pki/vdsm/libvirt-spice/server-key.pem
- removing /etc/pki/CA/cacert.pem
- removing /etc/pki/libvirt/clientcert.pem
- removing /etc/pki/libvirt/private/clientkey.pem
? /etc/pki/ovirt-vmconsole/*.pem already missing
- removing /var/cache/libvirt/qemu
? /var/run/ovirt-hosted-engine-ha/* already missing
You have new mail in /var/spool/mail/root
[root@alma03 ~]#

Version-Release number of selected component (if applicable):
ovirt-hosted-engine-ha-2.2.7-1.el7ev.noarch
ovirt-hosted-engine-setup-2.2.13-1.el7ev.noarch
rhvm-appliance-4.2-20180202.0.el7.noarch
Linux 3.10.0-861.el7.x86_64 #1 SMP Wed Mar 14 10:21:01 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.5 (Maipo)

How reproducible:
100%

Steps to Reproduce:
1. Deploy SHE Node 0 over iSCSI.
2. Execute "ovirt-hosted-engine-cleanup" on ha-host.

Actual results:
Undeployment takes too much time and prints errors.

Expected results:
Undeployment should finish without any exceptions and in less time.

Additional info:
Sosreport from host is attached.
It should time out. It may or may not succeed in unmounting, or whatever it's trying to do to the storage. You can't expect it to always fully succeed in cleaning up a messy configuration. Do you know why it failed to disconnect the iSCSI connection?
(In reply to Yaniv Kaul from comment #1)
> It should timeout. It may or may not succeed unmounting or whatever it's
> trying to do to the storage. You can't expect it to always fully succeed
> cleaning up messy configuration.
>
> Do you know why it failed to disconnect the iSCSI connection?

I have no idea why it behaved the way it did. It took much more time with iSCSI than with NFS. In both scenarios (NFS and iSCSI), after waiting for the cleanup to finish, I could redeploy normally. I did not say that the cleanup had failed; I said that in the iSCSI flow errors were printed and it took much longer to finish.
Let's understand why iSCSI takes so much time.
"timed out after 900 seconds": that accounts for 15 of the 20 minutes. The question is, what was done to the iSCSI storage to cause it to stop responding?
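As a rough cross-check of that arithmetic, the timestamps in the original report can be compared with the 900-second timeout. This is only a sketch: it assumes the stall began when the agent logged "Agent shutting down" (18:27:45) and uses the `date` output (18:43:03) as a lower bound, since the report does not say exactly when the traceback appeared.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# Timestamps copied from the report; treating the first as the start of the
# stall is an assumption, not something the logs state.
agent_stopped = datetime.strptime("2018-03-18 18:27:45", FMT)
date_output = datetime.strptime("2018-03-18 18:43:03", FMT)

# Timeout reported in the vdsm.client.TimeoutError message.
VDSM_TIMEOUT_S = 900

stalled_s = int((date_output - agent_stopped).total_seconds())
timeout_min = VDSM_TIMEOUT_S / 60

print("stalled for at least %d s; timeout is %d s (%.0f min)"
      % (stalled_s, VDSM_TIMEOUT_S, timeout_min))
```

Under those assumptions the observed stall is at least 918 seconds, which is consistent with a single 900-second disconnectStorageServer timeout accounting for most of the delay.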
Works for me on these components:
ovirt-hosted-engine-ha-2.2.13-1.el7ev.noarch
ovirt-hosted-engine-setup-2.2.22-1.el7ev.noarch
rhvm-appliance-4.2-20180601.0.el7.noarch
Red Hat Enterprise Linux Server release 7.5 (Maipo)
Linux 3.10.0-862.3.2.el7.x86_64 #1 SMP Tue May 15 18:22:15 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux

There was no delay during "ovirt-hosted-engine-cleanup" and the deployment was cleaned:

alma03 ~]# ovirt-hosted-engine-cleanup
This will de-configure the host to run ovirt-hosted-engine-setup from scratch.
Caution, this operation should be used with care.

Are you sure you want to proceed? [y/n]
y
 -=== Destroy hosted-engine VM ===-
error: failed to get domain 'HostedEngine'
error: Domain not found: no domain with matching name 'HostedEngine'

 -=== Stop HA services ===-
 -=== Shutdown sanlock ===-
shutdown force 1 wait 0
shutdown done 0
 -=== Disconnecting the hosted-engine storage domain ===-
Traceback (most recent call last):
  File "/usr/lib64/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/disconnect_storage_server.py", line 30, in <module>
    timeout=ohostedcons.Const.STORAGE_SERVER_TIMEOUT,
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 313, in disconnect_storage_server
    sserver.disconnect_storage_server(timeout=timeout)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_server.py", line 325, in disconnect_storage_server
    connectionParams=conList,
  File "/usr/lib/python2.7/site-packages/vdsm/client.py", line 278, in _call
    raise TimeoutError(method, kwargs, timeout)
vdsm.client.TimeoutError: Request StoragePool.disconnectStorageServer with args {'connectionParams': [{'port': '3260', 'connection': '10.35.146.225', 'iqn': 'iqn.2008-05.com.xtremio:xio00153500071-514f0c50023f6c05', 'user': '', 'tpgt': '1', 'password': '', 'id': 'ae2bde39-253a-486c-9479-9046a07a0c65'}], 'storagepoolID': '00000000-0000-0000-0000-000000000000', 'domainType': 3} timed out after 60 seconds
 -=== De-configure VDSM networks ===-
 -=== Stop other services ===-
 -=== De-configure external daemons ===-
 -=== Removing configuration files ===-
? /etc/init/libvirtd.conf already missing
- removing /etc/libvirt/nwfilter/vdsm-no-mac-spoofing.xml
- removing /etc/ovirt-hosted-engine/answers.conf
- removing /etc/ovirt-hosted-engine/hosted-engine.conf
- removing /etc/vdsm/vdsm.conf
- removing /etc/pki/vdsm/certs/cacert.pem
- removing /etc/pki/vdsm/certs/vdsmcert.pem
- removing /etc/pki/vdsm/keys/vdsmkey.pem
- removing /etc/pki/vdsm/libvirt-spice/ca-cert.pem
- removing /etc/pki/vdsm/libvirt-spice/server-cert.pem
- removing /etc/pki/vdsm/libvirt-spice/server-key.pem
- removing /etc/pki/CA/cacert.pem
- removing /etc/pki/libvirt/clientcert.pem
- removing /etc/pki/libvirt/private/clientkey.pem
? /etc/pki/ovirt-vmconsole/*.pem already missing
- removing /var/cache/libvirt/qemu
? /var/run/ovirt-hosted-engine-ha/* already missing
You have new mail in /var/spool/mail/root
[root@alma03 ~]# hosted-engine --vm-status
You must run deploy first

Moving to verified.
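The behavior seen in the verified run (the disconnect step gives up after 60 seconds, the traceback is printed, and the remaining cleanup steps still run) can be illustrated with a small sketch. This is not the ovirt-hosted-engine-setup code: `StepTimeout`, `fake_disconnect`, and `run_cleanup` are hypothetical names, and the stub simply raises instead of talking to VDSM. Only the 60-second value is taken from the output above.

```python
class StepTimeout(Exception):
    """Hypothetical stand-in for vdsm.client.TimeoutError."""


# Seconds; matches the "timed out after 60 seconds" message in the verified run.
STORAGE_SERVER_TIMEOUT = 60


def fake_disconnect(timeout):
    # Hypothetical stub: behaves as if the storage server never answered
    # within the given timeout.
    raise StepTimeout("disconnect timed out after %d seconds" % timeout)


def run_cleanup(steps):
    """Run each (name, callable) step in order.

    A timeout in one step is recorded and the remaining steps still run.
    """
    results = []
    for name, step in steps:
        try:
            step()
            results.append((name, "ok"))
        except StepTimeout as exc:
            results.append((name, "timeout: %s" % exc))
    return results


steps = [
    ("disconnect storage", lambda: fake_disconnect(STORAGE_SERVER_TIMEOUT)),
    ("remove config files", lambda: None),
]
results = run_cleanup(steps)
for name, outcome in results:
    print("%s: %s" % (name, outcome))
```

The point of the sketch is only that a short, explicit timeout bounds how long a non-responsive storage server can delay the rest of the cleanup; it does not explain why the disconnect request itself goes unanswered.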
This bugzilla is included in oVirt 4.2.4 release, published on June 26th 2018. Since the problem described in this bug report should be resolved in oVirt 4.2.4 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.