Description of problem:

The setup fails with:
[ ERROR ] Failed to execute stage 'Environment setup': Failed to reconfigure libvirt for VDSM
when trying ovirt-hosted-engine-setup on a host that had VDSM running.

Version-Release number of selected component (if applicable):
ovirt-hosted-engine-setup-1.2.0-0.0.master.20140319143545 with my git change to plugins/sanlock only.

How reproducible:
Always.

Steps to Reproduce:
Kill all existing VMs, stop VDSM and try to run ovirt-hosted-engine-setup.

Actual results:
[ ERROR ] Failed to execute stage 'Environment setup': Failed to reconfigure libvirt for VDSM

Expected results:
Setup understands that the environment is already prepared for VDSM and continues.
Can you please attach vdsm, libvirt and hosted-engine-setup logs? Thanks
Created attachment 876773 [details]
ovirt-hosted-engine-setup log
2014-03-19 16:23:17 DEBUG otopi.plugins.ovirt_hosted_engine_setup.system.vdsmenv plugin.executeRaw:383 execute-result: ('/bin/vdsm-tool', 'configure', '--force'), rc=1
2014-03-19 16:23:17 DEBUG otopi.plugins.ovirt_hosted_engine_setup.system.vdsmenv plugin.execute:441 execute-output: ('/bin/vdsm-tool', 'configure', '--force') stdout:
Checking configuration status...

SUCCESS: ssl configured to true. No conflicts

2014-03-19 16:23:17 DEBUG otopi.plugins.ovirt_hosted_engine_setup.system.vdsmenv plugin.execute:446 execute-output: ('/bin/vdsm-tool', 'configure', '--force') stderr:
Traceback (most recent call last):
  File "/bin/vdsm-tool", line 153, in <module>
    sys.exit(main())
  File "/bin/vdsm-tool", line 150, in main
    return tool_command[cmd]["command"](*args[1:])
  File "/usr/lib64/python2.7/site-packages/vdsm/tool/configurator.py", line 221, in configure
    service.service_stop(s)
  File "/usr/lib64/python2.7/site-packages/vdsm/tool/service.py", line 369, in service_stop
    return _runAlts(_srvStopAlts, srvName)
  File "/usr/lib64/python2.7/site-packages/vdsm/tool/service.py", line 350, in _runAlts
    "%s failed" % alt.func_name, out, err)
vdsm.tool.service.ServiceOperationError: ServiceOperationError: _systemctlStop failed
Job for sanlock.service canceled.

This looks like an issue between sanlock and vdsm-tool. danken, federico, can you help figure out what should be done to avoid the above condition? Martin, can you attach the sanlock logs?
It is intentionally impossible to restart sanlock when it is holding a lock. Does the issue persist if the host is moved to maintenance mode before killing vdsm?
(In reply to Dan Kenigsberg from comment #4)
> It is intentionally impossible to restart sanlock when it is holding a lock.
> Does the issue persist if the host is moved to maintenance mode before
> killing vdsm?

Since the user is installing hosted-engine here, I assume there is no engine around that could move the host to maintenance. How can this situation be detected, so that setup can abort or wait for the right moment to restart it?
hosted-engine should call spmStop and later disconnectStoragePool in order to shut down its storage usage cleanly.

If you cannot do this, but you are sure that nothing on this host uses the protected resource currently held by sanlock or would ever need it, you can follow the script in https://bugzilla.redhat.com/show_bug.cgi?id=1035847#c23 .
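For illustration, a minimal sketch of that clean-shutdown sequence, assuming the vdsm.vdscli xmlrpc client shipped with vdsm 4.x. The pool UUID, host ID and scsiKey values below are placeholders, not taken from this bug, and the exact argument lists should be verified against the installed VDSM API:

# Sketch only: release the SPM role and disconnect from the pool so that
# sanlock no longer holds leases on this host before stopping vdsmd.
# Assumes the vdsm.vdscli xmlrpc client from vdsm 4.x.
from vdsm import vdscli

POOL_UUID = '00000000-0000-0000-0000-000000000000'  # hypothetical pool UUID
HOST_ID = 1                                          # hypothetical host ID
SCSI_KEY = POOL_UUID                                 # legacy parameter, commonly the pool UUID

server = vdscli.connect()  # connect to the local vdsmd

# Give up the SPM role so sanlock can release the SPM lease.
res = server.spmStop(POOL_UUID)
print(res['status'])

# Then disconnect from the pool so no leases remain held on this host.
res = server.disconnectStoragePool(POOL_UUID, HOST_ID, SCSI_KEY)
print(res['status'])

If sanlock still reports held resources after such a sequence, the manual script referenced above may be needed instead.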
(In reply to Dan Kenigsberg from comment #6)
> hosted-engine should call spmStop and later disconnectStoragePool in order
> to shut down its storage usage cleanly.
>
> If you cannot do this, but you are sure that nothing on this host uses the
> protected resource currently held by sanlock or would ever need it, you can
> follow the script in https://bugzilla.redhat.com/show_bug.cgi?id=1035847#c23 .

No storage pool can exist at that stage; if a storage pool is detected, hosted-engine aborts the setup. So sanlock is holding locks without a storage pool around.
How are you sure that no storage pool exists? The reproduction steps ("Kill all existing VMs, stop VDSM and try to run ovirt-hosted-engine-setup") suggest that there has been a storage pool, and that it was never torn down properly.

Anyway, what is your `sanlock client status`? Let's be certain that locks are actually being held.
Uh.. I do not have the setup anymore. There might have been a storage pool from the previous installation. Is there an official way to clean up a host so when VDSM starts again it knows nothing and behaves as if it was a brand new host (except reinstalling the OS)?
(In reply to Dan Kenigsberg from comment #8)
> How are you sure that no storage pool exists? The reproduction steps ("Kill
> all existing VMs, stop VDSM and try to run ovirt-hosted-engine-setup")
> suggest that there has been a storage pool, and that it was never torn down
> properly.

Mmm, now that I look more closely at the logs, the error occurs at the late_setup stage, while the check for existing storage pools is done by calling getConnectedStoragePoolsList at the customization stage, so you may be right.

> Anyway, what is your `sanlock client status`? Let's be certain that locks
> are being held.
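For reference, a small sketch of the kind of check the customization stage performs, again assuming the vdsm.vdscli client; the response key ('poollist') is from memory of the vdsm 4.x xmlrpc API and the actual hosted-engine-setup code may differ:

# Sketch: ask the local vdsmd which storage pools it is still connected to,
# similar in spirit to the getConnectedStoragePoolsList check mentioned above.
from vdsm import vdscli

server = vdscli.connect()
res = server.getConnectedStoragePoolsList()
if res['status']['code'] != 0:
    raise RuntimeError(res['status']['message'])
pools = res.get('poollist', [])
if pools:
    print('Connected storage pools still present: %s' % pools)
else:
    print('No connected storage pools')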
(In reply to Martin Sivák from comment #9)
> Is there an official way to clean up a host so when VDSM starts again it
> knows nothing and behaves as if it was a brand new host (except reinstalling
> the OS)?

I believe that killing all VMs, stopping the SPM and disconnecting from the pool is all that's needed - that's what happens when Engine moves a host to maintenance. In any case, a reboot is as good as a re-install.
This is an automated message: This bug has been re-targeted from 3.4.2 to 3.5.0 since neither priority nor severity were high or urgent. Please re-target to 3.4.3 if relevant.
I'm not able to reproduce anymore with:
ovirt-hosted-engine-setup-1.2.3-0.0.master.20150213134326.gitbd6a4ea.fc20.noarch
vdsm-4.16.11-11.git39f1c15.fc20.x86_64

Moving to QA for further testing.
ok, vt14.1:
ovirt-hosted-engine-setup-1.2.2-2.el7ev.noarch
vdsm-4.16.12.1-3.el7ev.x86_64

The host was part of rhevm35 as SPM, was moved to maintenance, and then hosted-engine --deploy went ok.
oVirt 3.5.2 was GA'd. Closing current release.