Created attachment 1748710
engine log

Description of problem:
I'm using the new feature to reboot a host after reinstall. Even though the reboot is initiated, the host ends up in InstallFailed and the SSH reboot command is reported as failed. This issue has already appeared from time to time with regular upgrades, but only occasionally and only on some specific envs (maybe HW related?).

Version-Release number of selected component (if applicable):
ovirt-engine-4.4.5-0.11.el8ev.noarch

How reproducible:
100% on specific envs

Steps to Reproduce:
1. reboot after upgrade/reinstall/install

Actual results:
host ends up in install failed

Expected results:
host rebooted successfully

Additional info:
2021-01-19 14:12:58,926+02 ERROR [org.ovirt.engine.core.bll.SshHostRebootCommand] (EE-ManagedThreadFactory-engine-Thread-2172) [455be915-6f4b-4731-bdf5-a3098d1a38d1] SSH reboot command failed on host 'aqua-vds2': SSH session timeout host 'root@aqua-vds2' Stdout: Stderr:
2021-01-19 14:12:58,949+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-2172) [455be915-6f4b-4731-bdf5-a3098d1a38d1] EVENT_ID: SYSTEM_FAILED_SSH_HOST_RESTART(198), A restart using SSH initiated by the engine to Host host_mixed_1 has failed.
2021-01-19 14:12:58,955+02 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (EE-ManagedThreadFactory-engine-Thread-2172) [455be915-6f4b-4731-bdf5-a3098d1a38d1] START, SetVdsStatusVDSCommand(HostName = host_mixed_1, SetVdsStatusVDSCommandParameters:{hostId='9922639d-dcb6-4f06-ba82-5e52ea984502', status='InstallFailed', nonOperationalReason='NONE', stopSpmFailureLogged='false', maintenanceReason='null'}), log id: 407bb397
2021-01-19 14:12:58,958+02 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (EE-ManagedThreadFactory-engine-Thread-2172) [455be915-6f4b-4731-bdf5-a3098d1a38d1] FINISH, SetVdsStatusVDSCommand, return: , log id: 407bb397
2021-01-19 14:12:58,958+02 ERROR [org.ovirt.engine.core.bll.hostdeploy.InstallVdsInternalCommand] (EE-ManagedThreadFactory-engine-Thread-2172) [455be915-6f4b-4731-bdf5-a3098d1a38d1] Engine failed to restart via ssh host 'host_mixed_1' ('9922639d-dcb6-4f06-ba82-5e52ea984502') after host install
2021-01-19 14:12:58,962+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-2172) [455be915-6f4b-4731-bdf5-a3098d1a38d1] EVENT_ID: VDS_INSTALL_FAILED(505), Host host_mixed_1 installation failed. Please refer to /var/log/ovirt-engine/engine.log and log logs under /var/log/ovirt-engine/host-deploy/ for further details..
Using ovirt-engine-4.4.5.5-0.13.el8ev.noarch

Also there seems to be a brand new WARN that looks related, although I might have just missed it last time.

2021-02-12 14:03:14,736+01 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedExecutorService-commandCoordinator-Thread-1) [a214d28b-c63f-429a-9522-7d7d0ac4ef1e] EVENT_ID: ANSIBLE_RUNNER_EVENT_NOTIFICATION(559), Update of host dell-r210ii-14. Remove temporary yum configuration file.
2021-02-12 14:03:14,747+01 WARN [org.ovirt.engine.core.dal.job.ExecutionMessageDirector] (EE-ManagedExecutorService-commandCoordinator-Thread-1) [1402e89] The message key 'SshHostReboot' is missing from 'bundles/ExecutionMessages'
2021-02-12 14:03:14,777+01 INFO [org.ovirt.engine.core.bll.SshHostRebootCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-1) [1402e89] Running command: SshHostRebootCommand internal: true. Entities affected : ID: fc1136af-cf95-44b0-a011-e33eb35505cc Type: VDSAction group MANIPULATE_HOST with role type ADMIN
2021-02-12 14:03:14,784+01 INFO [org.ovirt.engine.core.bll.SshHostRebootCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-1) [1402e89] Opening SSH reboot session on host dell-r210ii-14.dn
2021-02-12 14:03:15,259+01 ERROR [org.ovirt.engine.core.bll.SshHostRebootCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-1) [1402e89] SSH reboot command failed on host 'dell-r210ii-14.dn': SSH session closed during connection 'root' Stdout: Stderr:
2021-02-12 14:03:15,278+01 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedExecutorService-commandCoordinator-Thread-1) [1402e89] EVENT_ID: SYSTEM_FAILED_SSH_HOST_RESTART(198), A restart using SSH initiated by the engine to Host dell-r210ii-14 has failed.
2021-02-12 14:03:15,289+01 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-1) [1402e89] START, SetVdsStatusVDSCommand(HostName = dell-r210ii-14, SetVdsStatusVDSCommandParameters:{hostId='fc1136af-cf95-44b0-a011-e33eb35505cc', status='InstallFailed', nonOperationalReason='NONE', stopSpmFailureLogged='false', maintenanceReason='null'}), log id: 434b6a83
2021-02-12 14:03:15,296+01 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-1) [1402e89] FINISH, SetVdsStatusVDSCommand, return: , log id: 434b6a83
2021-02-12 14:03:15,296+01 ERROR [org.ovirt.engine.core.bll.hostdeploy.UpgradeHostInternalCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-1) [1402e89] Engine failed to restart via ssh host 'dell-r210ii-14' ('fc1136af-cf95-44b0-a011-e33eb35505cc') after upgrade
2021-02-12 14:03:15,320+01 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedExecutorService-commandCoordinator-Thread-1) [a214d28b-c63f-429a-9522-7d7d0ac4ef1e] EVENT_ID: HOST_UPGRADE_FAILED(841), Failed to upgrade Host dell-r210ii-14 (User: admin@internal-authz).
Forgot to mention the host was not rebooted at all.
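
For anyone trying to pull the same sequence out of a large engine.log, here is a minimal sketch, not an official tool. It assumes the line layout seen in the excerpts in this report (timestamp, level, [logger], (thread), [correlation id], message). The names parse_line, failed_reboot_flows and entries_for_flow are made up for this example. Note that related audit events may be logged under a different correlation ID, as the HOST_UPGRADE_FAILED entry is in the 4.4.5.5 excerpt.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Layout assumed from the excerpts in this report:
# <timestamp> <LEVEL> [<logger>] (<thread>) [<correlation id>] <message>
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}[+-]\d{2})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<logger>[^\]]+)\]\s+"
    r"\((?P<thread>[^)]+)\)\s+"
    r"\[(?P<flow_id>[^\]]*)\]\s+"
    r"(?P<message>.*)$"
)


class Entry(NamedTuple):
    timestamp: str
    level: str
    logger: str
    thread: str
    flow_id: str
    message: str


def parse_line(line: str) -> Optional[Entry]:
    """Parse one log line; return None for lines that do not match the layout."""
    match = LOG_LINE.match(line.strip())
    return Entry(**match.groupdict()) if match else None


def failed_reboot_flows(lines: Iterable[str]) -> List[str]:
    """Correlation IDs for which SshHostRebootCommand logged an ERROR, in order seen."""
    flows: List[str] = []
    for line in lines:
        entry = parse_line(line)
        if (
            entry is not None
            and entry.level == "ERROR"
            and entry.logger.endswith("SshHostRebootCommand")
            and entry.flow_id not in flows
        ):
            flows.append(entry.flow_id)
    return flows


def entries_for_flow(lines: Iterable[str], flow_id: str) -> List[Entry]:
    """All parsed entries that carry the given correlation ID."""
    parsed = (parse_line(line) for line in lines)
    return [entry for entry in parsed if entry is not None and entry.flow_id == flow_id]
```

Read the file into a list first (for example with open(path).read().splitlines()), since both helpers iterate over the lines and a file object can only be consumed once. Then call failed_reboot_flows on the list and pass each returned ID to entries_for_flow to see what the engine did around the failed reboot.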
So something strange is going on with your host. Is there anything in journalctl, audit.log, or other system logs on the host that could explain why the engine cannot connect to it?
While investigating this on Petr's env, I found that I couldn't even reinstall the host using a password (it immediately failed on ssh-copy-id). After removing the host from the env and adding it back as a fresh install, both reinstall and upgrade completed successfully, including the reboot.
Verified on ovirt-engine-4.4.5.7-0.1.el8ev.noarch

If this bug still appears for you, just remove the host and add it back.
This bugzilla is included in oVirt 4.4.5 release, published on March 18th 2021. Since the problem described in this bug report should be resolved in oVirt 4.4.5 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.