Description of problem:
- Engine upgraded to 4.4.4.7 along with all updates to CentOS 8.3
- Engine reports cluster level 4.4 and updates available to the host which is running CentOS 8.2 with latest 4.4.3
- Moved host to maintenance and started upgrade process
- Upgrade fails

Version-Release number of selected component (if applicable):
ovirt 4.4.4

Additional info:
(host name has been replaced with '***********************')

Within engine logs:

2021-01-29 08:06:59,143Z INFO [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] No interaction with host '***********************' for 20000 ms.
2021-01-29 08:07:01,643Z ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connection timeout for host '***********************', last response arrived 22501 ms ago.

and later:

2021-01-29 08:41:46,123Z ERROR [org.ovirt.engine.core.bll.SshHostRebootCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-1) [7145d526] SSH reboot command failed on host '***********************': SSH session timeout host 'root@***********************'
Stdout:
Stderr:
2021-01-29 08:41:46,185Z ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedExecutorService-commandCoordinator-Thread-1) [7145d526] EVENT_ID: SYSTEM_FAILED_SSH_HOST_RESTART(198), A restart using SSH initiated by the engine to Host node1 has failed.
2021-01-29 08:41:46,195Z INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-1) [7145d526] START, SetVdsStatusVDSCommand(HostName = node1, SetVdsStatusVDSCommandParameters:{hostId='25133933-f7c5-49bc-be67-49fd32bfbd27', status='InstallFailed', nonOperationalReason='NONE', stopSpmFailureLogged='false', maintenanceReason='null'}), log id: 1b5daf78
2021-01-29 08:41:46,200Z INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-1) [7145d526] FINISH, SetVdsStatusVDSCommand, return: , log id: 1b5daf78
2021-01-29 08:41:46,200Z ERROR [org.ovirt.engine.core.bll.hostdeploy.UpgradeHostInternalCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-1) [7145d526] Engine failed to restart via ssh host 'node1' ('25133933-f7c5-49bc-be67-49fd32bfbd27') after upgrade
2021-01-29 08:41:46,217Z ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedExecutorService-commandCoordinator-Thread-1) [2725cfaf-5397-4a40-9b36-74ba6d18a085] EVENT_ID: HOST_UPGRADE_FAILED(841), Failed to upgrade Host node1 (User: admin@internal-authz).
2021-01-29 08:41:53,874Z ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-50) [2725cfaf-5397-4a40-9b36-74ba6d18a085] EVENT_ID: HOST_UPGRADE_FAILED(841), Failed to upgrade Host node1 (User: admin@internal-authz).
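For anyone comparing their own engine log against the excerpt above, here is a minimal sketch that pulls out only the ERROR entries and any EVENT_ID they carry. It assumes each entry sits on a single line in the same layout as the lines quoted in this report (timestamp, level, bracketed logger name, then the message). The names `parse_errors` and `Entry` are made up for this example and are not part of oVirt.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Layout assumed from the excerpt: "<date> <time>,<ms>Z LEVEL [logger] rest"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})Z\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<logger>[^\]]+)\]\s+"
    r"(?P<rest>.*)$"
)
# Audit events appear as "EVENT_ID: NAME(code)" inside the message.
EVENT_ID = re.compile(r"EVENT_ID: (?P<name>[A-Z_]+)\((?P<code>\d+)\)")


class Entry(NamedTuple):
    timestamp: str        # e.g. "2021-01-29 08:41:46,217"
    logger: str           # short class name, e.g. "AuditLogDirector"
    event: Optional[str]  # e.g. "HOST_UPGRADE_FAILED", or None
    message: str          # everything after the logger name


def parse_errors(lines: Iterable[str]) -> List[Entry]:
    """Return one Entry per ERROR line; lines in other layouts are skipped."""
    entries = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None or match.group("level") != "ERROR":
            continue
        event = EVENT_ID.search(match.group("rest"))
        entries.append(
            Entry(
                timestamp=match.group("ts"),
                logger=match.group("logger").rsplit(".", 1)[-1],
                event=event.group("name") if event else None,
                message=match.group("rest"),
            )
        )
    return entries
```

To use it, open the engine log (commonly /var/log/ovirt-engine/engine.log, but check your own installation) and pass the file object to `parse_errors`. Continuation lines such as stack traces or the bare "Stdout:" / "Stderr:" lines do not match the pattern and are ignored.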
This bug report has Keywords: Regression or TestBlocker. Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.
Reducing severity, as this is happening only on some systems and we don't have a clear reproducer, just a few ideas that could prevent this issue.
IMO this should be marked as a duplicate of bug 1917809. Otherwise, this is also FailedQA, with literally the same steps as in bug 1917809#c1.
Does it fail on any host or only on specific hosts? I haven't been able to reproduce it on any of my servers.
This fails consistently on my upgraded engine with any host I have in there. (Running the SSH restart action alone is enough to reproduce it.)
https://bugzilla.redhat.com/show_bug.cgi?id=1917809#c6
This bugzilla is included in oVirt 4.4.5 release, published on March 18th 2021. Since the problem described in this bug report should be resolved in oVirt 4.4.5 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.