Bug 1922094 - Upgrading host with reboot after upgrade option fails
Summary: Upgrading host with reboot after upgrade option fails
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Infra
Version: 4.4.4.7
Hardware: Unspecified
OS: Unspecified
medium
high
Target Milestone: ovirt-4.4.5
: ---
Assignee: Dana
QA Contact: Petr Matyáš
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-01-29 08:53 UTC by Sandro Bonazzola
Modified: 2021-03-18 15:14 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-03-18 15:14:32 UTC
oVirt Team: Infra
Embargoed:
pm-rhel: ovirt-4.4+
pm-rhel: blocker?
gdeolive: testing_ack+


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 113227 0 master MERGED engine: add heartbeat interval to SSHClient 2021-02-15 20:48:25 UTC

Description Sandro Bonazzola 2021-01-29 08:53:54 UTC
Description of problem:
- Engine upgraded to 4.4.4.7 along with all updates to CentOS 8.3
- Engine reports cluster level 4.4 and updates available to the host which is running CentOS 8.2 with latest 4.4.3
- Moved host to maintenance and started upgrade process
- Upgrade fails


Version-Release number of selected component (if applicable):
ovirt 4.4.4


Additional info: (host name has been replaced with '***********************')

Within engine logs:
2021-01-29 08:06:59,143Z INFO  [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] No interaction with host '***********************' for 20000 ms.
2021-01-29 08:07:01,643Z ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connection timeout for host '***********************', last response arrived 22501 ms ago.

and later:
2021-01-29 08:41:46,123Z ERROR [org.ovirt.engine.core.bll.SshHostRebootCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-1) [7145d526] SSH reboot command failed on host '***********************': SSH session timeout host 'root@***********************'
Stdout: 
Stderr: 
2021-01-29 08:41:46,185Z ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedExecutorService-commandCoordinator-Thread-1) [7145d526] EVENT_ID: SYSTEM_FAILED_SSH_HOST_RESTART(198), A restart using SSH initiated by the engine to Host node1 has failed.
2021-01-29 08:41:46,195Z INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-1) [7145d526] START, SetVdsStatusVDSCommand(HostName = node1, SetVdsStatusVDSCommandParameters:{hostId='25133933-f7c5-49bc-be67-49fd32bfbd27', status='InstallFailed', nonOperationalReason='NONE', stopSpmFailureLogged='false', maintenanceReason='null'}), log id: 1b5daf78
2021-01-29 08:41:46,200Z INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-1) [7145d526] FINISH, SetVdsStatusVDSCommand, return: , log id: 1b5daf78
2021-01-29 08:41:46,200Z ERROR [org.ovirt.engine.core.bll.hostdeploy.UpgradeHostInternalCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-1) [7145d526] Engine failed to restart via ssh host 'node1' ('25133933-f7c5-49bc-be67-49fd32bfbd27') after upgrade
2021-01-29 08:41:46,217Z ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedExecutorService-commandCoordinator-Thread-1) [2725cfaf-5397-4a40-9b36-74ba6d18a085] EVENT_ID: HOST_UPGRADE_FAILED(841), Failed to upgrade Host node1 (User: admin@internal-authz).
2021-01-29 08:41:53,874Z ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-50) [2725cfaf-5397-4a40-9b36-74ba6d18a085] EVENT_ID: HOST_UPGRADE_FAILED(841), Failed to upgrade Host node1 (User: admin@internal-authz).
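
When triaging an excerpt like the one above, it can help to pull out only the ERROR-level audit events. The following is a minimal sketch, assuming the line layout seen in this report (timestamp, level, logger in brackets, thread in parentheses, correlation ID in brackets, message). The names `parse_line` and `error_events` are hypothetical helpers written for illustration; they are not part of ovirt-engine.

```python
import re
from datetime import datetime

# Assumed layout, inferred only from the lines quoted in this report:
# <timestamp>Z <LEVEL> [<logger>] (<thread>) [<correlation id>] <message>
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})Z\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<logger>[^\]]+)\]\s+"
    r"\((?P<thread>[^)]*)\)\s+"
    r"\[(?P<corr>[^\]]*)\]\s+"
    r"(?P<msg>.*)$"
)
# Audit events appear in the message as EVENT_ID: NAME(code)
EVENT_RE = re.compile(r"EVENT_ID: (?P<name>[A-Z_]+)\((?P<code>\d+)\)")


def parse_line(line):
    """Return a dict for one log line, or None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None  # e.g. continuation lines such as "Stdout:"
    rec = m.groupdict()
    rec["ts"] = datetime.strptime(rec["ts"], "%Y-%m-%d %H:%M:%S,%f")
    ev = EVENT_RE.search(rec["msg"])
    rec["event"] = (ev.group("name"), int(ev.group("code"))) if ev else None
    return rec


def error_events(lines):
    """Collect (name, code) for every ERROR line that carries an EVENT_ID."""
    found = []
    for line in lines:
        rec = parse_line(line)
        if rec and rec["level"] == "ERROR" and rec["event"]:
            found.append(rec["event"])
    return found


# Three lines copied from the excerpt in this report.
SAMPLE = [
    "2021-01-29 08:06:59,143Z INFO  [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] No interaction with host '***********************' for 20000 ms.",
    "2021-01-29 08:41:46,185Z ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedExecutorService-commandCoordinator-Thread-1) [7145d526] EVENT_ID: SYSTEM_FAILED_SSH_HOST_RESTART(198), A restart using SSH initiated by the engine to Host node1 has failed.",
    "2021-01-29 08:41:46,217Z ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedExecutorService-commandCoordinator-Thread-1) [2725cfaf-5397-4a40-9b36-74ba6d18a085] EVENT_ID: HOST_UPGRADE_FAILED(841), Failed to upgrade Host node1 (User: admin@internal-authz).",
]

if __name__ == "__main__":
    for name, code in error_events(SAMPLE):
        print(name, code)
```

Lines that do not match the assumed layout (for example the bare `Stdout:` and `Stderr:` lines) are skipped rather than treated as errors, so the helper can be run over a raw paste of the log.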

Comment 1 RHEL Program Management 2021-01-29 08:54:02 UTC
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Comment 3 Martin Perina 2021-01-29 09:20:11 UTC
Reducing severity, as this is happening only on some systems and we don't have a clear reproducer, just a few ideas which could prevent this issue.

Comment 4 RHEL Program Management 2021-01-29 09:20:20 UTC
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Comment 5 Petr Matyáš 2021-02-16 09:34:28 UTC
IMO this should be marked as duplicate of bug 1917809

Otherwise this is also FailedQA with literally the same steps as in bug 1917809#c1

Comment 6 Martin Perina 2021-02-16 13:45:57 UTC
Does it fail on any host or only on specific hosts? I haven't been able to reproduce it on any of my servers.

Comment 7 Petr Matyáš 2021-02-16 13:48:17 UTC
This fails consistently on my upgraded engine with any host I have in there. (Only running the SSH restart action is enough for this to reproduce)

Comment 10 Sandro Bonazzola 2021-03-18 15:14:32 UTC
This bugzilla is included in oVirt 4.4.5 release, published on March 18th 2021.

Since the problem described in this bug report should be resolved in oVirt 4.4.5 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

