Description of problem:
-----------------------
When an upgrade of a RHVH host is initiated from the RHV Manager UI, the host is first moved into maintenance, redhat-virtualization-host-image-update is updated, and then the host is rebooted. As part of this upgrade/update procedure, if moving the host into maintenance fails, no proper events or messages are shown to the user.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
RHHI-V 1.5 (RHV 4.2.7 & RHGS 3.4.1)

How reproducible:
-----------------
Always

Steps to Reproduce:
-------------------
1. Stop one of the bricks of the volume on host1
2. Try to upgrade host2 from the RHV Manager UI

Actual results:
---------------
The upgrade initiated from the RHV Manager UI silently fails without throwing any errors or events.

Expected results:
-----------------
The upgrade should fail with a meaningful error or event, so that the user is aware of the reason behind the failure.

Additional info:
----------------
While moving the host into maintenance by stopping the gluster service, proper error messages are thrown. The same should be implemented for the upgrade/update procedure initiated from the RHV Manager UI.
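The reproduction step above (taking a brick down) can be sketched with the standard gluster CLI. This is a sketch under assumptions: `data` is one of the volume names seen later in this report's logs, and the brick PID is read from the `gluster volume status` output, so the exact value must be taken from your own cluster.

```shell
# Sketch: take one brick of the volume offline on host1, then attempt
# the upgrade of host2 from the RHV Manager UI.
# Assumptions: standard gluster CLI on host1; 'data' is the affected
# volume (name taken from the logs in this report).
VOLUME=data

# Show the bricks and their PIDs; note the PID of the local brick.
gluster volume status "$VOLUME"

# Stop that brick process (replace <brick-pid> with the PID shown above).
# kill -9 <brick-pid>
```

With one brick down, moving a second host to maintenance would break gluster quorum, which is exactly the condition the upgrade validation should surface to the user.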
Since the verification failed, reassigning the bug.

Steps executed:
==============
1. Killed a brick (data) on Host1
2. Clicked Host2 --> Installation --> Upgrade

After executing the above, the upgrade failed without any errors or pop-up alerts. I would expect an error along the lines of "quorum would be lost if the upgrade succeeds". It just failed with generic errors like:

2019-02-26 15:18:05,696+05 ERROR [org.ovirt.engine.core.bll.hostdeploy.HostUpgradeCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-65) [d4185969-4c15-435b-b9d6-220e84def4c8] Host 'rhsqa-grafton8-nic2.lab.eng.blr.redhat.com' failed to move to maintenance mode. Upgrade process is terminated.
2019-02-26 15:18:05,719+05 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-65) [d4185969-4c15-435b-b9d6-220e84def4c8] EVENT_ID: VDS_MAINTENANCE_FAILED(17), Failed to switch Host rhsqa-grafton8-nic2.lab.eng.blr.redhat.com to Maintenance mode.
2019-02-26 15:18:06,780+05 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-79) [d4185969-4c15-435b-b9d6-220e84def4c8] EVENT_ID: HOST_UPGRADE_FAILED(841), Failed to upgrade Host rhsqa-grafton8-nic2.lab.eng.blr.redhat.com (User: admin@internal-authz).
Created attachment 1538724 [details] UI_Screenshot
The original bug was about no error message being returned on upgrade failure: the only event was "Upgrade started", with no indication of failure. I think there's an event indicating the failure to move to maintenance now? Are you asking for a specific error message on why moving to maintenance failed?
Sahina,

As mentioned in the steps executed, we can clearly see that quorum will be lost if we upgrade Host2 (since a brick on Host1 is already down). But in the UI, when we try upgrading, I don't see any warning such as "The quorum will be lost", and it also fails without any specific error event from which the user can infer the cause. Let me know if you think otherwise.
(In reply to bipin from comment #7)
> Sahina,
>
> So as mentioned in the steps executed, we can clearly see that the quorum
> will be lost if we upgrade Host2 (since already brick in Host 1 is down).
> But in UI when we try upgrading i don't see any warning while upgrading
> like say "The quorum will be lost" so something like that sort, and also it
> fails without any specific error event through which user can assume. Let me
> know if you think otherwise

Currently there's an error message provided when a host cannot be moved to maintenance; the specific error message regarding heal etc. is thrown as part of the validations and not logged in the audit log. Providing a specific error message would require the commands to be changed, so this cannot be targeted for 1.6.
(In reply to Sahina Bose from comment #8)
> (In reply to bipin from comment #7)
> > Sahina,
> >
> > So as mentioned in the steps executed, we can clearly see that the quorum
> > will be lost if we upgrade Host2 (since already brick in Host 1 is down).
> > But in UI when we try upgrading i don't see any warning while upgrading
> > like say "The quorum will be lost" so something like that sort, and also it
> > fails without any specific error event through which user can assume. Let me
> > know if you think otherwise
>
> Currently there's an error message provide when a host cannot be moved to
> maintenance, the specific error message regarding heal etc is thrown as part
> of validations and not logged in audit log. Providing specific error message
> will required the commands to be changed, so cannot be targeted for 1.6

Yes, that makes sense. But the user should be informed that, during an upgrade/update of RHVH hosts, if the host doesn't move into the upgrade phase, one possible cause is that moving that host into maintenance would lose cluster quorum, or that self-heal is in progress. The user has to make sure all the bricks of the volume show as up in the RHV Manager UI and there are no pending heal entries on the bricks. With these facts, I am marking this bug as a known_issue; it can be deferred out of RHHI-V 1.6 scope.
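The manual pre-upgrade checks described above (all bricks up, no pending heal entries) can be sketched as a small shell script. This is a sketch under assumptions: the standard gluster CLI is available on the host, and `data`/`vmstore` are the volume names seen in this report's logs; the brick-online check is done by eye from the `Online` (Y/N) column of the status output.

```shell
#!/bin/sh
# Pre-upgrade health check sketch for a RHHI-V host.
# Assumptions: standard gluster CLI; 'data' is one of the volumes
# named in this report's logs.
VOLUME=${1:-data}

# 1. All bricks must be online. 'gluster volume status' prints an
#    Online column (Y/N) for every brick; any "N" means do not upgrade.
gluster volume status "$VOLUME"

# 2. There must be no pending self-heal entries: every brick should
#    report "Number of entries: 0" in the heal info output.
if gluster volume heal "$VOLUME" info | grep -q 'Number of entries: [1-9]'; then
    echo "Pending heal entries on $VOLUME - wait for self-heal to finish" >&2
    exit 1
fi
echo "No pending heal entries on $VOLUME"
```

Running such a check for every volume before starting the upgrade from the UI would have made the quorum failure in this report visible up front.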
I've modified the text. Please check.
Pasting the output from the base bug:

Tested with ovirt-engine-4.3.4.3-0.1.el7.noarch and the fix works, so moving the bug to verified state.

Steps:
=====
1. Deploy RHHI-V
2. Bring a brick down from one of the hosts, say host1
3. Now try to upgrade host2 and it fails

Logs:
====
2019-06-07 12:38:29,371+05 INFO [org.ovirt.engine.core.bll.hostdeploy.UpgradeHostCommand] (default task-126) [682b85e9-8120-4f7e-bca8-4e80a0eb7843] Running command: UpgradeHostCommand internal: false. Entities affected : ID: d5cb1684-e96d-49ab-a095-0234f4c1a017 Type: VDSAction group EDIT_HOST_CONFIGURATION with role type ADMIN
2019-06-07 12:38:29,395+05 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-126) [] EVENT_ID: HOST_UPGRADE_STARTED(840), Host rhsqa-grafton8-nic2.lab.eng.blr.redhat.com upgrade was started (User: admin@internal-authz).
2019-06-07 12:38:29,460+05 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-commandCoordinator-Thread-6) [682b85e9-8120-4f7e-bca8-4e80a0eb7843] EVENT_ID: GENERIC_ERROR_MESSAGE(14,001), Cannot switch the following Host(s) to Maintenance mode: rhsqa-grafton8-nic2.lab.eng.blr.redhat.com. Gluster quorum will be lost for the following Volumes: vmstore.
2019-06-07 12:38:29,460+05 WARN [org.ovirt.engine.core.bll.MaintenanceNumberOfVdssCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-6) [682b85e9-8120-4f7e-bca8-4e80a0eb7843] Validation of action 'MaintenanceNumberOfVdss' failed for user admin@internal-authz. Reasons: VAR__TYPE__HOST,VAR__ACTION__MAINTENANCE,VDS_CANNOT_MAINTENANCE_GLUSTER_QUORUM_CANNOT_BE_MET,$VolumesList vmstore,$HostsList rhsqa-grafton8-nic2.lab.eng.blr.redhat.com
2019-06-07 12:38:29,948+05 ERROR [org.ovirt.engine.core.bll.hostdeploy.HostUpgradeCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-67) [682b85e9-8120-4f7e-bca8-4e80a0eb7843] Host 'rhsqa-grafton8-nic2.lab.eng.blr.redhat.com' failed to move to maintenance mode. Upgrade process is terminated.
2019-06-07 12:38:31,104+05 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-90) [682b85e9-8120-4f7e-bca8-4e80a0eb7843] EVENT_ID: HOST_UPGRADE_FAILED(841), Failed to upgrade Host rhsqa-grafton8-nic2.lab.eng.blr.redhat.com (User: admin@internal-authz).
2019-06-07 12:38:31,505+05 INFO [org.ovirt.engine.core.vdsbroker.gluster.GlusterTasksListVDSCommand] (DefaultQuartzScheduler2) [98d833e] START, GlusterTasksListVDSCommand(HostName = rhsqa-grafton8-nic2.lab.eng.blr.redhat.com, VdsIdVDSCommandParametersBase:{hostId='d5cb1684-e96d-49ab-a095-0234f4c1a017'}), log id: 316ed0f0
2019-06-07 13:08:42,481+05 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-132) [] EVENT_ID: HOST_UPGRADE_STARTED(840), Host rhsqa-grafton8-nic2.lab.eng.blr.redhat.com upgrade was started (User: admin@internal-authz).
2019-06-07 13:08:42,521+05 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-commandCoordinator-Thread-7) [e496c065-0c69-4f16-ba2c-3219fd5a8cc6] EVENT_ID: GENERIC_ERROR_MESSAGE(14,001), Cannot switch the following Host(s) to Maintenance mode: rhsqa-grafton8-nic2.lab.eng.blr.redhat.com. Gluster quorum will be lost for the following Volumes: data.
2019-06-07 13:08:42,521+05 WARN [org.ovirt.engine.core.bll.MaintenanceNumberOfVdssCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-7) [e496c065-0c69-4f16-ba2c-3219fd5a8cc6] Validation of action 'MaintenanceNumberOfVdss' failed for user admin@internal-authz. Reasons: VAR__TYPE__HOST,VAR__ACTION__MAINTENANCE,VDS_CANNOT_MAINTENANCE_GLUSTER_QUORUM_CANNOT_BE_MET,$VolumesList data,$HostsList rhsqa-grafton8-nic2.lab.eng.blr.redhat.com
2019-06-07 13:08:43,428+05 ERROR [org.ovirt.engine.core.bll.hostdeploy.HostUpgradeCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-74) [e496c065-0c69-4f16-ba2c-3219fd5a8cc6] Host 'rhsqa-grafton8-nic2.lab.eng.blr.redhat.com' failed to move to maintenance mode. Upgrade process is terminated.
2019-06-07 13:08:43,435+05 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-74) [e496c065-0c69-4f16-ba2c-3219fd5a8cc6] EVENT_ID: VDS_MAINTENANCE_FAILED(17), Failed to switch Host rhsqa-grafton8-nic2.lab.eng.blr.redhat.com to Maintenance mode.
2019-06-07 13:08:44,451+05 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-52) [e496c065-0c69-4f16-ba2c-3219fd5a8cc6] EVENT_ID: HOST_UPGRADE_FAILED(841), Failed to upgrade Host rhsqa-grafton8-nic2.lab.eng.blr.redhat.com (User: admin@internal-authz).
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2963