Description of problem:
-----------------------
When an upgrade of a RHVH host is initiated from the RHV Manager UI, the host is first moved into maintenance, redhat-virtualization-host-image-update is updated, and then the host is rebooted. As part of this upgrade/update procedure, if moving the host into maintenance fails, no proper events or messages are shown to the user.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
RHHI-V 1.5 (RHV 4.2.7 & RHGS 3.4.1)

How reproducible:
-----------------
Always

Steps to Reproduce:
-------------------
1. Stop one of the bricks of the volume on host1
2. Try to upgrade host2 from the RHV Manager UI

Actual results:
---------------
The upgrade initiated from the RHV Manager UI silently fails without throwing any errors or events.

Expected results:
-----------------
The upgrade should fail with a meaningful error or event, so that the user is aware of the reason behind the failure.

Additional info:
----------------
When the host is moved into maintenance directly (stopping the gluster service), proper error messages are thrown. The same should be implemented for the upgrade/update procedure initiated from the RHV Manager UI.
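For reference, a rough sketch of triggering the same upgrade through the REST API with the oVirt Python SDK (ovirtsdk4); the engine URL, credentials and host name are placeholders, and upgrade() is assumed to map to the same action as the UI "Upgrade" button. The action is accepted asynchronously, so the only place a failure reason could show up afterwards is the Events (audit log), which is exactly what is missing today.

import ovirtsdk4 as sdk

# Placeholder engine URL and credentials.
connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='password',
    ca_file='ca.pem',
)

hosts_service = connection.system_service().hosts_service()
host = hosts_service.list(search='name=host2')[0]

# Assumed to correspond to the UI "Upgrade" action; it returns once the
# UpgradeHostCommand is accepted, not when the upgrade finishes.
hosts_service.host_service(host.id).upgrade()

connection.close()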
The issue was most probably fixed by BZ1631215, which was released as part of RHV 4.2.8, so could you please retest with the updated version?
Hi Martin,

I tested with rhvh-4.3.0.5-0.20190305 and the issue still persists.

Pasting the comments from the cloned bug:

As mentioned in the steps executed, we can clearly see that the quorum will be lost if we upgrade Host2 (since a brick on Host1 is already down). However, when we try upgrading from the UI there is no warning such as "The quorum will be lost", and the upgrade also fails without any specific error event from which the user could infer the reason.
(In reply to bipin from comment #2)

We don't have such warnings and we can't have them. Those warnings/errors are part of moving the host to maintenance, where the quorum/healing status is checked [1]. But moving a host to maintenance is an asynchronous operation that can run for a long time, so users should just start the upgrade, and if moving the host to maintenance fails, they should be able to find the reason for the failure in the Events (audit log). If the above operation is possible (moving a 2nd host to maintenance, which would make the gluster storage read-only), then there is a bug in the gluster part of the engine, which should not allow such an operation.

If you want to upgrade multiple hosts at once, then I recommend using the ovirt.cluster-upgrade Ansible role [2], which upgrades hosts serially (the 2nd host is not moved to Maintenance until the 1st one is already Up).

[1] https://github.com/oVirt/ovirt-engine/blob/master/backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/MaintenanceNumberOfVdssCommand.java#L455
[2] https://github.com/ovirt/ovirt-ansible-cluster-upgrade
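To illustrate what I mean by checking the Events, here is a minimal sketch using the oVirt Python SDK (ovirtsdk4) with placeholder connection details; it lists recent audit-log entries and picks out the maintenance/upgrade failures. The event codes 17 (VDS_MAINTENANCE_FAILED) and 841 (HOST_UPGRADE_FAILED) are the ones the engine logs for this scenario.

import ovirtsdk4 as sdk

# Placeholder engine URL and credentials.
connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='password',
    ca_file='ca.pem',
)

events_service = connection.system_service().events_service()

# Audit-log codes used by the engine for these failures:
#   17  = VDS_MAINTENANCE_FAILED
#   841 = HOST_UPGRADE_FAILED
for event in events_service.list(max=100):
    if event.code in (17, 841):
        print(event.time, event.code, event.description)

connection.close()

The same entries are visible in the Events tab of the Administration Portal.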
(In reply to Martin Perina from comment #3)

From the logs in Bug 1649502, in engine.log:

2019-02-26 15:18:03,333+05 WARN [org.ovirt.engine.core.bll.MaintenanceNumberOfVdssCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-3) [d4185969-4c15-435b-b9d6-220e84def4c8] Validation of action 'MaintenanceNumberOfVdss' failed for user admin@internal-authz. Reasons: VAR__TYPE__HOST,VAR__ACTION__MAINTENANCE,VDS_CANNOT_MAINTENANCE_UNSYNCED_ENTRIES_PRESENT_IN_GLUSTER_BRICKS,$BricksList [rhsqa-grafton8-nic2.lab.eng.blr.redhat.com:/gluster_bricks/data/data],$HostsList rhsqa-grafton8-nic2.lab.eng.blr.redhat.com
2019-02-26 15:18:04,201+05 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-805) [] EVENT_ID: HOST_UPGRADE_STARTED(840), Host rhsqa-grafton8-nic2.lab.eng.blr.redhat.com upgrade was started (User: admin@internal-authz).
2019-02-26 15:18:05,696+05 ERROR [org.ovirt.engine.core.bll.hostdeploy.HostUpgradeCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-65) [d4185969-4c15-435b-b9d6-220e84def4c8] Host 'rhsqa-grafton8-nic2.lab.eng.blr.redhat.com' failed to move to maintenance mode. Upgrade process is terminated.
2019-02-26 15:18:05,719+05 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-65) [d4185969-4c15-435b-b9d6-220e84def4c8] EVENT_ID: VDS_MAINTENANCE_FAILED(17), Failed to switch Host rhsqa-grafton8-nic2.lab.eng.blr.redhat.com to Maintenance mode.
2019-02-26 15:18:06,780+05 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-79) [d4185969-4c15-435b-b9d6-220e84def4c8] EVENT_ID: HOST_UPGRADE_FAILED(841), Failed to upgrade Host rhsqa-grafton8-nic2.lab.eng.blr.redhat.com (User: admin@internal-authz).
The issue here seems to be that validation failures are not logged in the audit log?
Yes, that makes sense; we will need to tie the UpgradeHost and MaintenanceHost commands together much more tightly.
I reproduced the issue - both failures appear in the Events with the following errors:

Failed to upgrade Host <hostname> (User: <username>).
Failed to switch Host <hostname> to Maintenance mode.
*** This bug has been marked as a duplicate of bug 1679399 ***
Tested with ovirt-engine-4.3.4.3-0.1.el7.noarch and the fix works, so moving the bug to verified state.

Steps:
=====
1. Deploy RHHI-V
2. Bring a brick down on one of the hosts, say host1
3. Now try to upgrade host2; it fails with a proper error event

Logs:
====
2019-06-07 12:38:29,371+05 INFO [org.ovirt.engine.core.bll.hostdeploy.UpgradeHostCommand] (default task-126) [682b85e9-8120-4f7e-bca8-4e80a0eb7843] Running command: UpgradeHostCommand internal: false. Entities affected : ID: d5cb1684-e96d-49ab-a095-0234f4c1a017 Type: VDSAction group EDIT_HOST_CONFIGURATION with role type ADMIN
2019-06-07 12:38:29,395+05 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-126) [] EVENT_ID: HOST_UPGRADE_STARTED(840), Host rhsqa-grafton8-nic2.lab.eng.blr.redhat.com upgrade was started (User: admin@internal-authz).
2019-06-07 12:38:29,460+05 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-commandCoordinator-Thread-6) [682b85e9-8120-4f7e-bca8-4e80a0eb7843] EVENT_ID: GENERIC_ERROR_MESSAGE(14,001), Cannot switch the following Host(s) to Maintenance mode: rhsqa-grafton8-nic2.lab.eng.blr.redhat.com. Gluster quorum will be lost for the following Volumes: vmstore.
2019-06-07 12:38:29,460+05 WARN [org.ovirt.engine.core.bll.MaintenanceNumberOfVdssCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-6) [682b85e9-8120-4f7e-bca8-4e80a0eb7843] Validation of action 'MaintenanceNumberOfVdss' failed for user admin@internal-authz. Reasons: VAR__TYPE__HOST,VAR__ACTION__MAINTENANCE,VDS_CANNOT_MAINTENANCE_GLUSTER_QUORUM_CANNOT_BE_MET,$VolumesList vmstore,$HostsList rhsqa-grafton8-nic2.lab.eng.blr.redhat.com
2019-06-07 12:38:29,948+05 ERROR [org.ovirt.engine.core.bll.hostdeploy.HostUpgradeCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-67) [682b85e9-8120-4f7e-bca8-4e80a0eb7843] Host 'rhsqa-grafton8-nic2.lab.eng.blr.redhat.com' failed to move to maintenance mode. Upgrade process is terminated.
2019-06-07 12:38:31,104+05 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-90) [682b85e9-8120-4f7e-bca8-4e80a0eb7843] EVENT_ID: HOST_UPGRADE_FAILED(841), Failed to upgrade Host rhsqa-grafton8-nic2.lab.eng.blr.redhat.com (User: admin@internal-authz).
2019-06-07 12:38:31,505+05 INFO [org.ovirt.engine.core.vdsbroker.gluster.GlusterTasksListVDSCommand] (DefaultQuartzScheduler2) [98d833e] START, GlusterTasksListVDSCommand(HostName = rhsqa-grafton8-nic2.lab.eng.blr.redhat.com, VdsIdVDSCommandParametersBase:{hostId='d5cb1684-e96d-49ab-a095-0234f4c1a017'}), log id: 316ed0f0
2019-06-07 13:08:42,481+05 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-132) [] EVENT_ID: HOST_UPGRADE_STARTED(840), Host rhsqa-grafton8-nic2.lab.eng.blr.redhat.com upgrade was started (User: admin@internal-authz).
2019-06-07 13:08:42,521+05 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-commandCoordinator-Thread-7) [e496c065-0c69-4f16-ba2c-3219fd5a8cc6] EVENT_ID: GENERIC_ERROR_MESSAGE(14,001), Cannot switch the following Host(s) to Maintenance mode: rhsqa-grafton8-nic2.lab.eng.blr.redhat.com. Gluster quorum will be lost for the following Volumes: data.
2019-06-07 13:08:42,521+05 WARN [org.ovirt.engine.core.bll.MaintenanceNumberOfVdssCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-7) [e496c065-0c69-4f16-ba2c-3219fd5a8cc6] Validation of action 'MaintenanceNumberOfVdss' failed for user admin@internal-authz. Reasons: VAR__TYPE__HOST,VAR__ACTION__MAINTENANCE,VDS_CANNOT_MAINTENANCE_GLUSTER_QUORUM_CANNOT_BE_MET,$VolumesList data,$HostsList rhsqa-grafton8-nic2.lab.eng.blr.redhat.com
2019-06-07 13:08:43,428+05 ERROR [org.ovirt.engine.core.bll.hostdeploy.HostUpgradeCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-74) [e496c065-0c69-4f16-ba2c-3219fd5a8cc6] Host 'rhsqa-grafton8-nic2.lab.eng.blr.redhat.com' failed to move to maintenance mode. Upgrade process is terminated.
2019-06-07 13:08:43,435+05 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-74) [e496c065-0c69-4f16-ba2c-3219fd5a8cc6] EVENT_ID: VDS_MAINTENANCE_FAILED(17), Failed to switch Host rhsqa-grafton8-nic2.lab.eng.blr.redhat.com to Maintenance mode.
2019-06-07 13:08:44,451+05 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-52) [e496c065-0c69-4f16-ba2c-3219fd5a8cc6] EVENT_ID: HOST_UPGRADE_FAILED(841), Failed to upgrade Host rhsqa-grafton8-nic2.lab.eng.blr.redhat.com (User: admin@internal-authz).
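For reference, the relevant entries can be pulled out of engine.log with a small throwaway script like the one below (assuming the default /var/log/ovirt-engine/engine.log location; the event-ID names are the ones shown in the log excerpt above):

import re

# Audit-log event IDs of interest, as they appear in engine.log.
PATTERN = re.compile(
    r'EVENT_ID: (GENERIC_ERROR_MESSAGE|VDS_MAINTENANCE_FAILED|HOST_UPGRADE_FAILED)'
)

with open('/var/log/ovirt-engine/engine.log') as log:
    for line in log:
        if PATTERN.search(line):
            print(line.rstrip())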
This bugzilla is included in the oVirt 4.3.4 release, published on June 11th 2019. Since the problem described in this bug report should be resolved in the oVirt 4.3.4 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.
*** Bug 1721111 has been marked as a duplicate of this bug. ***