Description of problem
======================
Given I have a cluster with three hosts
And all hosts are HE deployed
When I pick a host that is not hosting the HE VM
And I put it into maintenance
And I reinstall it (undeploy HE)
And when it is activated after the undeploy I put it back into maintenance
And I reinstall it (deploy HE)
Then when it is activated it remains with a score of 0
And the hosted-engine --vm-status command reports the host is in
score=0
maintenance=True
state=LocalMaintenance

Version-Release number of selected component (if applicable)
============================================================
ovirt-engine-4.4.1.8-0.7.el8ev.noarch
vdsm-4.40.22-1.el8ev.x86_64
ovirt-hosted-engine-ha-2.4.4-1.el8ev.noarch

How reproducible
================
100%

Steps to Reproduce
==================
1. Pick a host that is HE deployed and is not running the HE VM and put it into maintenance.
2. Reinstall the host (undeploy HE + activate after reinstall).
3. When the host finishes reinstall and it's active put it back into maintenance.
4. Reinstall the host (deploy HE + activate after reinstall).

Actual results
==============
The host is activated once the deploy finishes, but the host has a score of 0, and the hosted-engine --vm-status command reports the host is in:
score=0
maintenance=True
state=LocalMaintenance

Expected results
================
The host should not have a score of 0 and should not be reported as in a state of local maintenance (this is in contrast to the fact RHV reports the host is up).

Additional info
===============
A WA for this bug is putting the host into maintenance and then activating it, which results in (after several seconds):
score=3400
maintenance=False

Or simply running:
# hosted-engine --set-maintenance --mode=none
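The mismatch described in the report (engine shows the host as up, HA agent still reports local maintenance with a score of 0) can be expressed as a small check. This is only a sketch: it assumes the status has already been reduced to the `key=value` tokens quoted in the report, it does not parse the full `hosted-engine --vm-status` output, and the function names and the `engine_reports_up` parameter are hypothetical.

```python
def parse_status_tokens(text):
    """Parse whitespace-separated key=value tokens into a dict of strings."""
    fields = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens that are not key=value pairs
            fields[key] = value
    return fields


def is_stuck_in_local_maintenance(fields, engine_reports_up):
    """True when the engine says the host is up but the HA agent still
    reports local maintenance with a score of 0 (the state in this report)."""
    in_local_maintenance = fields.get("maintenance") == "True"
    zero_score = fields.get("score") == "0"
    return engine_reports_up and in_local_maintenance and zero_score


# Values taken from the "Actual results" and "Additional info" sections.
reported = parse_status_tokens("score=0 maintenance=True state=LocalMaintenance")
after_workaround = parse_status_tokens("score=3400 maintenance=False")

print(is_stuck_in_local_maintenance(reported, engine_reports_up=True))
print(is_stuck_in_local_maintenance(after_workaround, engine_reports_up=True))
```

With the values from the report, the first check flags the host as stuck and the second, taken after the workaround, does not.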
didn't make it in time for 4.4.2, deferring
I'm closing this as not a bug. Hosted Engine local maintenance and Host Maintenance mode from oVirt Engine are separate concepts. If a host was put into local maintenance for hosted engine, it will rightfully be kept in that state until hosted engine local maintenance is lifted. This is by design.
Reference documentation: https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.4/html/administration_guide/chap-administering_the_self-hosted_engine
Actually it seems a bug is there.
This step:
2. Reinstall the host (undeploy HE + activate after reinstall).
should have removed the maintenance state of the host from the storage and engine.

This step is done by ansible deploy, moving to engine.
(In reply to Sandro Bonazzola from comment #6)
> Actually it seems a bug is there.
> This step:
> 2. Reinstall the host (undeploy HE + activate after reinstall).
> should have removed the maintenance state of the host from the storage and
> engine.
>
> This step is done by ansible deploy, moving to engine.

On top of the undeploy leftover issue, I would like to point out that stage {4. Reinstall the host (deploy HE + activate after reinstall).} seems inconsistent.
Normally, when an HE-deployed host is put into maintenance via oVirt, HE local maintenance follows, but in this scenario it does not.

If (maintenance in oVirt) -> (HE maintenance=True)
and (activate in oVirt) -> (HE maintenance=False)
then 'activate' post reinstall (deploy HE) should work the same way, meaning the HE agent should be informed that the host is no longer in maintenance (as happens in the flows mentioned above).
This specific flow is inconsistent with those flows.
Unless I'm missing something, local maintenance of the host should be automatically turned off during activation:

https://github.com/oVirt/ovirt-engine/blob/master/backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/InitVdsOnUpCommand.java#L161
https://github.com/oVirt/ovirt-engine/blob/master/backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/hostedengine/HostedEngineHelper.java#L120
https://github.com/oVirt/ovirt-engine/blob/master/backend/manager/modules/vdsbroker/src/main/java/org/ovirt/engine/core/vdsbroker/vdsbroker/SetHaMaintenanceModeVDSCommand.java#L29
https://github.com/oVirt/vdsm/blob/master/lib/vdsm/API.py#L1658

So it seems to me like some issue in the hosted engine HA part, but OK, let's first verify that a call to disable local maintenance is passing from engine through VDSM to the HA daemon.
(In reply to Martin Perina from comment #8)
> Unless I'm missing something local maintenance of the host should be
> automatically turned off during activation:
>
> https://github.com/oVirt/ovirt-engine/blob/master/backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/InitVdsOnUpCommand.java#L161
> https://github.com/oVirt/ovirt-engine/blob/master/backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/hostedengine/HostedEngineHelper.java#L120
> https://github.com/oVirt/ovirt-engine/blob/master/backend/manager/modules/vdsbroker/src/main/java/org/ovirt/engine/core/vdsbroker/vdsbroker/SetHaMaintenanceModeVDSCommand.java#L29

What happens here is that getVds().getHighlyAvailableIsConfigured() returns 'false'. Now, I need to understand why. Compensation, perhaps? Will see.

> https://github.com/oVirt/vdsm/blob/master/lib/vdsm/API.py#L1658
>
> So it seems to me like some issue in hosted engine HA part, but OK, let's
> verify first, that a call to disable local maintenance is passing from
> engine through VDSM to HA daemon
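The observation above can be illustrated with a toy model. This is not engine code: the `Host` class, its fields, and `activate()` are hypothetical names, and the only behavior taken from the thread is that the call to disable local maintenance is skipped when the "HA is configured" flag reads false. Why the flag is false and how the engine fix addresses it are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    ha_configured: bool      # what the engine believes about HE on this host
    local_maintenance: bool  # what the HA agent has stored for this host


def activate(host):
    """Model host activation: local maintenance is cleared only when the
    engine believes HA is configured. Returns True if the call was made."""
    if not host.ha_configured:
        # Guard fails, so the "disable local maintenance" call is skipped.
        return False
    host.local_maintenance = False
    return True


# Host right after "reinstall (deploy HE)": the agent still holds the
# maintenance state set earlier, and the engine's flag reads false.
stale = Host("host-stale-flag", ha_configured=False, local_maintenance=True)
# Same situation, but with the flag reading true at activation time.
fresh = Host("host-fresh-flag", ha_configured=True, local_maintenance=True)

for host in (stale, fresh):
    called = activate(host)
    print(host.name, "call made:", called, "local_maintenance:", host.local_maintenance)
```

In this model the host with the false flag comes out of activation still in local maintenance, which matches the reported symptom; the host with the true flag is cleared as expected.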
Verified on
===========
vdsm-4.40.32-1.el8ev.x86_64
ovirt-ansible-engine-setup-1.2.4-1.el8ev.noarch
ovirt-ansible-hosted-engine-setup-1.1.8-1.el8ev.noarch
ovirt-engine-4.4.3.5-0.5.el8ev.noarch
This bugzilla is included in oVirt 4.4.3 release, published on November 10th 2020. Since the problem described in this bug report should be resolved in oVirt 4.4.3 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.
About comment #7 and comment #8, was a bug opened?
If not, before opening please note that hosted engine local maintenance and host maintenance are 2 completely different concepts.

local maintenance[1]:
The high-availability agent on the node issuing the command is disabled from monitoring the state of the Manager virtual machine. The node is exempt from hosting the Manager virtual machine while in local maintenance mode; if hosting the Manager virtual machine when placed into this mode, the Manager will migrate to another node, provided there is one available. The local maintenance mode is recommended when applying system changes or updates to a self-hosted engine node.

host maintenance[2]:
Many common maintenance tasks, including network configuration and deployment of software updates, require that hosts be placed into maintenance mode. Hosts should be placed into maintenance mode before any event that might cause VDSM to stop working properly, such as a reboot, or issues with networking or storage.
When a host is placed into maintenance mode the Red Hat Virtualization Manager attempts to migrate all running virtual machines to alternative hosts. The standard prerequisites for live migration apply, in particular there must be at least one active host in the cluster with capacity to run the migrated virtual machines.

[1] https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.1/html/self-hosted_engine_guide/chap-maintenance_and_upgrading_resources
[2] https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.4/html-single/administration_guide/index#Moving_a_host_to_maintenance_mode
(In reply to Sandro Bonazzola from comment #12)
> About comment #7 and comment #8, was a bug opened?
> If not, before opening please note that hosted engine local maintenance and
> host maintenance are 2 completely different concepts.

We synchronize the host Maintenance status with the HA Local Maintenance status (each time a host enters Maintenance mode we also set Local Maintenance for it, and each time a host is activated we disable Local Maintenance), but there was an issue on the engine side, which was fixed by https://gerrit.ovirt.org/111367
(In reply to Sandro Bonazzola from comment #12)
> About comment #7 and comment #8, was a bug opened?
> If not, before opening please note that hosted engine local maintenance and
> host maintenance are 2 completely different concepts.

I'm well aware these are two different concepts. However, once oVirt chose to couple the two (see comment #7), in my opinion there should be some measure of accountability for the described 'bug'.
The flow for activating a host post reinstall should be no different from activating a host after it has been in local maintenance.
From the customer's perspective, it is confusing to expect the host to have a score after HE was deployed and then realize that the host will keep a score of 0 until the admin takes explicit action. This gives a new meaning to 'activate post reinstall' in the HE path.