Bug 1859586
| Summary: | [HE] Re-deploying HE on host leaves host in state=LocalMaintenance with score=0 indefinitely | | |
|---|---|---|---|
| Product: | [oVirt] ovirt-engine | Reporter: | msheena |
| Component: | BLL.Infra | Assignee: | Artur Socha <asocha> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | msheena |
| Severity: | medium | Docs Contact: | |
| Priority: | high | | |
| Version: | 4.4.1.8 | CC: | bugs, mburman, mperina, pmatyas, sbonazzo |
| Target Milestone: | ovirt-4.4.3 | Keywords: | Automation, Reopened |
| Target Release: | --- | Flags: | pm-rhel: ovirt-4.4+, pmatyas: testing_plan_complete+ |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | ovirt-engine-4.4.3.5 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-11-11 06:39:32 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Infra | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1702016 | | |
Description (msheena, 2020-07-22 14:05:54 UTC)
Didn't make it in time for 4.4.2, deferring.

I'm closing this as not a bug. Hosted Engine local maintenance and Host Maintenance mode from oVirt Engine are separate concepts. If a host was put into local maintenance for hosted engine, it will rightfully be kept in that state until hosted engine local maintenance is lifted. This is by design.

Reference documentation: https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.4/html/administration_guide/chap-administering_the_self-hosted_engine

Actually it seems a bug is there. This step:

2. Reinstall the host (undeploy HE + activate after reinstall).

should have removed the maintenance state of the host from the storage and the engine. This step is done by the ansible deploy; moving to engine.

(In reply to Sandro Bonazzola from comment #6)

On top of the undeploy leftover issue, I would like to point out that the stage {4. Reinstall the host (deploy HE + activate after reinstall).} seems inconsistent. Normally, when a host that has HE deployed is put into maintenance via oVirt, HE local maintenance follows suit, but in this scenario it does not.

If (maintenance in oVirt) -> (HE maintenance=True) and (activate in oVirt) -> (HE maintenance=False), then I find that 'activate' after reinstall (deploy HE) should work the same way, meaning the HE agent should be informed that the host is no longer in maintenance (as happens in the flows mentioned above). This specific flow seems inconsistent with the other mentioned flows.

Unless I'm missing something, local maintenance of the host should be automatically turned off during activation:

https://github.com/oVirt/ovirt-engine/blob/master/backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/InitVdsOnUpCommand.java#L161
https://github.com/oVirt/ovirt-engine/blob/master/backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/hostedengine/HostedEngineHelper.java#L120
https://github.com/oVirt/ovirt-engine/blob/master/backend/manager/modules/vdsbroker/src/main/java/org/ovirt/engine/core/vdsbroker/vdsbroker/SetHaMaintenanceModeVDSCommand.java#L29
https://github.com/oVirt/vdsm/blob/master/lib/vdsm/API.py#L1658

So it seems to me like some issue in the hosted engine HA part, but OK, let's first verify that a call to disable local maintenance passes from the engine through VDSM to the HA daemon.

(In reply to Martin Perina from comment #8)

What happens here is that getVds().getHighlyAvailableIsConfigured() returns 'false'. Now I need to understand why. Compensation, perhaps? Will see.
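To make the suspected failure path easier to follow, here is a minimal Java sketch of the activation step being discussed. It is not the actual ovirt-engine code: only getHighlyAvailableIsConfigured() and the "set HA maintenance mode" VDS call correspond to the files linked in comment #8; every other name is hypothetical.

```java
// Minimal, illustrative sketch (not the real ovirt-engine implementation) of the
// activation step discussed above. Only getHighlyAvailableIsConfigured() and the
// "set HA maintenance mode" call mirror the linked sources; all other names are
// hypothetical stand-ins.
public class ActivationFlowSketch {

    enum HaMaintenanceMode { LOCAL, GLOBAL }

    /** Hypothetical stand-in for the engine's cached VDS (host) object. */
    static class Host {
        final String name;
        final boolean highlyAvailableIsConfigured; // the flag observed to be 'false'
        Host(String name, boolean haConfigured) {
            this.name = name;
            this.highlyAvailableIsConfigured = haConfigured;
        }
        boolean getHighlyAvailableIsConfigured() { return highlyAvailableIsConfigured; }
    }

    /** Hypothetical stand-in for SetHaMaintenanceModeVDSCommand -> VDSM -> ovirt-ha-agent. */
    static void setHaMaintenanceMode(Host host, HaMaintenanceMode mode, boolean enabled) {
        System.out.printf("VDSM call on %s: %s maintenance = %s%n", host.name, mode, enabled);
    }

    /** Sketch of what the engine is expected to do when a host moves to Up. */
    static void onHostUp(Host host) {
        if (host.getHighlyAvailableIsConfigured()) {
            // Intended behaviour: lift HA local maintenance on activation.
            setHaMaintenanceMode(host, HaMaintenanceMode.LOCAL, false);
        }
        // Reported symptom: after "undeploy HE + reinstall + deploy HE" the flag reads
        // false, so the call above is skipped and the host stays in LocalMaintenance
        // with score=0 until an admin clears it manually.
    }

    public static void main(String[] args) {
        onHostUp(new Host("redeployed-he-host", false)); // no call issued, maintenance never lifted
        onHostUp(new Host("healthy-he-host", true));     // prints the VDSM call
    }
}
```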
Verified on
===========
vdsm-4.40.32-1.el8ev.x86_64
ovirt-ansible-engine-setup-1.2.4-1.el8ev.noarch
ovirt-ansible-hosted-engine-setup-1.1.8-1.el8ev.noarch
ovirt-engine-4.4.3.5-0.5.el8ev.noarch

This bugzilla is included in the oVirt 4.4.3 release, published on November 10th 2020. Since the problem described in this bug report should be resolved in oVirt 4.4.3, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.

About comment #7 and comment #8, was a bug opened? If not, before opening please note that hosted engine local maintenance and host maintenance are two completely different concepts.

Local maintenance[1]: The high-availability agent on the node issuing the command is disabled from monitoring the state of the Manager virtual machine. The node is exempt from hosting the Manager virtual machine while in local maintenance mode; if it is hosting the Manager virtual machine when placed into this mode, the Manager will migrate to another node, provided one is available. The local maintenance mode is recommended when applying system changes or updates to a self-hosted engine node.

Host maintenance[2]: Many common maintenance tasks, including network configuration and deployment of software updates, require that hosts be placed into maintenance mode. Hosts should be placed into maintenance mode before any event that might cause VDSM to stop working properly, such as a reboot, or issues with networking or storage. When a host is placed into maintenance mode, the Red Hat Virtualization Manager attempts to migrate all running virtual machines to alternative hosts. The standard prerequisites for live migration apply; in particular, there must be at least one active host in the cluster with capacity to run the migrated virtual machines.

[1] https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.1/html/self-hosted_engine_guide/chap-maintenance_and_upgrading_resources
[2] https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.4/html-single/administration_guide/index#Moving_a_host_to_maintenance_mode
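As an illustration of why the two states can diverge, the following hypothetical Java model (not an oVirt API) treats them as two independent flags kept by different components, which is what the definitions above describe:

```java
// Hypothetical model of the two independent maintenance concepts described above.
// These types only mirror the documented behaviour; they are not part of any oVirt API.
public class MaintenanceConceptsSketch {

    /** Hosted engine local maintenance: tracked by the HA agent on the host itself.
     *  While enabled, the host is exempt from hosting the Manager VM and reports score 0. */
    static class HostedEngineHaState {
        boolean localMaintenance;
        int score() { return localMaintenance ? 0 : 3400; } // 3400: the agent's full score
    }

    /** Engine-side host maintenance: managed by the Manager. Entering it migrates
     *  all running VMs away from the host before maintenance work is performed. */
    static class EngineHostState {
        boolean maintenance;
    }

    public static void main(String[] args) {
        HostedEngineHaState ha = new HostedEngineHaState();
        EngineHostState engine = new EngineHostState();

        // The two flags live in different components and never clear each other on
        // their own; the engine has to synchronize them explicitly (see the next comment).
        ha.localMaintenance = true;
        engine.maintenance = false;
        System.out.println("engine maintenance=" + engine.maintenance
                + ", HA local maintenance=" + ha.localMaintenance
                + ", HA score=" + ha.score());
    }
}
```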
(In reply to Sandro Bonazzola from comment #12)

We synchronize the host Maintenance status with the HA Local Maintenance status (each time a host enters Maintenance mode we also set Local Maintenance for it, and each time a host is activated we disable Local Maintenance), but there was an issue on the engine side which was fixed by https://gerrit.ovirt.org/111367

(In reply to Sandro Bonazzola from comment #12)

I'm quite aware these are two different concepts; however, once oVirt chose to create a coupling between the two (see comment #7), in my opinion there should be some measure of accountability for the described 'bug'. The flow for activating a host after reinstall should be no different from activating a host after it has been in local maintenance. From the customer's perspective, it is confusing to expect the host to have a score after HE was deployed and then realize that the host will keep a score of 0 until the admin takes an explicit action - this gives a new meaning to 'activate post reinstall' in the HE path.
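Put as a sketch, the synchronization described in Martin Perina's comment above looks roughly like this; the names are hypothetical and assume only the behaviour stated in that comment, not the real engine command classes:

```java
// Hypothetical sketch of the host <-> HA local maintenance synchronization described
// above; names are illustrative, not the real ovirt-engine command classes.
public class MaintenanceSyncSketch {

    interface HaAgentClient {
        // Conceptually: SetHaMaintenanceModeVDSCommand -> VDSM -> ovirt-ha-agent
        void setLocalMaintenance(String hostId, boolean enabled);
    }

    private final HaAgentClient haAgent;

    public MaintenanceSyncSketch(HaAgentClient haAgent) {
        this.haAgent = haAgent;
    }

    /** Host is moved to Maintenance in the engine: mirror it to HA local maintenance. */
    public void onHostMaintenance(String hostId, boolean heConfigured) {
        if (heConfigured) {
            haAgent.setLocalMaintenance(hostId, true);
        }
    }

    /** Host is activated in the engine: lift HA local maintenance again.
     *  The bug tracked here is the case where this half of the coupling did not fire
     *  after re-deploying HE, which https://gerrit.ovirt.org/111367 addresses on the
     *  engine side. */
    public void onHostActivate(String hostId, boolean heConfigured) {
        if (heConfigured) {
            haAgent.setLocalMaintenance(hostId, false);
        }
    }

    public static void main(String[] args) {
        MaintenanceSyncSketch sync = new MaintenanceSyncSketch(
                (hostId, enabled) -> System.out.println(hostId + ": local maintenance=" + enabled));
        sync.onHostMaintenance("he-host-1", true); // he-host-1: local maintenance=true
        sync.onHostActivate("he-host-1", true);    // he-host-1: local maintenance=false
    }
}
```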