Bug 1162998
Field | Value
---|---
Summary: | Unable to remove a failed node from a hosted-engine setup
Product: | [Retired] oVirt
Component: | ovirt-hosted-engine-ha
Version: | 3.5
Status: | CLOSED CURRENTRELEASE
Severity: | high
Priority: | unspecified
Reporter: | Joop van de Wege <jvandewege>
Assignee: | Doron Fediuck <dfediuck>
QA Contact: | Nikolai Sednev <nsednev>
CC: | abhishek, agkesos, bugs, dfediuck, ecohen, iheim, istein, lsurette, rbalakri, rgolan, sbonazzo, thomas.keppler, yeylon
Keywords: | Reopened, Triaged
Target Release: | 3.5.1
Hardware: | Unspecified
OS: | Unspecified
URL: | https://polarion.engineering.redhat.com/polarion/#/project/RHEVM3/workitem?id=RHEVM-15005
Whiteboard: | sla
Fixed In Version: | ovirt-3.5.1_rc1
Doc Type: | Bug Fix
Last Closed: | 2015-05-11 06:30:05 UTC
Type: | Bug
oVirt Team: | SLA
Bug Blocks: | 1181672, 1193058
Created attachment 956590: corresponding server log to go with engine.log
Workaround for the time being:

update vds_statistics set ha_configured=true,ha_score=1 where vds_id in (select vds_id from vds where vds_name = 'YOURHOST')

After that, it should be possible to put the host into maintenance and then remove it.

The root cause is a network exception thrown in the Maintenance command while it attempts SetHaMaintenance on a host with a positive score. This aborts the parent maintenance command with a misleading log message hinting that there are running VMs. See Maintenance.java:59 [1].

To avoid that, we probably need to catch the exception and handle it properly in case the hosted engine VM is running on that host.

[1] http://gerrit.ovirt.org/gitweb?p=ovirt-engine.git;a=blob;f=backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/MaintenanceVdsCommand.java

This is an automated message: This bug should be fixed in oVirt 3.5.1 RC1, moving to QA

oVirt 3.5.1 has been released. If problems still persist, please make note of it in this bug report.

Works for me on these components:
sanlock-2.8-1.el6.x86_64
libvirt-client-0.10.2-46.el6_6.2.x86_64
qemu-kvm-rhev-0.12.1.2-2.448.el6.x86_64
mom-0.4.1-4.el6ev.noarch
vdsm-4.16.8.1-6.el6ev.x86_64
ovirt-hosted-engine-setup-1.2.1-9.el6ev.noarch
ovirt-host-deploy-1.3.0-2.el6ev.noarch
ovirt-hosted-engine-ha-1.2.4-5.el6ev.noarch
rhevm-3.5.0-0.30.el6ev.noarch
rhevm-guest-agent-common-1.0.10-2.el6ev.noarch

*** Bug 1181672 has been marked as a duplicate of this bug. ***

Hello Team, is there any update on this bug? I am unable to delete an oVirt node if the node is in "Non Responsive" mode.
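The workaround statement quoted in the comments above can be tried out safely before touching a real engine database. The following is a minimal sketch, not the engine's own tooling: it assumes SQLite as a stand-in for the engine's PostgreSQL database, and the mock tables contain only the columns the workaround names (`vds_id`, `vds_name`, `ha_configured`, `ha_score`). The helper names, host IDs, and starting values are hypothetical. The one deliberate change from the quoted statement is that the host name is bound as a parameter rather than pasted into the SQL text.

```python
import sqlite3

# Same statement as the workaround, with the host name bound as a parameter.
# SQLite has no boolean type, so 1 stands in for PostgreSQL's `true` here.
WORKAROUND_SQL = """
UPDATE vds_statistics
   SET ha_configured = 1, ha_score = 1
 WHERE vds_id IN (SELECT vds_id FROM vds WHERE vds_name = ?)
"""


def make_mock_engine_db():
    """Build an in-memory stand-in with only the columns the workaround names."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE vds (vds_id TEXT PRIMARY KEY, vds_name TEXT)")
    conn.execute(
        "CREATE TABLE vds_statistics ("
        "vds_id TEXT PRIMARY KEY, ha_configured INTEGER, ha_score INTEGER)"
    )
    # Hypothetical hosts and starting values, for illustration only.
    conn.executemany(
        "INSERT INTO vds VALUES (?, ?)",
        [("id-2", "host2"), ("id-3", "host3")],
    )
    conn.executemany(
        "INSERT INTO vds_statistics VALUES (?, ?, ?)",
        [("id-2", 1, 2400), ("id-3", 0, 0)],
    )
    return conn


def apply_workaround(conn, host_name):
    """Run the workaround for one host and return the number of rows updated."""
    cursor = conn.execute(WORKAROUND_SQL, (host_name,))
    conn.commit()
    return cursor.rowcount
```

A return value of 0 from `apply_workaround` would suggest that the host name did not match any row in `vds`, which is worth checking before retrying the maintenance action. Against the real engine database the same statement would be run through a PostgreSQL client, with `true` in place of `1`.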
Created attachment 956589: engine log around the put to maintenance event

Description of problem:
If a node has a hardware failure, it isn't possible to remove that host using the webui.

Version-Release number of selected component (if applicable):
ovirt-3.5 (3.4 probably has the same problem)

How reproducible:
Always

Steps to Reproduce:
1. Create a hosted-engine setup with 2, but preferably 3, hosts.
2. Make sure the engine runs on host1 and that host1 is the SPM.
3. Make sure host2 and host3 are ready and not in maintenance.
4. Put host2 in maintenance and pull the plug on host3. Host3 will go into 'non responsive'.
5. Now try to put host3 into maintenance. It will fail.
6. Use 'Confirm host has rebooted' and try Maintenance again. It will also fail.

You're now stuck with host3. Host2 can be removed without problems whether it is online or not.

Actual results:
Stuck with a host that can't be removed from the webui.

Expected results:
Be able to (force) remove dead hosts.