Bug 1162998 - Unable to remove a failed node from a hosted-engine setup
Summary: Unable to remove a failed node from a hosted-engine setup
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: oVirt
Classification: Retired
Component: ovirt-hosted-engine-ha
Version: 3.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.5.1
Assignee: Doron Fediuck
QA Contact: Nikolai Sednev
URL: https://polarion.engineering.redhat.c...
Whiteboard: sla
Depends On:
Blocks: 1181672 1193058
Reported: 2014-11-12 08:06 UTC by Joop van de Wege
Modified: 2019-04-28 10:46 UTC

Fixed In Version: ovirt-3.5.1_rc1
Clone Of:
Environment:
Last Closed: 2015-05-11 06:30:05 UTC
oVirt Team: SLA
Embargoed:


Attachments
engine log around the put to maintenance event (700.30 KB, text/plain)
2014-11-12 08:06 UTC, Joop van de Wege
corresponding server log to go with engine.log (700.30 KB, text/plain)
2014-11-12 08:08 UTC, Joop van de Wege


Links
System | ID | Private | Priority | Status | Summary | Last Updated
oVirt gerrit | 35269 | 0 | master | MERGED | core: don't throw an exception, when moving unresponsive VDS to maintenance | Never
oVirt gerrit | 35654 | 0 | ovirt-engine-3.5 | MERGED | core: don't throw an exception, when moving unresponsive VDS to maintenance | Never

Description Joop van de Wege 2014-11-12 08:06:10 UTC
Created attachment 956589 [details]
engine log around the put to maintenance event

Description of problem:

If a node has a hardware failure, it isn't possible to remove that host using the webui.

Version-Release number of selected component (if applicable):
ovirt-3.5 (3.4 probably has the same problem)

How reproducible:
Always

Steps to Reproduce:
1. Create a hosted-engine setup with 2, but preferably 3, hosts.
2. Make sure the engine runs on host1 and that host1 is the SPM.
3. Make sure host2 and host3 are ready and not in maintenance.
4. Put host2 in maintenance, then pull the plug on host3. Host3 will go into 'non responsive'.
5. Try to put host3 into maintenance. It will fail.
6. Use 'Confirm host has rebooted' and try Maintenance again. It will also fail.

You're now stuck with host3. Host2 can be removed without problems whether it is online or not.

Actual results:
Stuck with a host that can't be removed from the webui

Expected results:
Be able to (force) remove dead hosts

Comment 1 Joop van de Wege 2014-11-12 08:08:42 UTC
Created attachment 956590 [details]
corresponding server log to go with engine.log

Comment 2 Joop van de Wege 2014-11-12 08:11:28 UTC
Workaround for the time being:
update vds_statistics set ha_configured=true,ha_score=1 where vds_id in (select vds_id from vds where vds_name = 'YOURHOST')

After that, it should be possible to put the host into maintenance and then remove it.
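
To make it easier to see which rows the workaround statement touches, here is a minimal sketch that runs the same UPDATE against a throwaway in-memory SQLite database. Everything except the statement itself is an assumption for illustration: the two-table schema is a drastically reduced stand-in for the real engine database (which is PostgreSQL), the IDs and starting values are made up, and apply_workaround is a hypothetical helper name. The host name is bound as a parameter instead of being pasted into the SQL text.

```python
import sqlite3


def apply_workaround(conn, host_name):
    """Run the workaround UPDATE for one host and return the number of rows changed.

    Hypothetical helper; the statement mirrors the one quoted in this comment.
    """
    cur = conn.execute(
        "UPDATE vds_statistics SET ha_configured = ?, ha_score = ? "
        "WHERE vds_id IN (SELECT vds_id FROM vds WHERE vds_name = ?)",
        (True, 1, host_name),
    )
    conn.commit()
    return cur.rowcount


# Assumed, heavily simplified schema and sample data (not the real engine schema).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE vds (vds_id TEXT PRIMARY KEY, vds_name TEXT);
    CREATE TABLE vds_statistics (
        vds_id TEXT PRIMARY KEY,
        ha_configured INTEGER,
        ha_score INTEGER
    );
    INSERT INTO vds VALUES ('id-2', 'host2'), ('id-3', 'host3');
    INSERT INTO vds_statistics VALUES ('id-2', 1, 2400), ('id-3', 0, 0);
    """
)

changed = apply_workaround(conn, "host3")
print("rows changed:", changed)
```

Checking the returned row count before retrying Maintenance is a cheap way to confirm that the host name matched exactly one host. Against a real engine database the same statement would be run with a PostgreSQL client, ideally after taking a backup.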

Comment 3 Roy Golan 2014-11-12 08:48:09 UTC
The root cause is a network exception thrown in the Maintenance command when it
attempts SetHaMaintenance on a host with a positive score. This aborts the
parent maintenance command with a misleading log message hinting that there are running VMs.

see Maintenance.java:59 [1]

To avoid that, we probably need to catch the exception and handle it
properly in case the hosted engine VM is running on that host.

[1] http://gerrit.ovirt.org/gitweb?p=ovirt-engine.git;a=blob;f=backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/MaintenanceVdsCommand.java
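
The idea of catching the exception instead of letting it abort the parent command can be sketched roughly as follows. This is not the engine code (which is Java) and not the merged patch; it is a small Python model under stated assumptions. Host, HostNetworkError, set_ha_maintenance and move_to_maintenance are all hypothetical names, and the sketch does not cover the case where the hosted engine VM is actually running on the failed host.

```python
import logging
from dataclasses import dataclass

log = logging.getLogger("maintenance_sketch")


class HostNetworkError(Exception):
    """Hypothetical stand-in for the network exception raised for an unreachable host."""


@dataclass
class Host:
    name: str
    reachable: bool
    ha_score: int
    status: str = "up"


def set_ha_maintenance(host: Host) -> None:
    """Hypothetical remote call to the host; fails when the host cannot be reached."""
    if not host.reachable:
        raise HostNetworkError(f"cannot reach {host.name}")


def move_to_maintenance(host: Host) -> bool:
    """Move a host to maintenance without letting an unreachable host abort the command."""
    if host.ha_score > 0:
        try:
            set_ha_maintenance(host)
        except HostNetworkError as exc:
            # Assumption: a dead host cannot be told anything, so log and carry on
            # instead of failing the whole maintenance operation.
            log.warning("Could not set HA maintenance on %s: %s", host.name, exc)
    host.status = "maintenance"
    return True


# Made-up example values: a host that was unplugged while it still had a positive score.
dead_host = Host(name="host3", reachable=False, ha_score=2400, status="non_responsive")
result = move_to_maintenance(dead_host)
print(result, dead_host.status)
```

The design choice shown here is only that the failure of the nested HA call is downgraded to a warning, so the outer operation can still complete for a host that is already unreachable.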

Comment 4 Sandro Bonazzola 2015-01-15 14:15:26 UTC
This is an automated message: 
This bug should be fixed in oVirt 3.5.1 RC1, moving to QA

Comment 5 Sandro Bonazzola 2015-01-21 16:06:16 UTC
oVirt 3.5.1 has been released. If problems still persist, please make note of it in this bug report.

Comment 6 Nikolai Sednev 2015-01-22 14:59:06 UTC
Works for me on these components:
sanlock-2.8-1.el6.x86_64
libvirt-client-0.10.2-46.el6_6.2.x86_64
qemu-kvm-rhev-0.12.1.2-2.448.el6.x86_64
mom-0.4.1-4.el6ev.noarch
vdsm-4.16.8.1-6.el6ev.x86_64
ovirt-hosted-engine-setup-1.2.1-9.el6ev.noarch
ovirt-host-deploy-1.3.0-2.el6ev.noarch
ovirt-hosted-engine-ha-1.2.4-5.el6ev.noarch
rhevm-3.5.0-0.30.el6ev.noarch
rhevm-guest-agent-common-1.0.10-2.el6ev.noarch

Comment 7 Roy Golan 2015-01-27 10:32:54 UTC
*** Bug 1181672 has been marked as a duplicate of this bug. ***

Comment 9 Abhishek Sahni 2018-05-24 11:49:41 UTC
Hello Team,

Is there any update on this bug?

I am unable to delete an oVirt node if the node is in "Non Responsive" mode.

