Bug 1162998

Summary: Unable to remove a failed node from a hosted-engine setup
Product: [Retired] oVirt
Reporter: Joop van de Wege <jvandewege>
Component: ovirt-hosted-engine-ha
Assignee: Doron Fediuck <dfediuck>
Status: CLOSED CURRENTRELEASE
QA Contact: Nikolai Sednev <nsednev>
Severity: high
Docs Contact:
Priority: unspecified
Version: 3.5
CC: abhishek, agkesos, bugs, dfediuck, ecohen, iheim, istein, lsurette, rbalakri, rgolan, sbonazzo, thomas.keppler, yeylon
Target Milestone: ---
Keywords: Reopened, Triaged
Target Release: 3.5.1
Hardware: Unspecified
OS: Unspecified
URL: https://polarion.engineering.redhat.com/polarion/#/project/RHEVM3/workitem?id=RHEVM-15005
Whiteboard: sla
Fixed In Version: ovirt-3.5.1_rc1
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-05-11 06:30:05 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: SLA
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1181672, 1193058
Attachments:
Description                                          Flags
engine log around the put to maintenance event       none
corresponding server log to go with engine.log       none

Description Joop van de Wege 2014-11-12 08:06:10 UTC
Created attachment 956589
engine log around the put to maintenance event

Description of problem:

If a node has a hardware failure, it is not possible to remove that host using the web UI.

Version-Release number of selected component (if applicable):
ovirt-3.5 (3.4 probably has the same problem)

How reproducible:
Always

Steps to Reproduce:
1. Create a hosted-engine setup with 2, preferably 3, hosts. Make sure the engine runs on host1 and that host1 is the SPM. Make sure host2 and host3 are ready and not in maintenance.
2. Put host2 into maintenance.
3. Pull the plug on host3. Host3 goes into 'Non Responsive'.
4. Try to put host3 into maintenance. It fails.
5. Use 'Confirm host has rebooted' and try Maintenance again. This also fails.

You are now stuck with host3. Host2, by contrast, can be removed without problems whether it is online or not.

Actual results:
Stuck with a host that cannot be removed from the web UI.

Expected results:
Be able to (force) remove dead hosts

Comment 1 Joop van de Wege 2014-11-12 08:08:42 UTC
Created attachment 956590
corresponding server log to go with engine.log

Comment 2 Joop van de Wege 2014-11-12 08:11:28 UTC
Workaround for the time being:
update vds_statistics set ha_configured=true, ha_score=1 where vds_id in (select vds_id from vds where vds_name = 'YOURHOST');

It should then be possible to put the host into maintenance and then remove it.

Comment 3 Roy Golan 2014-11-12 08:48:09 UTC
The root cause is a network exception that is thrown in the Maintenance command
when it attempts SetHaMaintenance on a host with a positive score. This aborts
the parent maintenance command and logs a misleading message suggesting that
there are running VMs.

See Maintenance.java:59 [1]

To avoid that, we probably need to catch the exception and handle it
properly in case the hosted engine VM is running on that host.

[1] http://gerrit.ovirt.org/gitweb?p=ovirt-engine.git;a=blob;f=backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/MaintenanceVdsCommand.java
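The handling suggested above can be sketched roughly as follows. This is a hypothetical, self-contained model, not ovirt-engine code: `MaintenanceSketch`, `HaAgentClient`, `HaMaintenanceException` and `moveToMaintenance` are illustrative names, and the HA score and the "engine VM runs on this host" flag are assumed inputs. The HA-maintenance sub-call is wrapped in a try/catch so that a network failure on a dead host is logged with its real cause and no longer aborts the parent maintenance command. Refusing the operation when the hosted engine VM runs on that host is an assumption about what "handle properly" means.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the fix suggested in Comment 3; none of these names
// are real ovirt-engine classes or VDSM verbs.
public class MaintenanceSketch {

    // Stand-in for the network failure raised when the host cannot be reached.
    static class HaMaintenanceException extends RuntimeException {
        HaMaintenanceException(String message) {
            super(message);
        }
    }

    // Stand-in for the call that puts the host's HA agent into maintenance.
    interface HaAgentClient {
        void setHaMaintenance(String host);
    }

    // Returns true if the host may be moved to maintenance.
    static boolean moveToMaintenance(String host, int haScore, boolean engineVmOnHost,
                                     HaAgentClient client, List<String> log) {
        if (haScore > 0) {
            try {
                client.setHaMaintenance(host);
            } catch (HaMaintenanceException e) {
                if (engineVmOnHost) {
                    // Assumption: refuse when the hosted engine VM runs on this host.
                    log.add("Cannot move " + host + " to maintenance: hosted engine VM is running on it");
                    return false;
                }
                // Dead host: log the real cause instead of a misleading
                // "running VMs" message, and let the parent command continue.
                log.add("HA maintenance failed on " + host + ": " + e.getMessage());
            }
        }
        log.add(host + " moved to maintenance");
        return true;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        // Simulates a non-responsive host whose HA call fails with a network error.
        HaAgentClient unreachable = h -> {
            throw new HaMaintenanceException("network error");
        };
        boolean ok = moveToMaintenance("host3", 1, false, unreachable, log);
        System.out.println(ok);
        log.forEach(System.out::println);
    }
}
```

Running `main` prints `true` followed by the two log lines for host3, i.e. the dead host still reaches maintenance and can then be removed.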

Comment 4 Sandro Bonazzola 2015-01-15 14:15:26 UTC
This is an automated message: 
This bug should be fixed in oVirt 3.5.1 RC1, moving to QA

Comment 5 Sandro Bonazzola 2015-01-21 16:06:16 UTC
oVirt 3.5.1 has been released. If problems still persist, please make note of it in this bug report.

Comment 6 Nikolai Sednev 2015-01-22 14:59:06 UTC
Works for me on these components:
sanlock-2.8-1.el6.x86_64
libvirt-client-0.10.2-46.el6_6.2.x86_64
qemu-kvm-rhev-0.12.1.2-2.448.el6.x86_64
mom-0.4.1-4.el6ev.noarch
vdsm-4.16.8.1-6.el6ev.x86_64
ovirt-hosted-engine-setup-1.2.1-9.el6ev.noarch
ovirt-host-deploy-1.3.0-2.el6ev.noarch
ovirt-hosted-engine-ha-1.2.4-5.el6ev.noarch
rhevm-3.5.0-0.30.el6ev.noarch
rhevm-guest-agent-common-1.0.10-2.el6ev.noarch

Comment 7 Roy Golan 2015-01-27 10:32:54 UTC
*** Bug 1181672 has been marked as a duplicate of this bug. ***

Comment 9 Abhishek Sahni 2018-05-24 11:49:41 UTC
Hello Team,

Any update on this bug?

I am unable to delete an oVirt node if the node is in "Non Responsive" mode.