Bug 1286562 - [HC] Unable to move the host running hosted-engine to maintenance state
Status: CLOSED CURRENTRELEASE
Product: ovirt-hosted-engine-ha
Classification: oVirt
Component: Agent
Version: 1.3.2.1
Hardware: x86_64 Linux
Priority: unspecified
Severity: urgent
Target Milestone: ovirt-4.0.0-alpha
Target Release: ---
Assigned To: Martin Sivák
QA Contact: Ilanit Stein
Depends On:
Blocks: Gluster-HC-1
Reported: 2015-11-30 04:14 EST by SATHEESARAN
Modified: 2016-02-04 03:15 EST
CC List: 10 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
RHEV+RHGS hyperconvergence, RHEL 7.2 nodes as hypervisors
Last Closed: 2016-02-04 03:15:06 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: SLA
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
dfediuck: ovirt-4.0.0?
rule-engine: planning_ack?
sasundar: devel_ack?
rule-engine: testing_ack?


Attachments
engine.log from hosted-engine (678.50 KB, application/x-gzip)
2015-12-03 03:23 EST, SATHEESARAN
vdsm.log from host-3 (14.75 MB, text/plain)
2015-12-03 03:26 EST, SATHEESARAN
agent.log from host-3 (7.74 MB, text/plain)
2015-12-03 03:30 EST, SATHEESARAN
hosted-engine-setup log (384.36 KB, text/plain)
2015-12-03 03:31 EST, SATHEESARAN
screenshot showing host-3 was still in 'preparing for maintenance' state (72.55 KB, image/png)
2015-12-03 03:33 EST, SATHEESARAN
Description SATHEESARAN 2015-11-30 04:14:04 EST
Description of problem:
------------------------
The hosted engine was configured to use a Gluster storage domain, and two additional hosts were added to the cluster. In this use case, when the host running the hosted engine was moved to maintenance, it remained in the "Preparing for Maintenance" state, with the hosted engine still running on it.

However, the Events tab indicates that the host switched to maintenance mode.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
RHEVM 3.6 Beta1 - ( 3.6.0.3-0.1.el6 )
qemu-kvm-rhev-2.3.0-31.el7_2.3.x86_64
gluster-3.8 dev rpms

How reproducible:
-----------------
Not deliberately reproduced, but the same issue was also seen on another setup.

Steps to Reproduce:
-------------------
1. Set up the hosted engine with a GlusterFS volume as the backend
2. Add two more hosts using hosted-engine setup
3. Once the cluster is up, move the host running the hosted engine to maintenance

Actual results:
---------------
The host's status stays in "Preparing for maintenance" for a long time.
The host's Events tab says the host switched to maintenance, even though the hosted engine is still running on that host.

Expected results:
------------------
The hosted engine should migrate to another host, and the host should move to maintenance.


Additional info:
Comment 1 Red Hat Bugzilla Rules Engine 2015-11-30 10:34:34 EST
This bug is not marked for z-stream, yet the milestone is for a z-stream version, therefore the milestone has been reset.
Please set the correct milestone or add the z-stream flag.
Comment 2 Martin Sivák 2015-11-30 10:40:30 EST
Can you please attach the engine.log from the hosted engine VM and vdsm and hosted engine logs from the host?
Comment 3 Sahina Bose 2015-12-03 02:06:00 EST
Related, but perhaps not the same issue: I was unable to move a host (host2) to maintenance even though the hosted engine was not running on it.
hosted-engine --vm-status showed that the hosted-engine VM was running on host3.

I migrated all running VMs from host2 to other hosts, but the engine continued to show a VM running on host2.

vdsClient -s 0 list showed the hosted-engine VM in state "Down".

Once I destroyed this VM (vdsClient destroy <vmid>), host2 moved to maintenance.
Comment 4 SATHEESARAN 2015-12-03 03:22:15 EST
# hosted-engine --vm-status


--== Host 1 status ==--

Status up-to-date                  : True
Hostname                           : rhs-client10.lab.eng.blr.redhat.com
Host ID                            : 1
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : f4d92048
Host timestamp                     : 175330


--== Host 2 status ==--

Status up-to-date                  : True
Hostname                           : rhs-gp-srv1.lab.eng.blr.redhat.com
Host ID                            : 2
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 0
stopped                            : False
Local maintenance                  : True
crc32                              : c1e61847
Host timestamp                     : 158830


--== Host 3 status ==--

Status up-to-date                  : True
Hostname                           : rhs-hpc-srv3.lab.eng.blr.redhat.com
Host ID                            : 3
Engine status                      : {"health": "good", "vm": "up", "detail": "up"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 5285de0a
Host timestamp                     : 137108

The hosted engine was running on hosted_engine_3.

After moving hosted_engine_3 to maintenance:

# hosted-engine --vm-status


--== Host 1 status ==--

Status up-to-date                  : True
Hostname                           : rhs-client10.lab.eng.blr.redhat.com
Host ID                            : 1
Engine status                      : {"health": "good", "vm": "up", "detail": "up"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : c1e6f23e
Host timestamp                     : 178291


--== Host 2 status ==--

Status up-to-date                  : True
Hostname                           : rhs-gp-srv1.lab.eng.blr.redhat.com
Host ID                            : 2
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 122a0c92
Host timestamp                     : 161493


--== Host 3 status ==--

Status up-to-date                  : True
Hostname                           : rhs-hpc-srv3.lab.eng.blr.redhat.com
Host ID                            : 3
Engine status                      : {"reason": "bad vm status", "health": "bad", "vm": "down", "detail": "down"}
Score                              : 0
stopped                            : False
Local maintenance                  : True
crc32                              : a854c225
Host timestamp                     : 140084

The UI still shows the status of hosted_engine_3 as "Preparing for Maintenance", while the Events tab states that the host switched to maintenance.
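The two status dumps above show the agent's view: after the maintenance request, the engine VM is up on Host 1 and Host 3 is in local maintenance, which contradicts the UI. As an illustration only, here is a minimal Python sketch that extracts the same facts from the plain-text output. It assumes the "Key : value" layout shown in this bug. The helper names parse_vm_status, engine_host and hosts_in_local_maintenance are hypothetical and are not part of ovirt-hosted-engine-ha.

```python
import json
import re

# Matches per-host section headers such as "--== Host 3 status ==--".
HEADER_RE = re.compile(r"^--== Host (\d+) status ==--$")


def parse_vm_status(text):
    """Parse plain-text hosted-engine --vm-status output into one dict per host.

    Hypothetical helper: assumes the "Key : value" layout seen in this bug report.
    """
    hosts = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER_RE.match(line)
        if header:
            # Start a new host record keyed by the numeric host ID in the header.
            current = {"host_id": int(header.group(1))}
            hosts.append(current)
            continue
        if current is None or ":" not in line:
            continue
        # Split on the first colon only: the Engine status value is JSON with colons inside.
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Engine status":
            current["engine"] = json.loads(value)
        else:
            current[key] = value
    return hosts


def engine_host(hosts):
    """Return the Hostname of the first host reporting the engine VM as up, or None."""
    for host in hosts:
        if host.get("engine", {}).get("vm") == "up":
            return host.get("Hostname")
    return None


def hosts_in_local_maintenance(hosts):
    """Return the Hostnames whose agent reports local maintenance."""
    return [h.get("Hostname") for h in hosts if h.get("Local maintenance") == "True"]
```

Applied to the second dump above, engine_host would return rhs-client10.lab.eng.blr.redhat.com and hosts_in_local_maintenance would return only rhs-hpc-srv3.lab.eng.blr.redhat.com. This confirms that the agents had already completed the move that the UI still showed as pending.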

I will attach the engine.log from the hosted engine and the vdsm logs after this comment.
Comment 5 SATHEESARAN 2015-12-03 03:23 EST
Created attachment 1101700 [details]
engine.log from hosted-engine
Comment 6 SATHEESARAN 2015-12-03 03:26 EST
Created attachment 1101701 [details]
vdsm.log from host-3
Comment 7 SATHEESARAN 2015-12-03 03:30 EST
Created attachment 1101702 [details]
agent.log from host-3
Comment 8 SATHEESARAN 2015-12-03 03:31 EST
Created attachment 1101703 [details]
hosted-engine-setup log
Comment 9 SATHEESARAN 2015-12-03 03:33 EST
Created attachment 1101704 [details]
screenshot showing host-3 was still in 'preparing for maintenance' state
Comment 10 SATHEESARAN 2016-02-04 03:15:06 EST
Tested with RHEV 3.6 beta3 (RHEVM 3.6.3) and hosted engine (ovirt-hosted-engine-setup-1.3.2.3-1.el7ev.noarch).

This works now: I could move the host to maintenance once the master storage domain was created (which also imports the hosted-engine storage domain).
