Bug 1862053 - Failed to start HostedEngine on Gluster storage
Summary: Failed to start HostedEngine on Gluster storage
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: Gluster
Version: 4.40.22
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Gobinda Das
QA Contact: SATHEESARAN
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-07-30 10:10 UTC by Oleh Horbachov
Modified: 2020-08-05 07:53 UTC
CC List: 1 user

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-08-05 07:53:01 UTC
oVirt Team: Gluster
Embargoed:


Attachments
vm boot issue (11.23 KB, image/png), 2020-07-30 10:10 UTC, Oleh Horbachov
vdsmd logs from start-stop interval (160.99 KB, text/plain), 2020-07-30 11:32 UTC, Oleh Horbachov
engine logs (6.91 KB, text/plain), 2020-07-30 11:59 UTC, Oleh Horbachov
supervdsm (8.64 KB, text/plain), 2020-07-30 12:26 UTC, Oleh Horbachov
gluster mnt log (2.02 KB, text/plain), 2020-07-30 12:30 UTC, Oleh Horbachov
agent logs (2.34 KB, text/plain), 2020-07-30 12:54 UTC, Oleh Horbachov
agent logs (2.34 KB, text/plain), 2020-07-30 12:57 UTC, Oleh Horbachov
broker (3.33 KB, text/plain), 2020-07-30 12:57 UTC, Oleh Horbachov

Description Oleh Horbachov 2020-07-30 10:10:41 UTC
Created attachment 1702924 [details]
vm boot issue


Description of problem:

I deployed oVirt 4.4.1 with a hosted engine on Gluster storage. The Engine often starts with the error shown in the attachment, and only comes up after many stop-start iterations.
The same error shows up in other cases:
 - right after the Hosted-Engine installation: with maintenance mode disabled, the Engine started only after a few attempts on different hosts
 - for regular virtual machines on Gluster storage

If the disk is moved to iSCSI storage, the error does not occur.

Version-Release number of selected component (if applicable):

ovirt-4.4.1.4
vdsm-4.40.22-1.el8.x86_64
glusterfs-7.6-1.el8.x86_64

How reproducible:

Steps to Reproduce:
1. Enable Global maintenance: 'hosted-engine --set-maintenance --mode=global'
2. Stop Hosted Engine: 'hosted-engine --vm-shutdown'
3. Start Hosted Engine: 'hosted-engine --vm-start'
4. Set console password: 'hosted-engine --add-console-password'
5. Connect to the Engine's VNC console and check 'hosted-engine --vm-status' (the whole sequence is sketched below)
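
The same sequence, collected as a rough shell sketch (the final polling loop is an assumption added for convenience, not part of the original steps):

# put the cluster into global maintenance so the HA agents do not restart the VM
hosted-engine --set-maintenance --mode=global
# power-cycle the HostedEngine VM (wait for it to go down before starting it again)
hosted-engine --vm-shutdown
hosted-engine --vm-start
# set a console password and watch the reported health
hosted-engine --add-console-password
watch -n 10 'hosted-engine --vm-status | grep "Engine status"'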

Actual results:
1. Engine status: {"vm": "up", "health": "bad", "detail": "Up", "reason": "failed liveliness check"}
2. Screenshot in attachment

Expected results:
Engine status: {"vm": "up", "health": "good", "detail": "Up"}

Additional info:

Comment 1 Yaniv Kaul 2020-07-30 10:57:00 UTC
Can you attach relevant logs? Specifically, Gluster, VDSM and hosted engine logs?
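
For reference, on an oVirt 4.4 host these logs usually live under the standard locations below (the exact Gluster mount-log file name depends on the storage server and volume, so it is shown as a glob):

/var/log/vdsm/vdsm.log
/var/log/vdsm/supervdsm.log
/var/log/glusterfs/glusterd.log
/var/log/glusterfs/rhev-data-center-mnt-glusterSD-*.log   (engine volume mount log)
/var/log/ovirt-hosted-engine-ha/agent.log
/var/log/ovirt-hosted-engine-ha/broker.log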

Comment 2 Oleh Horbachov 2020-07-30 11:32:51 UTC
Created attachment 1702936 [details]
vdsmd logs from start-stop interval

Comment 3 Oleh Horbachov 2020-07-30 11:36:55 UTC
glusterd.log contains only this string from the start-stop interval:

[2020-07-30 11:18:15.525364] I [MSGID: 106488] [glusterd-handler.c:1400:__glusterd_handle_cli_get_volume] 0-management: Received get vol req

Comment 4 Oleh Horbachov 2020-07-30 11:44:39 UTC
In addition, I think this may be related: https://bugzilla.redhat.com/show_bug.cgi?id=1859403

Comment 5 Oleh Horbachov 2020-07-30 11:59:40 UTC
Created attachment 1702942 [details]
engine logs

Comment 6 Gobinda Das 2020-07-30 12:03:03 UTC
Can you please provide the engine mount log?

Comment 7 Oleh Horbachov 2020-07-30 12:26:18 UTC
Created attachment 1702946 [details]
supervdsm

Comment 8 Oleh Horbachov 2020-07-30 12:30:55 UTC
Created attachment 1702948 [details]
gluster mnt log

Comment 9 Gobinda Das 2020-07-30 12:47:31 UTC
Could not find anything in the attached logs.
If the health is bad and the VM is up, the HA services will try to restart the Manager virtual machine to get the Manager back.
Please also provide the HA agent and broker logs.

Please check "gluster volume status".
If it is healthy, then I will need some logs from the engine (you need to connect to the engine via the console):
/var/log/messages, /var/log/ovirt-engine/engine.log and /var/log/ovirt-engine/server.log.

You can also check the engine status by running:
systemctl status -l ovirt-engine
If anything is wrong, then check:
journalctl -u ovirt-engine
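
Collected into one snippet, the checks above would look roughly like this (the first command runs on the host, the rest inside the HostedEngine VM once the console is reachable):

# on the host: overall volume health
gluster volume status
# inside the engine VM: service state and journal
systemctl status -l ovirt-engine
journalctl -u ovirt-engine
# inside the engine VM: the requested log files
less /var/log/messages /var/log/ovirt-engine/engine.log /var/log/ovirt-engine/server.log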

Comment 10 Oleh Horbachov 2020-07-30 12:54:35 UTC
Created attachment 1702950 [details]
agent logs

Comment 11 Oleh Horbachov 2020-07-30 12:57:04 UTC
Created attachment 1702951 [details]
agent logs

Comment 12 Oleh Horbachov 2020-07-30 12:57:35 UTC
Created attachment 1702952 [details]
broker

Comment 13 Oleh Horbachov 2020-07-30 13:02:04 UTC
(In reply to Gobinda Das from comment #9)
> Could not find anything in the attached logs.
> If the health is bad and the VM is up, the HA services will try to restart
> the Manager virtual machine to get the Manager back.
> Please also provide the HA agent and broker logs.
> 
> Please check "gluster volume status".
Brick store-01:/gluster_bricks/engine/engine
Status: Connected
Number of entries: 0

Brick store-02:/gluster_bricks/engine/engine
Status: Connected
Number of entries: 0

Brick store-03:/gluster_bricks/engine/engine
Status: Connected
Number of entries: 0

> If it is healthy, then I will need some logs from the engine (you need to
> connect to the engine via the console):
> /var/log/messages, /var/log/ovirt-engine/engine.log and
> /var/log/ovirt-engine/server.log.
> 
> You can also check the engine status by running:
> systemctl status -l ovirt-engine
> If anything is wrong, then check:
> journalctl -u ovirt-engine

I can't check the ovirt-engine service. The VM does not really start. Please look at the first attachment.

Comment 14 Gobinda Das 2020-07-31 07:32:59 UTC
We hit this issue a while back with Gluster sharding, and it is fixed in glusterfs release-7 and release-8.

Ref BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1823423
This is the fix -> https://review.gluster.org/#/c/glusterfs/+/24480/

Please upgrade your glusterfs to v7.7 and try.

Comment 15 Oleh Horbachov 2020-07-31 08:54:38 UTC
I upgraded glusterfs to 7.7 from centos-gluster7-test and the issue is fixed. Unfortunately, v7.7 is not present in the main repo at the moment.
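
For reference, that upgrade roughly amounts to the following on each host (the package glob and the service restart are assumptions, adjust to your environment):

# upgrade gluster packages from the test repo carrying 7.7
dnf --enablerepo=centos-gluster7-test upgrade 'glusterfs*'
# restart gluster services on one host at a time to keep the volume available
systemctl restart glusterd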

