Description of problem:
I successfully deployed hosted-engine over GlusterFS. I then tried to create a Gluster storage domain (backed by a replica 3 volume) in the setup, and the hosted-engine VM immediately became unreachable. The VM status is still reported as up:

[root@green-vdsc 7]# hosted-engine --vm-status

--== Host 1 status ==--

Status up-to-date       : True
Hostname                : green-vdsc.qa.lab.tlv.redhat.com
Host ID                 : 1
Engine status           : {"reason": "failed liveliness check", "health": "bad", "vm": "up", "detail": "up"}
Score                   : 2400
stopped                 : False
Local maintenance       : False
crc32                   : f9f6e4f7
Host timestamp          : 1284496

Trying to power off the VM leaves it stuck in the 'Powering down' state; killing the qemu process does not help either. I'll file a separate bug for that.

Version-Release number of selected component (if applicable):

Hypervisor:
ovirt-hosted-engine-ha-1.3.0-0.0.master.20150615153650.20150615153645.git5f8c290.el7.noarch
ovirt-hosted-engine-setup-1.3.0-0.0.master.20150723145342.gitc6bc631.el7.noarch
vdsm-xmlrpc-4.17.0-1198.git6ede99a.el7.noarch
vdsm-python-4.17.0-1198.git6ede99a.el7.noarch
vdsm-4.17.0-1198.git6ede99a.el7.noarch
vdsm-infra-4.17.0-1198.git6ede99a.el7.noarch
vdsm-jsonrpc-4.17.0-1198.git6ede99a.el7.noarch
vdsm-yajsonrpc-4.17.0-1198.git6ede99a.el7.noarch
vdsm-cli-4.17.0-1198.git6ede99a.el7.noarch
libvirt-client-1.2.8-16.el7_1.3.x86_64
libvirt-daemon-driver-secret-1.2.8-16.el7_1.3.x86_64
libvirt-lock-sanlock-1.2.8-16.el7_1.3.x86_64
libvirt-daemon-driver-nwfilter-1.2.8-16.el7_1.3.x86_64
libvirt-daemon-driver-interface-1.2.8-16.el7_1.3.x86_64
libvirt-daemon-driver-qemu-1.2.8-16.el7_1.3.x86_64
libvirt-python-1.2.8-7.el7_1.1.x86_64
libvirt-daemon-driver-nodedev-1.2.8-16.el7_1.3.x86_64
libvirt-daemon-driver-network-1.2.8-16.el7_1.3.x86_64
libvirt-daemon-kvm-1.2.8-16.el7_1.3.x86_64
libvirt-daemon-config-nwfilter-1.2.8-16.el7_1.3.x86_64
libvirt-daemon-driver-storage-1.2.8-16.el7_1.3.x86_64
libvirt-daemon-1.2.8-16.el7_1.3.x86_64
qemu-kvm-tools-ev-2.1.2-23.el7_1.4.1.x86_64
qemu-img-ev-2.1.2-23.el7_1.4.1.x86_64
qemu-kvm-common-ev-2.1.2-23.el7_1.4.1.x86_64
ipxe-roms-qemu-20130517-6.gitc4bce43.el7.noarch
libvirt-daemon-driver-qemu-1.2.8-16.el7_1.3.x86_64
qemu-kvm-ev-2.1.2-23.el7_1.4.1.x86_64
sanlock-3.2.2-2.el7.x86_64
selinux-policy-3.13.1-23.el7_1.7.noarch

Engine:
ovirt-engine-3.6.0-0.0.master.20150627185750.git6f063c1.el6.noarch

How reproducible:
Always

Steps to Reproduce:
1. Deploy hosted-engine over GlusterFS storage using a replica 3 volume.
2. Once deployment is done, create some storage domains in the setup.
3. Create a Gluster storage domain in the setup.

Actual results:
Upon clicking OK in the webadmin dialog for storage domain creation, the hosted-engine VM becomes unreachable. It might be that during storage domain creation the host was disconnected from the Gluster server. The VM is unreachable although it is reported as Up by both vdsm and libvirt. Tried 'hosted-engine --vm-poweroff' and the VM got stuck in 'Powering down'; killing the qemu process didn't help either. Therefore, for now, I can't examine the engine.log.

Expected results:
The Gluster storage domain should be created successfully.
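For clarity, the power-off attempts described above correspond roughly to the following commands on the hypervisor (a sketch; the exact qemu process name may vary by build, so the pkill pattern here is an assumption):

```shell
# Check the hosted-engine VM state as reported by the HA agent
hosted-engine --vm-status

# Attempt a clean power-off; in this case the VM stayed in 'Powering down'
hosted-engine --vm-poweroff

# Last resort: kill the qemu process directly (did not help either)
pkill -9 -f qemu-kvm
```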
Additional info:
sosreport: http://file.tlv.redhat.com/ebenahar/sosreport-green-vdsc.qa.lab.tlv.redhat.com-20150727100546.tar.xz

Gluster volume configuration (identical for both volumes: the one used for the hosted-engine VM image and the one for the Gluster storage domain):

[root@gluster-storage-03 ~]# gluster volume info elad1

Volume Name: elad1
Type: Replicate
Volume ID: 34a9bdeb-30b3-4868-921c-2c6c2cfd83b4
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.35.160.6:/gluster_volumes/elad1
Brick2: 10.35.160.202:/gluster_volumes/elad1
Brick3: 10.35.160.203:/gluster_volumes/elad1
Options Reconfigured:
server.allow-insecure: on
cluster.server-quorum-type: server
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
auth.allow: *
network.ping-timeout: 10
cluster.quorum-type: auto
storage.owner-uid: 36
storage.owner-gid: 36
performance.readdir-ahead: on
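For anyone trying to reproduce this, a volume matching the configuration above could be recreated roughly as follows. This is a sketch: the brick hosts and paths are taken from the report, and only a subset of the reconfigured options is shown.

```shell
# Create a 1x3 replicated volume from the three bricks listed above
gluster volume create elad1 replica 3 \
    10.35.160.6:/gluster_volumes/elad1 \
    10.35.160.202:/gluster_volumes/elad1 \
    10.35.160.203:/gluster_volumes/elad1

# Apply the key options listed under "Options Reconfigured"
gluster volume set elad1 server.allow-insecure on
gluster volume set elad1 cluster.server-quorum-type server
gluster volume set elad1 cluster.quorum-type auto
gluster volume set elad1 network.ping-timeout 10
# uid/gid 36 = vdsm:kvm, so the hypervisor can own the image files
gluster volume set elad1 storage.owner-uid 36
gluster volume set elad1 storage.owner-gid 36

gluster volume start elad1
```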
Cannot reproduce, closing. Will re-open if I encounter it again.