Bug 1247098

Summary: [hosted-engine] [GlusterFS support] Creation of a Gluster storage domain in the hosted-engine setup causes the VM to become unreachable
Product: [Retired] oVirt
Reporter: Elad <ebenahar>
Component: ovirt-engine-core
Assignee: Ala Hino <ahino>
Status: CLOSED WORKSFORME
QA Contact: Elad <ebenahar>
Severity: high
Priority: unspecified
Version: 3.6
CC: acanan, amureini, bugs, ecohen, gklein, lsurette, rbalakri, sabose, sbonazzo, yeylon
Target Milestone: m1
Target Release: 3.6.0
Hardware: x86_64
OS: Unspecified
Whiteboard: storage
Doc Type: Bug Fix
Last Closed: 2015-07-29 08:37:51 UTC
Type: Bug
oVirt Team: Storage
Bug Blocks: 1175354

Description Elad 2015-07-27 10:35:02 UTC
Description of problem:
I successfully deployed hosted-engine over GlusterFS.
I then tried to create a Gluster storage domain (backed by a replica 3 volume) in the setup, and the VM immediately became unreachable, although its status is still reported as up:


[root@green-vdsc 7]# hosted-engine --vm-status


--== Host 1 status ==--

Status up-to-date                  : True
Hostname                           : green-vdsc.qa.lab.tlv.redhat.com
Host ID                            : 1
Engine status                      : {"reason": "failed liveliness check", "health": "bad", "vm": "up", "detail": "up"}
Score                              : 2400
stopped                            : False
Local maintenance                  : False
crc32                              : f9f6e4f7
Host timestamp                     : 1284496


Trying to power off the VM leaves it stuck in the 'Powering down' state, and killing the qemu process doesn't help either. I'll file a separate bug for that.
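For reference, a minimal sketch of the escalation path I went through when the VM hung in 'Powering down'. This only prints the commands rather than running them, and the domain name "HostedEngine" is the usual libvirt default, not confirmed from this host; verify with 'virsh -r list' first.

```shell
# Sketch of escalating shutdown attempts for a stuck hosted-engine VM.
# Prints each step instead of executing it; remove the 'echo' to run for real.
he_shutdown_plan() {
    vm_name=$1
    echo "hosted-engine --vm-shutdown"   # graceful shutdown request
    echo "hosted-engine --vm-poweroff"   # hard poweroff via vdsm
    echo "virsh -r list --all"           # confirm the actual domain state
    echo "virsh destroy $vm_name"        # forcibly terminate the domain
    echo "pkill -9 -f qemu-kvm"          # last resort: kill the qemu process
}

he_shutdown_plan HostedEngine
```

In this case even the last two steps did not free the VM, which is why the separate bug is being filed.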


Version-Release number of selected component (if applicable):
Hypervisor:

ovirt-hosted-engine-ha-1.3.0-0.0.master.20150615153650.20150615153645.git5f8c290.el7.noarch
ovirt-hosted-engine-setup-1.3.0-0.0.master.20150723145342.gitc6bc631.el7.noarch
vdsm-xmlrpc-4.17.0-1198.git6ede99a.el7.noarch
vdsm-python-4.17.0-1198.git6ede99a.el7.noarch
vdsm-4.17.0-1198.git6ede99a.el7.noarch
vdsm-infra-4.17.0-1198.git6ede99a.el7.noarch
vdsm-jsonrpc-4.17.0-1198.git6ede99a.el7.noarch
vdsm-yajsonrpc-4.17.0-1198.git6ede99a.el7.noarch
vdsm-cli-4.17.0-1198.git6ede99a.el7.noarch
libvirt-client-1.2.8-16.el7_1.3.x86_64
libvirt-daemon-driver-secret-1.2.8-16.el7_1.3.x86_64
libvirt-lock-sanlock-1.2.8-16.el7_1.3.x86_64
libvirt-daemon-driver-nwfilter-1.2.8-16.el7_1.3.x86_64
libvirt-daemon-driver-interface-1.2.8-16.el7_1.3.x86_64
libvirt-daemon-driver-qemu-1.2.8-16.el7_1.3.x86_64
libvirt-python-1.2.8-7.el7_1.1.x86_64
libvirt-daemon-driver-nodedev-1.2.8-16.el7_1.3.x86_64
libvirt-daemon-driver-network-1.2.8-16.el7_1.3.x86_64
libvirt-daemon-kvm-1.2.8-16.el7_1.3.x86_64
libvirt-daemon-config-nwfilter-1.2.8-16.el7_1.3.x86_64
libvirt-daemon-driver-storage-1.2.8-16.el7_1.3.x86_64
libvirt-daemon-1.2.8-16.el7_1.3.x86_64
qemu-kvm-tools-ev-2.1.2-23.el7_1.4.1.x86_64
qemu-img-ev-2.1.2-23.el7_1.4.1.x86_64
qemu-kvm-common-ev-2.1.2-23.el7_1.4.1.x86_64
ipxe-roms-qemu-20130517-6.gitc4bce43.el7.noarch
libvirt-daemon-driver-qemu-1.2.8-16.el7_1.3.x86_64
qemu-kvm-ev-2.1.2-23.el7_1.4.1.x86_64
sanlock-3.2.2-2.el7.x86_64
selinux-policy-3.13.1-23.el7_1.7.noarch


Engine:
ovirt-engine-3.6.0-0.0.master.20150627185750.git6f063c1.el6.noarch


How reproducible:
Always

Steps to Reproduce:
1. Deploy hosted-engine over GlusterFS storage using a replica 3 volume.
2. Once deployment is done, create some storage domains in the setup
3. Create a Gluster domain in the setup

Actual results:
As soon as OK is clicked in the webadmin storage domain creation dialog, the hosted-engine VM becomes unreachable.

It might be that during storage domain creation, the host got disconnected from the Gluster server.

The VM is unreachable although it is reported as Up by both vdsm and libvirt. I tried 'hosted-engine --vm-poweroff' and the VM got stuck in 'Powering down'; killing the qemu process didn't help either.
Therefore, for now, I can't examine the engine.log.

Expected results:
Gluster storage domain should be created successfully.

Additional info:

sosreport: http://file.tlv.redhat.com/ebenahar/sosreport-green-vdsc.qa.lab.tlv.redhat.com-20150727100546.tar.xz

Gluster volume configuration (identical for both volumes: the one used for the hosted-engine VM image and the one for the Gluster storage domain):

[root@gluster-storage-03 ~]# gluster volume info elad1
 
Volume Name: elad1
Type: Replicate
Volume ID: 34a9bdeb-30b3-4868-921c-2c6c2cfd83b4
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.35.160.6:/gluster_volumes/elad1
Brick2: 10.35.160.202:/gluster_volumes/elad1
Brick3: 10.35.160.203:/gluster_volumes/elad1
Options Reconfigured:
server.allow-insecure: on
cluster.server-quorum-type: server
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
auth.allow: *
network.ping-timeout: 10
cluster.quorum-type: auto
storage.owner-uid: 36
storage.owner-gid: 36
performance.readdir-ahead: on
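For anyone recreating this volume layout, a sketch of how the reconfigured options listed above could be applied with the gluster CLI. The option names and values are taken from the 'gluster volume info' output above; the function only prints the commands, so drop the 'echo' to apply them for real on one of the Gluster servers.

```shell
# Sketch: apply the reconfigured options from the volume above to a volume.
# Prints the commands instead of running them; remove 'echo' to apply.
apply_virt_options() {
    vol=$1
    for opt in \
        server.allow-insecure=on \
        cluster.server-quorum-type=server \
        network.remote-dio=enable \
        cluster.eager-lock=enable \
        performance.stat-prefetch=off \
        performance.io-cache=off \
        performance.read-ahead=off \
        performance.quick-read=off \
        auth.allow='*' \
        network.ping-timeout=10 \
        cluster.quorum-type=auto \
        storage.owner-uid=36 \
        storage.owner-gid=36 \
        performance.readdir-ahead=on
    do
        echo gluster volume set "$vol" "${opt%%=*}" "${opt#*=}"
    done
}

apply_virt_options elad1
```

storage.owner-uid/gid 36 corresponds to the vdsm user and kvm group, which oVirt requires on storage domains.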

Comment 1 Elad 2015-07-29 08:37:51 UTC
Cannot reproduce, closing.

Will re-open in case I encounter it again.