Description of problem:
I successfully deployed hosted-engine over GlusterFS. I then tried to create a Gluster storage domain (backed by a replica 3 volume) in the setup, and the hosted-engine VM immediately became unreachable. The VM status is still reported as up:

[root@green-vdsc 7]# hosted-engine --vm-status

--== Host 1 status ==--

Status up-to-date       : True
Hostname                : green-vdsc.qa.lab.tlv.redhat.com
Host ID                 : 1
Engine status           : {"reason": "failed liveliness check", "health": "bad", "vm": "up", "detail": "up"}
Score                   : 2400
stopped                 : False
Local maintenance       : False
crc32                   : f9f6e4f7
Host timestamp          : 1284496

Trying to power off the VM leaves it stuck in the 'Powering down' state; killing the qemu process does not help either. I'll file a separate bug for that.

Version-Release number of selected component (if applicable):

Hypervisor:
ovirt-hosted-engine-ha-1.3.0-0.0.master.20150615153650.20150615153645.git5f8c290.el7.noarch
ovirt-hosted-engine-setup-1.3.0-0.0.master.20150723145342.gitc6bc631.el7.noarch
vdsm-xmlrpc-4.17.0-1198.git6ede99a.el7.noarch
vdsm-python-4.17.0-1198.git6ede99a.el7.noarch
vdsm-4.17.0-1198.git6ede99a.el7.noarch
vdsm-infra-4.17.0-1198.git6ede99a.el7.noarch
vdsm-jsonrpc-4.17.0-1198.git6ede99a.el7.noarch
vdsm-yajsonrpc-4.17.0-1198.git6ede99a.el7.noarch
vdsm-cli-4.17.0-1198.git6ede99a.el7.noarch
libvirt-client-1.2.8-16.el7_1.3.x86_64
libvirt-daemon-driver-secret-1.2.8-16.el7_1.3.x86_64
libvirt-lock-sanlock-1.2.8-16.el7_1.3.x86_64
libvirt-daemon-driver-nwfilter-1.2.8-16.el7_1.3.x86_64
libvirt-daemon-driver-interface-1.2.8-16.el7_1.3.x86_64
libvirt-daemon-driver-qemu-1.2.8-16.el7_1.3.x86_64
libvirt-python-1.2.8-7.el7_1.1.x86_64
libvirt-daemon-driver-nodedev-1.2.8-16.el7_1.3.x86_64
libvirt-daemon-driver-network-1.2.8-16.el7_1.3.x86_64
libvirt-daemon-kvm-1.2.8-16.el7_1.3.x86_64
libvirt-daemon-config-nwfilter-1.2.8-16.el7_1.3.x86_64
libvirt-daemon-driver-storage-1.2.8-16.el7_1.3.x86_64
libvirt-daemon-1.2.8-16.el7_1.3.x86_64
qemu-kvm-tools-ev-2.1.2-23.el7_1.4.1.x86_64
qemu-img-ev-2.1.2-23.el7_1.4.1.x86_64
qemu-kvm-common-ev-2.1.2-23.el7_1.4.1.x86_64
ipxe-roms-qemu-20130517-6.gitc4bce43.el7.noarch
libvirt-daemon-driver-qemu-1.2.8-16.el7_1.3.x86_64
qemu-kvm-ev-2.1.2-23.el7_1.4.1.x86_64
sanlock-3.2.2-2.el7.x86_64
selinux-policy-3.13.1-23.el7_1.7.noarch

Engine:
ovirt-engine-3.6.0-0.0.master.20150627185750.git6f063c1.el6.noarch

How reproducible:
Always

Steps to Reproduce:
1. Deploy hosted-engine over GlusterFS storage using a replica 3 volume.
2. Once deployment is done, create some storage domains in the setup.
3. Create a Gluster storage domain in the setup.

Actual results:
Upon clicking OK in the webadmin dialog for storage domain creation, the hosted-engine VM becomes unreachable. It might be that during storage domain creation the host was disconnected from the Gluster server. The VM is unreachable although it is reported as Up by both vdsm and libvirt. Tried 'hosted-engine --vm-poweroff' and the VM got stuck in 'Powering down'; killing the qemu process didn't help either. Therefore, for now, I can't examine the engine.log.

Expected results:
The Gluster storage domain should be created successfully.
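For clarity, the power-off attempts described above correspond roughly to the following commands on the hypervisor (a sketch; the exact qemu process name may vary by build, so the pkill pattern here is an assumption):

```shell
# Check the hosted-engine VM state as reported by the HA agent
hosted-engine --vm-status

# Attempt a clean power-off; in this case the VM stayed in 'Powering down'
hosted-engine --vm-poweroff

# Last resort: kill the qemu process directly (did not help either)
pkill -9 -f qemu-kvm
```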
Additional info:
sosreport: http://file.tlv.redhat.com/ebenahar/sosreport-green-vdsc.qa.lab.tlv.redhat.com-20150727100546.tar.xz

Gluster volume configuration (identical for both volumes: the one used for the hosted-engine VM image and the one for the Gluster storage domain):

[root@gluster-storage-03 ~]# gluster volume info elad1

Volume Name: elad1
Type: Replicate
Volume ID: 34a9bdeb-30b3-4868-921c-2c6c2cfd83b4
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.35.160.6:/gluster_volumes/elad1
Brick2: 10.35.160.202:/gluster_volumes/elad1
Brick3: 10.35.160.203:/gluster_volumes/elad1
Options Reconfigured:
server.allow-insecure: on
cluster.server-quorum-type: server
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
auth.allow: *
network.ping-timeout: 10
cluster.quorum-type: auto
storage.owner-uid: 36
storage.owner-gid: 36
performance.readdir-ahead: on
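For anyone trying to reproduce this, a volume matching the configuration above could be recreated roughly as follows. This is a sketch: the brick hosts and paths are taken from the report, and only a subset of the reconfigured options is shown.

```shell
# Create a 1x3 replicated volume from the three bricks listed above
gluster volume create elad1 replica 3 \
    10.35.160.6:/gluster_volumes/elad1 \
    10.35.160.202:/gluster_volumes/elad1 \
    10.35.160.203:/gluster_volumes/elad1

# Apply the key options listed under "Options Reconfigured"
gluster volume set elad1 server.allow-insecure on
gluster volume set elad1 cluster.server-quorum-type server
gluster volume set elad1 cluster.quorum-type auto
gluster volume set elad1 network.ping-timeout 10
# uid/gid 36 = vdsm:kvm, so the hypervisor can own the image files
gluster volume set elad1 storage.owner-uid 36
gluster volume set elad1 storage.owner-gid 36

gluster volume start elad1
```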
Cannot reproduce, closing. Will re-open if I encounter it again.