Bug 1298693 - [RFE] - Single point of failure on entry point server deploying hosted-engine over gluster FS
[RFE] - Single point of failure on entry point server deploying hosted-engine...
Status: CLOSED CURRENTRELEASE
Product: ovirt-hosted-engine-setup
Classification: oVirt
Component: Plugins.Gluster (Show other bugs)
1.3.2.2
Unspecified Unspecified
medium Severity medium (vote)
: ovirt-3.6.7
: 1.3.7.0
Assigned To: Simone Tiraboschi
RamaKasturi
: FutureFeature, ZStream
: 1319631 (view as bug list)
Depends On: 1327516
Blocks: Gluster-HC-1 Gluster-HC-2 1299427 1319657 1339213 1339216
  Show dependency treegraph
 
Reported: 2016-01-14 13:12 EST by Simone Tiraboschi
Modified: 2016-07-04 08:29 EDT (History)
18 users (show)

See Also:
Fixed In Version:
Doc Type: Enhancement
Doc Text:
hosted-engine-setup let the user entry just a single gluster fs entry point that becomes a single point of failure. Accepting custom mount options via answer file let the user pass backup-volfile-servers options.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-07-04 08:29:48 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Gluster
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
rule-engine: ovirt‑3.6.z+
rule-engine: exception+
ylavi: planning_ack+
sbonazzo: devel_ack+
mavital: testing_ack+


Attachments (Terms of Use)
Relevant logs (58.07 KB, application/x-gzip)
2016-01-14 17:18 EST, Simone Tiraboschi
no flags Details
logs-onehostdown (1.57 MB, application/x-gzip)
2016-04-20 03:36 EDT, Sahina Bose
no flags Details


External Trackers
Tracker ID Priority Status Summary Last Updated
oVirt gerrit 57760 master MERGED storage: add the capability to optionally pass mount options 2016-05-20 10:28 EDT
oVirt gerrit 57765 master MERGED storage: pass mount options via answerfile 2016-05-23 09:20 EDT
oVirt gerrit 57786 ovirt-hosted-engine-setup-1.3 MERGED storage: pass mount options via answerfile 2016-05-23 09:20 EDT
oVirt gerrit 57787 ovirt-hosted-engine-ha-1.3 MERGED storage: add the capability to optionally pass mount options 2016-05-20 10:29 EDT

  None (edit)
Description Simone Tiraboschi 2016-01-14 13:12:25 EST
Description of problem:

I deployed hosted-engine on a replica 3 gluster volume (hyperconverged by the way but this is not relevant).

When hosted-engine setup asked:
Please specify the full shared storage connection path to use (example: host:/path):
I simply entered:
c72het20160114g1.localdomain:/engine

And I successfully deployed my three hosts (c72het20160114g1.localdomain c72het20160114g2.localdomain c72het20160114g3.localdomain)

Under /etc/ovirt-hosted-engine/hosted-engine.conf I found just
storage=c72het20160114g1.localdomain:/engine

Than I bring down all the tree hosts and I started just c72het20160114g2 and c72het20160114g3 so that the entry point is down.

ovirt-ha-agent is not able to restart the engine VM since it cannot connect the gluster storage server.

[root@c72het20160114g3 logs]# systemctl status ovirt-ha-agent
● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled)
   Active: active (running) since gio 2016-01-14 18:54:58 CET; 59s ago
 Main PID: 7192 (ovirt-ha-agent)
   CGroup: /system.slice/ovirt-ha-agent.service
           └─7192 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent --no-daemon

gen 14 18:55:48 c72het20160114g3.localdomain ovirt-ha-agent[7192]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.agent.Agent ERROR Error: 'Connection to storage server failed' - trying to restart agent
gen 14 18:55:48 c72het20160114g3.localdomain ovirt-ha-agent[7192]: ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Error: 'Connection to storage server failed' - trying to restart agent
gen 14 18:55:53 c72het20160114g3.localdomain ovirt-ha-agent[7192]: WARNING:ovirt_hosted_engine_ha.agent.agent.Agent:Restarting agent, attempt '7'
gen 14 18:55:53 c72het20160114g3.localdomain ovirt-ha-agent[7192]: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Found certificate common name: c72het20160114g3.localdomain
gen 14 18:55:53 c72het20160114g3.localdomain ovirt-ha-agent[7192]: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Initializing VDSM
gen 14 18:55:53 c72het20160114g3.localdomain ovirt-ha-agent[7192]: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Connecting the storage
gen 14 18:55:53 c72het20160114g3.localdomain ovirt-ha-agent[7192]: INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Connecting storage server
gen 14 18:55:53 c72het20160114g3.localdomain ovirt-ha-agent[7192]: INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Connecting storage server
gen 14 18:55:54 c72het20160114g3.localdomain ovirt-ha-agent[7192]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.agent.Agent ERROR Error: 'Connection to storage server failed' - trying to restart agent
gen 14 18:55:54 c72het20160114g3.localdomain ovirt-ha-agent[7192]: ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Error: 'Connection to storage server failed' - trying to restart agent
[root@c72het20160114g3 logs]# gluster volume info engine

Here the volume status:
[root@c72het20160114g3 logs]# gluster volume info engine
 
Volume Name: engine
Type: Replicate
Volume ID: 61b87493-67a5-4761-b9c8-cea49d4bc7fe
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: c72het20160114g1:/gluster/engine/brick1
Brick2: c72het20160114g2:/gluster/engine/brick1
Brick3: c72het20160114g3:/gluster/engine/brick1
Options Reconfigured:
performance.readdir-ahead: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-uid: 36
storage.owner-gid: 36
features.shard: on
features.shard-block-size: 512MB
performance.low-prio-threads: 32
cluster.data-self-heal-algorithm: full
[root@c72het20160114g3 logs]# gluster volume status engine
Status of volume: engine
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick c72het20160114g2:/gluster/engine/bric
k1                                          49152     0          Y       7348 
Brick c72het20160114g3:/gluster/engine/bric
k1                                          49152     0          Y       7429 
NFS Server on localhost                     2049      0          Y       7444 
Self-heal Daemon on localhost               N/A       N/A        Y       7449 
NFS Server on c72het20160114g2              N/A       N/A        N       N/A  
Self-heal Daemon on c72het20160114g2        N/A       N/A        Y       24663
 
Task Status of Volume engine
------------------------------------------------------------------------------
There are no active volume tasks
 
[root@c72het20160114g3 logs]# ping c72het20160114g1
PING c72het20160114g1.localdomain (192.168.1.129) 56(84) bytes of data.
From c72het20160114g3.localdomain (192.168.1.109) icmp_seq=1 Destination Host Unreachable
From c72het20160114g3.localdomain (192.168.1.109) icmp_seq=2 Destination Host Unreachable
^C


Version-Release number of selected component (if applicable):
ovirt-hosted-engine-ha.noarch       1.3.3.7-1.20160112142209.gitce165da.el7
ovirt-hosted-engine-setup.noarch    1.3.2.3-0.0.master.20160112153639.git90b8fde.el7.centos

How reproducible:
100%

Steps to Reproduce:
1. deploy ovirt-hosted-engine setup over a replica-3 gluster volume
2. bring down the server that has been used as gluster entry point
3. restart all the hosted-engine hosts (excluding the one that was used as gluster entry point if hyperconverged)

Actual results:
ovirt-ha-agent simply tries to connect the host that was used as entry point at the setup stage. If just this is down connectStorageServer fails and ovirt-ha-agent is not able to start the engine VM.

Expected results:
No single point of failure

Additional info:
Attaching all the relevant logs.
Comment 1 Simone Tiraboschi 2016-01-14 17:18 EST
Created attachment 1114966 [details]
Relevant logs
Comment 2 Simone Tiraboschi 2016-01-14 17:24:30 EST
Manually hanging the value (from: storage=c72het20160114g1.localdomain:/engine) of
 storage
under /etc/ovirt-hosted-engine/hosted-engine.conf
on one (c72het20160114g3 in my case) of the two live hosts directly pointing to the other live host (to: storage=c72het20160114g2.localdomain:/engine) and restarting all the involved services was enough to restart the engine VM on that host that now can connect again the gluster volume.
Comment 3 Simone Tiraboschi 2016-03-29 04:50:55 EDT
*** Bug 1319631 has been marked as a duplicate of this bug. ***
Comment 4 Darryl Bond 2016-04-06 00:18:35 EDT
Also, if the initial gluster storage host is rebooted later, the engine vm (running on another host) will crash.

VDSM needs to be able to pass mount parameters for gluster to add the backup servers and number of retries to the mount options.
Comment 5 Darryl Bond 2016-04-06 23:53:33 EDT
More testing:
3 Hosted engine hosts: h1, h2, h3
Hosted engine initially installed on h1, engine on gluster filesystem set up between h1,h2,h3 (hyperconverged)

Initial setup used a special hostname (hosted-storage) for the gluster hostname rather than h1,2 or 3. This hostname was set up in /etc/hosts as an alias to their own IP address. The rationale was that the hosts would use there local disks as it was a replica 3 gluster. Additional hosts (2 & 3) were added to the hosted engine cluster.

Hosted engine running on h2 or h3: when h1 is rebooted, the engine VM crashes. hosted-engine --vm-status on the host that was running the hosted engine displays the following error:
ovirt_hosted_engine_ha.lib.exceptions.RequestError: Request failed: failed to read metadata: [Errno 2] No such file or directory: '/rhev/data-center/mnt/glusterSD/hosted-storage:_hosted-engine/f0d9de6c-00ef-4b99-b708-a734212c827b/ha_agent/hosted-engine.metadata'

On checking, this a sym link to /var/run/vdsm/storage/f0d9de6c-00ef-4b99-b708-a734212c827b/6cc38947-db4b-4599-83b8-d03256de8c29/aec2f58d-99b9-4c19-9e3d-0854be313d40 which no longer exists. The contents of /var/run/vdsm/storage/f0d9de6c-00ef-4b99-b708-a734212c827b/6cc38947 only come back after h1 has completed rebooting.

I set up keepalived to maintain an IP address for the hosted storage name hosted-storage across all 3 gluster servers and rebooted all the hosted engine hosts to make sure they were using it.

Rebooting h1 still crashes the engine even though the VRRP ip address has moved to another host. The symtoms are the same as above.

Observation: hosted engine on gluster as a SPOF on the initial host as Simone observed and 4.1.0 for the target milestone seems a long way off. 
Can I convert my hosted engine to NFS on my existing gluster servers ie, use the gluster NFS server instead of the fuse gluster and use the VRRP ip address without rebuilding my hosted engine? Or will it have the same SPOF.
Comment 6 Sahina Bose 2016-04-13 10:43:20 EDT
This issue does not seem related to the single point of failure. Tested this on a 3 node setup with each node mounting the volume hosting HE as localhost:/engine. Since all nodes have glusterd running and belong to same cluster, with any one node down - mount should continue to work.
But HE VM is restarted once a node is powered off.

broker.log :
Thread-4602::ERROR::2016-04-13 18:50:28,249::listener::192::ovirt_hosted_engine_ha.broker.list
ener.ConnectionHandler::(handle) Error handling request, data: 'set-storage-domain FilesystemB
ackend dom_type=glusterfs sd_uuid=7fe3707b-2435-4e71-b831-4daba08cc72c'
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 166,
 in handle
    data)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 299,
 in _dispatch
    .set_storage_domain(client, sd_type, **options)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", lin
e 66, in set_storage_domain
    self._backends[client].connect()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line
 456, in connect
    self._dom_type)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line
 108, in get_domain_path
    " in {1}".format(sd_uuid, parent))
BackendFailureException: path to storage domain 7fe3707b-2435-4e71-b831-4daba08cc72c not found
 in /rhev/data-center/mnt/glusterSD

agent.log
MainThread::INFO::2016-04-13 18:50:26,020::storage_server::207::ovirt_hosted_engine_ha.lib.sto
rage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2016-04-13 18:50:28,054::hosted_engine::807::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_domain_monitor) Stopped VDSM domain monitor for 7fe3707b-2435-4e71-b831-4daba08cc72c
MainThread::INFO::2016-04-13 18:50:28,055::image::184::ovirt_hosted_engine_ha.lib.image.Image::(teardown_images) Teardown images
MainThread::WARNING::2016-04-13 18:50:28,177::hosted_engine::675::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Disconnecting the storage
MainThread::INFO::2016-04-13 18:50:28,177::storage_server::229::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(disconnect_storage_server) Disconnecting storage server



The gluster mount logs for this time frame contain unmount messages
[2016-04-13 13:20:28.199429] I [fuse-bridge.c:4997:fuse_thread_proc] 0-fuse: unmounting /rhev/
data-center/mnt/glusterSD/localhost:_engine
[2016-04-13 13:20:28.199934] W [glusterfsd.c:1251:cleanup_and_exit] (-->/lib64/libpthread.so.0
(+0x7dc5) [0x7ff9b3ceddc5] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x7ff9b53588b5] -
->/usr/sbin/glusterfs(cleanup_and_exit+0x69) [0x7ff9b5358739] ) 0-: received signum (15), shut
ting down
[2016-04-13 13:20:28.199970] I [fuse-bridge.c:5704:fini] 0-fuse: Unmounting '/rhev/data-center
/mnt/glusterSD/localhost:_engine'.
Comment 7 Simone Tiraboschi 2016-04-13 11:42:31 EDT
Sahina,
can you please attach VDSM logs for the relevant time frame?

can you please try to reproduce with ovirt-ha-agent in debug mode?
It's enough to change from 

[logger_root]
level=INFO

to

[logger_root]
level=DEBUG

in /etc/ovirt-hosted-engine-ha/agent-log.conf and restart the agent
Comment 8 Sahina Bose 2016-04-20 03:36 EDT
Created attachment 1148882 [details]
logs-onehostdown
Comment 9 Sahina Bose 2016-04-20 03:43:31 EDT
Time frame test was conducted
[root@rhsdev14 ~]# date 
Wed Apr 20 10:02:39 IST 2016

Had 3 nodes - rhsdev9, rhsdev13, rhsdev14. HE was running on rhsdev14, engine volume mounted as localhost:/engine on all 3 nodes. 
Following errors occurred after rhsdev9 was shutdown:

1. hosted-engine --vm-status throws traceback errors:
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 176, in set_storage_domain
    .format(sd_type, options, e))
ovirt_hosted_engine_ha.lib.exceptions.RequestError: Failed to set storage domain FilesystemBackend, options {'dom_type': 'glusterfs', 'sd_uuid': '7fe3707b-2435-4e71-b831-4daba08cc72c'}: Connection timed out
--requery:
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 260, in _checked_communicate
    .format(message or response))
ovirt_hosted_engine_ha.lib.exceptions.RequestError: Request failed: failed to read metadata: [Errno 2] No such file or directory: '/rhev/data-center/mnt/glusterSD/localhost:_engine/7fe3707b-2435-4e71-b831-4daba08cc72c/ha_agent/hosted-engine.metadata'

2. All storage domains (he, master data, export domains) move to Down state. (rhsdev9 has 2 IPs and the 10.70.42.105 was used to mount volume with backup-volfile-servers option)
Comment 10 Sahina Bose 2016-04-20 07:01:12 EDT
Please note that the second point reg. storage domain going to UNKNOWN state was due to the SPM node being shut down. This error can be ignored in the context of this bug.
Comment 11 Yaniv Lavi 2016-04-27 03:14:31 EDT
This is not a priority for virt right now and we have other RFEs on this on ISCSI as well that were pushed out due to capacity. We can assist in doing this for the project, but we don't have the needed resources to do this task ourselves for this time line. Moving back to Gluster.
Comment 12 Yaniv Lavi 2016-04-27 03:31:47 EDT
*** Bug 1319657 has been marked as a duplicate of this bug. ***
Comment 13 Simone Tiraboschi 2016-04-28 12:31:39 EDT
Sahina, which is the right gluster command to get the list of the available entry point?

Nir, which is the additional parameter to pass that to connectStorageServer?
Comment 14 Sahina Bose 2016-04-29 05:40:08 EDT
Simone, "gluster volume info <volname>" returns the bricks with servers. This list can be parsed to build list. If the gluster server is not local, then this command has to be executed with the --remote-host option. --remote-host option is prone to compatibility errors between client and server versions.
Alternately, if this is part of hosted-engine setup, user can enter the alternate servers as an option, which can be passed to mount option "backup-volfile-servers".
Comment 15 Simone Tiraboschi 2016-04-29 10:24:50 EDT
So, considering also UX side, do you think it would be better to:
a. try identifying three distinct address parsing the output of gluster volume info and asking the user to review them
b. directly asking (maybe with an example or an hint) the backup-volfile-servers parameter to the user
Comment 16 Sandro Bonazzola 2016-05-02 05:48:08 EDT
Moving from 4.0 alpha to 4.0 beta since 4.0 alpha has been already released and bug is not ON_QA.
Comment 17 Sahina Bose 2016-05-02 06:14:31 EDT
(In reply to Simone Tiraboschi from comment #15)
> So, considering also UX side, do you think it would be better to:
> a. try identifying three distinct address parsing the output of gluster
> volume info and asking the user to review them
> b. directly asking (maybe with an example or an hint) the
> backup-volfile-servers parameter to the user

Could we try identifying and asking user to review, and if there's an error in identifying, allowing user to enter it manually?
(there is a possibility of error in 'gluster vol info --remote-host' if gluster version in hypervisor i.e. client does not match with gluster server)
Comment 18 Luiz Goncalves 2016-05-16 01:43:39 EDT
(In reply to Simone Tiraboschi from comment #15)
> So, considering also UX side, do you think it would be better to:
> a. try identifying three distinct address parsing the output of gluster
> volume info and asking the user to review them
> b. directly asking (maybe with an example or an hint) the
> backup-volfile-servers parameter to the user

Hi Simone, I would say that option B is better, because it's simpler, faster to implement and more reliable. By adding an extra parameter on the "hosted-engine --deploy" flow, to add "Mount Options", for example: "backupvolfile-server=gluster.xyz.com,fetch-attempts=2,log-level=WARNING,log-file=/var/log/engine_domain.log" would do the job. On the UI, the existing storage ui could allow the user to manually adjust the hosted_storage domain, to edit the "Mount Options" and let the new config to take place after restarting vdsm on the servers. I would say this issue is critical, because gluster with 3 external nodes is already being used on production in many places, including me, therefore, it's a different situation than glusterfs in hyperconvergency mode and this is a nasty single point of failure. Would be great to have it prioritized on 3.6.x branch.

Thanks
-Luiz
Comment 19 Luiz Goncalves 2016-05-19 11:57:30 EDT
Thanks Sandro for addressing this issue for ovirt-3.6.7.
Comment 20 Simone Tiraboschi 2016-05-19 12:31:11 EDT
With patch https://gerrit.ovirt.org/#/c/57760/ and https://gerrit.ovirt.org/#/c/57765/ it will be possible to pass optional mount options to hosted-engine-setup via answer file adding
OVEHOSTED_STORAGE/mntOptions=str:backupvolfile-server=gluster.xyz.com,fetch-attempts=2,log-level=WARNING,log-file=/var/log/engine_domain.log (or what else is needed) to the answer file.

On existing setup it will be necessary to manually add an additional line like 
mnt_options=backupvolfile-server=gluster.xyz.com,fetch-attempts=2,log-level=WARNING,log-file=/var/log/engine_domain.log (or what else is needed)
in /etc/ovirt-hosted-engine/hosted-engine.conf on each hosted-engine host.
Comment 21 Luiz Goncalves 2016-05-19 20:51:27 EDT
Awesome Simone. Thanks for addressing this issue.

Thanks
-Luiz
Comment 22 Sahina Bose 2016-06-02 07:36:52 EDT
I've tested it works, Kasturi can you verify this bug?
Comment 23 RamaKasturi 2016-06-03 08:24:00 EDT
Hi Simone,

       I was verifying this bug and as part of the verification i did the following test.

Had three hosts dhcp35-53.lab.eng.blr.redhat.com, dhcp35-194.lab.eng.blr.redhat.com, zod.lab.eng.blr.redhat.com.

HostedEngine was running on dhcp35-194.lab.eng.blr.redhat.com and i powered off the zod.lab.eng.blr.redhat.com. When i executed hosted-engine --vm-status on the machine where HostedEngine was running it gave me the below exception.

hosted-engine --vm-status 
ovirt_hosted_engine_ha.lib.exceptions.RequestError: Failed to set storage domain FilesystemBackend, options {'dom_type': 'glusterfs', 'sd_uuid': 'c78bf0e8-ff81-4e98-ba76-068613d3cefe'}: Connection timed out

Hosted Engine is restarted in another node after some time. Once the node which is powered off is up i see that the storage domains are in inactive state. Can you please tell me why is this behaviour ? 

All logs from hosts and engine are present here. 

http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/HC/1298693/
Comment 24 Simone Tiraboschi 2016-06-03 09:46:45 EDT
What I can tell for sure is that now we correctly pass the mnt_options parameter to VDSM, indeed we have:

hread-520503::INFO::2016-06-03 17:01:32,690::logUtils::48::dispatcher::(wrapper) Run and protect: connectStorageServer(domType=7, spUUID='00000000-0000-0000-0000-000000000000', conList=[{'id': '2eaa9f75-5826-4e44-b3b6-007d2493335c', 'vfs_type': 'glusterfs', 'mnt_options': 'backup-volfile-servers=10.70.34.41:10.70.34.39', 'connection': '10.70.34.35:/engine', 'user': 'kvm'}], options=None)

And it seams that vdsm correctly passed the parameter to gluster since in 
/var/log/glusterfs/rhev-data-center-mnt-glusterSD-10.70.34.35\:_engine.log
from dhcp35-194.lab.eng.blr.redhat.com
we can see:

[2016-06-02 06:14:40.721017] I [MSGID: 100030] [glusterfsd.c:2338:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.9 (args: /usr/sbin/glusterfs --volfile-server=10.70.34.35 --volfile-server=10.70.34.41 --volfile-server=10.70.34.39 --volfile-id=/engine /rhev/data-center/mnt/glusterSD/10.70.34.35:_engine)

so glusterfsd knows for sure about the three volfile servers.

Now the issue is understanding who and why killed the VM.

When you shutdown the other host at about 10:40:09 we start seeing errors in the gluster logs ( var/log/glusterfs/rhev-data-center-mnt-glusterSD-10.70.34.35\:_engine.log ):

[2016-06-03 10:38:26.453395] I [glusterfsd-mgmt.c:1600:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
[2016-06-03 10:40:09.460706] C [rpc-clnt-ping.c:165:rpc_clnt_ping_timer_expired] 11-engine-client-2: server 10.70.34.39:49153 has not responded in the last 42 seconds, disconnecting.
[2016-06-03 10:40:09.461987] E [rpc-clnt.c:362:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f5763145c02] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f5762f1084e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f5762f1095e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7f5762f122ea] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f5762f12b18] ))))) 11-engine-client-2: forced unwinding frame type(GlusterFS 3.3) op(FINODELK(30)) called at 2016-06-03 10:38:53.412020 (xid=0x783ec)
[2016-06-03 10:40:09.462033] E [MSGID: 114031] [client-rpc-fops.c:1676:client3_3_finodelk_cbk] 11-engine-client-2: remote operation failed [Transport endpoint is not connected]
[2016-06-03 10:40:09.462438] E [rpc-clnt.c:362:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f5763145c02] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f5762f1084e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f5762f1095e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7f5762f122ea] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f5762f12b18] ))))) 11-engine-client-2: forced unwinding frame type(GlusterFS 3.3) op(FINODELK(30)) called at 2016-06-03 10:39:03.922307 (xid=0x783f3)
[2016-06-03 10:40:09.462755] E [rpc-clnt.c:362:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f5763145c02] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f5762f1084e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f5762f1095e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7f5762f122ea] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f5762f12b18] ))))) 11-engine-client-2: forced unwinding frame type(GlusterFS 3.3) op(FXATTROP(34)) called at 2016-06-03 10:38:53.453284 (xid=0x783ed)
[2016-06-03 10:40:09.462780] W [MSGID: 114031] [client-rpc-fops.c:1917:client3_3_fxattrop_cbk] 11-engine-client-2: remote operation failed
[2016-06-03 10:40:09.466912] I [socket.c:3434:socket_submit_request] 11-engine-client-2: not connected (priv->connected = 0)
[2016-06-03 10:40:09.466936] W [rpc-clnt.c:1586:rpc_clnt_submit] 11-engine-client-2: failed to submit rpc-request (XID: 0x783f9 Program: GlusterFS 3.3, ProgVers: 330, Proc: 30) to rpc-transport (engine-client-2)
[2016-06-03 10:40:09.466979] W [rpc-clnt.c:1586:rpc_clnt_submit] 11-engine-client-2: failed to submit rpc-request (XID: 0x783fa Program: GlusterFS 3.3, ProgVers: 330, Proc: 30) to rpc-transport (engine-client-2)
[2016-06-03 10:40:09.467166] E [rpc-clnt.c:362:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f5763145c02] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f5762f1084e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f5762f1095e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7f5762f122ea] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f5762f12b18] ))))) 11-engine-client-2: forced unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2016-06-03 10:38:54.023448 (xid=0x783ee)
[2016-06-03 10:40:09.467197] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 11-engine-client-2: remote operation failed. Path: / (00000000-0000-0000-0000-000000000001) [Transport endpoint is not connected]
[2016-06-03 10:40:09.467246] W [rpc-clnt.c:1586:rpc_clnt_submit] 11-engine-client-2: failed to submit rpc-request (XID: 0x783fb Program: GlusterFS 3.3, ProgVers: 330, Proc: 30) to rpc-transport (engine-client-2)
[2016-06-03 10:40:09.467524] E [rpc-clnt.c:362:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f5763145c02] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f5762f1084e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f5762f1095e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7f5762f122ea] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f5762f12b18] ))))) 11-engine-client-2: forced unwinding frame type(GlusterFS 3.3) op(FXATTROP(34)) called at 2016-06-03 10:38:54.453597 (xid=0x783ef)
[2016-06-03 10:40:09.467655] W [rpc-clnt.c:1586:rpc_clnt_submit] 11-engine-client-2: failed to submit rpc-request (XID: 0x783fc Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport (engine-client-2)
[2016-06-03 10:40:09.467681] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 11-engine-client-2: remote operation failed. Path: /c78bf0e8-ff81-4e98-ba76-068613d3cefe (e8c56470-dac8-4054-b417-b4408d4ee4cf) [Transport endpoint is not connected]
[2016-06-03 10:40:09.467723] W [rpc-clnt.c:1586:rpc_clnt_submit] 11-engine-client-2: failed to submit rpc-request (XID: 0x783fd Program: GlusterFS 3.3, ProgVers: 330, Proc: 30) to rpc-transport (engine-client-2)
[2016-06-03 10:40:09.467930] E [rpc-clnt.c:362:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f5763145c02] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f5762f1084e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f5762f1095e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7f5762f122ea] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f5762f12b18] ))))) 11-engine-client-2: forced unwinding frame type(GlusterFS 3.3) op(FXATTROP(34)) called at 2016-06-03 10:38:54.453659 (xid=0x783f0)
[2016-06-03 10:40:09.468042] W [rpc-clnt.c:1586:rpc_clnt_submit] 11-engine-client-2: failed to submit rpc-request (XID: 0x783fe Program: GlusterFS 3.3, ProgVers: 330, Proc: 30) to rpc-transport (engine-client-2)
[2016-06-03 10:40:09.468234] E [rpc-clnt.c:362:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f5763145c02] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f5762f1084e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f5762f1095e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7f5762f122ea] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f5762f12b18] ))))) 11-engine-client-2: forced unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2016-06-03 10:38:55.103572 (xid=0x783f1)
[2016-06-03 10:40:09.468523] E [rpc-clnt.c:362:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f5763145c02] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f5762f1084e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f5762f1095e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7f5762f122ea] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f5762f12b18] ))))) 11-engine-client-2: forced unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2016-06-03 10:38:56.497379 (xid=0x783f2)
[2016-06-03 10:40:09.468792] E [rpc-clnt.c:362:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f5763145c02] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f5762f1084e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f5762f1095e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7f5762f122ea] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f5762f12b18] ))))) 11-engine-client-2: forced unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2016-06-03 10:39:04.044294 (xid=0x783f4)
[2016-06-03 10:40:09.468818] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 11-engine-client-2: remote operation failed. Path: /c78bf0e8-ff81-4e98-ba76-068613d3cefe/images/f74f8e67-75c1-47f2-8d87-7388a1ddcaa2/ae65578f-5763-431f-b340-43ca1bd1e5ae (36716bda-9ce4-4c70-92d0-efb9b3398375) [Transport endpoint is not connected]
[2016-06-03 10:40:09.469216] E [rpc-clnt.c:362:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f5763145c02] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f5762f1084e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f5762f1095e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7f5762f122ea] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f5762f12b18] ))))) 11-engine-client-2: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2016-06-03 10:39:27.456773 (xid=0x783f5)
[2016-06-03 10:40:09.469242] W [rpc-clnt-ping.c:208:rpc_clnt_ping_cbk] 11-engine-client-2: socket disconnected
[2016-06-03 10:40:09.469437] E [rpc-clnt.c:362:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f5763145c02] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f5762f1084e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f5762f1095e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7f5762f122ea] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f5762f12b18] ))))) 11-engine-client-2: forced unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2016-06-03 10:39:30.280014 (xid=0x783f6)
...


And the VM went down (/var/log/libvirt/qemu/HostedEngine.log):
2016-06-03 10:40:09.914+0000: shutting down

So now the issue is why!
Comment 25 RamaKasturi 2016-06-03 09:51:51 EDT
pranith / krutika,

   Can you please help us understand what is happening in comment 24 ?
Comment 26 Luiz Goncalves 2016-06-03 23:52:15 EDT
Hi everyone, not sure if it's related, but the gluster config's must be correct in order to allow the backupvolfile-server to catch up correctly. 

gluster volume set engine network.ping-timeout 10
^ This one is pretty important to let fuse mount to stop trying to recover the primary entry point after 10s and therefore move to the backupvolfile-server. Meanwhile the gluster fuse mount will be unavailable and might be causing the issues in your case. I'm not 100% sure about it.

by the way, my gluster configs are below... not sure if it matches your configuration or are the best ones for the gluster engine domain. Anyway, Simone what do you think?

Volume Name: engine
Type: Replicate
Volume ID: 64b413d2-c42e-40fd-b356-3e6975e941b0
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: gluster1.xyz.com:/gluster/engine/brick1
Brick2: gluster2.xyz.com:/gluster/engine/brick1
Brick3: gluster-root1.xyz.com:/gluster/engine/brick1
Options Reconfigured:
network.ping-timeout: 10
performance.cache-size: 1GB
performance.write-behind-window-size: 4MB
performance.write-behind: off
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
cluster.quorum-type: auto
network.remote-dio: enable
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
performance.low-prio-threads: 32
features.shard-block-size: 512MB
features.shard: on
storage.owner-gid: 36
storage.owner-uid: 36
performance.readdir-ahead: on

Regards
-Luiz
Comment 27 Ravishankar N 2016-06-06 06:51:14 EDT
(In reply to RamaKasturi from comment #25)
> pranith / krutika,
> 
>    Can you please help us understand what is happening in comment 24 ?

When a (brick) node is powered off the mount does not immediately receive notification that it is disconnected. It happens only when the ping-timeout expires. Until then, the mount appears to hang because further write fops are sent to that node also (despite it being down) and we wait for the replies before sending back the response. When the timeout expires, all calls for which we haven't received any response are unwound with ENOTCONN (which is what is seen from the logs). Like Luiz said in comment #26, we can reduce the ping-timeout value (which I found wasn't the case in the engine volume in Kasturi's setup).
Comment 28 RamaKasturi 2016-06-07 02:45:23 EDT
Verified and works fine with build ovirt-hosted-engine-setup-1.3.7.0-1.el7ev.noarch.

Tested following scenarios:
==========================

1) stopped glusterd on the  first host while deploying hosted engine and i see that hosted_storage remains active and hosted-engine --vm-status does not display any exception.

2) powered off host1 where hosted engine vm is running and HE vm is restarted on Host2 after some time. While HE vm tries to restart on another node i see the following when hosted-engine --vm-status is executed and below logs in agent and broker.log.

[root@zod ~]# hosted-engine --vm-status
Traceback (most recent call last):
  File "/usr/lib64/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 117, in <module>
    if not status_checker.print_status():
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 60, in print_status
    all_host_stats = ha_cli.get_all_host_stats()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 160, in get_all_host_stats
    return self.get_all_stats(self.StatModes.HOST)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 103, in get_all_stats
    self._configure_broker_conn(broker)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 180, in _configure_broker_conn
    dom_type=dom_type)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 176, in set_storage_domain
    .format(sd_type, options, e))
ovirt_hosted_engine_ha.lib.exceptions.RequestError: Failed to set storage domain FilesystemBackend, options {'dom_type': 'glusterfs', 'sd_uuid': 'c78bf0e8-ff81-4e98-ba76-068613d3cefe'}: Connection timed out


Broker.log:
================================

[root@dhcp35-53 ~]# grep 'ERROR' /var/log/ovirt-hosted-engine-ha/broker.log
Thread-1::ERROR::2016-06-07 11:59:31,048::listener::192::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Error handling request, data: 'set-storage-domain FilesystemBackend dom_type=glusterfs sd_uuid=c78bf0e8-ff81-4e98-ba76-068613d3cefe'
Thread-6::ERROR::2016-06-07 12:03:45,281::cpu_load_no_engine::167::cpu_load_no_engine.EngineHealth::(update_stat_file) Failed to read vm stats: [Errno 2] No such file or directory: '/proc/0/stat'
Thread-169::ERROR::2016-06-07 12:04:46,108::cpu_load_no_engine::156::cpu_load_no_engine.EngineHealth::(update_stat_file) Failed to getVmStats: 'pid'

agent.log:
===========================

[root@dhcp35-53 ~]# grep 'ERROR' /var/log/ovirt-hosted-engine-ha/agent.log
MainThread::ERROR::2016-06-07 12:04:07,285::brokerlink::279::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(_communicate) Connection closed: Connection timed out
MainThread::ERROR::2016-06-07 12:04:07,287::agent::198::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Disconnected from broker 'Connection timed out' - reinitializing


3) Powered off another node where HE vm is not running and i see that HE vm restarts on another node and that is being tracked in the bug https://bugzilla.redhat.com/show_bug.cgi?id=1319657

So moving this bug to verified state.
Comment 29 Luiz Goncalves 2016-06-13 07:42:51 EDT
Hi everyone, quick question... what happened to the vm's running on top of the gluster storage while moving to the backupvolfile-server ? The mount point will be "off" for ~10s and therefore the vm's will not be able to read/write anything during this period. The vm's stayed running normally or were moved to a paused state or something weird like this ? 

According to your test the HE catch up correctly after some time, but the vm's state are equally important to verify.

In my production setup I have two distinct gluster storages (engine and vms), but both running on top of the same replica 3 hosts, therefore, it's my situation.

Thanks
-Luiz
Comment 30 RamaKasturi 2016-06-13 12:33:59 EDT
I verified this bug on a new setup not existing one so  have not seen this issue.
Comment 31 Luiz Goncalves 2016-06-15 06:45:26 EDT
So, were you running one virtual machine (not the Hosted Engine) on top of your gluster replica 3 and this virtual machine stayed running normally after powering off the gluster main entry point?

Thanks
-Luiz
Comment 32 Sahina Bose 2016-06-15 07:04:02 EDT
Luiz - this bug is about HE storage domain using backup-volfile-servers, so the only VM running on this is the Hosted Engine, and that was verified as running.

Backup-volfile-servers for other storage domains (Master, Data) has been around for a while via the "Mount options" that can be configured via engine UI. The VM continues to run when the primary server is brought down in this case.

Once volume is mounted and all read/write operations are directly to the bricks, so the I/O is not interrupted if any one of the bricks go down in a replica 3 set.

Is there a specific issue you are facing that's prompting the query?
Comment 33 Luiz Goncalves 2016-06-15 08:07:46 EDT
I understand the issue belongs to the HE Single Point of Failure and I know the mount options are around for while for Master/Data Gluster storage domains, but just want to be sure that the unavailability period of the engine gluster storage domain will not affect the running vm's. For a gluster engine storage domain running on top of dedicated hardware , it might not affect the vm's, but what happens when you have the engine and the master/data domains on top of the same gluster replica 3 hardware ? Look this is not hyperconvergency necessarily, but it will be the case for hyperconvergency as well. I think this is going to be a fairly normal use case and it would be critical to guarantee that this setup works correctly. On my specific case, engine and master/data domain are on top of the same gluster replica 3 hardware, but on different gluster volumes.

I would appreciate having this case tested. I would be more than happy to test it myself, but the environment is in production and I can't do it as easy as you can in your testing environment.

Thanks
-Luiz
Comment 34 Sahina Bose 2016-06-16 02:54:59 EDT
If you have multiple volumes - engine and data, in your case - on the same physical infrastructure, the unavailability of engine volume will not affect the VMs running on data volume.

Note You need to log in before you can comment on or make changes to this bug.