Bug 1313593 - [HC] oVirt does not detect gluster peer down
Product: ovirt-engine
Classification: oVirt
Component: General
Hardware: x86_64 Linux
Priority: unspecified
Severity: high
Assigned To: bugs@ovirt.org
Depends On:
Reported: 2016-03-01 20:17 EST by Badalyan Vyacheslav
Modified: 2016-03-16 09:10 EDT
CC List: 4 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2016-03-16 09:10:41 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: Gluster
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
rule-engine: planning_ack?
rule-engine: devel_ack?
rule-engine: testing_ack?

Attachments: None
Description Badalyan Vyacheslav 2016-03-01 20:17:55 EST
Description of problem:

If you reboot a gluster server, you get:
1. Read errors in dmesg
2. Many postgres zombie processes
3. A reboot of the hosted engine by the HA agent
4. The hosted engine then comes up in paused state because gluster is unhealthy

How reproducible:

Steps to Reproduce:
1. Create a replica 3 gluster volume across 3 different servers
2. Install the hosted engine on gluster
3. Run and configure it
4. Stop one gluster server
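
Step 1 above can be sketched as a small helper that builds the `gluster volume create` and `gluster volume start` command lines for a replica 3 volume. This is only an illustration: the volume name, host names, and brick paths are hypothetical and were not given in the report.

```python
def replica3_commands(volume, bricks):
    """Build argv lists that create and start a replica 3 gluster volume.

    `bricks` must hold exactly three "host:/path" strings, each on a
    different server, matching step 1 of the report.
    """
    if len(bricks) != 3:
        raise ValueError("replica 3 needs exactly three bricks")
    hosts = {brick.split(":", 1)[0] for brick in bricks}
    if len(hosts) != 3:
        raise ValueError("each brick must live on a different server")
    create = ["gluster", "volume", "create", volume, "replica", "3", *bricks]
    start = ["gluster", "volume", "start", volume]
    return [create, start]


# Hypothetical volume name, hosts and brick paths, for illustration only.
EXAMPLE_BRICKS = [
    "host1:/gluster/engine/brick",
    "host2:/gluster/engine/brick",
    "host3:/gluster/engine/brick",
]

for argv in replica3_commands("engine", EXAMPLE_BRICKS):
    print(" ".join(argv))
```

The commands are only printed here; they would have to be run as root on one of the gluster servers after the peers have been probed.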

Actual results:
Automatic nightly maintenance of a gluster server stops the service. The hosted engine then has to be unpaused by hand.

Expected results:
The oVirt engine or the HA agent must detect that one of the servers is down and:
Minimum) Suspend the hosted engine VM and resume it after gluster is back up.
Maximum) Add a second master PostgreSQL server that runs in memory and replicates to a secondary on the gluster-backed disk. Everything the engine needs in order to work should be cloned into memory or a RAM disk, so that small network issues on the storage domains do not affect it.

Or gluster must work normally without one peer, with no speed degradation.
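
As a rough illustration of the detection asked for above, the following sketch parses the text printed by `gluster peer status` and lists peers that are not connected. It is not how the oVirt engine or the HA agent actually checks peers; the function names and the sample host names are made up for this example.

```python
import subprocess


def parse_peer_status(output):
    """Return (hostname, state) tuples from `gluster peer status` text."""
    peers = []
    hostname = None
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("Hostname:"):
            hostname = line.split(":", 1)[1].strip()
        elif line.startswith("State:") and hostname is not None:
            peers.append((hostname, line.split(":", 1)[1].strip()))
            hostname = None
    return peers


def disconnected_peers(output):
    """Return hostnames whose state does not end in "(Connected)"."""
    return [host for host, state in parse_peer_status(output)
            if not state.endswith("(Connected)")]


def check_local_peers():
    """Query the local glusterd; needs the gluster CLI and root privileges."""
    result = subprocess.run(["gluster", "peer", "status"],
                            capture_output=True, text=True, check=True)
    return disconnected_peers(result.stdout)


# Sample output with hypothetical peers, for illustration only.
SAMPLE = """Number of Peers: 2

Hostname: server2
Uuid: 11111111-1111-1111-1111-111111111111
State: Peer in Cluster (Connected)

Hostname: server3
Uuid: 22222222-2222-2222-2222-222222222222
State: Peer in Cluster (Disconnected)
"""

print(disconnected_peers(SAMPLE))
```

A monitoring job could call `check_local_peers()` periodically and raise an alert when the returned list is not empty.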

Additional info:
Comment 1 Sahina Bose 2016-03-09 02:31:21 EST
Can you post the ha-agent.log, vdsm.log and gluster mount log from the node where hosted engine was running and then went to paused state?
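
One possible way to gather the requested logs into a single archive is sketched below. The paths are assumed default locations for the HA agent, vdsm, and gluster logs and may differ on a given installation; the helper name `collect_logs` is hypothetical.

```python
import glob
import os
import tarfile

# Assumed default log locations; adjust them if your installation differs.
DEFAULT_PATTERNS = [
    "/var/log/ovirt-hosted-engine-ha/agent.log",
    "/var/log/vdsm/vdsm.log",
    "/var/log/glusterfs/*.log",
]


def collect_logs(archive_path, patterns=DEFAULT_PATTERNS):
    """Pack every existing file matching `patterns` into a gzip tarball.

    Returns the list of files that were added, so missing logs are easy
    to spot.
    """
    added = []
    with tarfile.open(archive_path, "w:gz") as tar:
        for pattern in patterns:
            for path in sorted(glob.glob(pattern)):
                if os.path.isfile(path):
                    tar.add(path)
                    added.append(path)
    return added
```

Calling `collect_logs("/tmp/hosted-engine-logs.tar.gz")` on the node where the hosted engine was running would produce one file that can be attached to the bug.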
Comment 2 Badalyan Vyacheslav 2016-03-13 12:57:25 EDT
Sorry, I can't. I removed all of gluster and moved to NFS.
Comment 3 Badalyan Vyacheslav 2016-03-13 12:57:31 EDT
Sorry, I can't. I removed all of gluster and moved to NFS.
Comment 4 Sahina Bose 2016-03-14 04:47:28 EDT
Kasturi, can you check if you see this behaviour in your setup?

Otherwise we can close this as insufficient data.
Comment 5 RamaKasturi 2016-03-16 08:33:49 EDT
Sahina, I brought down the gluster server on one of the nodes in my cluster. As reported, the UI does not indicate that the gluster service is down on that node. It looks like a bug has already been filed for this: https://bugzilla.redhat.com/show_bug.cgi?id=1262046.

However, I did not see any of my app VMs pause, and the hosted engine is up and running fine.
Comment 6 Sahina Bose 2016-03-16 09:10:41 EDT
Closing this, as we could not reproduce it and the user could not provide logs.