Bug 1313593 - [HC] oVirt does not detect gluster peer down
Summary: [HC] oVirt does not detect gluster peer down
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: General
Version: 3.6.4
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: bugs@ovirt.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-03-02 01:17 UTC by Badalyan Vyacheslav
Modified: 2016-03-16 13:10 UTC

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2016-03-16 13:10:41 UTC
oVirt Team: Gluster
Embargoed:
rule-engine: planning_ack?
rule-engine: devel_ack?
rule-engine: testing_ack?


Description Badalyan Vyacheslav 2016-03-02 01:17:55 UTC
Description of problem:

If you reboot a gluster server, you get:
1. Read errors in dmesg
2. Many postgres zombie processes
3. The hosted engine is rebooted by the HA agent
4. The hosted engine then starts in paused mode because gluster is in a bad state

How reproducible:
Always

Steps to Reproduce:
1. Create a replica 3 gluster volume across 3 different servers
2. Install the hosted engine on gluster
3. Run and configure it
4. Stop one gluster server

Actual results:
Automatic nightly maintenance of a gluster server stops the service. The hosted engine then has to be unpaused by hand.

Expected results:
The oVirt engine or the HA agent must see that one of the servers is down and:
Minimal) suspend the hosted VM and resume it after gluster is back up
Maximal) add a second master postgres server held in memory and replicate it to a secondary on the gluster-backed disk. Everything the engine needs to work must be cloned in memory or on a RAM disk to protect against small network issues on storage domains.

Or gluster must work normally without one peer, without speed degradation.
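
A minimal sketch of the kind of peer-down check asked for above, assuming the usual text layout printed by `gluster peer status` (a `Hostname:` line followed by a `State:` line per peer). The function name `disconnected_peers` and the sample hostnames are hypothetical, and this is not how the oVirt engine or the HA agent actually monitors gluster.

```python
from typing import List


def disconnected_peers(peer_status_output: str) -> List[str]:
    """Return hostnames whose State line does not report '(Connected)'.

    Assumes the text layout of `gluster peer status`: each peer is listed
    as a 'Hostname:' line followed (after a Uuid line) by a 'State:' line.
    """
    down = []
    hostname = None
    for raw_line in peer_status_output.splitlines():
        line = raw_line.strip()
        if line.startswith("Hostname:"):
            # Remember the peer until its State line is seen.
            hostname = line.split(":", 1)[1].strip()
        elif line.startswith("State:") and hostname is not None:
            if "(Connected)" not in line:
                down.append(hostname)
            hostname = None
    return down


# Hypothetical sample output, used only to exercise the parser.
SAMPLE = """Number of Peers: 2

Hostname: gluster2.example.com
Uuid: 00000000-0000-0000-0000-000000000002
State: Peer in Cluster (Connected)

Hostname: gluster3.example.com
Uuid: 00000000-0000-0000-0000-000000000003
State: Peer in Cluster (Disconnected)
"""

if __name__ == "__main__":
    print(disconnected_peers(SAMPLE))
```

On a real node the input would come from running `gluster peer status` on one of the servers; a non-empty result is the condition under which the report expects the engine or HA agent to react.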


Additional info:

Comment 1 Sahina Bose 2016-03-09 07:31:21 UTC
Can you post the ha-agent.log, vdsm.log and gluster mount log from the node where hosted engine was running and then went to paused state?

Comment 2 Badalyan Vyacheslav 2016-03-13 16:57:25 UTC
Sorry, I can't. I removed all gluster storage and moved to NFS.

Comment 3 Badalyan Vyacheslav 2016-03-13 16:57:31 UTC
Sorry, I can't. I removed all gluster storage and moved to NFS.

Comment 4 Sahina Bose 2016-03-14 08:47:28 UTC
Kasturi, can you check if you see this behaviour in your setup?

Otherwise we can close this as insufficient data.

Comment 5 RamaKasturi 2016-03-16 12:33:49 UTC
Sahina, I brought down the gluster server on one of the nodes in my cluster. As reported, the UI does not indicate that the gluster service is down on that node. It looks like a bug has already been filed for this: https://bugzilla.redhat.com/show_bug.cgi?id=1262046

However, I did not see any of my app VMs pause, and the hosted engine is up and running fine.

Comment 6 Sahina Bose 2016-03-16 13:10:41 UTC
Closing this as we could not reproduce it and the user could not provide logs.

