Red Hat Bugzilla – Bug 1313593
[HC] oVirt does not detect gluster peer down
Last modified: 2016-03-16 09:10:41 EDT
Description of problem:
If you reboot a gluster server, you get:
1. Read errors in dmesg
2. Many postgres zombie processes
3. The hosted engine rebooted by the HA agent (agent-ha)
4. The hosted engine running in paused mode because gluster is in a bad state
Steps to Reproduce:
1. Create a replica 3 gluster volume across 3 different servers
2. Install the hosted engine on gluster
3. Run and configure it
4. Stop one gluster server
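For anyone trying to reproduce step 1, here is a minimal sketch in Python that only builds the `gluster volume create ... replica 3 ...` command line and does not run it. The volume name, host names and brick paths are hypothetical placeholders, not values from this report.

```python
def replica3_create_command(volume, bricks):
    """Build the argv for `gluster volume create` with a replica count of 3."""
    if len(bricks) != 3:
        raise ValueError("replica 3 needs exactly three bricks")
    # Each brick is written as host:/path; the hosts must all differ.
    hosts = {brick.split(":", 1)[0] for brick in bricks}
    if len(hosts) != 3:
        raise ValueError("each brick should live on a different server")
    return ["gluster", "volume", "create", volume, "replica", "3", *bricks]


# Hypothetical names; substitute your own servers and brick directories.
cmd = replica3_create_command(
    "engine",
    ["gl1:/bricks/engine", "gl2:/bricks/engine", "gl3:/bricks/engine"],
)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run` on one of the gluster nodes; the volume still has to be started afterwards with `gluster volume start`.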
Maintenance of the gluster server runs automatically at night and stops the service. The hosted engine then has to be unpaused by hand.
The oVirt engine or agent-ha must see that one of the servers is down and:
Minimal) Suspend the hosted VM and resume it after gluster is back up
Maximal) Add a second master postgres server backed by memory and replicate to a secondary on the gluster-backed disk. Everything the engine needs to work must be cloned in memory or on a RAM disk to prevent small network issues on storage domains from affecting it.
Or gluster must work normally without one peer, with no speed degradation.
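As an illustration of the peer-down detection requested here, the sketch below parses text in the usual `Hostname:` / `State:` layout printed by `gluster peer status` and lists peers that are not reported as connected. It is not how the engine or the HA agent works; the function name and the sample host names are made up for the example.

```python
def disconnected_peers(peer_status_text):
    """Return hostnames whose State line does not report (Connected)."""
    down = []
    hostname = None
    for line in peer_status_text.splitlines():
        line = line.strip()
        if line.startswith("Hostname:"):
            hostname = line.split(":", 1)[1].strip()
        elif line.startswith("State:") and hostname is not None:
            if "(Connected)" not in line:
                down.append(hostname)
            hostname = None  # wait for the next Hostname: line
    return down


# Illustrative sample only; the hosts and UUIDs are invented.
SAMPLE = """Number of Peers: 2

Hostname: gl2
Uuid: 00000000-0000-0000-0000-000000000002
State: Peer in Cluster (Connected)

Hostname: gl3
Uuid: 00000000-0000-0000-0000-000000000003
State: Peer in Cluster (Disconnected)
"""

print(disconnected_peers(SAMPLE))
```

On a real node the text could come from `subprocess.run(["gluster", "peer", "status"], capture_output=True, text=True).stdout`. What to do with a non-empty result (suspend the VM, raise an alert) is left open.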
Can you post the ha-agent.log, vdsm.log and gluster mount log from the node where hosted engine was running and then went to paused state?
Sorry, I can't. I removed all gluster and moved to NFS.
Kasturi, can you check if you see this behaviour in your setup?
Else we can close this as insufficient data
Sahina, I brought down the gluster server on one of the nodes in my cluster. As reported, the UI does not indicate that the gluster service is down on that node. It looks like a bug is already filed for this: https://bugzilla.redhat.com/show_bug.cgi?id=1262046.
But I did not see any of my app VMs pause, and the hosted engine is up and running fine.
Closing this as we could not reproduce it and the user could not provide logs.