Description of problem:
I am using Pacemaker to run GlusterFS. After setting it up I tested it with 'crm node standby node01', but got a timeout from the volume agent:

crmd: error: process_lrm_event: Result of stop operation for p_volume_gluster on node02: Timed Out | call=559 key=p_volume_gluster_stop_0 timeout=20000ms

When checking the processes with 'ps -ef' I could still see gluster processes running on the node.

Version-Release number of selected component (if applicable):
Name      : glusterfs-resource-agents
Arch      : noarch
Version   : 3.12.14
Release   : 1.el6
Size      : 13 k
Repo      : installed
From repo : centos-gluster312

How reproducible:
Configure gluster in Pacemaker (2 nodes):

primitive glusterd ocf:glusterfs:glusterd \
        op monitor interval=10 timeout=120s \
        op start timeout=120s interval=0 \
        op stop timeout=120s interval=0
primitive p_volume_gluster ocf:glusterfs:volume \
        params volname=gv0 \
        op stop interval=0 trace_ra=1 \
        op monitor interval=0 timeout=120s \
        op start timeout=120s interval=0
clone cl_glusterd glusterd \
        meta interleave=true clone-max=2 clone-node-max=1 target-role=Started
clone cl_glustervol p_volume_gluster \
        meta interleave=true clone-max=2 clone-node-max=1

Run gluster in the cluster, then put a node on standby.

Steps to Reproduce:
1. Start gluster in Pacemaker.
2. Put a node on standby: crm node standby node01
3. Wait for the error messages.

Actual results:
A timeout error for the volume primitive; the processes are still running: /usr/sbin/glusterfsd

Expected results:
Gluster should shut down and no error should appear in corosync.log.

Additional info:
I debugged the volume resource agent (/usr/lib/ocf/resource.d/glusterfs/volume) and found two issues that prevented the agent from stopping the processes.

1. SHORTHOSTNAME=`hostname -s`

On my system only the full hostname was used. I had to change this line to:

SHORTHOSTNAME=`hostname`

2.
The function volume_getdir() had the wrong path hardcoded:

volume_getdir() {
    local voldir

    voldir="/etc/glusterd/vols/${OCF_RESKEY_volname}"
    [ -d ${voldir} ] || return 1

    echo "${voldir}"
    return 0
}

I had to change /etc/glusterd to /var/lib/glusterd:

volume_getdir() {
    local voldir

    voldir="/var/lib/glusterd/vols/${OCF_RESKEY_volname}"
    [ -d ${voldir} ] || return 1

    echo "${voldir}"
    return 0
}

I am not sure whether this is because I am running CentOS 6; maybe the paths and hostnames differ on CentOS 7.
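Rather than swapping one hardcoded path for another, a more portable variant of volume_getdir() could probe the known state-directory locations in order. This is only a sketch, not the shipped agent code; GLUSTERD_BASES is a name introduced here for illustration:

```shell
#!/bin/sh
# Sketch: probe the known glusterd state directories instead of hardcoding
# one. Older packages used /etc/glusterd, newer ones /var/lib/glusterd.
# GLUSTERD_BASES is a hypothetical override, not part of the real agent.
GLUSTERD_BASES="${GLUSTERD_BASES:-/var/lib/glusterd /etc/glusterd}"

volume_getdir() {
    local base voldir
    for base in ${GLUSTERD_BASES}; do
        voldir="${base}/vols/${OCF_RESKEY_volname}"
        if [ -d "${voldir}" ]; then
            echo "${voldir}"
            return 0
        fi
    done
    return 1
}
```

With OCF_RESKEY_volname=gv0 set, the function echoes the first existing vols directory and returns 1 if none is found, so the same agent would work on either directory layout.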
Release 3.12 has been EOL'd and this bug was still found in the NEW state; hence moving the version to mainline to triage it and take appropriate action.
Hi Erik, for almost a year now the community has supported server packages only on CentOS 7. It would be great to see whether upgrading the OS resolves the issue. It would also be great if you could update glusterfs to a newer version.
Hi Amar, I do not have RHEL or CentOS 7 available. It would be great if you could attach the current /usr/lib/ocf/resource.d/glusterfs/volume here so I can check whether the code has changed.
Looks like it got fixed with https://review.gluster.org/#/c/glusterfs/+/19799/
Check the latest code at https://github.com/gluster/glusterfs/tree/master/extras/ocf
volume_getpid_dir() did change, but SHORTHOSTNAME is still the same. One would have to test it on CentOS 7; maybe it works fine now.
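The hostname mismatch is easy to check on any node. A quick diagnostic (a sketch, not agent code) compares the short and full hostname and shows whether matching bricks on 'hostname -s' could miss entries that glusterd recorded under the full name:

```shell
#!/bin/sh
# Sketch: show whether 'hostname -s' and 'hostname' differ on this node.
# The volume agent matches brick entries against the short hostname, but
# glusterd may have recorded bricks under the full hostname instead.
short=$(hostname -s)
full=$(hostname)
echo "short=${short} full=${full}"
if [ "${short}" != "${full}" ]; then
    echo "WARNING: hostname forms differ; brick matching may fail"
fi
```

If the two forms differ on a node where the stop action times out, that points at the SHORTHOSTNAME issue described above rather than the path issue.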
common-ha is for the .../extras/ganesha/... stuff. It was named as such when we thought it would be rewritten using CTDB for Samba and Ganesha.
This bug is moved to https://github.com/gluster/glusterfs/issues/930, and will be tracked there from now on. Visit the GitHub issue URL for further details.