Description of problem:
I had a cluster of 4 glusterfs nodes, and all 4 nodes were participating in the nfs-ganesha cluster. First I deleted a node from the nfs-ganesha cluster using the script /usr/libexec/ganesha/ganesha-ha.sh, and the node got deleted. Now, when I tried to delete another node, the delete operation failed.

Version-Release number of selected component (if applicable):
glusterfs-3.7.0-3.el6rhs.x86_64
nfs-ganesha-2.2.0-0.el6.x86_64

How reproducible:
Happened the first time itself

Steps to Reproduce:
1. Create a volume of type 6x2 and start it
2. Bring up nfs-ganesha after completing all the prerequisites
3. Mount the volume with vers=4
4. Start some I/O
5. Delete a node from the nfs-ganesha cluster
6. After the deletion and the I/O complete, delete another node from the cluster
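For reference, the steps above roughly correspond to the commands below. The volume name, brick paths, VIP and mount point are placeholders, and the usual nfs-ganesha HA prerequisites (ganesha-ha.conf, shared storage, passwordless ssh between the nodes) are assumed to already be in place; nfs8 is shown as the node deleted first, matching the pcs output later in this report.

  # steps 1-2: create and start a 6x2 distribute-replicate volume (brick paths are placeholders)
  gluster volume create testvol replica 2 \
      nfs5:/bricks/b1/testvol nfs6:/bricks/b1/testvol nfs7:/bricks/b1/testvol nfs8:/bricks/b1/testvol \
      nfs5:/bricks/b2/testvol nfs6:/bricks/b2/testvol nfs7:/bricks/b2/testvol nfs8:/bricks/b2/testvol \
      nfs5:/bricks/b3/testvol nfs6:/bricks/b3/testvol nfs7:/bricks/b3/testvol nfs8:/bricks/b3/testvol
  gluster volume start testvol
  # bring up nfs-ganesha HA on the four nodes (exact enable command depends on the setup procedure)

  # steps 3-4: mount over NFSv4 from a client and start some I/O
  mount -t nfs -o vers=4 <VIP>:/testvol /mnt/testvol
  dd if=/dev/urandom of=/mnt/testvol/testfile bs=1M count=1024 &

  # steps 5-6: delete one node, wait for it (and the I/O) to finish, then delete another
  time /usr/libexec/ganesha/ganesha-ha.sh --delete /etc/ganesha/ nfs8
  time /usr/libexec/ganesha/ganesha-ha.sh --delete /etc/ganesha/ nfs7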
Actual results:
Step 5 result: the deletion happens, but there is BZ 1228158.
Step 6 result: the deletion fails.

Output after step 6:

[root@nfs5 ~]# time /usr/libexec/ganesha/ganesha-ha.sh --delete /etc/ganesha/ nfs7
Removing Constraint - colocation-nfs5-cluster_ip-1-nfs5-trigger_ip-1-INFINITY
Removing Constraint - colocation-nfs5-cluster_ip-1-nfs5-trigger_ip-1-INFINITY-1
Removing Constraint - location-nfs5-cluster_ip-1
Removing Constraint - location-nfs5-cluster_ip-1-nfs6-1000
Removing Constraint - location-nfs5-cluster_ip-1-nfs7-2000
Removing Constraint - location-nfs5-cluster_ip-1-nfs5-3000
Removing Constraint - order-nfs-grace-clone-nfs5-cluster_ip-1-mandatory
Removing Constraint - order-nfs-grace-clone-nfs5-cluster_ip-1-mandatory-1
Deleting Resource - nfs5-cluster_ip-1
Removing Constraint - order-nfs5-trigger_ip-1-nfs-grace-clone-mandatory
Removing Constraint - order-nfs5-trigger_ip-1-nfs-grace-clone-mandatory-1
Deleting Resource - nfs5-trigger_ip-1
Removing Constraint - colocation-nfs6-cluster_ip-1-nfs6-trigger_ip-1-INFINITY
Removing Constraint - colocation-nfs6-cluster_ip-1-nfs6-trigger_ip-1-INFINITY-1
Removing Constraint - location-nfs6-cluster_ip-1
Removing Constraint - location-nfs6-cluster_ip-1-nfs7-1000
Removing Constraint - location-nfs6-cluster_ip-1-nfs5-2000
Removing Constraint - location-nfs6-cluster_ip-1-nfs6-3000
Removing Constraint - order-nfs-grace-clone-nfs6-cluster_ip-1-mandatory
Removing Constraint - order-nfs-grace-clone-nfs6-cluster_ip-1-mandatory-1
Deleting Resource - nfs6-cluster_ip-1
Removing Constraint - order-nfs6-trigger_ip-1-nfs-grace-clone-mandatory
Removing Constraint - order-nfs6-trigger_ip-1-nfs-grace-clone-mandatory-1
Deleting Resource - nfs6-trigger_ip-1
Removing Constraint - colocation-nfs7-cluster_ip-1-nfs7-trigger_ip-1-INFINITY
Removing Constraint - colocation-nfs7-cluster_ip-1-nfs7-trigger_ip-1-INFINITY-1
Removing Constraint - location-nfs7-cluster_ip-1
Removing Constraint - location-nfs7-cluster_ip-1-nfs5-1000
Removing Constraint - location-nfs7-cluster_ip-1-nfs6-2000
Removing Constraint - location-nfs7-cluster_ip-1-nfs7-3000
Removing Constraint - order-nfs-grace-clone-nfs7-cluster_ip-1-mandatory
Removing Constraint - order-nfs-grace-clone-nfs7-cluster_ip-1-mandatory-1
Deleting Resource - nfs7-cluster_ip-1
Removing Constraint - order-nfs7-trigger_ip-1-nfs-grace-clone-mandatory
Removing Constraint - order-nfs7-trigger_ip-1-nfs-grace-clone-mandatory-1
Deleting Resource - nfs7-trigger_ip-1
Adding nfs5-trigger_ip-1 nfs-grace-clone (kind: Mandatory) (Options: first-action=start then-action=start)
Adding nfs-grace-clone nfs5-cluster_ip-1 (kind: Mandatory) (Options: first-action=start then-action=start)
Adding nfs6-trigger_ip-1 nfs-grace-clone (kind: Mandatory) (Options: first-action=start then-action=start)
Adding nfs-grace-clone nfs6-cluster_ip-1 (kind: Mandatory) (Options: first-action=start then-action=start)
Adding nfs7-trigger_ip-1 nfs-grace-clone (kind: Mandatory) (Options: first-action=start then-action=start)
Adding nfs-grace-clone nfs7-cluster_ip-1 (kind: Mandatory) (Options: first-action=start then-action=start)
Error: unable to create resource/fence device 'nfs5-cluster_ip-1', 'nfs5-cluster_ip-1' already exists on this system
Error: unable to create resource/fence device 'nfs5-trigger_ip-1', 'nfs5-trigger_ip-1' already exists on this system
Adding nfs5-trigger_ip-1 nfs-grace-clone (kind: Mandatory) (Options: first-action=start then-action=start)
Adding nfs-grace-clone nfs5-cluster_ip-1 (kind: Mandatory) (Options: first-action=start then-action=start)
Error: unable to create resource/fence device 'nfs6-cluster_ip-1', 'nfs6-cluster_ip-1' already exists on this system
Error: unable to create resource/fence device 'nfs6-trigger_ip-1', 'nfs6-trigger_ip-1' already exists on this system
Adding nfs6-trigger_ip-1 nfs-grace-clone (kind: Mandatory) (Options: first-action=start then-action=start)
Adding nfs-grace-clone nfs6-cluster_ip-1 (kind: Mandatory) (Options: first-action=start then-action=start)
CIB updated
CIB updated
Removing Constraint - location-nfs_stop-nfs7-nfs7-INFINITY
Attempting to stop: nfs_stop-nfs7...Stopped
Deleting Resource - nfs_stop-nfs7
Error: Unable to open cluster.conf file to get nodes list
/usr/libexec/ganesha/ganesha-ha.sh: line 828: manage-service: command not found

real    0m57.981s
user    0m14.707s
sys     0m5.633s

[root@nfs5 ~]# pcs status
Cluster name:
Last updated: Thu Jun 4 21:45:29 2015
Last change: Thu Jun 4 21:20:06 2015
Stack: cman
Current DC: nfs6 - partition with quorum
Version: 1.1.11-97629de
3 Nodes configured
14 Resources configured

Online: [ nfs5 nfs6 nfs7 ]

Full list of resources:

 Clone Set: nfs-mon-clone [nfs-mon]
     Started: [ nfs5 nfs6 nfs7 ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Started: [ nfs5 nfs6 nfs7 ]
 nfs8-cluster_ip-1 (ocf::heartbeat:IPaddr): Started nfs5
 nfs8-trigger_ip-1 (ocf::heartbeat:Dummy): Started nfs5
 nfs5-cluster_ip-1 (ocf::heartbeat:IPaddr): Started nfs5
 nfs5-trigger_ip-1 (ocf::heartbeat:Dummy): Started nfs5
 nfs6-cluster_ip-1 (ocf::heartbeat:IPaddr): Started nfs6
 nfs6-trigger_ip-1 (ocf::heartbeat:Dummy): Started nfs6
 nfs7-cluster_ip-1 (ocf::heartbeat:IPaddr): Started nfs7
 nfs7-trigger_ip-1 (ocf::heartbeat:Dummy): Started nfs7

Failed actions:
    nfs-mon_monitor_10000 on nfs5 'unknown error' (1): call=16, status=Timed Out, last-rc-change='Thu Jun 4 19:55:40 2015', queued=0ms, exec=0ms

[root@nfs5 ~]# for i in 5 6 7 8 ; do ssh nfs$i "hostname"; ssh nfs$i "ps -eaf | grep ganesha"; echo "---"; done
nfs5
root 21255 22181 4 21:46 pts/0 00:00:00 ssh nfs5 ps -eaf | grep ganesha
root 21262 21257 2 21:46 ? 00:00:00 bash -c ps -eaf | grep ganesha
root 21272 21262 0 21:46 ? 00:00:00 grep ganesha
root 24551 1 9 19:25 ? 00:13:40 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -p /var/run/ganesha.nfsd.pid
---
nfs6
root 24827 1 0 19:25 ? 00:00:03 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -p /var/run/ganesha.nfsd.pid
root 26445 26440 3 21:46 ? 00:00:00 bash -c ps -eaf | grep ganesha
root 26455 26445 0 21:46 ? 00:00:00 grep ganesha
---
nfs7
root 1819 1814 2 21:46 ? 00:00:00 bash -c ps -eaf | grep ganesha
root 1829 1819 0 21:46 ? 00:00:00 grep ganesha
root 32583 1 0 19:25 ? 00:00:03 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -p /var/run/ganesha.nfsd.pid
---
nfs8
root 3127 1 0 19:25 ? 00:00:03 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -p /var/run/ganesha.nfsd.pid
root 10731 10726 3 21:46 ? 00:00:00 bash -c ps -eaf | grep ganesha
root 10741 10731 0 21:46 ? 00:00:00 grep ganesha
---

Expected results:
Subsequent node deletions should not fail.

Additional info:
pcs status after deletion of the first node:

[root@nfs5 ~]# pcs status
Cluster name:
Last updated: Thu Jun 4 19:48:07 2015
Last change: Thu Jun 4 19:39:12 2015
Stack: cman
Current DC: nfs6 - partition with quorum
Version: 1.1.11-97629de
3 Nodes configured
14 Resources configured

Online: [ nfs5 nfs6 nfs7 ]

Full list of resources:

 Clone Set: nfs-mon-clone [nfs-mon]
     Started: [ nfs5 nfs6 nfs7 ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Started: [ nfs5 nfs6 nfs7 ]
 nfs5-cluster_ip-1 (ocf::heartbeat:IPaddr): Started nfs5
 nfs5-trigger_ip-1 (ocf::heartbeat:Dummy): Started nfs5
 nfs6-cluster_ip-1 (ocf::heartbeat:IPaddr): Started nfs6
 nfs6-trigger_ip-1 (ocf::heartbeat:Dummy): Started nfs6
 nfs7-cluster_ip-1 (ocf::heartbeat:IPaddr): Started nfs7
 nfs7-trigger_ip-1 (ocf::heartbeat:Dummy): Started nfs7
 nfs8-cluster_ip-1 (ocf::heartbeat:IPaddr): Started nfs5
 nfs8-trigger_ip-1 (ocf::heartbeat:Dummy): Started nfs5
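Not from the original run, but as a rough pointer for triage: the "already exists" errors and the leftover nfs8-* resources in the pcs output above suggest the first delete did not fully clean up the pacemaker configuration before the second delete tried to recreate resources. One way to inspect what pacemaker still has configured on a surviving node (pcs 0.9.x syntax as shipped on RHEL 6; subcommand names may differ in newer pcs releases):

  # overall cluster and resource state
  pcs status
  # list all configured resources (stale <node>-cluster_ip-1 / <node>-trigger_ip-1 entries would
  # explain the "already exists" errors from the second delete)
  pcs resource show
  # list the location/order/colocation constraints left behind
  pcs constraint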
Created attachment 1034706: sosreport of nfs5
I just tried using the latest build (3.7.1-10.el6rhs) and was able to delete two nodes from a four-node cluster. I was then able to add them both back and delete a node again. Note, however, that if you start with a four-node cluster and delete two nodes, you will no longer have quorum and pacemaker will shut down HA.
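A quick way to confirm the quorum state after the second delete (the cluster here is on the cman stack per the pcs status output above; tool availability may vary by setup):

  # pacemaker's own view of quorum
  pcs status | grep -i quorum        # e.g. "Current DC: nfsX - partition with quorum" (or WITHOUT quorum)
  # vote/quorum counts from cman on a RHEL 6 / cman stack
  cman_tool status | grep -Ei 'votes|quorum'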
Doc text is edited. Please sign off so it can be included in Known Issues.
Doc text looks good to me.
Pacemaker (quorum) requires at least two nodes.