Bug 1228196 - nfs-ganesha: a subsequent delete node operation post a delete node operation fails to delete the node from nfs-ganesha cluster
Summary: nfs-ganesha: a subsequent delete node operation post a delete node operation ...
Keywords:
Status: CLOSED CANTFIX
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: gluster-nfs
Version: rhgs-3.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Kaleb KEITHLEY
QA Contact: Saurabh
URL:
Whiteboard:
Depends On:
Blocks: 1216951
 
Reported: 2015-06-04 11:45 UTC by Saurabh
Modified: 2016-08-19 09:16 UTC
CC List: 11 users

Fixed In Version:
Doc Type: Known Issue
Doc Text:
If the cluster has fewer than three nodes, pacemaker shuts down HA. Workaround: To restore HA, add a third node with `ganesha-ha.sh --add $path-to-config $node $virt-ip`.
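For example (a sketch only; /etc/ganesha/ is the config directory used elsewhere in this report, and <new-node>/<virtual-ip> are placeholders for the node being added and its VIP):

/usr/libexec/ganesha/ganesha-ha.sh --add /etc/ganesha/ <new-node> <virtual-ip>
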
Clone Of:
Environment:
Last Closed: 2016-06-15 13:30:44 UTC
Embargoed:


Attachments
sosreport of nfs5 (10.70 MB, application/x-xz)
2015-06-04 12:32 UTC, Saurabh

Description Saurabh 2015-06-04 11:45:08 UTC
Description of problem:
I had a cluster of 4 glusterfs nodes, and all 4 nodes were participating in the nfs-ganesha cluster.
First, I deleted a node from the nfs-ganesha cluster using the script
time /usr/libexec/ganesha/ganesha-ha.sh
and the node was deleted.
Now, I tried to delete another node; this time the delete node operation failed.
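For reference, a sketch of the two consecutive delete invocations (the --delete form and the /etc/ganesha/ config directory are taken from the failing run shown below; nfs8 appears to be the node deleted first, based on the pcs status in Additional info):

time /usr/libexec/ganesha/ganesha-ha.sh --delete /etc/ganesha/ nfs8    # first delete: succeeds (but see BZ 1228158)
time /usr/libexec/ganesha/ganesha-ha.sh --delete /etc/ganesha/ nfs7    # second delete: fails as reported here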

Version-Release number of selected component (if applicable):
glusterfs-3.7.0-3.el6rhs.x86_64
nfs-ganesha-2.2.0-0.el6.x86_64

How reproducible:
Occurred on the first attempt.

Steps to Reproduce:
1. Create a volume of type 6x2 and start it.
2. Bring up nfs-ganesha after completing all the prerequisites.
3. Mount the volume with vers=4.
4. Start some I/O.
5. Delete a node.
6. After the deletion and the I/O complete, delete another node from the cluster (see the command sketch below).
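A rough command sketch of these steps. The volume name, brick paths, and client mount point are placeholders; the nfs-ganesha enable command is an assumption for this release (its name varied across 3.7 builds), so follow the documented prerequisites:

# 1. Create and start a 6x2 distribute-replicate volume (12 bricks, replica pairs across hosts)
gluster volume create testvol replica 2 \
    nfs5:/bricks/b1/t nfs6:/bricks/b1/t  nfs7:/bricks/b1/t nfs8:/bricks/b1/t \
    nfs5:/bricks/b2/t nfs6:/bricks/b2/t  nfs7:/bricks/b2/t nfs8:/bricks/b2/t \
    nfs5:/bricks/b3/t nfs6:/bricks/b3/t  nfs7:/bricks/b3/t nfs8:/bricks/b3/t
gluster volume start testvol

# 2. Bring up nfs-ganesha once /etc/ganesha/ganesha-ha.conf and the other prerequisites are in place
gluster nfs-ganesha enable              # on some 3.7 builds this was: gluster features.ganesha enable
gluster volume set testvol ganesha.enable on

# 3. Mount the volume over NFSv4 from a client, using one of the cluster VIPs
mount -t nfs -o vers=4 <virtual-ip>:/testvol /mnt/testvol

# 4. Start some I/O on the mount
dd if=/dev/zero of=/mnt/testvol/file1 bs=1M count=1024

# 5 and 6. Delete one node, then another, using the two --delete invocations
#          shown after the problem description above.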

Actual results:
Step 5: the deletion happens, but see BZ 1228158.
Step 6: the deletion fails.
Output after step 6:
[root@nfs5 ~]# time /usr/libexec/ganesha/ganesha-ha.sh --delete /etc/ganesha/ nfs7
Removing Constraint - colocation-nfs5-cluster_ip-1-nfs5-trigger_ip-1-INFINITY
Removing Constraint - colocation-nfs5-cluster_ip-1-nfs5-trigger_ip-1-INFINITY-1
Removing Constraint - location-nfs5-cluster_ip-1
Removing Constraint - location-nfs5-cluster_ip-1-nfs6-1000
Removing Constraint - location-nfs5-cluster_ip-1-nfs7-2000
Removing Constraint - location-nfs5-cluster_ip-1-nfs5-3000
Removing Constraint - order-nfs-grace-clone-nfs5-cluster_ip-1-mandatory
Removing Constraint - order-nfs-grace-clone-nfs5-cluster_ip-1-mandatory-1
Deleting Resource - nfs5-cluster_ip-1
Removing Constraint - order-nfs5-trigger_ip-1-nfs-grace-clone-mandatory
Removing Constraint - order-nfs5-trigger_ip-1-nfs-grace-clone-mandatory-1
Deleting Resource - nfs5-trigger_ip-1
Removing Constraint - colocation-nfs6-cluster_ip-1-nfs6-trigger_ip-1-INFINITY
Removing Constraint - colocation-nfs6-cluster_ip-1-nfs6-trigger_ip-1-INFINITY-1
Removing Constraint - location-nfs6-cluster_ip-1
Removing Constraint - location-nfs6-cluster_ip-1-nfs7-1000
Removing Constraint - location-nfs6-cluster_ip-1-nfs5-2000
Removing Constraint - location-nfs6-cluster_ip-1-nfs6-3000
Removing Constraint - order-nfs-grace-clone-nfs6-cluster_ip-1-mandatory
Removing Constraint - order-nfs-grace-clone-nfs6-cluster_ip-1-mandatory-1
Deleting Resource - nfs6-cluster_ip-1
Removing Constraint - order-nfs6-trigger_ip-1-nfs-grace-clone-mandatory
Removing Constraint - order-nfs6-trigger_ip-1-nfs-grace-clone-mandatory-1
Deleting Resource - nfs6-trigger_ip-1
Removing Constraint - colocation-nfs7-cluster_ip-1-nfs7-trigger_ip-1-INFINITY
Removing Constraint - colocation-nfs7-cluster_ip-1-nfs7-trigger_ip-1-INFINITY-1
Removing Constraint - location-nfs7-cluster_ip-1
Removing Constraint - location-nfs7-cluster_ip-1-nfs5-1000
Removing Constraint - location-nfs7-cluster_ip-1-nfs6-2000
Removing Constraint - location-nfs7-cluster_ip-1-nfs7-3000
Removing Constraint - order-nfs-grace-clone-nfs7-cluster_ip-1-mandatory
Removing Constraint - order-nfs-grace-clone-nfs7-cluster_ip-1-mandatory-1
Deleting Resource - nfs7-cluster_ip-1
Removing Constraint - order-nfs7-trigger_ip-1-nfs-grace-clone-mandatory
Removing Constraint - order-nfs7-trigger_ip-1-nfs-grace-clone-mandatory-1
Deleting Resource - nfs7-trigger_ip-1
Adding nfs5-trigger_ip-1 nfs-grace-clone (kind: Mandatory) (Options: first-action=start then-action=start)
Adding nfs-grace-clone nfs5-cluster_ip-1 (kind: Mandatory) (Options: first-action=start then-action=start)
Adding nfs6-trigger_ip-1 nfs-grace-clone (kind: Mandatory) (Options: first-action=start then-action=start)
Adding nfs-grace-clone nfs6-cluster_ip-1 (kind: Mandatory) (Options: first-action=start then-action=start)
Adding nfs7-trigger_ip-1 nfs-grace-clone (kind: Mandatory) (Options: first-action=start then-action=start)
Adding nfs-grace-clone nfs7-cluster_ip-1 (kind: Mandatory) (Options: first-action=start then-action=start)
Error: unable to create resource/fence device 'nfs5-cluster_ip-1', 'nfs5-cluster_ip-1' already exists on this system
Error: unable to create resource/fence device 'nfs5-trigger_ip-1', 'nfs5-trigger_ip-1' already exists on this system
Adding nfs5-trigger_ip-1 nfs-grace-clone (kind: Mandatory) (Options: first-action=start then-action=start)
Adding nfs-grace-clone nfs5-cluster_ip-1 (kind: Mandatory) (Options: first-action=start then-action=start)
Error: unable to create resource/fence device 'nfs6-cluster_ip-1', 'nfs6-cluster_ip-1' already exists on this system
Error: unable to create resource/fence device 'nfs6-trigger_ip-1', 'nfs6-trigger_ip-1' already exists on this system
Adding nfs6-trigger_ip-1 nfs-grace-clone (kind: Mandatory) (Options: first-action=start then-action=start)
Adding nfs-grace-clone nfs6-cluster_ip-1 (kind: Mandatory) (Options: first-action=start then-action=start)
CIB updated
CIB updated
Removing Constraint - location-nfs_stop-nfs7-nfs7-INFINITY
Attempting to stop: nfs_stop-nfs7...Stopped
Deleting Resource - nfs_stop-nfs7
Error: Unable to open cluster.conf file to get nodes list
/usr/libexec/ganesha/ganesha-ha.sh: line 828: manage-service: command not found

real	0m57.981s
user	0m14.707s
sys	0m5.633s
[root@nfs5 ~]# 


[root@nfs5 ~]# 
[root@nfs5 ~]# 
[root@nfs5 ~]# pcs status
Cluster name: 
Last updated: Thu Jun  4 21:45:29 2015
Last change: Thu Jun  4 21:20:06 2015
Stack: cman
Current DC: nfs6 - partition with quorum
Version: 1.1.11-97629de
3 Nodes configured
14 Resources configured


Online: [ nfs5 nfs6 nfs7 ]

Full list of resources:

 Clone Set: nfs-mon-clone [nfs-mon]
     Started: [ nfs5 nfs6 nfs7 ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Started: [ nfs5 nfs6 nfs7 ]
 nfs8-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started nfs5 
 nfs8-trigger_ip-1	(ocf::heartbeat:Dummy):	Started nfs5 
 nfs5-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started nfs5 
 nfs5-trigger_ip-1	(ocf::heartbeat:Dummy):	Started nfs5 
 nfs6-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started nfs6 
 nfs6-trigger_ip-1	(ocf::heartbeat:Dummy):	Started nfs6 
 nfs7-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started nfs7 
 nfs7-trigger_ip-1	(ocf::heartbeat:Dummy):	Started nfs7 

Failed actions:
    nfs-mon_monitor_10000 on nfs5 'unknown error' (1): call=16, status=Timed Out, last-rc-change='Thu Jun  4 19:55:40 2015', queued=0ms, exec=0ms


[root@nfs5 ~]# for i in 5 6 7 8 ; do ssh nfs$i "hostname"; ssh nfs$i "ps -eaf | grep ganesha"; echo "---"; done
nfs5
root     21255 22181  4 21:46 pts/0    00:00:00 ssh nfs5 ps -eaf | grep ganesha
root     21262 21257  2 21:46 ?        00:00:00 bash -c ps -eaf | grep ganesha
root     21272 21262  0 21:46 ?        00:00:00 grep ganesha
root     24551     1  9 19:25 ?        00:13:40 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -p /var/run/ganesha.nfsd.pid
---
nfs6
root     24827     1  0 19:25 ?        00:00:03 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -p /var/run/ganesha.nfsd.pid
root     26445 26440  3 21:46 ?        00:00:00 bash -c ps -eaf | grep ganesha
root     26455 26445  0 21:46 ?        00:00:00 grep ganesha
---
nfs7
root      1819  1814  2 21:46 ?        00:00:00 bash -c ps -eaf | grep ganesha
root      1829  1819  0 21:46 ?        00:00:00 grep ganesha
root     32583     1  0 19:25 ?        00:00:03 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -p /var/run/ganesha.nfsd.pid
---
nfs8
root      3127     1  0 19:25 ?        00:00:03 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -p /var/run/ganesha.nfsd.pid
root     10731 10726  3 21:46 ?        00:00:00 bash -c ps -eaf | grep ganesha
root     10741 10731  0 21:46 ?        00:00:00 grep ganesha

Expected results:
The subsequent delete node operation should succeed.

Additional info:
pcs status after the first node deletion:

[root@nfs5 ~]# pcs status
Cluster name: 
Last updated: Thu Jun  4 19:48:07 2015
Last change: Thu Jun  4 19:39:12 2015
Stack: cman
Current DC: nfs6 - partition with quorum
Version: 1.1.11-97629de
3 Nodes configured
14 Resources configured


Online: [ nfs5 nfs6 nfs7 ]

Full list of resources:

 Clone Set: nfs-mon-clone [nfs-mon]
     Started: [ nfs5 nfs6 nfs7 ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Started: [ nfs5 nfs6 nfs7 ]
 nfs5-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started nfs5 
 nfs5-trigger_ip-1	(ocf::heartbeat:Dummy):	Started nfs5 
 nfs6-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started nfs6 
 nfs6-trigger_ip-1	(ocf::heartbeat:Dummy):	Started nfs6 
 nfs7-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started nfs7 
 nfs7-trigger_ip-1	(ocf::heartbeat:Dummy):	Started nfs7 
 nfs8-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started nfs5 
 nfs8-trigger_ip-1	(ocf::heartbeat:Dummy):	Started nfs5

Comment 2 Saurabh 2015-06-04 12:32:55 UTC
Created attachment 1034706 [details]
sosreport of nfs5

Comment 4 Kaleb KEITHLEY 2015-07-15 16:52:36 UTC
I just tried using the latest build (3.7.1-10.el6rhs) and was able to delete two nodes from a four-node cluster. I was then able to add them both back and delete a node again.

However, if you start with a four-node cluster and delete two nodes, you will no longer have quorum and pacemaker will shut down HA.
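For context, a sketch of the quorum arithmetic behind this, assuming standard pacemaker/corosync majority quorum (the pcs line quoted is the same one visible in the pcs status output above):

# Majority quorum: a partition needs more than half of the configured nodes.
# 4-node cluster -> floor(4/2) + 1 = 3 nodes required; deleting two leaves 2, so quorum is lost.
# Whether the current partition holds quorum shows up in pcs output, e.g.:
pcs status | grep -i quorum
# which prints a line such as "Current DC: nfs6 - partition with quorum"
# (or "partition WITHOUT quorum" once quorum is lost).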

Comment 5 monti lawrence 2015-07-22 17:28:02 UTC
Doc text is edited. Please sign off to be included in Known Issues.

Comment 6 Soumya Koduri 2015-07-27 09:12:32 UTC
Doc text looks good to me.

Comment 8 Kaleb KEITHLEY 2016-06-15 13:30:44 UTC
pacemaker (quorum) requires at least two nodes

