Bug 1228196 - nfs-ganesha: a subsequent delete node operation post a delete node operation fails to delete the node from nfs-ganesha cluster
Summary: nfs-ganesha: a subsequent delete node operation post a delete node operation ...
Keywords:
Status: CLOSED CANTFIX
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: gluster-nfs
Version: rhgs-3.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Kaleb KEITHLEY
QA Contact: Saurabh
URL:
Whiteboard:
Depends On:
Blocks: 1216951
 
Reported: 2015-06-04 11:45 UTC by Saurabh
Modified: 2016-08-19 09:16 UTC
CC List: 11 users

Fixed In Version:
Doc Type: Known Issue
Doc Text:
If the cluster has fewer than three nodes, pacemaker shuts down HA. Workaround: To restore HA, add a third node with `ganesha-ha.sh --add $path-to-config $node $virt-ip`.
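For example (a sketch only; /etc/ganesha/ is the config directory used elsewhere in this report, and <new-node>/<virtual-ip> are placeholders for the node being added and its VIP):

/usr/libexec/ganesha/ganesha-ha.sh --add /etc/ganesha/ <new-node> <virtual-ip>
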
Clone Of:
Environment:
Last Closed: 2016-06-15 13:30:44 UTC
Embargoed:


Attachments
sosreport of nfs5 (10.70 MB, application/x-xz)
2015-06-04 12:32 UTC, Saurabh

Description Saurabh 2015-06-04 11:45:08 UTC
Description of problem:
I had a cluster of 4 glusterfs nodes, and all 4 nodes were participating in the nfs-ganesha cluster.
First, I deleted a node from the nfs-ganesha cluster using the script
time /usr/libexec/ganesha/ganesha-ha.sh
and the node was deleted.
Now, I tried to delete another node; this time the delete node operation failed.
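For reference, a sketch of the two consecutive delete invocations (the --delete form and the /etc/ganesha/ config directory are taken from the failing run shown below; nfs8 appears to be the node deleted first, based on the pcs status in Additional info):

time /usr/libexec/ganesha/ganesha-ha.sh --delete /etc/ganesha/ nfs8    # first delete: succeeds (but see BZ 1228158)
time /usr/libexec/ganesha/ganesha-ha.sh --delete /etc/ganesha/ nfs7    # second delete: fails as reported here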

Version-Release number of selected component (if applicable):
glusterfs-3.7.0-3.el6rhs.x86_64
nfs-ganesha-2.2.0-0.el6.x86_64

How reproducible:
Occurred on the first attempt.

Steps to Reproduce:
1. Create a volume of type 6x2 and start it.
2. Bring up nfs-ganesha after completing all the prerequisites.
3. Mount the volume with vers=4.
4. Start some I/O.
5. Delete a node.
6. After the deletion and the I/O complete, delete another node from the cluster (see the command sketch below).
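A rough command sketch of these steps. The volume name, brick paths, and client mount point are placeholders; the nfs-ganesha enable command is an assumption for this release (its name varied across 3.7 builds), so follow the documented prerequisites:

# 1. Create and start a 6x2 distribute-replicate volume (12 bricks, replica pairs across hosts)
gluster volume create testvol replica 2 \
    nfs5:/bricks/b1/t nfs6:/bricks/b1/t  nfs7:/bricks/b1/t nfs8:/bricks/b1/t \
    nfs5:/bricks/b2/t nfs6:/bricks/b2/t  nfs7:/bricks/b2/t nfs8:/bricks/b2/t \
    nfs5:/bricks/b3/t nfs6:/bricks/b3/t  nfs7:/bricks/b3/t nfs8:/bricks/b3/t
gluster volume start testvol

# 2. Bring up nfs-ganesha once /etc/ganesha/ganesha-ha.conf and the other prerequisites are in place
gluster nfs-ganesha enable              # on some 3.7 builds this was: gluster features.ganesha enable
gluster volume set testvol ganesha.enable on

# 3. Mount the volume over NFSv4 from a client, using one of the cluster VIPs
mount -t nfs -o vers=4 <virtual-ip>:/testvol /mnt/testvol

# 4. Start some I/O on the mount
dd if=/dev/zero of=/mnt/testvol/file1 bs=1M count=1024

# 5 and 6. Delete one node, then another, using the two --delete invocations
#          shown after the problem description above.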

Actual results:
Step 5: the deletion happens, but see BZ 1228158.
Step 6: the deletion fails.
Output after step 6:
[root@nfs5 ~]# time /usr/libexec/ganesha/ganesha-ha.sh --delete /etc/ganesha/ nfs7
Removing Constraint - colocation-nfs5-cluster_ip-1-nfs5-trigger_ip-1-INFINITY
Removing Constraint - colocation-nfs5-cluster_ip-1-nfs5-trigger_ip-1-INFINITY-1
Removing Constraint - location-nfs5-cluster_ip-1
Removing Constraint - location-nfs5-cluster_ip-1-nfs6-1000
Removing Constraint - location-nfs5-cluster_ip-1-nfs7-2000
Removing Constraint - location-nfs5-cluster_ip-1-nfs5-3000
Removing Constraint - order-nfs-grace-clone-nfs5-cluster_ip-1-mandatory
Removing Constraint - order-nfs-grace-clone-nfs5-cluster_ip-1-mandatory-1
Deleting Resource - nfs5-cluster_ip-1
Removing Constraint - order-nfs5-trigger_ip-1-nfs-grace-clone-mandatory
Removing Constraint - order-nfs5-trigger_ip-1-nfs-grace-clone-mandatory-1
Deleting Resource - nfs5-trigger_ip-1
Removing Constraint - colocation-nfs6-cluster_ip-1-nfs6-trigger_ip-1-INFINITY
Removing Constraint - colocation-nfs6-cluster_ip-1-nfs6-trigger_ip-1-INFINITY-1
Removing Constraint - location-nfs6-cluster_ip-1
Removing Constraint - location-nfs6-cluster_ip-1-nfs7-1000
Removing Constraint - location-nfs6-cluster_ip-1-nfs5-2000
Removing Constraint - location-nfs6-cluster_ip-1-nfs6-3000
Removing Constraint - order-nfs-grace-clone-nfs6-cluster_ip-1-mandatory
Removing Constraint - order-nfs-grace-clone-nfs6-cluster_ip-1-mandatory-1
Deleting Resource - nfs6-cluster_ip-1
Removing Constraint - order-nfs6-trigger_ip-1-nfs-grace-clone-mandatory
Removing Constraint - order-nfs6-trigger_ip-1-nfs-grace-clone-mandatory-1
Deleting Resource - nfs6-trigger_ip-1
Removing Constraint - colocation-nfs7-cluster_ip-1-nfs7-trigger_ip-1-INFINITY
Removing Constraint - colocation-nfs7-cluster_ip-1-nfs7-trigger_ip-1-INFINITY-1
Removing Constraint - location-nfs7-cluster_ip-1
Removing Constraint - location-nfs7-cluster_ip-1-nfs5-1000
Removing Constraint - location-nfs7-cluster_ip-1-nfs6-2000
Removing Constraint - location-nfs7-cluster_ip-1-nfs7-3000
Removing Constraint - order-nfs-grace-clone-nfs7-cluster_ip-1-mandatory
Removing Constraint - order-nfs-grace-clone-nfs7-cluster_ip-1-mandatory-1
Deleting Resource - nfs7-cluster_ip-1
Removing Constraint - order-nfs7-trigger_ip-1-nfs-grace-clone-mandatory
Removing Constraint - order-nfs7-trigger_ip-1-nfs-grace-clone-mandatory-1
Deleting Resource - nfs7-trigger_ip-1
Adding nfs5-trigger_ip-1 nfs-grace-clone (kind: Mandatory) (Options: first-action=start then-action=start)
Adding nfs-grace-clone nfs5-cluster_ip-1 (kind: Mandatory) (Options: first-action=start then-action=start)
Adding nfs6-trigger_ip-1 nfs-grace-clone (kind: Mandatory) (Options: first-action=start then-action=start)
Adding nfs-grace-clone nfs6-cluster_ip-1 (kind: Mandatory) (Options: first-action=start then-action=start)
Adding nfs7-trigger_ip-1 nfs-grace-clone (kind: Mandatory) (Options: first-action=start then-action=start)
Adding nfs-grace-clone nfs7-cluster_ip-1 (kind: Mandatory) (Options: first-action=start then-action=start)
Error: unable to create resource/fence device 'nfs5-cluster_ip-1', 'nfs5-cluster_ip-1' already exists on this system
Error: unable to create resource/fence device 'nfs5-trigger_ip-1', 'nfs5-trigger_ip-1' already exists on this system
Adding nfs5-trigger_ip-1 nfs-grace-clone (kind: Mandatory) (Options: first-action=start then-action=start)
Adding nfs-grace-clone nfs5-cluster_ip-1 (kind: Mandatory) (Options: first-action=start then-action=start)
Error: unable to create resource/fence device 'nfs6-cluster_ip-1', 'nfs6-cluster_ip-1' already exists on this system
Error: unable to create resource/fence device 'nfs6-trigger_ip-1', 'nfs6-trigger_ip-1' already exists on this system
Adding nfs6-trigger_ip-1 nfs-grace-clone (kind: Mandatory) (Options: first-action=start then-action=start)
Adding nfs-grace-clone nfs6-cluster_ip-1 (kind: Mandatory) (Options: first-action=start then-action=start)
CIB updated
CIB updated
Removing Constraint - location-nfs_stop-nfs7-nfs7-INFINITY
Attempting to stop: nfs_stop-nfs7...Stopped
Deleting Resource - nfs_stop-nfs7
Error: Unable to open cluster.conf file to get nodes list
/usr/libexec/ganesha/ganesha-ha.sh: line 828: manage-service: command not found

real	0m57.981s
user	0m14.707s
sys	0m5.633s
[root@nfs5 ~]# 


[root@nfs5 ~]# 
[root@nfs5 ~]# 
[root@nfs5 ~]# pcs status
Cluster name: 
Last updated: Thu Jun  4 21:45:29 2015
Last change: Thu Jun  4 21:20:06 2015
Stack: cman
Current DC: nfs6 - partition with quorum
Version: 1.1.11-97629de
3 Nodes configured
14 Resources configured


Online: [ nfs5 nfs6 nfs7 ]

Full list of resources:

 Clone Set: nfs-mon-clone [nfs-mon]
     Started: [ nfs5 nfs6 nfs7 ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Started: [ nfs5 nfs6 nfs7 ]
 nfs8-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started nfs5 
 nfs8-trigger_ip-1	(ocf::heartbeat:Dummy):	Started nfs5 
 nfs5-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started nfs5 
 nfs5-trigger_ip-1	(ocf::heartbeat:Dummy):	Started nfs5 
 nfs6-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started nfs6 
 nfs6-trigger_ip-1	(ocf::heartbeat:Dummy):	Started nfs6 
 nfs7-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started nfs7 
 nfs7-trigger_ip-1	(ocf::heartbeat:Dummy):	Started nfs7 

Failed actions:
    nfs-mon_monitor_10000 on nfs5 'unknown error' (1): call=16, status=Timed Out, last-rc-change='Thu Jun  4 19:55:40 2015', queued=0ms, exec=0ms


[root@nfs5 ~]# for i in 5 6 7 8 ; do ssh nfs$i "hostname"; ssh nfs$i "ps -eaf | grep ganesha"; echo "---"; done
nfs5
root     21255 22181  4 21:46 pts/0    00:00:00 ssh nfs5 ps -eaf | grep ganesha
root     21262 21257  2 21:46 ?        00:00:00 bash -c ps -eaf | grep ganesha
root     21272 21262  0 21:46 ?        00:00:00 grep ganesha
root     24551     1  9 19:25 ?        00:13:40 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -p /var/run/ganesha.nfsd.pid
---
nfs6
root     24827     1  0 19:25 ?        00:00:03 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -p /var/run/ganesha.nfsd.pid
root     26445 26440  3 21:46 ?        00:00:00 bash -c ps -eaf | grep ganesha
root     26455 26445  0 21:46 ?        00:00:00 grep ganesha
---
nfs7
root      1819  1814  2 21:46 ?        00:00:00 bash -c ps -eaf | grep ganesha
root      1829  1819  0 21:46 ?        00:00:00 grep ganesha
root     32583     1  0 19:25 ?        00:00:03 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -p /var/run/ganesha.nfsd.pid
---
nfs8
root      3127     1  0 19:25 ?        00:00:03 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -p /var/run/ganesha.nfsd.pid
root     10731 10726  3 21:46 ?        00:00:00 bash -c ps -eaf | grep ganesha
root     10741 10731  0 21:46 ?        00:00:00 grep ganesha

Expected results:
The subsequent delete node operation should succeed.

Additional info:
pcs status after the first node deletion:

[root@nfs5 ~]# pcs status
Cluster name: 
Last updated: Thu Jun  4 19:48:07 2015
Last change: Thu Jun  4 19:39:12 2015
Stack: cman
Current DC: nfs6 - partition with quorum
Version: 1.1.11-97629de
3 Nodes configured
14 Resources configured


Online: [ nfs5 nfs6 nfs7 ]

Full list of resources:

 Clone Set: nfs-mon-clone [nfs-mon]
     Started: [ nfs5 nfs6 nfs7 ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Started: [ nfs5 nfs6 nfs7 ]
 nfs5-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started nfs5 
 nfs5-trigger_ip-1	(ocf::heartbeat:Dummy):	Started nfs5 
 nfs6-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started nfs6 
 nfs6-trigger_ip-1	(ocf::heartbeat:Dummy):	Started nfs6 
 nfs7-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started nfs7 
 nfs7-trigger_ip-1	(ocf::heartbeat:Dummy):	Started nfs7 
 nfs8-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started nfs5 
 nfs8-trigger_ip-1	(ocf::heartbeat:Dummy):	Started nfs5

Comment 2 Saurabh 2015-06-04 12:32:55 UTC
Created attachment 1034706 [details]
sosreport of nfs5

Comment 4 Kaleb KEITHLEY 2015-07-15 16:52:36 UTC
I just tried using the latest build (3.7.1-10.el6rhs) and was able to delete two nodes from a four-node cluster. I was then able to add them both back and delete a node again.

However, if you start with a four-node cluster and delete two nodes, you will no longer have quorum and pacemaker will shut down HA.
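For context, a sketch of the quorum arithmetic behind this, assuming standard pacemaker/corosync majority quorum (the pcs line quoted is the same one visible in the pcs status output above):

# Majority quorum: a partition needs more than half of the configured nodes.
# 4-node cluster -> floor(4/2) + 1 = 3 nodes required; deleting two leaves 2, so quorum is lost.
# Whether the current partition holds quorum shows up in pcs output, e.g.:
pcs status | grep -i quorum
# which prints a line such as "Current DC: nfs6 - partition with quorum"
# (or "partition WITHOUT quorum" once quorum is lost).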

Comment 5 monti lawrence 2015-07-22 17:28:02 UTC
Doc text is edited. Please sign off to be included in Known Issues.

Comment 6 Soumya Koduri 2015-07-27 09:12:32 UTC
Doc text looks good to me.

Comment 8 Kaleb KEITHLEY 2016-06-15 13:30:44 UTC
pacemaker (quorum) requires at least two nodes

