Bug 1235536 - In one of the attempts of ganesha failback, failback process failed
Summary: In one of the attempts of ganesha failback, failback process failed
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: nfs-ganesha
Version: rhgs-3.1
Hardware: All
OS: Linux
Importance: high / medium
Target Milestone: ---
Assignee: Bug Updates Notification Mailing List
QA Contact: Apeksha
URL:
Whiteboard:
Depends On:
Blocks: 1251815
 
Reported: 2015-06-25 06:33 UTC by Apeksha
Modified: 2015-09-07 17:06 UTC
CC List: 10 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-08-28 06:21:38 UTC
Embargoed:


Attachments
ganesha.log of the failed node (37.21 KB, text/plain)
2015-06-25 06:36 UTC, Apeksha

Description Apeksha 2015-06-25 06:33:27 UTC
Description of problem:
In one of the attempts at ganesha failback, the failback process failed.

Version-Release number of selected component (if applicable):
glusterfs-3.7.1-3.el6rhs.x86_64
nfs-ganesha-2.2.0-3.el6rhs.x86_64

How reproducible:
Seen once on one of the ganesha setups

Steps to Reproduce:
1. Setup the ganesha cluster
2. Kill the ganesha process on one of the nodes
3. Failover happens successfully
4. Now start the nfs-ganesha process on that node
5. Failback did not happen (see the command sketch below)
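
A minimal reproduction sketch (assuming a standard RHGS nfs-ganesha HA cluster; the node names, service name and exact restart command are illustrative and may differ per setup):

[root@nfs2 ~]# pkill -9 ganesha.nfsd        # step 2: kill the ganesha process on one node
[root@nfs1 ~]# pcs status                   # step 3: confirm the killed node's VIP failed over
[root@nfs2 ~]# service nfs-ganesha start    # step 4: restart nfs-ganesha on that node
[root@nfs1 ~]# pcs status                   # step 5: the VIP is expected to fail back to nfs2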

[root@nfs2 ~]# pcs status
Cluster name: G1434073180.8
Last updated: Thu Jun 25 00:48:51 2015
Last change: Thu Jun 25 00:44:37 2015
Stack: cman
Current DC: nfs1 - partition with quorum
Version: 1.1.11-97629de
4 Nodes configured
17 Resources configured


Online: [ nfs1 nfs2 nfs3 nfs4 ]

Full list of resources:

 Clone Set: nfs-mon-clone [nfs-mon]
     nfs-mon	(ocf::heartbeat:ganesha_mon):	FAILED nfs2 
     Started: [ nfs1 nfs3 nfs4 ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Started: [ nfs1 nfs2 nfs3 nfs4 ]
 nfs2-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started nfs4 
 nfs2-trigger_ip-1	(ocf::heartbeat:Dummy):	Started nfs4 
 nfs1-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started nfs1 
 nfs1-trigger_ip-1	(ocf::heartbeat:Dummy):	Started nfs1 
 nfs3-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started nfs3 
 nfs3-trigger_ip-1	(ocf::heartbeat:Dummy):	Started nfs3 
 nfs4-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started nfs4 
 nfs4-trigger_ip-1	(ocf::heartbeat:Dummy):	Started nfs4 
 nfs2-dead_ip-1	(ocf::heartbeat:Dummy):	Started nfs2 

Failed actions:
    nfs-mon_monitor_10000 on nfs2 'unknown error' (1): call=16, status=Timed Out, last-rc-change='Thu Jun 25 00:44:45 2015', queued=0ms, exec=0ms
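
Once the underlying cause is addressed, the failed monitor action above normally has to be cleared before pacemaker retries the resource on nfs2. A hedged sketch (resource name taken from the output above):

[root@nfs2 ~]# pcs resource cleanup nfs-mon   # clear the failed nfs-mon monitor action
[root@nfs2 ~]# pcs status                     # re-check whether nfs-mon recovers on nfs2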


Actual results: Failback did not happen


Expected results: Failback should happen successfully


Additional info:
/var/log/ganesha.log:

25/06/2015 00:44:30 : epoch 558b0196 : nfs2 : ganesha.nfsd-9489[main] nfs_start :NFS STARTUP :EVENT :             NFS SERVER INITIALIZED
25/06/2015 00:44:30 : epoch 558b0196 : nfs2 : ganesha.nfsd-9489[main] nfs_start :NFS STARTUP :EVENT :-------------------------------------------------
25/06/2015 00:45:30 : epoch 558b0196 : nfs2 : ganesha.nfsd-9489[reaper] nfs_in_grace :STATE :EVENT :NFS Server Now NOT IN GRACE
25/06/2015 00:45:30 : epoch 558b0196 : nfs2 : ganesha.nfsd-9489[reaper] nfs4_clean_old_recov_dir :CLIENT ID :EVENT :Failed to open old v4 recovery dir (/var/lib/nfs/ganesha/v4old), errno=2
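
The errno=2 (ENOENT) above just means the old v4 recovery directory was not present. A quick hedged check of the recovery directories on the node (v4recov/v4old are the directories ganesha is expected to keep under /var/lib/nfs/ganesha; the exact layout is assumed here):

[root@nfs2 ~]# ls -ld /var/lib/nfs/ganesha /var/lib/nfs/ganesha/v4recov /var/lib/nfs/ganesha/v4old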

Comment 2 Apeksha 2015-06-25 06:36:39 UTC
Created attachment 1042958 [details]
ganesha.log of the failed node

Comment 3 Apeksha 2015-06-25 06:45:47 UTC
sosreports of all 4 nodes : http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1235536/

Comment 4 Apeksha 2015-06-25 07:17:31 UTC
After a couple of minutes, pcs status shows all resources stopped on all the nodes:

[root@nfs1 ~]# pcs status
Cluster name: G1434073180.8
Last updated: Thu Jun 25 02:01:48 2015
Last change: Wed Jun 24 06:17:01 2015
Stack: cman
Current DC: nfs1 - partition WITHOUT quorum
Version: 1.1.11-97629de
4 Nodes configured
17 Resources configured


Online: [ nfs1 ]
OFFLINE: [ nfs2 nfs3 nfs4 ]

Full list of resources:

 Clone Set: nfs-mon-clone [nfs-mon]
     Stopped: [ nfs1 nfs2 nfs3 nfs4 ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Stopped: [ nfs1 nfs2 nfs3 nfs4 ]
 nfs2-cluster_ip-1	(ocf::heartbeat:IPaddr):	Stopped 
 nfs2-trigger_ip-1	(ocf::heartbeat:Dummy):	Stopped 
 nfs1-cluster_ip-1	(ocf::heartbeat:IPaddr):	Stopped 
 nfs1-trigger_ip-1	(ocf::heartbeat:Dummy):	Stopped 
 nfs3-cluster_ip-1	(ocf::heartbeat:IPaddr):	Stopped 
 nfs3-trigger_ip-1	(ocf::heartbeat:Dummy):	Stopped 
 nfs4-cluster_ip-1	(ocf::heartbeat:IPaddr):	Stopped 
 nfs4-trigger_ip-1	(ocf::heartbeat:Dummy):	Stopped 
 nfs2-dead_ip-1	(ocf::heartbeat:Dummy):	Stopped
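
Since the DC now reports "partition WITHOUT quorum" with nfs2-nfs4 offline, it is worth separating a membership/quorum problem from a ganesha one. A hedged diagnostic sketch for the cman stack shown above:

[root@nfs1 ~]# cman_tool status    # quorum state and expected votes
[root@nfs1 ~]# cman_tool nodes     # which nodes cman still sees as cluster members
[root@nfs1 ~]# crm_mon -1          # one-shot pacemaker view of nodes and resources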

Comment 5 Soumya Koduri 2015-06-25 10:03:58 UTC
'ganesha-ha.script' was in a modified state and a couple of steps had been commented out in it. Requested Apeksha to update the RPMs and recheck the issue.
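
A hedged way to confirm whether packaged HA scripts were locally modified before updating the RPMs (assuming the scripts ship in the glusterfs-ganesha and nfs-ganesha packages on this build; package names may differ):

[root@nfs2 ~]# rpm -V glusterfs-ganesha nfs-ganesha    # an 'S.5....T.' line marks a changed file
[root@nfs2 ~]# yum update glusterfs-ganesha nfs-ganesha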

Comment 7 Meghana 2015-07-13 07:13:37 UTC
This has always been working for Saurabh and the developers. I request Apeksha to check this again and update the bug.

Comment 8 SATHEESARAN 2015-08-10 03:28:23 UTC
(In reply to Meghana from comment #7)
> This has always been working for Saurabh and the developers. I request
> Apeksha to check this again and update the bug.

Meghana,

In this case, after Apeksha updates (about whether the issue is reproducible), this bug should be closed as CLOSED - WORKSFORME.

It's not appropriate to move the bug to ON_QA.
Moving the bug to ON_QA is valid only when there was an issue, that issue was fixed with a patch, and the patch was made available in the build (as mentioned in the FIXED-IN-VERSION field).

Hope that helps

Comment 9 Meghana 2015-08-10 05:37:11 UTC
Thanks. I'll wait for Apeksha's updates and do that accordingly.

Comment 12 Apeksha 2015-08-28 06:21:38 UTC
Did not hit this issue again, hence closing it.

