Bug 1341567

Summary: After setting up Ganesha on RHEL 6, nodes remain in the Stopped state and grace-related failures are observed in pcs status
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Shashank Raj <sraj>
Component: nfs-ganesha    Assignee: Kaleb KEITHLEY <kkeithle>
Status: CLOSED ERRATA QA Contact: Shashank Raj <sraj>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: rhgs-3.1    CC: amukherj, jthottan, kkeithle, ndevos, rcyriac, rhinduja, sashinde, skoduri
Target Milestone: ---    Keywords: Regression, TestBlocker, ZStream
Target Release: RHGS 3.1.3   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: glusterfs-3.7.9-8 Doc Type: Bug Fix
Doc Text:
When fully qualified domain names are used, the output of pcs status is expected to contain Failed Actions, for example:

Failed Actions:
* nfs-grace_monitor_5000 on node1.fully.qualified.domain.name.com 'unknown error' (1): call=20, status=complete, exitreason='none', last-rc-change='Wed Jun 1 12:32:32 2016', queued=0ms, exec=0ms
* nfs-grace_monitor_5000 on node2.fully.qualified.domain.name.com 'unknown error' (1): call=18, status=complete, exitreason='none', last-rc-change='Wed Jun 1 12:32:42 2016', queued=0ms, exec=0ms

This is "normal" as long as all the nfs-grace-clone and cluster_ip-1 resource agents are in the Started state.
Story Points: ---
Clone Of:
: 1341768 (view as bug list) Environment:
Last Closed: 2016-06-23 05:25:23 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1311817, 1341768, 1341770, 1341772    

Description Shashank Raj 2016-06-01 10:02:11 UTC
Description of problem:

After setting up Ganesha on RHEL 6, the nodes remain in the Stopped state and grace-related failures are observed in pcs status.

Version-Release number of selected component (if applicable):

glusterfs-3.7.9-7
nfs-ganesha-2.3.1-7

How reproducible:

Always

Steps to Reproduce:

1. Try to setup ganesha on RHEL 6 platform
2. Observe that after setup, the nodes of the cluster remain in the Stopped state and failures related to nfs-grace_start are observed in pcs status.
3. The assigned VIPs are not pingable, so the volume cannot be mounted through a VIP.
4. Following error messages are seen in /var/log/messages:

Jun  1 20:54:08 dhcp43-119 pengine[22877]:  warning: Forcing nfs-grace-clone away from dhcp42-33.lab.eng.blr.redhat.com after 1000000 failures (max=1000000)
Jun  1 20:54:08 dhcp43-119 pengine[22877]:  warning: Forcing nfs-grace-clone away from dhcp42-33.lab.eng.blr.redhat.com after 1000000 failures (max=1000000)
Jun  1 20:54:08 dhcp43-119 pengine[22877]:  warning: Forcing nfs-grace-clone away from dhcp43-119.lab.eng.blr.redhat.com after 1000000 failures (max=1000000)
Jun  1 20:54:08 dhcp43-119 pengine[22877]:  warning: Forcing nfs-grace-clone away from dhcp43-119.lab.eng.blr.redhat.com after 1000000 failures (max=1000000)
Jun  1 20:54:08 dhcp43-119 pengine[22877]:   notice: Start   dhcp43-119.lab.eng.blr.redhat.com-cluster_ip-1#011(dhcp43-119.lab.eng.blr.redhat.com - blocked)
Jun  1 20:54:08 dhcp43-119 pengine[22877]:   notice: Start   dhcp42-33.lab.eng.blr.redhat.com-cluster_ip-1#011(dhcp42-33.lab.eng.blr.redhat.com - blocked)
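The symptoms in steps 2 and 3 can be checked from the command line. As a hedged sketch (not part of the product), a small helper that scans `pcs status` output for the failure state reported here; the resource names (nfs-grace, cluster_ip) come from the log messages above, and the exact pcs output format varies by version:

```shell
#!/bin/sh
# Sketch: scan "pcs status" text for the symptoms in this report.
# Resource names are taken from the pengine log messages above.

check_pcs_output()
{
    # Reads pcs status text on stdin; prints FAIL if grace failures or
    # blocked cluster_ip resources appear on any line, OK otherwise.
    if grep -Eq 'nfs-grace.*(Stopped|FAILED)|cluster_ip.*blocked' -; then
        echo FAIL
    else
        echo OK
    fi
}
```

On a cluster node one would run `pcs status | check_pcs_output`, and additionally confirm the VIP symptom with `ping -c 3 <VIP>`.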


Actual results:

After setting up Ganesha on RHEL 6, the nodes remain in the Stopped state and grace-related failures are observed in pcs status.

Expected results:

Setup should succeed, with no failures observed in pcs status.

Additional info:

Comment 2 Shashank Raj 2016-06-01 10:08:17 UTC
sosreports are placed at http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1341567

Comment 6 Kaleb KEITHLEY 2016-06-01 15:48:27 UTC
This seems to be a timing issue: a race condition.

The ganesha_grace RA's start method runs before the ganesha_mon RA's monitor method has set the resources that the grace start method queries.
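A minimal sketch of the shape of such a fix, assuming the RA queries a Pacemaker node attribute (e.g. via attrd_updater; the attribute name and the exact mechanism here are assumptions, see the linked patches for the actual change): rather than querying once, the start method retries until the monitor has had a chance to set the value.

```shell
#!/bin/sh
# Sketch: retry a query until the value appears, instead of failing
# on the first empty result. The attribute name "grace-active" and the
# attrd_updater usage in the comment below are assumptions, not the
# literal downstream patch.

retry_query()
{
    # $@ = command that prints the attribute value; retry up to 5 times,
    # sleeping 1s between attempts, until it prints something non-empty.
    tries=0
    while [ "$tries" -lt 5 ]; do
        val=$("$@" 2>/dev/null)
        if [ -n "$val" ]; then
            echo "$val"
            return 0
        fi
        tries=$((tries + 1))
        sleep 1
    done
    return 1
}

# In the RA's start method one might then call, e.g.:
#   retry_query attrd_updater -Q -n grace-active
```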

Comment 7 Atin Mukherjee 2016-06-02 06:13:24 UTC
Upstream mainline patch : http://review.gluster.org/#/c/14607 posted for review.

Comment 9 Kaleb KEITHLEY 2016-06-02 10:06:56 UTC
The downstream patch is probably more relevant here.

https://code.engineering.redhat.com/gerrit/75679

And yes, I want the doc text here.

Comment 11 Shashank Raj 2016-06-04 07:12:00 UTC
Verified this bug with the latest glusterfs-3.7.9-8 build, and it is working as expected.

On both the RHEL 6 and RHEL 7 platforms, after setting up Ganesha, no grace-related failures are seen and all the nodes show in the Started state.

Based on the above observation, marking this bug as Verified.

Comment 13 Kaleb KEITHLEY 2016-06-09 09:59:43 UTC
Okay, yes, a known issue or documenting it is fine.

Comment 15 errata-xmlrpc 2016-06-23 05:25:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1240