Bug 1575557

Summary: [Ganesha] "gluster nfs-ganesha enable" command sometimes reports "failed" with "Unlocking failed" error messages, even though the cluster is up and healthy in the backend
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Manisha Saini <msaini>
Component: glusterd Assignee: Sanju <srakonde>
Status: CLOSED ERRATA QA Contact: Manisha Saini <msaini>
Severity: high Docs Contact:
Priority: unspecified    
Version: rhgs-3.4 CC: amukherj, dang, ffilz, grajoria, jthottan, msaini, rhs-bugs, sankarshan, sheggodu, srakonde, storage-qa-internal, vbellur, vdas
Target Milestone: ---   
Target Release: RHGS 3.4.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: glusterfs-3.12.2-13 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1577731 Environment:
Last Closed: 2018-09-04 06:48:05 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1503137, 1577731    

Description Manisha Saini 2018-05-07 09:58:30 UTC
Description of problem:

While enabling ganesha on 8 nodes, the "gluster nfs-ganesha enable" command sometimes reports "failed" even though the ganesha cluster is up and running in the backend. "Unlocking failed" messages are observed in glusterd.log.

This issue is intermittent, but I have encountered it 3-4 times.

# time gluster nfs-ganesha enable
Enabling NFS-Ganesha requires Gluster-NFS to be disabled across the trusted pool. Do you still want to continue?
 (y/n) y
This will take a few minutes to complete. Please wait ..
nfs-ganesha: failed

real	3m56.254s
user	0m0.105s
sys	0m0.180s


Glusterd logs
----------------
[2018-05-07 09:42:50.357894] I [MSGID: 106474] [glusterd-ganesha.c:433:check_host_list] 0-management: ganesha host found Hostname is dhcp47-193.lab.eng.blr.redhat.com
[2018-05-07 09:45:20.231419] I [glusterd-locks.c:730:gd_mgmt_v3_unlock_timer_cbk] 0-management: In gd_mgmt_v3_unlock_timer_cbk
[2018-05-07 09:46:14.626849] E [MSGID: 106116] [glusterd-mgmt.c:124:gd_mgmt_v3_collate_errors] 0-management: Unlocking failed on dhcp37-121.lab.eng.blr.redhat.com. Please check log file for details.
[2018-05-07 09:46:14.627387] E [MSGID: 106116] [glusterd-mgmt.c:124:gd_mgmt_v3_collate_errors] 0-management: Unlocking failed on dhcp37-103.lab.eng.blr.redhat.com. Please check log file for details.
[2018-05-07 09:46:14.627483] E [MSGID: 106116] [glusterd-mgmt.c:124:gd_mgmt_v3_collate_errors] 0-management: Unlocking failed on dhcp37-218.lab.eng.blr.redhat.com. Please check log file for details.
[2018-05-07 09:46:14.627583] E [MSGID: 106116] [glusterd-mgmt.c:124:gd_mgmt_v3_collate_errors] 0-management: Unlocking failed on dhcp37-136.lab.eng.blr.redhat.com. Please check log file for details.
[2018-05-07 09:46:14.627644] E [MSGID: 106116] [glusterd-mgmt.c:124:gd_mgmt_v3_collate_errors] 0-management: Unlocking failed on dhcp46-116.lab.eng.blr.redhat.com. Please check log file for details.
[2018-05-07 09:46:14.627703] E [MSGID: 106116] [glusterd-mgmt.c:124:gd_mgmt_v3_collate_errors] 0-management: Unlocking failed on dhcp46-184.lab.eng.blr.redhat.com. Please check log file for details.
[2018-05-07 09:46:14.627779] E [MSGID: 106116] [glusterd-mgmt.c:124:gd_mgmt_v3_collate_errors] 0-management: Unlocking failed on dhcp47-2.lab.eng.blr.redhat.com. Please check log file for details.
[2018-05-07 09:46:14.627952] E [MSGID: 106152] [glusterd-syncop.c:1641:gd_unlock_op_phase] 0-management: Failed to unlock on some peer(s)
[2018-05-07 09:46:14.628158] W [glusterd-locks.c:845:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0xe1379) [0x7fa08bdaa379] -->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0xe09ca) [0x7fa08bda99ca] -->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0xe8935) [0x7fa08bdb1935] ) 0-management: Lock for global All not held
[2018-05-07 09:46:14.628197] E [MSGID: 106118] [glusterd-syncop.c:1667:gd_unlock_op_phase] 0-management: Unable to release lock for All
------------------
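
For triage, the log excerpt above can be summarized with a small script. The following is a minimal sketch, assuming the glusterd log line layout seen in the excerpt; the function names (parse_line, unlock_failed_hosts) are hypothetical helpers and not part of any gluster tooling. It lists the peers that reported "Unlocking failed" and computes the gap between the unlock timer callback and the first unlock error.

```python
import re
from datetime import datetime

# Layout assumed from the excerpt:
# [timestamp] LEVEL [MSGID: n] [file:line:function] 0-management: message
LOG_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] "
    r"(?P<level>[A-Z]) "
    r"(?:\[MSGID: (?P<msgid>\d+)\] )?"
    r"\[(?P<src>[^\]]+)\] "
    r"(?P<rest>.*)$"
)
UNLOCK_RE = re.compile(r"Unlocking failed on (?P<host>\S+)\. ")


def parse_line(line):
    """Split one log line into its fields, or return None if it does not match."""
    m = LOG_RE.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    rec["ts"] = datetime.strptime(rec["ts"], "%Y-%m-%d %H:%M:%S.%f")
    return rec


def unlock_failed_hosts(lines):
    """Return the peers named in error-level 'Unlocking failed on <host>' lines."""
    hosts = []
    for line in lines:
        rec = parse_line(line)
        if rec is None or rec["level"] != "E":
            continue
        m = UNLOCK_RE.search(rec["rest"])
        if m:
            hosts.append(m.group("host"))
    return hosts


# Three lines copied from the excerpt above.
SAMPLE_LOG = [
    "[2018-05-07 09:45:20.231419] I [glusterd-locks.c:730:gd_mgmt_v3_unlock_timer_cbk] "
    "0-management: In gd_mgmt_v3_unlock_timer_cbk",
    "[2018-05-07 09:46:14.626849] E [MSGID: 106116] [glusterd-mgmt.c:124:gd_mgmt_v3_collate_errors] "
    "0-management: Unlocking failed on dhcp37-121.lab.eng.blr.redhat.com. Please check log file for details.",
    "[2018-05-07 09:46:14.627387] E [MSGID: 106116] [glusterd-mgmt.c:124:gd_mgmt_v3_collate_errors] "
    "0-management: Unlocking failed on dhcp37-103.lab.eng.blr.redhat.com. Please check log file for details.",
]

if __name__ == "__main__":
    print(unlock_failed_hosts(SAMPLE_LOG))
    timer_ts = parse_line(SAMPLE_LOG[0])["ts"]
    first_error_ts = parse_line(SAMPLE_LOG[1])["ts"]
    print((first_error_ts - timer_ts).total_seconds())
```

Run against the full glusterd.log, the same helpers would show whether every peer or only a subset reported the unlock failure. The script only reports what the log contains; it does not establish why the unlock failed.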


Version-Release number of selected component (if applicable):
# rpm -qa | grep ganesha
glusterfs-ganesha-3.12.2-8.el7rhgs.x86_64
nfs-ganesha-gluster-2.5.5-6.el7rhgs.x86_64
nfs-ganesha-2.5.5-6.el7rhgs.x86_64


How reproducible:
Intermittent

Steps to Reproduce:
1. Create an 8-node ganesha cluster


Actual results:

The ganesha enable command fails with "Unlocking failed" messages in glusterd.log, but when checked in the backend, the cluster is up and running

--------------------

# gluster nfs-ganesha disable
Disabling NFS-Ganesha will tear down the entire ganesha cluster across the trusted pool. Do you still want to continue?
 (y/n) y
This will take a few minutes to complete. Please wait ..
nfs-ganesha : success 

[root@dhcp47-193 ganesha]# time gluster nfs-ganesha enable
Enabling NFS-Ganesha requires Gluster-NFS to be disabled across the trusted pool. Do you still want to continue?
 (y/n) y
This will take a few minutes to complete. Please wait ..
nfs-ganesha: failed

real	3m56.254s
user	0m0.105s
sys	0m0.180s


# pcs status
Cluster name: ganesha-ha
Stack: corosync
Current DC: dhcp47-193.lab.eng.blr.redhat.com (version 1.1.18-11.el7-2b07d5c5a9) - partition with quorum
Last updated: Mon May  7 15:16:31 2018
Last change: Mon May  7 15:15:57 2018 by root via cibadmin on dhcp47-193.lab.eng.blr.redhat.com

8 nodes configured
48 resources configured

Online: [ dhcp37-103.lab.eng.blr.redhat.com dhcp37-121.lab.eng.blr.redhat.com dhcp37-136.lab.eng.blr.redhat.com dhcp37-218.lab.eng.blr.redhat.com dhcp46-116.lab.eng.blr.redhat.com dhcp46-184.lab.eng.blr.redhat.com dhcp47-193.lab.eng.blr.redhat.com dhcp47-2.lab.eng.blr.redhat.com ]

Full list of resources:

 Clone Set: nfs_setup-clone [nfs_setup]
     Started: [ dhcp37-103.lab.eng.blr.redhat.com dhcp37-121.lab.eng.blr.redhat.com dhcp37-136.lab.eng.blr.redhat.com dhcp37-218.lab.eng.blr.redhat.com dhcp46-116.lab.eng.blr.redhat.com dhcp46-184.lab.eng.blr.redhat.com dhcp47-193.lab.eng.blr.redhat.com dhcp47-2.lab.eng.blr.redhat.com ]
 Clone Set: nfs-mon-clone [nfs-mon]
     Started: [ dhcp37-103.lab.eng.blr.redhat.com dhcp37-121.lab.eng.blr.redhat.com dhcp37-136.lab.eng.blr.redhat.com dhcp37-218.lab.eng.blr.redhat.com dhcp46-116.lab.eng.blr.redhat.com dhcp46-184.lab.eng.blr.redhat.com dhcp47-193.lab.eng.blr.redhat.com dhcp47-2.lab.eng.blr.redhat.com ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Started: [ dhcp37-103.lab.eng.blr.redhat.com dhcp37-121.lab.eng.blr.redhat.com dhcp37-136.lab.eng.blr.redhat.com dhcp37-218.lab.eng.blr.redhat.com dhcp46-116.lab.eng.blr.redhat.com dhcp46-184.lab.eng.blr.redhat.com dhcp47-193.lab.eng.blr.redhat.com dhcp47-2.lab.eng.blr.redhat.com ]
 Resource Group: dhcp37-121.lab.eng.blr.redhat.com-group
     dhcp37-121.lab.eng.blr.redhat.com-nfs_block	(ocf::heartbeat:portblock):	Started dhcp37-121.lab.eng.blr.redhat.com
     dhcp37-121.lab.eng.blr.redhat.com-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started dhcp37-121.lab.eng.blr.redhat.com
     dhcp37-121.lab.eng.blr.redhat.com-nfs_unblock	(ocf::heartbeat:portblock):	Started dhcp37-121.lab.eng.blr.redhat.com
 Resource Group: dhcp37-103.lab.eng.blr.redhat.com-group
     dhcp37-103.lab.eng.blr.redhat.com-nfs_block	(ocf::heartbeat:portblock):	Started dhcp37-103.lab.eng.blr.redhat.com
     dhcp37-103.lab.eng.blr.redhat.com-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started dhcp37-103.lab.eng.blr.redhat.com
     dhcp37-103.lab.eng.blr.redhat.com-nfs_unblock	(ocf::heartbeat:portblock):	Started dhcp37-103.lab.eng.blr.redhat.com
 Resource Group: dhcp37-218.lab.eng.blr.redhat.com-group
     dhcp37-218.lab.eng.blr.redhat.com-nfs_block	(ocf::heartbeat:portblock):	Started dhcp37-218.lab.eng.blr.redhat.com
     dhcp37-218.lab.eng.blr.redhat.com-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started dhcp37-218.lab.eng.blr.redhat.com
     dhcp37-218.lab.eng.blr.redhat.com-nfs_unblock	(ocf::heartbeat:portblock):	Started dhcp37-218.lab.eng.blr.redhat.com
 Resource Group: dhcp37-136.lab.eng.blr.redhat.com-group
     dhcp37-136.lab.eng.blr.redhat.com-nfs_block	(ocf::heartbeat:portblock):	Started dhcp37-136.lab.eng.blr.redhat.com
     dhcp37-136.lab.eng.blr.redhat.com-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started dhcp37-136.lab.eng.blr.redhat.com
     dhcp37-136.lab.eng.blr.redhat.com-nfs_unblock	(ocf::heartbeat:portblock):	Started dhcp37-136.lab.eng.blr.redhat.com
 Resource Group: dhcp47-193.lab.eng.blr.redhat.com-group
     dhcp47-193.lab.eng.blr.redhat.com-nfs_block	(ocf::heartbeat:portblock):	Started dhcp47-193.lab.eng.blr.redhat.com
     dhcp47-193.lab.eng.blr.redhat.com-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started dhcp47-193.lab.eng.blr.redhat.com
     dhcp47-193.lab.eng.blr.redhat.com-nfs_unblock	(ocf::heartbeat:portblock):	Started dhcp47-193.lab.eng.blr.redhat.com
 Resource Group: dhcp46-116.lab.eng.blr.redhat.com-group
     dhcp46-116.lab.eng.blr.redhat.com-nfs_block	(ocf::heartbeat:portblock):	Started dhcp46-116.lab.eng.blr.redhat.com
     dhcp46-116.lab.eng.blr.redhat.com-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started dhcp46-116.lab.eng.blr.redhat.com
     dhcp46-116.lab.eng.blr.redhat.com-nfs_unblock	(ocf::heartbeat:portblock):	Started dhcp46-116.lab.eng.blr.redhat.com
 Resource Group: dhcp46-184.lab.eng.blr.redhat.com-group
     dhcp46-184.lab.eng.blr.redhat.com-nfs_block	(ocf::heartbeat:portblock):	Started dhcp46-184.lab.eng.blr.redhat.com
     dhcp46-184.lab.eng.blr.redhat.com-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started dhcp46-184.lab.eng.blr.redhat.com
     dhcp46-184.lab.eng.blr.redhat.com-nfs_unblock	(ocf::heartbeat:portblock):	Started dhcp46-184.lab.eng.blr.redhat.com
 Resource Group: dhcp47-2.lab.eng.blr.redhat.com-group
     dhcp47-2.lab.eng.blr.redhat.com-nfs_block	(ocf::heartbeat:portblock):	Started dhcp47-2.lab.eng.blr.redhat.com
     dhcp47-2.lab.eng.blr.redhat.com-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started dhcp47-2.lab.eng.blr.redhat.com
     dhcp47-2.lab.eng.blr.redhat.com-nfs_unblock	(ocf::heartbeat:portblock):	Started dhcp47-2.lab.eng.blr.redhat.com

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
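
The backend check above (all configured nodes online even though the CLI printed "failed") can be automated. This is a minimal sketch, assuming the "N nodes configured" and "Online: [ ... ]" lines appear as in the pcs status output shown here; the function names are hypothetical and the sample text uses made-up host names.

```python
import re


def summarize_pcs_status(text):
    """Return (configured_node_count, list_of_online_nodes) from pcs status text."""
    configured = None
    online = []
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"^(\d+) nodes? configured", line)
        if m:
            configured = int(m.group(1))
        m = re.match(r"^Online: \[ (.*) \]$", line)
        if m:
            online = m.group(1).split()
    return configured, online


def all_nodes_online(text):
    """True when the number of online nodes equals the configured node count."""
    configured, online = summarize_pcs_status(text)
    return configured is not None and configured == len(online)


# Shortened, made-up sample in the same layout as the output above.
SAMPLE_STATUS = """\
Cluster name: ganesha-ha
Stack: corosync

2 nodes configured
12 resources configured

Online: [ node1.example.com node2.example.com ]
"""

if __name__ == "__main__":
    print(summarize_pcs_status(SAMPLE_STATUS))
    print(all_nodes_online(SAMPLE_STATUS))
```

A test harness could combine this with the CLI result to flag the mismatch described in this report: the enable command reporting "failed" while every configured node is online.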




Expected results:

The ganesha enable command should report the correct result, i.e. success when the cluster comes up in the backend


Additional info:

Raising it against component "ganesha". Change the component if required. Attaching sosreport shortly.

Comment 5 Sanju 2018-05-09 05:18:47 UTC
Hi Manisha,

I suspect this bug is the same as https://bugzilla.redhat.com/show_bug.cgi?id=1568436. I raised that bug to fix the issue we discussed in the mail thread with the subject "nfs-ganesha enable issue". Please let me know whether it is the same or something different.

Comment 19 errata-xmlrpc 2018-09-04 06:48:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2607

Comment 20 Manisha Saini 2018-09-24 05:16:10 UTC
This is the ganesha setup use case, which is covered as part of each ganesha test case. Hence setting the qe_test_coverage flag to + with no Testcase ID.