Bug 1575557 - [Ganesha] "gluster nfs-ganesha enable" command sometimes reports "failed" with "Unlocking failed" error messages, even though the cluster is up and healthy in the backend
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterd
Version: rhgs-3.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: RHGS 3.4.0
Assignee: Sanju
QA Contact: Manisha Saini
URL:
Whiteboard:
Depends On:
Blocks: 1503137 1577731
Reported: 2018-05-07 09:58 UTC by Manisha Saini
Modified: 2018-09-24 05:16 UTC (History)

Fixed In Version: glusterfs-3.12.2-13
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1577731 (view as bug list)
Environment:
Last Closed: 2018-09-04 06:48:05 UTC
Embargoed:



Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2018:2607 0 None None None 2018-09-04 06:49:57 UTC

Description Manisha Saini 2018-05-07 09:58:30 UTC
Description of problem:

While enabling ganesha on 8 nodes, the "gluster nfs-ganesha enable" command sometimes reports "failed" even though the ganesha cluster is up and running in the backend. "Unlocking failed" messages are observed in glusterd.log.

The issue is intermittent, but I have encountered it 3-4 times.

# time gluster nfs-ganesha enable
Enabling NFS-Ganesha requires Gluster-NFS to be disabled across the trusted pool. Do you still want to continue?
 (y/n) y
This will take a few minutes to complete. Please wait ..
nfs-ganesha: failed

real	3m56.254s
user	0m0.105s
sys	0m0.180s


Glusterd logs
----------------
[2018-05-07 09:42:50.357894] I [MSGID: 106474] [glusterd-ganesha.c:433:check_host_list] 0-management: ganesha host found Hostname is dhcp47-193.lab.eng.blr.redhat.com
[2018-05-07 09:45:20.231419] I [glusterd-locks.c:730:gd_mgmt_v3_unlock_timer_cbk] 0-management: In gd_mgmt_v3_unlock_timer_cbk
[2018-05-07 09:46:14.626849] E [MSGID: 106116] [glusterd-mgmt.c:124:gd_mgmt_v3_collate_errors] 0-management: Unlocking failed on dhcp37-121.lab.eng.blr.redhat.com. Please check log file for details.
[2018-05-07 09:46:14.627387] E [MSGID: 106116] [glusterd-mgmt.c:124:gd_mgmt_v3_collate_errors] 0-management: Unlocking failed on dhcp37-103.lab.eng.blr.redhat.com. Please check log file for details.
[2018-05-07 09:46:14.627483] E [MSGID: 106116] [glusterd-mgmt.c:124:gd_mgmt_v3_collate_errors] 0-management: Unlocking failed on dhcp37-218.lab.eng.blr.redhat.com. Please check log file for details.
[2018-05-07 09:46:14.627583] E [MSGID: 106116] [glusterd-mgmt.c:124:gd_mgmt_v3_collate_errors] 0-management: Unlocking failed on dhcp37-136.lab.eng.blr.redhat.com. Please check log file for details.
[2018-05-07 09:46:14.627644] E [MSGID: 106116] [glusterd-mgmt.c:124:gd_mgmt_v3_collate_errors] 0-management: Unlocking failed on dhcp46-116.lab.eng.blr.redhat.com. Please check log file for details.
[2018-05-07 09:46:14.627703] E [MSGID: 106116] [glusterd-mgmt.c:124:gd_mgmt_v3_collate_errors] 0-management: Unlocking failed on dhcp46-184.lab.eng.blr.redhat.com. Please check log file for details.
[2018-05-07 09:46:14.627779] E [MSGID: 106116] [glusterd-mgmt.c:124:gd_mgmt_v3_collate_errors] 0-management: Unlocking failed on dhcp47-2.lab.eng.blr.redhat.com. Please check log file for details.
[2018-05-07 09:46:14.627952] E [MSGID: 106152] [glusterd-syncop.c:1641:gd_unlock_op_phase] 0-management: Failed to unlock on some peer(s)
[2018-05-07 09:46:14.628158] W [glusterd-locks.c:845:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0xe1379) [0x7fa08bdaa379] -->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0xe09ca) [0x7fa08bda99ca] -->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0xe8935) [0x7fa08bdb1935] ) 0-management: Lock for global All not held
[2018-05-07 09:46:14.628197] E [MSGID: 106118] [glusterd-syncop.c:1667:gd_unlock_op_phase] 0-management: Unable to release lock for All
------------------
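
The log excerpt above can be triaged mechanically. The following is a minimal Python sketch, not part of glusterd or of the original report, that assumes the line layout seen in this excerpt (timestamp, level, optional MSGID, source location, message). The helper names parse_line and unlock_failures are hypothetical. It lists the peers that reported "Unlocking failed" and measures how long after the first sample line the unlock timer callback fired.

```python
import re
from datetime import datetime

# Layout assumed from the excerpt in this report:
# [timestamp] LEVEL [MSGID: n] [file:line:function] rest-of-message
LINE_RE = re.compile(
    r"^\[(?P<ts>[\d-]+ [\d:.]+)\] (?P<level>[A-Z]) "
    r"(?:\[MSGID: (?P<msgid>\d+)\] )?"
    r"\[(?P<src>[^\]]+)\] (?P<rest>.*)$"
)
UNLOCK_RE = re.compile(r"Unlocking failed on (?P<host>\S+)\. ")


def parse_line(line):
    """Return a dict for one log line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    rec["ts"] = datetime.strptime(rec["ts"], "%Y-%m-%d %H:%M:%S.%f")
    return rec


def unlock_failures(lines):
    """Collect peer hostnames from 'Unlocking failed on <host>' errors."""
    hosts = []
    for line in lines:
        rec = parse_line(line)
        # MSGID 106116 is the ID carried by these errors in this excerpt.
        if rec and rec["level"] == "E" and rec["msgid"] == "106116":
            m = UNLOCK_RE.search(rec["rest"])
            if m:
                hosts.append(m.group("host"))
    return hosts


# A few lines copied from the excerpt above.
SAMPLE = [
    "[2018-05-07 09:42:50.357894] I [MSGID: 106474] [glusterd-ganesha.c:433:check_host_list] 0-management: ganesha host found Hostname is dhcp47-193.lab.eng.blr.redhat.com",
    "[2018-05-07 09:45:20.231419] I [glusterd-locks.c:730:gd_mgmt_v3_unlock_timer_cbk] 0-management: In gd_mgmt_v3_unlock_timer_cbk",
    "[2018-05-07 09:46:14.626849] E [MSGID: 106116] [glusterd-mgmt.c:124:gd_mgmt_v3_collate_errors] 0-management: Unlocking failed on dhcp37-121.lab.eng.blr.redhat.com. Please check log file for details.",
    "[2018-05-07 09:46:14.627952] E [MSGID: 106152] [glusterd-syncop.c:1641:gd_unlock_op_phase] 0-management: Failed to unlock on some peer(s)",
]

if __name__ == "__main__":
    print("peers with unlock failures:", unlock_failures(SAMPLE))
    first, timer = parse_line(SAMPLE[0]), parse_line(SAMPLE[1])
    elapsed = (timer["ts"] - first["ts"]).total_seconds()
    print("seconds from first line to unlock timer callback:", elapsed)
```

Run against the full glusterd.log, the same helpers would show whether the unlock timer callback consistently precedes the "Unlocking failed" errors. That ordering is only a hint for triage, not a confirmed root cause.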


Version-Release number of selected component (if applicable):
# rpm -qa | grep ganesha
glusterfs-ganesha-3.12.2-8.el7rhgs.x86_64
nfs-ganesha-gluster-2.5.5-6.el7rhgs.x86_64
nfs-ganesha-2.5.5-6.el7rhgs.x86_64


How reproducible:
Intermittent

Steps to Reproduce:
1. Create an 8-node ganesha cluster


Actual results:

The ganesha enable command fails with "Unlocking failed" messages in glusterd.log, but when checked in the backend, the cluster is up and running.

--------------------

# gluster nfs-ganesha disable
Disabling NFS-Ganesha will tear down the entire ganesha cluster across the trusted pool. Do you still want to continue?
 (y/n) y
This will take a few minutes to complete. Please wait ..
nfs-ganesha : success 

[root@dhcp47-193 ganesha]# time gluster nfs-ganesha enable
Enabling NFS-Ganesha requires Gluster-NFS to be disabled across the trusted pool. Do you still want to continue?
 (y/n) y
This will take a few minutes to complete. Please wait ..
nfs-ganesha: failed

real	3m56.254s
user	0m0.105s
sys	0m0.180s


# pcs status
Cluster name: ganesha-ha
Stack: corosync
Current DC: dhcp47-193.lab.eng.blr.redhat.com (version 1.1.18-11.el7-2b07d5c5a9) - partition with quorum
Last updated: Mon May  7 15:16:31 2018
Last change: Mon May  7 15:15:57 2018 by root via cibadmin on dhcp47-193.lab.eng.blr.redhat.com

8 nodes configured
48 resources configured

Online: [ dhcp37-103.lab.eng.blr.redhat.com dhcp37-121.lab.eng.blr.redhat.com dhcp37-136.lab.eng.blr.redhat.com dhcp37-218.lab.eng.blr.redhat.com dhcp46-116.lab.eng.blr.redhat.com dhcp46-184.lab.eng.blr.redhat.com dhcp47-193.lab.eng.blr.redhat.com dhcp47-2.lab.eng.blr.redhat.com ]

Full list of resources:

 Clone Set: nfs_setup-clone [nfs_setup]
     Started: [ dhcp37-103.lab.eng.blr.redhat.com dhcp37-121.lab.eng.blr.redhat.com dhcp37-136.lab.eng.blr.redhat.com dhcp37-218.lab.eng.blr.redhat.com dhcp46-116.lab.eng.blr.redhat.com dhcp46-184.lab.eng.blr.redhat.com dhcp47-193.lab.eng.blr.redhat.com dhcp47-2.lab.eng.blr.redhat.com ]
 Clone Set: nfs-mon-clone [nfs-mon]
     Started: [ dhcp37-103.lab.eng.blr.redhat.com dhcp37-121.lab.eng.blr.redhat.com dhcp37-136.lab.eng.blr.redhat.com dhcp37-218.lab.eng.blr.redhat.com dhcp46-116.lab.eng.blr.redhat.com dhcp46-184.lab.eng.blr.redhat.com dhcp47-193.lab.eng.blr.redhat.com dhcp47-2.lab.eng.blr.redhat.com ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Started: [ dhcp37-103.lab.eng.blr.redhat.com dhcp37-121.lab.eng.blr.redhat.com dhcp37-136.lab.eng.blr.redhat.com dhcp37-218.lab.eng.blr.redhat.com dhcp46-116.lab.eng.blr.redhat.com dhcp46-184.lab.eng.blr.redhat.com dhcp47-193.lab.eng.blr.redhat.com dhcp47-2.lab.eng.blr.redhat.com ]
 Resource Group: dhcp37-121.lab.eng.blr.redhat.com-group
     dhcp37-121.lab.eng.blr.redhat.com-nfs_block	(ocf::heartbeat:portblock):	Started dhcp37-121.lab.eng.blr.redhat.com
     dhcp37-121.lab.eng.blr.redhat.com-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started dhcp37-121.lab.eng.blr.redhat.com
     dhcp37-121.lab.eng.blr.redhat.com-nfs_unblock	(ocf::heartbeat:portblock):	Started dhcp37-121.lab.eng.blr.redhat.com
 Resource Group: dhcp37-103.lab.eng.blr.redhat.com-group
     dhcp37-103.lab.eng.blr.redhat.com-nfs_block	(ocf::heartbeat:portblock):	Started dhcp37-103.lab.eng.blr.redhat.com
     dhcp37-103.lab.eng.blr.redhat.com-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started dhcp37-103.lab.eng.blr.redhat.com
     dhcp37-103.lab.eng.blr.redhat.com-nfs_unblock	(ocf::heartbeat:portblock):	Started dhcp37-103.lab.eng.blr.redhat.com
 Resource Group: dhcp37-218.lab.eng.blr.redhat.com-group
     dhcp37-218.lab.eng.blr.redhat.com-nfs_block	(ocf::heartbeat:portblock):	Started dhcp37-218.lab.eng.blr.redhat.com
     dhcp37-218.lab.eng.blr.redhat.com-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started dhcp37-218.lab.eng.blr.redhat.com
     dhcp37-218.lab.eng.blr.redhat.com-nfs_unblock	(ocf::heartbeat:portblock):	Started dhcp37-218.lab.eng.blr.redhat.com
 Resource Group: dhcp37-136.lab.eng.blr.redhat.com-group
     dhcp37-136.lab.eng.blr.redhat.com-nfs_block	(ocf::heartbeat:portblock):	Started dhcp37-136.lab.eng.blr.redhat.com
     dhcp37-136.lab.eng.blr.redhat.com-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started dhcp37-136.lab.eng.blr.redhat.com
     dhcp37-136.lab.eng.blr.redhat.com-nfs_unblock	(ocf::heartbeat:portblock):	Started dhcp37-136.lab.eng.blr.redhat.com
 Resource Group: dhcp47-193.lab.eng.blr.redhat.com-group
     dhcp47-193.lab.eng.blr.redhat.com-nfs_block	(ocf::heartbeat:portblock):	Started dhcp47-193.lab.eng.blr.redhat.com
     dhcp47-193.lab.eng.blr.redhat.com-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started dhcp47-193.lab.eng.blr.redhat.com
     dhcp47-193.lab.eng.blr.redhat.com-nfs_unblock	(ocf::heartbeat:portblock):	Started dhcp47-193.lab.eng.blr.redhat.com
 Resource Group: dhcp46-116.lab.eng.blr.redhat.com-group
     dhcp46-116.lab.eng.blr.redhat.com-nfs_block	(ocf::heartbeat:portblock):	Started dhcp46-116.lab.eng.blr.redhat.com
     dhcp46-116.lab.eng.blr.redhat.com-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started dhcp46-116.lab.eng.blr.redhat.com
     dhcp46-116.lab.eng.blr.redhat.com-nfs_unblock	(ocf::heartbeat:portblock):	Started dhcp46-116.lab.eng.blr.redhat.com
 Resource Group: dhcp46-184.lab.eng.blr.redhat.com-group
     dhcp46-184.lab.eng.blr.redhat.com-nfs_block	(ocf::heartbeat:portblock):	Started dhcp46-184.lab.eng.blr.redhat.com
     dhcp46-184.lab.eng.blr.redhat.com-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started dhcp46-184.lab.eng.blr.redhat.com
     dhcp46-184.lab.eng.blr.redhat.com-nfs_unblock	(ocf::heartbeat:portblock):	Started dhcp46-184.lab.eng.blr.redhat.com
 Resource Group: dhcp47-2.lab.eng.blr.redhat.com-group
     dhcp47-2.lab.eng.blr.redhat.com-nfs_block	(ocf::heartbeat:portblock):	Started dhcp47-2.lab.eng.blr.redhat.com
     dhcp47-2.lab.eng.blr.redhat.com-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started dhcp47-2.lab.eng.blr.redhat.com
     dhcp47-2.lab.eng.blr.redhat.com-nfs_unblock	(ocf::heartbeat:portblock):	Started dhcp47-2.lab.eng.blr.redhat.com

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
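
The backend check described above (the CLI says "failed" while pcs status shows a healthy cluster) can be scripted. This is a minimal Python sketch under stated assumptions: it relies only on the "N nodes configured", "Online: [ ... ]" and "partition with quorum" text as printed in this report, which other pcs or pacemaker versions may format differently. The function names parse_pcs_status and cluster_looks_healthy are hypothetical.

```python
import re


def parse_pcs_status(text):
    """Extract quorum state, configured node count and online nodes."""
    configured = re.search(r"^(\d+) nodes configured", text, re.MULTILINE)
    online = re.search(r"^Online: \[ (.*?) \]", text, re.MULTILINE)
    return {
        "has_quorum": "partition with quorum" in text,
        "configured": int(configured.group(1)) if configured else None,
        "online": online.group(1).split() if online else [],
    }


def cluster_looks_healthy(status):
    """True when there is quorum and every configured node is online."""
    return (
        status["has_quorum"]
        and status["configured"] is not None
        and len(status["online"]) == status["configured"]
    )


# Trimmed excerpt of the pcs status output from this report.
SAMPLE = """\
Cluster name: ganesha-ha
Stack: corosync
Current DC: dhcp47-193.lab.eng.blr.redhat.com (version 1.1.18-11.el7-2b07d5c5a9) - partition with quorum

8 nodes configured
48 resources configured

Online: [ dhcp37-103.lab.eng.blr.redhat.com dhcp37-121.lab.eng.blr.redhat.com dhcp37-136.lab.eng.blr.redhat.com dhcp37-218.lab.eng.blr.redhat.com dhcp46-116.lab.eng.blr.redhat.com dhcp46-184.lab.eng.blr.redhat.com dhcp47-193.lab.eng.blr.redhat.com dhcp47-2.lab.eng.blr.redhat.com ]
"""

if __name__ == "__main__":
    status = parse_pcs_status(SAMPLE)
    print("configured:", status["configured"], "online:", len(status["online"]))
    print("healthy:", cluster_looks_healthy(status))
```

A test harness could flag this bug when the "gluster nfs-ganesha enable" output contains "nfs-ganesha: failed" while cluster_looks_healthy returns True for the pcs status output captured afterwards.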




Expected results:

The ganesha enable command should report success when the ganesha cluster comes up successfully.


Additional info:

Raising it against component "ganesha". Change the component if required. Attaching sosreport shortly.

Comment 5 Sanju 2018-05-09 05:18:47 UTC
Hi Manisha,

I suspect this bug is the same as https://bugzilla.redhat.com/show_bug.cgi?id=1568436. I raised that bug to fix the issue we discussed in the mail thread with the subject "nfs-ganesha enable issue". Please let me know whether it is the same or something different.

Comment 19 errata-xmlrpc 2018-09-04 06:48:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2607

Comment 20 Manisha Saini 2018-09-24 05:16:10 UTC
This is the ganesha setup use case, which is covered as part of each ganesha test case. Hence setting the qe_test_coverage flag to + with no Testcase ID.

