[Ganesha] "gluster nfs-ganesha enable" command sometimes reports "failed" with "Unlocking failed" error messages, even though the cluster is up and healthy in the backend
Description of problem:
While enabling ganesha on 8 nodes, the "gluster nfs-ganesha enable" command sometimes reports "failed" even though the ganesha cluster is up and running in the backend. "Unlocking failed" messages are observed in glusterd.log.
The issue is intermittent, but I have encountered it 3-4 times.
# time gluster nfs-ganesha enable
Enabling NFS-Ganesha requires Gluster-NFS to be disabled across the trusted pool. Do you still want to continue?
(y/n) y
This will take a few minutes to complete. Please wait ..
nfs-ganesha: failed
real 3m56.254s
user 0m0.105s
sys 0m0.180s
Glusterd logs
----------------
[2018-05-07 09:42:50.357894] I [MSGID: 106474] [glusterd-ganesha.c:433:check_host_list] 0-management: ganesha host found Hostname is dhcp47-193.lab.eng.blr.redhat.com
[2018-05-07 09:45:20.231419] I [glusterd-locks.c:730:gd_mgmt_v3_unlock_timer_cbk] 0-management: In gd_mgmt_v3_unlock_timer_cbk
[2018-05-07 09:46:14.626849] E [MSGID: 106116] [glusterd-mgmt.c:124:gd_mgmt_v3_collate_errors] 0-management: Unlocking failed on dhcp37-121.lab.eng.blr.redhat.com. Please check log file for details.
[2018-05-07 09:46:14.627387] E [MSGID: 106116] [glusterd-mgmt.c:124:gd_mgmt_v3_collate_errors] 0-management: Unlocking failed on dhcp37-103.lab.eng.blr.redhat.com. Please check log file for details.
[2018-05-07 09:46:14.627483] E [MSGID: 106116] [glusterd-mgmt.c:124:gd_mgmt_v3_collate_errors] 0-management: Unlocking failed on dhcp37-218.lab.eng.blr.redhat.com. Please check log file for details.
[2018-05-07 09:46:14.627583] E [MSGID: 106116] [glusterd-mgmt.c:124:gd_mgmt_v3_collate_errors] 0-management: Unlocking failed on dhcp37-136.lab.eng.blr.redhat.com. Please check log file for details.
[2018-05-07 09:46:14.627644] E [MSGID: 106116] [glusterd-mgmt.c:124:gd_mgmt_v3_collate_errors] 0-management: Unlocking failed on dhcp46-116.lab.eng.blr.redhat.com. Please check log file for details.
[2018-05-07 09:46:14.627703] E [MSGID: 106116] [glusterd-mgmt.c:124:gd_mgmt_v3_collate_errors] 0-management: Unlocking failed on dhcp46-184.lab.eng.blr.redhat.com. Please check log file for details.
[2018-05-07 09:46:14.627779] E [MSGID: 106116] [glusterd-mgmt.c:124:gd_mgmt_v3_collate_errors] 0-management: Unlocking failed on dhcp47-2.lab.eng.blr.redhat.com. Please check log file for details.
[2018-05-07 09:46:14.627952] E [MSGID: 106152] [glusterd-syncop.c:1641:gd_unlock_op_phase] 0-management: Failed to unlock on some peer(s)
[2018-05-07 09:46:14.628158] W [glusterd-locks.c:845:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0xe1379) [0x7fa08bdaa379] -->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0xe09ca) [0x7fa08bda99ca] -->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0xe8935) [0x7fa08bdb1935] ) 0-management: Lock for global All not held
[2018-05-07 09:46:14.628197] E [MSGID: 106118] [glusterd-syncop.c:1667:gd_unlock_op_phase] 0-management: Unable to release lock for All
------------------
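To see at a glance which peers reported the unlock failure, the log excerpt above can be reduced to a list of hostnames. The following is only a sketch: the function name unlock_failed_peers is hypothetical, and it assumes the message format shown in this report ("Unlocking failed on <host>. Please check log file for details.").

```shell
# Hypothetical helper: list the peers for which glusterd logged
# "Unlocking failed", one hostname per line, de-duplicated.
# Reads the log file given as $1, or stdin when no file is given.
unlock_failed_peers() {
    grep -o 'Unlocking failed on [^ ]*' "${1:--}" \
        | awk '{ sub(/\.$/, "", $4); print $4 }' \
        | sort -u
}
```

It could be run as "unlock_failed_peers /var/log/glusterfs/glusterd.log"; that path is an assumption, so substitute the location of glusterd.log on the affected node.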
Version-Release number of selected component (if applicable):
# rpm -qa | grep ganesha
glusterfs-ganesha-3.12.2-8.el7rhgs.x86_64
nfs-ganesha-gluster-2.5.5-6.el7rhgs.x86_64
nfs-ganesha-2.5.5-6.el7rhgs.x86_64
How reproducible:
Intermittent
Steps to Reproduce:
1. Create an 8-node ganesha cluster
Actual results:
The ganesha enable command fails with "Unlocking failed" messages in glusterd.log, but when checked in the backend, the cluster is up and running.
--------------------
# gluster nfs-ganesha disable
Disabling NFS-Ganesha will tear down the entire ganesha cluster across the trusted pool. Do you still want to continue?
(y/n) y
This will take a few minutes to complete. Please wait ..
nfs-ganesha : success
[root@dhcp47-193 ganesha]# time gluster nfs-ganesha enable
Enabling NFS-Ganesha requires Gluster-NFS to be disabled across the trusted pool. Do you still want to continue?
(y/n) y
This will take a few minutes to complete. Please wait ..
nfs-ganesha: failed
real 3m56.254s
user 0m0.105s
sys 0m0.180s
# pcs status
Cluster name: ganesha-ha
Stack: corosync
Current DC: dhcp47-193.lab.eng.blr.redhat.com (version 1.1.18-11.el7-2b07d5c5a9) - partition with quorum
Last updated: Mon May 7 15:16:31 2018
Last change: Mon May 7 15:15:57 2018 by root via cibadmin on dhcp47-193.lab.eng.blr.redhat.com
8 nodes configured
48 resources configured
Online: [ dhcp37-103.lab.eng.blr.redhat.com dhcp37-121.lab.eng.blr.redhat.com dhcp37-136.lab.eng.blr.redhat.com dhcp37-218.lab.eng.blr.redhat.com dhcp46-116.lab.eng.blr.redhat.com dhcp46-184.lab.eng.blr.redhat.com dhcp47-193.lab.eng.blr.redhat.com dhcp47-2.lab.eng.blr.redhat.com ]
Full list of resources:
Clone Set: nfs_setup-clone [nfs_setup]
Started: [ dhcp37-103.lab.eng.blr.redhat.com dhcp37-121.lab.eng.blr.redhat.com dhcp37-136.lab.eng.blr.redhat.com dhcp37-218.lab.eng.blr.redhat.com dhcp46-116.lab.eng.blr.redhat.com dhcp46-184.lab.eng.blr.redhat.com dhcp47-193.lab.eng.blr.redhat.com dhcp47-2.lab.eng.blr.redhat.com ]
Clone Set: nfs-mon-clone [nfs-mon]
Started: [ dhcp37-103.lab.eng.blr.redhat.com dhcp37-121.lab.eng.blr.redhat.com dhcp37-136.lab.eng.blr.redhat.com dhcp37-218.lab.eng.blr.redhat.com dhcp46-116.lab.eng.blr.redhat.com dhcp46-184.lab.eng.blr.redhat.com dhcp47-193.lab.eng.blr.redhat.com dhcp47-2.lab.eng.blr.redhat.com ]
Clone Set: nfs-grace-clone [nfs-grace]
Started: [ dhcp37-103.lab.eng.blr.redhat.com dhcp37-121.lab.eng.blr.redhat.com dhcp37-136.lab.eng.blr.redhat.com dhcp37-218.lab.eng.blr.redhat.com dhcp46-116.lab.eng.blr.redhat.com dhcp46-184.lab.eng.blr.redhat.com dhcp47-193.lab.eng.blr.redhat.com dhcp47-2.lab.eng.blr.redhat.com ]
Resource Group: dhcp37-121.lab.eng.blr.redhat.com-group
dhcp37-121.lab.eng.blr.redhat.com-nfs_block (ocf::heartbeat:portblock): Started dhcp37-121.lab.eng.blr.redhat.com
dhcp37-121.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Started dhcp37-121.lab.eng.blr.redhat.com
dhcp37-121.lab.eng.blr.redhat.com-nfs_unblock (ocf::heartbeat:portblock): Started dhcp37-121.lab.eng.blr.redhat.com
Resource Group: dhcp37-103.lab.eng.blr.redhat.com-group
dhcp37-103.lab.eng.blr.redhat.com-nfs_block (ocf::heartbeat:portblock): Started dhcp37-103.lab.eng.blr.redhat.com
dhcp37-103.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Started dhcp37-103.lab.eng.blr.redhat.com
dhcp37-103.lab.eng.blr.redhat.com-nfs_unblock (ocf::heartbeat:portblock): Started dhcp37-103.lab.eng.blr.redhat.com
Resource Group: dhcp37-218.lab.eng.blr.redhat.com-group
dhcp37-218.lab.eng.blr.redhat.com-nfs_block (ocf::heartbeat:portblock): Started dhcp37-218.lab.eng.blr.redhat.com
dhcp37-218.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Started dhcp37-218.lab.eng.blr.redhat.com
dhcp37-218.lab.eng.blr.redhat.com-nfs_unblock (ocf::heartbeat:portblock): Started dhcp37-218.lab.eng.blr.redhat.com
Resource Group: dhcp37-136.lab.eng.blr.redhat.com-group
dhcp37-136.lab.eng.blr.redhat.com-nfs_block (ocf::heartbeat:portblock): Started dhcp37-136.lab.eng.blr.redhat.com
dhcp37-136.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Started dhcp37-136.lab.eng.blr.redhat.com
dhcp37-136.lab.eng.blr.redhat.com-nfs_unblock (ocf::heartbeat:portblock): Started dhcp37-136.lab.eng.blr.redhat.com
Resource Group: dhcp47-193.lab.eng.blr.redhat.com-group
dhcp47-193.lab.eng.blr.redhat.com-nfs_block (ocf::heartbeat:portblock): Started dhcp47-193.lab.eng.blr.redhat.com
dhcp47-193.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Started dhcp47-193.lab.eng.blr.redhat.com
dhcp47-193.lab.eng.blr.redhat.com-nfs_unblock (ocf::heartbeat:portblock): Started dhcp47-193.lab.eng.blr.redhat.com
Resource Group: dhcp46-116.lab.eng.blr.redhat.com-group
dhcp46-116.lab.eng.blr.redhat.com-nfs_block (ocf::heartbeat:portblock): Started dhcp46-116.lab.eng.blr.redhat.com
dhcp46-116.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Started dhcp46-116.lab.eng.blr.redhat.com
dhcp46-116.lab.eng.blr.redhat.com-nfs_unblock (ocf::heartbeat:portblock): Started dhcp46-116.lab.eng.blr.redhat.com
Resource Group: dhcp46-184.lab.eng.blr.redhat.com-group
dhcp46-184.lab.eng.blr.redhat.com-nfs_block (ocf::heartbeat:portblock): Started dhcp46-184.lab.eng.blr.redhat.com
dhcp46-184.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Started dhcp46-184.lab.eng.blr.redhat.com
dhcp46-184.lab.eng.blr.redhat.com-nfs_unblock (ocf::heartbeat:portblock): Started dhcp46-184.lab.eng.blr.redhat.com
Resource Group: dhcp47-2.lab.eng.blr.redhat.com-group
dhcp47-2.lab.eng.blr.redhat.com-nfs_block (ocf::heartbeat:portblock): Started dhcp47-2.lab.eng.blr.redhat.com
dhcp47-2.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Started dhcp47-2.lab.eng.blr.redhat.com
dhcp47-2.lab.eng.blr.redhat.com-nfs_unblock (ocf::heartbeat:portblock): Started dhcp47-2.lab.eng.blr.redhat.com
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
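Because the CLI result and the backend state can disagree, it may help to check the pcs output mechanically rather than by eye. The following is a rough sketch, not part of gluster or pcs: the function name all_nodes_online is hypothetical, and it assumes the "N nodes configured" and "Online: [ ... ]" line layout shown in the pcs status output above.

```shell
# Hypothetical helper: succeed only if the number of hosts on the
# "Online: [ ... ]" line equals the "N nodes configured" count.
# Reads pcs status text from the file given as $1, or stdin otherwise.
all_nodes_online() {
    awk '
        /^[0-9]+ nodes configured/ { configured = $1 }
        # Fields on this line: "Online:", "[", one per host, "]"
        /^Online: \[/ { online = NF - 3 }
        END { exit !(configured > 0 && configured == online) }
    ' "${1:--}"
}
```

For example: pcs status | all_nodes_online && echo "all configured nodes are online". This only compares node counts; it does not inspect the state of individual resources.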
Expected results:
The ganesha enable command should report success when the cluster has come up successfully.
Additional info:
Raising it against component "ganesha". Change the component if required. Attaching sosreport shortly.
Hi Manisha,
I suspect this bug is the same as https://bugzilla.redhat.com/show_bug.cgi?id=1568436. I raised that bug to fix the issue we discussed in the mail thread with the subject "nfs-ganesha enable issue". Please let me know whether it is the same or something different.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2018:2607