Bug 1301450 - Refresh-config failed on one of the node after editing the export file
Status: CLOSED DUPLICATE of bug 1335114
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: nfs-ganesha
Version: 3.1
Hardware: x86_64 Linux
Priority: unspecified
Severity: medium
Assigned To: Soumya Koduri
QA Contact: storage-qa-internal@redhat.com
Keywords: ZStream
 
Reported: 2016-01-25 00:52 EST by Apeksha
Modified: 2017-03-17 06:19 EDT

Doc Type: Bug Fix
Last Closed: 2016-06-06 07:18:02 EDT
Type: Bug


Description Apeksha 2016-01-25 00:52:44 EST
Description of problem:
Refresh-config failed on one of the nodes after editing the export file.

Version-Release number of selected component (if applicable):
glusterfs-3.7.5-17.el6rhs.x86_64
nfs-ganesha-2.2.0-12.el6rhs.x86_64

How reproducible:
Once

Steps to Reproduce:
1. Run the automated nfs-ganesha rootsquash test cases.
2. Edit the export file to enable rootsquash.
3. Run refresh-config; it fails on localhost:

[root@dhcp47-5 ~]# /usr/libexec/ganesha/ganesha-ha.sh --refresh-config /etc/ganesha/ testvol 
Refresh-config completed on dhcp47-61.
Error: refresh-config failed on localhost.

Actual results:


Expected results:


Additional info:
Comment 2 Apeksha 2016-01-25 00:56:08 EST
sosreports are available at http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1301450/
Comment 3 Soumya Koduri 2016-01-25 01:22:32 EST
Thread 32 (Thread 0x7f58affff700 (LWP 20061)):
#0  0x0000003323e0a859 in pthread_mutex_unlock () from /lib64/libpthread.so.0
#1  0x00000000004b03f8 in cache_inode_lru_ref (entry=0x7f58ecafb3d0, flags=0) at /usr/src/debug/nfs-ganesha-2.2.0/src/cache_inode/cache_inode_lru.c:1380
#2  0x00000000004a9fa9 in cache_inode_unexport (export=0x7f58a4fe8cb8) at /usr/src/debug/nfs-ganesha-2.2.0/src/cache_inode/cache_inode_misc.c:509
#3  0x00000000004d9f98 in gsh_export_removeexport (args=<value optimized out>, reply=<value optimized out>, error=0x7f58afffe320) at /usr/src/debug/nfs-ganesha-2.2.0/src/support/export_mgr.c:1073
#4  0x00000000004e6e92 in dbus_message_entrypoint (conn=0x296eec0, msg=0x296f180, user_data=0x7668c0) at /usr/src/debug/nfs-ganesha-2.2.0/src/dbus/dbus_server.c:517
#5  0x000000332761cefe in ?? () from /lib64/libdbus-1.so.3
#6  0x0000003327610b4c in dbus_connection_dispatch () from /lib64/libdbus-1.so.3
#7  0x0000003327610dd9 in ?? () from /lib64/libdbus-1.so.3
#8  0x00000000004e7664 in gsh_dbus_thread (arg=<value optimized out>) at /usr/src/debug/nfs-ganesha-2.2.0/src/dbus/dbus_server.c:744
#9  0x0000003323e07a51 in start_thread () from /lib64/libpthread.so.0
#10 0x0000003323ae893d in clone () from /lib64/libc.so.6
Breakpoint 1, cache_inode_lru_ref (entry=0x7f58ecafb3d0, flags=0) at /usr/src/debug/nfs-ganesha-2.2.0/src/cache_inode/cache_inode_lru.c:1372
1372	{
(gdb) n
1374		struct lru_q_lane *qlane = &LRU[lru->lane];
(gdb) 
1377		if ((flags & (LRU_REQ_INITIAL | LRU_REQ_STALE_OK)) == 0) {
(gdb) 
1374		struct lru_q_lane *qlane = &LRU[lru->lane];
(gdb) 
1377		if ((flags & (LRU_REQ_INITIAL | LRU_REQ_STALE_OK)) == 0) {
(gdb) 
1378			QLOCK(qlane);
(gdb) 
[New Thread 0x7f57ab9ff700 (LWP 11575)]
[Thread 0x7f57935fe700 (LWP 22806) exited]
1379			if (lru->qid == LRU_ENTRY_CLEANUP) {
(gdb) 
1378			QLOCK(qlane);
(gdb) 
1379			if (lru->qid == LRU_ENTRY_CLEANUP) {
(gdb) 
1380				QUNLOCK(qlane);
(gdb) 
1430			QUNLOCK(qlane);
(gdb) 
1434	}
(gdb) 
cache_inode_unexport (export=0x7f58a4fe8cb8) at /usr/src/debug/nfs-ganesha-2.2.0/src/cache_inode/cache_inode_misc.c:511
511			if (status != CACHE_INODE_SUCCESS) {
(gdb) 
513				PTHREAD_RWLOCK_unlock(&export->lock);
(gdb) 
495			PTHREAD_RWLOCK_rdlock(&export->lock);
(gdb) 


RCA:
The dbus thread loops indefinitely while trying to remove an entry during unexport of the volume, which is why 'refresh_config' failed. The likely reason is that, during unexport, cache_inode_unexport() comes across a cache_inode_entry that is marked for cleanup (lru->qid == LRU_ENTRY_CLEANUP), so cache_inode_lru_ref() on it does not succeed. When such an entry is found, we just continue the loop, expecting either to move on to the next cache_inode_entry or to retry until this entry gets cleaned up. However, it looks like no other thread is involved in cleaning up this entry, so the loop never makes progress. We need to check which scenarios trigger this issue and whether such entries can be cleaned up here.

This needs to be fixed. 
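
To make the suspected loop easier to discuss, here is a minimal, self-contained sketch of the pattern described in the RCA. It is not the nfs-ganesha code: the types and functions (demo_entry, demo_qid, demo_lru_ref, demo_unexport) are hypothetical stand-ins, and the max_iters cap exists only so the demo terminates. It assumes, as in the trace above, that taking a reference fails for an entry queued for cleanup and that nothing else ever clears that state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the LRU queue ids. */
enum demo_qid { DEMO_QID_ACTIVE, DEMO_QID_CLEANUP };

struct demo_entry {
    enum demo_qid qid;   /* queue the entry currently sits on */
    bool on_export_list; /* still linked to the export being removed */
};

/* Same shape as the check stepped through at cache_inode_lru.c:1379:
 * the reference is refused if the entry is queued for cleanup. */
static bool demo_lru_ref(const struct demo_entry *e)
{
    return e->qid != DEMO_QID_CLEANUP;
}

/* Detach every entry from the export. When the ref is refused, the loop
 * retries the same entry, as the unexport loop appears to do in the trace.
 * Returns the number of iterations used, or -1 if max_iters (a demo-only
 * safety cap) is exhausted before the list is drained. */
long demo_unexport(struct demo_entry *entries, size_t n, long max_iters)
{
    long iters = 0;
    size_t i = 0;

    assert(entries != NULL || n == 0);

    while (i < n) {
        if (iters++ >= max_iters)
            return -1; /* no progress: this is the suspected hang */

        if (!demo_lru_ref(&entries[i]))
            continue; /* marked for cleanup; nobody else will clear it */

        entries[i].on_export_list = false;
        i++;
    }
    return iters;
}
```

With three active entries, demo_unexport() finishes in three iterations. If the second entry is marked DEMO_QID_CLEANUP, the first entry is detached and the call then spins on the second one until the cap is hit and returns -1. Without the cap, this is the indefinite loop seen in the dbus thread.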

@Apeksha,
Please check whether this can be reproduced.
Comment 4 Niels de Vos 2016-03-07 06:06:35 EST
Might have been fixed with the recent changes that Kaleb posted.

Kaleb, can you point us to a possible downstream patch? QE can then test again with a version that contains a fix.
Comment 5 Soumya Koduri 2016-03-07 06:12:00 EST
We have started a discussion about this with the upstream ganesha-devel community:
 - https://sourceforge.net/p/nfs-ganesha/mailman/message/34811062/

As Daniel mentioned in his reply to the mail, if we know the steps to reproduce, we can check whether his cleanup changes fix the issue.

Requesting QE to reproduce the issue and provide the steps.
Comment 6 Kaleb KEITHLEY 2016-06-06 07:18:02 EDT
Closing as a duplicate of bug 1335114; clearing needinfo.

*** This bug has been marked as a duplicate of bug 1335114 ***
