1301450 – Refresh-config failed on one of the node after editing the export file

Keywords:
Status:	CLOSED DUPLICATE of bug 1335114
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	nfs-ganesha
Sub Component:
Version:	rhgs-3.1
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Soumya Koduri
QA Contact:	storage-qa-internal@redhat.com
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2016-01-25 05:52 UTC by Apeksha
Modified:	2017-03-17 10:19 UTC
CC List:	8 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2016-06-06 11:18:02 UTC
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Bugzilla	1335134	0	unspecified	CLOSED	python-pip-8.1.2 is available	2021-02-22 00:41:40 UTC

Internal Links: 1335134

Description Apeksha 2016-01-25 05:52:44 UTC

Description of problem:
Refresh-config failed on one of the nodes after editing the export file.

Version-Release number of selected component (if applicable):
glusterfs-3.7.5-17.el6rhs.x86_64
nfs-ganesha-2.2.0-12.el6rhs.x86_64

How reproducible:
Once

Steps to Reproduce:
1. Run the automated nfs-ganesha rootsquash test cases.
2. Edit the export file to enable rootsquash.
3. Run refresh-config; it fails on localhost.

[root@dhcp47-5 ~]# /usr/libexec/ganesha/ganesha-ha.sh --refresh-config /etc/ganesha/ testvol 
Refresh-config completed on dhcp47-61.
Error: refresh-config failed on localhost.

Actual results:


Expected results:


Additional info:

Comment 2 Apeksha 2016-01-25 05:56:08 UTC

sosreports are available at http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1301450/

Comment 3 Soumya Koduri 2016-01-25 06:22:32 UTC

Thread 32 (Thread 0x7f58affff700 (LWP 20061)):
#0  0x0000003323e0a859 in pthread_mutex_unlock () from /lib64/libpthread.so.0
#1  0x00000000004b03f8 in cache_inode_lru_ref (entry=0x7f58ecafb3d0, flags=0) at /usr/src/debug/nfs-ganesha-2.2.0/src/cache_inode/cache_inode_lru.c:1380
#2  0x00000000004a9fa9 in cache_inode_unexport (export=0x7f58a4fe8cb8) at /usr/src/debug/nfs-ganesha-2.2.0/src/cache_inode/cache_inode_misc.c:509
#3  0x00000000004d9f98 in gsh_export_removeexport (args=<value optimized out>, reply=<value optimized out>, error=0x7f58afffe320) at /usr/src/debug/nfs-ganesha-2.2.0/src/support/export_mgr.c:1073
#4  0x00000000004e6e92 in dbus_message_entrypoint (conn=0x296eec0, msg=0x296f180, user_data=0x7668c0) at /usr/src/debug/nfs-ganesha-2.2.0/src/dbus/dbus_server.c:517
#5  0x000000332761cefe in ?? () from /lib64/libdbus-1.so.3
#6  0x0000003327610b4c in dbus_connection_dispatch () from /lib64/libdbus-1.so.3
#7  0x0000003327610dd9 in ?? () from /lib64/libdbus-1.so.3
#8  0x00000000004e7664 in gsh_dbus_thread (arg=<value optimized out>) at /usr/src/debug/nfs-ganesha-2.2.0/src/dbus/dbus_server.c:744
#9  0x0000003323e07a51 in start_thread () from /lib64/libpthread.so.0
#10 0x0000003323ae893d in clone () from /lib64/libc.so.6
Breakpoint 1, cache_inode_lru_ref (entry=0x7f58ecafb3d0, flags=0) at /usr/src/debug/nfs-ganesha-2.2.0/src/cache_inode/cache_inode_lru.c:1372
1372	{
(gdb) n
1374		struct lru_q_lane *qlane = &LRU[lru->lane];
(gdb) 
1377		if ((flags & (LRU_REQ_INITIAL | LRU_REQ_STALE_OK)) == 0) {
(gdb) 
1374		struct lru_q_lane *qlane = &LRU[lru->lane];
(gdb) 
1377		if ((flags & (LRU_REQ_INITIAL | LRU_REQ_STALE_OK)) == 0) {
(gdb) 
1378			QLOCK(qlane);
(gdb) 
[New Thread 0x7f57ab9ff700 (LWP 11575)]
[Thread 0x7f57935fe700 (LWP 22806) exited]
1379			if (lru->qid == LRU_ENTRY_CLEANUP) {
(gdb) 
1378			QLOCK(qlane);
(gdb) 
1379			if (lru->qid == LRU_ENTRY_CLEANUP) {
(gdb) 
1380				QUNLOCK(qlane);
(gdb) 
1430			QUNLOCK(qlane);
(gdb) 
1434	}
(gdb) 
cache_inode_unexport (export=0x7f58a4fe8cb8) at /usr/src/debug/nfs-ganesha-2.2.0/src/cache_inode/cache_inode_misc.c:511
511			if (status != CACHE_INODE_SUCCESS) {
(gdb) 
513				PTHREAD_RWLOCK_unlock(&export->lock);
(gdb) 
495			PTHREAD_RWLOCK_rdlock(&export->lock);
(gdb) 


RCA:
The dbus thread loops indefinitely while trying to remove an entry during unexport of the volume, which is why 'refresh_config' failed. The likely reason is that, during unexport of the volume, we come across a cache_inode_entry that is marked for cleanup. When such an entry is found, we just continue the loop, either to move on to the next cache_inode_entry or to retry until this entry gets cleaned up. However, it looks like no other thread is involved in cleaning up this entry. We need to check which scenarios trigger this issue and whether such entries can be cleaned up here.

This needs to be fixed. 
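
To make the suspected failure mode concrete, here is a minimal Python model of the retry pattern described in the RCA. It is only a sketch under stated assumptions, not the nfs-ganesha code: the names Entry, lru_ref, unexport, and max_spins are hypothetical, and the real loop in cache_inode_unexport may differ in detail. The max_spins bound exists only so the model terminates; the loop seen in the gdb session has no such bound.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical stand-ins for the entry states discussed in the RCA;
# these are not the real nfs-ganesha definitions.
ACTIVE = "active"
CLEANUP = "cleanup"


@dataclass
class Entry:
    name: str
    qid: str = ACTIVE


def lru_ref(entry: Entry) -> bool:
    """Model of taking a reference: refused while the entry is queued for cleanup."""
    return entry.qid != CLEANUP


def unexport(entries: List[Entry], max_spins: int = 1000) -> Optional[str]:
    """Drain the export's entry list, retrying on entries marked for cleanup.

    Returns None when every entry was released, or the name of the entry
    the loop got stuck on once max_spins retries were used up.
    """
    spins = 0
    while entries:
        head = entries[0]
        if not lru_ref(head):
            # Entry is marked for cleanup: retry and rely on another thread
            # to remove it. If no thread does, the loop never progresses.
            spins += 1
            if spins >= max_spins:
                return head.name
            continue
        entries.pop(0)  # reference obtained, detach the entry from the export
        spins = 0
    return None


healthy = [Entry("a"), Entry("b")]
stuck = [Entry("a"), Entry("b", qid=CLEANUP), Entry("c")]
print(unexport(healthy))  # None
print(unexport(stuck))    # b
```

In this model the second call never gets past entry "b" because nothing ever changes its state, which mirrors the observation that no other thread appears to clean up the entry.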

@Apeksha,
Please check whether this is reproducible.

Comment 4 Niels de Vos 2016-03-07 11:06:35 UTC

Might have been fixed with the recent changes that Kaleb posted.

Kaleb, can you point us to a possible downstream patch? QE can then test again with a version that contains a fix.

Comment 5 Soumya Koduri 2016-03-07 11:12:00 UTC

We have started a discussion on this with the upstream ganesha-devel community:
 - https://sourceforge.net/p/nfs-ganesha/mailman/message/34811062/

As Daniel mentioned in his reply to that mail, if we know the steps to reproduce, we can check whether his cleanup changes fix the issue.

Requesting QE to reproduce the issue and provide the steps.

Comment 6 Kaleb KEITHLEY 2016-06-06 11:18:02 UTC

Closing as a dupe of 1335114; clearing needinfo.

*** This bug has been marked as a duplicate of bug 1335114 ***