Bug 1428798

Summary: [GANESHA] I/O error while performing renaming and lookups from 4 clients
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Manisha Saini <msaini>
Component: nfs-ganesha
Assignee: Daniel Gryniewicz <dang>
Status: CLOSED ERRATA
QA Contact: Manisha Saini <msaini>
Severity: unspecified
Docs Contact:
Priority: urgent
Version: rhgs-3.2
CC: amukherj, asrivast, dang, ffilz, jthottan, kkeithle, mbenjamin, rcyriac, rhs-bugs, sbhaloth, skoduri, storage-qa-internal
Target Milestone: ---
Target Release: RHGS 3.2.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: nfs-ganesha-2.4.1-8
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Cloned As: 1428808 (view as bug list)
Environment:
Last Closed: 2017-03-23 06:29:08 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1351528, 1428808

Description Manisha Saini 2017-03-03 11:37:37 UTC
Description of problem:

The Ganesha process died on the node hosting the VIP from which the volume was mounted on 4 clients, while renames were performed from 1 client and lookups from the 3 other clients.

The message "mdcache_rename :RW LOCK :CRIT :Error 35" is observed in ganesha.log.

Along with the Ganesha process aborting, the following messages were seen on the client:

mv: cannot move ‘804’ to ‘rename804’: No such file or directory
mkdir: cannot create directory ‘945’: File exists
mv: cannot move ‘1776’ to ‘rename1776’: No such file or directory


Version-Release number of selected component (if applicable):

glusterfs-ganesha-3.8.4-16.el6rhs.x86_64

How reproducible:

Consistently
3/3 times

Steps to Reproduce:
1. Create a 5-node ganesha cluster.
2. Create a 6x2 Distributed-Replicate volume and enable ganesha on it.
3. In the export file, give RO permission to 3 clients:

# showmount -e localhost
Export list for localhost:
/ganesha 10.70.37.192,10.70.37.176,10.70.37.129


4. Mount the volume on 4 clients via NFSv4 using 1 VIP.


1st client: creates 2000 directories and renames them
2nd client: lookups (ls -lRt) in a loop (RO permission)
3rd client: lookups (ls -lRt) in a loop (RO permission)
4th client: lookups (ls -lRt) in a loop (RO permission)



Actual results:

The Ganesha process died on the node hosting the VIP from which the volume was mounted on the 4 clients. The VIP failed over to another node.

The following messages were also observed on the client:
---------
mv: cannot move ‘804’ to ‘rename804’: No such file or directory
mkdir: cannot create directory ‘945’: File exists
mv: cannot move ‘1776’ to ‘rename1776’: No such file or directory
--------

In ganesha.log
-----------
03/03/2017 20:50:27 : epoch 4ed10000 : dhcp42-191.lab.eng.blr.redhat.com : ganesha.nfsd-10697[reaper] nfs_in_grace :STATE :EVENT :NFS Server Now NOT IN GRACE
03/03/2017 21:31:55 : epoch 4ed10000 : dhcp42-191.lab.eng.blr.redhat.com : ganesha.nfsd-10697[work-164] mdcache_rename :RW LOCK :CRIT :Error 35, write locking 0x21c5200 (&mdc_newdir->content_lock) at /builddir/build/BUILD/nfs-ganesha-2.4.1/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:852
-----------



Expected results:
The Ganesha process should not die on the node from which the volume is mounted.

Additional info:

Comment 2 Soumya Koduri 2017-03-03 11:53:58 UTC
There are two issues here -

1) Rename sometimes failing with ENOENT
2) nfs-ganesha process aborting due to deadlock.

(gdb) 
836		}
837	
838		if (mdc_olddir == mdc_newdir) {
839			/* if the rename operation is made within the same dir, then we
840			 * use an optimization: mdcache_rename_dirent is used
841			 * instead of adding/removing dirent. This limits the use of
842			 * resource in this case */
843	
844			LogDebug(COMPONENT_CACHE_INODE,
845				 "Rename (%p,%s)->(%p,%s) : source and target directory  the same",
(gdb) 
846				 mdc_olddir, old_name, mdc_newdir, new_name);
847	
848			status = mdcache_dirent_rename(mdc_newdir, old_name, new_name);
849			if (FSAL_IS_ERROR(status)) {
850				/* We're obviously out of date.  Throw out the cached
851				   directory */
852				PTHREAD_RWLOCK_wrlock(&mdc_newdir->content_lock);
853				mdcache_dirent_invalidate_all(mdc_newdir);
854				PTHREAD_RWLOCK_unlock(&mdc_newdir->content_lock);
855			}
(gdb) 

The NFS-Ganesha process aborted here due to a deadlock. It is already fixed upstream (https://review.gerrithub.io/#/c/308163/); we need to backport that fix downstream.


Regarding issue (1), it looks like Ganesha is failing to find the oldname dirent after a successful rename at the backend.


Breakpoint 1, mdcache_rename (obj_hdl=0x7f54680e3e38, olddir_hdl=0x20cb758, 
    old_name=0x7f53a8138b60 "685", newdir_hdl=0x20cb758, new_name=0x7f53a801d930 "rename685")
    at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:771
771	{
(gdb) info b
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   0x00000000004e4350 in mdcache_rename
                                               at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:771
	breakpoint already hit 1 time
2       breakpoint     keep y   0x00000000004e4818 in mdcache_rename
                                               at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:852
(gdb) disable 1
(gdb) c
Continuing.
[New Thread 0x7f53ac059700 (LWP 14693)]
[Switching to Thread 0x7f540bdfe700 (LWP 14074)]

Breakpoint 2, mdcache_rename (obj_hdl=<value optimized out>, olddir_hdl=0x20cb758, 
    old_name=0x7f546c00f1f0 "802", newdir_hdl=0x20cb758, new_name=0x7f546c086c80 "rename802")
    at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:852
852				PTHREAD_RWLOCK_wrlock(&mdc_newdir->content_lock);
(gdb) l mdcache_dirent_find
1166	 * @param[in] direntp	Directory entry, if found
1167	 * @return FSAL status
1168	 */
1169	fsal_status_t mdcache_dirent_find(mdcache_entry_t *dir, const char *name,
1170					  mdcache_dir_entry_t **direntp)
1171	{
1172		mdcache_dir_entry_t *dirent;
1173	
1174		LogFullDebug(COMPONENT_CACHE_INODE, "Find dir entry %s", name);
1175	
(gdb) 
1176		*direntp = NULL;
1177	
1178		/* Sanity check */
1179		if (dir->obj_handle.type != DIRECTORY)
1180			return fsalstat(ERR_FSAL_NOTDIR, 0);
1181	
1182		/* If no active entry, do nothing */
1183		if (dir->fsobj.fsdir.nbactive == 0) {
1184			if (mdc_dircache_trusted(dir))
1185				return fsalstat(ERR_FSAL_NOENT, 0);
(gdb) p mdc_newdir
$1 = (mdcache_entry_t *) 0x20cb720
(gdb) p mdc_newdir->fsobj.fsdir.nbactive
$4 = 808
(gdb) l
1186			else
1187				return fsalstat(ERR_FSAL_NO_ERROR, 0);
1188		}
1189	
1190		dirent = mdcache_avl_qp_lookup_s(dir, name, 1);
1191		if (!dirent) {
1192			if (mdc_dircache_trusted(dir))
1193				return fsalstat(ERR_FSAL_NOENT, 0);
1194	
1195			return fsalstat(ERR_FSAL_NO_ERROR, 0);
(gdb) p dirent
No symbol "dirent" in current context.
(gdb) l
1196		}
1197	
1198		*direntp = dirent;
1199		return fsalstat(ERR_FSAL_NO_ERROR, 0);
1200	}
1201	
1202	/**
1203	 *
1204	 * @brief Adds a directory entry to a cached directory.
1205	 *
(gdb) p status
$5 = {major = ERR_FSAL_NOENT, minor = <value optimized out>}
(gdb) l mdcache_dirent_rename
1303	 */
1304	fsal_status_t
1305	mdcache_dirent_rename(mdcache_entry_t *parent, const char *oldname,
1306			      const char *newname)
1307	
1308	{
1309		mdcache_dir_entry_t *dirent, *dirent2;
1310		fsal_status_t status;
1311		int code = 0;
1312	
(gdb) 
1313	
1314		LogFullDebug(COMPONENT_CACHE_INODE,
1315			     "Rename dir entry %s to %s",
1316			     oldname, newname);
1317	
1318		/* Don't rename if parent is not being cached */
1319		if (parent->mde_flags & MDCACHE_BYPASS_DIRCACHE)
1320			return fsalstat(ERR_FSAL_NO_ERROR, 0);
1321	
1322		status = mdcache_dirent_find(parent, oldname, &dirent);
(gdb) 
1323		if (FSAL_IS_ERROR(status))
1324			return status;
1325		if (!dirent)
1326			return status;
1327	
1328		status = mdcache_dirent_find(parent, newname, &dirent2);
1329		if (FSAL_IS_ERROR(status) && status.major != ERR_FSAL_NOENT)
1330			return status;
1331	
1332		if (dirent2) {
(gdb) 
1333			/* rename would cause a collision */
1334			if (parent->mde_flags & MDCACHE_TRUST_CONTENT) {
1335				/* overwrite, replace entry and expire the old */
1336				mdcache_entry_t *oldentry;
1337	
1338				(void)mdcache_find_keyed(&dirent2->ckey, &oldentry);
1339	
1340				/* dirent2 (newname) will now point to renamed entry */
1341				mdcache_key_delete(&dirent2->ckey);
1342				mdcache_key_dup(&dirent2->ckey, &dirent->ckey);
(gdb) e
echo       edit       enable     end        eval       exec-file  
(gdb) #call mdcache_avl_qp_lookup_s(
(gdb) p mdc_newdir
$6 = (mdcache_entry_t *) 0x20cb720
(gdb) #call mdcache_avl_qp_lookup_s(0x20cb720,"80
(gdb) bt
#0  mdcache_rename (obj_hdl=<value optimized out>, olddir_hdl=0x20cb758, 
    old_name=0x7f546c00f1f0 "802", newdir_hdl=0x20cb758, new_name=0x7f546c086c80 "rename802")
    at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:852
#1  0x000000000042b79a in fsal_rename (dir_src=0x20cb758, oldname=0x7f546c00f1f0 "802", 
    dir_dest=0x20cb758, newname=0x7f546c086c80 "rename802")
    at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/fsal_helper.c:1666
#2  0x000000000046175f in nfs4_op_rename (op=<value optimized out>, data=<value optimized out>, 
    resp=0x7f546c08e2a0) at /usr/src/debug/nfs-ganesha-2.4.1/src/Protocols/NFS/nfs4_op_rename.c:115
#3  0x000000000044bc76 in nfs4_Compound (arg=<value optimized out>, req=<value optimized out>, 
    res=0x7f546c171c10) at /usr/src/debug/nfs-ganesha-2.4.1/src/Protocols/NFS/nfs4_Compound.c:734
#4  0x000000000043836e in nfs_rpc_execute (reqdata=0x7f539c1c6b70)
    at /usr/src/debug/nfs-ganesha-2.4.1/src/MainNFSD/nfs_worker_thread.c:1281
#5  0x0000000000439afe in worker_run (ctx=0x21f5130)
    at /usr/src/debug/nfs-ganesha-2.4.1/src/MainNFSD/nfs_worker_thread.c:1548
#6  0x00000000004c2b59 in fridgethr_start_routine (arg=0x21f5130)
    at /usr/src/debug/nfs-ganesha-2.4.1/src/support/fridgethr.c:550
#7  0x0000003e9c207aa1 in start_thread () from /lib64/libpthread.so.0
#8  0x0000003e9bee8bbd in clone () from /lib64/libc.so.6
(gdb) call mdcache_avl_qp_lookup_s(0x20cb720,"802",1)
[Thread 0x7f545801f700 (LWP 12929) exited]
[New Thread 0x7f53acd44700 (LWP 17406)]
[New Thread 0x7f53acd66700 (LWP 17405)]
[New Thread 0x7f53857fb700 (LWP 17404)]
[Thread 0x7f5470011700 (LWP 12998) exited]
[New Thread 0x7f5384dfa700 (LWP 17407)]
[Thread 0x7f5460017700 (LWP 12999) exited]
[New Thread 0x7f537ffff700 (LWP 17408)]
[New Thread 0x7f537f5fe700 (LWP 17409)]
$7 = (mdcache_dir_entry_t *) 0x0
(gdb) bt
#0  mdcache_rename (obj_hdl=<value optimized out>, olddir_hdl=0x20cb758, 
    old_name=0x7f546c00f1f0 "802", newdir_hdl=0x20cb758, new_name=0x7f546c086c80 "rename802")
    at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:852
#1  0x000000000042b79a in fsal_rename (dir_src=0x20cb758, oldname=0x7f546c00f1f0 "802", 
    dir_dest=0x20cb758, newname=0x7f546c086c80 "rename802")
    at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/fsal_helper.c:1666
#2  0x000000000046175f in nfs4_op_rename (op=<value optimized out>, data=<value optimized out>, 
    resp=0x7f546c08e2a0) at /usr/src/debug/nfs-ganesha-2.4.1/src/Protocols/NFS/nfs4_op_rename.c:115
#3  0x000000000044bc76 in nfs4_Compound (arg=<value optimized out>, req=<value optimized out>, 
    res=0x7f546c171c10) at /usr/src/debug/nfs-ganesha-2.4.1/src/Protocols/NFS/nfs4_Compound.c:734
#4  0x000000000043836e in nfs_rpc_execute (reqdata=0x7f539c1c6b70)
    at /usr/src/debug/nfs-ganesha-2.4.1/src/MainNFSD/nfs_worker_thread.c:1281
#5  0x0000000000439afe in worker_run (ctx=0x21f5130)
    at /usr/src/debug/nfs-ganesha-2.4.1/src/MainNFSD/nfs_worker_thread.c:1548
#6  0x00000000004c2b59 in fridgethr_start_routine (arg=0x21f5130)
    at /usr/src/debug/nfs-ganesha-2.4.1/src/support/fridgethr.c:550
#7  0x0000003e9c207aa1 in start_thread () from /lib64/libpthread.so.0
#8  0x0000003e9bee8bbd in clone () from /lib64/libc.so.6
(gdb) call mdcache_avl_qp_lookup_s(0x20cb720,"rename802",1)
[New Thread 0x7f5470011700 (LWP 17411)]
[New Thread 0x7f5460017700 (LWP 17410)]
[New Thread 0x7f545801f700 (LWP 17729)]
$8 = (mdcache_dir_entry_t *) 0x7f546c152b40
(gdb) 


In mdcache_rename -> mdcache_dirent_rename() -> mdcache_dirent_find(parent, oldname, &dirent) failed with ENOENT. It is not clear why the old dirent was removed. Requesting Dan to take a look.

@Manisha,
Could you also please attach the script you used for the rename?

Comment 3 Manisha Saini 2017-03-03 12:02:35 UTC
lookup script:
# cat ls.sh 
while true;do ls -lRt;done

rename script:
cat fileop.sh

for i in {1..2000}; do mkdir $i ; sleep 1 ; mv $i rename$i ; done

Comment 5 Soumya Koduri 2017-03-03 12:08:04 UTC
Since two issues were observed, bug 1428808 was raised to track just the abort issue and the fix needed. This bug will be used to debug and track the I/O error.

Comment 7 Daniel Gryniewicz 2017-03-03 13:24:00 UTC
Sorry, I'm not sure how I missed this one.

The backport is trivial; it applies cleanly.  It can just be added.

As for the rename error, the issue is this: we do the sub-FSAL rename first, and then try to rename the dirent. In the interim, another client slipped in with a readdir() and refreshed the dirents for the src-dir without the (now renamed) src-file, so the dirent wasn't found. I'll think about how to fix this.

Comment 8 Daniel Gryniewicz 2017-03-03 16:09:22 UTC
Proposed upstream fix: https://review.gerrithub.io/351383

This applies cleanly to 2.4.3. It would be nice to have this tested to see whether it fixes the issue. I ran several hours of testing here with no issues.

Comment 10 Daniel Gryniewicz 2017-03-03 17:24:06 UTC
I didn't try without the fix.  I can start it up now and see if it hits.

Comment 11 Daniel Gryniewicz 2017-03-03 18:43:15 UTC
On my test cluster, with Ganesha Next and gluster 3.8.7, I'm able to reproduce without the patch, but not with the patch.

Comment 13 Daniel Gryniewicz 2017-03-03 19:02:22 UTC
Frank is merging it today.

Comment 15 Manisha Saini 2017-03-07 11:35:16 UTC
Verified this bug on 

nfs-ganesha-gluster-2.4.1-8.el6rhs.x86_64

Steps:

1. Create a 5-node ganesha cluster.
2. Create a 6x2 Distributed-Replicate volume and enable ganesha on it.
3. In the export file, give RO permission to 3 clients:

# showmount -e localhost
Export list for localhost:
/ganesha 10.70.37.192,10.70.37.176,10.70.37.129

4. Mount the volume on 4 clients via NFSv4 using 1 VIP.


1st client: creates 2000 directories and renames them
2nd client: lookups (ls -lRt) in a loop (RO permission)
3rd client: lookups (ls -lRt) in a loop (RO permission)
4th client: lookups (ls -lRt) in a loop (RO permission)

No I/O errors were observed while creating directories and performing parallel lookups, hence marking this bug as verified.

Comment 17 errata-xmlrpc 2017-03-23 06:29:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2017-0493.html