Bug 1381416

Summary: Bonnie test suite failed while deleting files with "drastic I/O error (rmdir)".
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Shashank Raj <sraj>
Component: nfs-ganeshaAssignee: Frank Filz <ffilz>
Status: CLOSED CURRENTRELEASE QA Contact: Manisha Saini <msaini>
Severity: high Docs Contact:
Priority: unspecified    
Version: rhgs-3.2CC: amukherj, bmohanra, dang, ffilz, jdanek, jthottan, kkeithle, lbailey, ndevos, rcyriac, rhinduja, rhs-bugs, skoduri, storage-qa-internal
Target Milestone: ---Keywords: Triaged, ZStream
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Release Note
Doc Text:
When a READDIR is issued on a directory that is mutating, the cookie sent as part of request can belong to an already deleted file. In such cases, server returns a BAD_COOKIE error, which was previously not handled by some applications such as bonnie test-suite, and caused those applications to error out. This is expected behavior. Affected applications have updated to handle these errors.
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-11-19 06:35:56 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1351530    

Description Shashank Raj 2016-10-04 04:05:08 UTC
Description of problem:

Bonnie test suite failed while deleting files with "drastic I/O error (rmdir)".

Version-Release number of selected component (if applicable):

[root@dhcp42-59 ~]# rpm -qa|grep glusterfs
glusterfs-api-3.8.4-2.el6rhs.x86_64
glusterfs-libs-3.8.4-2.el6rhs.x86_64
glusterfs-cli-3.8.4-2.el6rhs.x86_64
glusterfs-3.8.4-2.el6rhs.x86_64
glusterfs-server-3.8.4-2.el6rhs.x86_64
glusterfs-geo-replication-3.8.4-2.el6rhs.x86_64
glusterfs-client-xlators-3.8.4-2.el6rhs.x86_64
glusterfs-rdma-3.8.4-2.el6rhs.x86_64
glusterfs-fuse-3.8.4-2.el6rhs.x86_64
glusterfs-ganesha-3.8.4-2.el6rhs.x86_64

[root@dhcp42-59 ~]# rpm -qa|grep ganesha
nfs-ganesha-2.4.0-2.el6rhs.x86_64
nfs-ganesha-gluster-2.4.0-2.el6rhs.x86_64
glusterfs-ganesha-3.8.4-2.el6rhs.x86_64


How reproducible:

2/2

Steps to Reproduce:
1.Create a ganesha cluster, create a volume and enable ganesha on it.
2.Mount the volume with vers=4 on client
3.Start executing bonnie test suite.
4.Observe that it fails during deletion of sequential files with below error message:

executing bonnie
Using uid:0, gid:0.
Writing a byte at a time...done
Writing intelligently...done
Rewriting...done
Reading a byte at a time...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...Bonnie: drastic I/O error (rmdir): Directory not empty
Cleaning up test directory after error.

real	13m13.703s
user	0m1.448s
sys	0m34.192s
bonnie failed
0
Total 0 tests were successful

5. No relevant error log messages found on server side.
6. Data on the mount point:

[root@Client2 ~]# cd /mnt/nfs1
[root@Client2 nfs1]# ls
run17420
[root@Client2 nfs1]# cd run17420/
[root@Client2 run17420]# ls
Bonnie.17442
[root@Client2 run17420]# 


Some of the files inside mount point:

-rw-------. 1 nobody nobody 0 Oct  4 14:41 0000003f88YnOqrNJWIYl
-rw-------. 1 nobody nobody 0 Oct  4 14:41 0000003f87pXVF
-rw-------. 1 nobody nobody 0 Oct  4 14:41 0000003f86oljSnaoso
-rw-------. 1 nobody nobody 0 Oct  4 14:41 0000003f85VqpcTKC
-rw-------. 1 nobody nobody 0 Oct  4 14:41 0000003f84WJj2FfAosdb3
-rw-------. 1 nobody nobody 0 Oct  4 14:41 0000003f83B7DbTyhp
-rw-------. 1 nobody nobody 0 Oct  4 14:41 0000003f823PnFTN8hD5AK
-rw-------. 1 nobody nobody 0 Oct  4 14:41 0000003f81ymP5B6WWjB
-rw-------. 1 nobody nobody 0 Oct  4 14:41 0000003f80j2vK

7. Manually removing these files works fine.


Actual results:

Bonnie test suite failed while deleting files with "drastic I/O error (rmdir)".

Expected results:

Bonnie test suite should pass without any errors.

Additional info:

Comment 4 Jiffin 2016-10-24 18:02:09 UTC
Following is the RCA(in my environment I included one of Dan's fix[1] )

While running the bonnie test , the one of readdir call got failed due to bad cookie error. Hence test resulted in failure.
Cause (assumed) :
The last readdir call should return mdcache_avl_lookup_k() MDCACHE_AVL_LAST, but it returning MDCACHE_AVL_NO_ERROR and following readdir results in bad cookie error.

[1] https://review.gerrithub.io/#/c/298859/

Comment 7 Jiffin 2016-11-15 12:58:30 UTC
The readdir calls works based on cookie value. It is kind of offset value. For example, if we need to read 1000 directory entries, client will send lots of readdir call and on each readdir call the value of cookie will be incremented.

Here incase of Bonnie test after doing the creates, reads, stats etc it will try to delete the test dir. As part of this removal operation client sends a readdir calls. It goes smoothly and read all the contents & then client again sends another call with a previous cookie value(don't know the reason why client send so), MDCACHE layer(in ganesha) complains it is a BADCOOKIE. bonnie test fails with this error and tries to clean up test dir(again it fails with directory NOTEMPTY error). Please note after this when we directly deletes the same test directory from mount point it works fine.

Explanation from Frank why MDCACHE returns BADCOOKIE error( from IRC logs) :
 MDCACHE populates the entire directory from FSAL in one go, then it feeds protocol layer from dirent cache, when protocol layer has a subsequent request with a non-zero cookie, MDCACHE uses that cookie (which is the dirent hash key) to find the dirent that can break if the directory mutates

If understand correctly , cookie used MDCACHE cannot passed to upper FSAL layer(libgfapi). i.e it is not possible to forward a single readdir request(which is failed in MDCACHE) to FSAL layer.

Solution should be used readdir call based FSAL_COOKIE

Comment 11 Arthy Loganathan 2016-11-23 08:34:41 UTC
Executed bonnie test in the latest build,

glusterfs-ganesha-3.8.4-5.el7rhgs.x86_64
nfs-ganesha-gluster-2.4.1-1.el7rhgs.x86_64

And fails with the same error.

executing bonnie
Using uid:0, gid:0.
Writing a byte at a time...done
Writing intelligently...done
Rewriting...done
Reading a byte at a time...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...Bonnie: drastic I/O error (rmdir): Directory not empty
Cleaning up test directory after error.

real	34m8.225s
user	0m2.673s
sys	1m18.910s
bonnie failed
0
Total 0 tests were successful
Switching over to the previous working directory
Removing /mnt/test_nfs//run1183/
rmdir: failed to remove ‘/mnt/test_nfs//run1183/’: Directory not empty
rmdir failed:Directory not empty

Comment 12 surabhi 2016-11-23 09:02:20 UTC
As mentioned in #C11 , I also see following error with bonnie:

Changing to the specified mountpoint
/mnt/nfs/run16037
executing bonnie
Using uid:0, gid:0.
Writing a byte at a time...done
Writing intelligently...done
Rewriting...done
Reading a byte at a time...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...Bonnie: drastic I/O error (rmdir): Directory not empty
Cleaning up test directory after error.

real	27m49.168s
user	0m1.347s
sys	0m32.871s
bonnie failed
0
Total 0 tests were successful
Switching over to the previous working directory
Removing /mnt/nfs/run16037/
rmdir: failed to remove ‘/mnt/nfs/run16037/’: Directory not empty
rmdir failed:Directory not empty



When trying manually using rm -rf it removes the directory successfully.
may be we need to check with script maintainer or debug more on nfs-client issue.

Comment 18 Bhavana 2017-03-13 05:19:16 UTC
Hi Soumya,

I have edited the known issues doc text further. Let me know if there is anything specific that needs to be added as a workaround

Comment 19 Soumya Koduri 2017-03-13 08:30:38 UTC
Hi Bhavana,
Doc text looks good to me.

Comment 20 Kaleb KEITHLEY 2017-08-23 13:05:43 UTC
address test suite issues?

Comment 26 Jiffin 2019-10-18 06:43:00 UTC
LGTM