Description of problem:
Bonnie test suite failed while deleting files with "drastic I/O error (rmdir)".

Version-Release number of selected component (if applicable):
[root@dhcp42-59 ~]# rpm -qa|grep glusterfs
glusterfs-api-3.8.4-2.el6rhs.x86_64
glusterfs-libs-3.8.4-2.el6rhs.x86_64
glusterfs-cli-3.8.4-2.el6rhs.x86_64
glusterfs-3.8.4-2.el6rhs.x86_64
glusterfs-server-3.8.4-2.el6rhs.x86_64
glusterfs-geo-replication-3.8.4-2.el6rhs.x86_64
glusterfs-client-xlators-3.8.4-2.el6rhs.x86_64
glusterfs-rdma-3.8.4-2.el6rhs.x86_64
glusterfs-fuse-3.8.4-2.el6rhs.x86_64
glusterfs-ganesha-3.8.4-2.el6rhs.x86_64

[root@dhcp42-59 ~]# rpm -qa|grep ganesha
nfs-ganesha-2.4.0-2.el6rhs.x86_64
nfs-ganesha-gluster-2.4.0-2.el6rhs.x86_64
glusterfs-ganesha-3.8.4-2.el6rhs.x86_64

How reproducible:
2/2

Steps to Reproduce:
1. Create a ganesha cluster, create a volume and enable ganesha on it.
2. Mount the volume with vers=4 on the client.
3. Start executing the bonnie test suite.
4. Observe that it fails during deletion of sequential files with the error message below:

executing bonnie
Using uid:0, gid:0.
Writing a byte at a time...done
Writing intelligently...done
Rewriting...done
Reading a byte at a time...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...Bonnie: drastic I/O error (rmdir): Directory not empty
Cleaning up test directory after error.

real 13m13.703s
user 0m1.448s
sys 0m34.192s
bonnie failed
0
Total 0 tests were successful

5. No relevant error log messages found on the server side.

6. Data on the mount point:

[root@Client2 ~]# cd /mnt/nfs1
[root@Client2 nfs1]# ls
run17420
[root@Client2 nfs1]# cd run17420/
[root@Client2 run17420]# ls
Bonnie.17442
[root@Client2 run17420]#

Some of the files inside the mount point:

-rw-------. 1 nobody nobody 0 Oct 4 14:41 0000003f88YnOqrNJWIYl
-rw-------. 1 nobody nobody 0 Oct 4 14:41 0000003f87pXVF
-rw-------. 1 nobody nobody 0 Oct 4 14:41 0000003f86oljSnaoso
-rw-------. 1 nobody nobody 0 Oct 4 14:41 0000003f85VqpcTKC
-rw-------. 1 nobody nobody 0 Oct 4 14:41 0000003f84WJj2FfAosdb3
-rw-------. 1 nobody nobody 0 Oct 4 14:41 0000003f83B7DbTyhp
-rw-------. 1 nobody nobody 0 Oct 4 14:41 0000003f823PnFTN8hD5AK
-rw-------. 1 nobody nobody 0 Oct 4 14:41 0000003f81ymP5B6WWjB
-rw-------. 1 nobody nobody 0 Oct 4 14:41 0000003f80j2vK

7. Manually removing these files works fine.

Actual results:
Bonnie test suite failed while deleting files with "drastic I/O error (rmdir)".

Expected results:
Bonnie test suite should pass without any errors.

Additional info:
Following is the RCA (in my environment I included one of Dan's fixes [1]).

While running the bonnie test, one of the readdir calls failed with a bad cookie error, and the test failed as a result.

Cause (assumed): for the last readdir call, mdcache_avl_lookup_k() should return MDCACHE_AVL_LAST, but it returns MDCACHE_AVL_NO_ERROR, and the following readdir then results in a bad cookie error.

[1] https://review.gerrithub.io/#/c/298859/
Readdir calls work on the basis of a cookie value, which is a kind of offset. For example, to read 1000 directory entries the client sends many readdir calls, and the cookie value advances with each call.

In the Bonnie test, after doing the creates, reads, stats etc., it tries to delete the test dir. As part of this removal the client sends readdir calls. These go smoothly and read all the contents, and then the client sends another call with a previous cookie value (I don't know why the client does so). The MDCACHE layer (in ganesha) rejects it as a BADCOOKIE. The bonnie test fails with this error and tries to clean up the test dir, which again fails, this time with a directory NOTEMPTY error. Please note that deleting the same test directory directly from the mount point afterwards works fine.

Explanation from Frank of why MDCACHE returns the BADCOOKIE error (from IRC logs):
MDCACHE populates the entire directory from the FSAL in one go, then feeds the protocol layer from the dirent cache. When the protocol layer has a subsequent request with a non-zero cookie, MDCACHE uses that cookie (which is the dirent hash key) to find the dirent. That can break if the directory mutates.

If I understand correctly, the cookie used by MDCACHE cannot be passed to the FSAL layer (libgfapi), i.e. it is not possible to forward a single readdir request (the one that failed in MDCACHE) to the FSAL layer. The solution should be a readdir call based on FSAL_COOKIE.
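To make Frank's explanation concrete, here is a toy Python model of a dirent cache that resumes a listing by looking up the cookie as a key. It is only a sketch under stated assumptions, not nfs-ganesha code: ToyDirCache, BadCookie and the 1..N cookie numbering are hypothetical names and choices made for illustration.

```python
class BadCookie(Exception):
    """Raised when a resume cookie no longer maps to a cached dirent."""


class ToyDirCache:
    """Hypothetical stand-in for a cookie-keyed dirent cache."""

    def __init__(self, names):
        # Populate the whole directory in one go; cookie -> name.
        # Assumption: cookies are simply 1..N in insertion order.
        self.entries = {i + 1: name for i, name in enumerate(names)}

    def remove(self, name):
        # Unlinking a file drops its dirent, and with it the cookie.
        for cookie, cached in list(self.entries.items()):
            if cached == name:
                del self.entries[cookie]

    def readdir(self, cookie=0, count=2):
        """Return up to `count` (cookie, name) pairs after `cookie`."""
        # A non-zero cookie must still be present to find the resume point.
        if cookie != 0 and cookie not in self.entries:
            raise BadCookie(cookie)
        later = sorted(c for c in self.entries if c > cookie)
        return [(c, self.entries[c]) for c in later[:count]]


def demo():
    cache = ToyDirCache(["a", "b", "c", "d"])
    first = cache.readdir(0)        # first batch of entries
    last_cookie = first[-1][0]      # cookie the client would resume from
    for _, name in first:           # client unlinks what it just read
        cache.remove(name)
    try:
        cache.readdir(last_cookie)  # resume with the now-stale cookie
    except BadCookie:
        return "BADCOOKIE"
    return "ok"
```

In this model the resume fails because the entry that owned the cookie was removed between the two calls, which is the "directory mutates" case described above. A cookie that is an offset into the backend's own listing would not depend on a particular dirent still being cached.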
Executed the bonnie test on the latest build:

glusterfs-ganesha-3.8.4-5.el7rhgs.x86_64
nfs-ganesha-gluster-2.4.1-1.el7rhgs.x86_64

It fails with the same error.

executing bonnie
Using uid:0, gid:0.
Writing a byte at a time...done
Writing intelligently...done
Rewriting...done
Reading a byte at a time...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...Bonnie: drastic I/O error (rmdir): Directory not empty
Cleaning up test directory after error.

real 34m8.225s
user 0m2.673s
sys 1m18.910s
bonnie failed
0
Total 0 tests were successful
Switching over to the previous working directory
Removing /mnt/test_nfs//run1183/
rmdir: failed to remove ‘/mnt/test_nfs//run1183/’: Directory not empty
rmdir failed:Directory not empty
As mentioned in #C11, I also see the following error with bonnie:

Changing to the specified mountpoint
/mnt/nfs/run16037
executing bonnie
Using uid:0, gid:0.
Writing a byte at a time...done
Writing intelligently...done
Rewriting...done
Reading a byte at a time...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...Bonnie: drastic I/O error (rmdir): Directory not empty
Cleaning up test directory after error.

real 27m49.168s
user 0m1.347s
sys 0m32.871s
bonnie failed
0
Total 0 tests were successful
Switching over to the previous working directory
Removing /mnt/nfs/run16037/
rmdir: failed to remove ‘/mnt/nfs/run16037/’: Directory not empty
rmdir failed:Directory not empty

Removing the directory manually with rm -rf succeeds. Maybe we need to check with the script maintainer, or debug further on the NFS client side.
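For debugging outside the test suite, a small reproducer along these lines might help narrow things down. This is a hypothetical sketch, not Bonnie's code: the function name, the file naming and the single-pass "unlink while listing" loop are assumptions about what is worth exercising, based on the readdir behaviour described earlier in this bug.

```python
import os
import tempfile


def create_then_delete(parent, count=200):
    """Create `count` empty files under a fresh directory in `parent`,
    unlink them while iterating the directory, then try to rmdir it.

    Returns the names still present after one pass (empty list on success).
    An OSError from the listing itself propagates to the caller.
    """
    workdir = tempfile.mkdtemp(prefix="rmdir-repro-", dir=parent)
    for i in range(count):
        # Empty files with fixed-width hex names (naming is an assumption).
        with open(os.path.join(workdir, f"{i:010x}"), "w"):
            pass

    # Interleave directory reads with unlinks, so that later readdir
    # requests are issued after the directory has already changed.
    with os.scandir(workdir) as entries:
        for entry in entries:
            os.unlink(entry.path)

    leftover = sorted(os.listdir(workdir))
    if not leftover:
        os.rmdir(workdir)
    return leftover
```

Calling create_then_delete("/mnt/nfs", count=5000) on the NFS mount and comparing with the same call on a local directory would show whether entries are skipped or the listing errors out only over NFS. The count needed to force more than one readdir request is not known here and may need tuning.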
Hi Soumya, I have edited the known issues doc text further. Let me know if there is anything specific that needs to be added as a workaround.
Hi Bhavana, Doc text looks good to me.
address test suite issues?
LGTM