+++ This bug was initially created as a clone of Bug #814052 +++ Description of problem: [root@longname-nlmtest-verification-purpose nfs-test]# time /opt/qa/tools/bonnie++ -u root -d . Using uid:0, gid:0. Writing a byte at a time...done Writing intelligently...done Rewriting...done Reading a byte at a time...done Reading intelligently...done start 'em...done...done...done...done...done... Create files in sequential order...done. Stat files in sequential order...done. Delete files in sequential order...Bonnie: drastic I/O error (rmdir): Directory not empty Cleaning up test directory after error. real 7m5.023s user 0m1.045s sys 0m33.545s Version-Release number of selected component (if applicable): 3.3.0qa35 How reproducible: always Steps to Reproduce: 1. mount the volume using nfs 2. execute the bonnie++ 3. Actual results: the test fails during the delete files operation Expected results: suppose to finish it full Additional info: logs from /var/log/glusterfs/nfs.log [2012-04-19 06:32:42.655735] E [rpcsvc.c:1084:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x936467579x, Program: NFS3, ProgVers: 3, Proc: 7) to rpc-transport (socket.nfs-server) [2012-04-19 06:32:42.655778] E [nfs3.c:576:nfs3svc_submit_reply] 0-nfs-nfsv3: Reply submission failed [2012-04-19 06:32:42.656161] E [rpcsvc.c:1084:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x953244795x, Program: NFS3, ProgVers: 3, Proc: 7) to rpc-transport (socket.nfs-server) [2012-04-19 06:32:42.656241] E [nfs3.c:576:nfs3svc_submit_reply] 0-nfs-nfsv3: Reply submission failed [2012-04-19 06:32:42.656355] E [rpcsvc.c:1084:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x970022011x, Program: NFS3, ProgVers: 3, Proc: 7) to rpc-transport (socket.nfs-server) [2012-04-19 06:32:42.656459] E [nfs3.c:576:nfs3svc_submit_reply] 0-nfs-nfsv3: Reply submission failed [2012-04-19 06:38:54.384928] W [nfs3.c:3529:nfs3svc_rmdir_cbk] 0-nfs: 567627b: /Bonnie.1995 => -1 (Directory not empty) --- Additional comment from saujain on 2012-04-19 09:23:34 EDT --- [2012-04-19 13:25:25.000414] W [nfs3.c:3529:nfs3svc_rmdir_cbk] 0-nfs: 3525853f: /Bonnie.2067 => -1 (Directory not empty) I tried to execute the bonnie again and didn't see the "Reply submission failed" error again. --- Additional comment from sgowda on 2012-05-14 03:37:07 EDT --- On the latest master branch, with 2x2 setup on nfs, bonnie++ runs to completion without any errors. Can you please verify if are still seeing the issue at hand? --- Additional comment from saujain on 2012-05-16 04:41:48 EDT --- Bonnie still fails, [root@QA-31 nfs-test]# time /opt/qa/tools/bonnie++ -u root -d . Using uid:0, gid:0. Writing a byte at a time...done Writing intelligently...done Rewriting...done Reading a byte at a time...done Reading intelligently...done start 'em...done...done...done...done...done... Create files in sequential order...done. Stat files in sequential order...done. Delete files in sequential order...Bonnie: drastic I/O error (rmdir): Directory not empty Cleaning up test directory after error. real 7m57.541s user 0m0.877s sys 0m23.834s [root@QA-31 nfs-test]# mount | grep 79 172.17.251.79:/dist-rep on /mnt/nfs-test type nfs (rw,vers=3,addr=172.17.251.79) [root@QA-31 nfs-test]# it executed on the git sync to this commit id, commit 7503c63ee141931556cf066b9b255fc62cefcb68 --- Additional comment from sgowda on 2012-05-16 04:48:12 EDT --- Can you please upload the nfs logs? --- Additional comment from saujain on 2012-05-16 06:07:44 EDT --- uploaded the logs in the sosreport sharing space --- Additional comment from sgowda on 2012-05-17 01:03:49 EDT --- Created attachment 585095 [details] nfs log --- Additional comment from sgowda on 2012-05-17 02:41:36 EDT --- There are a lot of failures in the log related to permission issues. Can you check the backend to verify if the exports/dirs are all in sync? Additionally self heal seems to be triggered(dint see any disconnect issues). [2012-05-16 02:07:07.331335] I [afr-self-heal-common.c:2045:afr_self_heal_completion_cbk] 0-dist-rep-replicate-0: background entry self-heal completed on /run5728/pf/d0 [2012-05-16 02:42:55.076115] W [nfs3.c:3680:nfs3svc_rename_cbk] 0-nfs: 6cde5811: rename /run5728/fstest_6ad66a2a9777ec05609e1ebc77b6a9fd/fstest_725a0f5 f18ca4cfe0ac27e391430c75f -> /run5728/fstest_6ad66a2a9777ec05609e1ebc77b6a9fd/fstest_4d8a7810831e467266aecf2ea7257744 => -1 (Permission denied) [2012-05-16 02:42:56.128295] W [client3_1-fops.c:258:client3_1_mknod_cbk] 0-dist-rep-client-0: remote operation failed: Permission denied. Path: /run57 28/fstest_6ad66a2a9777ec05609e1ebc77b6a9fd/fstest_725a0f5f18ca4cfe0ac27e391430c75f (c4eea594-b21d-4ce5-b07b-227434b2645f) [2012-05-16 02:42:56.128365] W [client3_1-fops.c:2456:client3_1_link_cbk] 0-dist-rep-client-2: remote operation failed: Permission denied (/run5728/fst est_6ad66a2a9777ec05609e1ebc77b6a9fd/fstest_725a0f5f18ca4cfe0ac27e391430c75f -> /run5728/fstest_6ad66a2a9777ec05609e1ebc77b6a9fd/fstest_4d8a7810831e467 266aecf2ea7257744) [2012-05-16 02:42:56.128401] W [client3_1-fops.c:258:client3_1_mknod_cbk] 0-dist-rep-client-1: remote operation failed: Permission denied. Path: /run57 28/fstest_6ad66a2a9777ec05609e1ebc77b6a9fd/fstest_725a0f5f18ca4cfe0ac27e391430c75f (c4eea594-b21d-4ce5-b07b-227434b2645f) [2012-05-16 02:42:56.128741] W [client3_1-fops.c:2456:client3_1_link_cbk] 0-dist-rep-client-3: remote operation failed: Permission denied (/run5728/fst est_6ad66a2a9777ec05609e1ebc77b6a9fd/fstest_725a0f5f18ca4cfe0ac27e391430c75f -> /run5728/fstest_6ad66a2a9777ec05609e1ebc77b6a9fd/fstest_4d8a7810831e467 266aecf2ea7257744) .... [2012-05-16 02:42:59.132116] W [client3_1-fops.c:649:client3_1_rmdir_cbk] 0-dist-rep-client-2: remote operation failed: Permission denied [2012-05-16 02:42:59.132480] W [client3_1-fops.c:649:client3_1_rmdir_cbk] 0-dist-rep-client-3: remote operation failed: Permission denied [2012-05-16 02:42:59.133179] W [client3_1-fops.c:649:client3_1_rmdir_cbk] 0-dist-rep-client-0: remote operation failed: Permission denied [2012-05-16 02:42:59.133523] W [client3_1-fops.c:649:client3_1_rmdir_cbk] 0-dist-rep-client-1: remote operation failed: Permission denied --- Additional comment from vbellur on 2012-05-18 02:53:25 EDT --- *** Bug 820580 has been marked as a duplicate of this bug. *** --- Additional comment from sgowda on 2012-05-22 01:01:09 EDT --- *** Bug 823389 has been marked as a duplicate of this bug. *** --- Additional comment from sgowda on 2012-05-23 03:11:59 EDT --- The test passes successfully on fuse mount. But consistently fails on NFS. It look like afr says no entires found in dir (readdir), when in fact it is populated with a lot of regular files (no dirs). rmdir hence fails, as no unlink calls seems to be sent on those files. Few more error logs(debug level): [2012-05-23 03:07:56.334103] D [nfs3-helpers.c:3489:nfs3_log_readdirp_res] 0-nfs-nfsv3: XID: 5181b434, READDIRPLUS: NFS: 0(Call completed successfully. ), POSIX: 117(Structure needs cleaning), dircount: 4096, maxcount: 32768, cverf: 38954220, is_eof: 0 [2012-05-23 03:07:56.334588] D [nfs3-helpers.c:3523:nfs3_log_readdir_call] 0-nfs-nfsv3: XID: 5281b434, READDIRPLUS: args: FH: hashcount 1, exportid b13 c5fef-b24d-4ad3-bf89-5639e1de3ed7, gfid fbba9aeb-795a-4954-bd0a-ce478d537274, dircount: 4096, maxcount: 32768 [2012-05-23 03:07:56.334651] D [afr-common.c:704:afr_get_call_child] 0-test-replicate-0: Returning 0, call_child: 1, last_index: -1 [2012-05-23 03:07:56.336463] D [afr-common.c:704:afr_get_call_child] 0-test-replicate-0: Returning 0, call_child: 1, last_index: -1 [2012-05-23 03:07:56.336712] D [afr-common.c:704:afr_get_call_child] 0-test-replicate-1: Returning 0, call_child: 1, last_index: -1 [2012-05-23 03:07:56.336974] D [nfs3-helpers.c:3489:nfs3_log_readdirp_res] 0-nfs-nfsv3: XID: 5281b434, READDIRPLUS: NFS: 0(Call completed successfully. ), POSIX: 117(Structure needs cleaning), dircount: 4096, maxcount: 32768, cverf: 38954220, is_eof: 0 [2012-05-23 03:07:56.337306] D [nfs3-helpers.c:3523:nfs3_log_readdir_call] 0-nfs-nfsv3: XID: 5381b434, READDIRPLUS: args: FH: hashcount 1, exportid b13 c5fef-b24d-4ad3-bf89-5639e1de3ed7, gfid fbba9aeb-795a-4954-bd0a-ce478d537274, dircount: 4096, maxcount: 32768 [2012-05-23 03:07:56.337369] D [afr-common.c:704:afr_get_call_child] 0-test-replicate-0: Returning 0, call_child: 1, last_index: -1 [2012-05-23 03:07:56.339152] D [afr-common.c:704:afr_get_call_child] 0-test-replicate-0: Returning 0, call_child: 1, last_index: -1 [2012-05-23 03:07:56.339267] D [afr-common.c:704:afr_get_call_child] 0-test-replicate-1: Returning 0, call_child: 1, last_index: -1 [2012-05-23 03:07:56.339632] D [nfs3-helpers.c:3489:nfs3_log_readdirp_res] 0-nfs-nfsv3: XID: 5381b434, READDIRPLUS: NFS: 0(Call completed successfully.), POSIX: 117(Structure needs cleaning), dircount: 4096, maxcount: 32768, cverf: 38954220, is_eof: 0 [2012-05-23 03:07:56.339992] D [nfs3-helpers.c:1627:nfs3_log_common_call] 0-nfs-nfsv3: XID: 5481b434, ACCESS: args: FH: hashcount 1, exportid b13c5fef-b24d-4ad3-bf89-5639e1de3ed7, gfid fbba9aeb-795a-4954-bd0a-ce478d537274 [2012-05-23 03:07:56.340054] D [afr-common.c:704:afr_get_call_child] 0-test-replicate-0: Returning 0, call_child: 1, last_index: -1 [2012-05-23 03:07:56.340310] D [nfs3-helpers.c:3389:nfs3_log_common_res] 0-nfs-nfsv3: XID: 5481b434, ACCESS: NFS: 0(Call completed successfully.), POSIX: 7(Argument list too long) [2012-05-23 03:07:56.340558] D [nfs3-helpers.c:1627:nfs3_log_common_call] 0-nfs-nfsv3: XID: 5581b434, ACCESS: args: FH: hashcount 0, exportid b13c5fef-b24d-4ad3-bf89-5639e1de3ed7, gfid 00000000-0000-0000-0000-000000000001 [2012-05-23 03:07:56.340608] D [afr-common.c:704:afr_get_call_child] 0-test-replicate-0: Returning 0, call_child: 1, last_index: -1 [2012-05-23 03:07:56.340781] D [nfs3-helpers.c:3389:nfs3_log_common_res] 0-nfs-nfsv3: XID: 5581b434, ACCESS: NFS: 0(Call completed successfully.), POSIX: 7(Argument list too long) [2012-05-23 03:07:56.347287] D [nfs3-helpers.c:1641:nfs3_log_fh_entry_call] 0-nfs-nfsv3: XID: 5681b434, RMDIR: args: FH: hashcount 0, exportid b13c5fef-b24d-4ad3-bf89-5639e1de3ed7, gfid 00000000-0000-0000-0000-000000000001, name: Bonnie.2337 [2012-05-23 03:07:56.387361] D [afr-dir-read.c:126:afr_examine_dir_readdir_cbk] 0-test-replicate-1: /Bonnie.2337: no entries found in test-client-3 [2012-05-23 03:07:56.387837] D [afr-dir-read.c:126:afr_examine_dir_readdir_cbk] 0-test-replicate-1: /Bonnie.2337: no entries found in test-client-2 [2012-05-23 03:07:56.387907] D [afr-common.c:704:afr_get_call_child] 0-test-replicate-1: Returning 0, call_child: 1, last_index: -1 [2012-05-23 03:07:56.390288] D [afr-dir-read.c:126:afr_examine_dir_readdir_cbk] 0-test-replicate-0: /Bonnie.2337: no entries found in test-client-1 [2012-05-23 03:07:56.397310] D [afr-dir-read.c:126:afr_examine_dir_readdir_cbk] 0-test-replicate-0: /Bonnie.2337: no entries found in test-client-0 [2012-05-23 03:07:56.397442] D [afr-common.c:704:afr_get_call_child] 0-test-replicate-0: Returning 0, call_child: 1, last_index: -1 [2012-05-23 03:07:56.399452] W [nfs3.c:3525:nfs3svc_rmdir_cbk] 0-nfs: 5681b434: /Bonnie.2337 => -1 (Directory not empty) [2012-05-23 03:07:56.399506] D [nfs3-helpers.c:3389:nfs3_log_common_res] 0-nfs-nfsv3: XID: 5681b434, RMDIR: NFS: 66(Directory not empty), POSIX: 39(Directory not empty) --- Additional comment from sgowda on 2012-05-23 03:48:25 EDT --- strace o/p of bonnie: write(2, "done.\nDelete files in sequential"..., 41) = 41 gettimeofday({1337759403, 197341}, NULL) = 0 getrusage(RUSAGE_SELF, {ru_utime={11, 153304}, ru_stime={81, 447618}, ...}) = 0 open(".", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3 gettimeofday({1337759403, 198489}, NULL) = 0 getdents(3, /* 836 entries */, 32768) = 32744 gettimeofday({1337759403, 200356}, NULL) = 0 . . unlink("00000000044O9bQcaS") = 0 gettimeofday({1337759403, 201996}, NULL) = 0 gettimeofday({1337759403, 202038}, NULL) = 0 unlink("0000000006F") = 0 . . unlink("0000000690olrSg") = 0 gettimeofday({1337759404, 968779}, NULL) = 0 gettimeofday({1337759404, 968869}, NULL) = 0 unlink("0000000692TMJPUaltX") = 0 gettimeofday({1337759404, 970634}, NULL) = 0 gettimeofday({1337759404, 970853}, NULL) = 0 . . getdents(3, 0x148f9a8, 32768) = -1 ELOOP (Too many levels of symbolic links) <----entry count seems to messed up (0x148f9a8) Manual rmdir of the same dir completes successfully. --- Additional comment from sgowda on 2012-05-23 05:40:20 EDT --- The issue is reproducible on a volume with a single brick setup. --- Additional comment from ksriniva on 2012-05-23 09:47:05 EDT --- dmesg says: NFS: directory /Bonnie.4974 contains a readdir loop. Please contact your server vendor. Offending cookie: 3140 Seems to be a known problem in nfs client: https://bugzilla.redhat.com/show_bug.cgi?id=790729 bug comment says the problem is not seen with RHEL6.1 kernel (kernel-2.6.32-131.21.1.el6.x86_64). Saurabh, can we test with rhel6.1? --- Additional comment from vbellur on 2012-05-28 04:00:23 EDT --- Seen only with RHEL 6.2 clients. Removing this from the blocker list.
Closing as worksforme, reopen the BZ if this issue is seen in RHS.
it still fails over RHS [root@QA-31 run22982]# time /opt/qa/tools/bonnie++ -u root -d . Using uid:0, gid:0. Writing a byte at a time...done Writing intelligently...done Rewriting...done Reading a byte at a time...done Reading intelligently...done start 'em...done...done...done...done...done... Create files in sequential order...done. Stat files in sequential order...done. Delete files in sequential order...Bonnie: drastic I/O error (rmdir): Directory not empty Cleaning up test directory after error. real 8m33.698s user 0m1.106s sys 0m30.777s 172.17.251.90:/dist-rep on /mnt/nfs-test type nfs (rw,vers=3,addr=172.17.251.90) server side information, [root@localhost ~]# glusterfs -V glusterfs 3.3.0.3rhs built on Oct 10 2012 09:16:19 Repository revision: git://git.gluster.com/glusterfs.git Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com> GlusterFS comes with ABSOLUTELY NO WARRANTY. You may redistribute copies of GlusterFS under the terms of the GNU General Public License. [root@localhost ~]# [root@localhost ~]# [root@localhost ~]# [root@localhost ~]# gluster volume info dist-rep Volume Name: dist-rep Type: Distributed-Replicate Volume ID: 6a96e222-0bca-46e2-bc3c-870e8d1b970a Status: Started Number of Bricks: 2 x 2 = 4 Transport-type: tcp Bricks: Brick1: 172.17.251.90:/export/dr Brick2: 172.17.251.91:/export/drr Brick3: 172.17.251.90:/export/ddr Brick4: 172.17.251.91:/export/ddrr [root@localhost ~]# [root@localhost ~]# [root@localhost ~]# uname -a Linux localhost.localdomain 2.6.32-220.28.1.el6.x86_64 #1 SMP Wed Oct 3 12:26:28 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux [root@localhost ~]#
Saurabh, What RHEL is this?
RHEL 6.2
Confirmed that bz770250 was the culprit on my system. I can reproduce with kernel-2.6.32-220.13.1.el6.x86_64 but upgrading to kernel-2.6.32-220.17.1.el6.x86_64 resolves the problem. Reproducer: $ bonnie++ -s 0 (bonnie++ package from EPEL, against RHS 2.0 nfs share)
*** This bug has been marked as a duplicate of bug 770250 ***