Bug 814052
Summary: | nfs:bonnie fails during delete files operation | ||||||
---|---|---|---|---|---|---|---|
Product: | [Community] GlusterFS | Reporter: | Saurabh <saujain> | ||||
Component: | nfs | Assignee: | Rajesh <rajesh> | ||||
Status: | CLOSED DUPLICATE | QA Contact: | Saurabh <saujain> | ||||
Severity: | high | Docs Contact: | |||||
Priority: | high | ||||||
Version: | pre-release | CC: | ashetty, gluster-bugs, kkeithle, mzywusko, rajesh, vagarwal, vbellur | ||||
Target Milestone: | --- | Keywords: | Triaged | ||||
Target Release: | --- | ||||||
Hardware: | x86_64 | ||||||
OS: | Linux | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | |||||||
: | 849300 (view as bug list) | Environment: | |||||
Last Closed: | 2013-01-17 10:42:18 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Bug Depends On: | |||||||
Bug Blocks: | 849300 | ||||||
Attachments: |
|
Description
Saurabh
2012-04-19 06:41:48 UTC
[2012-04-19 13:25:25.000414] W [nfs3.c:3529:nfs3svc_rmdir_cbk] 0-nfs: 3525853f: /Bonnie.2067 => -1 (Directory not empty) I tried to execute the bonnie again and didn't see the "Reply submission failed" error again. On the latest master branch, with 2x2 setup on nfs, bonnie++ runs to completion without any errors. Can you please verify if are still seeing the issue at hand? Bonnie still fails, [root@QA-31 nfs-test]# time /opt/qa/tools/bonnie++ -u root -d . Using uid:0, gid:0. Writing a byte at a time...done Writing intelligently...done Rewriting...done Reading a byte at a time...done Reading intelligently...done start 'em...done...done...done...done...done... Create files in sequential order...done. Stat files in sequential order...done. Delete files in sequential order...Bonnie: drastic I/O error (rmdir): Directory not empty Cleaning up test directory after error. real 7m57.541s user 0m0.877s sys 0m23.834s [root@QA-31 nfs-test]# mount | grep 79 172.17.251.79:/dist-rep on /mnt/nfs-test type nfs (rw,vers=3,addr=172.17.251.79) [root@QA-31 nfs-test]# it executed on the git sync to this commit id, commit 7503c63ee141931556cf066b9b255fc62cefcb68 Can you please upload the nfs logs? uploaded the logs in the sosreport sharing space Created attachment 585095 [details]
nfs log
There are a lot of failures in the log related to permission issues. Can you check the backend to verify if the exports/dirs are all in sync? Additionally self heal seems to be triggered(dint see any disconnect issues). [2012-05-16 02:07:07.331335] I [afr-self-heal-common.c:2045:afr_self_heal_completion_cbk] 0-dist-rep-replicate-0: background entry self-heal completed on /run5728/pf/d0 [2012-05-16 02:42:55.076115] W [nfs3.c:3680:nfs3svc_rename_cbk] 0-nfs: 6cde5811: rename /run5728/fstest_6ad66a2a9777ec05609e1ebc77b6a9fd/fstest_725a0f5 f18ca4cfe0ac27e391430c75f -> /run5728/fstest_6ad66a2a9777ec05609e1ebc77b6a9fd/fstest_4d8a7810831e467266aecf2ea7257744 => -1 (Permission denied) [2012-05-16 02:42:56.128295] W [client3_1-fops.c:258:client3_1_mknod_cbk] 0-dist-rep-client-0: remote operation failed: Permission denied. Path: /run57 28/fstest_6ad66a2a9777ec05609e1ebc77b6a9fd/fstest_725a0f5f18ca4cfe0ac27e391430c75f (c4eea594-b21d-4ce5-b07b-227434b2645f) [2012-05-16 02:42:56.128365] W [client3_1-fops.c:2456:client3_1_link_cbk] 0-dist-rep-client-2: remote operation failed: Permission denied (/run5728/fst est_6ad66a2a9777ec05609e1ebc77b6a9fd/fstest_725a0f5f18ca4cfe0ac27e391430c75f -> /run5728/fstest_6ad66a2a9777ec05609e1ebc77b6a9fd/fstest_4d8a7810831e467 266aecf2ea7257744) [2012-05-16 02:42:56.128401] W [client3_1-fops.c:258:client3_1_mknod_cbk] 0-dist-rep-client-1: remote operation failed: Permission denied. Path: /run57 28/fstest_6ad66a2a9777ec05609e1ebc77b6a9fd/fstest_725a0f5f18ca4cfe0ac27e391430c75f (c4eea594-b21d-4ce5-b07b-227434b2645f) [2012-05-16 02:42:56.128741] W [client3_1-fops.c:2456:client3_1_link_cbk] 0-dist-rep-client-3: remote operation failed: Permission denied (/run5728/fst est_6ad66a2a9777ec05609e1ebc77b6a9fd/fstest_725a0f5f18ca4cfe0ac27e391430c75f -> /run5728/fstest_6ad66a2a9777ec05609e1ebc77b6a9fd/fstest_4d8a7810831e467 266aecf2ea7257744) .... [2012-05-16 02:42:59.132116] W [client3_1-fops.c:649:client3_1_rmdir_cbk] 0-dist-rep-client-2: remote operation failed: Permission denied [2012-05-16 02:42:59.132480] W [client3_1-fops.c:649:client3_1_rmdir_cbk] 0-dist-rep-client-3: remote operation failed: Permission denied [2012-05-16 02:42:59.133179] W [client3_1-fops.c:649:client3_1_rmdir_cbk] 0-dist-rep-client-0: remote operation failed: Permission denied [2012-05-16 02:42:59.133523] W [client3_1-fops.c:649:client3_1_rmdir_cbk] 0-dist-rep-client-1: remote operation failed: Permission denied *** Bug 820580 has been marked as a duplicate of this bug. *** *** Bug 823389 has been marked as a duplicate of this bug. *** The test passes successfully on fuse mount. But consistently fails on NFS. It look like afr says no entires found in dir (readdir), when in fact it is populated with a lot of regular files (no dirs). rmdir hence fails, as no unlink calls seems to be sent on those files. Few more error logs(debug level): [2012-05-23 03:07:56.334103] D [nfs3-helpers.c:3489:nfs3_log_readdirp_res] 0-nfs-nfsv3: XID: 5181b434, READDIRPLUS: NFS: 0(Call completed successfully. ), POSIX: 117(Structure needs cleaning), dircount: 4096, maxcount: 32768, cverf: 38954220, is_eof: 0 [2012-05-23 03:07:56.334588] D [nfs3-helpers.c:3523:nfs3_log_readdir_call] 0-nfs-nfsv3: XID: 5281b434, READDIRPLUS: args: FH: hashcount 1, exportid b13 c5fef-b24d-4ad3-bf89-5639e1de3ed7, gfid fbba9aeb-795a-4954-bd0a-ce478d537274, dircount: 4096, maxcount: 32768 [2012-05-23 03:07:56.334651] D [afr-common.c:704:afr_get_call_child] 0-test-replicate-0: Returning 0, call_child: 1, last_index: -1 [2012-05-23 03:07:56.336463] D [afr-common.c:704:afr_get_call_child] 0-test-replicate-0: Returning 0, call_child: 1, last_index: -1 [2012-05-23 03:07:56.336712] D [afr-common.c:704:afr_get_call_child] 0-test-replicate-1: Returning 0, call_child: 1, last_index: -1 [2012-05-23 03:07:56.336974] D [nfs3-helpers.c:3489:nfs3_log_readdirp_res] 0-nfs-nfsv3: XID: 5281b434, READDIRPLUS: NFS: 0(Call completed successfully. ), POSIX: 117(Structure needs cleaning), dircount: 4096, maxcount: 32768, cverf: 38954220, is_eof: 0 [2012-05-23 03:07:56.337306] D [nfs3-helpers.c:3523:nfs3_log_readdir_call] 0-nfs-nfsv3: XID: 5381b434, READDIRPLUS: args: FH: hashcount 1, exportid b13 c5fef-b24d-4ad3-bf89-5639e1de3ed7, gfid fbba9aeb-795a-4954-bd0a-ce478d537274, dircount: 4096, maxcount: 32768 [2012-05-23 03:07:56.337369] D [afr-common.c:704:afr_get_call_child] 0-test-replicate-0: Returning 0, call_child: 1, last_index: -1 [2012-05-23 03:07:56.339152] D [afr-common.c:704:afr_get_call_child] 0-test-replicate-0: Returning 0, call_child: 1, last_index: -1 [2012-05-23 03:07:56.339267] D [afr-common.c:704:afr_get_call_child] 0-test-replicate-1: Returning 0, call_child: 1, last_index: -1 [2012-05-23 03:07:56.339632] D [nfs3-helpers.c:3489:nfs3_log_readdirp_res] 0-nfs-nfsv3: XID: 5381b434, READDIRPLUS: NFS: 0(Call completed successfully.), POSIX: 117(Structure needs cleaning), dircount: 4096, maxcount: 32768, cverf: 38954220, is_eof: 0 [2012-05-23 03:07:56.339992] D [nfs3-helpers.c:1627:nfs3_log_common_call] 0-nfs-nfsv3: XID: 5481b434, ACCESS: args: FH: hashcount 1, exportid b13c5fef-b24d-4ad3-bf89-5639e1de3ed7, gfid fbba9aeb-795a-4954-bd0a-ce478d537274 [2012-05-23 03:07:56.340054] D [afr-common.c:704:afr_get_call_child] 0-test-replicate-0: Returning 0, call_child: 1, last_index: -1 [2012-05-23 03:07:56.340310] D [nfs3-helpers.c:3389:nfs3_log_common_res] 0-nfs-nfsv3: XID: 5481b434, ACCESS: NFS: 0(Call completed successfully.), POSIX: 7(Argument list too long) [2012-05-23 03:07:56.340558] D [nfs3-helpers.c:1627:nfs3_log_common_call] 0-nfs-nfsv3: XID: 5581b434, ACCESS: args: FH: hashcount 0, exportid b13c5fef-b24d-4ad3-bf89-5639e1de3ed7, gfid 00000000-0000-0000-0000-000000000001 [2012-05-23 03:07:56.340608] D [afr-common.c:704:afr_get_call_child] 0-test-replicate-0: Returning 0, call_child: 1, last_index: -1 [2012-05-23 03:07:56.340781] D [nfs3-helpers.c:3389:nfs3_log_common_res] 0-nfs-nfsv3: XID: 5581b434, ACCESS: NFS: 0(Call completed successfully.), POSIX: 7(Argument list too long) [2012-05-23 03:07:56.347287] D [nfs3-helpers.c:1641:nfs3_log_fh_entry_call] 0-nfs-nfsv3: XID: 5681b434, RMDIR: args: FH: hashcount 0, exportid b13c5fef-b24d-4ad3-bf89-5639e1de3ed7, gfid 00000000-0000-0000-0000-000000000001, name: Bonnie.2337 [2012-05-23 03:07:56.387361] D [afr-dir-read.c:126:afr_examine_dir_readdir_cbk] 0-test-replicate-1: /Bonnie.2337: no entries found in test-client-3 [2012-05-23 03:07:56.387837] D [afr-dir-read.c:126:afr_examine_dir_readdir_cbk] 0-test-replicate-1: /Bonnie.2337: no entries found in test-client-2 [2012-05-23 03:07:56.387907] D [afr-common.c:704:afr_get_call_child] 0-test-replicate-1: Returning 0, call_child: 1, last_index: -1 [2012-05-23 03:07:56.390288] D [afr-dir-read.c:126:afr_examine_dir_readdir_cbk] 0-test-replicate-0: /Bonnie.2337: no entries found in test-client-1 [2012-05-23 03:07:56.397310] D [afr-dir-read.c:126:afr_examine_dir_readdir_cbk] 0-test-replicate-0: /Bonnie.2337: no entries found in test-client-0 [2012-05-23 03:07:56.397442] D [afr-common.c:704:afr_get_call_child] 0-test-replicate-0: Returning 0, call_child: 1, last_index: -1 [2012-05-23 03:07:56.399452] W [nfs3.c:3525:nfs3svc_rmdir_cbk] 0-nfs: 5681b434: /Bonnie.2337 => -1 (Directory not empty) [2012-05-23 03:07:56.399506] D [nfs3-helpers.c:3389:nfs3_log_common_res] 0-nfs-nfsv3: XID: 5681b434, RMDIR: NFS: 66(Directory not empty), POSIX: 39(Directory not empty) strace o/p of bonnie: write(2, "done.\nDelete files in sequential"..., 41) = 41 gettimeofday({1337759403, 197341}, NULL) = 0 getrusage(RUSAGE_SELF, {ru_utime={11, 153304}, ru_stime={81, 447618}, ...}) = 0 open(".", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3 gettimeofday({1337759403, 198489}, NULL) = 0 getdents(3, /* 836 entries */, 32768) = 32744 gettimeofday({1337759403, 200356}, NULL) = 0 . . unlink("00000000044O9bQcaS") = 0 gettimeofday({1337759403, 201996}, NULL) = 0 gettimeofday({1337759403, 202038}, NULL) = 0 unlink("0000000006F") = 0 . . unlink("0000000690olrSg") = 0 gettimeofday({1337759404, 968779}, NULL) = 0 gettimeofday({1337759404, 968869}, NULL) = 0 unlink("0000000692TMJPUaltX") = 0 gettimeofday({1337759404, 970634}, NULL) = 0 gettimeofday({1337759404, 970853}, NULL) = 0 . . getdents(3, 0x148f9a8, 32768) = -1 ELOOP (Too many levels of symbolic links) <----entry count seems to messed up (0x148f9a8) Manual rmdir of the same dir completes successfully. The issue is reproducible on a volume with a single brick setup. dmesg says: NFS: directory /Bonnie.4974 contains a readdir loop. Please contact your server vendor. Offending cookie: 3140 Seems to be a known problem in nfs client: https://bugzilla.redhat.com/show_bug.cgi?id=790729 bug comment says the problem is not seen with RHEL6.1 kernel (kernel-2.6.32-131.21.1.el6.x86_64). Saurabh, can we test with rhel6.1? Seen only with RHEL 6.2 clients. Removing this from the blocker list. *** This bug has been marked as a duplicate of bug 770250 *** |