Bug 814052 - nfs:bonnie fails during delete files operation
nfs:bonnie fails during delete files operation
Status: CLOSED DUPLICATE of bug 770250
Product: GlusterFS
Classification: Community
Component: nfs (Show other bugs)
pre-release
x86_64 Linux
high Severity high
: ---
: ---
Assigned To: Rajesh
Saurabh
: Triaged
: 820580 823389 (view as bug list)
Depends On:
Blocks: 849300
  Show dependency treegraph
 
Reported: 2012-04-19 02:41 EDT by Saurabh
Modified: 2016-01-19 01:10 EST (History)
7 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 849300 (view as bug list)
Environment:
Last Closed: 2013-01-17 05:42:18 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:


Attachments (Terms of Use)
nfs log (615.71 KB, text/x-log)
2012-05-17 01:03 EDT, shishir gowda
no flags Details

  None (edit)
Description Saurabh 2012-04-19 02:41:48 EDT
Description of problem:
[root@longname-nlmtest-verification-purpose nfs-test]# time /opt/qa/tools/bonnie++ -u root -d .
Using uid:0, gid:0.
Writing a byte at a time...done
Writing intelligently...done
Rewriting...done
Reading a byte at a time...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...Bonnie: drastic I/O error (rmdir): Directory not empty
Cleaning up test directory after error.

real    7m5.023s
user    0m1.045s
sys     0m33.545s



Version-Release number of selected component (if applicable):
3.3.0qa35

How reproducible:
always

Steps to Reproduce:
1. mount the volume using nfs
2. execute the bonnie++
3.
  
Actual results:
the test fails during the delete files operation

Expected results:
suppose to finish it full

Additional info:

logs from /var/log/glusterfs/nfs.log

[2012-04-19 06:32:42.655735] E [rpcsvc.c:1084:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x936467579x, Program: NFS3, ProgVers: 3, Proc: 7) to rpc-transport (socket.nfs-server)
[2012-04-19 06:32:42.655778] E [nfs3.c:576:nfs3svc_submit_reply] 0-nfs-nfsv3: Reply submission failed
[2012-04-19 06:32:42.656161] E [rpcsvc.c:1084:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x953244795x, Program: NFS3, ProgVers: 3, Proc: 7) to rpc-transport (socket.nfs-server)
[2012-04-19 06:32:42.656241] E [nfs3.c:576:nfs3svc_submit_reply] 0-nfs-nfsv3: Reply submission failed
[2012-04-19 06:32:42.656355] E [rpcsvc.c:1084:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x970022011x, Program: NFS3, ProgVers: 3, Proc: 7) to rpc-transport (socket.nfs-server)
[2012-04-19 06:32:42.656459] E [nfs3.c:576:nfs3svc_submit_reply] 0-nfs-nfsv3: Reply submission failed
[2012-04-19 06:38:54.384928] W [nfs3.c:3529:nfs3svc_rmdir_cbk] 0-nfs: 567627b: /Bonnie.1995 => -1 (Directory not empty)
Comment 1 Saurabh 2012-04-19 09:23:34 EDT
[2012-04-19 13:25:25.000414] W [nfs3.c:3529:nfs3svc_rmdir_cbk] 0-nfs: 3525853f: /Bonnie.2067 => -1 (Directory not empty)


I tried to execute the bonnie again and didn't see the 
"Reply submission failed" error again.
Comment 2 shishir gowda 2012-05-14 03:37:07 EDT
On the latest master branch, with 2x2 setup on nfs, bonnie++ runs to completion without any errors.
Can you please verify if are still seeing the issue at hand?
Comment 3 Saurabh 2012-05-16 04:41:48 EDT
Bonnie still fails,

[root@QA-31 nfs-test]# time /opt/qa/tools/bonnie++ -u root -d .
Using uid:0, gid:0.
Writing a byte at a time...done
Writing intelligently...done
Rewriting...done
Reading a byte at a time...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...Bonnie: drastic I/O error (rmdir): Directory not empty
Cleaning up test directory after error.

real    7m57.541s
user    0m0.877s
sys     0m23.834s
[root@QA-31 nfs-test]# mount | grep 79
172.17.251.79:/dist-rep on /mnt/nfs-test type nfs (rw,vers=3,addr=172.17.251.79)
[root@QA-31 nfs-test]# 

it executed on the git sync to this commit id, 
commit 7503c63ee141931556cf066b9b255fc62cefcb68
Comment 4 shishir gowda 2012-05-16 04:48:12 EDT
Can you please upload the nfs logs?
Comment 5 Saurabh 2012-05-16 06:07:44 EDT
uploaded the logs in the sosreport sharing space
Comment 6 shishir gowda 2012-05-17 01:03:49 EDT
Created attachment 585095 [details]
nfs log
Comment 7 shishir gowda 2012-05-17 02:41:36 EDT
There are a lot of failures in the log related to permission issues.

Can you check the backend to verify if the exports/dirs are all in sync?

Additionally self heal seems to be triggered(dint see any disconnect issues).

[2012-05-16 02:07:07.331335] I [afr-self-heal-common.c:2045:afr_self_heal_completion_cbk] 0-dist-rep-replicate-0: background  entry self-heal completed on /run5728/pf/d0


[2012-05-16 02:42:55.076115] W [nfs3.c:3680:nfs3svc_rename_cbk] 0-nfs: 6cde5811: rename /run5728/fstest_6ad66a2a9777ec05609e1ebc77b6a9fd/fstest_725a0f5
f18ca4cfe0ac27e391430c75f -> /run5728/fstest_6ad66a2a9777ec05609e1ebc77b6a9fd/fstest_4d8a7810831e467266aecf2ea7257744 => -1 (Permission denied)
[2012-05-16 02:42:56.128295] W [client3_1-fops.c:258:client3_1_mknod_cbk] 0-dist-rep-client-0: remote operation failed: Permission denied. Path: /run57
28/fstest_6ad66a2a9777ec05609e1ebc77b6a9fd/fstest_725a0f5f18ca4cfe0ac27e391430c75f (c4eea594-b21d-4ce5-b07b-227434b2645f)
[2012-05-16 02:42:56.128365] W [client3_1-fops.c:2456:client3_1_link_cbk] 0-dist-rep-client-2: remote operation failed: Permission denied (/run5728/fst
est_6ad66a2a9777ec05609e1ebc77b6a9fd/fstest_725a0f5f18ca4cfe0ac27e391430c75f -> /run5728/fstest_6ad66a2a9777ec05609e1ebc77b6a9fd/fstest_4d8a7810831e467
266aecf2ea7257744)
[2012-05-16 02:42:56.128401] W [client3_1-fops.c:258:client3_1_mknod_cbk] 0-dist-rep-client-1: remote operation failed: Permission denied. Path: /run57
28/fstest_6ad66a2a9777ec05609e1ebc77b6a9fd/fstest_725a0f5f18ca4cfe0ac27e391430c75f (c4eea594-b21d-4ce5-b07b-227434b2645f)
[2012-05-16 02:42:56.128741] W [client3_1-fops.c:2456:client3_1_link_cbk] 0-dist-rep-client-3: remote operation failed: Permission denied (/run5728/fst
est_6ad66a2a9777ec05609e1ebc77b6a9fd/fstest_725a0f5f18ca4cfe0ac27e391430c75f -> /run5728/fstest_6ad66a2a9777ec05609e1ebc77b6a9fd/fstest_4d8a7810831e467
266aecf2ea7257744)
....


[2012-05-16 02:42:59.132116] W [client3_1-fops.c:649:client3_1_rmdir_cbk] 0-dist-rep-client-2: remote operation failed: Permission denied
[2012-05-16 02:42:59.132480] W [client3_1-fops.c:649:client3_1_rmdir_cbk] 0-dist-rep-client-3: remote operation failed: Permission denied
[2012-05-16 02:42:59.133179] W [client3_1-fops.c:649:client3_1_rmdir_cbk] 0-dist-rep-client-0: remote operation failed: Permission denied
[2012-05-16 02:42:59.133523] W [client3_1-fops.c:649:client3_1_rmdir_cbk] 0-dist-rep-client-1: remote operation failed: Permission denied
Comment 8 Vijay Bellur 2012-05-18 02:53:25 EDT
*** Bug 820580 has been marked as a duplicate of this bug. ***
Comment 9 shishir gowda 2012-05-22 01:01:09 EDT
*** Bug 823389 has been marked as a duplicate of this bug. ***
Comment 10 shishir gowda 2012-05-23 03:11:59 EDT
The test passes successfully on fuse mount. But consistently fails on NFS.
It look like afr says no entires found in dir (readdir), when in fact it is populated with a lot of regular files (no dirs). rmdir hence fails, as no unlink calls seems to be sent on those files.

Few more error logs(debug level):
[2012-05-23 03:07:56.334103] D [nfs3-helpers.c:3489:nfs3_log_readdirp_res] 0-nfs-nfsv3: XID: 5181b434, READDIRPLUS: NFS: 0(Call completed successfully.
), POSIX: 117(Structure needs cleaning), dircount: 4096, maxcount: 32768, cverf: 38954220, is_eof: 0
[2012-05-23 03:07:56.334588] D [nfs3-helpers.c:3523:nfs3_log_readdir_call] 0-nfs-nfsv3: XID: 5281b434, READDIRPLUS: args: FH: hashcount 1, exportid b13
c5fef-b24d-4ad3-bf89-5639e1de3ed7, gfid fbba9aeb-795a-4954-bd0a-ce478d537274, dircount: 4096, maxcount: 32768
[2012-05-23 03:07:56.334651] D [afr-common.c:704:afr_get_call_child] 0-test-replicate-0: Returning 0, call_child: 1, last_index: -1
[2012-05-23 03:07:56.336463] D [afr-common.c:704:afr_get_call_child] 0-test-replicate-0: Returning 0, call_child: 1, last_index: -1
[2012-05-23 03:07:56.336712] D [afr-common.c:704:afr_get_call_child] 0-test-replicate-1: Returning 0, call_child: 1, last_index: -1
[2012-05-23 03:07:56.336974] D [nfs3-helpers.c:3489:nfs3_log_readdirp_res] 0-nfs-nfsv3: XID: 5281b434, READDIRPLUS: NFS: 0(Call completed successfully.
), POSIX: 117(Structure needs cleaning), dircount: 4096, maxcount: 32768, cverf: 38954220, is_eof: 0
[2012-05-23 03:07:56.337306] D [nfs3-helpers.c:3523:nfs3_log_readdir_call] 0-nfs-nfsv3: XID: 5381b434, READDIRPLUS: args: FH: hashcount 1, exportid b13
c5fef-b24d-4ad3-bf89-5639e1de3ed7, gfid fbba9aeb-795a-4954-bd0a-ce478d537274, dircount: 4096, maxcount: 32768
[2012-05-23 03:07:56.337369] D [afr-common.c:704:afr_get_call_child] 0-test-replicate-0: Returning 0, call_child: 1, last_index: -1
[2012-05-23 03:07:56.339152] D [afr-common.c:704:afr_get_call_child] 0-test-replicate-0: Returning 0, call_child: 1, last_index: -1
[2012-05-23 03:07:56.339267] D [afr-common.c:704:afr_get_call_child] 0-test-replicate-1: Returning 0, call_child: 1, last_index: -1
[2012-05-23 03:07:56.339632] D [nfs3-helpers.c:3489:nfs3_log_readdirp_res] 0-nfs-nfsv3: XID: 5381b434, READDIRPLUS: NFS: 0(Call completed successfully.), POSIX: 117(Structure needs cleaning), dircount: 4096, maxcount: 32768, cverf: 38954220, is_eof: 0
[2012-05-23 03:07:56.339992] D [nfs3-helpers.c:1627:nfs3_log_common_call] 0-nfs-nfsv3: XID: 5481b434, ACCESS: args: FH: hashcount 1, exportid b13c5fef-b24d-4ad3-bf89-5639e1de3ed7, gfid fbba9aeb-795a-4954-bd0a-ce478d537274
[2012-05-23 03:07:56.340054] D [afr-common.c:704:afr_get_call_child] 0-test-replicate-0: Returning 0, call_child: 1, last_index: -1
[2012-05-23 03:07:56.340310] D [nfs3-helpers.c:3389:nfs3_log_common_res] 0-nfs-nfsv3: XID: 5481b434, ACCESS: NFS: 0(Call completed successfully.), POSIX: 7(Argument list too long)
[2012-05-23 03:07:56.340558] D [nfs3-helpers.c:1627:nfs3_log_common_call] 0-nfs-nfsv3: XID: 5581b434, ACCESS: args: FH: hashcount 0, exportid b13c5fef-b24d-4ad3-bf89-5639e1de3ed7, gfid 00000000-0000-0000-0000-000000000001
[2012-05-23 03:07:56.340608] D [afr-common.c:704:afr_get_call_child] 0-test-replicate-0: Returning 0, call_child: 1, last_index: -1
[2012-05-23 03:07:56.340781] D [nfs3-helpers.c:3389:nfs3_log_common_res] 0-nfs-nfsv3: XID: 5581b434, ACCESS: NFS: 0(Call completed successfully.), POSIX: 7(Argument list too long)
[2012-05-23 03:07:56.347287] D [nfs3-helpers.c:1641:nfs3_log_fh_entry_call] 0-nfs-nfsv3: XID: 5681b434, RMDIR: args: FH: hashcount 0, exportid b13c5fef-b24d-4ad3-bf89-5639e1de3ed7, gfid 00000000-0000-0000-0000-000000000001, name: Bonnie.2337
[2012-05-23 03:07:56.387361] D [afr-dir-read.c:126:afr_examine_dir_readdir_cbk] 0-test-replicate-1: /Bonnie.2337: no entries found in test-client-3
[2012-05-23 03:07:56.387837] D [afr-dir-read.c:126:afr_examine_dir_readdir_cbk] 0-test-replicate-1: /Bonnie.2337: no entries found in test-client-2
[2012-05-23 03:07:56.387907] D [afr-common.c:704:afr_get_call_child] 0-test-replicate-1: Returning 0, call_child: 1, last_index: -1
[2012-05-23 03:07:56.390288] D [afr-dir-read.c:126:afr_examine_dir_readdir_cbk] 0-test-replicate-0: /Bonnie.2337: no entries found in test-client-1
[2012-05-23 03:07:56.397310] D [afr-dir-read.c:126:afr_examine_dir_readdir_cbk] 0-test-replicate-0: /Bonnie.2337: no entries found in test-client-0
[2012-05-23 03:07:56.397442] D [afr-common.c:704:afr_get_call_child] 0-test-replicate-0: Returning 0, call_child: 1, last_index: -1
[2012-05-23 03:07:56.399452] W [nfs3.c:3525:nfs3svc_rmdir_cbk] 0-nfs: 5681b434: /Bonnie.2337 => -1 (Directory not empty)
[2012-05-23 03:07:56.399506] D [nfs3-helpers.c:3389:nfs3_log_common_res] 0-nfs-nfsv3: XID: 5681b434, RMDIR: NFS: 66(Directory not empty), POSIX: 39(Directory not empty)
Comment 11 shishir gowda 2012-05-23 03:48:25 EDT
strace o/p of bonnie:

write(2, "done.\nDelete files in sequential"..., 41) = 41
gettimeofday({1337759403, 197341}, NULL) = 0
getrusage(RUSAGE_SELF, {ru_utime={11, 153304}, ru_stime={81, 447618}, ...}) = 0
open(".", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
gettimeofday({1337759403, 198489}, NULL) = 0
getdents(3, /* 836 entries */, 32768)   = 32744
gettimeofday({1337759403, 200356}, NULL) = 0
.
.
unlink("00000000044O9bQcaS")            = 0
gettimeofday({1337759403, 201996}, NULL) = 0
gettimeofday({1337759403, 202038}, NULL) = 0
unlink("0000000006F")                   = 0
.
.
unlink("0000000690olrSg")               = 0
gettimeofday({1337759404, 968779}, NULL) = 0
gettimeofday({1337759404, 968869}, NULL) = 0
unlink("0000000692TMJPUaltX")           = 0
gettimeofday({1337759404, 970634}, NULL) = 0
gettimeofday({1337759404, 970853}, NULL) = 0
.
.

getdents(3, 0x148f9a8, 32768)           = -1 ELOOP (Too many levels of symbolic links)   <----entry count seems to messed up (0x148f9a8)

Manual rmdir of the same dir completes successfully.
Comment 12 shishir gowda 2012-05-23 05:40:20 EDT
The issue is reproducible on a volume with a single brick setup.
Comment 13 Krishna Srinivas 2012-05-23 09:47:05 EDT
dmesg says:
NFS: directory /Bonnie.4974 contains a readdir loop.  Please contact your server vendor.  Offending cookie: 3140

Seems to be a known problem in nfs client: https://bugzilla.redhat.com/show_bug.cgi?id=790729

bug comment says the problem is not seen with RHEL6.1 kernel (kernel-2.6.32-131.21.1.el6.x86_64). 

Saurabh, can we test with rhel6.1?
Comment 14 Vijay Bellur 2012-05-28 04:00:23 EDT
Seen only with RHEL 6.2 clients. Removing this from the blocker list.
Comment 15 Rajesh 2013-01-17 05:42:18 EST

*** This bug has been marked as a duplicate of bug 770250 ***

Note You need to log in before you can comment on or make changes to this bug.