Bug 814052

Summary: nfs:bonnie fails during delete files operation
Product: [Community] GlusterFS Reporter: Saurabh <saujain>
Component: nfsAssignee: Rajesh <rajesh>
Status: CLOSED DUPLICATE QA Contact: Saurabh <saujain>
Severity: high Docs Contact:
Priority: high    
Version: pre-releaseCC: ashetty, gluster-bugs, kkeithle, mzywusko, rajesh, vagarwal, vbellur
Target Milestone: ---Keywords: Triaged
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 849300 (view as bug list) Environment:
Last Closed: 2013-01-17 10:42:18 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 849300    
Attachments:
Description Flags
nfs log none

Description Saurabh 2012-04-19 06:41:48 UTC
Description of problem:
[root@longname-nlmtest-verification-purpose nfs-test]# time /opt/qa/tools/bonnie++ -u root -d .
Using uid:0, gid:0.
Writing a byte at a time...done
Writing intelligently...done
Rewriting...done
Reading a byte at a time...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...Bonnie: drastic I/O error (rmdir): Directory not empty
Cleaning up test directory after error.

real    7m5.023s
user    0m1.045s
sys     0m33.545s



Version-Release number of selected component (if applicable):
3.3.0qa35

How reproducible:
always

Steps to Reproduce:
1. mount the volume using nfs
2. execute the bonnie++
3.
  
Actual results:
the test fails during the delete files operation

Expected results:
suppose to finish it full

Additional info:

logs from /var/log/glusterfs/nfs.log

[2012-04-19 06:32:42.655735] E [rpcsvc.c:1084:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x936467579x, Program: NFS3, ProgVers: 3, Proc: 7) to rpc-transport (socket.nfs-server)
[2012-04-19 06:32:42.655778] E [nfs3.c:576:nfs3svc_submit_reply] 0-nfs-nfsv3: Reply submission failed
[2012-04-19 06:32:42.656161] E [rpcsvc.c:1084:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x953244795x, Program: NFS3, ProgVers: 3, Proc: 7) to rpc-transport (socket.nfs-server)
[2012-04-19 06:32:42.656241] E [nfs3.c:576:nfs3svc_submit_reply] 0-nfs-nfsv3: Reply submission failed
[2012-04-19 06:32:42.656355] E [rpcsvc.c:1084:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x970022011x, Program: NFS3, ProgVers: 3, Proc: 7) to rpc-transport (socket.nfs-server)
[2012-04-19 06:32:42.656459] E [nfs3.c:576:nfs3svc_submit_reply] 0-nfs-nfsv3: Reply submission failed
[2012-04-19 06:38:54.384928] W [nfs3.c:3529:nfs3svc_rmdir_cbk] 0-nfs: 567627b: /Bonnie.1995 => -1 (Directory not empty)

Comment 1 Saurabh 2012-04-19 13:23:34 UTC
[2012-04-19 13:25:25.000414] W [nfs3.c:3529:nfs3svc_rmdir_cbk] 0-nfs: 3525853f: /Bonnie.2067 => -1 (Directory not empty)


I tried to execute the bonnie again and didn't see the 
"Reply submission failed" error again.

Comment 2 shishir gowda 2012-05-14 07:37:07 UTC
On the latest master branch, with 2x2 setup on nfs, bonnie++ runs to completion without any errors.
Can you please verify if are still seeing the issue at hand?

Comment 3 Saurabh 2012-05-16 08:41:48 UTC
Bonnie still fails,

[root@QA-31 nfs-test]# time /opt/qa/tools/bonnie++ -u root -d .
Using uid:0, gid:0.
Writing a byte at a time...done
Writing intelligently...done
Rewriting...done
Reading a byte at a time...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...Bonnie: drastic I/O error (rmdir): Directory not empty
Cleaning up test directory after error.

real    7m57.541s
user    0m0.877s
sys     0m23.834s
[root@QA-31 nfs-test]# mount | grep 79
172.17.251.79:/dist-rep on /mnt/nfs-test type nfs (rw,vers=3,addr=172.17.251.79)
[root@QA-31 nfs-test]# 

it executed on the git sync to this commit id, 
commit 7503c63ee141931556cf066b9b255fc62cefcb68

Comment 4 shishir gowda 2012-05-16 08:48:12 UTC
Can you please upload the nfs logs?

Comment 5 Saurabh 2012-05-16 10:07:44 UTC
uploaded the logs in the sosreport sharing space

Comment 6 shishir gowda 2012-05-17 05:03:49 UTC
Created attachment 585095 [details]
nfs log

Comment 7 shishir gowda 2012-05-17 06:41:36 UTC
There are a lot of failures in the log related to permission issues.

Can you check the backend to verify if the exports/dirs are all in sync?

Additionally self heal seems to be triggered(dint see any disconnect issues).

[2012-05-16 02:07:07.331335] I [afr-self-heal-common.c:2045:afr_self_heal_completion_cbk] 0-dist-rep-replicate-0: background  entry self-heal completed on /run5728/pf/d0


[2012-05-16 02:42:55.076115] W [nfs3.c:3680:nfs3svc_rename_cbk] 0-nfs: 6cde5811: rename /run5728/fstest_6ad66a2a9777ec05609e1ebc77b6a9fd/fstest_725a0f5
f18ca4cfe0ac27e391430c75f -> /run5728/fstest_6ad66a2a9777ec05609e1ebc77b6a9fd/fstest_4d8a7810831e467266aecf2ea7257744 => -1 (Permission denied)
[2012-05-16 02:42:56.128295] W [client3_1-fops.c:258:client3_1_mknod_cbk] 0-dist-rep-client-0: remote operation failed: Permission denied. Path: /run57
28/fstest_6ad66a2a9777ec05609e1ebc77b6a9fd/fstest_725a0f5f18ca4cfe0ac27e391430c75f (c4eea594-b21d-4ce5-b07b-227434b2645f)
[2012-05-16 02:42:56.128365] W [client3_1-fops.c:2456:client3_1_link_cbk] 0-dist-rep-client-2: remote operation failed: Permission denied (/run5728/fst
est_6ad66a2a9777ec05609e1ebc77b6a9fd/fstest_725a0f5f18ca4cfe0ac27e391430c75f -> /run5728/fstest_6ad66a2a9777ec05609e1ebc77b6a9fd/fstest_4d8a7810831e467
266aecf2ea7257744)
[2012-05-16 02:42:56.128401] W [client3_1-fops.c:258:client3_1_mknod_cbk] 0-dist-rep-client-1: remote operation failed: Permission denied. Path: /run57
28/fstest_6ad66a2a9777ec05609e1ebc77b6a9fd/fstest_725a0f5f18ca4cfe0ac27e391430c75f (c4eea594-b21d-4ce5-b07b-227434b2645f)
[2012-05-16 02:42:56.128741] W [client3_1-fops.c:2456:client3_1_link_cbk] 0-dist-rep-client-3: remote operation failed: Permission denied (/run5728/fst
est_6ad66a2a9777ec05609e1ebc77b6a9fd/fstest_725a0f5f18ca4cfe0ac27e391430c75f -> /run5728/fstest_6ad66a2a9777ec05609e1ebc77b6a9fd/fstest_4d8a7810831e467
266aecf2ea7257744)
....


[2012-05-16 02:42:59.132116] W [client3_1-fops.c:649:client3_1_rmdir_cbk] 0-dist-rep-client-2: remote operation failed: Permission denied
[2012-05-16 02:42:59.132480] W [client3_1-fops.c:649:client3_1_rmdir_cbk] 0-dist-rep-client-3: remote operation failed: Permission denied
[2012-05-16 02:42:59.133179] W [client3_1-fops.c:649:client3_1_rmdir_cbk] 0-dist-rep-client-0: remote operation failed: Permission denied
[2012-05-16 02:42:59.133523] W [client3_1-fops.c:649:client3_1_rmdir_cbk] 0-dist-rep-client-1: remote operation failed: Permission denied

Comment 8 Vijay Bellur 2012-05-18 06:53:25 UTC
*** Bug 820580 has been marked as a duplicate of this bug. ***

Comment 9 shishir gowda 2012-05-22 05:01:09 UTC
*** Bug 823389 has been marked as a duplicate of this bug. ***

Comment 10 shishir gowda 2012-05-23 07:11:59 UTC
The test passes successfully on fuse mount. But consistently fails on NFS.
It look like afr says no entires found in dir (readdir), when in fact it is populated with a lot of regular files (no dirs). rmdir hence fails, as no unlink calls seems to be sent on those files.

Few more error logs(debug level):
[2012-05-23 03:07:56.334103] D [nfs3-helpers.c:3489:nfs3_log_readdirp_res] 0-nfs-nfsv3: XID: 5181b434, READDIRPLUS: NFS: 0(Call completed successfully.
), POSIX: 117(Structure needs cleaning), dircount: 4096, maxcount: 32768, cverf: 38954220, is_eof: 0
[2012-05-23 03:07:56.334588] D [nfs3-helpers.c:3523:nfs3_log_readdir_call] 0-nfs-nfsv3: XID: 5281b434, READDIRPLUS: args: FH: hashcount 1, exportid b13
c5fef-b24d-4ad3-bf89-5639e1de3ed7, gfid fbba9aeb-795a-4954-bd0a-ce478d537274, dircount: 4096, maxcount: 32768
[2012-05-23 03:07:56.334651] D [afr-common.c:704:afr_get_call_child] 0-test-replicate-0: Returning 0, call_child: 1, last_index: -1
[2012-05-23 03:07:56.336463] D [afr-common.c:704:afr_get_call_child] 0-test-replicate-0: Returning 0, call_child: 1, last_index: -1
[2012-05-23 03:07:56.336712] D [afr-common.c:704:afr_get_call_child] 0-test-replicate-1: Returning 0, call_child: 1, last_index: -1
[2012-05-23 03:07:56.336974] D [nfs3-helpers.c:3489:nfs3_log_readdirp_res] 0-nfs-nfsv3: XID: 5281b434, READDIRPLUS: NFS: 0(Call completed successfully.
), POSIX: 117(Structure needs cleaning), dircount: 4096, maxcount: 32768, cverf: 38954220, is_eof: 0
[2012-05-23 03:07:56.337306] D [nfs3-helpers.c:3523:nfs3_log_readdir_call] 0-nfs-nfsv3: XID: 5381b434, READDIRPLUS: args: FH: hashcount 1, exportid b13
c5fef-b24d-4ad3-bf89-5639e1de3ed7, gfid fbba9aeb-795a-4954-bd0a-ce478d537274, dircount: 4096, maxcount: 32768
[2012-05-23 03:07:56.337369] D [afr-common.c:704:afr_get_call_child] 0-test-replicate-0: Returning 0, call_child: 1, last_index: -1
[2012-05-23 03:07:56.339152] D [afr-common.c:704:afr_get_call_child] 0-test-replicate-0: Returning 0, call_child: 1, last_index: -1
[2012-05-23 03:07:56.339267] D [afr-common.c:704:afr_get_call_child] 0-test-replicate-1: Returning 0, call_child: 1, last_index: -1
[2012-05-23 03:07:56.339632] D [nfs3-helpers.c:3489:nfs3_log_readdirp_res] 0-nfs-nfsv3: XID: 5381b434, READDIRPLUS: NFS: 0(Call completed successfully.), POSIX: 117(Structure needs cleaning), dircount: 4096, maxcount: 32768, cverf: 38954220, is_eof: 0
[2012-05-23 03:07:56.339992] D [nfs3-helpers.c:1627:nfs3_log_common_call] 0-nfs-nfsv3: XID: 5481b434, ACCESS: args: FH: hashcount 1, exportid b13c5fef-b24d-4ad3-bf89-5639e1de3ed7, gfid fbba9aeb-795a-4954-bd0a-ce478d537274
[2012-05-23 03:07:56.340054] D [afr-common.c:704:afr_get_call_child] 0-test-replicate-0: Returning 0, call_child: 1, last_index: -1
[2012-05-23 03:07:56.340310] D [nfs3-helpers.c:3389:nfs3_log_common_res] 0-nfs-nfsv3: XID: 5481b434, ACCESS: NFS: 0(Call completed successfully.), POSIX: 7(Argument list too long)
[2012-05-23 03:07:56.340558] D [nfs3-helpers.c:1627:nfs3_log_common_call] 0-nfs-nfsv3: XID: 5581b434, ACCESS: args: FH: hashcount 0, exportid b13c5fef-b24d-4ad3-bf89-5639e1de3ed7, gfid 00000000-0000-0000-0000-000000000001
[2012-05-23 03:07:56.340608] D [afr-common.c:704:afr_get_call_child] 0-test-replicate-0: Returning 0, call_child: 1, last_index: -1
[2012-05-23 03:07:56.340781] D [nfs3-helpers.c:3389:nfs3_log_common_res] 0-nfs-nfsv3: XID: 5581b434, ACCESS: NFS: 0(Call completed successfully.), POSIX: 7(Argument list too long)
[2012-05-23 03:07:56.347287] D [nfs3-helpers.c:1641:nfs3_log_fh_entry_call] 0-nfs-nfsv3: XID: 5681b434, RMDIR: args: FH: hashcount 0, exportid b13c5fef-b24d-4ad3-bf89-5639e1de3ed7, gfid 00000000-0000-0000-0000-000000000001, name: Bonnie.2337
[2012-05-23 03:07:56.387361] D [afr-dir-read.c:126:afr_examine_dir_readdir_cbk] 0-test-replicate-1: /Bonnie.2337: no entries found in test-client-3
[2012-05-23 03:07:56.387837] D [afr-dir-read.c:126:afr_examine_dir_readdir_cbk] 0-test-replicate-1: /Bonnie.2337: no entries found in test-client-2
[2012-05-23 03:07:56.387907] D [afr-common.c:704:afr_get_call_child] 0-test-replicate-1: Returning 0, call_child: 1, last_index: -1
[2012-05-23 03:07:56.390288] D [afr-dir-read.c:126:afr_examine_dir_readdir_cbk] 0-test-replicate-0: /Bonnie.2337: no entries found in test-client-1
[2012-05-23 03:07:56.397310] D [afr-dir-read.c:126:afr_examine_dir_readdir_cbk] 0-test-replicate-0: /Bonnie.2337: no entries found in test-client-0
[2012-05-23 03:07:56.397442] D [afr-common.c:704:afr_get_call_child] 0-test-replicate-0: Returning 0, call_child: 1, last_index: -1
[2012-05-23 03:07:56.399452] W [nfs3.c:3525:nfs3svc_rmdir_cbk] 0-nfs: 5681b434: /Bonnie.2337 => -1 (Directory not empty)
[2012-05-23 03:07:56.399506] D [nfs3-helpers.c:3389:nfs3_log_common_res] 0-nfs-nfsv3: XID: 5681b434, RMDIR: NFS: 66(Directory not empty), POSIX: 39(Directory not empty)

Comment 11 shishir gowda 2012-05-23 07:48:25 UTC
strace o/p of bonnie:

write(2, "done.\nDelete files in sequential"..., 41) = 41
gettimeofday({1337759403, 197341}, NULL) = 0
getrusage(RUSAGE_SELF, {ru_utime={11, 153304}, ru_stime={81, 447618}, ...}) = 0
open(".", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
gettimeofday({1337759403, 198489}, NULL) = 0
getdents(3, /* 836 entries */, 32768)   = 32744
gettimeofday({1337759403, 200356}, NULL) = 0
.
.
unlink("00000000044O9bQcaS")            = 0
gettimeofday({1337759403, 201996}, NULL) = 0
gettimeofday({1337759403, 202038}, NULL) = 0
unlink("0000000006F")                   = 0
.
.
unlink("0000000690olrSg")               = 0
gettimeofday({1337759404, 968779}, NULL) = 0
gettimeofday({1337759404, 968869}, NULL) = 0
unlink("0000000692TMJPUaltX")           = 0
gettimeofday({1337759404, 970634}, NULL) = 0
gettimeofday({1337759404, 970853}, NULL) = 0
.
.

getdents(3, 0x148f9a8, 32768)           = -1 ELOOP (Too many levels of symbolic links)   <----entry count seems to messed up (0x148f9a8)

Manual rmdir of the same dir completes successfully.

Comment 12 shishir gowda 2012-05-23 09:40:20 UTC
The issue is reproducible on a volume with a single brick setup.

Comment 13 Krishna Srinivas 2012-05-23 13:47:05 UTC
dmesg says:
NFS: directory /Bonnie.4974 contains a readdir loop.  Please contact your server vendor.  Offending cookie: 3140

Seems to be a known problem in nfs client: https://bugzilla.redhat.com/show_bug.cgi?id=790729

bug comment says the problem is not seen with RHEL6.1 kernel (kernel-2.6.32-131.21.1.el6.x86_64). 

Saurabh, can we test with rhel6.1?

Comment 14 Vijay Bellur 2012-05-28 08:00:23 UTC
Seen only with RHEL 6.2 clients. Removing this from the blocker list.

Comment 15 Rajesh 2013-01-17 10:42:18 UTC

*** This bug has been marked as a duplicate of bug 770250 ***