Bug 849300 - nfs:bonnie fails during delete files operation
Keywords:
Status: CLOSED DUPLICATE of bug 770250
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterfs
Version: 2.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Rajesh
QA Contact: Sudhir D
URL:
Whiteboard:
Depends On: 770250 814052
Blocks:
 
Reported: 2012-08-18 02:23 UTC by Vidya Sakar
Modified: 2013-07-04 22:44 UTC (History)
CC: 9 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of: 814052
Environment:
Last Closed: 2013-01-17 10:42:02 UTC
Embargoed:


Attachments

Description Vidya Sakar 2012-08-18 02:23:55 UTC
+++ This bug was initially created as a clone of Bug #814052 +++

Description of problem:
[root@longname-nlmtest-verification-purpose nfs-test]# time /opt/qa/tools/bonnie++ -u root -d .
Using uid:0, gid:0.
Writing a byte at a time...done
Writing intelligently...done
Rewriting...done
Reading a byte at a time...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...Bonnie: drastic I/O error (rmdir): Directory not empty
Cleaning up test directory after error.

real    7m5.023s
user    0m1.045s
sys     0m33.545s



Version-Release number of selected component (if applicable):
3.3.0qa35

How reproducible:
always

Steps to Reproduce:
1. mount the volume using nfs
2. execute bonnie++

Actual results:
the test fails during the delete files operation

Expected results:
Bonnie++ is supposed to run to completion.

Additional info:

logs from /var/log/glusterfs/nfs.log

[2012-04-19 06:32:42.655735] E [rpcsvc.c:1084:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x936467579x, Program: NFS3, ProgVers: 3, Proc: 7) to rpc-transport (socket.nfs-server)
[2012-04-19 06:32:42.655778] E [nfs3.c:576:nfs3svc_submit_reply] 0-nfs-nfsv3: Reply submission failed
[2012-04-19 06:32:42.656161] E [rpcsvc.c:1084:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x953244795x, Program: NFS3, ProgVers: 3, Proc: 7) to rpc-transport (socket.nfs-server)
[2012-04-19 06:32:42.656241] E [nfs3.c:576:nfs3svc_submit_reply] 0-nfs-nfsv3: Reply submission failed
[2012-04-19 06:32:42.656355] E [rpcsvc.c:1084:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x970022011x, Program: NFS3, ProgVers: 3, Proc: 7) to rpc-transport (socket.nfs-server)
[2012-04-19 06:32:42.656459] E [nfs3.c:576:nfs3svc_submit_reply] 0-nfs-nfsv3: Reply submission failed
[2012-04-19 06:38:54.384928] W [nfs3.c:3529:nfs3svc_rmdir_cbk] 0-nfs: 567627b: /Bonnie.1995 => -1 (Directory not empty)

--- Additional comment from saujain on 2012-04-19 09:23:34 EDT ---

[2012-04-19 13:25:25.000414] W [nfs3.c:3529:nfs3svc_rmdir_cbk] 0-nfs: 3525853f: /Bonnie.2067 => -1 (Directory not empty)


I tried executing bonnie++ again and did not see the "Reply submission failed" error this time.

--- Additional comment from sgowda on 2012-05-14 03:37:07 EDT ---

On the latest master branch, with 2x2 setup on nfs, bonnie++ runs to completion without any errors.
Can you please verify whether you are still seeing this issue?

--- Additional comment from saujain on 2012-05-16 04:41:48 EDT ---

Bonnie++ still fails:

[root@QA-31 nfs-test]# time /opt/qa/tools/bonnie++ -u root -d .
Using uid:0, gid:0.
Writing a byte at a time...done
Writing intelligently...done
Rewriting...done
Reading a byte at a time...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...Bonnie: drastic I/O error (rmdir): Directory not empty
Cleaning up test directory after error.

real    7m57.541s
user    0m0.877s
sys     0m23.834s
[root@QA-31 nfs-test]# mount | grep 79
172.17.251.79:/dist-rep on /mnt/nfs-test type nfs (rw,vers=3,addr=172.17.251.79)
[root@QA-31 nfs-test]# 

It was executed on a git checkout synced to this commit id:
commit 7503c63ee141931556cf066b9b255fc62cefcb68

--- Additional comment from sgowda on 2012-05-16 04:48:12 EDT ---

Can you please upload the nfs logs?

--- Additional comment from saujain on 2012-05-16 06:07:44 EDT ---

Uploaded the logs to the sosreport sharing space.

--- Additional comment from sgowda on 2012-05-17 01:03:49 EDT ---

Created attachment 585095 [details]
nfs log

--- Additional comment from sgowda on 2012-05-17 02:41:36 EDT ---

There are a lot of failures in the log related to permission issues.

Can you check the backend to verify if the exports/dirs are all in sync?

Additionally, self-heal seems to have been triggered (didn't see any disconnect issues).

[2012-05-16 02:07:07.331335] I [afr-self-heal-common.c:2045:afr_self_heal_completion_cbk] 0-dist-rep-replicate-0: background  entry self-heal completed on /run5728/pf/d0


[2012-05-16 02:42:55.076115] W [nfs3.c:3680:nfs3svc_rename_cbk] 0-nfs: 6cde5811: rename /run5728/fstest_6ad66a2a9777ec05609e1ebc77b6a9fd/fstest_725a0f5f18ca4cfe0ac27e391430c75f -> /run5728/fstest_6ad66a2a9777ec05609e1ebc77b6a9fd/fstest_4d8a7810831e467266aecf2ea7257744 => -1 (Permission denied)
[2012-05-16 02:42:56.128295] W [client3_1-fops.c:258:client3_1_mknod_cbk] 0-dist-rep-client-0: remote operation failed: Permission denied. Path: /run5728/fstest_6ad66a2a9777ec05609e1ebc77b6a9fd/fstest_725a0f5f18ca4cfe0ac27e391430c75f (c4eea594-b21d-4ce5-b07b-227434b2645f)
[2012-05-16 02:42:56.128365] W [client3_1-fops.c:2456:client3_1_link_cbk] 0-dist-rep-client-2: remote operation failed: Permission denied (/run5728/fstest_6ad66a2a9777ec05609e1ebc77b6a9fd/fstest_725a0f5f18ca4cfe0ac27e391430c75f -> /run5728/fstest_6ad66a2a9777ec05609e1ebc77b6a9fd/fstest_4d8a7810831e467266aecf2ea7257744)
[2012-05-16 02:42:56.128401] W [client3_1-fops.c:258:client3_1_mknod_cbk] 0-dist-rep-client-1: remote operation failed: Permission denied. Path: /run5728/fstest_6ad66a2a9777ec05609e1ebc77b6a9fd/fstest_725a0f5f18ca4cfe0ac27e391430c75f (c4eea594-b21d-4ce5-b07b-227434b2645f)
[2012-05-16 02:42:56.128741] W [client3_1-fops.c:2456:client3_1_link_cbk] 0-dist-rep-client-3: remote operation failed: Permission denied (/run5728/fstest_6ad66a2a9777ec05609e1ebc77b6a9fd/fstest_725a0f5f18ca4cfe0ac27e391430c75f -> /run5728/fstest_6ad66a2a9777ec05609e1ebc77b6a9fd/fstest_4d8a7810831e467266aecf2ea7257744)
....


[2012-05-16 02:42:59.132116] W [client3_1-fops.c:649:client3_1_rmdir_cbk] 0-dist-rep-client-2: remote operation failed: Permission denied
[2012-05-16 02:42:59.132480] W [client3_1-fops.c:649:client3_1_rmdir_cbk] 0-dist-rep-client-3: remote operation failed: Permission denied
[2012-05-16 02:42:59.133179] W [client3_1-fops.c:649:client3_1_rmdir_cbk] 0-dist-rep-client-0: remote operation failed: Permission denied
[2012-05-16 02:42:59.133523] W [client3_1-fops.c:649:client3_1_rmdir_cbk] 0-dist-rep-client-1: remote operation failed: Permission denied

--- Additional comment from vbellur on 2012-05-18 02:53:25 EDT ---

*** Bug 820580 has been marked as a duplicate of this bug. ***

--- Additional comment from sgowda on 2012-05-22 01:01:09 EDT ---

*** Bug 823389 has been marked as a duplicate of this bug. ***

--- Additional comment from sgowda on 2012-05-23 03:11:59 EDT ---

The test passes on a FUSE mount but consistently fails on NFS.
It looks like AFR reports no entries found in the directory (readdir), when in fact it is populated with many regular files (no subdirectories). rmdir therefore fails, as no unlink calls seem to be sent for those files.
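The failure mode described above can be illustrated with a short, hypothetical Python sketch (not GlusterFS code; `delete_dir` is an invented helper): if readdir spuriously reports the directory as empty, the deleting client issues no unlinks, and the subsequent rmdir fails with ENOTEMPTY because the files are still there — which is exactly the "drastic I/O error (rmdir): Directory not empty" bonnie++ prints.

```python
import errno
import os
import tempfile

def delete_dir(path, entries):
    """Unlink the entries readdir reported, then remove the directory."""
    for name in entries:                      # if readdir lied and said "empty",
        os.unlink(os.path.join(path, name))   # ...no unlinks happen here
    os.rmdir(path)                            # ...and this raises ENOTEMPTY

d = tempfile.mkdtemp()
open(os.path.join(d, "Bonnie.0001"), "w").close()

try:
    delete_dir(d, [])                         # simulate readdir returning no entries
except OSError as e:
    assert e.errno == errno.ENOTEMPTY
    print("rmdir failed:", os.strerror(e.errno))

delete_dir(d, ["Bonnie.0001"])                # with the real listing, removal succeeds
```

This mirrors why a later manual `rmdir` succeeds: by then a fresh, correct directory listing lets the files be unlinked first.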

A few more error logs (debug level):
[2012-05-23 03:07:56.334103] D [nfs3-helpers.c:3489:nfs3_log_readdirp_res] 0-nfs-nfsv3: XID: 5181b434, READDIRPLUS: NFS: 0(Call completed successfully.), POSIX: 117(Structure needs cleaning), dircount: 4096, maxcount: 32768, cverf: 38954220, is_eof: 0
[2012-05-23 03:07:56.334588] D [nfs3-helpers.c:3523:nfs3_log_readdir_call] 0-nfs-nfsv3: XID: 5281b434, READDIRPLUS: args: FH: hashcount 1, exportid b13c5fef-b24d-4ad3-bf89-5639e1de3ed7, gfid fbba9aeb-795a-4954-bd0a-ce478d537274, dircount: 4096, maxcount: 32768
[2012-05-23 03:07:56.334651] D [afr-common.c:704:afr_get_call_child] 0-test-replicate-0: Returning 0, call_child: 1, last_index: -1
[2012-05-23 03:07:56.336463] D [afr-common.c:704:afr_get_call_child] 0-test-replicate-0: Returning 0, call_child: 1, last_index: -1
[2012-05-23 03:07:56.336712] D [afr-common.c:704:afr_get_call_child] 0-test-replicate-1: Returning 0, call_child: 1, last_index: -1
[2012-05-23 03:07:56.336974] D [nfs3-helpers.c:3489:nfs3_log_readdirp_res] 0-nfs-nfsv3: XID: 5281b434, READDIRPLUS: NFS: 0(Call completed successfully.), POSIX: 117(Structure needs cleaning), dircount: 4096, maxcount: 32768, cverf: 38954220, is_eof: 0
[2012-05-23 03:07:56.337306] D [nfs3-helpers.c:3523:nfs3_log_readdir_call] 0-nfs-nfsv3: XID: 5381b434, READDIRPLUS: args: FH: hashcount 1, exportid b13c5fef-b24d-4ad3-bf89-5639e1de3ed7, gfid fbba9aeb-795a-4954-bd0a-ce478d537274, dircount: 4096, maxcount: 32768
[2012-05-23 03:07:56.337369] D [afr-common.c:704:afr_get_call_child] 0-test-replicate-0: Returning 0, call_child: 1, last_index: -1
[2012-05-23 03:07:56.339152] D [afr-common.c:704:afr_get_call_child] 0-test-replicate-0: Returning 0, call_child: 1, last_index: -1
[2012-05-23 03:07:56.339267] D [afr-common.c:704:afr_get_call_child] 0-test-replicate-1: Returning 0, call_child: 1, last_index: -1
[2012-05-23 03:07:56.339632] D [nfs3-helpers.c:3489:nfs3_log_readdirp_res] 0-nfs-nfsv3: XID: 5381b434, READDIRPLUS: NFS: 0(Call completed successfully.), POSIX: 117(Structure needs cleaning), dircount: 4096, maxcount: 32768, cverf: 38954220, is_eof: 0
[2012-05-23 03:07:56.339992] D [nfs3-helpers.c:1627:nfs3_log_common_call] 0-nfs-nfsv3: XID: 5481b434, ACCESS: args: FH: hashcount 1, exportid b13c5fef-b24d-4ad3-bf89-5639e1de3ed7, gfid fbba9aeb-795a-4954-bd0a-ce478d537274
[2012-05-23 03:07:56.340054] D [afr-common.c:704:afr_get_call_child] 0-test-replicate-0: Returning 0, call_child: 1, last_index: -1
[2012-05-23 03:07:56.340310] D [nfs3-helpers.c:3389:nfs3_log_common_res] 0-nfs-nfsv3: XID: 5481b434, ACCESS: NFS: 0(Call completed successfully.), POSIX: 7(Argument list too long)
[2012-05-23 03:07:56.340558] D [nfs3-helpers.c:1627:nfs3_log_common_call] 0-nfs-nfsv3: XID: 5581b434, ACCESS: args: FH: hashcount 0, exportid b13c5fef-b24d-4ad3-bf89-5639e1de3ed7, gfid 00000000-0000-0000-0000-000000000001
[2012-05-23 03:07:56.340608] D [afr-common.c:704:afr_get_call_child] 0-test-replicate-0: Returning 0, call_child: 1, last_index: -1
[2012-05-23 03:07:56.340781] D [nfs3-helpers.c:3389:nfs3_log_common_res] 0-nfs-nfsv3: XID: 5581b434, ACCESS: NFS: 0(Call completed successfully.), POSIX: 7(Argument list too long)
[2012-05-23 03:07:56.347287] D [nfs3-helpers.c:1641:nfs3_log_fh_entry_call] 0-nfs-nfsv3: XID: 5681b434, RMDIR: args: FH: hashcount 0, exportid b13c5fef-b24d-4ad3-bf89-5639e1de3ed7, gfid 00000000-0000-0000-0000-000000000001, name: Bonnie.2337
[2012-05-23 03:07:56.387361] D [afr-dir-read.c:126:afr_examine_dir_readdir_cbk] 0-test-replicate-1: /Bonnie.2337: no entries found in test-client-3
[2012-05-23 03:07:56.387837] D [afr-dir-read.c:126:afr_examine_dir_readdir_cbk] 0-test-replicate-1: /Bonnie.2337: no entries found in test-client-2
[2012-05-23 03:07:56.387907] D [afr-common.c:704:afr_get_call_child] 0-test-replicate-1: Returning 0, call_child: 1, last_index: -1
[2012-05-23 03:07:56.390288] D [afr-dir-read.c:126:afr_examine_dir_readdir_cbk] 0-test-replicate-0: /Bonnie.2337: no entries found in test-client-1
[2012-05-23 03:07:56.397310] D [afr-dir-read.c:126:afr_examine_dir_readdir_cbk] 0-test-replicate-0: /Bonnie.2337: no entries found in test-client-0
[2012-05-23 03:07:56.397442] D [afr-common.c:704:afr_get_call_child] 0-test-replicate-0: Returning 0, call_child: 1, last_index: -1
[2012-05-23 03:07:56.399452] W [nfs3.c:3525:nfs3svc_rmdir_cbk] 0-nfs: 5681b434: /Bonnie.2337 => -1 (Directory not empty)
[2012-05-23 03:07:56.399506] D [nfs3-helpers.c:3389:nfs3_log_common_res] 0-nfs-nfsv3: XID: 5681b434, RMDIR: NFS: 66(Directory not empty), POSIX: 39(Directory not empty)

--- Additional comment from sgowda on 2012-05-23 03:48:25 EDT ---

strace output of bonnie++:

write(2, "done.\nDelete files in sequential"..., 41) = 41
gettimeofday({1337759403, 197341}, NULL) = 0
getrusage(RUSAGE_SELF, {ru_utime={11, 153304}, ru_stime={81, 447618}, ...}) = 0
open(".", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
gettimeofday({1337759403, 198489}, NULL) = 0
getdents(3, /* 836 entries */, 32768)   = 32744
gettimeofday({1337759403, 200356}, NULL) = 0
.
.
unlink("00000000044O9bQcaS")            = 0
gettimeofday({1337759403, 201996}, NULL) = 0
gettimeofday({1337759403, 202038}, NULL) = 0
unlink("0000000006F")                   = 0
.
.
unlink("0000000690olrSg")               = 0
gettimeofday({1337759404, 968779}, NULL) = 0
gettimeofday({1337759404, 968869}, NULL) = 0
unlink("0000000692TMJPUaltX")           = 0
gettimeofday({1337759404, 970634}, NULL) = 0
gettimeofday({1337759404, 970853}, NULL) = 0
.
.

getdents(3, 0x148f9a8, 32768)           = -1 ELOOP (Too many levels of symbolic links)   <---- entry count seems to be messed up (0x148f9a8)

Manual rmdir of the same dir completes successfully.
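For reference, the kernel-side symptom (from bug 770250) is a readdir loop: the NFS client notices the server handing back a directory cookie it has already seen and aborts the listing with ELOOP, which `getdents()` then surfaces to bonnie++. The sketch below is an illustrative model of that detection, not the actual kernel code; `read_directory` and its `(cookie, name)` pairs are invented for the example.

```python
import errno

def read_directory(pages):
    """Consume (cookie, name) pairs as if from READDIRPLUS replies.

    A repeated cookie would make the client loop forever, so bail out
    with ELOOP -- mirroring the dmesg warning "contains a readdir loop.
    ... Offending cookie: <n>".
    """
    seen = set()
    names = []
    for cookie, name in pages:
        if cookie in seen:
            raise OSError(errno.ELOOP,
                          "readdir loop, offending cookie %d" % cookie)
        seen.add(cookie)
        names.append(name)
    return names

assert read_directory([(1, "a"), (2, "b"), (3, "c")]) == ["a", "b", "c"]

try:
    read_directory([(1, "a"), (2, "b"), (1, "a")])  # server repeats cookie 1
except OSError as e:
    assert e.errno == errno.ELOOP
```

This is why a fixed client kernel (or a server that never repeats cookies) makes the failure disappear even though the directory contents are unchanged.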

--- Additional comment from sgowda on 2012-05-23 05:40:20 EDT ---

The issue is reproducible on a volume with a single brick setup.

--- Additional comment from ksriniva on 2012-05-23 09:47:05 EDT ---

dmesg says:
NFS: directory /Bonnie.4974 contains a readdir loop.  Please contact your server vendor.  Offending cookie: 3140

Seems to be a known problem in nfs client: https://bugzilla.redhat.com/show_bug.cgi?id=790729

The bug comment says the problem is not seen with the RHEL 6.1 kernel (kernel-2.6.32-131.21.1.el6.x86_64).

Saurabh, can we test with rhel6.1?

--- Additional comment from vbellur on 2012-05-28 04:00:23 EDT ---

Seen only with RHEL 6.2 clients. Removing this from the blocker list.

Comment 2 Vidya Sakar 2012-10-17 12:10:37 UTC
Closing as worksforme, reopen the BZ if this issue is seen in RHS.

Comment 3 Saurabh 2012-10-17 13:00:49 UTC
It still fails on RHS:

[root@QA-31 run22982]# time /opt/qa/tools/bonnie++ -u root -d .
Using uid:0, gid:0.
Writing a byte at a time...done
Writing intelligently...done
Rewriting...done
Reading a byte at a time...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...Bonnie: drastic I/O error (rmdir): Directory not empty
Cleaning up test directory after error.

real	8m33.698s
user	0m1.106s
sys	0m30.777s

172.17.251.90:/dist-rep on /mnt/nfs-test type nfs (rw,vers=3,addr=172.17.251.90)

Server-side information:

[root@localhost ~]# glusterfs -V
glusterfs 3.3.0.3rhs built on Oct 10 2012 09:16:19
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.
[root@localhost ~]# 
[root@localhost ~]# gluster volume info dist-rep
 
Volume Name: dist-rep
Type: Distributed-Replicate
Volume ID: 6a96e222-0bca-46e2-bc3c-870e8d1b970a
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 172.17.251.90:/export/dr
Brick2: 172.17.251.91:/export/drr
Brick3: 172.17.251.90:/export/ddr
Brick4: 172.17.251.91:/export/ddrr
[root@localhost ~]# 
[root@localhost ~]# uname -a
Linux localhost.localdomain 2.6.32-220.28.1.el6.x86_64 #1 SMP Wed Oct 3 12:26:28 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux
[root@localhost ~]#

Comment 4 Krishna Srinivas 2012-10-22 11:56:59 UTC
Saurabh, What RHEL is this?

Comment 5 Saurabh 2012-10-22 12:18:06 UTC
RHEL 6.2

Comment 6 nayfield 2013-01-15 22:19:08 UTC
Confirmed that bz770250 was the culprit on my system.  

I can reproduce with kernel-2.6.32-220.13.1.el6.x86_64 but upgrading to kernel-2.6.32-220.17.1.el6.x86_64 resolves the problem.


Reproducer: $ bonnie++ -s 0 
(bonnie++ package from EPEL, against RHS 2.0 nfs share)

Comment 7 Rajesh 2013-01-17 10:42:02 UTC

*** This bug has been marked as a duplicate of bug 770250 ***

