Bug 1121347

Summary: NFS: When running rm -rf files / directories are not deleted.
Product: [Community] GlusterFS
Reporter: Ben Turner <bturner>
Component: nfs
Assignee: GlusterFS Bugs list <gluster-bugs>
Status: CLOSED NOTABUG
Severity: medium
Priority: medium
Version: mainline
CC: gluster-bugs, spradhan
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Type: Bug
Last Closed: 2014-07-22 17:05:54 UTC
Bug Blocks: 1121348 (view as bug list)

Description Ben Turner 2014-07-19 14:47:31 UTC
Description of problem:

When running rm -rf on NFS mounts, files and directories do not get deleted.  The easiest way I found to reproduce this is by running dbench, as it creates a bunch of files / directories and then cleans them up.

Version-Release number of selected component (if applicable):

glusterfs-3.5qa2-0.729.gitc690c11.el6.x86_64

How reproducible:

This happens about 50% of the time when running dbench.

Steps to Reproduce:
1.  yum install dbench from epel
2.  export PATH=$PATH:/opt/qa/tools
3.  DBENCH_BIN_PATH=${DBENCH_BIN_PATH:=/opt/qa/tools}
4.  DBENCH_TIME=${DBENCH_TIME:=300}
5.  DBENCH_CLIENTS=${DBENCH_CLIENTS:=10}
6.  cp /usr/share/dbench/client.txt $DBENCH_BIN_PATH
7.  dbench -t $DBENCH_TIME -c $DBENCH_BIN_PATH/client.txt -s -S $DBENCH_CLIENTS
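
Steps 2-5 just parameterize the dbench invocation in step 7. The steps above can be sketched as a small wrapper script that composes and prints the command to run from the NFS mount (the /opt/qa/tools paths are specific to the reporter's test rig):

```shell
#!/bin/sh
# Defaults mirror steps 2-5; override via the environment if needed.
DBENCH_BIN_PATH=${DBENCH_BIN_PATH:=/opt/qa/tools}
DBENCH_TIME=${DBENCH_TIME:=300}
DBENCH_CLIENTS=${DBENCH_CLIENTS:=10}
export PATH=$PATH:$DBENCH_BIN_PATH

# Step 7: run dbench against the standard load file (copy
# /usr/share/dbench/client.txt into $DBENCH_BIN_PATH first, step 6).
cmd="dbench -t $DBENCH_TIME -c $DBENCH_BIN_PATH/client.txt -s -S $DBENCH_CLIENTS"
echo "$cmd"
```

Run the echoed command from the NFS mount point; leftover clients/clientX directories after it exits reproduce the symptom.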

Actual results:

The clients directory, its clientX subdirectories, and some files are not deleted when dbench cleans up.

Expected results:

All directories and files are deleted.

Additional info:

I opened this specifically against NFS, but it looks to be reproducible on glusterfs mounts as well.  It seems to be more easily reproduced on NFS mounts, though.

Comment 1 Ben Turner 2014-07-19 14:53:58 UTC
Dbench doesn't report a failure when removing the files; the only way I knew the directories / files weren't removed was to check:

[root@gqac019 gluster-mount]# ll clients/
total 0
drwx------ 3 root root 19 Jul 19 09:56 client0
drwx------ 3 root root 19 Jul 19 09:56 client1
drwx------ 3 root root 19 Jul 19 09:56 client2
drwx------ 3 root root 19 Jul 19 09:56 client3
drwx------ 3 root root 19 Jul 19 09:56 client4
drwx------ 3 root root 19 Jul 19 09:56 client5
drwx------ 3 root root 19 Jul 19 09:56 client6
drwx------ 3 root root 19 Jul 19 09:56 client7
drwx------ 3 root root 19 Jul 19 09:56 client8
drwx------ 3 root root 19 Jul 19 09:56 client9

If I watch the nfs.log on the node I am mounting from, I see:

This spams the logs when files / dirs are getting created:

[2014-07-19 13:51:28.722206] I [MSGID: 109036] [dht-common.c:5694:dht_log_new_layout_for_dir_selfheal] 0-testvol-dht: Setting layout of /clients/client7/~dmtmp/SEED with [Subvol_name: testvol-replicate-0, Err: -1 , Start: 0 , Stop: 2147483646 ], [Subvol_name: testvol-replicate-1, Err: -1 , Start: 2147483647 , Stop: 4294967295 ], 
[2014-07-19 13:51:28.964684] I [MSGID: 109036] [dht-common.c:5694:dht_log_new_layout_for_dir_selfheal] 0-testvol-dht: Setting layout of /clients/client8/~dmtmp/SEED with [Subvol_name: testvol-replicate-0, Err: -1 , Start: 2147483647 , Stop: 4294967295 ], [Subvol_name: testvol-replicate-1, Err: -1 , Start: 0 , Stop: 2147483646 ], 

When dbench goes to delete the files / dirs it created I see:

[2014-07-19 13:56:44.554270] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: 3a17c1a: /clients => -1 (Directory not empty)
[2014-07-19 13:56:44.575989] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: 6fa17c1a: /clients => -1 (Directory not empty)
[2014-07-19 13:56:44.584309] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: a4a17c1a: /clients => -1 (Directory not empty)
[2014-07-19 13:56:44.589548] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: c4a17c1a: /clients => -1 (Directory not empty)
[2014-07-19 13:56:44.593718] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: d6a17c1a: /clients => -1 (Directory not empty)
[2014-07-19 13:56:44.598136] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: f0a17c1a: /clients => -1 (Directory not empty)
[2014-07-19 13:56:44.621565] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: 6aa27c1a: /clients => -1 (Directory not empty)
[2014-07-19 13:56:44.646917] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: d4a27c1a: /clients => -1 (Directory not empty)
[2014-07-19 13:56:44.657862] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: 7a37c1a: /clients => -1 (Directory not empty)
[2014-07-19 13:56:44.663313] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: 28a37c1a: /clients => -1 (Directory not empty)
[2014-07-19 14:02:46.091288] W [socket.c:529:__socket_rwv] 0-NLM-client: readv on 192.168.11.102:45823 failed (No data available)
[2014-07-19 14:02:46.091418] E [rpc-transport.c:485:rpc_transport_unref] (-->/usr/lib64/glusterfs/3.5qa2/xlator/nfs/server.so(nlm_rpcclnt_notify+0x5a) [0x7f53775128ea] (-->/usr/lib64/glusterfs/3.5qa2/xlator/nfs/server.so(nlm_unset_rpc_clnt+0x75) [0x7f537750e3e5] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_unref+0x63) [0x7f5388914693]))) 0-rpc_transport: invalid argument: this

Comment 2 Ben Turner 2014-07-19 15:04:16 UTC
I just noticed that all of the clientX subdirectories appear to be empty.  I am able to create a file, though:

[root@gqac019 gluster-mount]# cd clients/client0/~dmtmp/ACCESS/
[root@gqac019 ACCESS]# touch myfile
[root@gqac019 ACCESS]# ll
total 0
-rw-r--r-- 1 root root 0 Jul 19 11:03 myfile

Comment 3 Ben Turner 2014-07-20 17:08:44 UTC
I ran on a replicate volume:

[root@gqac007 ~]# gluster v info
 
Volume Name: testvol
Type: Replicate
Volume ID: d179d723-ece1-43ac-a1df-7b9f267f56e4
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: gqac007.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick0
Brick2: gqac012.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick1
Options Reconfigured:
snap-max-hard-limit: 256
snap-max-soft-limit: 90
auto-delete: disable
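
A 1x2 volume like the one above can be recreated with the standard gluster CLI (a sketch, assuming a working two-node trusted pool with the brick paths from the output; the mount point is illustrative):

```shell
# Recreate the reporter's 1x2 replicate volume.
gluster volume create testvol replica 2 \
    gqac007.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick0 \
    gqac012.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick1
gluster volume start testvol

# Mount it over Gluster NFS (NFSv3) on the client.
mount -t nfs -o vers=3 \
    gqac007.sbu.lab.eng.bos.redhat.com:/testvol /gluster-mount
```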

And the last 20 lines of nfs.log:

[root@gqac007 ~]# tail /var/log/glusterfs/nfs.log -n 20
[2014-07-20 16:29:50.941538] I [MSGID: 109036] [dht-common.c:5694:dht_log_new_layout_for_dir_selfheal] 0-testvol-dht: Setting layout of /run8054_dbench/clients/client4/~dmtmp/SEED with [Subvol_name: testvol-replicate-0, Err: -1 , Start: 0 , Stop: 4294967295 ], 
[2014-07-20 16:35:09.982406] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: bd4e3151: /run8054_dbench/clients => -1 (Directory not empty)
[2014-07-20 16:35:10.110203] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: 61503151: /run8054_dbench/clients => -1 (Directory not empty)
[2014-07-20 16:35:10.192822] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: 73513151: /run8054_dbench/clients => -1 (Directory not empty)
[2014-07-20 16:35:10.226959] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: fe513151: /run8054_dbench/clients => -1 (Directory not empty)
[2014-07-20 16:35:10.248190] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: 45523151: /run8054_dbench/clients => -1 (Directory not empty)
[2014-07-20 16:35:10.269472] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: a6523151: /run8054_dbench/clients => -1 (Directory not empty)
[2014-07-20 16:35:10.293155] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: e533151: /run8054_dbench/clients => -1 (Directory not empty)
[2014-07-20 16:35:10.296735] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: 21533151: /run8054_dbench/clients => -1 (Directory not empty)
[2014-07-20 16:35:10.361451] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: fa533151: /run8054_dbench/clients => -1 (Directory not empty)
[2014-07-20 16:35:10.370009] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: 9543151: /run8054_dbench/clients => -1 (Directory not empty)
[2014-07-20 16:35:10.983422] W [client-rpc-fops.c:1354:client3_3_access_cbk] 0-testvol-client-0: remote operation failed: Stale file handle
[2014-07-20 16:35:10.984042] W [MSGID: 108008] [afr-read-txn.c:218:afr_read_txn] 0-testvol-replicate-0: Unreadable subvolume -1 found with event generation 2. (Possible split-brain)
[2014-07-20 16:35:10.985508] W [nfs3.c:1532:nfs3svc_access_cbk] 0-nfs: 50563151: <gfid:8210c421-4775-4577-8442-321c602cac46> => -1 (Stale file handle)
[2014-07-20 16:35:10.985553] W [nfs3-helpers.c:3401:nfs3_log_common_res] 0-nfs-nfsv3: XID: 50563151, ACCESS: NFS: 70(Invalid file handle), POSIX: 116(Stale file handle)
[2014-07-20 16:35:10.988978] W [nfs3.c:1532:nfs3svc_access_cbk] 0-nfs: 51563151: <gfid:8210c421-4775-4577-8442-321c602cac46> => -1 (Stale file handle)
[2014-07-20 16:35:10.989027] W [nfs3-helpers.c:3401:nfs3_log_common_res] 0-nfs-nfsv3: XID: 51563151, ACCESS: NFS: 70(Invalid file handle), POSIX: 116(Stale file handle)
The message "W [MSGID: 108008] [afr-read-txn.c:218:afr_read_txn] 0-testvol-replicate-0: Unreadable subvolume -1 found with event generation 2. (Possible split-brain)" repeated 2 times between [2014-07-20 16:35:10.984042] and [2014-07-20 16:35:10.987961]
[2014-07-20 16:42:24.894019] W [socket.c:529:__socket_rwv] 0-NLM-client: readv on 192.168.11.102:49213 failed (No data available)
[2014-07-20 16:42:24.894380] E [rpc-transport.c:485:rpc_transport_unref] (-->/usr/lib64/glusterfs/3.5qa2/xlator/nfs/server.so(nlm_rpcclnt_notify+0x5a) [0x7f6b730c58ea] (-->/usr/lib64/glusterfs/3.5qa2/xlator/nfs/server.so(nlm_unset_rpc_clnt+0x75) [0x7f6b730c13e5] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_unref+0x63) [0x7f6b844fb693]))) 0-rpc_transport: invalid argument: this

So this is reproducible on a 1x2.

Comment 4 Ben Turner 2014-07-21 16:04:47 UTC
I am starting to doubt that dbench is even trying to delete the clientX directories.  When I ran the same test on the local file system, the clientX directories were not deleted; only the files were.  I confirmed this by testing with and without the --skip-cleanup option.  This leads me to question why these log entries were created:

[2014-07-20 16:35:09.982406] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: bd4e3151: /run8054_dbench/clients => -1 (Directory not empty)
[2014-07-20 16:35:10.110203] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: 61503151: /run8054_dbench/clients => -1 (Directory not empty)
[2014-07-20 16:35:10.192822] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: 73513151: /run8054_dbench/clients => -1 (Directory not empty)
[2014-07-20 16:35:10.226959] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: fe513151: /run8054_dbench/clients => -1 (Directory not empty)
[2014-07-20 16:35:10.248190] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: 45523151: /run8054_dbench/clients => -1 (Directory not empty)
[2014-07-20 16:35:10.269472] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: a6523151: /run8054_dbench/clients => -1 (Directory not empty)
[2014-07-20 16:35:10.293155] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: e533151: /run8054_dbench/clients => -1 (Directory not empty)
[2014-07-20 16:35:10.296735] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: 21533151: /run8054_dbench/clients => -1 (Directory not empty)
[2014-07-20 16:35:10.361451] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: fa533151: /run8054_dbench/clients => -1 (Directory not empty)
[2014-07-20 16:35:10.370009] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: 9543151: /run8054_dbench/clients => -1 (Directory not empty)

My initial assumption was that dbench was trying to do something similar to rm -rf on the clients directory, but this is looking like it's not the case.  I think there are 3 separate issues here:

1.  When creating directories I see:

[2014-07-20 16:29:50.941538] I [MSGID: 109036] [dht-common.c:5694:dht_log_new_layout_for_dir_selfheal] 0-testvol-dht: Setting layout of /run8054_dbench/clients/client4/~dmtmp/SEED with [Subvol_name: testvol-replicate-0, Err: -1 , Start: 0 , Stop: 4294967295 ],

2.  When dbench is cleaning up the files it created I see:

[2014-07-20 16:35:10.370009] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: 9543151: /run8054_dbench/clients => -1 (Directory not empty)

3.  Occasionally after cleaning up the files I see:

[2014-07-20 16:42:24.894019] W [socket.c:529:__socket_rwv] 0-NLM-client: readv on 192.168.11.102:49213 failed (No data available)
[2014-07-20 16:42:24.894380] E [rpc-transport.c:485:rpc_transport_unref] (-->/usr/lib64/glusterfs/3.5qa2/xlator/nfs/server.so(nlm_rpcclnt_notify+0x5a) [0x7f6b730c58ea] (-->/usr/lib64/glusterfs/3.5qa2/xlator/nfs/server.so(nlm_unset_rpc_clnt+0x75) [0x7f6b730c13e5] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_unref+0x63) [0x7f6b844fb693]))) 0-rpc_transport: invalid argument: this
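
For context on issue 2: "Directory not empty" is ordinary POSIX rmdir semantics (ENOTEMPTY) surfaced through the NFS RMDIR path, so the warning by itself only proves the directory still had entries at the moment of the call.  A minimal local sketch (paths are illustrative, nothing Gluster-specific):

```shell
#!/bin/sh
# Recreate the ENOTEMPTY condition behind the nfs3svc_rmdir_cbk warnings.
workdir=$(mktemp -d)
mkdir -p "$workdir/clients/client0"
touch "$workdir/clients/client0/SEED"

# rmdir(2) refuses a non-empty directory: this is the same
# "Directory not empty" error the NFS server logs.
if rmdir "$workdir/clients" 2>/dev/null; then
    echo "unexpected: rmdir succeeded"
fi

# rm -rf deletes children first, so the same tree removes cleanly.
rm -rf "$workdir/clients"
```

If the client issues RMDIR on clients/ while files or subdirectories still exist, this warning would be expected behavior rather than a deletion bug.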

After digging deeper into the issue, I don't think my initial take is valid anymore.  I do not think there is a problem deleting directories, but it is not clear why I am seeing the directory not empty warnings.  A couple of other observations:

-I do not see this issue on glusterfs mounts like I initially thought; the problem I am seeing on glusterfs mounts is documented in:

https://bugzilla.redhat.com/show_bug.cgi?id=1112402

-I tested this on a 2x1, on a 1x2, and on a single-brick volume; it was reproducible on each.  I think what I am seeing is specific to the NFS translator.

Question:

I no longer think dbench is actually deleting the clientX directories, as when I run on a local filesystem they are still there after cleanup.  Should I close this BZ and split it into 3 different BZs to address each of the logging issues I am seeing?

Comment 5 santosh pradhan 2014-07-22 17:05:54 UTC

Yes, I am CLOSING the defect, as the summary of the issue no longer holds: dbench works as expected (the same way it works with kernel NFS).