Bug 1121347 - NFS: When running rm -rf files / directories are not deleted.
Summary: NFS: When running rm -rf files / directories are not deleted.
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: GlusterFS
Classification: Community
Component: nfs
Version: mainline
Hardware: Unspecified
OS: Unspecified
medium
medium
Target Milestone: ---
Assignee: GlusterFS Bugs list
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 1121348
TreeView+ depends on / blocked
 
Reported: 2014-07-19 14:47 UTC by Ben Turner
Modified: 2014-07-22 17:06 UTC (History)
2 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1121348 (view as bug list)
Environment:
Last Closed: 2014-07-22 17:05:54 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:


Attachments (Terms of Use)

Description Ben Turner 2014-07-19 14:47:31 UTC
Description of problem:

When running rm -rf on NFS mounts files and directories do not get deleted.  The easiest way I found to reproduce this is by running dbench as it creates a bunch of files / directories and cleans them up.

Version-Release number of selected component (if applicable):

glusterfs-3.5qa2-0.729.gitc690c11.el6.x86_64

How reproducible:

This happens about 50% of the time when running dbench.

Steps to Reproduce:
1.  yum install dbench from epel
2.  export PATH=$PATH:/opt/qa/tools
3.  DBENCH_BIN_PATH=${DBENCH_BIN_PATH:=/opt/qa/tools}
4.  DBENCH_TIME=${DBENCH_TIME:=300}
5.  DBENCH_CLIENTS=${DBENCH_CLIENTS:=10}
6.  cp /usr/share/dbench/client.txt $DBENCH_BIN_PATH
7.  dbench -t $DBENCH_TIME -c $DBENCH_BIN_PATH/client.txt -s -S $DBENCH_CLIENTS

Actual results:

The clients directory, clientsXX subdirectories, and some files are not deleted when dbench cleans up.

Expected results:

All directories and flies are deleted.

Additional info:

I opened this specifically on NFS but it looks to be reproducible on glusterfs mounts as well.  It seems to be more easily reproduced on NFS mounts though.

Comment 1 Ben Turner 2014-07-19 14:53:58 UTC
Dbench doesn't return a fail when removing the files, the only way I know the directories / files weren't removed was to check:

[root@gqac019 gluster-mount]# ll clients/
total 0
drwx------ 3 root root 19 Jul 19 09:56 client0
drwx------ 3 root root 19 Jul 19 09:56 client1
drwx------ 3 root root 19 Jul 19 09:56 client2
drwx------ 3 root root 19 Jul 19 09:56 client3
drwx------ 3 root root 19 Jul 19 09:56 client4
drwx------ 3 root root 19 Jul 19 09:56 client5
drwx------ 3 root root 19 Jul 19 09:56 client6
drwx------ 3 root root 19 Jul 19 09:56 client7
drwx------ 3 root root 19 Jul 19 09:56 client8
drwx------ 3 root root 19 Jul 19 09:56 client9

If I watch the nfs.log on the node I am mounting I see:

This spams the logs when files / dirs are getting created:

[2014-07-19 13:51:28.722206] I [MSGID: 109036] [dht-common.c:5694:dht_log_new_layout_for_dir_selfheal] 0-testvol-dht: Setting layout of /clients/client7/~dmtmp/SEED with [Subvol_name: testvol-replicate-0, Err: -1 , Start: 0 , Stop: 2147483646 ], [Subvol_name: testvol-replicate-1, Err: -1 , Start: 2147483647 , Stop: 4294967295 ], 
[2014-07-19 13:51:28.964684] I [MSGID: 109036] [dht-common.c:5694:dht_log_new_layout_for_dir_selfheal] 0-testvol-dht: Setting layout of /clients/client8/~dmtmp/SEED with [Subvol_name: testvol-replicate-0, Err: -1 , Start: 2147483647 , Stop: 4294967295 ], [Subvol_name: testvol-replicate-1, Err: -1 , Start: 0 , Stop: 2147483646 ], 

When dbench goes to delete the files / dirs it created I see:

[2014-07-19 13:56:44.554270] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: 3a17c1a: /clients => -1 (Directory not empty)
[2014-07-19 13:56:44.575989] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: 6fa17c1a: /clients => -1 (Directory not empty)
[2014-07-19 13:56:44.584309] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: a4a17c1a: /clients => -1 (Directory not empty)
[2014-07-19 13:56:44.589548] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: c4a17c1a: /clients => -1 (Directory not empty)
[2014-07-19 13:56:44.593718] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: d6a17c1a: /clients => -1 (Directory not empty)
[2014-07-19 13:56:44.598136] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: f0a17c1a: /clients => -1 (Directory not empty)
[2014-07-19 13:56:44.621565] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: 6aa27c1a: /clients => -1 (Directory not empty)
[2014-07-19 13:56:44.646917] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: d4a27c1a: /clients => -1 (Directory not empty)
[2014-07-19 13:56:44.657862] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: 7a37c1a: /clients => -1 (Directory not empty)
[2014-07-19 13:56:44.663313] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: 28a37c1a: /clients => -1 (Directory not empty)
[2014-07-19 14:02:46.091288] W [socket.c:529:__socket_rwv] 0-NLM-client: readv on 192.168.11.102:45823 failed (No data available)
[2014-07-19 14:02:46.091418] E [rpc-transport.c:485:rpc_transport_unref] (-->/usr/lib64/glusterfs/3.5qa2/xlator/nfs/server.so(nlm_rpcclnt_notify+0x5a) [0x7f53775128ea] (-->/usr/lib64/glusterfs/3.5qa2/xlator/nfs/server.so(nlm_unset_rpc_clnt+0x75) [0x7f537750e3e5] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_unref+0x63) [0x7f5388914693]))) 0-rpc_transport: invalid argument: this

Comment 2 Ben Turner 2014-07-19 15:04:16 UTC
I just noticed that all of the clientXX sub directories appear to be empty.  I am able to create a file though:

[root@gqac019 gluster-mount]# cd clients/client0/~dmtmp/ACCESS/
[root@gqac019 ACCESS]# touch myfile
[root@gqac019 ACCESS]# ll
total 0
-rw-r--r-- 1 root root 0 Jul 19 11:03 myfile

Comment 3 Ben Turner 2014-07-20 17:08:44 UTC
I ran on a replicate volume:

[root@gqac007 ~]# gluster v info
 
Volume Name: testvol
Type: Replicate
Volume ID: d179d723-ece1-43ac-a1df-7b9f267f56e4
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: gqac007.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick0
Brick2: gqac012.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick1
Options Reconfigured:
snap-max-hard-limit: 256
snap-max-soft-limit: 90
auto-delete: disable

And the last 20 lines of nfs.log:

[root@gqac007 ~]# tail /var/log/glusterfs/nfs.log -n 20
[2014-07-20 16:29:50.941538] I [MSGID: 109036] [dht-common.c:5694:dht_log_new_layout_for_dir_selfheal] 0-testvol-dht: Setting layout of /run8054_dbench/clients/client4/~dmtmp/SEED with [Subvol_name: testvol-replicate-0, Err: -1 , Start: 0 , Stop: 4294967295 ], 
[2014-07-20 16:35:09.982406] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: bd4e3151: /run8054_dbench/clients => -1 (Directory not empty)
[2014-07-20 16:35:10.110203] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: 61503151: /run8054_dbench/clients => -1 (Directory not empty)
[2014-07-20 16:35:10.192822] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: 73513151: /run8054_dbench/clients => -1 (Directory not empty)
[2014-07-20 16:35:10.226959] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: fe513151: /run8054_dbench/clients => -1 (Directory not empty)
[2014-07-20 16:35:10.248190] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: 45523151: /run8054_dbench/clients => -1 (Directory not empty)
[2014-07-20 16:35:10.269472] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: a6523151: /run8054_dbench/clients => -1 (Directory not empty)
[2014-07-20 16:35:10.293155] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: e533151: /run8054_dbench/clients => -1 (Directory not empty)
[2014-07-20 16:35:10.296735] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: 21533151: /run8054_dbench/clients => -1 (Directory not empty)
[2014-07-20 16:35:10.361451] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: fa533151: /run8054_dbench/clients => -1 (Directory not empty)
[2014-07-20 16:35:10.370009] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: 9543151: /run8054_dbench/clients => -1 (Directory not empty)
[2014-07-20 16:35:10.983422] W [client-rpc-fops.c:1354:client3_3_access_cbk] 0-testvol-client-0: remote operation failed: Stale file handle
[2014-07-20 16:35:10.984042] W [MSGID: 108008] [afr-read-txn.c:218:afr_read_txn] 0-testvol-replicate-0: Unreadable subvolume -1 found with event generation 2. (Possible split-brain)
[2014-07-20 16:35:10.985508] W [nfs3.c:1532:nfs3svc_access_cbk] 0-nfs: 50563151: <gfid:8210c421-4775-4577-8442-321c602cac46> => -1 (Stale file handle)
[2014-07-20 16:35:10.985553] W [nfs3-helpers.c:3401:nfs3_log_common_res] 0-nfs-nfsv3: XID: 50563151, ACCESS: NFS: 70(Invalid file handle), POSIX: 116(Stale file handle)
[2014-07-20 16:35:10.988978] W [nfs3.c:1532:nfs3svc_access_cbk] 0-nfs: 51563151: <gfid:8210c421-4775-4577-8442-321c602cac46> => -1 (Stale file handle)
[2014-07-20 16:35:10.989027] W [nfs3-helpers.c:3401:nfs3_log_common_res] 0-nfs-nfsv3: XID: 51563151, ACCESS: NFS: 70(Invalid file handle), POSIX: 116(Stale file handle)
The message "W [MSGID: 108008] [afr-read-txn.c:218:afr_read_txn] 0-testvol-replicate-0: Unreadable subvolume -1 found with event generation 2. (Possible split-brain)" repeated 2 times between [2014-07-20 16:35:10.984042] and [2014-07-20 16:35:10.987961]
[2014-07-20 16:42:24.894019] W [socket.c:529:__socket_rwv] 0-NLM-client: readv on 192.168.11.102:49213 failed (No data available)
[2014-07-20 16:42:24.894380] E [rpc-transport.c:485:rpc_transport_unref] (-->/usr/lib64/glusterfs/3.5qa2/xlator/nfs/server.so(nlm_rpcclnt_notify+0x5a) [0x7f6b730c58ea] (-->/usr/lib64/glusterfs/3.5qa2/xlator/nfs/server.so(nlm_unset_rpc_clnt+0x75) [0x7f6b730c13e5] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_unref+0x63) [0x7f6b844fb693]))) 0-rpc_transport: invalid argument: this

So this is reproducible on a 1x2.

Comment 4 Ben Turner 2014-07-21 16:04:47 UTC
I am starting to doubt that dbench is trying to delete the clientsX directories.  When I ran the same test on the local file system I saw that the clientX directories were not deleted, only the files were deleted.  I confirmed this by testing with and without the --skip-cleanup option.  This leads me to question why the log entries were created:

[2014-07-20 16:35:09.982406] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: bd4e3151: /run8054_dbench/clients => -1 (Directory not empty)
[2014-07-20 16:35:10.110203] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: 61503151: /run8054_dbench/clients => -1 (Directory not empty)
[2014-07-20 16:35:10.192822] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: 73513151: /run8054_dbench/clients => -1 (Directory not empty)
[2014-07-20 16:35:10.226959] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: fe513151: /run8054_dbench/clients => -1 (Directory not empty)
[2014-07-20 16:35:10.248190] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: 45523151: /run8054_dbench/clients => -1 (Directory not empty)
[2014-07-20 16:35:10.269472] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: a6523151: /run8054_dbench/clients => -1 (Directory not empty)
[2014-07-20 16:35:10.293155] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: e533151: /run8054_dbench/clients => -1 (Directory not empty)
[2014-07-20 16:35:10.296735] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: 21533151: /run8054_dbench/clients => -1 (Directory not empty)
[2014-07-20 16:35:10.361451] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: fa533151: /run8054_dbench/clients => -1 (Directory not empty)
[2014-07-20 16:35:10.370009] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: 9543151: /run8054_dbench/clients => -1 (Directory not empty)

My initial assumption was that dbench was a trying do something similar to rm -rf on the clients directory, but this is looking like its not the case.  I think there are 3 separate issues here:

1.  When creating directories I see:

[2014-07-20 16:29:50.941538] I [MSGID: 109036] [dht-common.c:5694:dht_log_new_layout_for_dir_selfheal] 0-testvol-dht: Setting layout of /run8054_dbench/clients/client4/~dmtmp/SEED with [Subvol_name: testvol-replicate-0, Err: -1 , Start: 0 , Stop: 4294967295 ],

2.  When dbench is cleaning up the files it created I see:

[2014-07-20 16:35:10.370009] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: 9543151: /run8054_dbench/clients => -1 (Directory not empty)

3.  Occasionally after cleaning up the files I see:

[2014-07-20 16:42:24.894019] W [socket.c:529:__socket_rwv] 0-NLM-client: readv on 192.168.11.102:49213 failed (No data available)
[2014-07-20 16:42:24.894380] E [rpc-transport.c:485:rpc_transport_unref] (-->/usr/lib64/glusterfs/3.5qa2/xlator/nfs/server.so(nlm_rpcclnt_notify+0x5a) [0x7f6b730c58ea] (-->/usr/lib64/glusterfs/3.5qa2/xlator/nfs/server.so(nlm_unset_rpc_clnt+0x75) [0x7f6b730c13e5] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_unref+0x63) [0x7f6b844fb693]))) 0-rpc_transport: invalid argument: this

After digging deeper into the issue I don't think my initial take is valid anymore.  I do not think there is a problem deleting directories but it is not clear why I am seeing the directory not empty warnings.  A couple other observations:

-I do not see this issue on glusterfs mounts like I initially thought, the problem I am seeing on glusterfs mounts is documented in:

https://bugzilla.redhat.com/show_bug.cgi?id=1112402

-I tested this on 2x1, 2x1, and on a single brick volume, it was reproducible on each.  I think what I am seeing is specific to the NFS translator.

Question:

I do not think dbench is actually deleting the clientX directories anymore as when I run on a local filesystem they are still there after cleanup.  Should I close this BZ and split it into 3 different BZs to address each of the logging issues I am seeing?

Comment 5 santosh pradhan 2014-07-22 17:05:54 UTC

Yes, I am CLOSING the defect as The summary of the issue does not hold good. dbench works as expected (the way it works with kernel NFS).


Note You need to log in before you can comment on or make changes to this bug.