Description of problem:
When running rm -rf on NFS mounts, files and directories do not get deleted. The easiest way I found to reproduce this is by running dbench, as it creates a number of files and directories and then cleans them up.

Version-Release number of selected component (if applicable):
glusterfs-3.5qa2-0.729.gitc690c11.el6.x86_64

How reproducible:
This happens about 50% of the time when running dbench.

Steps to Reproduce:
1. yum install dbench (from EPEL)
2. export PATH=$PATH:/opt/qa/tools
3. DBENCH_BIN_PATH=${DBENCH_BIN_PATH:=/opt/qa/tools}
4. DBENCH_TIME=${DBENCH_TIME:=300}
5. DBENCH_CLIENTS=${DBENCH_CLIENTS:=10}
6. cp /usr/share/dbench/client.txt $DBENCH_BIN_PATH
7. dbench -t $DBENCH_TIME -c $DBENCH_BIN_PATH/client.txt -s -S $DBENCH_CLIENTS

Actual results:
The clients directory, the clientX subdirectories, and some files are not deleted when dbench cleans up.

Expected results:
All directories and files are deleted.

Additional info:
I opened this specifically against NFS, but it looks to be reproducible on glusterfs mounts as well. It seems to be more easily reproduced on NFS mounts, though.
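For convenience, the reproduction steps above can be collapsed into a single script. This is only a sketch of the steps as written; it assumes dbench is installed from EPEL and that /opt/qa/tools exists as in our QE environment, and it skips the dbench run when the binary is missing:

```shell
#!/bin/sh
# Consolidated reproducer for the dbench run described above.
# Defaults match steps 3-5; override any of them via the environment.
DBENCH_BIN_PATH=${DBENCH_BIN_PATH:=/opt/qa/tools}
DBENCH_TIME=${DBENCH_TIME:=300}
DBENCH_CLIENTS=${DBENCH_CLIENTS:=10}
export PATH=$PATH:$DBENCH_BIN_PATH

if command -v dbench >/dev/null 2>&1; then
    # dbench ships a sample workload description; copy it next to the
    # tools directory as in step 6, then run with synchronous file (-s)
    # and directory (-S) operations.
    cp /usr/share/dbench/client.txt "$DBENCH_BIN_PATH"
    dbench -t "$DBENCH_TIME" -c "$DBENCH_BIN_PATH/client.txt" -s -S "$DBENCH_CLIENTS"
else
    echo "dbench not found; install it first (yum install dbench, from EPEL)"
fi
```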
dbench doesn't report a failure when removing the files; the only way I found out that the directories/files weren't removed was to check:

[root@gqac019 gluster-mount]# ll clients/
total 0
drwx------ 3 root root 19 Jul 19 09:56 client0
drwx------ 3 root root 19 Jul 19 09:56 client1
drwx------ 3 root root 19 Jul 19 09:56 client2
drwx------ 3 root root 19 Jul 19 09:56 client3
drwx------ 3 root root 19 Jul 19 09:56 client4
drwx------ 3 root root 19 Jul 19 09:56 client5
drwx------ 3 root root 19 Jul 19 09:56 client6
drwx------ 3 root root 19 Jul 19 09:56 client7
drwx------ 3 root root 19 Jul 19 09:56 client8
drwx------ 3 root root 19 Jul 19 09:56 client9

If I watch nfs.log on the node I am mounting from, the following spams the log while files/dirs are being created:

[2014-07-19 13:51:28.722206] I [MSGID: 109036] [dht-common.c:5694:dht_log_new_layout_for_dir_selfheal] 0-testvol-dht: Setting layout of /clients/client7/~dmtmp/SEED with [Subvol_name: testvol-replicate-0, Err: -1 , Start: 0 , Stop: 2147483646 ], [Subvol_name: testvol-replicate-1, Err: -1 , Start: 2147483647 , Stop: 4294967295 ],
[2014-07-19 13:51:28.964684] I [MSGID: 109036] [dht-common.c:5694:dht_log_new_layout_for_dir_selfheal] 0-testvol-dht: Setting layout of /clients/client8/~dmtmp/SEED with [Subvol_name: testvol-replicate-0, Err: -1 , Start: 2147483647 , Stop: 4294967295 ], [Subvol_name: testvol-replicate-1, Err: -1 , Start: 0 , Stop: 2147483646 ],

When dbench goes to delete the files/dirs it created, I see:

[2014-07-19 13:56:44.554270] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: 3a17c1a: /clients => -1 (Directory not empty)
[2014-07-19 13:56:44.575989] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: 6fa17c1a: /clients => -1 (Directory not empty)
[2014-07-19 13:56:44.584309] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: a4a17c1a: /clients => -1 (Directory not empty)
[2014-07-19 13:56:44.589548] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: c4a17c1a: /clients => -1 (Directory not empty)
[2014-07-19 13:56:44.593718] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: d6a17c1a: /clients => -1 (Directory not empty)
[2014-07-19 13:56:44.598136] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: f0a17c1a: /clients => -1 (Directory not empty)
[2014-07-19 13:56:44.621565] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: 6aa27c1a: /clients => -1 (Directory not empty)
[2014-07-19 13:56:44.646917] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: d4a27c1a: /clients => -1 (Directory not empty)
[2014-07-19 13:56:44.657862] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: 7a37c1a: /clients => -1 (Directory not empty)
[2014-07-19 13:56:44.663313] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: 28a37c1a: /clients => -1 (Directory not empty)
[2014-07-19 14:02:46.091288] W [socket.c:529:__socket_rwv] 0-NLM-client: readv on 192.168.11.102:45823 failed (No data available)
[2014-07-19 14:02:46.091418] E [rpc-transport.c:485:rpc_transport_unref] (-->/usr/lib64/glusterfs/3.5qa2/xlator/nfs/server.so(nlm_rpcclnt_notify+0x5a) [0x7f53775128ea] (-->/usr/lib64/glusterfs/3.5qa2/xlator/nfs/server.so(nlm_unset_rpc_clnt+0x75) [0x7f537750e3e5] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_unref+0x63) [0x7f5388914693]))) 0-rpc_transport: invalid argument: this
I just noticed that all of the clientX subdirectories appear to be empty. I am able to create a file in them, though:

[root@gqac019 gluster-mount]# cd clients/client0/~dmtmp/ACCESS/
[root@gqac019 ACCESS]# touch myfile
[root@gqac019 ACCESS]# ll
total 0
-rw-r--r-- 1 root root 0 Jul 19 11:03 myfile
I ran on a replicate volume:

[root@gqac007 ~]# gluster v info

Volume Name: testvol
Type: Replicate
Volume ID: d179d723-ece1-43ac-a1df-7b9f267f56e4
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: gqac007.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick0
Brick2: gqac012.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick1
Options Reconfigured:
snap-max-hard-limit: 256
snap-max-soft-limit: 90
auto-delete: disable

And the last 20 lines of nfs.log:

[root@gqac007 ~]# tail /var/log/glusterfs/nfs.log -n 20
[2014-07-20 16:29:50.941538] I [MSGID: 109036] [dht-common.c:5694:dht_log_new_layout_for_dir_selfheal] 0-testvol-dht: Setting layout of /run8054_dbench/clients/client4/~dmtmp/SEED with [Subvol_name: testvol-replicate-0, Err: -1 , Start: 0 , Stop: 4294967295 ],
[2014-07-20 16:35:09.982406] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: bd4e3151: /run8054_dbench/clients => -1 (Directory not empty)
[2014-07-20 16:35:10.110203] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: 61503151: /run8054_dbench/clients => -1 (Directory not empty)
[2014-07-20 16:35:10.192822] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: 73513151: /run8054_dbench/clients => -1 (Directory not empty)
[2014-07-20 16:35:10.226959] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: fe513151: /run8054_dbench/clients => -1 (Directory not empty)
[2014-07-20 16:35:10.248190] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: 45523151: /run8054_dbench/clients => -1 (Directory not empty)
[2014-07-20 16:35:10.269472] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: a6523151: /run8054_dbench/clients => -1 (Directory not empty)
[2014-07-20 16:35:10.293155] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: e533151: /run8054_dbench/clients => -1 (Directory not empty)
[2014-07-20 16:35:10.296735] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: 21533151: /run8054_dbench/clients => -1 (Directory not empty)
[2014-07-20 16:35:10.361451] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: fa533151: /run8054_dbench/clients => -1 (Directory not empty)
[2014-07-20 16:35:10.370009] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: 9543151: /run8054_dbench/clients => -1 (Directory not empty)
[2014-07-20 16:35:10.983422] W [client-rpc-fops.c:1354:client3_3_access_cbk] 0-testvol-client-0: remote operation failed: Stale file handle
[2014-07-20 16:35:10.984042] W [MSGID: 108008] [afr-read-txn.c:218:afr_read_txn] 0-testvol-replicate-0: Unreadable subvolume -1 found with event generation 2. (Possible split-brain)
[2014-07-20 16:35:10.985508] W [nfs3.c:1532:nfs3svc_access_cbk] 0-nfs: 50563151: <gfid:8210c421-4775-4577-8442-321c602cac46> => -1 (Stale file handle)
[2014-07-20 16:35:10.985553] W [nfs3-helpers.c:3401:nfs3_log_common_res] 0-nfs-nfsv3: XID: 50563151, ACCESS: NFS: 70(Invalid file handle), POSIX: 116(Stale file handle)
[2014-07-20 16:35:10.988978] W [nfs3.c:1532:nfs3svc_access_cbk] 0-nfs: 51563151: <gfid:8210c421-4775-4577-8442-321c602cac46> => -1 (Stale file handle)
[2014-07-20 16:35:10.989027] W [nfs3-helpers.c:3401:nfs3_log_common_res] 0-nfs-nfsv3: XID: 51563151, ACCESS: NFS: 70(Invalid file handle), POSIX: 116(Stale file handle)
The message "W [MSGID: 108008] [afr-read-txn.c:218:afr_read_txn] 0-testvol-replicate-0: Unreadable subvolume -1 found with event generation 2. (Possible split-brain)" repeated 2 times between [2014-07-20 16:35:10.984042] and [2014-07-20 16:35:10.987961]
[2014-07-20 16:42:24.894019] W [socket.c:529:__socket_rwv] 0-NLM-client: readv on 192.168.11.102:49213 failed (No data available)
[2014-07-20 16:42:24.894380] E [rpc-transport.c:485:rpc_transport_unref] (-->/usr/lib64/glusterfs/3.5qa2/xlator/nfs/server.so(nlm_rpcclnt_notify+0x5a) [0x7f6b730c58ea] (-->/usr/lib64/glusterfs/3.5qa2/xlator/nfs/server.so(nlm_unset_rpc_clnt+0x75) [0x7f6b730c13e5] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_unref+0x63) [0x7f6b844fb693]))) 0-rpc_transport: invalid argument: this

So this is reproducible on a 1x2.
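For anyone triaging a similar run, the relevant warnings are easy to pull out of a long nfs.log with grep. A small self-contained illustration, using two sample lines taken verbatim from the tail output above (on a live system you would point grep at /var/log/glusterfs/nfs.log instead of a temp file):

```shell
# Write two sample nfs.log lines to a temp file, then count just the
# RMDIR failures reported by the NFS translator.
log=$(mktemp)
cat > "$log" <<'EOF'
[2014-07-20 16:35:09.982406] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: bd4e3151: /run8054_dbench/clients => -1 (Directory not empty)
[2014-07-20 16:35:10.983422] W [client-rpc-fops.c:1354:client3_3_access_cbk] 0-testvol-client-0: remote operation failed: Stale file handle
EOF

# On a live system: grep -c 'nfs3svc_rmdir_cbk' /var/log/glusterfs/nfs.log
rmdir_failures=$(grep -c 'nfs3svc_rmdir_cbk' "$log")
echo "RMDIR failures: $rmdir_failures"
rm -f "$log"
```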
I am starting to doubt that dbench even tries to delete the clientX directories. When I ran the same test on a local filesystem, the clientX directories were not deleted; only the files were. I confirmed this by testing with and without the --skip-cleanup option. This leads me to question why these log entries were created:

[2014-07-20 16:35:09.982406] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: bd4e3151: /run8054_dbench/clients => -1 (Directory not empty)
[2014-07-20 16:35:10.110203] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: 61503151: /run8054_dbench/clients => -1 (Directory not empty)
[2014-07-20 16:35:10.192822] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: 73513151: /run8054_dbench/clients => -1 (Directory not empty)
[2014-07-20 16:35:10.226959] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: fe513151: /run8054_dbench/clients => -1 (Directory not empty)
[2014-07-20 16:35:10.248190] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: 45523151: /run8054_dbench/clients => -1 (Directory not empty)
[2014-07-20 16:35:10.269472] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: a6523151: /run8054_dbench/clients => -1 (Directory not empty)
[2014-07-20 16:35:10.293155] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: e533151: /run8054_dbench/clients => -1 (Directory not empty)
[2014-07-20 16:35:10.296735] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: 21533151: /run8054_dbench/clients => -1 (Directory not empty)
[2014-07-20 16:35:10.361451] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: fa533151: /run8054_dbench/clients => -1 (Directory not empty)
[2014-07-20 16:35:10.370009] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: 9543151: /run8054_dbench/clients => -1 (Directory not empty)

My initial assumption was that dbench was trying to do something similar to rm -rf on the clients directory, but that is looking like it's not the case. I think there are 3 separate issues here:

1. When creating directories I see:

[2014-07-20 16:29:50.941538] I [MSGID: 109036] [dht-common.c:5694:dht_log_new_layout_for_dir_selfheal] 0-testvol-dht: Setting layout of /run8054_dbench/clients/client4/~dmtmp/SEED with [Subvol_name: testvol-replicate-0, Err: -1 , Start: 0 , Stop: 4294967295 ],

2. When dbench is cleaning up the files it created I see:

[2014-07-20 16:35:10.370009] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: 9543151: /run8054_dbench/clients => -1 (Directory not empty)

3. Occasionally after cleaning up the files I see:

[2014-07-20 16:42:24.894019] W [socket.c:529:__socket_rwv] 0-NLM-client: readv on 192.168.11.102:49213 failed (No data available)
[2014-07-20 16:42:24.894380] E [rpc-transport.c:485:rpc_transport_unref] (-->/usr/lib64/glusterfs/3.5qa2/xlator/nfs/server.so(nlm_rpcclnt_notify+0x5a) [0x7f6b730c58ea] (-->/usr/lib64/glusterfs/3.5qa2/xlator/nfs/server.so(nlm_unset_rpc_clnt+0x75) [0x7f6b730c13e5] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_unref+0x63) [0x7f6b844fb693]))) 0-rpc_transport: invalid argument: this

After digging deeper into the issue, I don't think my initial take is valid anymore. I do not think there is a problem deleting directories, but it is not clear why I am seeing the "Directory not empty" warnings.

A couple of other observations:
- I do not see this issue on glusterfs mounts as I initially thought; the problem I am seeing on glusterfs mounts is documented in https://bugzilla.redhat.com/show_bug.cgi?id=1112402
- I tested this on 2x1, 2x1, and on a single-brick volume, and it was reproducible on each. I think what I am seeing is specific to the NFS translator.

Question: I no longer think dbench is actually deleting the clientX directories, since they are still there after cleanup when I run on a local filesystem. Should I close this BZ and split it into 3 different BZs to address each of the logging issues I am seeing?
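To illustrate why a "Directory not empty" warning by itself need not mean deletion is broken: rmdir(2) on any non-empty directory fails with ENOTEMPTY, which is exactly the POSIX error nfs3svc_rmdir_cbk is logging. So if dbench issues an RMDIR on clients while the clientX subdirectories still exist (or never removes them at all, as the local-filesystem test suggests), these warnings are the expected outcome rather than a failed delete. A minimal local demonstration, with no Gluster or NFS involved:

```shell
# Create a "clients" tree shaped like dbench's, then try to rmdir the
# parent while a child still exists -- this produces the same ENOTEMPTY
# that nfs3svc_rmdir_cbk reports.
workdir=$(mktemp -d)
mkdir -p "$workdir/clients/client0"

if rmdir "$workdir/clients" 2>/dev/null; then
    result="unexpected success"
else
    result="Directory not empty"   # rmdir fails with ENOTEMPTY here
fi
echo "rmdir on non-empty dir: $result"

# Once the child is gone, the same rmdir succeeds.
rmdir "$workdir/clients/client0"
rmdir "$workdir/clients"
```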
Yes, I am CLOSING the defect, as the summary of the issue no longer holds good: dbench works as expected (the same way it works with kernel NFS).