Bug 999496
Summary: | DHT- dist-rep volume - rm -rf is failing and giving error 'rm: cannot remove `<dir>': Is a directory | | |
---|---|---|---|
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Rachana Patel <racpatel> |
Component: | distribute | Assignee: | Susant Kumar Palai <spalai> |
Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | RajeshReddy <rmekala> |
Severity: | medium | Docs Contact: | |
Priority: | medium | | |
Version: | rhgs-3.0 | CC: | mzywusko, nbalacha, nlevinki, pkarampu, rgowdapp, rhs-bugs, rwheeler, smohan, spalai, spandura, vbellur |
Target Milestone: | --- | Keywords: | Reopened |
Target Release: | --- | | |
Hardware: | x86_64 | | |
OS: | Linux | | |
Whiteboard: | dht-rm-rf, triaged, dht-try-latest-build | | |
Fixed In Version: | | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2016-06-08 04:47:26 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Description
Rachana Patel
2013-08-21 12:20:11 UTC
Targeting for 3.0.0 (Denali) release.

From the log: after metadata self heal completed, "No such file or directory" was seen in the log for the file.

[2013-08-21 09:34:31.884434] I [afr-self-heal-common.c:2744:afr_log_self_heal_completion_status] 0-master1-replicate-2: metadata self heal is successfully completed, entry self heal is successfully completed, on /n1/1/etc40/X11/applnk
[2013-08-21 09:34:31.884544] D [afr-common.c:1388:afr_lookup_select_read_child] 0-master1-replicate-2: Source selected as 0 for /n1/1/etc40/X11/applnk
[2013-08-21 09:34:31.884717] T [fuse-bridge.c:516:fuse_entry_cbk] 0-glusterfs-fuse: 14997409: LOOKUP() /n1/1/etc40/X11/applnk => -8127724620862205718
[2013-08-21 09:34:31.884811] T [fuse-resolve.c:53:fuse_resolve_loc_touchup] 0-fuse: return value inode_path 22
[2013-08-21 09:34:31.884843] T [fuse-bridge.c:2936:fuse_opendir_resume] 0-glusterfs-fuse: 14997410: OPENDIR /n1/1/etc40/X11/applnk
[2013-08-21 09:34:31.885937] W [client-rpc-fops.c:2523:client3_3_opendir_cbk] 0-master1-client-2: remote operation failed: No such file or directory. Path: /n1/1/etc40/X11/applnk (a6b1460a-543f-4656-8f34-8585154fd0ea)
[2013-08-21 09:34:31.886024] T [afr-dir-read.c:270:afr_opendir_cbk] 0-master1-replicate-0: reading contents of directory /n1/1/etc40/X11/applnk looking for mismatch
[2013-08-21 09:34:31.887828] D [afr-dir-read.c:126:afr_examine_dir_readdir_cbk] 0-master1-replicate-2: /n1/1/etc40/X11/applnk: no entries found in master1-client-4
[2013-08-21 09:34:31.888028] D [afr-dir-read.c:126:afr_examine_dir_readdir_cbk] 0-master1-replicate-0: /n1/1/etc40/X11/applnk: no entries found in master1-client-0
[2013-08-21 09:34:31.888141] T [rpc-clnt.c:669:rpc_clnt_reply_init] 0-master1-client-1: received rpc message (RPC XID: 0x7385201x Program: GlusterFS 3.3, ProgVers: 330, Proc: 28) from rpc-transport (master1-client-1)
[2013-08-21 09:34:31.888172] D [afr-dir-read.c:126:afr_examine_dir_readdir_cbk] 0-master1-replicate-0: /n1/1/etc40/X11/applnk: no entries found in master1-client-1
[2013-08-21 09:34:31.888437] T [rpc-clnt.c:669:rpc_clnt_reply_init] 0-master1-client-5: received rpc message (RPC XID: 0x12061146x Program: GlusterFS 3.3, ProgVers: 330, Proc: 28) from rpc-transport (master1-client-5)
[2013-08-21 09:34:31.888487] D [afr-dir-read.c:126:afr_examine_dir_readdir_cbk] 0-master1-replicate-2: /n1/1/etc40/X11/applnk: no entries found in master1-client-5
[2013-08-21 09:34:31.888532] T [fuse-bridge.c:1337:fuse_fd_cbk] 0-glusterfs-fuse: 14997410: OPENDIR() /n1/1/etc40/X11/applnk => 0xbc290c
[2013-08-21 09:34:31.892753] T [fuse-bridge.c:2037:fuse_rmdir_resume] 0-glusterfs-fuse: 14997415: RMDIR /n1/1/etc40/X11/applnk
[2013-08-21 09:34:31.893865] W [client-rpc-fops.c:2523:client3_3_opendir_cbk] 0-master1-client-2: remote operation failed: No such file or directory. Path: /n1/1/etc40/X11/applnk (a6b1460a-543f-4656-8f34-8585154fd0ea)
[2013-08-21 09:34:31.893940] D [dht-common.c:4816:dht_rmdir_opendir_cbk] 0-master1-dht: opendir on master1-replicate-1 for /n1/1/etc40/X11/applnk failed (No such file or directory)
[2013-08-21 09:34:31.901125] W [fuse-bridge.c:1688:fuse_unlink_cbk] 0-glusterfs-fuse: 14997415: RMDIR() /n1/1/etc40/X11/applnk => -1 (No such file or directory)

It is not clear how the directory entry was removed from the backend on all subvols. I tried to reproduce the result with plain DHT and replica, but was not able to. (My volume had no geo-rep configuration.)

Rachana, can you try to reproduce the bug again with and without geo-rep enabled? If it is reproducible only with geo-rep enabled, should we move the component to geo-rep?

Rachana, the bug could have resulted only if an unlink call happened on a directory. Pranith and I tried different test cases on DHT to reproduce the bug, but we couldn't. Hence, can you come up with a test case for reproducing the bug?

I also tried, but found no specific test case; it is not always reproducible.

Dev ack to 3.0 RHS BZs

Sent one possible fix for the bug: http://review.gluster.org/#/c/7733/. The fix addresses the following issue.

* The POSIX_READDIRP function fills in the stat information of all the entries present in the directory. If lstat of an entry failed, it used to fill the stat information of the current entry with that of the previous entry read. For example, say the current entry was a file and the previous entry read was a directory. If the lstat of the current file failed, the stat info for the current file would be filled with that of the previous directory. Hence, the file would be treated as a directory. Now one of the following two scenarios may happen, as dht_readdirp takes directory entries only from the first up subvolume.
1) If the file (now a directory for DHT because of the wrong stat) is not present on the first_up_subvolume, then it won't be processed for deletion.
2) Even if it is present on the first_up_subvolume, an rmdir call will be sent for the file (corrupted stat), which will result in a "Not a directory" error. And we will see a "Directory not empty" error while trying to unlink the parent directory.

*** This bug has been marked as a duplicate of bug 960910 ***

Marked as a duplicate because the fix, http://review.gluster.org/#/c/7733/, is a possible fix for both the "Directory not empty" and "Is a directory" errors. Please reopen this bug if it is reproduced in the future.

Got this error once with build 3.6.0.24-1.el6rhs.x86_64. The logs got cleared, but I will try to reproduce again and upload the logs.

*** Bug 1115379 has been marked as a duplicate of this bug. ***

triage-update: Dev will test it out and take a call after that.
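
The stale-stat problem described in the fix comment above (a failed lstat leaving the previous entry's stat attached to the current entry) can be illustrated with a small, self-contained Python sketch. This is not GlusterFS code and is not taken from the patch at review 7733: the function name `read_dir_with_stats`, the `fail_on` parameter (which forces an lstat failure to stand in for an entry vanishing between readdir and lstat), and the `reset_on_failure` remedy are all hypothetical, chosen only to show the pattern.

```python
import os
import stat
import tempfile


def read_dir_with_stats(path, fail_on=(), reset_on_failure=False):
    """Return (name, looks_like_dir) pairs for the entries in `path`.

    Hypothetical model of a readdirp-style loop that reuses one stat
    buffer across entries. Names listed in `fail_on` have their lstat
    forced to fail.
    """
    results = []
    st = None  # one shared buffer, reused on every iteration
    for name in sorted(os.listdir(path)):
        try:
            if name in fail_on:
                raise FileNotFoundError(name)  # simulated lstat failure
            st = os.lstat(os.path.join(path, name))
        except OSError:
            if reset_on_failure:
                st = None  # assumed remedy: discard the stale stat
            # otherwise st still holds the previous entry's stat
        looks_like_dir = st is not None and stat.S_ISDIR(st.st_mode)
        results.append((name, looks_like_dir))
    return results


def demo():
    """Build a directory holding one subdirectory and one file, then
    list it with and without the stale-buffer reset."""
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "a_dir"))
        with open(os.path.join(root, "b_file"), "w"):
            pass
        stale = read_dir_with_stats(root, fail_on={"b_file"})
        reset = read_dir_with_stats(root, fail_on={"b_file"},
                                    reset_on_failure=True)
    return stale, reset


if __name__ == "__main__":
    stale, reset = demo()
    print("stale buffer:", stale)
    print("reset buffer:", reset)
```

Without the reset, `b_file` is reported as a directory because it inherits the stat of `a_dir`, which is the misclassification that leads to the two scenarios listed above. With the reset, the entry is no longer reported as a directory; a caller would still need to decide how to handle an entry that has no stat information.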