Bug 1131044
Summary: | DHT : - renaming same file from multiple mount failed with - 'Structure needs cleaning' error on all mount | ||
---|---|---|---|
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Rachana Patel <racpatel> |
Component: | distribute | Assignee: | Krutika Dhananjay <kdhananj> |
Status: | CLOSED ERRATA | QA Contact: | storage-qa-internal <storage-qa-internal> |
Severity: | medium | Docs Contact: | |
Priority: | high | ||
Version: | rhgs-3.0 | CC: | achauras, annair, kdhananj, mzywusko, nbalacha, nsathyan, smohan |
Target Milestone: | --- | ||
Target Release: | RHGS 3.1.0 | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | glusterfs-3.7.1-1 | Doc Type: | Bug Fix |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2015-07-29 04:35:11 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 969298 | ||
Bug Blocks: | 1202842 |
Description
Rachana Patel
2014-08-18 12:31:51 UTC
Reason for 'Structure needs cleaning' errors:

Logs from one of the mounts that saw the 'Structure needs cleaning' error suggest that link creation failed with ENOENT.

<log>
[2014-08-18 09:21:48.332200] W [client-rpc-fops.c:2607:client3_3_link_cbk] 16-test0-client-2: remote operation failed: No such file or directory (/d10 -> /e10)
[2014-08-18 09:21:48.333029] W [MSGID: 109034] [dht-rename.c:402:dht_rename_unlink_cbk] 16-test0-dht: /d10: Rename: unlink on test0-client-2 failed [No such file or directory]
[2014-08-18 09:21:48.333077] W [fuse-bridge.c:1727:fuse_rename_cbk] 0-glusterfs-fuse: 1425: /d10 -> /e10 => -1 (Structure needs cleaning)
</log>

In dht_rename(), dht_local_init() initialises local->op_errno to EUCLEAN. When link creation then fails with ENOENT (in dht_rename_links_cbk()), op_errno does not seem to be set appropriately. No later stage of the rename in this code path sets local->op_errno either, so DHT unwinds the rename failure with EUCLEAN, which the mount reports as 'Structure needs cleaning'.

Why rename() didn't succeed on any of the mounts still needs some investigation.

Verified this by carrying out parallel renames across different terminals simultaneously. I could see the rename happening from one node and "Device or resource busy" or "No such file or directory" on the other nodes. Following are snippets:

Node 1:

[root@dht-rhs-24 test]# ls -ltrh
total 0
-rw-r--r--. 1 root root 0 Jun 17 22:57 a6
-rw-r--r--. 1 root root 0 Jun 17 22:57 a5
-rw-r--r--. 1 root root 0 Jun 17 22:57 a4
-rw-r--r--. 1 root root 0 Jun 17 22:57 a9
-rw-r--r--. 1 root root 0 Jun 17 22:57 a8
-rw-r--r--. 1 root root 0 Jun 17 22:57 a7
-rw-r--r--. 1 root root 0 Jun 17 22:57 a3
-rw-r--r--. 1 root root 0 Jun 17 22:57 a2
-rw-r--r--. 1 root root 0 Jun 17 22:57 a10
-rw-r--r--. 1 root root 0 Jun 17 22:57 a1
[root@dht-rhs-24 test]# for i in {1..10}; do mv a$i b$i; done
mv: cannot move `a3' to `b3': Device or resource busy
mv: cannot move `a5' to `b5': Device or resource busy
mv: cannot move `a6' to `b6': Device or resource busy
mv: cannot move `a9' to `b9': Device or resource busy

Node 2:

[root@dht-rhs-23 test]# for i in {1..10}; do mv a$i b$i; done
mv: `a1' and `b1' are the same file
mv: cannot move `a2' to `b2': Device or resource busy
mv: `a4' and `b4' are the same file
mv: cannot move `a5' to `b5': Device or resource busy
mv: `a7' and `b7' are the same file
mv: cannot move `a8' to `b8': Device or resource busy
mv: overwrite `b10'? y
mv: cannot remove `a10': No such file or directory
[root@dht-rhs-23 test]#

Node 3:

[root@amit-lappy test]# for i in {1..10}; do mv a$i b$i; done
mv: cannot move ‘a2’ to ‘b2’: Remote I/O error
mv: cannot move ‘a3’ to ‘b3’: Remote I/O error
mv: cannot move ‘a4’ to ‘b4’: Remote I/O error
mv: cannot move ‘a6’ to ‘b6’: No such file or directory
mv: cannot stat ‘a7’: No such file or directory
mv: cannot move ‘a8’ to ‘b8’: Remote I/O error
mv: cannot move ‘a9’ to ‘b9’: Remote I/O error
mv: cannot move ‘a10’ to ‘b10’: Remote I/O error

[root@dht-rhs-23 test]# ls -lthr
total 0
-rw-r--r--. 1 root root 0 Jun 17 22:57 b6
-rw-r--r--. 1 root root 0 Jun 17 22:57 b5
-rw-r--r--. 1 root root 0 Jun 17 22:57 b4
-rw-r--r--. 1 root root 0 Jun 17 22:57 b9
-rw-r--r--. 1 root root 0 Jun 17 22:57 b8
-rw-r--r--. 1 root root 0 Jun 17 22:57 b7
-rw-r--r--. 1 root root 0 Jun 17 22:57 b3
-rw-r--r--. 1 root root 0 Jun 17 22:57 b2
-rw-r--r--. 1 root root 0 Jun 17 22:57 b10
-rw-r--r--. 1 root root 0 Jun 17 22:57 b1

Marking the bug verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://rhn.redhat.com/errata/RHSA-2015-1495.html
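
The errno handling described in the root-cause analysis of this report can be illustrated with a small standalone C sketch. This is not GlusterFS code and not the actual fix shipped in the advisory. The names struct rename_local, local_init(), link_cbk_buggy(), link_cbk_fixed(), rename_errno_buggy() and rename_errno_fixed() are hypothetical stand-ins, chosen only to show how a default of EUCLEAN can leak out to the caller when a failing callback never records the real error.

```c
#include <errno.h>

#ifndef EUCLEAN
#define EUCLEAN 117 /* Linux value; fallback for platforms that lack it */
#endif

/* Hypothetical stand-in for the per-call state; not the real DHT local. */
struct rename_local {
    int op_ret;
    int op_errno;
};

/* Mirrors the behaviour described in the report: the errno starts out
 * as EUCLEAN until some stage of the rename overwrites it. */
static void local_init(struct rename_local *local)
{
    local->op_ret = -1;
    local->op_errno = EUCLEAN;
}

/* Buggy pattern: the failure is noticed, but the errno reported by the
 * failed link operation is dropped, so the default survives. */
static void link_cbk_buggy(struct rename_local *local, int op_ret, int op_errno)
{
    (void)op_errno; /* never recorded */
    if (op_ret == -1)
        local->op_ret = -1;
}

/* One possible correction (an assumption, not the shipped patch):
 * record the errno of the failed operation before unwinding. */
static void link_cbk_fixed(struct rename_local *local, int op_ret, int op_errno)
{
    if (op_ret == -1) {
        local->op_ret = -1;
        local->op_errno = op_errno;
    }
}

/* Returns the errno the caller would see after a failed link step. */
int rename_errno_buggy(int link_errno)
{
    struct rename_local local;
    local_init(&local);
    link_cbk_buggy(&local, -1, link_errno);
    return local.op_errno;
}

int rename_errno_fixed(int link_errno)
{
    struct rename_local local;
    local_init(&local);
    link_cbk_fixed(&local, -1, link_errno);
    return local.op_errno;
}
```

Calling rename_errno_buggy(ENOENT) returns EUCLEAN, which strerror() renders on Linux as "Structure needs cleaning", while rename_errno_fixed(ENOENT) returns ENOENT. That matches the symptom in the logs above, where the link failed with "No such file or directory" but the mount saw "Structure needs cleaning".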