Description of problem:
=======================
While renaming the same file from multiple mounts, the rename failed with a 'Structure needs cleaning' error on all mounts.

Version-Release number of selected component (if applicable):
=============================================================
3.6.0.27-1.el6rhs.x86_64

How reproducible:
=================
Intermittent

Steps to Reproduce:
===================
1. Create and mount a distributed volume.
2. Create a few files on the mount.
3. Start renaming the same file from multiple mounts (the same files were renamed from 5 mounts: 4 FUSE and 1 NFS). The destination file does not exist.

[root@OVM3 test0]# for i in {1..10}; do mv d$i e$i ; done
mv: cannot move `d2' to `e2': No such file or directory
mv: `d3' and `e3' are the same file
mv: `d4' and `e4' are the same file
mv: `d5' and `e5' are the same file
mv: cannot move `d10' to `e10': Structure needs cleaning

[root@OVM4 test0]# for i in {1..10}; do mv d$i e$i ; done
mv: `d1' and `e1' are the same file
mv: cannot move `d2' to `e2': No such file or directory
mv: overwrite `e3'? ls
mv: `d4' and `e4' are the same file
mv: `d6' and `e6' are the same file
mv: `d7' and `e7' are the same file
mv: `d8' and `e8' are the same file
mv: `d9' and `e9' are the same file
mv: cannot move `d10' to `e10': Structure needs cleaning

[root@OVM5 test0]# for i in {1..10}; do mv d$i e$i ; done
mv: `d1' and `e1' are the same file
mv: cannot move `d2' to `e2': No such file or directory
mv: `d3' and `e3' are the same file
mv: `d4' and `e4' are the same file
mv: overwrite `e5'? ls
mv: `d6' and `e6' are the same file
mv: `d7' and `e7' are the same file
mv: `d8' and `e8' are the same file
mv: `d9' and `e9' are the same file
mv: cannot move `d10' to `e10': Structure needs cleaning

[root@OVM1 test0]# for i in {1..10}; do mv d$i e$i ; done
mv: `d1' and `e1' are the same file
mv: `d3' and `e3' are the same file
mv: `d4' and `e4' are the same file
mv: overwrite `e5'? ls
mv: `d6' and `e6' are the same file
mv: `d7' and `e7' are the same file
mv: `d8' and `e8' are the same file
mv: `d9' and `e9' are the same file
mv: cannot move `d10' to `e10': Structure needs cleaning

[root@OVM3 test0]# ls
d10  d5  e1  e2  e3  e4  e6  e7  e8  e9  new

Actual results:
===============
The file rename failed with the error 'Structure needs cleaning'.

Expected results:
=================
When the same file is renamed from multiple mounts, at least one rename should succeed, and no rename should fail with this error.

Additional info:
================
Log snippet:

[2014-08-18 09:21:48.332200] W [client-rpc-fops.c:2607:client3_3_link_cbk] 16-test0-client-2: remote operation failed: No such file or directory (/d10 -> /e10)
[2014-08-18 09:21:48.333029] W [MSGID: 109034] [dht-rename.c:402:dht_rename_unlink_cbk] 16-test0-dht: /d10: Rename: unlink on test0-client-2 failed [No such file or directory]
[2014-08-18 09:21:48.333077] W [fuse-bridge.c:1727:fuse_rename_cbk] 0-glusterfs-fuse: 1425: /d10 -> /e10 => -1 (Structure needs cleaning)
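The reproduction steps and the expected result above can be sketched as a small Python script. This is only an illustration under stated assumptions: it calls os.rename() directly rather than mv (so it does not reproduce mv's "are the same file" checks), the function names race_renames and find_violations are hypothetical, and the list of mount paths is supplied by the caller. On a single local directory it only demonstrates the expected behaviour; reproducing the bug would require passing several mount points of the same distributed volume.

```python
import errno
import os
import tempfile
import threading

# EUCLEAN is Linux-specific; 117 is its Linux value, used as a fallback.
EUCLEAN = getattr(errno, "EUCLEAN", 117)


def race_renames(mounts, count=10):
    """Rename d<i> -> e<i> through every mount path at the same time.

    `mounts` is a list of directories that all expose the same volume
    (hypothetical paths, e.g. separate FUSE/NFS mount points).
    Returns {file index: [errno per worker]}, where 0 means success.
    """
    for i in range(1, count + 1):
        open(os.path.join(mounts[0], f"d{i}"), "w").close()

    outcomes = {i: [] for i in range(1, count + 1)}
    lock = threading.Lock()
    start = threading.Barrier(len(mounts))

    def worker(mount):
        start.wait()  # release all workers together to widen the race window
        for i in range(1, count + 1):
            try:
                os.rename(os.path.join(mount, f"d{i}"),
                          os.path.join(mount, f"e{i}"))
                code = 0
            except OSError as exc:
                code = exc.errno
            with lock:
                outcomes[i].append(code)

    threads = [threading.Thread(target=worker, args=(m,)) for m in mounts]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outcomes


def find_violations(outcomes):
    """List files that break the expected result stated in the report."""
    problems = []
    for i, codes in sorted(outcomes.items()):
        if 0 not in codes:
            problems.append(f"d{i}: no rename succeeded")
        if EUCLEAN in codes:
            problems.append(f"d{i}: rename failed with EUCLEAN")
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        # Five workers on one local directory stand in for five mounts.
        print(find_violations(race_renames([scratch] * 5)) or "no violations")
```

An empty list from find_violations() corresponds to the expected result: every file was renamed by at least one worker and no worker saw EUCLEAN.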
Reason for 'Structure needs cleaning' errors:

Logs from one of the mounts that saw the 'Structure needs cleaning' error suggest that link creation failed with ENOENT.

<log>
[2014-08-18 09:21:48.332200] W [client-rpc-fops.c:2607:client3_3_link_cbk] 16-test0-client-2: remote operation failed: No such file or directory (/d10 -> /e10)
[2014-08-18 09:21:48.333029] W [MSGID: 109034] [dht-rename.c:402:dht_rename_unlink_cbk] 16-test0-dht: /d10: Rename: unlink on test0-client-2 failed [No such file or directory]
[2014-08-18 09:21:48.333077] W [fuse-bridge.c:1727:fuse_rename_cbk] 0-glusterfs-fuse: 1425: /d10 -> /e10 => -1 (Structure needs cleaning)
</log>

In dht_rename(), dht_local_init() initialises local->op_errno to EUCLEAN. When link creation then fails with ENOENT (dht_rename_links_cbk()), op_errno doesn't seem to be updated with the actual error. No later stage of the rename in this codepath sets local->op_errno either, so DHT unwinds the rename failure with EUCLEAN, which the mount reports as 'Structure needs cleaning'.

Why rename() didn't succeed on any of the mounts still needs some investigation.
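The suspected error path can be modelled in a few lines. The sketch below is a Python stand-in, not the GlusterFS C code: RenameLocal, links_cbk_buggy, links_cbk_fixed and unwind are hypothetical names that only mirror the control flow described above (a default errno set at initialisation and a failure callback that does or does not overwrite it). The "fixed" variant shows one plausible way to propagate the error and is an assumption, not the change that was actually shipped.

```python
import errno
import os

# EUCLEAN is Linux-specific; 117 is its Linux value, used as a fallback.
EUCLEAN = getattr(errno, "EUCLEAN", 117)


class RenameLocal:
    """Hypothetical stand-in for the per-call 'local' state of a rename."""

    def __init__(self):
        self.op_ret = -1
        self.op_errno = EUCLEAN  # default set at initialisation, per the analysis


def links_cbk_buggy(local, op_ret, op_errno):
    """Model of the suspected bug: the failure is seen but its errno is dropped."""
    if op_ret == -1:
        return local  # local.op_errno still holds the EUCLEAN default
    local.op_ret, local.op_errno = 0, 0
    return local


def links_cbk_fixed(local, op_ret, op_errno):
    """Assumed remedy: record the brick's errno before unwinding."""
    if op_ret == -1:
        local.op_errno = op_errno
        return local
    local.op_ret, local.op_errno = 0, 0
    return local


def unwind(local):
    """Return what the application would see: (op_ret, errno, message)."""
    return local.op_ret, local.op_errno, os.strerror(local.op_errno)


if __name__ == "__main__":
    # Link creation fails on the brick with ENOENT in both runs.
    print(unwind(links_cbk_buggy(RenameLocal(), -1, errno.ENOENT)))
    print(unwind(links_cbk_fixed(RenameLocal(), -1, errno.ENOENT)))
```

On Linux with glibc, the message for EUCLEAN is "Structure needs cleaning", so the first call models the error seen on the mounts, while the second surfaces the underlying ENOENT instead.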
Verified this by carrying out parallel renames across different terminals simultaneously. I could see the rename succeed from one node and "Device or resource busy" or "No such file or directory" errors on the other nodes. Following are the snippets:

Node 1:

[root@dht-rhs-24 test]# ls -ltrh
total 0
-rw-r--r--. 1 root root 0 Jun 17 22:57 a6
-rw-r--r--. 1 root root 0 Jun 17 22:57 a5
-rw-r--r--. 1 root root 0 Jun 17 22:57 a4
-rw-r--r--. 1 root root 0 Jun 17 22:57 a9
-rw-r--r--. 1 root root 0 Jun 17 22:57 a8
-rw-r--r--. 1 root root 0 Jun 17 22:57 a7
-rw-r--r--. 1 root root 0 Jun 17 22:57 a3
-rw-r--r--. 1 root root 0 Jun 17 22:57 a2
-rw-r--r--. 1 root root 0 Jun 17 22:57 a10
-rw-r--r--. 1 root root 0 Jun 17 22:57 a1
[root@dht-rhs-24 test]# for i in {1..10}; do mv a$i b$i; done
mv: cannot move `a3' to `b3': Device or resource busy
mv: cannot move `a5' to `b5': Device or resource busy
mv: cannot move `a6' to `b6': Device or resource busy
mv: cannot move `a9' to `b9': Device or resource busy

Node 2:

[root@dht-rhs-23 test]# for i in {1..10}; do mv a$i b$i; done
mv: `a1' and `b1' are the same file
mv: cannot move `a2' to `b2': Device or resource busy
mv: `a4' and `b4' are the same file
mv: cannot move `a5' to `b5': Device or resource busy
mv: `a7' and `b7' are the same file
mv: cannot move `a8' to `b8': Device or resource busy
mv: overwrite `b10'? y
mv: cannot remove `a10': No such file or directory
[root@dht-rhs-23 test]#

Node 3:

[root@amit-lappy test]# for i in {1..10}; do mv a$i b$i; done
mv: cannot move ‘a2’ to ‘b2’: Remote I/O error
mv: cannot move ‘a3’ to ‘b3’: Remote I/O error
mv: cannot move ‘a4’ to ‘b4’: Remote I/O error
mv: cannot move ‘a6’ to ‘b6’: No such file or directory
mv: cannot stat ‘a7’: No such file or directory
mv: cannot move ‘a8’ to ‘b8’: Remote I/O error
mv: cannot move ‘a9’ to ‘b9’: Remote I/O error
mv: cannot move ‘a10’ to ‘b10’: Remote I/O error

[root@dht-rhs-23 test]# ls -lthr
total 0
-rw-r--r--. 1 root root 0 Jun 17 22:57 b6
-rw-r--r--. 1 root root 0 Jun 17 22:57 b5
-rw-r--r--. 1 root root 0 Jun 17 22:57 b4
-rw-r--r--. 1 root root 0 Jun 17 22:57 b9
-rw-r--r--. 1 root root 0 Jun 17 22:57 b8
-rw-r--r--. 1 root root 0 Jun 17 22:57 b7
-rw-r--r--. 1 root root 0 Jun 17 22:57 b3
-rw-r--r--. 1 root root 0 Jun 17 22:57 b2
-rw-r--r--. 1 root root 0 Jun 17 22:57 b10
-rw-r--r--. 1 root root 0 Jun 17 22:57 b1

Marking the bug verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2015-1495.html