Bug 1131044

Summary: DHT: renaming the same file from multiple mounts failed with 'Structure needs cleaning' error on all mounts
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Rachana Patel <racpatel>
Component: distribute Assignee: Krutika Dhananjay <kdhananj>
Status: CLOSED ERRATA QA Contact: storage-qa-internal <storage-qa-internal>
Severity: medium Docs Contact:
Priority: high    
Version: rhgs-3.0 CC: achauras, annair, kdhananj, mzywusko, nbalacha, nsathyan, smohan
Target Milestone: ---   
Target Release: RHGS 3.1.0   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: glusterfs-3.7.1-1 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-07-29 04:35:11 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 969298    
Bug Blocks: 1202842    

Description Rachana Patel 2014-08-18 12:31:51 UTC
Description of problem:
=======================
While renaming the same file from multiple mounts, the rename failed with a 'Structure needs cleaning' error on all mounts.


Version-Release number of selected component (if applicable):
=============================================================
3.6.0.27-1.el6rhs.x86_64

How reproducible:
=================
intermittent


Steps to Reproduce:
===================
1. Create and mount a distributed volume.
2. Create a few files on the mount.
3. Start renaming the same file from multiple mounts (here, 5 mounts: 4 FUSE and 1 NFS). The destination file does not exist.

[root@OVM3 test0]# for i in {1..10}; do mv d$i e$i ; done
mv: cannot move `d2' to `e2': No such file or directory
mv: `d3' and `e3' are the same file
mv: `d4' and `e4' are the same file
mv: `d5' and `e5' are the same file
mv: cannot move `d10' to `e10': Structure needs cleaning


[root@OVM4 test0]# for i in {1..10}; do mv d$i e$i ; done
mv: `d1' and `e1' are the same file
mv: cannot move `d2' to `e2': No such file or directory
mv: overwrite `e3'? ls
mv: `d4' and `e4' are the same file
mv: `d6' and `e6' are the same file
mv: `d7' and `e7' are the same file
mv: `d8' and `e8' are the same file
mv: `d9' and `e9' are the same file
mv: cannot move `d10' to `e10': Structure needs cleaning



[root@OVM5 test0]# for i in {1..10}; do mv d$i e$i ; done
mv: `d1' and `e1' are the same file
mv: cannot move `d2' to `e2': No such file or directory
mv: `d3' and `e3' are the same file
mv: `d4' and `e4' are the same file
mv: overwrite `e5'? ls
mv: `d6' and `e6' are the same file
mv: `d7' and `e7' are the same file
mv: `d8' and `e8' are the same file
mv: `d9' and `e9' are the same file
mv: cannot move `d10' to `e10': Structure needs cleaning


[root@OVM1 test0]# for i in {1..10}; do mv d$i e$i ; done
mv: `d1' and `e1' are the same file
mv: `d3' and `e3' are the same file
mv: `d4' and `e4' are the same file
mv: overwrite `e5'? ls
mv: `d6' and `e6' are the same file
mv: `d7' and `e7' are the same file
mv: `d8' and `e8' are the same file
mv: `d9' and `e9' are the same file
mv: cannot move `d10' to `e10': Structure needs cleaning


[root@OVM3 test0]# ls
d10  d5  e1  e2  e3  e4  e6  e7  e8  e9  new


Actual results:
===============
File rename failed with the error 'Structure needs cleaning'.


Expected results:
=================
When the same file is renamed from multiple mounts, at least one rename should succeed, and none should fail with this error.
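
For comparison, the expected behaviour can be observed on a local POSIX filesystem with a small C sketch. It is only an illustration under assumptions: race_rename() and race_rename_in_tmpdir() are hypothetical helper names, the /tmp path is arbitrary, and a local filesystem is not a Gluster mount. Several processes rename the same source to a destination that does not exist; one rename is expected to succeed and the rest to fail with ENOENT, never EUCLEAN.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork nprocs children that each call
 * rename(src, dst) once. Returns how many renames succeeded, or -1
 * if fork/wait failed or a child saw an errno other than ENOENT. */
int race_rename(const char *src, const char *dst, int nprocs)
{
    int spawned = 0, successes = 0, unexpected = 0;

    assert(nprocs > 0);
    for (int i = 0; i < nprocs; i++) {
        pid_t pid = fork();
        if (pid < 0) {
            unexpected = 1;
            break;
        }
        if (pid == 0) {
            /* Child: exit code 0 = renamed, 1 = ENOENT, 2 = anything else. */
            if (rename(src, dst) == 0)
                _exit(0);
            _exit(errno == ENOENT ? 1 : 2);
        }
        spawned++;
    }

    /* Parent: reap every child that was actually started. */
    for (int i = 0; i < spawned; i++) {
        int status = 0;
        if (wait(&status) < 0 || !WIFEXITED(status)) {
            unexpected = 1;
            continue;
        }
        if (WEXITSTATUS(status) == 0)
            successes++;
        else if (WEXITSTATUS(status) != 1)
            unexpected = 1;
    }
    return unexpected ? -1 : successes;
}

/* Hypothetical driver: run the race inside a fresh temporary directory
 * (path chosen for illustration) and clean up afterwards. */
int race_rename_in_tmpdir(int nprocs)
{
    char dir[] = "/tmp/rename-race-XXXXXX";
    char src[64], dst[64];

    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(src, sizeof src, "%s/d1", dir);
    snprintf(dst, sizeof dst, "%s/e1", dir);

    FILE *f = fopen(src, "w");
    if (f == NULL) {
        rmdir(dir);
        return -1;
    }
    fclose(f);

    int result = race_rename(src, dst, nprocs);

    unlink(dst);
    unlink(src);
    rmdir(dir);
    return result;
}
```

Calling race_rename_in_tmpdir(5) from a main() mirrors the five mounts in the report. The helper reports anything other than success or ENOENT as -1, so an unexpected errno such as EUCLEAN would be visible immediately.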

Additional info:
================
Log snippet:
[2014-08-18 09:21:48.332200] W [client-rpc-fops.c:2607:client3_3_link_cbk] 16-test0-client-2: remote operation failed: No such file or directory (/d10 -> /e10)
[2014-08-18 09:21:48.333029] W [MSGID: 109034] [dht-rename.c:402:dht_rename_unlink_cbk] 16-test0-dht: /d10: Rename: unlink on test0-client-2 failed  [No such file or directory]
[2014-08-18 09:21:48.333077] W [fuse-bridge.c:1727:fuse_rename_cbk] 0-glusterfs-fuse: 1425: /d10 -> /e10 => -1 (Structure needs cleaning)

Comment 2 Krutika Dhananjay 2014-08-18 12:43:11 UTC
Reason for 'Structure needs cleaning' errors:

Logs from one of the mounts which saw 'Structure needs cleaning' error suggest that link creation failed with ENOENT.

<log>

[2014-08-18 09:21:48.332200] W [client-rpc-fops.c:2607:client3_3_link_cbk] 16-test0-client-2: remote operation failed: No such file or directory (/d10 -> /e10)
[2014-08-18 09:21:48.333029] W [MSGID: 109034] [dht-rename.c:402:dht_rename_unlink_cbk] 16-test0-dht: /d10: Rename: unlink on test0-client-2 failed  [No such file or directory]
[2014-08-18 09:21:48.333077] W [fuse-bridge.c:1727:fuse_rename_cbk] 0-glusterfs-fuse: 1425: /d10 -> /e10 => -1 (Structure needs cleaning)

</log>

In dht_rename(), dht_local_init() initialises local->op_errno to EUCLEAN. When link creation then fails with ENOENT (dht_rename_links_cbk()), op_errno doesn't seem to be updated to reflect that failure. No later stage of the rename in this codepath sets local->op_errno either, so DHT unwinds the rename failure with the stale EUCLEAN value.
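
The errno propagation described above can be modelled in isolation. The following is a minimal, hypothetical sketch and not GlusterFS code: struct sketch_local and all sketch_* functions are invented for illustration. The "fixed" variant shows only one possible way to carry the errno through, which is not necessarily what the shipped fix does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#ifndef EUCLEAN
#define EUCLEAN 117 /* assumption: Linux value, "Structure needs cleaning" */
#endif

/* Hypothetical stand-in for DHT's per-operation state. */
struct sketch_local {
    int op_ret;
    int op_errno;
};

/* Mirrors the initialisation described above: op_errno starts as EUCLEAN. */
static void sketch_local_init(struct sketch_local *local)
{
    assert(local != NULL);
    local->op_ret = 0;
    local->op_errno = EUCLEAN;
}

/* Buggy pattern: the link failure is noticed, but its errno is dropped,
 * so the initial EUCLEAN is what the rename unwinds with. */
static int sketch_links_cbk_buggy(struct sketch_local *local,
                                  int op_ret, int op_errno)
{
    (void)op_errno; /* never recorded */
    if (op_ret == -1)
        local->op_ret = -1;
    return local->op_errno;
}

/* One possible correction: record the errno of the failed step. */
static int sketch_links_cbk_fixed(struct sketch_local *local,
                                  int op_ret, int op_errno)
{
    if (op_ret == -1) {
        local->op_ret = -1;
        local->op_errno = op_errno;
    }
    return local->op_errno;
}

/* Simulates a rename whose link step fails with link_errno and returns
 * the errno that would be handed back to the caller. */
int sketch_rename_errno(int use_fix, int link_errno)
{
    struct sketch_local local;

    sketch_local_init(&local);
    return use_fix ? sketch_links_cbk_fixed(&local, -1, link_errno)
                   : sketch_links_cbk_buggy(&local, -1, link_errno);
}
```

With the buggy callback, sketch_rename_errno(0, ENOENT) returns EUCLEAN, which matches the 'Structure needs cleaning' message seen on the mounts. With the corrected callback, sketch_rename_errno(1, ENOENT) returns ENOENT.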

Why rename() didn't succeed on any of the mounts still needs some investigation.

Comment 12 Amit Chaurasia 2015-06-18 09:08:22 UTC
Verified this by carrying out parallel renames across different terminals simultaneously.

I could see the rename happening from one node and "Device or resource busy" or "No such file or directory" on the other nodes.

Following are snippets:

Node 1 : 
[root@dht-rhs-24 test]# ls -ltrh
total 0
-rw-r--r--. 1 root root 0 Jun 17 22:57 a6
-rw-r--r--. 1 root root 0 Jun 17 22:57 a5
-rw-r--r--. 1 root root 0 Jun 17 22:57 a4
-rw-r--r--. 1 root root 0 Jun 17 22:57 a9
-rw-r--r--. 1 root root 0 Jun 17 22:57 a8
-rw-r--r--. 1 root root 0 Jun 17 22:57 a7
-rw-r--r--. 1 root root 0 Jun 17 22:57 a3
-rw-r--r--. 1 root root 0 Jun 17 22:57 a2
-rw-r--r--. 1 root root 0 Jun 17 22:57 a10
-rw-r--r--. 1 root root 0 Jun 17 22:57 a1
[root@dht-rhs-24 test]# for i in {1..10}; do mv a$i b$i; done
mv: cannot move `a3' to `b3': Device or resource busy
mv: cannot move `a5' to `b5': Device or resource busy
mv: cannot move `a6' to `b6': Device or resource busy
mv: cannot move `a9' to `b9': Device or resource busy


Node 2:
[root@dht-rhs-23 test]# for i in {1..10}; do mv a$i b$i; done
mv: `a1' and `b1' are the same file
mv: cannot move `a2' to `b2': Device or resource busy
mv: `a4' and `b4' are the same file
mv: cannot move `a5' to `b5': Device or resource busy
mv: `a7' and `b7' are the same file
mv: cannot move `a8' to `b8': Device or resource busy
mv: overwrite `b10'? y
mv: cannot remove `a10': No such file or directory
[root@dht-rhs-23 test]# 


Node 3:
[root@amit-lappy test]# for i in {1..10}; do mv a$i b$i; done
mv: cannot move ‘a2’ to ‘b2’: Remote I/O error
mv: cannot move ‘a3’ to ‘b3’: Remote I/O error
mv: cannot move ‘a4’ to ‘b4’: Remote I/O error
mv: cannot move ‘a6’ to ‘b6’: No such file or directory
mv: cannot stat ‘a7’: No such file or directory
mv: cannot move ‘a8’ to ‘b8’: Remote I/O error
mv: cannot move ‘a9’ to ‘b9’: Remote I/O error
mv: cannot move ‘a10’ to ‘b10’: Remote I/O error

[root@dht-rhs-23 test]# ls -lthr
total 0
-rw-r--r--. 1 root root 0 Jun 17 22:57 b6
-rw-r--r--. 1 root root 0 Jun 17 22:57 b5
-rw-r--r--. 1 root root 0 Jun 17 22:57 b4
-rw-r--r--. 1 root root 0 Jun 17 22:57 b9
-rw-r--r--. 1 root root 0 Jun 17 22:57 b8
-rw-r--r--. 1 root root 0 Jun 17 22:57 b7
-rw-r--r--. 1 root root 0 Jun 17 22:57 b3
-rw-r--r--. 1 root root 0 Jun 17 22:57 b2
-rw-r--r--. 1 root root 0 Jun 17 22:57 b10
-rw-r--r--. 1 root root 0 Jun 17 22:57 b1


Marking the bug verified.

Comment 14 errata-xmlrpc 2015-07-29 04:35:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1495.html