Bug 1131044 - DHT: renaming the same file from multiple mounts failed with 'Structure needs cleaning' error on all mounts
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: distribute
Version: rhgs-3.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: medium
Target Milestone: ---
Target Release: RHGS 3.1.0
Assignee: Krutika Dhananjay
QA Contact: storage-qa-internal@redhat.com
URL:
Whiteboard:
Depends On: 969298
Blocks: 1202842
Reported: 2014-08-18 12:31 UTC by Rachana Patel
Modified: 2015-07-29 04:35 UTC

Fixed In Version: glusterfs-3.7.1-1
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-07-29 04:35:11 UTC
Embargoed:


Links
System: Red Hat Product Errata
ID: RHSA-2015:1495
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Red Hat Gluster Storage 3.1 update
Last Updated: 2015-07-29 08:26:26 UTC

Description Rachana Patel 2014-08-18 12:31:51 UTC
Description of problem:
=======================
Renaming the same file from multiple mounts failed with a 'Structure needs cleaning' error on all mounts.


Version-Release number of selected component (if applicable):
=============================================================
3.6.0.27-1.el6rhs.x86_64

How reproducible:
=================
intermittent


Steps to Reproduce:
===================
1. Create and mount a distributed volume.
2. Create a few files on the mount.
3. Start renaming the same file from multiple mounts at the same time (here, from 5 mounts: 4 FUSE and 1 NFS). The destination file does not exist.
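
The race in step 3 can be approximated with a small script. This is only a sketch under stated assumptions: the helper name `race_rename` and the use of threads inside one process are illustrative and not part of the original report, and a local temporary directory stands in for the mount point. To exercise the reported scenario, the same rename would have to be started from each client against its own mount of the volume.

```python
import errno
import os
import tempfile
import threading


def race_rename(directory, src, dst, workers=5):
    """Rename src -> dst from several threads at once.

    Returns one entry per worker: "ok" on success, otherwise the
    symbolic errno name (for example "ENOENT" or "EUCLEAN").
    """
    barrier = threading.Barrier(workers)
    results = [None] * workers

    def worker(idx):
        barrier.wait()  # release all workers together to widen the race window
        try:
            os.rename(os.path.join(directory, src),
                      os.path.join(directory, dst))
            results[idx] = "ok"
        except OSError as exc:
            results[idx] = errno.errorcode.get(exc.errno, str(exc.errno))

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    # Assumption: a temporary local directory replaces the real mount point.
    with tempfile.TemporaryDirectory() as mount:
        open(os.path.join(mount, "d1"), "w").close()
        print(race_rename(mount, "d1", "e1"))
```

On a local POSIX filesystem exactly one worker should report "ok" and the rest "ENOENT", which matches the expectation stated under "Expected results". An "EUCLEAN" entry would correspond to the failure reported here.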

[root@OVM3 test0]# for i in {1..10}; do mv d$i e$i ; done
mv: cannot move `d2' to `e2': No such file or directory
mv: `d3' and `e3' are the same file
mv: `d4' and `e4' are the same file
mv: `d5' and `e5' are the same file
mv: cannot move `d10' to `e10': Structure needs cleaning


[root@OVM4 test0]# for i in {1..10}; do mv d$i e$i ; done
mv: `d1' and `e1' are the same file
mv: cannot move `d2' to `e2': No such file or directory
mv: overwrite `e3'? ls
mv: `d4' and `e4' are the same file
mv: `d6' and `e6' are the same file
mv: `d7' and `e7' are the same file
mv: `d8' and `e8' are the same file
mv: `d9' and `e9' are the same file
mv: cannot move `d10' to `e10': Structure needs cleaning



[root@OVM5 test0]# for i in {1..10}; do mv d$i e$i ; done
mv: `d1' and `e1' are the same file
mv: cannot move `d2' to `e2': No such file or directory
mv: `d3' and `e3' are the same file
mv: `d4' and `e4' are the same file
mv: overwrite `e5'? ls
mv: `d6' and `e6' are the same file
mv: `d7' and `e7' are the same file
mv: `d8' and `e8' are the same file
mv: `d9' and `e9' are the same file
mv: cannot move `d10' to `e10': Structure needs cleaning


[root@OVM1 test0]# for i in {1..10}; do mv d$i e$i ; done
mv: `d1' and `e1' are the same file
mv: `d3' and `e3' are the same file
mv: `d4' and `e4' are the same file
mv: overwrite `e5'? ls
mv: `d6' and `e6' are the same file
mv: `d7' and `e7' are the same file
mv: `d8' and `e8' are the same file
mv: `d9' and `e9' are the same file
mv: cannot move `d10' to `e10': Structure needs cleaning


[root@OVM3 test0]# ls
d10  d5  e1  e2  e3  e4  e6  e7  e8  e9  new


Actual results:
===============
File rename failed with the error 'Structure needs cleaning'.


Expected results:
=================
When the same file is renamed from multiple mounts, at least one rename should succeed, and no rename should fail with this error.

Additional info:
================
Log snippet:
[2014-08-18 09:21:48.332200] W [client-rpc-fops.c:2607:client3_3_link_cbk] 16-test0-client-2: remote operation failed: No such file or directory (/d10 -> /e10)
[2014-08-18 09:21:48.333029] W [MSGID: 109034] [dht-rename.c:402:dht_rename_unlink_cbk] 16-test0-dht: /d10: Rename: unlink on test0-client-2 failed  [No such file or directory]
[2014-08-18 09:21:48.333077] W [fuse-bridge.c:1727:fuse_rename_cbk] 0-glusterfs-fuse: 1425: /d10 -> /e10 => -1 (Structure needs cleaning)

Comment 2 Krutika Dhananjay 2014-08-18 12:43:11 UTC
Reason for 'Structure needs cleaning' errors:

Logs from one of the mounts that saw the 'Structure needs cleaning' error suggest that link creation failed with ENOENT.

<log>

[2014-08-18 09:21:48.332200] W [client-rpc-fops.c:2607:client3_3_link_cbk] 16-test0-client-2: remote operation failed: No such file or directory (/d10 -> /e10)
[2014-08-18 09:21:48.333029] W [MSGID: 109034] [dht-rename.c:402:dht_rename_unlink_cbk] 16-test0-dht: /d10: Rename: unlink on test0-client-2 failed  [No such file or directory]
[2014-08-18 09:21:48.333077] W [fuse-bridge.c:1727:fuse_rename_cbk] 0-glusterfs-fuse: 1425: /d10 -> /e10 => -1 (Structure needs cleaning)

</log>

In dht_rename(), dht_local_init() initialises local->op_errno to EUCLEAN. When link creation then fails with ENOENT (dht_rename_links_cbk()), op_errno doesn't seem to be set appropriately, and no later stage of the rename in this codepath sets local->op_errno either. As a result, DHT unwinds the rename failure with EUCLEAN, which the client reports as 'Structure needs cleaning'.
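
The effect of the stale default can be shown with a toy model. This is not GlusterFS code: the function `rename_unwind_errno` and its `propagate` flag are hypothetical names introduced only for illustration, and recording the link failure is one plausible way to avoid the misleading error, not necessarily the change that shipped.

```python
import errno
import os


def rename_unwind_errno(link_errno, propagate):
    """Toy model of the codepath described above (hypothetical helper).

    link_errno: errno reported by the failed link step, 0 if it succeeded.
    propagate:  whether the failure callback copies link_errno into op_errno.
    Returns the errno the rename would be unwound with on failure.
    """
    op_errno = errno.EUCLEAN  # default assigned at initialisation
    if link_errno != 0 and propagate:
        op_errno = link_errno  # record the real cause of the failure
    return op_errno


if __name__ == "__main__":
    stale = rename_unwind_errno(errno.ENOENT, propagate=False)
    fixed = rename_unwind_errno(errno.ENOENT, propagate=True)
    # On Linux with glibc the first message is typically the
    # 'Structure needs cleaning' text seen in the mv output.
    print(errno.errorcode[stale], os.strerror(stale))
    print(errno.errorcode[fixed], os.strerror(fixed))
```

With `propagate=False` the caller sees EUCLEAN even though the underlying failure was ENOENT, which is the mismatch visible in the log lines above.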

Why rename() didn't succeed on any of the mounts still needs some investigation.

Comment 12 Amit Chaurasia 2015-06-18 09:08:22 UTC
Verified this by running parallel renames from different terminals simultaneously.

I could see the rename happening from one node, and "Device or resource busy" or "No such file or directory" errors on the other nodes.

Following are snippets:

Node 1 : 
[root@dht-rhs-24 test]# ls -ltrh
total 0
-rw-r--r--. 1 root root 0 Jun 17 22:57 a6
-rw-r--r--. 1 root root 0 Jun 17 22:57 a5
-rw-r--r--. 1 root root 0 Jun 17 22:57 a4
-rw-r--r--. 1 root root 0 Jun 17 22:57 a9
-rw-r--r--. 1 root root 0 Jun 17 22:57 a8
-rw-r--r--. 1 root root 0 Jun 17 22:57 a7
-rw-r--r--. 1 root root 0 Jun 17 22:57 a3
-rw-r--r--. 1 root root 0 Jun 17 22:57 a2
-rw-r--r--. 1 root root 0 Jun 17 22:57 a10
-rw-r--r--. 1 root root 0 Jun 17 22:57 a1
[root@dht-rhs-24 test]# for i in {1..10}; do mv a$i b$i; done
mv: cannot move `a3' to `b3': Device or resource busy
mv: cannot move `a5' to `b5': Device or resource busy
mv: cannot move `a6' to `b6': Device or resource busy
mv: cannot move `a9' to `b9': Device or resource busy


Node 2:
[root@dht-rhs-23 test]# for i in {1..10}; do mv a$i b$i; done
mv: `a1' and `b1' are the same file
mv: cannot move `a2' to `b2': Device or resource busy
mv: `a4' and `b4' are the same file
mv: cannot move `a5' to `b5': Device or resource busy
mv: `a7' and `b7' are the same file
mv: cannot move `a8' to `b8': Device or resource busy
mv: overwrite `b10'? y
mv: cannot remove `a10': No such file or directory
[root@dht-rhs-23 test]# 


Node 3:
[root@amit-lappy test]# for i in {1..10}; do mv a$i b$i; done
mv: cannot move ‘a2’ to ‘b2’: Remote I/O error
mv: cannot move ‘a3’ to ‘b3’: Remote I/O error
mv: cannot move ‘a4’ to ‘b4’: Remote I/O error
mv: cannot move ‘a6’ to ‘b6’: No such file or directory
mv: cannot stat ‘a7’: No such file or directory
mv: cannot move ‘a8’ to ‘b8’: Remote I/O error
mv: cannot move ‘a9’ to ‘b9’: Remote I/O error
mv: cannot move ‘a10’ to ‘b10’: Remote I/O error

[root@dht-rhs-23 test]# ls -lthr
total 0
-rw-r--r--. 1 root root 0 Jun 17 22:57 b6
-rw-r--r--. 1 root root 0 Jun 17 22:57 b5
-rw-r--r--. 1 root root 0 Jun 17 22:57 b4
-rw-r--r--. 1 root root 0 Jun 17 22:57 b9
-rw-r--r--. 1 root root 0 Jun 17 22:57 b8
-rw-r--r--. 1 root root 0 Jun 17 22:57 b7
-rw-r--r--. 1 root root 0 Jun 17 22:57 b3
-rw-r--r--. 1 root root 0 Jun 17 22:57 b2
-rw-r--r--. 1 root root 0 Jun 17 22:57 b10
-rw-r--r--. 1 root root 0 Jun 17 22:57 b1


Marking the bug verified.

Comment 14 errata-xmlrpc 2015-07-29 04:35:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1495.html

